Hi Barry, I appreciate that you took the time to read my question and
answer me. I really do! I also appreciate that you want to save me
from myself, that is very valiant of you! It's just that I'm a bit
tired of the game: person asks question → person immediately has to
justify why and gets lectured on how what they want is wrong.
Barry's explaining that the purpose of encodings is embedding the string
in some transport, and that how you choose to encode depends on your
objective.
Also, very often, people ask for a specific technical approach to some
larger undescribed problem which often has a better technical solution.
So we often ask about the context.
So please accept that I have that unusual need and don't want something
else. […]
Surprisingly, it is really not about the bit size to me, I actually care
about a visually compact representation of the data as a Python Unicode
literal. I should have made that clearer.
And here's the larger context. Thank you.
Note that UTF-8 is a binary encoding with no relationship to your
"visually compact" object. You just want "Unicode text legal in a Python
literal".
The specification for a Python Unicode literal is here:
It suggests that you can possibly use any Unicode character except the
quote, the backslash and the newline. You'd probably do well with
something very simple which escaped (eg with a backslash) just those 3
characters.
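As a rough illustration of the simple approach, here is a minimal sketch.
The name to_literal is made up for this example, and escaping the carriage
return as well is my own assumption (the tokenizer treats a raw one as a
line ending); it is not taken from the specification text above. Other
control characters (eg NUL) are not handled.

```python
import ast


def to_literal(text: str) -> str:
    """Return a double-quoted Python literal for text (hypothetical helper)."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes stay single
        .replace('"', '\\"')        # the quote character delimiting the literal
        .replace("\n", "\\n")       # a raw newline would end the literal early
        .replace("\r", "\\r")       # assumption: also escape carriage returns
    )
    return '"' + escaped + '"'


sample = 'say "hi"\\ then\nnew line, plus é and ✓'
literal = to_literal(sample)
# ast.literal_eval parses the literal the same way Python source would.
assert ast.literal_eval(literal) == sample
```

Everything else, including non-ASCII characters, passes through unescaped,
which is what keeps the literal visually short.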
Or you could get very fancy, and run zlib.compress on your data and
then encode as a string. That will often be smaller.
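A minimal sketch of the fancier route, under some assumptions: pack and
unpack are names invented here, and base85 is just one possible choice for
the "encode as a string" step (its alphabet contains neither the quote nor
the backslash). Whether the result is actually smaller depends on how
compressible your data is.

```python
import base64
import zlib


def pack(data: bytes) -> str:
    """Compress data, then encode it as ASCII text (hypothetical helper)."""
    return base64.b85encode(zlib.compress(data, 9)).decode("ascii")


def unpack(text: str) -> bytes:
    """Reverse pack(): decode the text, then decompress."""
    return zlib.decompress(base64.b85decode(text))


data = b"abc" * 100
text = pack(data)
assert unpack(text) == data
# Highly repetitive input like this compresses well; random bytes would not.
assert len(text) < len(data)
```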
However, keep in mind that base64 and the like are chosen not just to
get through many email systems with varying character sets and 8-bit
cleanliness but also to be human readable. The more characters you use
beyond a core set, the more visual ambiguity there can be to someone
reading the text, and this can depend a lot on fonts too. Is that an
"i", an "I", an "l", a "1"? A "0" or an "O"? And that's without moving
beyond the ASCII Latin letters and Arabic numerals.
So: should your encoding be visually clear to a human reader, eg to
someone reading it aloud to another person or debugging an encoding
problem? Maybe not, but you should consider it.
Also, you will want to write some tests to check that things round
trip through your encoding and back to the original bytes, and also that
Python accepts the string literals you're generating.
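Something along these lines would cover both checks. This is only a
sketch: check_round_trip is a made-up name, and the base85 functions are
stand-ins for whatever encoder and decoder you end up writing.

```python
import ast
import base64


def check_round_trip(encode, decode, samples):
    """Check each sample survives encode/decode and forms a valid literal."""
    checked = 0
    for data in samples:
        text = encode(data)
        # 1. The bytes must come back unchanged.
        assert decode(text) == data, data
        # 2. Python must accept the text inside a plain double-quoted literal.
        assert ast.literal_eval('"' + text + '"') == text, text
        checked += 1
    return checked


# Stand-in encoder/decoder (assumption: replace these with your own pair).
def encode(data: bytes) -> str:
    return base64.b85encode(data).decode("ascii")


def decode(text: str) -> bytes:
    return base64.b85decode(text)


samples = [b"", b"\x00\xff", bytes(range(256))]
check_round_trip(encode, decode, samples)
```

Including the empty input and every possible byte value in the samples
tends to flush out the awkward cases early.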
Cheers,
Cameron Simpson cs@cskk.id.au