A couple of notes:
- str.translate() exists to provide a very fast, quick-and-dirty way of defining a character mapping (charmap) codec and calling its .encode() method.
- The reason why the mapping is from int (source Unicode ordinal) to int (target Unicode ordinal), bytes or None was performance in the original implementation. Using this approach, the mapping could be defined as a sequence (using the index position as ordinal) or as a dictionary.
- Python’s stdlib charmap codecs today use a more efficient way of defining charmap codecs, based on a decoding table defined as a 256-character str (mapping byte ordinals via their index position in the sequence to Unicode code points) and a fast lookup table called EncodingMap (a 3-level trie), which is generated from these decoding tables for encoding.
- For more complete definitions of charmap codecs, have a look at the modules in the stdlib encodings package (e.g. cp1252.py). Those also allow decoding and are typically defined in terms of a decoding table, rather than an encoding table.
- The codec subsystem in Python 2.x (see PEP 100) did not mandate input or output types for codecs. The system was designed to have the codecs define the supported types, in order to have a rich codecs ecosystem and to allow for composable codecs as well. As such, it was easily possible to write codecs going from bytes (str in Py2) to text (unicode in Py2), bytes to bytes, or text to text. To provide an easier way to access this functionality, .encode() and .decode() were made available on both str and unicode in Python 2. The term “encode” merely means: take the data and call the .encode() method on the referenced codec, nothing more (or less). The same goes for “decode”. However, this generic approach via methods did not catch on and caused too much confusion, so it was dropped in Python 3 on the str (Unicode text in Py3) and bytes (binary data in Py3) types, leaving only the paths str.encode() → bytes and bytes.decode() → str accessible via methods. The more generic interface is still available via codecs.encode() and codecs.decode(), though.
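To make two of the points above concrete, here is a small sketch. The table contents and variable names are made up for illustration; the first half shows a str.translate() mapping defined once as a dictionary and once as a sequence, the second half shows the generic codecs.encode() / codecs.decode() interface with the stdlib's rot13 (text to text) and hex (bytes to bytes) codecs.

```python
import codecs

# Mapping as a dictionary keyed by source Unicode ordinal.
# An int target is a Unicode ordinal; None deletes the character.
dict_table = {ord("a"): ord("A"), ord("n"): None}
via_dict = "banana".translate(dict_table)      # 'bAAA'

# Mapping as a sequence: the index position is the source ordinal.
# Ordinals beyond the end of the list are left unchanged.
seq_table = list(range(128))
seq_table[ord("a")] = ord("A")
via_seq = "banana".translate(seq_table)        # 'bAnAnA'

# The generic codec interface still supports non str<->bytes codecs.
text_to_text = codecs.encode("Hello", "rot13")      # 'Uryyb'
bytes_to_bytes = codecs.encode(b"abc", "hex")       # b'616263'
round_trip = codecs.decode(bytes_to_bytes, "hex")   # b'abc'
```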
Given how easy it is to use the fast builtin charmap codec directly (and without registering a complete codec), I’d recommend using this directly via codecs.charmap_encode() in a helper function and in a similar way as is done in the full code page mapping codecs, rather than relying on str.translate().
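A minimal sketch of such a helper, assuming a plain dictionary as the mapping. The helper name encode_with_charmap and the sample mapping are hypothetical and only there for illustration. With codecs.charmap_encode(), an int target is a byte value, a bytes target may contain several bytes, and a missing key (or a None target) is treated as an undefined mapping.

```python
import codecs

def encode_with_charmap(text, mapping, errors="strict"):
    """Encode text to bytes using the builtin charmap codec (hypothetical helper)."""
    # charmap_encode() returns a tuple: (encoded bytes, characters consumed).
    data, _consumed = codecs.charmap_encode(text, errors, mapping)
    return data

# Illustrative mapping: Unicode ordinal -> byte value (int) or bytes.
sample_mapping = {
    ord("a"): 0x41,     # single byte b'A'
    ord("b"): b"<b>",   # several bytes
}

encoded = encode_with_charmap("abba", sample_mapping)   # b'A<b><b>A'

# Characters without a mapping raise UnicodeEncodeError in strict mode.
try:
    encode_with_charmap("abz", sample_mapping)
    unmapped_raised = False
except UnicodeEncodeError:
    unmapped_raised = True
```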
PS: We should really document the codecs.charmap_build() function used by those codecs.
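Until then, here is a rough sketch of how the pieces fit together, modeled on the pattern used in the encodings package modules such as cp1252.py. The decoding table below is a toy one invented for this example (ASCII plus the euro sign at byte 0x80, everything else marked undefined with U+FFFE), not a real code page.

```python
import codecs

# Toy decoding table: 256 characters, indexed by byte value.
# U+FFFE marks byte values without a defined mapping.
decoding_table = (
    "".join(chr(i) for i in range(128))   # 0x00-0x7F: ASCII
    + "\u20ac"                            # 0x80: EURO SIGN
    + "\ufffe" * 127                      # 0x81-0xFF: undefined
)
assert len(decoding_table) == 256

# Build the encoding lookup table from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)

# Both functions return a tuple: (result, number of input items consumed).
encoded, _ = codecs.charmap_encode("5 \u20ac", "strict", encoding_table)
decoded, _ = codecs.charmap_decode(encoded, "strict", decoding_table)
```

Here encoded is b'5 \x80' and decoded is the original string again, so a single decoding table is enough to drive both directions.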