Python is commonly used to manipulate PDF files, and the file format has been an open standard for some time. Packages such as PyPDF2 and PDFMiner, as well as user-written Python code, have taken different approaches to parsing these files and handling the encodings they use. Although the codecs module can be extended with a custom codec, doing so is not trivial and carries a performance cost compared with a built-in codec. In practice many people fall back on Latin-1, which differs slightly from PDFDocEncoding and therefore gives wrong results for some characters.
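To make the gap between Latin-1 and PDFDocEncoding concrete, here is a minimal sketch of the kind of custom codec a package has to carry today. It is an illustration under stated assumptions, not a complete or proposed implementation: the codec name `pdfdoc_sketch` and the helper names are hypothetical, and only four of the code points where PDFDocEncoding departs from Latin-1 are filled in (the full table is in the PDF specification).

```python
import codecs

# Assumption: a partial table for illustration only. These four entries are
# code points where PDFDocEncoding differs from Latin-1; every other byte is
# left at its Latin-1 value, which is NOT correct for the whole encoding.
_OVERRIDES = {
    0x80: "\u2022",  # bullet
    0x84: "\u2014",  # em dash
    0x85: "\u2013",  # en dash
    0xA0: "\u20ac",  # euro sign
}

# 256-character decoding table, in the style of the stdlib charmap codecs.
_DECODING_TABLE = "".join(_OVERRIDES.get(b, chr(b)) for b in range(256))
_ENCODING_TABLE = codecs.charmap_build(_DECODING_TABLE)


def _decode(data, errors="strict"):
    # Returns a (text, bytes_consumed) tuple, as the codec API requires.
    return codecs.charmap_decode(data, errors, _DECODING_TABLE)


def _encode(text, errors="strict"):
    # Returns a (bytes, characters_consumed) tuple.
    return codecs.charmap_encode(text, errors, _ENCODING_TABLE)


def _search(name):
    # Hypothetical codec name, chosen so it cannot clash with a real codec.
    if name == "pdfdoc_sketch":
        return codecs.CodecInfo(_encode, _decode, name="pdfdoc_sketch")
    return None


codecs.register(_search)

raw = b"\x80 item \x84 5 \xa0"
as_latin1 = raw.decode("latin-1")        # C1 controls and a no-break space
as_pdfdoc = raw.decode("pdfdoc_sketch")  # bullet, em dash, euro sign
```

Every project that wants correct results has to ship and register something like this itself, which is the duplication a built-in codec would remove.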
Python advertises itself as “batteries included” and has built-in support for other widely used formats. This change would go some way towards supporting those who process PDFs with Python, and would allow a single canonical solution to be shared across packages and user code: “one method to rule them all”.
There are already examples of supporting PDF artifacts in the standard library. For example, since Python 3.4 the base64 module has supported Ascii85-encoded data, optionally framed by the `<~` and `~>` delimiters.
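The existing Ascii85 support can be exercised as follows. This is a small sketch using `base64.a85encode` and `base64.a85decode` with their `adobe` flag; the payload is an arbitrary placeholder.

```python
import base64

payload = b"PDF stream data"  # arbitrary example bytes

# adobe=True wraps the encoded data in the <~ and ~> framing used by Adobe tools.
framed = base64.a85encode(payload, adobe=True)
plain = base64.a85encode(payload)

# Decoding framed input needs the same flag so the delimiters are stripped.
decoded = base64.a85decode(framed, adobe=True)
```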
Such a change should be implemented in a way that is both performant and compliant with the PDF specification, which would be a major advantage over the current approaches.