I’m writing some code that reads package distributions and returns their metadata. I was somewhat surprised to find that I couldn’t locate anything that defines the encoding to be used for the METADATA file in a wheel.
importlib.metadata assumes UTF-8 for metadata files in installed .dist-info directories, and I honestly can’t imagine wanting them to be anything other than UTF-8, but I think we should be explicit and mandate that somewhere.
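For illustration, here is a minimal sketch of what a reader has to do today to be explicit about this: it decodes METADATA from a wheel as UTF-8 itself rather than relying on the platform's default encoding. It assumes the usual layout of a single top-level *.dist-info directory and uses only the standard library (zipfile and email.parser); read_wheel_metadata is a hypothetical helper name, not part of any existing API.

```python
import zipfile
from email.message import Message
from email.parser import Parser


def read_wheel_metadata(wheel) -> Message:
    """Parse the METADATA file of a wheel, decoding it explicitly as UTF-8.

    `wheel` is a path or a binary file object (anything zipfile.ZipFile accepts).
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(wheel) as zf:
        # Assumption: METADATA lives at "<name>-<version>.dist-info/METADATA"
        # at the top level of the archive, and there is exactly one of them.
        candidates = []
        for name in zf.namelist():
            top, _, rest = name.partition("/")
            if top.endswith(".dist-info") and rest == "METADATA":
                candidates.append(name)
        if len(candidates) != 1:
            raise ValueError(f"expected one METADATA file, found {len(candidates)}")
        raw = zf.read(candidates[0])
    # Decode explicitly instead of relying on the locale's default encoding;
    # a file that is not valid UTF-8 raises UnicodeDecodeError here.
    return Parser().parsestr(raw.decode("utf-8"))
```

A METADATA file written in some other encoding (say Latin-1 with a non-ASCII author name) fails loudly here, and that is precisely the case the spec currently says nothing about.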
Broadening the scope somewhat, I propose that we mandate that all files in .dist-info directories must be UTF-8 encoded text files. If we make that a blanket rule, we won’t have to worry about it in future. (JSON and TOML both count as “UTF-8 encoded text”, so I don’t see this causing an issue if we want structured formats.) I believe this simply codifies existing practice, so I hope it’s not controversial.
So, to be explicit, I would like to change the “Binary distribution format” and “Recording installed projects” specifications in the Python Packaging User Guide to state that all files in the .dist-info directory must contain UTF-8 encoded text.
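If a rule like this were adopted, checking an installed .dist-info directory for compliance would be straightforward. A rough sketch, assuming an unpacked .dist-info directory on disk; non_utf8_files is a hypothetical helper, not an existing tool, and it only checks that each file decodes as UTF-8, not that its contents are otherwise well formed.

```python
from pathlib import Path


def non_utf8_files(dist_info: Path) -> list[Path]:
    """Return the files under a .dist-info directory that are not valid UTF-8.

    Hypothetical helper for illustration; it checks the encoding only, not
    whether each file's contents are otherwise well formed.
    """
    bad = []
    # rglob also walks any subdirectories of the .dist-info directory.
    for path in sorted(dist_info.rglob("*")):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Because the rule would be a blanket one, the check needs no knowledge of individual file names or formats: every file either decodes as UTF-8 or it doesn’t.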
Does anyone have any objection to this? Does anyone feel this needs a PEP?