The common format used in WHEEL files is currently not well defined. PEP 241, written in 2001, referred to RFC 822, which specifies the email message format, for a description of the format.
The complexity of the
METADATA the contents often differ. This sometimes results in the generation of invalid metadata files. On the other hand, parsers usually use the
Because of the earlier discussion in “Core metadata email fields & Unicode” with @takluyver and @uranusjr, I had a closer look at the metadata format and tried to come up with a solution to these issues. Python packages don’t make use of the complex features of email messages, so a replacement should be feasible, although some churn is inevitable if one wants to improve on the status quo, as a few published metadata files for popular packages are invalid (see below).
I’ve drafted a written specification of the format that is compatible with the metadata files already deployed on PyPI but does not depend on the email RFCs for the message syntax. In addition, I’ve implemented a parser and a serializer for the format using only the standard library. A dict-like API for accessing the metadata fields is currently missing, as is additional message validation. To test my implementation, I collected metadata files from the top 4000 packages on PyPI. I can parse and re-serialize all files without problems, except those that contain errors and aren’t correctly parsed by the
Examples of invalid PKG-INFO files found on PyPI:
- tendo-0.2.15: each keyword is on its own line, without leading whitespace. This breaks the message, as each line should either be a “key: value” pair or, if line folding is used, start with a space to continue the previous value.
- rstr-2.2.6: user put a long multi-line description in the Summary field. Same issue as above.
- vaderSentiment-3.3.2: for an unknown reason, the description contains completely blank lines. A blank line without whitespace signals the end of the message header, so the remainder of the message is erroneously treated as the payload.
- additional errors in passlib-1.7.4 and win_inet_pton-1.1.0
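The failure modes in this list can be reproduced with the standard library's email parser. The following is only a minimal sketch: the sample inputs are made up for illustration and are not the actual contents of the files named above, and `inspect_metadata` is a hypothetical helper name.

```python
from email import message_from_string


def inspect_metadata(text):
    """Parse text with the stdlib email parser.

    Returns (headers, payload, defects), where defects is a list of the
    class names of any parsing defects recorded on the message.
    """
    msg = message_from_string(text)
    headers = dict(msg.items())
    defects = [type(d).__name__ for d in msg.defects]
    return headers, msg.get_payload(), defects


# Made-up inputs that imitate the problems described above.
FOLDED = "Name: demo\nKeywords: alpha\n beta\nVersion: 1.0\n"
UNFOLDED = "Name: demo\nKeywords: alpha\nbeta\nVersion: 1.0\n"
BLANK_LINE = (
    "Name: demo\n"
    "Description: first paragraph\n"
    "\n"
    "second paragraph\n"
    "License: MIT\n"
)

for label, text in [
    ("folded", FOLDED),
    ("unfolded", UNFOLDED),
    ("blank line", BLANK_LINE),
]:
    headers, payload, defects = inspect_metadata(text)
    print(label, sorted(headers), repr(payload), defects)
```

With the correctly folded input, `Version` is still seen as a header. In the other two cases the parser stops reading headers early, so the fields that follow (`Version` and `License` in these samples) end up in the payload instead of the header.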
METADATA files in the wheels for these packages were produced from the broken PKG-INFO files and, while syntactically valid, contain mangled or incomplete data from the
- Does anyone know of PKG-INFO files containing a Description field in the piped format from the standard?
- Is it currently possible to block the upload of invalid PKG-INFO files to PyPI?