PEP 566 (Metadata for Python Software Packages 2.1) specifies how to encode core metadata in JSON. Is this just an FYI thing in case people need help encoding the data from e.g. METADATA, or is/was there a desire to try and slowly move the community over to JSON for storing this information?
I ask because I noticed two things. One, PEP 566 doesn’t specify a file name that could be used if a build tool chose to include a JSON version of METADATA, or if an installer decided to add such data itself post-install (e.g. Flit and installer, respectively). Two, reading over the “Recording installed projects” page of the Python Packaging User Guide made me notice that METADATA is the one file format there that is intrinsically tied to Python’s stdlib. (You could argue CSV is not a formal standard either, but how CSV files should be formatted is well understood, whereas the core metadata spec explicitly calls out email.parser as how you’re expected to read core metadata.)
IIRC, the goal was just to formalize how metadata could be turned into JSON, with the primary use cases being something like PyPI’s JSON API. There was never a plan to include JSON metadata in distribution files, hence nothing about filenames.
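For anyone following along, here is a rough sketch of what that mapping looks like in practice: parse the key/value text with email.parser.HeaderParser, lower-case the keys and swap hyphens for underscores, collect multiple-use fields into lists, and store the message body as the description. The function name `metadata_to_json_dict`, the `SAMPLE` text, and the partial `MULTIPLE_USE` set are my own assumptions for illustration, and the sketch skips the special handling of Keywords, so treat it as an approximation of the PEP 566 rules rather than a complete implementation.

```python
import json
from email.parser import HeaderParser

# Assumption: a partial list of "multiple use" fields, already key-normalized.
# The core metadata spec defines the full set.
MULTIPLE_USE = {"classifier", "requires_dist", "provides_extra", "project_url", "platform"}


def metadata_to_json_dict(text):
    """Convert METADATA-style text to a JSON-compatible dict (hypothetical helper)."""
    msg = HeaderParser().parsestr(text)
    result = {}
    for name, value in msg.items():
        # Keys are lower-cased and hyphens become underscores.
        key = name.lower().replace("-", "_")
        if key in MULTIPLE_USE:
            # Multiple-use fields are gathered into a single list.
            result.setdefault(key, []).append(value)
        else:
            result[key] = value
    # A non-empty message body is stored under "description".
    body = msg.get_payload()
    if body:
        result["description"] = body
    return result


SAMPLE = """\
Metadata-Version: 2.1
Name: example
Version: 1.0
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Requires-Dist: requests

A longer description.
"""

converted = metadata_to_json_dict(SAMPLE)
print(json.dumps(converted, indent=2))
```

Note that the result is only a dict that serializes cleanly to JSON; as said above, nothing here implies a file name or a place for it inside a distribution.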
From what I recall, it was an experimental thing that wheel added, a METADATA.json. But it never got standardised. I do like having a formal mapping to JSON, though, even if it’s not used in any standard files.
That part of PEP 426 seems to also be about carrying forward PKG-INFO data through the build process, so it doesn’t seem to touch on what a potential wheel-related file would be named.
Anyway, thanks for the info, everyone! Sounds like the lack of file name is on purpose and no specific plans were missed by leaving it out.
Yeah, it is odd that we have specs for so many things, but not for the basic key/value format in METADATA files. If anyone does want to fix this at some point, there’s a first pass at a formal grammar for them here. (Though for a more serious attempt, you’d want to do a lot more extensive testing against PyPI.)
I also recall experiencing immense difficulty just trying to reliably fix the single concrete example because the email message parsing module is extremely poorly documented and some comments are subtly wrong or incomplete.
Does PyPI perform any sanitization of uploaded METADATA? As OP notes, it is very surprising that METADATA is not structured, particularly since PEP 658 establishes a specific URL to request metadata for packaging purposes. I recall it took multiple years for the backfilling process to complete, and uv’s recent CVE advisory from August indicated Astral remains unaware that zip file tricks haven’t been necessary for resolve performance on PyPI for multiple years.
(As a side note, I’ve been trying to find a contact at fastly/PyPI for years because I am pretty sure pip can save them a lot in hosting costs by leveraging existing HTTP caching standards; the details are on GitHub.)
I would love to propose a PEP for this together, @brettcannon. I think the email format happens to produce the kind of human-readability you describe as a design goal of PEP 751. One way to avoid having to choose between structured and readable would be to (as proposed in this thread) produce a bijective transform from plain text <=> JSON.
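To make the round-trip idea concrete, here is a minimal sketch of the reverse direction: take a JSON-style dict with lower-case, underscore-separated keys and write it back out as key/value text, then check that the stdlib parser reads the same values back. The helper name `json_dict_to_metadata` and the sample `data` are hypothetical. The sketch also shows one place where a naive transform is not strictly bijective: the JSON keys do not record the original capitalisation of field names, so the output is only equivalent as far as a case-insensitive header parser is concerned.

```python
from email.parser import HeaderParser


def json_dict_to_metadata(data):
    """Serialize a JSON-style metadata dict to key/value text (hypothetical helper)."""
    lines = []
    for key, value in data.items():
        if key == "description":
            continue  # emitted as the message body below
        # Reverse the key transform. The original capitalisation is not kept
        # in the JSON form, so title-casing is a guess (e.g. "Home-Page").
        name = key.replace("_", "-").title()
        values = value if isinstance(value, list) else [value]
        for item in values:
            # Assumption: multi-line values are out of scope for this sketch.
            if "\n" in item or "\r" in item:
                raise ValueError(f"multi-line value for {name!r} is not handled")
            lines.append(f"{name}: {item}")
    text = "\n".join(lines) + "\n"
    if "description" in data:
        # A blank line separates the headers from the body.
        text += "\n" + data["description"]
    return text


data = {
    "metadata_version": "2.1",
    "name": "example",
    "version": "1.0",
    "classifier": [
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
    ],
    "description": "A longer description.\n",
}

text = json_dict_to_metadata(data)
msg = HeaderParser().parsestr(text)
print(text)
```

A real proposal would need to pin down the awkward cases this skips, such as folded or multi-line header values and non-ASCII content, which is where a shared test suite would help.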
I also have several drastic performance improvements to components of the packaging library in a branch, and after the uv CVE I have been thinking that a standardized test suite/fuzzing harness might be more useful than I’d realized. I’ll break out another issue for that, but necrobumped this one because it retains useful context.
Note that because Fastly donates 100% of PyPI’s bandwidth, and because PyPI’s total bandwidth is a drop in the bucket for Fastly, this may not be as high-priority as you might expect, but I do think a goal of making end-user installations faster is a good one.