PEP 658: Static Distribution Metadata in the Simple Repository API

As there is another metadata file format, metadata.json (PEP 426 -- Metadata for Python Software Packages 2.0), how would this PEP handle it?

PEP 426 has been withdrawn.

Tzu-ping’s proposal is flexible. We can easily add additional files later, e.g. x-42-py4-none-any.whl.metadata now and x-42-py4-none-any.whl.metadata.json or x-42-py4-none-any.whl.metadata.yml later.

I think I meant to write “Since tools generally only need dependency information” (the “to” is redundant). Thanks for catching this!

Yes, that’s the idea. (Donald came up with that.) I also included dist-info in the attribute name to avoid possible confusion in the future if we ever have another file named METADATA that’s not in the .dist-info directory.


How about something like this:

  • x-42-py4-none-any.whl
  • x-42-py4-none-any.dist-info/METADATA

x-42-py4-none-any.dist-info is a naming scheme not used anywhere else, so I wonder how people feel about it. Personally I like it though; I’ll propose this instead if others are fine with it.


I imagine that this will only work for wheels, not sdists,
and cutting off the file extension will make it more ambiguous
without context.

Agreed. I suggest we keep it simple, and just say that the metadata for file xxxxx is at xxxxx.METADATA. The PEP is solely about exposing the metadata, so let’s not over-generalise the solution, and by just appending a suffix to the filename we’re sure that we can support metadata for any file, with the only limitation being that a file can only have one set of metadata (which is true by definition).
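A minimal sketch of the suffix-appending idea, assuming the metadata file lives at the distribution file’s URL plus a fixed suffix. The helper name `metadata_url_for` is hypothetical, and the default suffix `.metadata` (the spelling used earlier in the thread) is only an assumption; the post above writes it as `.METADATA`, so the suffix is a parameter.

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_url_for(file_url: str, suffix: str = ".metadata") -> str:
    """Return the (hypothetical) metadata URL for a distribution file URL.

    The suffix is appended to the path only. Any query string or fragment
    (such as a "#sha256=..." hash fragment on a simple-API link) is dropped,
    so it does not end up in the middle of the new URL.
    """
    parts = urlsplit(file_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path + suffix, "", ""))


# Works the same for wheels and sdists, since only the filename is extended.
print(metadata_url_for("https://example.org/x/x-42-py4-none-any.whl#sha256=abc"))
# prints: https://example.org/x/x-42-py4-none-any.whl.metadata
```

Because nothing is parsed out of the filename, the scheme needs no knowledge of wheel or sdist naming rules, which is the "any file" property the post points out.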


sdists have a different naming scheme than wheels so there should be no ambiguity.

Anyway, this was just a proposal, reacting to the doubts about the naming schemes expressed before. I’ve no strong opinion on this.


One additional small suggestion/thought that you should feel free to completely ignore:

It seems like there are two approaches to go down in terms of naming.

One is to try to match the name as closely as possible to the name inside of the archive file. This pushes the handling of naming collisions onto the standards that define the files, e.g. what METADATA means. Unfortunately that also means that we might have to tweak things if, say, we do a wheel 2.0 that made METADATA a JSON file.

The other one is to just define our own filenames, and not try to match the in-file naming scheme. If we go this route, it might be useful to include some extra information though. For instance, if we did foo.whl.metadata, and we upgraded to a new metadata version that wasn’t compatible, what would we do? We could do .json or .yml if we used JSON or YAML, but what if we used the same format? Would it make sense to do something like foo.whl.metadata.v1 to denote it’s v1-style metadata?

Alternatively we could just punt on it, call it foo.whl.metadata, and say we’ll figure out the best name if we ever need a second name.

Maybe we should include some of the content in WHEEL in the tag as well. For example:

<a href="...."
    data-dist-format="Wheel-Version: 1.0"
    data-dist-info-metadata="sha256:0123456789abcdef">
  x-42-py4-none-any.whl
</a>

The content of data-dist-format can only be Wheel-Version: 1.0 for now (same as the Wheel-Version line in the wheel’s WHEEL file), and we’ll designate a value for distribution formats that provide static metadata in the future.
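To make the data-dist-info-metadata attribute concrete, here is a rough client-side sketch. It assumes the attribute value has the `<hashname>:<hexdigest>` form used in the example tag (where `sha256:0123456789abcdef` appears to be a placeholder, not a real digest). The class and function names are hypothetical; only the standard library’s `html.parser` and `hashlib` are used.

```python
from __future__ import annotations

import hashlib
from html.parser import HTMLParser


class DistLinkParser(HTMLParser):
    """Collect (href, data-dist-info-metadata) pairs from a simple-API page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str | None]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A missing attribute yields None: no static metadata offered for this file.
        self.links.append((attr_map.get("href", ""), attr_map.get("data-dist-info-metadata")))


def metadata_matches(attr_value: str, metadata_bytes: bytes) -> bool:
    """Check downloaded metadata against a "<hashname>:<hexdigest>" value."""
    hash_name, _, expected = attr_value.partition(":")
    if hash_name not in hashlib.algorithms_guaranteed:
        raise ValueError(f"unsupported hash: {hash_name!r}")
    return hashlib.new(hash_name, metadata_bytes).hexdigest() == expected.lower()
```

A resolver would feed the page to `DistLinkParser`, fetch metadata only for links whose attribute is not `None`, and discard any download for which `metadata_matches` returns `False`.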

Any thoughts on this? I think I’ll add an attribute to indicate distribution format (maybe not the exact format above but something like wheel:1.0) to the PEP.

What’s the use case? As a general principle, I’m -1 on bloating APIs “just in case” something might be useful.

I know simple API pages mostly aren’t that big, but I just checked out of curiosity, and there’s one (pyagrum-nightly) that’s 6M in size, which isn’t exactly trivial. As that’s 17296 links, all of which seem to be wheels, we’d be adding quite a lot of extra data. Obviously that’s an extreme outlier, and we’d already be adding metadata links for every one of these, which is already going to add a lot of extra content, so maybe we really don’t care that much. But still, what’s the gain?
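A back-of-the-envelope check of that concern, using the two figures from the post. Reading "6M" as 6,000,000 bytes and writing the attribute inline exactly as in the example tag are both assumptions, and the estimate ignores HTTP compression, which would likely shrink such a repetitive string considerably on the wire. The helper name is hypothetical.

```python
# Figures from the post: one simple-API page of about 6 MB with 17296 links.
# Treating "6M" as 6,000,000 bytes is an assumption.
PAGE_BYTES = 6_000_000
LINK_COUNT = 17_296


def attribute_overhead(attribute: str, links: int = LINK_COUNT) -> int:
    """Uncompressed bytes added if every link carries the given attribute text."""
    return len(attribute.encode("utf-8")) * links


extra = attribute_overhead(' data-dist-format="Wheel-Version: 1.0"')
print(f"{extra} extra bytes, {extra / PAGE_BYTES:.1%} of the page")
# prints: 657248 extra bytes, 11.0% of the page
```

So, under these assumptions, the format attribute alone would grow that page by roughly a tenth before compression, on top of the metadata attribute itself.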


I think it’s for future compatibility, in case we change the format of metadata in the future (not adding fields etc., but e.g. using JSON instead). This can be handled in the distribution by bumping the version in the WHEEL file, but can’t be handled with the current proposal.

With that said, it’s also OK to not have that field now, and if we ever need it, define the absence of data-dist-format as the initial format version. So if we’re ever to have a wheel 2.0, the tag will need to say data-dist-format="wheel:2.0", but a lack of the attribute would still mean the initial format.
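The "absence means the initial version" rule can be sketched as a small parser. The `<format>:<version>` spelling and the default of `wheel:1.0` follow the examples in this thread but are assumptions, not part of any adopted specification; `parse_dist_format` is a hypothetical name.

```python
from typing import Optional, Tuple

# Assumed convention: a missing data-dist-format attribute means the
# initial format, taken here to be "wheel:1.0".
DEFAULT_DIST_FORMAT = "wheel:1.0"


def parse_dist_format(attr_value: Optional[str]) -> Tuple[str, Tuple[int, ...]]:
    """Split a "<format>:<version>" value into (format, version tuple)."""
    value = attr_value if attr_value is not None else DEFAULT_DIST_FORMAT
    fmt, sep, version = value.partition(":")
    if not sep or not version:
        raise ValueError(f"malformed data-dist-format value: {value!r}")
    return fmt, tuple(int(part) for part in version.split("."))


print(parse_dist_format(None))         # prints: ('wheel', (1, 0))
print(parse_dist_format("wheel:2.0"))  # prints: ('wheel', (2, 0))
```

The design point is that existing pages, which never emit the attribute, stay valid without any change once the attribute is defined later.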

I’m not convinced including the metadata version is helpful. What am I (as a resolver tool) supposed to do with it? Reject a package entirely? As soon as I decide to get the metadata, I’m going to find out the format/version, and I can’t think of anything useful to do any earlier.


This seems like an odd requirement:

The metadata served must be completely static, i.e. identical to the METADATA file in the .dist-info directory [dist-info] if the distribution is installed. The repository can provide this for any distributions, but it is expected they will only provide them for wheels [wheel] at the current time, since an sdist [sdist] does not yet have a way to promise the metadata will stay the same after it is built.

The METADATA file in a wheel is necessarily static, by the definition of the wheel format installation protocol (since anything installing wheels is supposed to just copy over the metadata). Is this intending to explicitly rule out serving PEP 643 metadata files? If so, why? I would expect that a PEP 643 metadata file for an sdist would, on average, be much more useful than nothing. Even metadata files with Dynamic dependencies can be useful for things like pre-warming a cache when traversing a dependency graph.

Can we simply say that the metadata file must contain the same metadata that the relevant file contains? Alternatively, we can say that sdists must be core metadata >= 2.2.
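As a sketch of the "core metadata >= 2.2" alternative, a repository or resolver could check whether an sdist’s PKG-INFO promises that a given field is static. The check follows PEP 643: from Metadata-Version 2.2, a field not listed in `Dynamic` must stay the same when the sdist is built. The helper name and the choice of `Requires-Dist` as the default field are illustrative assumptions.

```python
from email.parser import HeaderParser


def sdist_field_is_static(pkg_info_text: str, field: str = "Requires-Dist") -> bool:
    """Return True if PKG-INFO promises `field` will not change on build.

    Metadata older than 2.2 makes no such promise, so it is treated as
    not static. From 2.2 on, a field is static unless listed in Dynamic.
    """
    msg = HeaderParser().parsestr(pkg_info_text)
    version = tuple(int(p) for p in msg.get("Metadata-Version", "0").split("."))
    if version < (2, 2):
        return False
    dynamic = {value.strip().lower() for value in msg.get_all("Dynamic", [])}
    return field.lower() not in dynamic
```

Even when this returns `False`, the metadata could still serve the cache-warming use case mentioned above; the check only tells a tool whether it may rely on the values.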


It’s not the metadata version, but the distribution format version. This determines how a tool can actually make sense of the bytes sent by the server that are supposed to be the metadata file.

But if tool authors are having trouble understanding its use, that’s a strong sign it’s not (yet) needed and also won’t be used correctly. Since that attribute can be retrospectively defined anyway (as mentioned above), I’ll leave it out 🙂

I was actually trying to rule out PEP 621 so people don’t get the wrong idea and start exposing pyproject.toml (which is not distribution metadata, but many people, including tool authors, confuse the two). You’re right, PEP 643 metadata should be allowed. Any suggestions on how I can improve the wording to include the right things (and only them)?


How about:

The metadata served must be specified in the Core Metadata Specification format. Metadata must only be served for standards-compliant build artifacts that expose their metadata in a canonical location (i.e. PKG-INFO for sdists and {distribution}-{version}.dist-info/METADATA for wheels). The data served must be identical to the data found in the built artifact’s canonical location.
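Under that wording, a repository would serve the bytes found at the canonical location unmodified. A rough extraction sketch follows, assuming a `.tar.gz` sdist with a single top-level `{name}-{version}/` directory and a wheel with exactly one top-level `.dist-info` directory. The function names are hypothetical; `zipfile` and `tarfile` are standard library.

```python
import tarfile
import zipfile


def wheel_metadata(path: str) -> bytes:
    """Return the raw bytes of {distribution}-{version}.dist-info/METADATA."""
    with zipfile.ZipFile(path) as whl:
        candidates = [
            name for name in whl.namelist()
            if name.count("/") == 1 and name.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError(f"expected one .dist-info/METADATA, found {candidates}")
        return whl.read(candidates[0])


def sdist_metadata(path: str) -> bytes:
    """Return the raw bytes of {name}-{version}/PKG-INFO from a .tar.gz sdist."""
    with tarfile.open(path, "r:gz") as sdist:
        candidates = [
            member for member in sdist.getmembers()
            if member.isfile() and member.name.count("/") == 1
            and member.name.endswith("/PKG-INFO")
        ]
        if len(candidates) != 1:
            raise ValueError(f"expected one top-level PKG-INFO, found {len(candidates)}")
        handle = sdist.extractfile(candidates[0])
        if handle is None:
            raise ValueError("PKG-INFO is not a regular file")
        return handle.read()
```

Returning bytes rather than parsed fields is deliberate: it keeps the served file byte-for-byte identical to the artifact’s copy, so a hash of one matches a hash of the other.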

Possibly you can track down canonical links to the specifications that define these canonical locations. Possibly this is the one for sdists, though it also says pyproject.toml is required, which I didn’t think was the case, so I dunno.


That’s the correct link. It technically only covers new-style sdists as defined in PEP 517. That’s because the older sadist format was never standardised and we didn’t attempt to retroactively standardise it. The metadata in older sdists is pretty much useless anyway (there’s another thread here about that but I’m on mobile right now so I can’t find the link).

Very good typo here 🤣


I have submitted an edit to the Rationale for this. The change should be reflected in the rendered PEP by the time someone reads this, but in case it isn’t, here’s the PR: PEP 658: Rationale edits by uranusjr · Pull Request #1972 · python/peps · GitHub
