Provide distribution metadata as 'data' attributes on links in simple index

Where does the data-requires-python information shown in the attributes of links to distributions on PyPI (PEP 503) come from? Does warehouse read it from the metadata inside the distribution file, or is it delivered by the client on upload? From a quick look, it seems to come from the client (here in twine).
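
For reference, here is a minimal sketch of how a client could read that attribute from a PEP 503 project page, using only the standard library. The sample page, the `files.example.invalid` URLs, and the names `SimpleIndexParser` and `parse_links` are made up for illustration; this is not how pip or warehouse actually implement it.

```python
from html.parser import HTMLParser

# Made-up project page in the PEP 503 style. Note that ">" is HTML-encoded
# as "&gt;" inside the attribute value.
SAMPLE_PAGE = """<!DOCTYPE html>
<html><body>
<a href="https://files.example.invalid/demo-1.0-py3-none-any.whl#sha256=abc" data-requires-python="&gt;=3.6">demo-1.0-py3-none-any.whl</a>
<a href="https://files.example.invalid/demo-0.9.tar.gz#sha256=def">demo-0.9.tar.gz</a>
</body></html>"""


class SimpleIndexParser(HTMLParser):
    """Collect (href, data-requires-python) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # HTMLParser has already unescaped "&gt;" to ">" at this point.
        self.links.append(
            (attributes.get("href"), attributes.get("data-requires-python"))
        )


def parse_links(page):
    parser = SimpleIndexParser()
    parser.feed(page)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, requires_python in parse_links(SAMPLE_PAGE):
        print(href, requires_python)
```

Links without the attribute simply yield `None`, which a client would treat as "no constraint declared".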

In a similar way, could all the metadata relevant to dependency resolution (name, version, dependency requirements, Python requirements, is there more?) be published in similar data attributes (maybe Base64-encoded, maybe all merged into one attribute)? The goal would be that, in the best case, dependency resolution could be done without downloading any distribution at all (only by parsing the HTML pages).
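
To make the idea concrete, here is a rough sketch of the "all merged into one attribute" variant: the metadata is serialized as JSON, Base64-encoded, and placed in a single data attribute. The attribute name `data-dist-metadata`, the JSON layout, and all function names are assumptions for illustration only; nothing like this is specified in PEP 503.

```python
import base64
import json
from html import escape
from html.parser import HTMLParser

# Hypothetical attribute name, not part of PEP 503.
ATTRIBUTE = "data-dist-metadata"


def encode_metadata(metadata):
    """Serialize a metadata dict to URL-safe Base64 text."""
    raw = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_metadata(value):
    """Inverse of encode_metadata."""
    return json.loads(base64.urlsafe_b64decode(value.encode("ascii")))


def render_link(url, filename, metadata):
    """Server side: build one anchor tag carrying the encoded metadata."""
    return '<a href="{}" {}="{}">{}</a>'.format(
        escape(url, quote=True), ATTRIBUTE, encode_metadata(metadata), escape(filename)
    )


class MetadataParser(HTMLParser):
    """Client side: decode the metadata of every link that carries it."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(ATTRIBUTE)
        if tag == "a" and value is not None:
            self.found.append(decode_metadata(value))


if __name__ == "__main__":
    example = {
        "name": "demo",
        "version": "1.0",
        "requires_dist": ["requests>=2.0"],
        "requires_python": ">=3.6",
    }
    html_link = render_link(
        "https://files.example.invalid/demo-1.0-py3-none-any.whl",
        "demo-1.0-py3-none-any.whl",
        example,
    )
    parser = MetadataParser()
    parser.feed(html_link)
    print(parser.found)
```

Base64 avoids any HTML escaping issues in the attribute value, at the cost of roughly a third more bytes per link than the raw JSON.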

For wheels, this metadata is (as far as I can tell) static. Work is currently being done to specify how source distributions can declare this metadata statically as well (where possible).

Tools such as pip could cache this information (the HTML pages) and potentially go as far as doing most of the dependency resolution offline.

Does it seem feasible, reasonable, and meaningful?

You might be interested in https://github.com/pypa/warehouse/issues/8254

Thanks for the link! Indeed, this is interesting to me; it would help reach the same goal: faster dependency resolution.

My suggestion here is obviously much more lightweight (hackish, less robust?), and I won’t try to defend it too hard (especially with regard to the whole TUF topic, which I have only vaguely read about), but I will still mention a couple of advantages:

  • 1 HTTP request per project should give all the info (at the cost of larger pages)
  • most of the work is already done in warehouse, in PEP 503, and in pip; it just needs to be extended from 1 attribute (data-requires-python) to 3 or 4 (name, version, dependencies, extras, platform)
  • no need to read the distributions server-side, since twine already seems to provide the info (is it reliable?), although I assume warehouse reads the distributions anyway to build the project pages

Any link to the discussion that led to the decision of adding data-requires-python to PEP 503 (but not the other metadata)?

The discussions regarding backtracking in pip’s dependency resolution (and distantly related: sdist metadata) are showing up in my feed again (such as this one). So I thought I would give this idea a fresh boost, maybe catch some new eyes.

I think the advantages are obvious.

If data-requires-python is actually used reliably by pip to exclude some distributions, why not extend the concept? But maybe pip does not actually use this attribute, in which case this idea does not have much value.
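
To illustrate what "excluding distributions" could look like on the client side, here is a deliberately simplified sketch. It only understands plain numeric versions and the operators `>=`, `<=`, `==`, `!=`, `>`, `<`; it does not handle `~=` or wildcard clauses such as `!=3.0.*`, which a real tool would need (the `packaging` library provides a complete specifier implementation). The names `satisfies` and `compatible_links` are made up for this example.

```python
import operator

# Two-character operators come first so ">=" is matched before ">".
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn "3.8.1" into (3, 8, 1). Raises ValueError on anything fancier."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(python_version, requires_python):
    """Check a version against a comma-separated list of simple clauses."""
    if not requires_python:
        return True  # no constraint declared
    candidate = parse_version(python_version)
    for clause in requires_python.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                # Pad with zeros so that 3.6 and 3.6.0 compare as equal.
                width = max(len(candidate), len(bound))
                left = candidate + (0,) * (width - len(candidate))
                right = bound + (0,) * (width - len(bound))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError("unsupported clause: {!r}".format(clause))
    return True


def compatible_links(links, python_version):
    """Keep only the URLs whose requires-python admits python_version."""
    return [url for url, requires in links if satisfies(python_version, requires)]


if __name__ == "__main__":
    links = [("demo-1.0.whl", ">=3.6"), ("demo-0.9.tar.gz", None)]
    print(compatible_links(links, "2.7"))
```

The point is that the filtering happens before any download: an incompatible file is discarded based on the index page alone.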

My impression is that PEP 503 is somehow on the way out, so people might be reluctant to add new data attributes to it if the whole thing is going to be scrapped soon anyway. Is that the case? What would be the other drawbacks of such a solution?

@sinoroc This is basically the same proposal as the issue that @dstufft linked to. I’d suggest moving the discussion there.

OK, will do. I was initially put off by the whole TUF-thing at the time. And also I was waiting for some kind of hint that it was an idea worth discussing further.

No. There is no plan to kill PEP 503 right now, and all proposals build upon it rather than replace it. The reluctance you observe against putting things on that page is due to the Simple Repository API lacking a versioning mechanism, so most additions would involve serious backwards-incompatibility considerations. The amendment that addresses this (PEP 629) would improve the situation.
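
As a rough illustration of what that versioning looks like from a client's point of view: PEP 629 puts the version in a `<meta name="pypi:repository-version" content="...">` tag on the page. The sketch below reads that tag and refuses pages with a newer major version. The helper names and `SUPPORTED_MAJOR` are made up, and treating a missing tag as version 1.0 is my reading of the PEP, so check the text before relying on it.

```python
from html.parser import HTMLParser

# Highest major API version this hypothetical client understands.
SUPPORTED_MAJOR = 1


class RepositoryVersionParser(HTMLParser):
    """Find the PEP 629 repository-version meta tag, if present."""

    def __init__(self):
        super().__init__()
        self.version = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "pypi:repository-version":
            self.version = attributes.get("content")


def repository_version(page):
    parser = RepositoryVersionParser()
    parser.feed(page)
    parser.close()
    # Assumption: a page without the tag is treated as version 1.0.
    return parser.version or "1.0"


def is_supported(page):
    """A newer major version may be incompatible, so reject it."""
    major = int(repository_version(page).split(".")[0])
    return major <= SUPPORTED_MAJOR


if __name__ == "__main__":
    page = '<html><head><meta name="pypi:repository-version" content="1.0"></head></html>'
    print(repository_version(page), is_supported(page))
```

With a check like this in place, new data attributes could be introduced in a later version without silently confusing older clients.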

The TUF stuff isn’t really important for designing this API extension, other than knowing it exists.

I think it’s something that has broad support, and it would be a relatively simple thing to add. I think we’re just missing someone with the time to push forward on it currently.

I opened a warehouse ticket.