To my knowledge, there isn't an easier way to access this type of information. I figure this dataset will be useful for folks looking to answer ecosystem-scale questions, such as the use or adoption of packaging features, metadata fields and values, build backends, etc. It lets PEP authors and packaging maintainers make more data-driven decisions about the state of Python packages, both historically and today.
I hope everyone who is interested finds the information and guide useful. Happy to answer questions.
You can find the content of individual files in a distribution, so look for all sdists with a pyproject.toml and parse it for the backend information. Any sdist without a pyproject.toml is setuptools-based.
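A rough sketch of that parsing step, assuming you've already pulled each sdist's pyproject.toml text out of the dataset (the helper name is mine; per PEP 517, a missing build-backend key falls back to setuptools):

```python
import tomllib  # stdlib on Python 3.11+; use the "tomli" package on older versions


def build_backend(pyproject_text: str | None) -> str:
    """Return the build backend an sdist declares, defaulting to setuptools."""
    if pyproject_text is None:
        # No pyproject.toml at all: legacy setuptools build.
        return "setuptools.build_meta:__legacy__"
    data = tomllib.loads(pyproject_text)
    # PEP 517: frontends fall back to setuptools when build-backend is absent.
    return data.get("build-system", {}).get(
        "build-backend", "setuptools.build_meta:__legacy__"
    )


print(build_backend('[build-system]\nbuild-backend = "hatchling.build"'))
# -> hatchling.build
```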
@ofek You would need to grab every pyproject.toml and WHEEL and do some post-processing, but you can do that by downloading those individual files instead of whole archives.
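For the WHEEL half of that post-processing, the WHEEL file inside a wheel's .dist-info directory uses the same RFC 822-style key/value format as METADATA, so the stdlib email parser handles it; the Generator field names the tool that built the wheel. A minimal example (the sample contents below are illustrative):

```python
from email.parser import Parser

# Example contents of a .dist-info/WHEEL file (illustrative values).
wheel_text = """\
Wheel-Version: 1.0
Generator: hatchling 1.18.0
Root-Is-Purelib: true
Tag: py3-none-any
"""

fields = Parser().parsestr(wheel_text)
print(fields["Generator"])  # -> hatchling 1.18.0
```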
I was actually going to ask about this separately (it's also related to the debate around Python version upper caps), but:
Would it be possible for dependency resolvers (especially installers that have to resolve dependencies) to query a dataset like this, such that they don’t have to repeatedly download entire wheels just to check metadata?
Would it be possible to keep that data up to date automatically when wheels are published?
Does the (official) metadata really have to live exclusively inside wheels?
PEP 643 would allow similar optimisations for some sdists (ones that don't compute their dependency data dynamically). Note that PEP 658 isn't limited to wheel metadata (although the PyPI implementation might be, currently).
Edit: Backfilling wouldn’t be possible, though, as sdists have to choose to publish metadata 2.2, and older sdists won’t have done so.
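To make the interplay concrete, here's a rough sketch of what a resolver could do, assuming PyPI's PEP 658 convention of serving the METADATA file at the artifact URL with `.metadata` appended (the function names are mine):

```python
from email.parser import Parser
from urllib.request import urlopen


def fetch_core_metadata(artifact_url: str):
    """PEP 658: the index serves METADATA at the artifact URL + '.metadata',
    so a resolver can read dependencies without downloading the archive."""
    with urlopen(artifact_url + ".metadata") as resp:
        return Parser().parsestr(resp.read().decode("utf-8"))


def static_requires_dist(meta):
    """PEP 643: in an sdist's metadata (2.2+), a field is only trustworthy
    when it is NOT listed under Dynamic. Wheel metadata is always static."""
    version = tuple(int(p) for p in meta["Metadata-Version"].split("."))
    dynamic = {v.lower() for v in (meta.get_all("Dynamic") or [])}
    if version >= (2, 2) and "requires-dist" not in dynamic:
        return meta.get_all("Requires-Dist") or []
    return None  # dependency data is dynamic; the sdist must be built
```

The point is that the dependency check costs a few kilobytes instead of the whole archive, and PEP 643's Dynamic field tells the resolver when an sdist's declared dependencies can actually be trusted.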
I think it’s basically waiting on someone to have the time to take the work to completion. Once PyPI supports metadata 2.2, then build backends like setuptools, hatch, etc., can start generating it.