Asking for clarification about the status of the APIs

(Reposting from pypi/warehouse Discussion #12131, “Asking for clarification about the status of the APIs”, for wider attention.)

Iteratively parsing the list of package files from the Simple API to match a package version raises the barrier to entry for someone writing a package manager, package version lock manager, etc…

  • However, pypi/warehouse Pull Request #12079 (“Return "versions" on legacy JSON api”, by dimbleby) got me completely lost: there is a legacy/api/test_json.py file (which I suppose tests the Simple API, previously known as the Legacy API), but the JSON response seems to include ["info", "releases", "urls"], which is not what PEP 691 specifies.
  • The problem with "files" is that I don’t see the "upload_time_iso_8601" key from "releases", and therefore I wouldn’t know how to extract release dates.

So I have tons of questions, but the most important one to me is: What is the supported way of extracting release dates of each version of a package?


By the way, the problem I was trying to solve is solved by pypi-timemachine but as far as I understand, it’s using a deprecated API at the moment:

We need to rework the API Reference portions of the Warehouse documentation. The “Simple API” is documented here, but that documentation is also somewhat out of date now. The name “legacy” there is misleading.

More details are in the various PEPs that define it (linked from Simple repository API — Python Packaging User Guide, but that also is missing PEP 691 which adds JSON support).
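For anyone following along, here is a minimal sketch of asking the Simple API for its PEP 691 JSON form. The media type is the one PEP 691 defines; the helper name `simple_api_request` and the default index URL are assumptions for illustration only, and nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
import urllib.request

# Media type defined by PEP 691 for the JSON form of the Simple API.
PEP691_JSON = "application/vnd.pypi.simple.v1+json"


def simple_api_request(project, index_url="https://pypi.org/simple/"):
    """Build (but do not send) a request for a project's PEP 691 JSON page.

    Hypothetical helper: the name and the default index_url are
    illustrative, not part of any PyPI client library. The project name
    is used as given (no PEP 503 normalisation is applied here).
    """
    url = f"{index_url.rstrip('/')}/{project}/"
    return urllib.request.Request(url, headers={"Accept": PEP691_JSON})


req = simple_api_request("requests")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` and feeding the response to `json.load` should yield a document with a "files" list, which is the part discussed in this thread.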

Thanks for the clarification @dstufft.

If I understand correctly then, for now the "releases" key is going away, and we should parse the filenames (or even download, unpack, and inspect them; I’m still not sure the filename would give me 100% of the metadata I’d need).
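To make the “parse the filenames” option concrete, here is a rough sketch that groups the "files" entries of a PEP 691 response by version. It is a best-effort parser written for this example, assuming conventional wheel and sdist names; the helper names and the sample page are made up. Note that it recovers versions only, not upload dates.

```python
from collections import defaultdict

# Common sdist extensions; an assumption, not an exhaustive list.
SDIST_SUFFIXES = (".tar.gz", ".zip", ".tar.bz2")


def version_from_filename(filename):
    """Best-effort version extraction; returns None for unrecognised names."""
    if filename.endswith(".whl"):
        # Wheel: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
        parts = filename[: -len(".whl")].split("-")
        return parts[1] if len(parts) in (5, 6) else None
    for suffix in SDIST_SUFFIXES:
        if filename.endswith(suffix):
            # sdist: {name}-{version}{suffix}; split on the last hyphen
            # because older project names may themselves contain hyphens.
            _name, sep, version = filename[: -len(suffix)].rpartition("-")
            return version if sep else None
    return None


def files_by_version(page):
    """Group filenames from a PEP 691 page by the version parsed from each."""
    grouped = defaultdict(list)
    for entry in page["files"]:
        grouped[version_from_filename(entry["filename"])].append(entry["filename"])
    return dict(grouped)


# Made-up sample shaped like a PEP 691 response.
page = {
    "meta": {"api-version": "1.0"},
    "name": "demo",
    "files": [
        {"filename": "demo-1.0.tar.gz"},
        {"filename": "demo-1.0-py3-none-any.whl"},
        {"filename": "demo-1.1-py3-none-any.whl"},
    ],
}
print(files_by_version(page))
```

For real use, `packaging.utils.parse_wheel_filename` and `parse_sdist_filename` from the `packaging` library are likely a safer choice than this hand-rolled split, since they also validate and normalise the parts.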

The releases key is going away in the per-version endpoints, at pypi/packagename/version/json. It’ll still be there at the top level, in pypi/packagename/json.

That’s not what the admonition on top of the docs says:

(Introduced in pypi/warehouse Pull Request #11777, “Deprecate the releases key on the non version json too”, by dstufft.)

Or am I misunderstanding something?

IMO, the releases key can’t be removed from the project page until the simple API has a list of releases. I’m planning on writing a PEP to cover this:

We have no specific plans to remove the releases key at this time, but we do want people to move away from it (and TBH, the legacy JSON API entirely) if possible [1].

If you need the releases key today and it’s the only thing that provides the information you need, then you’re OK to use it, with the understanding that if it becomes problematic like the same key on the versioned URLs did, then we will remove it.
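As a rough illustration of that fallback, the sketch below pulls a release date per version out of an already-decoded pypi/packagename/json response. It assumes the shape mentioned earlier in the thread (a "releases" mapping from version to a list of files, each carrying "upload_time_iso_8601"); the function name and the sample data are made up, and “earliest upload” is just one possible definition of a release date.

```python
from datetime import datetime


def release_dates(project_json):
    """Map each version to the earliest upload time among its files."""
    dates = {}
    for version, files in project_json["releases"].items():
        stamps = [
            # Swap a trailing "Z" for an explicit offset so that
            # datetime.fromisoformat also accepts it before Python 3.11.
            datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
            for f in files
        ]
        if stamps:  # a version may be listed with no files
            dates[version] = min(stamps)
    return dates


# Made-up sample shaped like the top-level JSON response.
sample = {
    "releases": {
        "0.9": [],
        "1.0": [
            {"upload_time_iso_8601": "2020-01-02T03:04:05.000000Z"},
            {"upload_time_iso_8601": "2020-01-01T00:00:00.000000Z"},
        ],
    }
}
dates = release_dates(sample)
print({version: stamp.isoformat() for version, stamp in dates.items()})
```

The same caveat applies to the code as to the key itself: it relies on something that may be removed or altered without warning.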

The “Simple” API (I really hate this name, we should really think of a better name for it) is standards-based, so we cannot remove things from it without going through the PEP process. We’ve also put a lot more effort into scaling it out (and continue to do so), so if you’re able to use that, then that is the better option.

If you’re not able to use that, but there’s something we could add to it that would make it possible to use it, then writing a PEP to add it (like @pf_moore is planning to do) is a great option, so that we can discuss it and hopefully add it. Besides writing a PEP to standardize an additional API, that is the primary option if you don’t want to rely on something that may disappear without warning.


  1. This is honestly true for the entirety of the legacy JSON API, but we don’t want to just replicate that API into the simple API because fundamentally the structure of the legacy JSON API is particularly bad for scaling it out and we’d rather make targeted additions than just wholesale copying. But in general, anything on the legacy JSON API is subject to removal or alteration without warning if the PyPI admins need to do so, but we generally try not to do that unless we really need to.

IMO, the biggest remaining item that is only available via the “legacy” JSON API is the project metadata. There are already standards for exposing this, in the sense that PEP 658 standardises how the “simple” API can expose distribution metadata for wheels (and, in conjunction with PEP 643, for sdists).
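To sketch what consuming PEP 658 data could look like once an index supports it: the metadata file lives at the distribution file’s URL with ".metadata" appended, and PEP 691 exposes the PEP 658 attribute as a "dist-info-metadata" key on each file entry. The helper names, the sample entry, and the sample metadata text below are made up for illustration, and nothing is downloaded.

```python
from email.parser import HeaderParser
from urllib.parse import urldefrag


def metadata_url(file_entry):
    """Return the PEP 658 metadata URL for a file entry, or None if absent."""
    # The key is only truthy when the index actually serves the metadata file.
    if not file_entry.get("dist-info-metadata"):
        return None
    # Drop any "#sha256=..." fragment before appending the suffix.
    return urldefrag(file_entry["url"]).url + ".metadata"


def parse_metadata(text):
    """Extract a few core metadata fields from a METADATA / PKG-INFO body."""
    msg = HeaderParser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }


# Made-up file entry and metadata body.
entry = {
    "filename": "demo-1.0-py3-none-any.whl",
    "url": "https://files.example.invalid/demo-1.0-py3-none-any.whl#sha256=abc",
    "dist-info-metadata": True,
}
body = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Requires-Dist: requests\n"
    "Requires-Dist: idna>=2\n"
    "\n"
    "Long description.\n"
)
print(metadata_url(entry))
print(parse_metadata(body))
```

This reads metadata per distribution file, not per project, which matches the point that metadata can vary between files of the same release.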

As far as I can see, support for these two PEPs is being tracked by

It seems to me that the best approach would be to retain the JSON API (at least the “versioned” part) until these two issues are resolved, rather than divert effort into writing and/or debating new APIs/standards. I know the current metadata exposed in the JSON API isn’t directly equivalent to the distribution metadata exposed by PEP 658, but in reality, it’s unreliable to use the per-project metadata anyway, precisely because metadata can vary by distribution file.

I don’t know what the long-term fate of the per-project metadata is likely to be. It’s exposed in places like the “Filter by Classifier” capability of the PyPI website, which simply can’t work without some concept of the metadata of a project as a whole. I imagine people could want access to some form of “usually good enough” data like this for programmatic use, and given that Warehouse stores it, maybe exposing it would be worthwhile. But I don’t see a viable way that we would ever standardise something like this, so a PyPI-only API to access the (also PyPI-only) per-project metadata might be of use longer term. It would presumably take project name and version, and return the metadata (much like the JSON API does now).

Outside of the JSON API, the only other legacy APIs that I am aware of which have no useful replacement are all in the XML-RPC API:

  1. The search and browse APIs. The search API at least has been disabled for some time, and while there’s no replacement for it, people seem to have coped. A package search mechanism would likely be good to have, but it could probably be built as a 3rd party product, by mirroring the metadata, if it’s not possible to implement a scalable replacement in PyPI. And a 3rd party tool might be better anyway.
  2. The user/role APIs. I’ve no feel for what might be needed here, as I don’t know of any users of these APIs. Maybe they can just be dropped?
  3. The mirroring APIs. For mirroring, the basics are present in the simple API in the form of (not standardised) last-serial fields. But I could see a standard for a mirroring capability as something that would be useful.
  4. The changelog API. I’m personally quite fond of the changelog data, and I’d like to see it remain available. There’s a load of interesting data in there about project activity, lifecycles, etc. But honestly, I can’t see that being standardised - it will always be a Warehouse-specific API. Maybe it could be moved to a JSON-style API, though (a simple pageable list of records would probably be sufficient)

Sorry, that turned into a bit of a brain dump of “where I see things being up to”. Hopefully, it was useful 🙂