Backwards Incompatible Change to PyPI JSON API

I’d argue that dependency information (or the METADATA being implemented on PyPI) is also a very valuable addition to have here. :slight_smile:

Metadata is already standardised, and will be present as soon as Warehouse implements the appropriate PEPs. I’d love that to happen sooner, but that’s up to the Warehouse developers’ workload (and volunteer contributions), and there’s nothing more needed here from a standardisation point of view.

That’s fair. I was going off what the PEPs said, but real-world use cases are more important. (I’m just hesitant about pushing my use case too much, because “mirror all the metadata you can get your hands on” is by definition going to be in favour of adding everything regardless!)

We’ve had feature requests in pip for “don’t install things older than X”.

OK, consider me convinced (although I still think we should sanity-check, and I still think there’s a possibility of cases where “we should keep this available but the simple API isn’t the right place” is the right choice).

Understood (and I agree in principle). I’m just concerned that if we tie replacements to standardisation, there’s an unhelpful pressure on “we need to standardise quickly before the current solution gets removed”. Just something to be aware of, I guess.

… I do hope that wasn’t me :worried: My scraping app is pretty naive in some ways; I know I need to be careful of resource usage, but I honestly don’t know if what I’m doing is right.

Thanks, I thought there was something like that but couldn’t find it.

OK, that’s fair. As long as we don’t end up adding stuff to the simple API that means it becomes a problem like the existing APIs, then I’m fine with that. I’ll trust you to keep us honest on that score, though. I don’t have enough knowledge to do that.

So thanks for your response. I guess that puts us back at a point where I’ll create a PEP adding a versions key to the project page, and upload-time and size keys to the file elements. I’ll try to get that done in the next few weeks.
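To make the proposed additions concrete, here is a minimal Python sketch of how a client might consume them. It assumes the PEP 691 JSON project page gains a top-level `versions` list plus per-file `size` (bytes) and `upload-time` (ISO 8601 UTC timestamp) keys, as described above. The project name, URLs, data, API version value and helper names are all hypothetical. The `upload-time` filter also illustrates how a “don’t install things older than X” request could be served from this data.

```python
from datetime import datetime, timezone
from typing import List

# Hypothetical project page in the PEP 691 JSON format, extended with the
# proposed "versions", "size" and "upload-time" keys. All data is invented.
PROJECT_PAGE = {
    "meta": {"api-version": "1.1"},  # assumed version bump for the new keys
    "name": "example-pkg",
    "versions": ["1.0", "1.1", "2.0"],
    "files": [
        {
            "filename": "example_pkg-1.0.tar.gz",
            "url": "https://files.example.invalid/example_pkg-1.0.tar.gz",
            "hashes": {},
            "size": 10240,
            "upload-time": "2021-03-01T12:00:00.000000Z",
        },
        {
            "filename": "example_pkg-1.1.tar.gz",
            "url": "https://files.example.invalid/example_pkg-1.1.tar.gz",
            "hashes": {},
            "size": 11264,
            "upload-time": "2022-06-15T08:30:00.000000Z",
        },
        {
            "filename": "example_pkg-2.0-py3-none-any.whl",
            "url": "https://files.example.invalid/example_pkg-2.0-py3-none-any.whl",
            "hashes": {},
            "size": 9216,
            "upload-time": "2023-01-10T09:00:00.000000Z",
        },
    ],
}


def parse_upload_time(value: str) -> datetime:
    """Parse a timestamp such as '2021-03-01T12:00:00.000000Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit UTC offset for older interpreters.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def files_uploaded_since(page: dict, cutoff: datetime) -> List[str]:
    """Return filenames whose upload-time is at or after `cutoff`."""
    selected = []
    for file in page["files"]:
        raw = file.get("upload-time")  # optional key: may be missing
        if raw is not None and parse_upload_time(raw) >= cutoff:
            selected.append(file["filename"])
    return selected


def total_size(page: dict) -> int:
    """Sum the sizes (in bytes) of all files that report one."""
    return sum(file.get("size", 0) for file in page["files"])


if __name__ == "__main__":
    cutoff = datetime(2022, 1, 1, tzinfo=timezone.utc)
    print(PROJECT_PAGE["versions"])
    print(files_uploaded_since(PROJECT_PAGE, cutoff))
    # ['example_pkg-1.1.tar.gz', 'example_pkg-2.0-py3-none-any.whl']
    print(total_size(PROJECT_PAGE))  # 30720
```

Files without an `upload-time` are skipped by this filter because the key is treated as optional; a real installer might reasonably choose to keep them instead.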

It’s a fair question! And to be honest, if you had said something like “hey can we add the author, maintainer, description, etc” kind of thing, I would have pushed back on those particular pieces of metadata because they don’t fit great here.

I doubt it; they were issuing something like 600 requests per second and we ended up blocking the IP doing that in Fastly, so if you can still access PyPI you’re good.

I would probably summarize one of the bigger problems with the JSON API as a combination of two things:

  • Because we have a URL per version number, we have a very large number of unique cache keys with different content.
  • We include information inside the responses that can grow without bound [1], the releases key being one of the main culprits: some projects have thousands of entries in the releases key, and querying/computing that took a big chunk of time. For example, we use UUIDs as primary keys, and simply producing uuid.UUID() objects when querying data from the database was taking ~200ms on some of those URLs due to the sheer number of them.
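The per-row overhead described in the second point can be made concrete with a rough, stand-alone micro-benchmark. This is a hypothetical sketch, not Warehouse code: it assumes the keys arrive as strings and times only `uuid.UUID()` construction, ignoring the query itself and any ORM work, and absolute numbers will vary by machine. What it does show is that the cost grows linearly with the number of rows a single response pulls in.

```python
import time
import uuid


def time_uuid_parsing(n: int):
    """Return (count, seconds) for building n uuid.UUID objects from strings."""
    # Simulate primary keys as a database driver might hand them back: as text.
    raw_keys = [str(uuid.uuid4()) for _ in range(n)]
    start = time.perf_counter()
    parsed = [uuid.UUID(key) for key in raw_keys]
    elapsed = time.perf_counter() - start
    return len(parsed), elapsed


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        count, seconds = time_uuid_parsing(n)
        print(f"{count:>7} UUIDs parsed in {seconds * 1000:.1f} ms")
```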

It’s of course a dangerous power to have! You could imagine us adding, say… a YAML serialization format and having some features only supported in JSON and some only supported in YAML, which would be an awful state to be in. But it’s a useful feature that lets us evolve the API in JSON (or any future serializations) without breaking backwards compatibility with HTML.
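The mechanism that lets HTML and JSON coexist is HTTP content negotiation via the `Accept` header. A minimal sketch using only the standard library follows; the media types and quality values are the ones PEP 691 uses, while the project name and the helper function are hypothetical. It only builds the request and does not send it.

```python
import urllib.request

# Media types defined by PEP 691 for the simple API: the same API version (v1)
# in two serializations, plus the legacy text/html type for older servers.
SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"
SIMPLE_HTML = "application/vnd.pypi.simple.v1+html"


def simple_api_request(project: str, prefer_json: bool = True) -> urllib.request.Request:
    """Build (but do not send) a request for a project's simple API page."""
    url = f"https://pypi.org/simple/{project}/"
    if prefer_json:
        # Prefer JSON, fall back to versioned HTML, then to legacy text/html.
        accept = f"{SIMPLE_JSON}, {SIMPLE_HTML};q=0.2, text/html;q=0.01"
    else:
        accept = f"{SIMPLE_HTML}, text/html;q=0.01"
    return urllib.request.Request(url, headers={"Accept": accept})
```

Sending the request with `urllib.request.urlopen(req)` and checking the response’s `Content-Type` header would reveal which serialization the server chose. The quality values are what allow a server to keep serving HTML to existing clients while newer clients opt into JSON.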

Sounds good!


  1. The same is true of the simple API! Every file added to a project is added to the same response, so it grows without bound. Fortunately we’ve carefully limited the data we pull in about those files, so our query is still relatively fast, and there is only a single response per project that we can cache for a long time (with a plan in the future to pre-render and store it to make it even faster).

I was very happy to stumble upon this thread and realize that there was ongoing discussion of many of the concerns we have from the Poetry side of things.

We are nervous about the possibility of data being removed from the JSON API (or of the JSON API being removed entirely before PEP 658 and PEP 691 are fully implemented). However, @pf_moore seems to have expressed all the same concerns, and the hypothetical PEP he proposes would allow Poetry to migrate fully to PEP 691 once it is implemented.

I don’t think I have anything additional to add to the discussion, except that I’ll keep an eye on this and will speak up for Poetry if needed (or likewise, you can tag me if you need to communicate with the Poetry project).


Sorry, this completely dropped off my radar. I’ve just created a PR: “PEP 700: Additional Fields for the Simple API for Package Indexes” (python/peps pull request #2840 on GitHub, by pfmoore).