It’s a fair question! And to be honest, if you had said something like “hey can we add the author, maintainer, description, etc” kind of thing, I would have pushed back on those particular pieces of metadata because they don’t fit great here.
I doubt it; they were issuing something like 600 requests per second and we ended up blocking the IP doing that at Fastly, so if you can still access PyPI you’re good.
I would probably summarize one of the bigger problems with the JSON API as a combination of two things:
- Because we have a URL per version number, we have a very large number of unique cache keys with different content.
- We include information inside of the responses that can grow unbounded [1], the `releases` key being one of the main culprits. We have projects with thousands of entries in the `releases` key, and querying/computing that took a big chunk of time. For example, we use UUIDs as primary keys, and simply producing `uuid.UUID()` objects when querying data from the database was taking ~200ms on some of those URLs, due to the sheer number of them.
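To get a feel for that last point, here's a minimal sketch (not Warehouse's actual code, and the row count is a made-up stand-in) of what materializing a large result set into `uuid.UUID` objects looks like:

```python
import time
import uuid

# Hypothetical stand-in for a project whose JSON response touches
# many database rows (releases/files), each keyed by a UUID.
raw_keys = [uuid.uuid4().hex for _ in range(100_000)]

# Converting every raw key into a uuid.UUID object, as an ORM would
# when hydrating rows; the per-object cost is tiny but it adds up.
start = time.perf_counter()
objects = [uuid.UUID(k) for k in raw_keys]
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Constructed {len(objects)} UUID objects in {elapsed_ms:.1f} ms")
```

The exact timing depends on the machine, but the point is that the cost scales linearly with the number of rows in the response, and nothing bounds that number.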
It’s of course a dangerous power to have! You could imagine us adding, say, a YAML serialization format and having some features only supported in JSON and some only supported in YAML, which would be an awful state to be in. But it’s a useful feature that lets us avoid breaking backwards compatibility with HTML while still evolving the API in JSON (or any future serializations).
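As a rough illustration, serving one logical endpoint in multiple serializations comes down to dispatching on the `Accept` header. This sketch uses the PEP 691 content types but is otherwise a hypothetical toy server function, not PyPI's implementation:

```python
import json

# Toy serializers keyed by content type; the "+json" and "+html"
# types here are the ones PEP 691 defines for the simple API.
SERIALIZERS = {
    "application/vnd.pypi.simple.v1+json": lambda data: json.dumps(data),
    "application/vnd.pypi.simple.v1+html": lambda data: (
        "<ul>" + "".join(f"<li>{f}</li>" for f in data["files"]) + "</ul>"
    ),
}

def respond(accept_header, data):
    """Pick a serializer based on the Accept header (no q-value parsing)."""
    serializer = SERIALIZERS.get(accept_header)
    if serializer is None:
        return 406, "Not Acceptable"
    return 200, serializer(data)

status, body = respond(
    "application/vnd.pypi.simple.v1+json", {"files": ["pkg-1.0.tar.gz"]}
)
print(status, body)
```

The risk described above falls out of this structure: nothing forces the serializers to expose the same fields, so keeping them feature-equivalent is a discipline, not a guarantee.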
Sounds good!
The same is true of the simple API! Every file added to a project is added to the same response, so it grows unbounded. Fortunately, we’ve carefully limited the data we pull in about those files, so our query is still relatively fast, and there is only a single response per project, which we can cache for a long time (with a plan in the future to pre-render and store it to make it even faster). ↩︎
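For contrast with the JSON API, here is a minimal, hypothetical sketch of rendering a PEP 503 simple-API project page: one anchor per file, all in a single document, so there is exactly one cache key per project even as the page grows:

```python
import html

def render_simple_page(project, files):
    """Render a toy PEP 503 project page.

    `files` is a list of (filename, url, sha256) tuples; field names
    and structure here are illustrative, not Warehouse's schema.
    """
    lines = [f"<!DOCTYPE html><html><body><h1>Links for {html.escape(project)}</h1>"]
    for filename, url, sha256 in files:
        # Each uploaded file contributes one anchor with its hash fragment.
        lines.append(f'<a href="{url}#sha256={sha256}">{html.escape(filename)}</a><br/>')
    lines.append("</body></html>")
    return "\n".join(lines)

page = render_simple_page(
    "example",
    [("example-1.0.tar.gz", "https://files.example/example-1.0.tar.gz", "abc123")],
)
print(page)
```

Because the whole project maps to one URL and one document, pre-rendering it ahead of time (as mentioned above) is straightforward: the output depends only on the project's file list.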