PEP RFC: Python Package Index (Warehouse) JSON API v1

Greetings!

@cooperlees, @sumanah, and I would like to propose a PEP that formalizes the existing JSON API.

The PEP introduces a JSON Schema and includes changes to the API URL structure.

Non-goals

The following is not part of this proposal, but is likely to warrant subsequent PEPs:

  • Adding properties that aren’t already returned by the legacy JSON API endpoints
  • Removing properties that are already returned by the legacy JSON API endpoints
  • Adding discovery endpoints
  • Adding pagination capabilities
  • Adding authentication
  • Adding writeable endpoints
  • Supporting TUF (PEP 458): This version of the JSON API is not protected by TUF, and so should not be used for dependency resolution.
  • Deprecating XMLRPC API: The PEP lays out the foundation for the future deprecation of the XMLRPC API.

Proposed API structure changes

$root/pypi/$project_name/json          -> $root/api/v1/project/$project_name/latest
$root/pypi/$project_name/$version/json -> $root/api/v1/project/$project_name/$version
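
For client authors, the mapping above can be sketched as a small helper. This is only an illustration, assuming exactly the two URL patterns listed here; the function name `to_v1_url` is hypothetical and not part of the PEP.

```python
import re

# Legacy patterns: $root/pypi/<project>/json and $root/pypi/<project>/<version>/json
_LEGACY = re.compile(
    r"^(?P<root>.*)/pypi/(?P<project>[^/]+)(?:/(?P<version>[^/]+))?/json$"
)


def to_v1_url(legacy_url: str) -> str:
    """Translate a legacy JSON API URL into the proposed /api/v1 form.

    Hypothetical helper: it only rewrites the path and performs no request.
    """
    match = _LEGACY.match(legacy_url)
    if match is None:
        raise ValueError(f"not a legacy JSON API URL: {legacy_url!r}")
    # A legacy URL without a version maps to the "latest" endpoint.
    version = match.group("version") or "latest"
    return f"{match.group('root')}/api/v1/project/{match.group('project')}/{version}"
```

A redirect rule on the server side would need the same two cases: a missing version segment maps to `latest`, anything else is passed through unchanged.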

Help needed

  • Should X-PyPI-Last-Serial header be part of the spec?

  • I’d appreciate it if someone with a better understanding of the domain could verify, for each property:

    • is it nullable?
    • is it required?
    • is it deprecated?
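
To make the three questions concrete, here is a rough sketch of how each answer could be recorded in a JSON Schema fragment and then tallied per property. The property names and their classifications are placeholders, not a claim about what the legacy API actually guarantees. The `deprecated` annotation keyword is available from JSON Schema draft 2019-09 onwards; the `classify` helper is hypothetical.

```python
# Hypothetical schema fragment; property names and flags are illustrative only.
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["example_required"],
    "properties": {
        "example_required": {"type": "string"},
        "example_nullable": {"type": ["string", "null"]},
        "example_deprecated": {"type": ["string", "null"], "deprecated": True},
    },
}


def classify(schema: dict) -> dict:
    """Report nullable / required / deprecated for every top-level property."""
    required = set(schema.get("required", []))
    report = {}
    for prop, spec in schema.get("properties", {}).items():
        types = spec.get("type", [])
        if isinstance(types, str):
            types = [types]  # normalise "string" to ["string"]
        report[prop] = {
            "nullable": "null" in types,
            "required": prop in required,
            "deprecated": bool(spec.get("deprecated", False)),
        }
    return report
```

Running `classify` over the real schema would give reviewers a single table to check against the database, one row per property.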

Relevant background:

  • https://discuss.python.org/t/pep-for-the-python-package-index-json-api/5717
  • https://github.com/pypa/packaging-problems/issues/367
  • https://github.com/devpi/devpi/issues/801

I assume the existing URLs will redirect to the new scheme? I don’t see it mentioned in the PEP, although it is probably a warehouse implementation detail and out of scope.


As a migration/transition point, can the spec be explicit that on PyPI the serial numbers will not be reset when moving to the new API, so that tools that use the last serial number to detect changes when calling the existing API will be able to change URLs transparently, and will not have to re-fetch data they already have just because all of the serial numbers have changed?

I like this, but am I reading the spec right that it’ll always be the same as the last_serial value in the JSON response? I can imagine using this to issue a HEAD request and skip requesting the body if the serial number hasn’t changed. I don’t do that at the moment, but I certainly could (and given that I’m downloading a few thousand responses in a single run, it could be a worthwhile saving).
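
The HEAD-based check described above could look roughly like this. It is a sketch under the assumption that the server answers HEAD requests on the JSON endpoint and includes `X-PyPI-Last-Serial` in the response, which is exactly what the spec would need to confirm. All helper names are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional


def serial_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Extract X-PyPI-Last-Serial (case-insensitively), or None if absent/invalid."""
    for key, value in headers.items():
        if key.lower() == "x-pypi-last-serial":
            try:
                return int(value)
            except ValueError:
                return None
    return None


def needs_refetch(cached_serial: Optional[int], headers: Mapping[str, str]) -> bool:
    """Refetch unless the header is present and equal to the cached serial."""
    current = serial_from_headers(headers)
    return current is None or current != cached_serial


def head_headers(url: str) -> dict:
    """Issue a HEAD request and return the response headers as a plain dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

A client would call `needs_refetch(cached_serial, head_headers(url))` and only issue the full GET when it returns `True`. If the header is missing, the sketch errs on the side of refetching.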

I can try and do that for you - if I get a chance this weekend I’ll take a look. I don’t work on warehouse itself, so I can’t check against the DB schema or anything like that, though.


Thanks for working on this!

I took a quick pass through the PEP (and found some typos; PR here: “Update pep-9999.rst” by di, Pull Request #1 on nchepanov/peps). My main issue with this PEP is that it doesn’t really do what it says it’s going to do. In the “Motivation” section, it says:

This PEP aims to lock in the existing standard as a guarantee for consumers

But then later, one of the goals is:

Declare legacy JSON API endpoints deprecated

And instead this PEP describes how the new JSON API will work.

Given that all the indexes you mentioned will probably continue to support the “legacy” API for a long time, I think we actually need two PEPs:

  1. A PEP that defines the existing, legacy JSON API as a standard, that people can continue to use
  2. A PEP that declares that standard deprecated and provides the new standard JSON API

With regard to the new standard, I’d like to see some of the comments that @dstufft raised around hypermedia-based APIs addressed as well (e.g. here and here). I think this is a really important part of the new API and it needs to be included from the very start in any proposal for a new API standard.

I agree, some discussion of how these APIs should be deprecated/discontinued in a way that won’t break downstream consumers should be included here.

As an implementation detail of PyPI, these are baked into our DB and won’t change when they are surfaced via any API. I’m curious though – does anyone know if any third-party private indexes provide this field as well? Or just mirrors?

Devpi includes X-Pypi-Last-Serial for mirrors, identical to the value sent by PyPI for the mirrored page. It also includes X-Devpi-Serial for both mirrors and private indexes.
(This is for the simple API, as devpi has no JSON API yet.)


If you wanted to include the JSON schema in the PEP, you can put it in an HTML collapsible:

.. raw:: html

   <details>
   <summary><a>JSON schema</a></summary>

.. code-block:: json

   {}

.. raw:: html

   </details>

I would suggest converting the schema to YAML for readability.

Also, there’s an opportunity to have an OpenAPI schema.

And instead this PEP describes how the new JSON API will work.

Let’s tune the wording here then. This PEP’s goal is to make minimal changes to start the deprecation of the old endpoints (which we can leave unversioned and unchanged for a defined period, with warnings to callers) and to introduce the same API with better namespacing, versioning, and a specified data format, so that we have a record of what it offers and mirrors or other indexes can implement a JSON endpoint too (per index).

The data and schema changes here touch very little of the actual JSON content offered today. Because of this, I wouldn’t really call this a new API. It’s more of a long-overdue cleanup and a definition of what the “JSON” API actually offers.

How do Nikita and I move forward here? Is it the shared, preferred view among those who make the call that we should write two PEPs for this? I feel the first PEP would not be beneficial due to:

  • Unfriendly namespacing, which makes it hard to implement outside of PyPI/Warehouse
  • The fact that the majority want to move to a new or extended version of this API

Can we just modify this PEP to talk about the legacy {URL}/pypi/PKG_NAME/json and it being available for a period until we make /api mature and get all main callers using it? I’m happy to try to identify the callers via logs and reach out / do PRs to move them to the new versioned URL too.
