PEP 740: Index support for digital attestations

Update: @dstufft and I had a call earlier today to talk through some of the specifics in the PEP, and he pointed out that the current approach of embedding the provenance JSON into each file listing in the simple JSON API may scale poorly if (1) the attestations are large, (2) there are a lot of attestations per file, (3) there are a lot of files listed, or (4) all of the above.

So, I’ve done a bit of informal analysis using the numbers he gave me :slightly_smiling_face:

  1. First, a typical attestation will be approximately 5.3 KB of JSON. This number comes from the example attestation we built for initial testing purposes.
  2. Initially, we expect to see 1 attestation per file per release per project, corresponding to the “publish” attestation that gets verified against the Trusted Publisher. Conservatively, we’ll estimate that PyPI may eventually host 3 attestations per file (one “publish”, one “build”, and one “third-party” attestation).
  3. The current average number of files per project is ~21.[1]

Given those numbers, we might reasonably expect a future average project to have 60-70 attestations (21 files × 3 attestations ≈ 63), or at least ~318 KB of attestation JSON in its PEP 691 “project detail” endpoint. That’s a lot of JSON to push down the pipe, especially since we expect an installing client like pip to need only a small fraction of all releases and their attestations :slightly_smiling_face:
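The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the informal inputs listed above (5.3 KB per attestation, 3 attestations per file, 21 files per project); the constant and function names are illustrative. Note that 63 attestations works out to ~334 KB, so ~318 KB corresponds to the low end of the range (60 attestations).

```python
# Back-of-the-envelope size of inline attestation JSON on a PEP 691 project page.
# Inputs are the informal figures from the list above; names are illustrative only.
ATTESTATION_KB = 5.3       # approximate size of one attestation as JSON
ATTESTATIONS_PER_FILE = 3  # "publish", "build", "third-party"
FILES_PER_PROJECT = 21     # current average files per project


def embedded_attestation_kb(attestations: int, size_kb: float = ATTESTATION_KB) -> float:
    """KB of attestation JSON if every attestation is embedded in the listing."""
    return attestations * size_kb


total_attestations = FILES_PER_PROJECT * ATTESTATIONS_PER_FILE  # 63

for n in (60, total_attestations, 70):
    print(f"{n} attestations -> ~{embedded_attestation_kb(n):.0f} KB")
# 60 attestations -> ~318 KB
# 63 attestations -> ~334 KB
# 70 attestations -> ~371 KB
```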

With that in mind, I’m going to change the PEP so that the proposed JSON API change no longer embeds the entire provenance object. Instead, the JSON API will behave like the simple index API and embed only the digest of the provenance object, which clients can then retrieve on demand from an adjacent .provenance URL.
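A client consuming the digest-based design might look roughly like the sketch below. It assumes, hypothetically, that each file entry in the JSON API carries the provenance digest under a `provenance` key as a SHA-256 hex string, and that the provenance object lives at the file's URL with `.provenance` appended; the key name and hash algorithm are placeholders, not settled details. The fetch function is injected so the sketch runs without network access, and `files.example` and the provenance body are placeholder values.

```python
import hashlib
from typing import Callable, Mapping, Optional


def provenance_url(file_url: str) -> str:
    """Hypothetical convention: provenance sits next to the file at <url>.provenance."""
    return file_url + ".provenance"


def fetch_verified_provenance(
    file_entry: Mapping[str, str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Fetch a file's provenance on demand and check it against the listed digest.

    Assumes (hypothetically) that the JSON API entry stores a SHA-256 hex digest
    of the provenance object under the "provenance" key.
    """
    expected = file_entry.get("provenance")
    if expected is None:
        return None  # nothing attested for this file, so nothing to fetch
    body = fetch(provenance_url(file_entry["url"]))
    actual = hashlib.sha256(body).hexdigest()
    if actual != expected:
        raise ValueError(f"provenance digest mismatch: expected {expected}, got {actual}")
    return body


# Usage with an in-memory stand-in for HTTP (placeholder host, no network needed).
store = {"https://files.example/pkg-1.0.tar.gz.provenance": b'{"attestations": []}'}
entry = {
    "url": "https://files.example/pkg-1.0.tar.gz",
    "provenance": hashlib.sha256(b'{"attestations": []}').hexdigest(),
}
print(fetch_verified_provenance(entry, store.__getitem__))  # b'{"attestations": []}'
```

Under the SHA-256 assumption, each file listing grows by a fixed 64 hex characters regardless of how many attestations the provenance object holds, and the client pays the roughly 5 KB-per-attestation cost only for the files it actually installs.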

I’ll make the PR for that in a bit, along with a new appendix section summarizing the numbers above as rationale.


  1. Queried by Donald.
