PEP 807: Index support for Trusted Publishing

Draft PEP: PEP 807 – Index support for Trusted Publishing | peps.python.org

Pre-PEP thread: Pre-PEP: Trusted Publishing token exchange

Summary of the rationale and motivation

PyPI currently implements a technique that it calls Trusted Publishing, which is essentially a misuse-resistant token exchange mechanism that allows users to establish trust directly through a CI/CD provider or other identity provider instead of having to manually issue long-lived API credentials. Trusted Publishing has seen broad adoption on PyPI in the roughly 2 years it’s been publicly available; some numbers are in the PEP itself.

While Trusted Publishing has been a success for PyPI itself, it isn’t a standard aspect of Python packaging. This means that third-party indices can’t easily implement it (without making implementation-specific assumptions from the Warehouse codebase), and that official PyPA tooling (like twine and gh-action-pypi-publish) is tied to Warehouse’s specific implementation decisions.

This PEP seeks to address both of these problems, and to make Trusted Publishing discovery and token exchange a fully standard aspect of Python packaging (like other interoperability mechanisms). Specifically, this PEP seeks to standardize the Trusted Publishing discovery and token exchange flows so that all parties (indices, including third-party indices, and clients) can implement them regardless of how similar their registry “topology” is to the PyPI topology (a single registry on a domain).

Summary of the proposed changes

The bulk of the PEP’s standard language is a human description of the Trusted Publishing flow as currently implemented on PyPI. The main proposed changes are:

  1. A “discovery” flow that augments the current “implicit discovery” process used with PyPI, which currently makes service/topological assumptions about the registry that aren’t guaranteed to be true for other indices (namely that a single host only has a single registry, which isn’t true for many third-party registry hosts). The discovery flow also makes it possible for both PyPI and other hosts to make changes to their Trusted Publishing URLs without breaking existing clients.
  2. An “exchange” flow that closely mirrors the existing flow implemented on PyPI. This flow consists of two endpoints: an “audience” endpoint that tells the uploading client which OIDC audience its credential needs to be issued for, and a “token minting” endpoint that the uploading client must submit its OIDC credential to in order to receive an (index-specific) upload credential.
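
To make the exchange flow in item 2 a bit more concrete, here is a minimal client-side sketch. The URLs, the `audience` and `token` response field names, and the injected `transport` and `get_oidc_credential` callables are all assumptions made for illustration (loosely modeled on PyPI's current behavior); in the PEP, the discovery flow is what would tell the client where these endpoints actually live.

```python
from typing import Callable, Optional

# Hypothetical endpoint URLs; a real client would learn these via discovery.
AUDIENCE_URL = "https://index.example/_/oidc/audience"
MINT_URL = "https://index.example/_/oidc/mint-token"

# (method, url, json_body_or_None) -> decoded JSON response.
# Injected so the sketch stays independent of any particular HTTP library.
Transport = Callable[[str, str, Optional[dict]], dict]


def exchange(transport: Transport, get_oidc_credential: Callable[[str], str]) -> str:
    """Run the two-endpoint exchange and return an index-specific upload credential."""
    # Step 1: ask the index which OIDC audience the credential must be issued for.
    audience = transport("GET", AUDIENCE_URL, None)["audience"]
    # Step 2: obtain an OIDC credential for that audience from the identity provider.
    oidc_credential = get_oidc_credential(audience)
    # Step 3: submit the OIDC credential to the token minting endpoint.
    response = transport("POST", MINT_URL, {"token": oidc_credential})
    return response["token"]


def fake_transport(method: str, url: str, body: Optional[dict]) -> dict:
    """Stand-in for a real index, used only to exercise the flow offline."""
    if method == "GET" and url == AUDIENCE_URL:
        return {"audience": "example-index"}
    if method == "POST" and url == MINT_URL and body is not None:
        return {"token": "upload-token-for:" + body["token"]}
    raise ValueError(f"unexpected request: {method} {url}")


upload_token = exchange(fake_transport, lambda aud: f"oidc-cred-aud-{aud}")
```

A real client would swap `fake_transport` for actual HTTP calls and `get_oidc_credential` for whatever its CI/CD provider offers for requesting an OIDC credential with a given audience.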

As always, thanks in advance to everyone who provides feedback below! I look forward to hearing the community’s thoughts on this proposal.

CC @dstufft as sponsor/delegate

One thing I’m flagging for discussion/that I want feedback on: right now the error payload/model is a pretty bespoke one, and it mirrors what PyPI currently serves on its endpoints. I could see this being kind of annoying for integrators/third-party registries, which might prefer to use a more standard error response.

I only just today learned about RFC 9457 from this thread, so that’s one possible option! But I’m curious if others with more relevant experience here have other/better ideas too; I freely admit that HTTP API design is not my primary subject area :sweat_smile:
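
For reference, a rough sketch of what an RFC 9457 "problem details" body could look like for a failed token exchange. The `type`, `title`, `status`, and `detail` members and the `application/problem+json` media type come from the RFC; the specific type URI, status code, and wording below are hypothetical and not something the PEP or PyPI defines.

```python
import json

# Media type registered by RFC 9457 for JSON problem details.
PROBLEM_CONTENT_TYPE = "application/problem+json"


def problem_details(type_uri: str, title: str, status: int, detail: str) -> dict:
    """Build an RFC 9457 problem details object from its standard members."""
    return {
        "type": type_uri,   # URI identifying the problem type
        "title": title,     # short, human-readable summary of the problem type
        "status": status,   # HTTP status code for this occurrence
        "detail": detail,   # human-readable explanation of this occurrence
    }


# Hypothetical error for a rejected OIDC credential.
error_body = json.dumps(
    problem_details(
        type_uri="https://index.example/problems/invalid-oidc-credential",
        title="Invalid OIDC credential",
        status=422,
        detail="The submitted OIDC credential did not match any registered publisher.",
    )
)
```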

cc @kpfleming re RFC 9457 et al

Another thing I wanted to flag: right now the token exchange part of this PEP mirrors exactly what PyPI does. However, one bottleneck that PyPI has observed is with publishers that correspond to hundreds (or thousands) of packages: when that happens PyPI needs to issue a scoped credential for all packages that could be uploaded by the publisher, even though only a very small minority are likely being uploaded.

One potential solution to that would be to allow the token exchange endpoint to accept a set of packages that the publisher actually intends to upload to, which would then be intersected with the eligible set. This would both make life easier on the PyPI side (no gigantic issued API tokens) and would have a more general security benefit (in that we get automatic scoping with TP, but the user can also restrict beyond what the automatic scope would grant).

In practice, this would look something like this in the token minting endpoint:

POST /blahblah/mint-token

{
  "token": "oidc-cred-here",
  "packages": ["abc", "def"]
}

…so if the matching publisher was registered for abc, def, foo, and bar, it would only provide a scoped credential for abc and def instead of all four (which would be the default).
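
On the index side, the intersection described above could be sketched roughly as follows. The function name and the choice to silently drop requested packages that the publisher isn't eligible for are assumptions for illustration; an index could just as reasonably reject such a request outright.

```python
from typing import Iterable, Optional, Set


def scoped_packages(eligible: Iterable[str], requested: Optional[Iterable[str]] = None) -> Set[str]:
    """Return the set of packages the minted credential should be scoped to."""
    eligible_set = set(eligible)
    if requested is None:
        # No "packages" key in the request: keep today's default of all eligible packages.
        return eligible_set
    # Otherwise, narrow the scope to packages that are both eligible and requested.
    return eligible_set & set(requested)


registered = {"abc", "def", "foo", "bar"}
narrowed = scoped_packages(registered, ["abc", "def"])
default = scoped_packages(registered)
```

Because the result is an intersection, a client can never widen its scope beyond what the publisher is registered for; it can only narrow it.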

CC @miketheman and @dustin in particular for thoughts on the above :slightly_smiling_face:

FWIW, I think the edge case that PyPI experiences here with a publisher w/ a large number of packages is more of a bug/implementation detail in PyPI than a general issue with indexes supporting trusted publishing (for context, https://github.com/pypi/warehouse/issues/18514).

I think slightly reducing the scope of an API token would be nice, but I’m not totally convinced the incremental benefit is really worth the effort. Given that the underlying identity would still have the ability to mint tokens for all the projects it’s configured for, this doesn’t protect a user with a compromised workflow; it only helps in the narrow case where an API token leaks (within the expiry window, and without the workflow also being compromised somehow).

If reducing the impact of an API token leak is the ultimate goal, I would suggest that we instead find a way to make tokens single-use. This could even be a configurable setting that the index could support, which clients would detect based on the response from the token-minting endpoint, handling token refresh automatically for uploads that need multiple separate requests.
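
As a rough sketch of the client side of that idea: the loop below re-mints a token before each upload request whenever the index signals that tokens are single-use. The `(token, single_use)` return shape and the `mint` and `upload` callables are hypothetical; nothing like a single-use flag exists in the current token-minting response.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical shape: mint() returns (upload_token, single_use_flag).
Mint = Callable[[], Tuple[str, bool]]


def upload_all(files: Iterable[str], mint: Mint, upload: Callable[[str, str], None]) -> None:
    """Upload each file, minting a fresh token whenever the previous one is spent."""
    token: Optional[str] = None
    single_use = False
    for path in files:
        # Mint lazily on the first upload, and again before every later
        # upload if the index reported that tokens are single-use.
        if token is None or single_use:
            token, single_use = mint()
        upload(path, token)


# Offline demonstration with stand-ins for the index and the upload request.
minted: List[str] = []
used: List[Tuple[str, str]] = []


def fake_mint() -> Tuple[str, bool]:
    token = f"token-{len(minted) + 1}"
    minted.append(token)
    return token, True  # pretend the index enforces single-use tokens


upload_all(
    ["pkg-1.0.tar.gz", "pkg-1.0-py3-none-any.whl"],
    fake_mint,
    lambda path, token: used.append((path, token)),
)
```

With an index that does not set the flag, the same loop mints once and reuses the token for every file, so existing behavior would be unchanged.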
