Pre-PEP: Trusted Publishing token exchange

(This is a new version of https://discuss.python.org/t/pre-pep-standardizing-trusted-publishing-token-exchange/103057/1, which I fat-fingered before my draft was ready…)

Hello all! I’m opening this up as a pre-PEP discussion thread, in advance of writing a full PEP. I’m hoping to get feedback on the thoughts/ideas below that will inform the ultimate PEP language.

TL;DR: Trusted Publishing has been a huge success on PyPI, which currently exposes it via PyPI-specific APIs. I propose that we standardize a variant of these APIs so that uploader tools can depend on them in a standards-compliant manner. Additionally, I believe a standard variant of these APIs will spur adoption of Trusted Publishing in the broader Python packaging ecosystem, e.g. on private indices that wish to be compatible with standard uploading tools.

Disclaimer

I am a contributor to PyPI, and am a paid employee of Astral, which is developing a service that includes a private registry. I’m disclosing that I’m interested in this PEP for both PyPI and work reasons :slightly_smiling_face:

Problem statement

Trusted Publishing has been a huge success on PyPI: nearly a million files have been uploaded with it (as of today), and it’s now the default upload flow in pypa/gh-action-pypi-publish as well as a first-class flow in twine, uv, pdm, maturin, and so forth. Additionally, Trusted Publishing is the “identity” foundation for Index hosted attestations (formerly known as PEP 740).

Despite this adoption and the fact that it’s referenced in packaging standards and standard (i.e. PyPA) tooling, Trusted Publishing isn’t itself a packaging standard: it’s currently implemented as a (small) set of PyPI-specific REST APIs that tools are consuming much like the unstandardized upload API.

I think this should be remedied with a standard!

Solution statement

I propose that we standardize a variant of the Trusted Publishing token exchange APIs. I think this will have two immediate benefits:

  • Upload tooling (twine, uv publish, etc.) will be able to rely on these APIs as a packaging standard, rather than as a PyPI implementation detail. This would align with the general goal of reducing the number of non-standard interfaces present in standard Python packaging (with the upload API being a notable remaining one, at least until PEP 694 or similar is accepted).
  • Indices beyond the current implementation of PyPI will be able to implement these APIs in a standard manner. I believe this has virtuous implications for both PyPI and third-party indices:
    • For PyPI itself: an idea that’s been circulated before is supporting multiple “registries” on pypi.org itself, i.e. allowing users or organizations to define their own fully isolated registries. This use case would require PyPI to expose separate routes for each “registry,” which in turn means that per-registry uploads would need their own Trusted Publishing routes. Correspondingly, clients will need a standard way to discover those routes.
    • For third-party indices: I believe users of third-party indices (including the one I work on) would like to use Trusted Publishing like they do with PyPI. A standard that describes how these third-party indices should implement the relevant APIs would ensure interoperability with Python’s standard and community-maintained tooling.

Proposal

In terms of concrete details: I believe this PEP should standardize the shape of the Trusted Publishing APIs, not their internal implementation details. This would be similar to other index-specific packaging PEPs, where PyPI’s implementation is not constrained so long as PyPI presents the appropriate shape of API for interoperation.

Roughly, here’s what I propose in the PEP:

  • Trusted Publishing support discovery: right now, upload clients can determine whether an upload-supporting index supports Trusted Publishing by hitting {domain}/_/oidc/audience and getting a successful response containing the index’s expected OIDC audience. I propose standardizing a variant of this as {upload_base}/_/oidc/audience (route name subject to bikeshedding!), where upload_base is the “parent” path for the upload endpoint itself. For example, PyPI serves its upload API as https://upload.pypi.org/legacy, so the appropriate audience endpoint would be https://upload.pypi.org/_/oidc/audience.

    The primary advantage of this is that it’s compatible with what PyPI currently does and allows future implementations more flexibility than would be possible with a domain-rooted route – for example, a hypothetical index host could offer audience discovery at https://example.com/blahblah/{foo,bar}/_/oidc/audience for two different registries that accept uploads at https://example.com/blahblah/{foo,bar}/upload respectively.

    This is also in-principle compatible with a future standard upload API (e.g. in PEP 694).

  • Trusted Publishing token exchange: I propose standardizing {upload_base}/_/oidc/mint-token as a generalization of PyPI’s current {domain}/_/oidc/mint-token, similar to above.

  • Trusted Publishing token “burning”: I propose this as an optional extension to the current PyPI-specific APIs: {upload_base}/_/oidc/burn-token would accept a POST with a JSON payload containing the temporary API token, which instructs the index to burn (i.e. deactivate) the token. This allows supporting upload clients to slightly enhance Trusted Publishing’s (already) short-lived token model by forcefully deactivating temporary tokens instead of letting them naturally expire.
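To make the {upload_base} idea concrete, here is a minimal sketch of how an upload client might derive these routes from the upload URL it already has. The helper names (upload_base, oidc_endpoints) are made up for illustration, and the route names are just the ones floated above, so they're subject to the same bikeshedding.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def upload_base(upload_url: str) -> str:
    """Return the "parent" URL of an upload endpoint, with a trailing slash."""
    parts = urlsplit(upload_url)
    # Drop any trailing slash so /legacy and /legacy/ behave the same.
    path = parts.path.rstrip("/")
    parent = posixpath.dirname(path)
    if not parent.endswith("/"):
        parent += "/"
    # Query string and fragment are intentionally discarded.
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))


def oidc_endpoints(upload_url: str) -> dict[str, str]:
    """Map each proposed route name to its URL under the upload base."""
    base = upload_base(upload_url)
    names = ("audience", "mint-token", "burn-token")
    return {name: f"{base}_/oidc/{name}" for name in names}
```

With this, oidc_endpoints("https://upload.pypi.org/legacy") points at routes under https://upload.pypi.org/_/oidc/, while two registries under https://example.com/blahblah/ each get their own isolated set of routes.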

I propose that the above (except for the “burning” endpoint, which doesn’t exist yet) take the general shape of the current Trusted Publishing APIs on PyPI, which are documented under “the manual way” here at the moment. However, I think there’s room for improvement in these APIs as well, so I welcome vigorous feedback on tweaking these for standards purposes :slightly_smiling_face:
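For readers who haven't seen the current flow, here is a rough sketch of the two requests an upload client makes today, written so that it only builds the requests and doesn't send them. The JSON field names ("audience" in the audience response, "token" in both the mint-token request and response) reflect my understanding of PyPI's current behavior and should be treated as assumptions; the function names are hypothetical.

```python
import json
from urllib.request import Request


def audience_request(base: str) -> Request:
    """Build the GET request that asks the index for its OIDC audience."""
    return Request(f"{base}_/oidc/audience", method="GET")


def mint_token_request(base: str, oidc_token: str) -> Request:
    """Build the POST request that exchanges an OIDC token for an API token."""
    body = json.dumps({"token": oidc_token}).encode("utf-8")
    return Request(
        f"{base}_/oidc/mint-token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_audience(payload: bytes) -> str:
    """Extract the audience from the audience endpoint's JSON response."""
    return json.loads(payload)["audience"]


def parse_minted_token(payload: bytes) -> str:
    """Extract the temporary API token from the mint-token JSON response."""
    return json.loads(payload)["token"]
```

A client would send audience_request, obtain an OIDC token for that audience from its identity provider, then send mint_token_request and upload with the returned token.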

Other considerations

As proposed above, Trusted Publishing remains fully agnostic to Trusted Publishing providers. In other words, it continues to be up to each uploadable index to determine which OIDC providers it wants to support. I think this would be virtuous to preserve, as OIDC itself is fully federated and IMO a standard here shouldn’t pick “winner” vendors or service providers.

Another open consideration I had is whether Trusted Publishing “discovery” in a standard here should be a bit more structured – two options come to mind:

  • We could standardize a .well-known endpoint (e.g. {domain}/.well-known/trusted-publishing) which would then advertise templated endpoints. However, I think this retains some of the downside of having things grounded at the domain level and potentially over-complicates the implementation for PyPI itself.
  • We could standardize a new discovery endpoint with the same scheme, e.g. {upload_base}/_/oidc/discover, which would then advertise appropriate audience, mint-token, etc. URLs in its response. This has the benefit of making discovery more explicit, rather than being a “hope and pray” action on the audience endpoint itself. It would also allow flexibility in the audience, mint-token, etc. URLs, i.e. would allow the upload-supporting index to place them somewhere other than {upload_base} while preserving isolation of different upload endpoints.

Finally, I think {upload_base}/_/oidc/* is a pretty confusing set of URL paths, and I’m very open to feedback on making them into something much more suitable for a public and standard API :sweat_smile:. Some random ideas I had:

  • {upload_base}/trusted-publishing/* – reuses our standard terminology, avoids the magic underscore
  • {upload_base}/publishing/* – too generic?

CCing a few people who I expect will be interested in this: @miketheman @dustin @dstufft @sethmlarson @EWDurbin

Generally very supportive of an effort to standardize Trusted Publishing and bring this to other indexes.

Currently 642K files published. :slightly_smiling_face:

N.B., this kind of already exists via PyPI’s secret reporting API (albeit also un-standardized). We might want to think about how this would interoperate with that, as secret disclosure has a very similar, but slightly different, interface/behavior than “burning”.

I think you’re probably implying this, but it would be good to be explicit that this is not an attempt to standardize the token itself (i.e. the index can use some format different than what PyPI uses) and just standardizes the request/response format.

I’ll give it some more thought, but I think I would lean towards this a bit, as this would solve both the discovery problem, and also let registries host the actual exchange endpoints anywhere they want. I’m thinking that a single response that combines the audience with pointers to the expected endpoints would probably make a lot of sense, so this doesn’t in practice require an extra request/response for clients.
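As a rough illustration of the "single combined response" idea, a client-side parser could look something like the sketch below. The key names ("audience", "mint-token-endpoint") and the TrustedPublishingDiscovery type are purely hypothetical; nothing here is an existing PyPI API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustedPublishingDiscovery:
    """What a client needs after one discovery request (hypothetical shape)."""

    audience: str
    mint_token_endpoint: str


def parse_discovery(payload: str) -> TrustedPublishingDiscovery:
    """Parse a combined discovery document, rejecting incomplete ones."""
    doc = json.loads(payload)
    try:
        return TrustedPublishingDiscovery(
            audience=doc["audience"],
            mint_token_endpoint=doc["mint-token-endpoint"],
        )
    except KeyError as exc:
        raise ValueError(f"discovery document is missing {exc}") from exc
```

Because the audience travels in the same document as the endpoint pointers, the client makes one discovery request instead of a discovery request followed by an audience request.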

I’m very supportive of this proposal, thanks for writing it!

While it’s probably out of scope for this proposal, and I don’t disagree with the principle, I do think it would be helpful to document the requirements on the service providers for interoperating with TP-compliant indexes. I’m thinking specifically about private CI/CD services or even non-GitHub/GitLab based services. If for example such a checklist or document existed[1], then that service could potentially petition to be supported by the index by affirming that all the requirements are met.


  1. possibly even with test scripts? ↩︎

Thanks @dustin and @barry!

Yeah, this is an interesting point – the two are mechanically almost identical, with only slightly different semantics in terms of how the “intent” is communicated (“this token was leaked and I want to neutralize it” versus “this token is still secured but I want to neutralize it because it won’t be used again”).

Given that these overlap so much, maybe it makes sense for me to defer the “burning” idea here for a different PEP that standardizes credential revocation more generally? I think having a single endpoint (rather than up to N potential endpoints) would be nice in terms of simplicity.

Yes! I’ll make that explicit in the ultimate PEP language.

Agreed with these benefits! I’m curious if you have thoughts about how to reconcile this with how the upload API (and standard index APIs) are currently “discovered”: right now both standard and non-standard APIs are identified by their base URLs, meaning that none of them require the index host to actually have full control over its domain.

This contrasts with .well-known, which is only defined at the “root” of the path, i.e. {domain}/.well-known/. So implementers would have to effectively demonstrate full domain control, which might be a stronger form of control than they currently have.

Another concern I have with a single .well-known discovery is that we’d effectively need to offer “templated” discovery, but upload clients may not have an obvious host-agnostic way to interpret those templates.

For example, imagine that foo.example.com offers two (or N) uploadable registries: https://foo.example.com/index/a/upload and https://foo.example.com/index/b/upload. Then, the .well-known discovery endpoint would be https://foo.example.com/.well-known/trusted-publishing (or whatever), and might contain something roughly like:

{
    "audience": "foo.example.com",
    "mint-token-endpoint": "https://foo.example.com/index/{registry}/mint-token"
}

…so the uploading client would effectively need to know how to interpolate {registry} (or possibly even multiple keys), which might demand too much in terms of uploading clients understanding the structure of destination repositories.
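A tiny sketch of the problem, using the example document above (the mint_token_url helper is hypothetical): the template can only be expanded if the client already knows both the key name and the right value for it, neither of which falls out of the upload URL in a host-agnostic way.

```python
# The templated document from the example above.
WELL_KNOWN_DOC = {
    "audience": "foo.example.com",
    "mint-token-endpoint": "https://foo.example.com/index/{registry}/mint-token",
}


def mint_token_url(doc: dict[str, str], **keys: str) -> str:
    """Expand the templated endpoint; raises KeyError if a key is missing."""
    return doc["mint-token-endpoint"].format(**keys)
```

Calling mint_token_url(WELL_KNOWN_DOC, registry="a") works, but only because the caller hard-coded knowledge of this host's layout; calling it with no keys fails with a KeyError for "registry".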

I think an advantage to the “{upload_base} discovery” approach is that it starts with something the uploading client already demonstrably knows (the upload URL), so there’s no disambiguation needed. To use the example above, https://foo.example.com/index/a/upload would have a corresponding TP discovery URL at https://foo.example.com/index/a/trusted-publishing-discovery (or whatever), which would then contain:

{
    "audience": "foo.example.com",
    "mint-token-endpoint": "https://foo.example.com/index/a/mint-token"
}

…and similar for https://foo.example.com/index/b/upload. So no ambiguity, although at the cost of not getting to use the .well-known standard (which I love dearly) :sweat_smile:
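Sketching that out: the discovery URL can be computed from the upload URL alone, with no per-host knowledge. The route name trusted-publishing-discovery is the placeholder from the example above, and the discovery_url helper is hypothetical.

```python
from urllib.parse import urljoin


def discovery_url(upload_url: str, name: str = "trusted-publishing-discovery") -> str:
    """Return the discovery URL that sits next to the upload endpoint."""
    # Strip a trailing slash so the relative join replaces the last path
    # segment ("upload") instead of appending beneath it.
    return urljoin(upload_url.rstrip("/"), name)
```

Each upload URL maps to exactly one discovery URL, so registries "a" and "b" on the same host never see each other's discovery documents.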

Yes, totally agreed. PyPI actually has its list of requirements documented here:

It might be hard to make this consistent across all possible implementers of Trusted Publishing, however – I can imagine that some third-party indices will want to use their own internal IdP as an OIDC provider, which is fantastic but probably doesn’t mean that same provider is suitable for use on PyPI or a different 3p index.

Like this? https://docs.pypi.org/trusted-publishers/internals/#how-do-i-become-a-trusted-publishing-provider

Yes! I wasn’t aware of that page. Thanks @woodruffw and @dustin.

Correction: 984K files published (h/t @miketheman for catching my mistake!)

Is part of the PEP’s intent to shift currently-implemented guidance away from {domain}/_/oidc/… to {upload_base}/_/...? In concrete terms, PyPI implementations are currently pointing at https://pypi.org/_/oidc/… and it seems like this is recommending that under the new guidance this would move to either:

  • https://upload.pypi.org/_/oidc…
  • https://upload.pypi.org/legacy/_/oidc…

Can you help clarify the intent of what upload_base would resolve to in its current form, and whether it would be a change for those who have already implemented TP on the current paths?

I’m not sure where the underscore came from, but PyPI has been using it for a while to manage machine-to-machine style integrations.

I appreciate the desire to have something more human-friendly, but maybe that was the original point? These aren’t for human (browser) consumption.

My preference is to keep the PyPI URLs already implemented for consistency, over adding more URL handlers to support more renames, or managing the change through a deprecation cycle/brownout/etc. if there isn’t a clear benefit to the change.

RubyGems has implemented theirs at /api/v1/oidc/trusted_publisher/exchange_token, and npmjs has done theirs at /-/npm/v1/oidc/token/exchange/package/${escapedPackageName} - so is there actually a need for the URL/path style to be consistent?

Yeah, the former (https://upload.pypi.org/_/oidc…) was the idea – {upload_base} was meant to refer to the “parent URL of the upload URL”, i.e. https://upload.pypi.org/ for https://upload.pypi.org/legacy.

The “trick” here is that we already serve https://upload.pypi.org/_/oidc/..., although @dustin notes that this is likely a quirk/historical accident due to Pyramid routing across domains :sweat_smile:. So this was my attempt to do something cheeky and technically preserve route compatibility while changing how the route is actually “defined.” However, this is probably mooted by us wanting to formalize discovery more generally, so the compatibility points aren’t super relevant.

To clarify, I meant that I thought /oidc/ in the route was the confusing part! My rationale there is that we really don’t refer to Trusted Publishing as OIDC in most public-facing docs, so having it ossified into a PEP might cause confusion (and also means that we’re formally committing to Trusted Publishing being built solely on OIDC, rather than OIDC being a kind of publishing identity source we currently support).

With all that being said: per @dustin’s point about discovery, we could definitely do this in a way that doesn’t require PyPI’s URLs to change at all – the PEP could specify only the discovery URL route format, which would then tell the upload client where to find the audience and mint-token URLs. That would be the best of both worlds, I think: it’d require only a small route addition to PyPI itself (for discovery), while still achieving the main objectives of the PEP (standardizing everything, and giving multi-index registries the ability to isolate different trusted publishing endpoints).

I like the idea of having a standard. When you mentioned .well-known/, it occurred to me that since other ecosystems (rubygems, npmjs) picked up the idea of TP, it would probably be good to involve them in the process somehow.

Agreed, although per above .well-known might actually not be the best fit for this :sweat_smile: – the .well-known RFC stipulates that well-known paths are always anchored to the domain root, whereas existing patterns for Python indices (and probably other languages) allow for multiple conceptually independent upload services to be present on the same domain.

(In other words there might be foo.corp/index/a/upload and foo.corp/index/b/upload, so starting from foo.corp/.well-known may not be super helpful to a discovering client. I’m going to work through this some more in the PEP draft as well.)

Presumably that would just mean that the .well-known/ data structure we use needs to cope with pointing to multiple sub-paths?

Yes, although I’m concerned that that kind of multi-pathed structure won’t scale super well to some private index host “shapes”:

  1. An index host that provides dozens (or even hundreds/thousands) of logically separate registries would need to present each in the .well-known structure, which means a lot of data pushed over the wire for a client that ultimately only needs to select just one of those registries.
  2. There’s also a privacy/confidentiality concern with the above: the existence of a registry on a given host isn’t necessarily public information, so having a single global discovery may disclose more information about e.g. registry names than users necessarily want.

I can think of some workarounds for these problems – for example the discovery mechanism could be sharded within .well-known and also “blinded” through hashing. For example, to discover Trusted Publishing endpoints for https://HOST/legacy/, we could support something (spitballing) like:

GET /.well-known/python-tp/af030c06750716b1b35852298fe852b90def13dcbd012a5fe5148470f1206bfc

i.e. sha256(b"/legacy/"). From there, the response could be (spitballing):

{
  "audience-endpoint": "/_/oidc/audience",
  "mint-token-endpoint": "/_/oidc/mint-token"
}
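Here is a minimal sketch of what the client side of this could look like. The /.well-known/python-tp/ prefix is just the spitballed name from above, the helper names are hypothetical, and hashing the raw URL path as UTF-8 is an assumption about how the "blinding" would be specified.

```python
import hashlib
from urllib.parse import urljoin, urlsplit

# Spitballed prefix from the example above; not a registered well-known name.
WELL_KNOWN_PREFIX = "/.well-known/python-tp/"


def blinded_discovery_url(upload_url: str) -> str:
    """Build the sharded, hashed discovery URL for a given upload URL."""
    parts = urlsplit(upload_url)
    # Hash only the path, so the registry name never appears in the request.
    digest = hashlib.sha256(parts.path.encode("utf-8")).hexdigest()
    return f"{parts.scheme}://{parts.netloc}{WELL_KNOWN_PREFIX}{digest}"


def resolve_endpoint(upload_url: str, endpoint: str) -> str:
    """Resolve a host-relative endpoint from the response against the host."""
    return urljoin(upload_url, endpoint)
```

One wrinkle this surfaces: "/legacy" and "/legacy/" hash to different values, so a real specification would need to say how the path is normalized before hashing.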

Any thoughts on the above? I think this would allow us to use .well-known/ while also addressing my concerns above :slightly_smiling_face:

That seems wholly reasonable to me.

Cool! I’m working on the PEP draft now, I’ll hopefully have something public by EOD.
