New OIDC providers for Trusted Publishing

Due to a variety of events in recent years that I think this venue is not an appropriate place to recount, GitHub has lost a lot of the trust placed in it by the community. (GitLab shares many of the same issues.) I, like many other people, have been slowly migrating my infrastructure to Codeberg, a non-profit whose values and goals align more closely with those of the open source community.

One obstacle to doing so is that Codeberg is not a supported Trusted Publisher on PyPI. If I were to migrate my CI infrastructure to Codeberg, I would have to switch to using API tokens; moreover, since I maintain dozens of individual packages, I would have to use a user-scoped API token, since organization-scoped tokens don’t seem to exist (or at least I couldn’t find a way to create one in the UI), and individually creating an API token for each package, including for workflows that publish a ~dozen different packages, is simply not practical.

Also, Forgejo Actions is a significantly less mature CI engine, which makes it even more important to restrict the scope of packaging-related actions authorized on CI.

I would really like to use the Trusted Publisher workflow with PyPI and Codeberg. Is this something that the PyPI administrators would consider? If yes, what would be the technical steps to achieving it? If no, would it be possible to improve non-Trusted-Publisher flows, such as adding organization-scoped API tokens?

20 Likes

I’m similarly a root sysadmin for the OpenDev Collaboratory, and the open source CI/CD system we developed there (Zuul) is capable of OIDC as well. Our communities are responsible for around a thousand packages on PyPI, which is small compared to GitHub, but being able to assert provenance through Trusted Publishing would still be nice.

For now our release automation publishes detached OpenPGP signatures of all packages to a dedicated web site instead, ever since PyPI dropped support for publishing self-supplied signatures there. It’s good enough for our downstream consumers, but unfortunate that open source code hosting platforms are essentially second-class citizens next to proprietary commercial products like GitHub when it comes to these sorts of security features.

6 Likes

Docs for becoming a trusted publisher for PyPI are in the FAQ section of Internals and Technical Details - PyPI Docs


The most important aspect for me is that I can trust (via PyPI) that a package file was built from its repository. GitHub, GitLab and GCS have demonstrated through time that their CD does this, and I’ve never heard of Codeberg or Zuul.

3 Likes

Codeberg e.V. is a European non-profit with over 1300 sponsoring members, maintaining a code forge with over 400k projects and over 250k users. As far as I know it is the largest code forge run by a non-profit organization, with some high-profile projects using it, including the Zig programming language for example. (As a matter of fact, Zig does publish PyPI packages via Trusted Publishers currently, although this isn’t my main motivation here.)


The most important aspect for me is […]

Are you saying this in your personal capacity or do you hold a formal position in the PyPI project?

GitHub, GitLab and GCS have demonstrated through time that their CD does this […]

Does PyPI currently accept Trusted Publisher authorizations from workflows executing on self-hosted GitHub Actions runners?

3 Likes

Not speaking authoritatively here, just my personal opinion, though I am a PyPI administrator so I’ve got some sense of what the PyPI admins might think :wink: . To be clear though, this is a quick look and not any sort of official Yes or No.

I believe the key requirement, beyond the technical ones, is the reliability and security of a given OIDC provider. These providers effectively have the ability to release as any project, and thus their ability to securely and reliably operate their software and infrastructure is of utmost importance.

I’ve never looked at Codeberg before, though I have vaguely heard of Forgejo (which appears to be forked from Gitea, which I have heard of).

I have a number of concerns though from a quick search.

  • Codeberg looks pretty new (2019 it appears?), there’s nothing inherently wrong with that, but 7 years isn’t a large amount of time to be able to look back on how well they’ve handled issues in the past.
  • I can’t figure out how to actually contact Codeberg itself about any security issues that I might have. The Codeberg docs suggest reporting security issues to Forgejo directly rather than to them, but that seems to presume the issue is related to Forgejo directly and isn’t an issue with the Codeberg instance specifically. Codeberg also seems to run a number of pieces of software that are not Forgejo, so it’s unclear how to handle security issues with the non-Forgejo components. [1]
  • It appears that Codeberg runs two different CI options: one based on Forgejo Actions and the other based on Woodpecker.
    • The Woodpecker option seems to be the primary one, as the Forgejo Actions one is billed as “very alpha”. As far as I can tell, Woodpecker does not provide a machine identity to each CI execution like GitHub/GitLab do, and it only supports static secrets stored in a job. That eliminates Woodpecker as an option for Trusted Publishing on technical grounds before we can even look at the non-technical questions.
    • Forgejo Actions does seem to support the machine identity we would need, but support for that landed about a month ago and has yet to be released, and Codeberg appears to track the stable release. So at the moment Forgejo Actions on Codeberg also doesn’t provide what we need on a technical level, but it will in the future.
  • Looking at the Codeberg documentation on Forgejo Actions, it says “Due to outstanding security issues and bus factor (we need more maintainers for our CI service), we are currently providing hosted Actions in limited fashion”, which does not inspire a lot of confidence in it.

So for me, my gut feeling is that Codeberg is too new and hasn’t yet established itself well enough to integrate, particularly given that the technical features we’d need for that integration don’t exist on Codeberg yet and have only existed in the software itself for about a month. The fact that I cannot find a way to contact Codeberg with security issues other than the general help@codeberg.org email address (which is marked as potentially slow to respond) also does not inspire a lot of confidence in the security maturity of the org.

I think the mission here of Codeberg is great, and I assume things will mature over time, if for no other reason than some of these things are so new they don’t even exist yet :wink: .

This is an interesting statement though, because it suggests you don’t think that Forgejo Actions is mature enough to trust it with your own API tokens, but that seems like a smaller amount of trust from the PyPI side than integrating as a trusted publisher would be.

It’s not well documented, but something that’s non-obvious is that PyPI’s API tokens are actually quite powerful primitives to build this kind of system on top of externally to PyPI itself.

PyPI’s API tokens are Macaroons, and one of the big benefits of Macaroons is that if you possess a macaroon, you can further restrict that API token by adding additional “caveats” [2] to that API token, which creates what is effectively a sub token of the original one. This idea of generating sub tokens can recurse effectively infinitely [3], so if you create a sub token and give that out, then someone who has that sub token can create a sub sub token with further restrictions.

Creating these restricted sub tokens does not require talking to PyPI at all or doing any sort of network request, it’s entirely done locally, in memory. It’s honestly quite cool!
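
To make the “entirely local” part concrete, here is a simplified, stdlib-only sketch of the Macaroon mechanics. PyPI’s real caveat encoding and key handling differ, and the caveat payloads here are made up, but the chaining construction is the essence of the design: appending a caveat replaces the signature with an HMAC of the previous signature, so deriving a sub token is pure local computation.

```python
import hashlib
import hmac
import json

def attenuate(token, caveat):
    # Appending a caveat replaces the signature with
    # HMAC(old_signature, caveat), so the caveat list can only
    # ever grow: removing one would invalidate the signature.
    caveats, signature = token
    payload = json.dumps(caveat, sort_keys=True).encode()
    new_signature = hmac.new(signature, payload, hashlib.sha256).digest()
    return (caveats + [caveat], new_signature)

# Only the minting service (PyPI) knows the root secret.
root_secret = b"known-only-to-pypi"
token = ([], hmac.new(root_secret, b"token-id", hashlib.sha256).digest())

# Anyone holding the token can derive narrower sub tokens offline:
sub_token = attenuate(token, {"projects": ["mypackage"]})
sub_sub_token = attenuate(sub_token, {"not_after": 1767225600})
```

Note that `attenuate` never talks to any server; a verifier that knows the root secret can replay the HMAC chain over the caveat list and confirm the final signature.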

The only thing that the PyPI UI for creating tokens can do that you, as a person who possesses a token, can’t, is create a completely brand new token from scratch [4]. Limiting a token to a project in the PyPI UI just takes a freshly minted API token that is bound to your user and adds some caveats to it before handing it to you. In the case of a user-scoped token, the caveat that we add is basically a no-op (it just asserts that the user linked to the macaroon in the PyPI DB hasn’t changed), so it’s effectively an “empty” token with the full upload permissions of your user account, but again, you can add your own caveats locally to restrict it further!

We don’t support a ton of caveats currently, basically only 2 (sorta 3?) useful ones:

  • Expiration: Not Before/Not After, pretty basic caveat for time boxing a sub token’s lifetime.
  • Project Names: This caveat takes a list of project names, and the sub token is limited to only the projects named in that caveat.
  • Project IDs: Basically the same thing as Project Names, but instead of names it uses the UUID of the project from PyPI’s database; the main benefit is that this prevents rebinding attacks where a project gets deleted and recreated by someone else. It’s not super relevant for a user token, since a user token is already bound to only the projects your user has, and I’m not sure it’s possible for a user to get the UUID for a project other than by inspecting an existing token. [5]

Adding more caveats to PyPI is pretty easy: you just implement a dataclass with a verify() method, and once that lands PyPI would support that caveat type as well.

If you want to inspect a token or restrict it with additional caveats, there’s an unofficial library called pypitoken that makes that pretty easy to do.

A few random important tidbits about Macaroons:

  • The design of Macaroons prevents someone from removing existing caveats on a token, they can only append.
  • All caveats (even the same one repeated multiple times) are independently checked and they all must evaluate to True.
  • Caveats can only restrict the powers of a token, never increase them, so arbitrarily appending caveats is always safe and can only reduce the scope of a given token.
  • Macaroons further divide caveats into “first party” and “third party”; currently PyPI only supports first party (meaning caveats that PyPI has to natively add support for). It’d be nice to support third party caveats too, but it hasn’t been a priority [6].
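
The verification side of those rules can be sketched like this (the caveat shapes are illustrative, not PyPI’s actual wire format): every caveat is checked independently against the upload request, unknown caveat types fail closed, and stacking caveats can only shrink what a token permits.

```python
def check(caveat, request):
    # Each caveat type knows how to judge one upload request.
    if "projects" in caveat:
        return request["project"] in caveat["projects"]
    if "not_after" in caveat:
        return request["time"] <= caveat["not_after"]
    return False  # unknown caveat types fail closed

def verify_caveats(caveats, request):
    # Every caveat (even a repeated one) must independently pass.
    return all(check(c, request) for c in caveats)

caveats = [
    {"projects": ["pkg-one", "pkg-two"]},  # original scope
    {"projects": ["pkg-one"]},             # appended later: narrows, never widens
]
```

An upload request for pkg-one passes both checks, while a request for pkg-two fails the second one, so appending the extra caveat strictly reduced the token’s power.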

So you can actually get pretty close to Trusted Publishing from an arbitrary platform without any support from PyPI itself:

  1. Create an account scoped token (either for your own user, or a robot account that has permission to upload to all projects).
  2. Create a web service that holds that account-scoped token, accepts OIDC authentication from Codeberg (or any other platform), verifies the claims in the JWT, then adds a caveat to the account-scoped token creating a sub token limited to only the project (or projects) the JWT claims grant permission for, and returns it.
  3. Have the CI platform pass that token from (2) as the API token for the upload.
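
A minimal sketch of the policy core of step (2), assuming the JWT’s signature has already been verified against the provider’s published keys. The claim names and issuer URL here are hypothetical, purely for illustration; a real Codeberg token may use different claims.

```python
TRUSTED_ISSUER = "https://codeberg.org"  # hypothetical issuer URL

# Which repository is allowed to publish which PyPI projects.
PUBLISHERS = {
    "alice/mypackage": ["mypackage"],
    "alice/monorepo": ["pkg-one", "pkg-two"],
}

def projects_for(claims):
    """Map verified OIDC claims to the projects they may publish."""
    if claims.get("iss") != TRUSTED_ISSUER:
        raise PermissionError("untrusted issuer")
    projects = PUBLISHERS.get(claims.get("repository"))
    if not projects:
        raise PermissionError("repository not registered")
    return projects
```

The service would then append a project-names caveat for exactly those projects to its stored account-scoped token and hand the resulting sub token back to the CI job.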

And that’s basically it :wink: This is pretty much exactly how Trusted Publishing works on PyPI, except that PyPI gets to skip (1), because it can mint arbitrary tokens for any project. There are some minor differences too; for example, with this scheme the uploads will appear to come from your user, whereas PyPI has special support for treating Trusted Publishing as a user-less action.

The only thing you really lose out on is that upload provenance attestations require trusted publishing and are not supported for arbitrary uploads. You can of course have your web service publish its own upload attestations, but they won’t appear on PyPI so they’re not as useful. I’d love to find a way to reasonably support arbitrary sources for upload attestations, but that’s also not been a priority for me.

Anyways, Macaroons are great, and I love talking about them :wink: , but hopefully even if Codeberg isn’t yet ready for being trusted for trusted publishing, you can see how you can get pretty close using the primitives that PyPI already has!


  1. Ironically, the security.txt on Codeberg says only to contact Forgejo for Forgejo issues, and to contact the administrators of the individual instance for issues specific to that instance, but I can’t figure out how to do that! ↩︎

  2. Just a fancy Macaroon word for restriction. ↩︎

  3. Although there are practical limits, such as the size of the API token, the number of unique caveat types that are supported, etc. ↩︎

  4. Macaroons start off with a secret known only to PyPI. Some implementations of Macaroons use a single secret to create an “omnipotent” token and then use caveats to restrict it to individual users, etc. That’s a tiny bit simpler and more “pure”, but it has problems with revocation: since sub tokens can be minted at will, PyPI has no idea what sub tokens exist, so revocation is difficult and typically requires setting up some sort of revocation service. When we implemented this for PyPI, we instead chose to create a fresh secret for each new API token, strongly linked to an individual identity (either a user or an OIDC identity), so that instead of being omnipotent, a token starts out limited to only what that identity can do. ↩︎

  5. This is used by trusted publishing to prevent rebinding attacks since the tokens minted for trusted publishing aren’t constrained by a user. ↩︎

  6. Third party caveats provide a way for a user to add a restriction to a token saying that the user also has to provide an additional (and specific) token from a third party service. This could be used to, say, make a service that wouldn’t give the user that additional token unless it could find the files uploaded to a code forge’s releases with signatures or build provenance. ↩︎

12 Likes

@dstufft Thank you for the very long and comprehensive reply, I really appreciate the explanation!

I admit that I have not considered the amount of trust that PyPI places in the CI engine. In retrospect, it makes sense given the data available to PyPI; this is just not something I thought about as a consumer. (Can any CI engine really publish any project at all, not just the projects with Trusted Publishing configured for that particular CI provider? I would have hoped the latter; it reduces the blast radius at least somewhat, even if it doesn’t fundamentally change much.)

I broadly agree with your evaluation of Codeberg’s CI service. There are some things that surprise me (as an e.V. member and contributor but not someone with a formal position), like the lack of security@, which I will likely raise internally, but overall and with some knowledge of how it is operated, I would say this is more or less accurate. I think evolving the CI service in a direction where it would be stable and secure enough to consider Trusted Publisher support would be excellent in general, although I don’t think there is currently enough staffing to do that.

Thank you for bringing up the fact that the API tokens are Macaroons! I have a passing familiarity with Macaroons (I’ve never used them but I’ve skimmed the paper before); it should not be too difficult to write a bit of code to add the caveats I need. This will fulfill essentially all of my personal requirements for Codeberg/PyPI integration. While losing attestations is a bit unfortunate, I have to admit that I’ve never been able to figure out how to verify the signatures, and I feel that the technology is not mature enough for widespread use anyway (as in: I don’t expect any of my downstream users to bother verifying attestations).

I will post the code I’ll use to add caveats here when I come up with it.

4 Likes

Correct. I would clarify Donald’s statement to say “These providers effectively have the ability to release as any project configured on PyPI to trust that specific provider”. What he’s saying is we (PyPI) trust an identity provider to sufficiently restrict what OIDC tokens it generates to the correct token consumers (similar to how you trust PyPI to not give API tokens that work for your account to other users).

An untrustworthy or vulnerable identity provider would be able to ‘forge’ its own OIDC tokens, which would essentially give it the ability to publish any project that trusts that identity provider, but a compromised Codeberg identity provider would not give it the ability to publish to projects configured to trust GitHub’s identity provider, etc. (This is because OIDC tokens are signed with keypairs specific to each identity provider, and these signatures are validated by PyPI prior to trusting the OIDC token.)

4 Likes

Sorry, PyPI doesn’t allow a trusted publisher to publish for just anyone; it works as you’d expect: when you set up trusted publishing for a project, it binds that project to only allow trusted publishing from a given CI provider (or more specifically, from a given OIDC provider, though that will generally map 1:1 to a CI provider).

So on a technical level the trust isn’t much different between standard API tokens and trusted publishing. If a project is publishing from a given CI provider, that CI provider effectively can publish as all projects that choose to publish via that provider.

It’s more of a social (or maybe political is the right word) question. Integration with PyPI implies a certain amount of “this platform is trustworthy in the eyes of PyPI”, but more importantly, if a CI provider ends up having issues and isn’t trustworthy, it puts PyPI in an awkward spot where we have to choose between distrusting that CI provider (and breaking everyone using it) or accepting/allowing the integration to continue even knowing it’s not safe. Of course the same problem exists with API tokens, but at that point it’s “the author has chosen to trust or distrust this provider” and not “PyPI has chosen to trust or distrust this provider”.

I’d have to think more about whether tying the provenance to trusted publishing means it’s also a technical problem. I think the answer is sort of yes but also sort of no? All the attestations really need, on a technical level, is a trusted key to sign them. The hard part is deciding which key to trust, and where Trusted Publishing providers overlap with Sigstore providers, we can just trust the same identity and it all just works. If we add a CI provider that isn’t also trusted by Sigstore, then we can’t turn that into a key pair we trust, so you lose attestations anyway.

Since I last looked, it appears that Sigstore supports SPIFFE now, though it’s not clear whether that’s for the public instance or not, so that may end up being a non-factor?

If you find other caveats you’d like to add, feel free to open an issue or PR on Warehouse :slight_smile: Most of them are pretty easy to add, depending on what you want to assert against.

EDIT: Ahh @dustin just beat me :wink: that’s what I get for typing too much!

1 Like

Yeah, the security properties of Trusted Publishers are clear now, although they really weren’t obvious while I was just using them as a consumer!

I am actually quite happy to hear that PyPI API tokens provide more-or-less the same degree of security as Trusted Publishers. There are important concerns around vendor lock-in (for example, npm’s API token policies mean you are essentially forced to use GitHub unless you’re willing to embed your account password and OTP key in the build machines, which is what I will have to do to exit GitHub) and around being able to say “I will be responsible for the confidentiality of the publishing token, and if something happens to the package that’s on me, not PyPI or [CI provider]”; using the token directly means these concerns are alleviated. It’s a tradeoff I want to be able to make even if PyPI had first-class support for more CI providers.

1 Like

There are two interesting points of difference between Trusted Publishing and Sigstore, both of which can use machine identity via OIDC to do something “trusted” bound to that identity, and thus are at risk if that OIDC provider is compromised.

A Sigstore certificate is bound to a given machine identity (essentially the claims in the OIDC token), and verifying an arbitrary signature doesn’t tell you anything useful on its own. A Sigstore signature is only useful if you have something that is configured to say “I trust this machine identity”. So in theory there’s little risk in Sigstore supporting arbitrary OIDC providers, because it’s on the consumers to configure their systems to trust the correct machine identity.

For Trusted Publishing, PyPI is the thing that trusts the correct machine identity, and that machine identity is largely opaque to end users. They can kind of see it through attestations, but that’s not actually useful for something like pip install $project, because while pip could, in theory, mechanically verify those machine identities somehow (either through attestations or another process), it wouldn’t know what machine identities are trusted to publish for $project without either end users configuring them or relying on PyPI to tell it (and if they trust PyPI to give them the correct identity, why not trust PyPI to give them the correct artifacts to begin with).

Whenever sigstore creates a certificate (that can be used for signing), it puts it in a CT log, and clients are supposed to verify inclusion in that log before trusting them. So sigstore has some additional protection from a malicious or compromised OIDC provider in that they can’t covertly generate sigstore certificates that are valid, they have to publish them to the public CT log, which at least makes the hypothetical attack public (and people could run monitors on that CT log to make sure that nobody is minting sigstore certificates for them).

I could maybe see a world where we allow arbitrary OIDC providers to be configured in PyPI (probably using something like SPIFFE, though tbh I just came across SPIFFE so I’d have to look more into it, but the fact sigstore supports it is a positive), but with the stipulation that those providers have to generate upload attestations signed with the same identity via sigstore, and maybe that installers verify them? [1]

From a technical level, that’s strictly better than plain API tokens, since it becomes impossible to upload without an immutable public record of the certificate used, and that might relax the social constraints enough that PyPI feels comfortable allowing it (I don’t personally have a strong opinion, and I have no idea how the rest of the admins feel since I haven’t talked to them about it).

There’s also the concept of Binary Transparency that could possibly play a role here, but there’s no defined answer; this is all a new frontier, so any path we pick to go down would involve figuring out what we want to protect against and what the technical and social constraints are.


  1. This wouldn’t allow installers to know what identity should be trusted, but it means that you wouldn’t be able to upload, via trusted publishing, a package covertly and have installers just arbitrarily use it. ↩︎

2 Likes

I can pretty confidently say that PyPI will never require someone to use a specific CI provider or other commercial entity, at least as long as I have any say in the matter, and I’m pretty sure (but I don’t want to put words in their mouths) that all of the other PyPI admins feel the same.

When we design things like Trusted Publishing or attestations, we try pretty hard to either design them so anyone can implement them, or to make sure that they are, as much as possible, optional add-ons that don’t have a large negative impact on people who don’t want to use them. That’s part of why Trusted Publishing is optional [1], and why Trusted Publishing and attestations don’t give you any sort of “hey, this project has extra security” badge or green checkmark, because those things create more community pressure to use the platforms that getting them requires, and we don’t want to force people into that [2]. You can access that information, but it’s buried pretty deeply in the UX and presented pretty neutrally (example).

Something like Sigstore I could maybe see us requiring (but honestly, even then, I’m personally a little hesitant to do that), since it is an OSS project backed by the OpenSSF under the Linux Foundation.

Partially that reluctance to require a specific platform (or set of platforms) also comes from the fact that PyPI’s been around for a long time (over 20 years now!). If we had mandated a specific provider 15 years ago, it likely would have been SourceForge. That would have obviously been a mistake that we would have had to spend effort (as a community) to migrate away from. Who’s to say which providers are going to be dominant, or even still exist, in 20 more years? I personally have no idea!

The Python packaging community kind of designs for “decades” not for “right now”. For PyPI this tends to mean that we don’t expose or rely on things that we can’t replace without some sort of large, ecosystem wide migration, but allowing individual maintainers to choose to use them is fine, as long as it doesn’t impose that decision on end users. [3]

I’ve long wanted to add caveats that let you attenuate a token to a specific version of a project, a specific filename, or even a specific hash. In theory a tool like twine could automatically attenuate the token down to the specific hash, so that the only useful thing you could do with the token that goes out on the wire is upload the very specific artifact you were already trying to upload. The fly.io blog post talks about making a token that’s so safe you could email it to someone, and that would actually achieve it, I think!

I’ve also wanted to get support for third party caveats, because I think they add another layer of power on top of Macaroons. For instance, in the “External Trusted Publishing” system you’re making, the API token that your service holds obviously has a lot of power, so if you wanted to stand that up for other people, they’d all have to trust you.

Instead, with third party caveats, that system could be designed so that it doesn’t need to have a PyPI API token at all. Individual users could add a third party caveat to their own tokens that says “this token is only valid if you get another token from XYZ system”. Then that system could accept the OIDC machine identity from a CI Provider, verify the claims, and mint that second token, and when you upload the artifact to PyPI you have to include both and both would be verified.

Doing that… I think you could, in theory, just treat the token with that third party caveat as a non-secret, and it could even be committed to the repository and be fully public, because it’s only valid with the second token that comes from the XYZ system that verifies the OIDC identity, while the XYZ system itself never has the ability to upload as you!

Of course if you do keep it secret, then that makes an attacker’s job even harder, because they would need to compromise both the secret store for the PyPI API token and the XYZ service (or the OIDC Provider) to get a working token.

Sorry, I’m a nerd and I love the design behind Macaroons, so I can nerd out about them all day!


  1. Plus it came later, so mandating it after the fact is hard :wink: ↩︎

  2. Plus the UX around green check marks is horrid. ↩︎

  3. This is partially why we don’t support federated auth! We used to a long time ago, but that got ripped out because some of the OpenID providers we supported went away, and we decided that relying on a third party as part of our public “interface” like that wasn’t a sustainable practice long term. ↩︎

5 Likes

Personally, as an employee (of a company which doesn’t really interact with open source, except as a user) and as a developer of open source projects. I don’t represent any PyPI projects.


I view Sigstore attestations with a reference to the specific VCS commit as fairly trustworthy. It’s usually easy to verify the exact tree used to build a package (there are some exceptions, such as separate builder projects).

Has anyone thought of Codeberg (I’m actually one of its sponsoring members) running their own Python package index? There could be some funding/hosting in Europe/the EU available for this, too.

Each Codeberg user/organization already has a built-in Python package index. Unfortunately, the ecosystem support for this is still quite poor (--extra-index-url does not cut it).

Are you saying --extra-index-url doesn’t work with Codeberg, or are you saying that --extra-index-url isn’t a great UX?

Can’t speak for Catherine, but I’d say the latter. Additional index URLs are treated as equal-priority mirrors, not alternatives, so it opens up a range of exploits if any of the indexes is not under your control/trustworthy (“dependency confusion” being the main risk, but there are others).

For Azure Artifacts, we made it so that your private package index is able to “upstream” to others - essentially, pull in new packages (and/or versions, optionally) from those if your own feed doesn’t have it yet. These are checked in priority order, not all mixed together, so you can reference a single index URL and get multiple, but safely.

If Codeberg doesn’t have a feature like this, then doing it in the client (a.k.a. --extra-index-url for pip) is your only choice.
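
For reference, the client-side fallback is just a list of extra indexes in pip’s configuration; the URLs below are placeholders (the Codeberg one follows the general Forgejo package-registry URL shape and may differ), and note again that every index listed is queried as an equal-priority peer:

```ini
# pip.conf (pip.ini on Windows); extra-index-url takes a
# whitespace-separated list, and every index listed is treated
# as an equal-priority mirror of the others.
[global]
index-url = https://pypi.org/simple/
extra-index-url =
    https://codeberg.org/api/packages/OWNER/pypi/simple/
    https://example.internal/simple/
```

The same thing works on the command line by repeating `--extra-index-url` once per additional index.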

3 Likes

Could you perhaps point out the corresponding Codeberg docs?

Yes, this is a big problem. The other big problem is that (as far as I know) you can only provide one, so if you need to pull in packages from two additional indexes you’re out of luck.

Considering how much storage bloat this can result in if you e.g. need a few NumPy binary wheels, I doubt Codeberg (which does not use cloud storage and relies entirely on colo servers with on-prem storage) will be willing to do this.

1 Like

That’s not the case, --extra-index-url can be specified multiple times.

You can have a “layered” index, where you only host the files you need locally, and you simply re-present the upstream index on demand. There are existing index servers (devpi and simpleindex) that do this.

Having said this, I agree that --extra-index-url isn’t an ideal UI. If you can choose your client, uv has better options, but if you can’t, then without some sort of standard (which doesn’t exist yet), --extra-index-url is the lowest common denominator you have to deal with for now.

1 Like