Due to a variety of events in recent years that I think this venue is not an appropriate place to recount, GitHub has lost a lot of the trust placed in it by the community. (GitLab shares many of the same issues.) I, like many other people, have been slowly migrating my infrastructure to Codeberg, a non-profit whose values and goals align much more closely with those of the open source community.
One obstacle to doing so is that Codeberg is not a supported Trusted Publisher on PyPI. If I were to migrate my CI infrastructure to Codeberg, I would have to switch to using API tokens; moreover, since I maintain dozens of individual packages, I would have to use a user-scoped API token, since organization-scoped tokens don't seem to exist (or at least I couldn't find a way to create one in the UI), and individually creating an API token for each package, including for workflows that publish a ~dozen different packages, is simply not practical.
Also, Forgejo Actions is a significantly less mature CI engine, which makes it even more important to restrict the scope of packaging-related actions authorized on CI.
I would really like to use the Trusted Publisher workflow with PyPI and Codeberg. Is this something that the PyPI administrators would consider? If yes, what would be the technical steps to achieving it? If no, would it be possible to improve non-Trusted-Publisher flows, for example by adding organization-scoped API tokens?
I'm similarly a root sysadmin for the OpenDev Collaboratory, and the open source CI/CD system we developed there (Zuul) is capable of OIDC as well. Our communities are responsible for around a thousand packages on PyPI, which is small compared to GitHub, but being able to assert provenance through Trusted Publishing would still be nice.
For now our release automation instead publishes detached OpenPGP signatures of all packages to a dedicated website, ever since PyPI dropped support for publishing self-supplied signatures there. It's good enough for our downstream consumers, but it's unfortunate that open source code hosting platforms are essentially second-class citizens next to proprietary commercial products like GitHub when it comes to these sorts of security features.
The most important aspect for me is that I can trust (via PyPI) that a package file was built from the repo. GitHub, GitLab and GCS have demonstrated over time that their CD does this, and I've never heard of Codeberg or Zuul.
Codeberg e.V. is a European non-profit with over 1300 sponsoring members, maintaining a code forge with over 400k projects and over 250k users. As far as I know it is the largest code forge run by a non-profit organization, and some high-profile projects use it, including the Zig programming language. (As a matter of fact, Zig currently publishes PyPI packages via Trusted Publishers, although this isn't my main motivation here.)
The most important aspect for me is […]
Are you saying this in your personal capacity or do you hold a formal position in the PyPI project?
GitHub, GitLab and GCS have demonstrated over time that their CD does this […]
Does PyPI currently accept Trusted Publisher authorizations from workflows executing on self-hosted GitHub Actions runners?
Not speaking authoritatively here, just my personal opinion, though I am a PyPI administrator so I've got some sense of what the PyPI admins might think. To be clear though, this is a quick look and not any sort of official Yes or No.
I believe the key requirement, beyond the technical ones, is the reliability and security of a given OIDC provider. These providers effectively have the ability to release as any project, and thus their ability to secure and reliably operate their software and infrastructure is of utmost importance.
I've never looked at Codeberg before, though I have vaguely heard of Forgejo (which appears to be forked from Gitea, which I have heard of).
I have a number of concerns, though, from a quick search.
Codeberg looks pretty new (2019, it appears?). There's nothing inherently wrong with that, but 7 years isn't a large amount of time to be able to look back on how well they've handled issues in the past.
I can't figure out how to actually contact Codeberg itself about any security issues that I might have. The Codeberg docs suggest reporting security issues to Forgejo directly rather than to them, but that seems to presume the issue is related to Forgejo itself and isn't an issue with the Codeberg instance specifically. Codeberg also seems to run a number of pieces of software that are not Forgejo, so it's unclear how to handle security issues with the non-Forgejo components. [1]
It appears that Codeberg has two different CI options they're running. One is based on Forgejo Actions and the other is based on Woodpecker.
The Woodpecker option seems to be the primary one, as the Forgejo Actions one is billed as "very alpha". As far as I can tell, Woodpecker does not provide a machine identity to each CI execution like GitHub/GitLab do, and it only supports static secrets stored in a job. That eliminates Woodpecker as an option for Trusted Publishing on a technical basis, before we can even look at the non-technical one.
Forgejo Actions does seem to support the machine identity that we would need, but support for that landed about a month ago and has yet to be released, and Codeberg appears to track the stable release. So at the moment Forgejo Actions on Codeberg also doesn't provide what we need on a technical level, but it will in the future.
Looking at the Codeberg documentation on Forgejo Actions, it says "Due to outstanding security issues and bus factor (we need more maintainers for our CI service), we are currently providing hosted Actions in limited fashion", which does not inspire a lot of confidence in it.
So for me, my gut feeling is that Codeberg is too new and hasn't yet established itself well enough to integrate, particularly given that the technical features we'd need to even do that integration don't exist on Codeberg yet and have only existed in the software itself for about a month. The fact that I cannot find a way to contact Codeberg with security issues other than the general help@codeberg.org email address (which is marked as "responses might take a long time") also does not inspire a lot of confidence in the security maturity of the org.
I think the mission here of Codeberg is great, and I assume things will mature over time, if for no other reason than some of these things are so new they don't even exist yet.
This is an interesting statement though, because it suggests you don't think that Forgejo Actions is mature enough to trust it with your own API tokens; yet that involves a smaller amount of trust from the PyPI side than integrating it as a Trusted Publisher would.
It's not well documented, but something that's non-obvious is that PyPI's API tokens are actually quite powerful primitives to build this kind of system on top of, externally to PyPI itself.
PyPI's API tokens are Macaroons, and one of the big benefits of Macaroons is that if you possess one, you can further restrict it by adding additional "caveats" [2], which creates what is effectively a sub token of the original one. This idea of generating sub tokens can recurse effectively infinitely [3]: if you create a sub token and give that out, then someone who has that sub token can create a sub sub token with further restrictions.
Creating these restricted sub tokens does not require talking to PyPI at all or doing any sort of network request; it's entirely done locally, in memory. It's honestly quite cool!
The only thing that the PyPI UI for creating tokens is able to do that you, as a person who possesses a token, aren't able to do, is create a completely brand new token from scratch [4]. Limiting a token to a project in the PyPI UI is just taking a freshly minted API token that is bound to your user and adding some caveats to it for you before handing it over. In the case of a user-scoped token, the caveat that we add is basically a no-op (it just asserts that the user linked to the macaroon in the PyPI DB hasn't changed), so it's effectively an "empty" token with the full upload permissions of your user account, but again, you can add your own caveats locally to restrict it further!
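To make the local attenuation idea concrete, here is a toy sketch of the HMAC chaining that Macaroons are built on. This is not PyPI's actual token format (real tokens use the Macaroon wire format and Warehouse-specific caveat encodings); it only demonstrates the append-only, no-network property described above:

```python
import hmac
import hashlib

def sig(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, identifier: bytes):
    """Server side: a token is (identifier, caveats, signature)."""
    return (identifier, [], sig(root_key, identifier))

def attenuate(token, caveat: bytes):
    """Holder side: append a caveat locally; no network call needed.
    The new signature chains off the old one, so caveats can only
    ever be added, never removed."""
    identifier, caveats, signature = token
    return (identifier, caveats + [caveat], sig(signature, caveat))

def verify(token, root_key: bytes, context: set) -> bool:
    """Server side: recompute the HMAC chain and check every caveat.
    `context` stands in for real caveat evaluation (project names,
    expiration times, etc.)."""
    identifier, caveats, signature = token
    expected = sig(root_key, identifier)
    for caveat in caveats:
        if caveat not in context:
            return False
        expected = sig(expected, caveat)
    return hmac.compare_digest(expected, signature)

root = b"secret-known-only-to-the-server"
token = mint(root, b"user:alice")
sub = attenuate(token, b"project: mypackage")  # done entirely locally

print(verify(sub, root, {b"project: mypackage"}))  # True
print(verify(sub, root, {b"project: other"}))      # False
# Stripping the caveat without knowing the root key fails verification:
print(verify((sub[0], [], sub[2]), root, set()))   # False
```

The key point is that `attenuate` needs only the token itself, while `verify` needs the root secret, which is exactly why restriction can happen client-side but minting cannot.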
We don't support a ton of caveats currently; there are basically only 2 (sorta 3?) useful ones:
Expiration: Not Before/Not After, a pretty basic caveat for time-boxing a sub token's lifetime.
Project Names: This caveat takes a list of project names, and the sub token is limited to only the projects named in that caveat.
Project IDs: Basically the same thing as Project Names, but instead of names it uses the UUID of the project from PyPI's database. The main benefit is that this prevents rebinding attacks where a project gets deleted and recreated by someone else. It's not super relevant for a user token, since a user token is also bound to only the projects your user has, and I'm not sure that it's possible for a user to get the UUID for a project other than by inspecting an existing token. [5]
Adding more caveats to PyPI is pretty easy: you just implement a dataclass with a verify() method, and once that lands, PyPI supports that caveat type as well.
If you want to inspect a token or restrict it with additional caveats, there's an unofficial library called pypitoken that makes that pretty easy to do.
A few random important tidbits about Macaroons:
The design of Macaroons prevents someone from removing existing caveats from a token; they can only append.
All caveats (even the same one repeated multiple times) are independently checked, and they all must evaluate to True.
Caveats can only restrict the powers of a token, never increase them, so arbitrarily appending caveats is always safe and can only reduce the scope of a given token.
Macaroons further divide caveats into "first party" and "third party"; currently PyPI only supports first-party caveats (meaning caveats that PyPI has to natively add support for). It'd be nice to support third-party caveats too, but it hasn't been a priority [6].
So you can actually get pretty close to Trusted Publishing from an arbitrary platform without any support from PyPI itself:
1. Create an account-scoped token (either for your own user, or for a robot account that has permission to upload to all of your projects).
2. Create a web service that holds that account-scoped token, accepts OIDC authentication from Codeberg (or any other platform), verifies the claims in the JWT, and then adds a caveat to the account-scoped token to create a sub token limited to only the project (or projects) the JWT claims authorize, and returns it.
3. Have the CI platform pass the token from (2) as the API token for the upload.
And that's basically it! This is pretty much exactly how Trusted Publishing works on PyPI, except that PyPI gets to skip (1), because PyPI has the ability to mint arbitrary tokens for any project, plus some minor differences, like the fact that uploads will appear to come from your user, whereas PyPI has special support for treating Trusted Publishing as a user-less action.
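To make step (2) a bit more concrete, here is a minimal sketch of the claim-checking core of such a service. Everything here is an assumption for illustration: the issuer URL, the audience, the `repository` claim name, and the `REPO_TO_PROJECTS` mapping. Crucially, a real service must first verify the JWT's signature against the identity provider's published JWKS; that step is deliberately elided here:

```python
import base64
import json

# Hypothetical mapping of enrolled repositories to PyPI project names.
REPO_TO_PROJECTS = {
    "myorg/mypackage": ["mypackage"],
}

def decode_claims(jwt_token: str) -> dict:
    """Decode the payload segment of a JWT.  NOTE: this does NOT
    verify the signature; a real service must validate the token
    against the identity provider's JWKS first."""
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def projects_for(claims: dict) -> list:
    """Check the (already signature-verified) claims and decide which
    projects this CI run may receive an attenuated token for."""
    if claims.get("iss") != "https://codeberg.org":  # assumed issuer
        raise PermissionError("unexpected issuer")
    if claims.get("aud") != "pypi-token-exchange":   # our audience
        raise PermissionError("unexpected audience")
    repo = claims.get("repository")                  # assumed claim name
    if repo not in REPO_TO_PROJECTS:
        raise PermissionError("repository not enrolled")
    return REPO_TO_PROJECTS[repo]

# Demo with a hand-built (unsigned) token:
claims = {"iss": "https://codeberg.org", "aud": "pypi-token-exchange",
          "repository": "myorg/mypackage"}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
fake_jwt = "header." + payload.rstrip("=") + ".signature"
print(projects_for(decode_claims(fake_jwt)))  # ['mypackage']
```

The service would then add a Project Names caveat for the returned projects to its account-scoped token and hand the resulting sub token back to the CI job.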
The only thing you really lose out on is that upload provenance attestations require Trusted Publishing and are not supported for arbitrary uploads. You can of course have your web service publish its own upload attestations, but they won't appear on PyPI, so they're not as useful. I'd love to find a way to reasonably support arbitrary sources for upload attestations, but that also hasn't been a priority for me.
Anyways, Macaroons are great, and I love talking about them, but hopefully even if Codeberg isn't yet ready to be trusted for Trusted Publishing, you can see how you can get pretty close using the primitives that PyPI already has!
Ironically, the security.txt on Codeberg says only to contact Forgejo for Forgejo issues, and to contact the administrators of any individual instance for issues specific to that instance, but I can't figure out how to do that!
Just a fancy Macaroon word for "restriction".
Although there are practical limits, such as the size of the API token, the number of unique caveat types that are supported, etc.
Macaroons start off with a secret known only to PyPI. Some implementations of Macaroons use a single secret to create an "omnipotent" token and then use caveats to restrict that to individual users, etc. That's a tiny bit simpler and more "pure", but it has problems with revocation: since sub tokens can be minted at will, PyPI has no idea what sub tokens exist, so revocation is difficult and typically requires setting up some sort of revocation service. When we implemented this for PyPI, we instead chose to create a fresh secret for each new API token we created and strongly link that secret to an individual identity (either a user or an OIDC identity); instead of an omnipotent token, it starts out limited to only what that identity can do.
This is used by Trusted Publishing to prevent rebinding attacks, since the tokens minted for Trusted Publishing aren't constrained by a user.
Third-party caveats provide a way for a user to add a restriction to a token saying that the user also has to provide an additional (and specific) token from a third-party service. This could be used to, say, make a service that wouldn't give the user that additional token unless it could find the files uploaded to a code forge's releases with signatures or build provenance.
@dstufft Thank you for the very long and comprehensive reply, I really appreciate the explanation!
I admit that I have not considered the amount of trust that PyPI places in the CI engine. In retrospect, it makes sense given the data available to PyPI, this is just not something I thought about as a consumer. (Can any CI engine really publish any project at all, not just any project with Trusted Publishing configured for that particular CI provider? I would have hoped that the latter reduces the blast radius at least somewhat, not that it fundamentally changes much.)
I broadly agree with your evaluation of Codeberg's CI service. There are some things that surprise me (as an e.V. member and contributor, but not someone with a formal position), like the lack of a security@ address, which I will likely raise internally, but overall, and with some knowledge of how it is operated, I would say this is more or less accurate. I think evolving the CI service in a direction where it would be stable and secure enough to consider Trusted Publisher support would be excellent in general, although I don't think there is currently enough staffing to do that.
Thank you for bringing up the fact that the API tokens are Macaroons! I have a passing familiarity with Macaroons (I've never used them, but I've skimmed the paper before); it should not be too difficult to write a bit of code to add the caveats I need. This will fulfill essentially all of my personal requirements for Codeberg/PyPI integration. While losing attestations is a bit unfortunate, I have to admit that I've never been able to figure out how to verify the signatures, and I feel that the technology is not mature enough for widespread use anyway (as in: I don't expect any of my downstream users to bother verifying attestations).
I will post the code I'll use to add caveats here once I come up with it.
Correct. I would clarify Donald's statement to say "These providers effectively have the ability to release as any project configured on PyPI to trust that specific provider". What he's saying is that we (PyPI) trust an identity provider to sufficiently restrict what OIDC tokens it generates to the correct token consumers (similar to how you trust PyPI not to give API tokens that work for your account to other users).
An untrustworthy or vulnerable identity provider would be able to "forge" its own OIDC tokens, which would essentially give it the ability to publish any project that trusts that identity provider, but a compromised Codeberg identity provider would not gain the ability to publish to projects configured to trust GitHub's identity provider, etc. (This is because OIDC tokens are signed with keypairs specific to each identity provider, and these signatures are validated by PyPI before the OIDC token is trusted.)
Sorry, PyPI doesn't allow a trusted publisher to publish for just anyone; they work as you'd expect, in that when you set up Trusted Publishing for a project, it binds that project to only allow Trusted Publishing from a given CI provider (or more specifically, from a given OIDC provider, though that will generally map 1:1 with the CI provider).
So on a technical level, the trust isn't much different between standard API tokens and Trusted Publishing. If a project is publishing from a given CI provider, that CI provider can effectively publish as all projects that choose to publish via that provider.
It's more of a social (or maybe political is the right word) question. Integration with PyPI implies a certain amount of "this platform is trustworthy in the eyes of PyPI", but more importantly, if a CI provider ends up having issues and isn't trustworthy, it puts PyPI in an awkward spot where we have to choose between distrusting that CI provider (and breaking everyone using it) or accepting/allowing the integration to continue even knowing it's not safe. Of course the same problem exists with API tokens, but at that point it's "the author has chosen to trust or distrust this provider" and not "PyPI has chosen to trust or distrust this provider".
I'd have to think about it more to decide whether tying the provenance to Trusted Publishing also makes it a technical problem. I think the answer is sort of yes but also sort of no? All the attestations really need, on a technical level, is a trusted key to sign them. The hard part is deciding what key to trust, and where Trusted Publishing providers overlap with Sigstore providers, we can just trust the same identity and it all just works. If we add a CI provider that isn't also trusted by Sigstore, then we can't turn that into a key pair we trust, so you lose attestations anyway.
Since I last looked, it appears that Sigstore supports SPIFFE now, though it's not clear whether that's for the public instance or not, so that may end up being a non-factor?
If you find other caveats you'd like to add, feel free to open an issue or PR on Warehouse. Most of them are pretty easy to add, depending on what you want to assert against.
EDIT: Ahh, @dustin just beat me; that's what I get for typing too much!
Yeah, the security properties of the Trusted Publishers are clear now, although it really wasn't obvious while I was still using them as a consumer!
I am actually quite happy to hear that PyPI API tokens provide more-or-less the same degree of security as Trusted Publishers. There are important concerns around vendor lock-in (for example, npm's API token policies mean you are essentially forced to use GitHub unless you're willing to embed your account password and OTP key in the build machines, which is what I will have to do to exit GitHub), and around being able to say "I will be responsible for the confidentiality of the publishing token, and if something happens to the package, that's on me, not PyPI or [CI provider]"; using the token directly means these concerns are alleviated. It's a tradeoff I want to be able to make even if PyPI had first-class support for more CI providers.
There are two interesting points of difference between Trusted Publishing and Sigstore, both of which can use machine identity via OIDC to do something "trusted" bound to that identity, and thus are at risk if that OIDC provider is compromised.
A Sigstore certificate is bound to a given machine identity (essentially the claims in the OIDC token), and verifying an arbitrary signature doesn't tell you anything useful. A Sigstore signature is only useful if you have something that is configured to say "I trust this machine identity". So in theory there's little risk in Sigstore supporting arbitrary OIDC providers, because it's on the consumers to configure their systems to trust the correct machine identity.
For Trusted Publishing, PyPI is the thing that trusts the correct machine identity, and that machine identity is largely opaque to end users. They can kind of see it through attestations, but that's not actually useful for something like pip install $project, because while pip could, in theory, mechanically verify those machine identities somehow (either through attestations or another process), it wouldn't know what machine identities are trusted to publish for $project without either end users configuring them or relying on PyPI to tell it (and if they trust PyPI to give them the correct identity, why not trust PyPI to give them the correct artifacts to begin with?).
Whenever Sigstore creates a certificate (that can be used for signing), it puts it in a CT log, and clients are supposed to verify inclusion in that log before trusting it. So Sigstore has some additional protection from a malicious or compromised OIDC provider, in that it can't covertly generate valid Sigstore certificates; it has to publish them to the public CT log, which at least makes the hypothetical attack public (and people could run monitors on that CT log to make sure that nobody is minting Sigstore certificates for them).
I could maybe see a world where we allow arbitrary OIDC providers to be configured in PyPI (probably using something like SPIFFE, though tbh I just came across SPIFFE so I'd have to look more into it, but the fact that Sigstore supports it is a positive), but with the stipulation that those providers have to generate upload attestations signed with the same identity via Sigstore, and maybe that installers verify them? [1]
On a technical level, that's strictly better than plain API tokens, since it becomes impossible to upload without an immutable public record of the certificate used, and that might relax the social constraints enough that PyPI feels comfortable allowing it (I don't personally have a strong opinion, and I have no idea how the rest of the admins would feel, since I haven't talked to them about it).
There's also the concept of Binary Transparency that could possibly play a role here, but there's no defined answer; this is all a new frontier, so any path we pick to go down would involve figuring out what we want to protect against and what the technical and social constraints are.
This wouldn't allow installers to know what identity should be trusted, but it means that you wouldn't be able to covertly upload a package via Trusted Publishing and have installers just arbitrarily use it.
I can pretty confidently say that PyPI will never require someone to use a specific CI provider or other commercial entity, at least as long as I have any say in the matter, and I'm pretty sure (though I don't want to put words in their mouths) that all of the other PyPI admins feel the same.
When we design things like Trusted Publishing or attestations, we try pretty hard to either design them so anyone can implement them, or to make sure that they are, as much as possible, optional add-ons that don't have a large negative impact on people who don't want to use them. That's part of why Trusted Publishing is optional [1] and why Trusted Publishing and attestations don't give you any sort of "hey, this project has extra security" badge or green checkmark or something: those things create more community pressure to use the platforms that getting them requires, and we don't want to force people into that [2]. You can access that information, but it's buried pretty deeply in the UX and it's presented pretty neutrally (example).
Something like Sigstore I could maybe see us requiring (though honestly even then, I'm personally a little hesitant), since it's an OSS project backed by the OpenSSF under the Linux Foundation.
Partially, that reluctance to require a specific platform (or set of platforms) also comes from the fact that PyPI's been around for a long time (over 20 years now!). If we had mandated a specific provider 15 years ago, it likely would have been SourceForge. If we had done that, it would have obviously been a mistake that we would have had to spend effort (as a community) to migrate away from. Who's to say what providers are going to be dominant, or even still exist, in 20 more years? I personally have no idea!
The Python packaging community kind of designs for "decades", not for "right now". For PyPI this tends to mean that we don't expose or rely on things that we can't replace without some sort of large, ecosystem-wide migration, but allowing individual maintainers to choose to use them is fine, as long as it doesn't impose that decision on end users. [3]
I've long wanted to do something like add caveats that allow you to attenuate a token to target a specific version of a project, or a specific filename, or even a specific hash. In theory, a tool like twine could automatically attenuate the token down to the specific hash, so that the only useful thing you could do with the token that goes out on the wire is upload the very specific artifact that you were already trying to upload. The fly.io blog post talks about making a token that's so safe you could email it to someone, and that would actually achieve that, I think!
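As a toy continuation of that idea (the caveat syntax here is invented; as the post says, PyPI has no such caveat today), locally binding a token to the exact artifact being uploaded could look like:

```python
import hashlib
import hmac

def sig(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def bind_to_artifact(token_signature: bytes, artifact: bytes):
    """Attenuate a token so it is only good for uploading one exact
    file, Macaroon-style: append a hash caveat and chain the new
    signature off the previous one."""
    caveat = b"sha256=" + hashlib.sha256(artifact).hexdigest().encode()
    return caveat, sig(token_signature, caveat)

wheel = b"contents of mypackage-1.0-py3-none-any.whl"
caveat, new_signature = bind_to_artifact(b"previous-token-signature", wheel)
print(caveat[:7])  # b'sha256='
```

A client could do this right before the upload request, so the token that actually travels over the wire is useless for anything but that one file.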
I've also wanted to get support for third-party caveats, because I think they add another layer of power on top of Macaroons. For instance, in the "External Trusted Publishing" system you're making, the API token that your service holds obviously has a lot of power, so if you wanted to stand that up for other people, they'd all have to trust you.
Instead, with third-party caveats, that system could be designed so that it doesn't need to hold a PyPI API token at all. Individual users could add a third-party caveat to their own tokens that says "this token is only valid if you get another token from XYZ system". That system could then accept the OIDC machine identity from a CI provider, verify the claims, and mint that second token; when you upload the artifact to PyPI, you have to include both, and both would be verified.
Doing that… I think you could, in theory, just treat the token with that third-party caveat as a non-secret. It could even be committed to the repository and be fully public, because it's only valid together with the second token that comes from the XYZ system that verifies the OIDC identity, but the XYZ system itself never has the ability to upload as you!
Of course, if you do keep it secret, that makes an attacker's job even harder, because they would need to compromise both the secret store for the PyPI API token and the XYZ service (or the OIDC provider) to get a working token.
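A toy sketch of that two-token flow (heavily simplified: in real Macaroons the discharge key travels encrypted inside the caveat rather than being pre-shared with the verifier, but the overall shape is the same):

```python
import hashlib
import hmac

def sig(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

# A key agreed between the user and the hypothetical "XYZ" service.
discharge_key = b"key-shared-with-xyz"
# The third-party caveat embedded in the user's (possibly public) token.
caveat_id = b"xyz-verified-oidc-identity-for:myorg/mypackage"

def xyz_mint_discharge(oidc_identity_ok: bool):
    """XYZ service: only after verifying the CI job's OIDC identity
    does it mint the discharge token that satisfies the caveat."""
    return sig(discharge_key, caveat_id) if oidc_identity_ok else None

def upload_allowed(discharge) -> bool:
    """Verifier side: the upload is accepted only when the primary
    token's third-party caveat has a matching discharge token."""
    if discharge is None:
        return False
    return hmac.compare_digest(discharge, sig(discharge_key, caveat_id))

print(upload_allowed(xyz_mint_discharge(True)))   # True
print(upload_allowed(xyz_mint_discharge(False)))  # False
```

The point is that XYZ never holds anything that can upload on its own, and the primary token is useless without XYZ's discharge, so both parties must cooperate (or both be compromised).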
Sorry, I'm a nerd and I love the design behind Macaroons, so I can nerd out about them all day!
Plus it came later, so mandating it after the fact is hard.
Plus the UX around green checkmarks is horrid.
This is partially why we don't support federated auth! We used to a long time ago, but that got ripped out because some of the OpenID providers we supported went away, and we decided that relying on a third party as part of our public "interface" like that wasn't a sustainable practice long term.
Personally: as an employee (of a company which doesn't really interact with open source, except for usage) and as a developer of open source projects. I don't represent any PyPI projects.
I view Sigstore attestations with a reference to the specific VCS commit as fairly trustworthy. It's usually easy to verify the exact tree used to build a package (there are some exceptions, such as separate builder projects).
Has anyone thought of Codeberg (I'm actually one of its sponsoring members) running its own Python package index? There could be some funding/hosting in Europe/the EU available for this, too.
Each Codeberg user/organization already has a built-in Python package index. Unfortunately, the ecosystem support for this is still quite poor (--extra-index-url does not cut it).
Can't speak for Catherine, but I'd be saying this one: additional index URLs are treated as equal-priority mirrors, not alternatives, so it opens up a range of exploits if any of the indexes are not under your control or trustworthy ("dependency confusion" being the main risk, but there are others).
For Azure Artifacts, we made it so that your private package index is able to "upstream" to others: essentially, it pulls in new packages (and/or versions, optionally) from those if your own feed doesn't have them yet. These are checked in priority order, not all mixed together, so you can reference a single index URL and get multiple, but safely.
If Codeberg doesn't have a feature like this, then doing it in the client (a.k.a. --extra-index-url for pip) is your only choice.
Yes, this is a big problem. The other big problem is that (as far as I know) you can only provide one, so if you need to pull in packages from two additional indexes youâre out of luck.
Considering how much storage bloat this can result in if you e.g. need a few NumPy binary wheels, I doubt Codeberg (which does not use cloud storage and relies entirely on colo servers with on-prem storage) will be willing to do this.
That's not the case; --extra-index-url can be specified multiple times.
You can have a "layered" index, where you only host the files you need locally and simply re-present the upstream index on demand. There are existing index servers (devpi and simpleindex) that do this.
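As a sketch of the layered-index setup (the route syntax here is from memory of simpleindex's README, and the paths and project names are placeholders, so check its documentation for the exact schema):

```toml
# simpleindex configuration: serve locally hosted files for one
# project, and fall through to PyPI for everything else, all behind
# a single index URL.
[routes."mypackage"]
source = "path"
to = "/srv/wheels/mypackage"

[routes."{project}"]
source = "http"
to = "https://pypi.org/simple/{project}/"

[server]
host = "127.0.0.1"
port = 8000
```

Clients then point at just this one server, so there is no mirror-priority ambiguity on the client side.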
Having said this, I agree that --extra-index-url isn't an ideal UI. If you can choose your client, uv has better options, but if you can't, then without some sort of standard (which doesn't exist yet), --extra-index-url is the lowest common denominator you need to deal with for now.
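For instance (a sketch; the registry URL follows the Forgejo/Gitea package-registry pattern and is an assumption, so double-check it against Codeberg's docs), uv lets you pin a package to a named index so it is never resolved from any other index:

```toml
# pyproject.toml
[[tool.uv.index]]
name = "codeberg"
url = "https://codeberg.org/api/packages/myorg/pypi/simple/"
explicit = true  # only used for packages explicitly pinned to it

[tool.uv.sources]
mypackage = { index = "codeberg" }
```

With `explicit = true`, the extra index can never shadow or be confused with PyPI for unrelated packages, which avoids the dependency-confusion problem described above.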