Pre-PEP: Exposing Trusted Publisher provenance on PyPI

sethmlarson · January 3, 2024, 4:30pm

Thanks so much @woodruffw for opening this discussion!

So others know how I’m thinking of this proposal in particular, I am conceptualizing this more as “Trusted Publisher receipts”. PyPI already verifies this same information with Trusted Publishers, and there are useful properties we can take advantage of today and build on by exposing that same information in a consumer-verifiable way. This shouldn’t exclude the creation of other mechanisms for verifying the integrity of a Python distribution; the end goal isn’t to have an integrity mechanism that works for all use cases or to push everyone to use Trusted Publishers.

I provided reviews already to get this proposal into its current state, so I’ll only copy things I think are important:

I don’t think we should copy the NPM publish provenance UI. I believe the framing there presents itself as providing integrity of the source commit, which is true, but mostly for the workflow and not necessarily the package source code (you need to review the workflow itself to make that determination). We don’t want to give the indication that publish provenance means you can review the git tags safely without also reviewing the publishing workflow.
Will and I have discussed the build provenance point raised by @davidism. I believe this proposal can be gracefully expanded to also include build provenance when that is defined for the Trusted Publisher platforms PyPI supports.

As an aside, to show that I am thinking of the non-OIDC build integrity case (but don’t have the spoons to work on it now): there are some primitives being worked on that could be used together to provide widespread build integrity without needing public source code or a platform with a PKI team (though this approach doesn’t have the properties described in this proposal, like linking a source repository to an artifact). The primitives I’m thinking about are build reproducibility (either byte-for-byte or semantically equivalent) combined with third-party observations about releases/distributions on PyPI. This approach would be a completely different route with more dependencies and involved parties than this proposal; this section is only to show that this use case is being thought about by someone.

dustin · January 3, 2024, 7:27pm

Just curious, why?

steve.dower · January 3, 2024, 8:25pm

We’re pretty flexible about letting teams set up their own repositories, and then reclaim control/oversight at the build and publish stage. So people can use Azure DevOps or GitHub public or GitHub enterprise however they want, and have as many or as few repos as their project needs, provided the release builds and artifacts flow through our central pipeline.

For publishing to public repositories, this scans, archives, and stores an auditable log of the sources/build/approvers/etc. that led to the package being published, and then pushes the result to PyPI using a user-wide access token on a single account. We already have internal authentication for pushing packages to the pipeline, so we’re not concerned about PyPI auth at that stage.

So OIDC isn’t relevant because we basically have our own internal equivalent that does a few additional steps we care about. There’s been some consideration of publishing package hashes as well, which we could automate without our publishers having to change anything, and we can certainly verify packages “from out in the wild” against our own archive.
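
A minimal sketch of the “verify packages from out in the wild” idea, assuming the archive can be represented as a mapping from distribution filename to SHA-256 hex digest. The function names and the archive mapping are hypothetical illustrations, not part of any pipeline described above or of PyPI’s API:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large wheels/sdists are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_archive(path: Path, archive: dict[str, str]) -> bool:
    """Check a downloaded distribution against archived digests.

    `archive` is a hypothetical mapping of filename -> expected hex digest.
    A file that is not in the archive is treated as unverified.
    """
    expected = archive.get(path.name)
    if expected is None:
        return False
    return sha256_of_file(path) == expected.lower()
```

With something like this, a file downloaded from any mirror could be checked with `matches_archive(Path("dist/example-1.0.tar.gz"), archive)`; the path and the archive contents here are placeholders.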

dustin · January 3, 2024, 9:24pm

Sorry, when you said “actively discouraged” I thought you meant that OIDC is prohibited in some way, but it sounds like if a team wanted to use the Azure or GitHub Actions OIDC identity to sign things or publish provenance, they wouldn’t actually be discouraged from doing this as long as they still publish through your centralized pipeline, right?

steve.dower · January 3, 2024, 9:32pm

It’s an either/or, not both, so yes they would be discouraged from publishing directly to PyPI.[1] We want them to “publish” to our internal service, which will then do the publish. From PyPI’s POV, all our packages will come from the same account, even though a variety of different teams may be producing them.

[1] We also discourage using their own accounts or tokens to publish to PyPI, though that was historically how we set it up. It’s not OIDC specifically that we are avoiding.

LtWorf · January 4, 2024, 10:06am

Personally I’m not very excited about this until there is a way to self host.

I say that seeing how it went with 2FA: first supported, then compulsory for some projects, then compulsory, and it all seemed rushed and hacked, since a token was needed and twine still has no option to pass a token… a token is a username for some reason.

If you want to accept stuff signed by sigstore, I think you should allow signatures. They weren’t used because they were useless. Try doing like github and allowing people to upload their public key, and then show a “verified” badge. If you really hate pgp, use some other method. But the way I see it, this will basically be compulsory in a few years, and a github account will be required to publish on pypi.

Of course one can just set up github to automatically mirror whatever actual thing they use, so there would be no extra security over whatever the self hosted instance provides.

barry-scott · January 4, 2024, 11:02am

Yes that concerns me as well.

Is it right that the workflow must become commit → CI → published?
I really want to be able to have a manual step before publish where I do QA.

kpfleming · January 4, 2024, 11:20am

Definitely not. I have GitHub repos publishing to PyPI using the Trusted Publisher workflow, and the publication is triggered by pushing a tag to the repo, not by pushing commits to the main branch. If I wanted to, I could trigger the publication off of the creation of a release, which is a secondary step after pushing a tag.

pf_moore · January 4, 2024, 11:32am

I agree. Automated build/publish workflows are great when they work, but I’ve had enough “oops” moments during that final build that not having a manual checkpoint before publish is a worry.

Also, I don’t like the idea of not being able to do a release if Github is down (for example). Or not being able to prepare a release offline and then hit “publish” when I get back online. Or any of a multitude of other reasons why making github an essential[1] part of my publishing workflow is a step too far.

[1] Optional is fine, and often very convenient.

hugovk · January 4, 2024, 11:40am

This is possible with Trusted Publishers by setting up a dedicated “environment” on the repo as a checkpoint which requires manual approval before it publishes. You can do manual QA at this step.

And Trusted Publishing recommends such an environment: Security Model and Considerations - PyPI Docs

hugovk · January 4, 2024, 11:45am

You can always prepare a release offline and publish to PyPI manually using twine or another tool. Trusted Publishing doesn’t lock you in to only publish via GitHub.

For example, this week’s Pillow release was mostly published to PyPI via Trusted Publishing, and partly by uploading with twine for some wheels that were not built on GitHub.

pf_moore · January 4, 2024, 12:03pm

I had a quick read of that and honestly I glazed over less than a paragraph in. There’s a lot of ideas in that article that I simply don’t understand (or if I’m honest, care about). That’s fine - it’s addressing an issue that I don’t really want to become an expert in. But it also means that this isn’t a viable approach for me.

Yes, but in the context of this thread, a manual release wouldn’t have provenance data, and that could easily result in people raising issues saying that the release was a problem. The social pressure (and associated maintainer effort of having to correct people’s misconceptions) is more the issue here.

Anyway, this is probably not a productive direction for this thread. The main point, which is that not everyone wants[1] to invest in automated publication workflows, or to tie their processes more closely to github, has been made. And as long as the PyPI team working on trusted publishers and provenance are taking this into account (which they are, based on what they’ve said here) we’re all good.

[1] Or has the resources.

LtWorf · January 4, 2024, 3:50pm

Fact remains that using a token is compulsory, and man twine mentions absolutely nothing on how to do that.

It’s not about adding more of them, my point is about self hosting. If my self hosted codeberg/gitlab instance isn’t supported, in a few months/years I’ll start getting bug reports.

woodruffw · January 4, 2024, 4:01pm

The lack of documentation isn’t ideal, but there isn’t much that can be done about that immediately. Still OT, but see Use API tokens by default for PyPI · Issue #561 · pypa/twine · GitHub for some proposed changes that will hopefully improve the situation there. But ultimately, twine is its own project with its own release schedule.

Sorry, could you clarify a bit here? I’m not sure I understand who will be sending you bug reports; with the proposed scope, users who install via pip and other downloading clients will neither be aware of nor ever retrieve the signed provenance.

(I suppose they could check the “provenance” tab or similar on PyPI and then complain, but you’d be right to just close those as “wontfix,” similar to anything else.)

LtWorf · January 4, 2024, 11:15pm

The users of my package, noticing I don’t have trusted provenance because I don’t publish via github.

I can choose wontfix, but eventually this will lead to forks or other projects, just because self hosting isn’t supported, since those users will probably be USA based and have SBOM and similar to think about.

woodruffw · January 5, 2024, 12:51am

I can understand this concern, but I don’t think it’s a package index’s responsibility to encourage or discourage forks; the index should be a dispassionate but secure host.

Separately: the goal is to accommodate your use case, per earlier mentions of needing help with a model for non-Trusted Publisher attestations/signatures. This proposal won’t, but it’s only one of several ways to achieve end-to-end provenance in Python, and won’t be the last of the work here.

dstufft · January 5, 2024, 4:23pm

I can pretty definitively say that PyPI will never require you to use GitHub or another provider like that in order to publish to PyPI.

That being said, if there is something we can do to make things better on a common platform/provider like GitHub, we’re certainly not going to preclude that improvement because not everyone wants to use that platform/provider. In cases where that improvement is visible to end users, they may ask for you to support that, the same as they may any other feature request, and it would be up to individual projects to decide if they want to support that feature (and any requirements that feature imposes on them).

EpicWink · January 5, 2024, 11:24pm

Is there a place to discuss this yet?

woodruffw · January 8, 2024, 5:52pm

Not yet – my follow-ons from this thread are to open a new discussion thread, and to begin the PEP drafting process for the Trusted Publisher side.

If someone would like to pre-empt me and open the thread for the former, please go ahead! Otherwise, I will likely do it after opening the draft PEP and an associated discussion thread for it.

woodruffw · February 1, 2024, 3:11pm

To round things out here: PEP 740 is now drafted, and I’ve opened a new discussion thread here: PEP 740: Index support for digital attestations
