Hello all! I’m opening this up as a pre-PEP discussion thread, with the goal of getting some additional attention on this problem and the proposed solution before creating an actual draft PEP.
TL;DR: “We can bootstrap cryptographic provenance on top of Trusted Publishing, thereby giving us provenance for a large number of PyPI packages (by total downloads). Users will be able to retrieve this provenance, allowing them to verify that a package originates from a particular source repository or CI system.”
Disclaimer
I am a contributor to PyPI, but not a maintainer. These are my opinions, do not reflect others’, do not reflect official positions, etc. etc.
Problem statement
As of 2024, there is no good way to deliver provenance for packages on PyPI. In other words: while PyPI itself offers transport security and strong hashes for downloaded distributions, there is no way to verify that a particular package came from a particular source repository or other signing identity.
Doing this poses significant usability and operational challenges: previous attempts (like PGP signatures) have relied on publishers to maintain long-lived signing keys, users to retrieve those keys (and rotate them correctly), as well as an external ecosystem of online keyservers for key distribution. Even with all of that, users still had to establish trust in specific key-identity pairings, since any PGP key can claim to represent any arbitrary human identity. In practice, this meant that only a tiny minority of PyPI packages were actually signed with PGP (both by package count and overall downloads), and that an even smaller minority of users actually verified those signatures.
Solution statement
A workable, lasting solution to this problem needs to sidestep the operational and usability issues that come with PGP and other manual identity binding layers.
We have a new technique for this available on PyPI, as of April 2023: Trusted Publishing. Under the hood, Trusted Publishing uses OpenID Connect to associate a PyPI project with a workflow that is trusted to publish the project, such as a GitHub Actions workflow on the project’s associated GitHub repository. The resulting association is cryptographically bound, meaning that no other user or GitHub repository can impersonate the Trusted Publisher. As a result, a Trusted Publisher can publish directly to PyPI without manual API token configuration.
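As a rough illustration of the association described above, the sketch below compares the claims of an already-verified OIDC token against a registered publisher. `TrustedPublisher` and `matches` are hypothetical names made up for this post, not PyPI's implementation. The claim names (`iss`, `repository`, `job_workflow_ref`) are the ones GitHub Actions includes in its OIDC tokens, and verification of the token's own signature is assumed to have happened beforehand.

```python
from dataclasses import dataclass

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


@dataclass(frozen=True)
class TrustedPublisher:
    """Hypothetical record of a publisher registered for a PyPI project."""
    repository: str  # e.g. "pypa/sampleproject"
    workflow: str    # e.g. "release.yml"


def matches(claims: dict, publisher: TrustedPublisher) -> bool:
    """Return True if already-verified OIDC claims match the publisher."""
    if claims.get("iss") != GITHUB_ISSUER:
        return False
    if claims.get("repository") != publisher.repository:
        return False
    # job_workflow_ref looks like "owner/repo/.github/workflows/file.yml@ref"
    expected_prefix = (
        f"{publisher.repository}/.github/workflows/{publisher.workflow}@"
    )
    return claims.get("job_workflow_ref", "").startswith(expected_prefix)
```

The point of the sketch is only that the check is a comparison of machine-issued claims, with no long-lived secret shared between the project and PyPI.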
Because Trusted Publishing is OIDC under the hood, any Trusted Publishing workflow can also become a provenance-generating workflow (with Sigstore) with no additional user configuration required.
So, the actual solution statement: for PyPI packages that are currently published with Trusted Publishing, we provide zero-configuration provenance without user effort by reusing the Trusted Publisher identity as a Sigstore codesigning identity.
In less jargon: a repository named `github.com/pypa/sampleproject` that uses Trusted Publishing to upload to PyPI will also upload provenance that downstream users can verify to establish that each uploaded package genuinely comes from `pypa/sampleproject`’s CI.
Components
Changes to `gh-action-pypi-publish` (and other publishing workflows that use Trusted Publishing)
To make this work, publishing workflows like `gh-action-pypi-publish` will need to re-use their pre-existing `id-token: write` (or equivalent) permissions to obtain an OIDC credential with `aud: sigstore`. That credential will then be bound to a short-lived signing key via Sigstore’s “keyless signing” mechanism, allowing the workflow to sign for each of the distributions to be uploaded. All of this can be abstracted behind `sigstore-python`, which is a mature Sigstore implementation designed (in part) for exactly this purpose.
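To make the `aud: sigstore` requirement a bit more concrete, here is a small sketch that decodes the payload of an OIDC token (a JWT) and checks its audience. It deliberately skips signature verification, so it is only suitable for illustration or debugging. `unverified_claims` and `is_for_sigstore` are names made up for this example and are not part of any existing tool.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (illustration only)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_for_sigstore(token: str) -> bool:
    """Check that the token's audience claim names "sigstore"."""
    aud = unverified_claims(token).get("aud")
    # "aud" may be a single string or a list of strings.
    return aud == "sigstore" or (isinstance(aud, list) and "sigstore" in aud)
```

A token minted for uploading to PyPI carries a different audience, which is why the workflow needs to request a second credential rather than reuse the upload one.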
In effect: `gh-action-pypi-publish` will produce `{dist}.sigstore.json` for each `dist` given to it. This requires no additional user configuration or interaction, since the permissions needed to produce `{dist}.sigstore.json` are the same permissions needed to upload with Trusted Publishing.
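A minimal sketch of that "one sidecar per dist" step might look like the following. The `sign` callable is a placeholder for the real signing step (which would go through something like `sigstore-python`); `sidecar_path` and `write_sidecars` are hypothetical helper names, and only the `{dist}.sigstore.json` naming comes from the proposal itself.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def sidecar_path(dist: Path) -> Path:
    """Return the `{dist}.sigstore.json` path next to a distribution."""
    return dist.with_name(dist.name + ".sigstore.json")


def write_sidecars(dists: Iterable[Path], sign: Callable[[bytes], str]) -> List[Path]:
    """Sign each distribution and write the resulting bundle beside it.

    `sign` is a stand-in: it takes the file's bytes and returns the
    bundle serialized as a JSON string.
    """
    written = []
    for dist in dists:
        out = sidecar_path(dist)
        out.write_text(sign(dist.read_bytes()))
        written.append(out)
    return written
```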
Changes to `twine` (and other uploading clients)
Once publishing workflows like `gh-action-pypi-publish` begin producing `{dist}.sigstore.json` for each `dist`, uploading clients (like `twine`) will need to become aware of these “sidecar” artifacts and include them with each uploaded distribution.
In effect: similar to how PGP signatures were handled (`{dist}.asc`), clients like `twine` will need to detect `{dist}.sigstore.json` for each `dist` and upload its contents as associated metadata.
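On the client side, the detection step could be as simple as the sketch below. This is not how `twine` is implemented; `pair_with_sidecars` is a hypothetical helper, and the set of recognized distribution extensions is an assumption kept deliberately small.

```python
from pathlib import Path
from typing import Dict, Optional

SIDECAR_SUFFIX = ".sigstore.json"
DIST_SUFFIXES = (".whl", ".tar.gz")  # assumption: wheels and gzipped sdists only


def pair_with_sidecars(dist_dir: Path) -> Dict[Path, Optional[str]]:
    """Map each distribution in dist_dir to its sidecar contents (or None)."""
    pairs = {}
    for path in sorted(dist_dir.iterdir()):
        if not path.is_file() or not path.name.endswith(DIST_SUFFIXES):
            continue  # skip sidecars, .asc files, and anything else
        sidecar = path.with_name(path.name + SIDECAR_SUFFIX)
        pairs[path] = sidecar.read_text() if sidecar.is_file() else None
    return pairs
```

A distribution that maps to `None` would simply be uploaded as it is today, without any provenance attached.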
Changes to PyPI
PyPI requires two sets of changes for this work:
- Producing side: the upload endpoint will need to accept a `provenance` or similar `POST` field, containing the contents of `{dist}.sigstore.json` as mentioned above. This field should have at least the following semantics:
  - It MUST be present if the uploader is a Trusted Publisher, and MUST NOT be present otherwise.
    - This will require a deprecation/onboarding period, since there are existing Trusted Publishing workflows that will not immediately upgrade to the latest version of `gh-action-pypi-publish`.
  - It MUST be a valid Sigstore bundle (i.e. signature, signing certificate, and other metadata needed for a Sigstore verification)
  - The signature MUST be valid for the given `dist`, which PyPI can verify by using `sigstore-python`’s verification APIs with the Trusted Publisher for `dist` as the expected signing identity.
- Consuming side: PyPI will need to decide how to expose the Trusted Publisher signatures uploaded to it. Some (not mutually exclusive) options:
  - Make `{dist}.sigstore.json` available via an additional `data-` attribute on the PEP 503 Simple Index
  - Make `{dist}.sigstore.json` available via additional attributes in the PEP 691 Simple JSON Index
  - Expose Trusted Publishing status (and associated verified signatures) on each release view in the Web UI, similar to what NPM does
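The producing-side MUST rules can be sketched as a single check at upload time. This is an illustration under assumptions rather than a proposed implementation: `check_upload` and `ProvenanceError` are made-up names, the "valid bundle" rule is reduced to "parses as a JSON object", and the `verify` callable stands in for real Sigstore verification.

```python
import json
from typing import Callable, Optional


class ProvenanceError(ValueError):
    """Raised when an upload violates the proposed provenance rules."""


def check_upload(
    dist: bytes,
    provenance: Optional[str],
    trusted_publisher: Optional[str],
    verify: Callable[[bytes, dict, str], bool],
) -> None:
    """Apply the three proposed MUST rules to a single upload.

    `trusted_publisher` is the expected signing identity, or None for
    uploads that did not use Trusted Publishing. `verify` is a placeholder
    for real Sigstore verification of (dist, bundle, identity).
    """
    # Rule 1: present if and only if the uploader is a Trusted Publisher.
    if trusted_publisher is None:
        if provenance is not None:
            raise ProvenanceError("provenance supplied without a Trusted Publisher")
        return
    if provenance is None:
        raise ProvenanceError("Trusted Publisher upload is missing provenance")
    # Rule 2 (simplified): the field must at least parse as a JSON object.
    try:
        bundle = json.loads(provenance)
    except json.JSONDecodeError as exc:
        raise ProvenanceError("provenance is not valid JSON") from exc
    if not isinstance(bundle, dict):
        raise ProvenanceError("provenance is not a JSON object")
    # Rule 3: the signature must verify for this dist and this identity.
    if not verify(dist, bundle, trusted_publisher):
        raise ProvenanceError("signature does not match dist and publisher")
```

During the onboarding period mentioned above, the "missing provenance" branch would presumably warn rather than reject.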
PEP items
Not all of the above falls into the scope of a PEP, so I’ve broken out the specific things (possibly incomplete!) that I believe need to be standardized or included under a PEP here:
- Changes to the upload endpoint: I’m not sure if this requires a PEP (since the current endpoint isn’t specified by one), but if so: the addition of a `provenance` (or similar) `POST` field.
- Changes to the PEP 503 and PEP 691 indices: both index formats should reflect (1) the expected Trusted Publisher identity for the uploaded release, and (2) the Trusted Publisher signature for the release.
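To give a feel for the index change, here is one possible shape for a single entry in the PEP 691 `files` list. The `filename`, `url`, and `hashes` keys are defined by PEP 691; the `publisher` and `provenance` keys are purely hypothetical placeholders for items (1) and (2) above, and the real names and structure would be up to the PEP.

```python
import hashlib
import json


def file_entry(filename: str, url: str, content: bytes,
               publisher: str, bundle: dict) -> dict:
    """Build one PEP 691 `files` entry with two HYPOTHETICAL extra keys."""
    return {
        # Keys defined by PEP 691:
        "filename": filename,
        "url": url,
        "hashes": {"sha256": hashlib.sha256(content).hexdigest()},
        # Hypothetical keys, not part of any accepted PEP:
        "publisher": publisher,   # (1) expected Trusted Publisher identity
        "provenance": bundle,     # (2) contents of {dist}.sigstore.json
    }


if __name__ == "__main__":
    entry = file_entry(
        "sampleproject-1.0.tar.gz",
        "https://files.example/sampleproject-1.0.tar.gz",  # placeholder URL
        b"not a real sdist",
        "pypa/sampleproject",
        {},  # an empty dict stands in for a real Sigstore bundle
    )
    print(json.dumps(entry, indent=2))
```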
Other considerations
- As of this writing, PyPI’s Trusted Publishing is limited to GitHub Actions. This covers a plurality (if not a majority) of actively maintained projects, but it is too narrow a platform base on which to confidently build a stable, long-term code-signing scheme for PyPI. Consequently, everything proposed above is blocked until PyPI supports at least one (and ideally more than one) additional Trusted Publisher (e.g. GitLab, Google Cloud Build, etc.).
- Because this bootstraps on top of Trusted Publishing, this work will be unable to provide signatures for packages that aren’t uploaded with Trusted Publishing. This is a tradeoff made for expedience and operational reasons: starting with Trusted Publishing avoids many of the hard PKI problems that code signing otherwise needs to handle, and allows PyPI to distribute signatures for a large percent of overall PyPI downloads (since many of the current top projects already use Trusted Publishing).
- This proposal isn’t meant to be the “final state” of codesigning on PyPI. Instead, it’s meant to be an early building block for later improvements, such as PyPI emitting counter-attestations/counter-signatures for each uploaded package.
- Because this proposal has PyPI distribute both the signatures and the identities needed to verify them, it isn’t intended to protect against threat models where PyPI itself is malicious. This is similarly done for expedience/operational reasons: PyPI is already the center of trust, and attempting to reduce that trust requires separate techniques (like TUF or mandatory asset transparency) that aren’t immediately practical to integrate.
What happens after this?
Everything above focuses on making codesigning as “no-touch” as possible, and giving PyPI the ability to verify (and redistribute) the signatures uploaded to it.
From there, there are a lot of things that could be done to further adopt codesigning in the Python ecosystem. These are outside of the immediate scope of the ideas above, but I think are worth discussing in this thread (as they’ll certainly inform the more concrete design decisions we propose):
- How does this interact with lockfiles (and upcoming lockfile standard proposals)? Being able to lock the identity associated with a PyPI project name is useful from a security perspective, since it reduces trust in PyPI itself.
- How do we integrate this into `pip` and/or other installing clients? `pip` has vendoring constraints that make cryptographic dependencies a challenge, due to their native transitive requirements.
- Similar to the first point: long term, how do we reduce the amount of trust placed in PyPI? This technical proposal doesn’t increase the amount of trust, but doesn’t decrease it either.
- How do we extend this to other publishing workflows, i.e. ones that won’t (or can’t) be moved to CI providers that support Trusted Publishing?
- Long term stability: how can we build this in a forwards-compatible way, ensuring that a different provenance technique can be inserted if Sigstore becomes unmaintained or otherwise inappropriate for PyPI’s needs?
CCing a few people who I know are interested in this design and conversation: @dustin @sethmlarson @dstufft @EWDurbin @miketheman