I’ve seen several interesting ideas go by in the PEP 752/755 discussions, so I wanted to summarise the various possibilities in one place. I’m also going to include ideas from PEP 752 in this list, since some of the options use different pieces of that as their foundation.
If folks think it would be worthwhile, I’d be willing to make this summary an Informational PEP, similar to those the CPython core dev team have sometimes used to put different competing PEPs into a shared context (see PEP 607 – Reducing CPython’s Feature Delivery Latency for an example). The ideas are separable enough that I don’t believe it would be a good idea to pursue them all in a single standardisation proposal, even for the ideas that are mutually compatible. However, I don’t have any plans of my own to actually implement these ideas; I just wanted to summarise them in one place.
The recurring example I am going to use is from the azure- namespace on PyPI, comparing a hypothetical azure-enhanced-cli project (published by someone other than Microsoft, such as the some-publisher@domain.example used in the examples below) and azure-cli (an official Microsoft project published by the “azpycli” team).
This initial post is just aimed at describing the ideas, without attempting to assess their merits or go too far into the related technical details. I’ve only listed ideas that I think would be technically feasible.
Explicit provenance assertions
Explicit provenance assertions would use a new syntax like package-name from expected-source to declare an expected publisher for a component (other potential syntaxes for this were suggested, such as expected-source::package-name, but I personally prefer the keyword format).
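To make the keyword format a little more concrete, here is a minimal Python sketch of splitting such a requirement into its two parts. The names `ProvenanceRequirement` and `parse_requirement` are hypothetical, and the sketch deliberately ignores version specifiers, extras, and environment markers; it is not how any existing installer parses requirements.

```python
import re
from typing import NamedTuple, Optional


class ProvenanceRequirement(NamedTuple):
    """Hypothetical container for a parsed 'name from source' requirement."""
    name: str
    expected_source: Optional[str]


# Assumption: a bare project name, optionally followed by "from <source>".
_PATTERN = re.compile(r"^\s*(?P<name>\S+)(?:\s+from\s+(?P<source>\S+))?\s*$")


def parse_requirement(text: str) -> ProvenanceRequirement:
    """Split 'package-name from expected-source' into its two parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised requirement: {text!r}")
    # The source group is None when no provenance assertion was given.
    return ProvenanceRequirement(match["name"], match["source"])
```

A requirement without the `from` clause simply yields `expected_source=None`, which matches the idea that provenance assertions are opt-in.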
For all of the ideas in this category, PEP 740 digital attestations could potentially be used to assert strong claims to the given discriminator without clients having to trust the index server.
While it likely makes the most sense to pick one provenance assertion mechanism and run with it, it would be possible to define marker characters that distinguish between the potential options at resolution time (@ for email addresses, trailing / for URLs, . for domain names, anything else is a user/org name).
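The marker character rule could be sketched roughly as follows. The function name `classify_source` and the returned labels are made up for illustration; the only part taken from the description above is the order of the checks (an email address or URL also contains a dot, so the domain check has to come after both).

```python
def classify_source(source: str) -> str:
    """Guess which kind of provenance assertion a source string represents."""
    if "@" in source:
        return "email"      # e.g. azpycli@microsoft.com
    if source.endswith("/"):
        return "url"        # e.g. github.com/Azure/
    if "." in source:
        return "domain"     # e.g. pyprojects.microsoft.com
    return "account"        # repository user or organisation name
```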
Provenance assertions are NOT applicable to direct URL references (those should instead use hashes as a direct artifact level constraint, and otherwise rely on TLS certificates or SSH keys for domain validation).
Using email addresses
- Valid example: pip install 'azure-cli from azpycli@microsoft.com'
- Valid example: pip install 'azure-enhanced-cli from some-publisher@domain.example'
- Failing example: pip install 'azure-enhanced-cli from azpycli@microsoft.com'
The idea behind using email addresses is that this is how the existing metadata standards identify package maintainers and authors.
Pursuing this idea may require tightening up how PyPI validates email address usage in new projects (or when changing the email address metadata on existing projects).
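As a rough illustration of the client side check, the sketch below compares an asserted email address against the Author-email and Maintainer-email core metadata fields. It assumes the metadata has already been loaded into a plain dict (the helper name `email_matches` and that dict shape are hypothetical), and it says nothing about how the repository would have validated those addresses in the first place.

```python
from email.utils import getaddresses


def email_matches(metadata: dict, expected_email: str) -> bool:
    """Check an asserted email against the declared author/maintainer emails."""
    # Both fields may hold several "Name <addr>" entries separated by commas.
    fields = [
        metadata[key]
        for key in ("Author-email", "Maintainer-email")
        if metadata.get(key)
    ]
    declared = {addr.lower() for _name, addr in getaddresses(fields) if addr}
    return expected_email.lower() in declared
```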
Using repository user and/or organisation names
- Valid example: pip install 'azure-cli from microsoft'
- Valid example: pip install 'azure-cli from azpycli'
- Valid example: pip install 'azure-enhanced-cli from some-publisher'
- Failing example: pip install 'azure-enhanced-cli from microsoft'
The idea behind using repository user and/or organisation names is that it avoids having to define a way for repositories to verify a publisher’s ownership of an external resource (whether that’s an email address or a domain name).
Pursuing this idea would need some form of enhancement to the repository JSON API standards to make the relevant repository account metadata available to API clients.
The main downside to this approach is that it doesn’t generalise nicely across different index servers (since the user/organisation names are repository dependent).
Using domain names
- Valid example: pip install 'azure-cli from pyprojects.microsoft.com'
- Valid example: pip install 'azure-enhanced-cli from some-publisher.domain.example'
- Failing example: pip install 'azure-enhanced-cli from pyprojects.microsoft.com'
The idea behind using domain names is that they’re already a commonly validated external resource, and most major publishers of packages will already have one. If no external domain is specified for a project, then attempting to assert any specific provenance for that project will fail.
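The "no declared domain means every assertion fails" rule is easy to get backwards, so here is a small sketch of it. It assumes the client has somehow obtained the list of verified domains for a project (how that list is produced is exactly the open question for this idea), and `check_domain_assertion` is a hypothetical name.

```python
def check_domain_assertion(declared_domains, expected_domain: str) -> bool:
    """Return True only if the project declares the asserted domain."""
    # Domain names are case-insensitive; also tolerate a trailing root dot.
    declared = {d.lower().rstrip(".") for d in declared_domains}
    if not declared:
        # No external domain specified: any provenance assertion fails.
        return False
    return expected_domain.lower().rstrip(".") in declared
```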
Pursuing this idea would need some mechanism to allow repositories to reliably assert that a particular user or organisation account is an authorised publisher for a given external domain and/or for clients to check that a given project is authorised for publication (see the previous thread for some ideas on that front).
PEP 708 (the tracking/alternate location metadata for projects) would also potentially be relevant to this approach.
Using HTTPS URLs
- Valid example: pip install 'azure-cli from github.com/Azure/'
- Valid example: pip install 'azure-enhanced-cli from domain.example/some-publisher/'
- Failing example: pip install 'azure-enhanced-cli from github.com/Azure/'
Similar to the domain name idea, but allowing domains to be divided by URL paths, not just by subdomains.
Pursuing this idea would require some mechanism for verifying ownership of exact URLs rather than entire domains.
Implicit provenance constraints
All of the explicit provenance assertions share a common weakness: a naive user typing pip install azure-enhanced-cli doesn’t receive any indication that this is an unofficial third party project rather than an official Azure client package published by Microsoft.
Implicit provenance constraints would be designed to avoid that weakness.
The provenance checking mechanisms in this section would likely work best if a single form of explicit provenance assertion was defined as the one to be used for implicit provenance constraints. The examples below assume the use of domain names for this purpose (since they’re likely to be the easiest mechanism for clients to be able to validate independently of any specific repository).
Implicit provenance constraints would only be used when there are no artifact level constraints defined (if you have an artifact hash to check against, you no longer care where that artifact came from).
Trust on first use
- Example prompt: pip install azure-cli → 'azure-cli 2.64.0' is published by 'microsoft.com', do you wish to trust this publisher for this project? [y/N]:
- Example prompt: pip install azure-enhanced-cli → 'azure-enhanced-cli 0.9' is published by 'some-publisher.domain.example', do you wish to trust this publisher for this project? [y/N]:
Given a defined form of provenance checking, clients would be prompted to accept projects with unknown provenance the first time they’re encountered. While potentially viable for experienced users, this would get very noisy and confusing for new users, hence the other ideas in this section.
When granting approval, users would be able to choose between “trust this project with this provenance” and “trust all projects with this provenance” (probably by asking a second question after project level approval is given, as it would be hard to make a combined question both clear and concise).
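The two-question flow could look something like the sketch below. Everything here is hypothetical (the `TrustStore` class, its in-memory sets, and the `ask` callback standing in for an interactive prompt); a real client would also need to persist these decisions and skip the check entirely when an artifact hash is pinned.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TrustStore:
    """Hypothetical in-memory record of trust-on-first-use decisions."""
    trusted_pairs: set = field(default_factory=set)       # (project, publisher)
    trusted_publishers: set = field(default_factory=set)  # publisher only

    def is_trusted(self, project: str, publisher: str) -> bool:
        return (
            publisher in self.trusted_publishers
            or (project, publisher) in self.trusted_pairs
        )

    def check(self, project: str, publisher: str, ask: Callable[[str], bool]) -> bool:
        """Prompt only when this project/publisher pairing is new."""
        if self.is_trusted(project, publisher):
            return True
        if not ask(f"'{project}' is published by '{publisher}', "
                   "do you wish to trust this publisher for this project?"):
            return False
        self.trusted_pairs.add((project, publisher))
        # Second question, asked only after project level approval is given.
        if ask(f"Trust all projects published by '{publisher}'?"):
            self.trusted_publishers.add(publisher)
        return True
```

Passing `ask` in as a callable keeps the decision logic separate from the terminal handling, which also makes it straightforward to answer "no" automatically in non-interactive environments.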
Sharing trusted provenance lists
Via some mechanism (most likely packaging entry points with a defined API), installed packages would be able to declare that they provide lists of trustworthy provenance details, meaning that (in the absence of a publishing account compromise) distribution packages with that provenance are unlikely to include malicious code.
This would allow organisations to publish lists of pre-approved provenance details as installable projects, rather than their users having to define all those trust rules individually.
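If entry points were the chosen mechanism, collecting the shared lists might look roughly like this. The group name `packaging.trusted_provenance` and the convention that each entry point loads to a zero-argument callable returning an iterable of provenance strings are both assumptions made up for this sketch; `entry_points(group=...)` itself is the standard `importlib.metadata` API in Python 3.10+.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group name; no such group is standardised.
GROUP = "packaging.trusted_provenance"


def collect_trusted_provenance(eps=None) -> set:
    """Merge the provenance lists published by installed packages."""
    if eps is None:
        # Discover providers from whatever is installed in this environment.
        eps = entry_points(group=GROUP)
    trusted = set()
    for ep in eps:
        provider = ep.load()        # assumed: a zero-argument callable
        trusted.update(provider())  # assumed: returns provenance strings
    return trusted
```

Note that loading an entry point imports and runs code from the providing package, so the provider packages themselves would need to be trusted by some other means first.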
Defining a default verified publisher list
TLS certificate checking relies heavily on the default CA lists shipped by browser and operating system vendors. Many Python projects directly or indirectly use the certifi certificate bundle (built from the cert bundles published by Mozilla) as the trust root for their TLS validation.
This is the first idea on this list where we start getting into the same territory as PEP 752 and PEP 755: given an implicit provenance constraint mechanism at the technical level, it would likely be feasible for the PSF to build a “verified publisher” policy mechanism on top of that, where, in return for providing the PSF with additional identifying and contact information, as well as paying an administrative fee, the PSF would:
- perform sufficient checks to satisfy the PSF that the applicant is a legitimate publisher that isn’t aiming to distribute malware
- add the publisher’s provenance details to an installable PyPI package that package management clients can use as a default
The administrative fee should be set high enough to cover both the actual administration costs and the reputational risk if a verified publisher turns out to be less trustworthy than hoped. The fee could potentially be discounted and/or waived for PSF members and for other open source organisations, but the verification step should happen regardless.
Just as the trusted CA lists in browsers and operating systems evolve over time, so would the default verified publisher list.
Namespace prefix provenance constraints with open namespace grants
- Valid example: pip install @azure/cli (authorised member of the azure- namespace)
- Valid example: pip install azure-enhanced-cli (no prefix assertion specified)
- Failing example: pip install @azure/enhanced-cli (NOT an authorised member of the azure- namespace)
- Potentially valid example: pip install @azure-contrib/enhanced-cli (could be an authorised azure-contrib project)
- Failing example: pip install @azure/contrib-enhanced-cli (still not an authorised member of the azure- namespace)
This idea builds on the PEP 752 concept of repository level open namespace prefix grants.
In this idea, assuming pyprojects.microsoft.com is the registered owner of the azure- prefix, then specifying @azure/cli is mostly equivalent to azure-cli from pyprojects.microsoft.com, but with some additional flexibility as indicated by the “potentially valid” @azure-contrib/enhanced-cli example.
If the asserted prefix has no owner, then the shorthand provenance assertion will always fail.
Otherwise, the namespace ownership will be checked against the implicit provenance constraints (that way an alert will still be given if namespace ownership changes), while the project API metadata will be checked to see if the namespace usage is authorised by the namespace owner. If the project is not an authorised member of the namespace, then the prefix assertion will fail. However, the package will still be installable if the prefix assertion syntax is not used.
As discussed in PEP 752, it is expected that owners of namespace grants would have the ability to designate projects from third party publishers as authorised namespace members, even if those projects are not published directly by the namespace owner.
Registering new projects within a filtered namespace grant would still be permitted, but such projects would typically be unauthorised by default.
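The shorthand resolution rules could be sketched as below. The `owners` and `members` dicts are stand-ins for whatever the repository API would actually expose, and `resolve_shorthand` and `PrefixAssertionError` are hypothetical names. The function returns the namespace owner alongside the expanded project name so the caller can still run the owner through the implicit provenance constraints.

```python
class PrefixAssertionError(Exception):
    """Raised when a shorthand prefix assertion cannot be satisfied."""


def resolve_shorthand(spec: str, owners: dict, members: dict) -> tuple:
    """Expand '@namespace/rest' to (project name, namespace owner)."""
    if not spec.startswith("@") or "/" not in spec:
        raise ValueError(f"not a namespace shorthand: {spec!r}")
    namespace, _, rest = spec[1:].partition("/")
    prefix = namespace + "-"
    project = prefix + rest
    if prefix not in owners:
        # A prefix with no owner can never satisfy the assertion.
        raise PrefixAssertionError(f"prefix {prefix!r} has no registered owner")
    if project not in members.get(prefix, set()):
        raise PrefixAssertionError(
            f"{project!r} is not an authorised member of {prefix!r}"
        )
    return project, owners[prefix]
```

With `owners = {"azure-": "pyprojects.microsoft.com"}` and `members = {"azure-": {"azure-cli"}}`, `@azure/cli` resolves while `@azure/enhanced-cli` and `@azure/contrib-enhanced-cli` both raise, even though the underlying projects would remain installable under their plain names.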
Project registration prevention with restricted namespace grants
- Valid example: pip install enhanced-azure-cli (project name is outside the namespace grant)
- Failing example: pip install azure-enhanced-cli (use of restricted prefix disallowed when registering project)
This idea covers the PEP 752 concept of repository level restricted namespace prefix grants.
The open vs restricted namespace grant distinction doesn’t affect the client side of things much at all (there’s a proposed API field to indicate the difference, but no real motivation for clients to ever check it).
Instead, it is more a process issue between the PSF and entities that have successfully become part of a verified publisher program, and would like to defensively register specific (most likely trademarked) prefixes to prevent future uploads to them without specific authorisation from the namespace grant owner.
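Since the restricted grant check happens on the repository side at registration time, a sketch of it looks quite different from the client side examples. The version below is a simplification under stated assumptions: `restricted_grants` maps each prefix to a single owning account, "authorisation" is reduced to "the uploader is the grant owner", and project name normalisation is skipped. The function name `can_register` is hypothetical.

```python
def can_register(name: str, uploader: str, restricted_grants: dict) -> bool:
    """Decide whether `uploader` may register a new project called `name`."""
    for prefix, owner in restricted_grants.items():
        # Names inside a restricted prefix are reserved for the grant owner.
        if name.startswith(prefix) and uploader != owner:
            return False
    return True
```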
Note: several of the ideas above were first suggested by other people, so if anyone has links handy, post them below and I’ll edit them into this initial post. Also feel free to mention other ideas that I missed when they came up in the discussion threads and I’ll see if it makes sense to add them to the overview.