Splitting this concept out into its own thread:
One of the key realisations I have taken from the PyPI prefix reservations threads is that they’re fundamentally about PyPI clients asking the question “Can the publisher of this package reasonably be trusted?”, with the goal of helping tools to detect typosquatting and similar styles of attack.
PEP 752 and PEP 755 put the burden of establishing trust onto the PyPI “organisations” mechanism and a manual review process, and then attempt to convey the result of that review to clients via a prefix reservation mechanism. Despite @ofek’s valiant efforts, I’m starting to doubt it’s going to be possible to wrangle that into a form which is both effective and feasible for the PyPI admins to manage.
The suggestions from @takluyver and @petersuter in the linked thread put me in mind of an entirely different model for establishing trust: the automated challenges that Let’s Encrypt uses to confirm that a client controls a domain name before issuing certificates for that domain. (Presumably Thomas and Peter also had the DNS-01 challenge in mind when suggesting something similar.) If that’s good enough for Let’s Encrypt to issue wildcard TLS certificates, presumably it’s good enough for establishing that PyPI packages come from an approved publisher.
As a completely unreviewed first sketch of that idea, I think it would need at least the following components:
- a repository API project metadata field. Let’s call it `domain-authority`: it references a domain name (or perhaps a full URL) representing the “publishing authority” for that project.
- a protocol for clients and repositories to use to confirm that a `domain-authority` API entry in a Python package repository JSON response is valid (such queries would be sent to the domain authority URL, NOT to the repository server). Both HTTPS and DNS seem like plausible candidates here, but HTTPS would probably be simpler (and avoid various categories of attack that would otherwise arise due to these values being static metadata rather than the dynamic challenge tokens used in ACME).
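To make the first component concrete, a project entry in the repository’s JSON API might carry the new field like this (a hedged sketch only: the key name comes from this proposal, and its placement alongside the existing PEP 691 response keys is my assumption, not part of any spec):

```json
{
  "meta": {"api-version": "1.1"},
  "name": "example-package",
  "domain-authority": "downloads.example.com",
  "files": []
}
```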
Creating a project with a domain authority set would be a three-step process:
1. Create the project on the repository (so you can be sure the name is available)
2. Add the entry for the project on the publisher-controlled domain authority server
3. Set the `domain-authority` field in the repository project metadata (which will now pass the repository server’s validation check due to step 2)
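The repository-side validation check in step 3 could be sketched roughly as follows. Everything here is a strawman assumption: the well-known path, the response format, and the function names are all invented for illustration, and real challenge semantics would need much more thought:

```python
# Sketch of how a repository might validate a domain-authority claim
# over HTTPS. The well-known path and JSON shape are hypothetical.
import json
from urllib.parse import urlsplit

# Hypothetical well-known location served by the domain authority
WELL_KNOWN_PATH = "/.well-known/pypi-domain-authority"


def challenge_url(domain_authority: str) -> str:
    """Build the HTTPS URL the repository would query to validate a claim.

    Accepts either a bare domain ("downloads.example.com") or a full URL.
    """
    host = urlsplit(domain_authority).netloc or domain_authority
    return f"https://{host}{WELL_KNOWN_PATH}"


def claim_is_valid(response_body: str, project_name: str) -> bool:
    """Check that the authority's response lists the claimed project."""
    claimed = json.loads(response_body).get("projects", [])
    return project_name in claimed
```

The repository would fetch `challenge_url(...)` (with normal TLS verification, so trust chains back to the CA system just as it does for Let’s Encrypt) and only accept the metadata update if `claim_is_valid` passes.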
On the client side, there would be three options for handling domain authority validation:
1. Ignore the new metadata field (this preserves the status quo)
2. Accept a list of approved domain authorities, and warn for any packages that don’t come from an approved domain authority. Trust that the repository has already checked the validity of the domain authority entries in the API response rather than checking them independently.
3. Similar to approach 2, but independently verify the domain authority claims. This may fail the installation if any of the referenced domain authority servers are down.
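The three client-side options above could be expressed as a small policy check. This is purely illustrative: the policy names, the `check_package` function, and the `verify` callback (which would query the authority server over HTTPS) are all hypothetical:

```python
# Hypothetical client-side policy for the three handling options.
from enum import Enum


class Policy(Enum):
    IGNORE = 1      # option 1: ignore the field (status quo)
    TRUST_REPO = 2  # option 2: trust the repository's validation
    VERIFY = 3      # option 3: independently verify claims


def check_package(project_meta: dict, approved: set,
                  policy: Policy, verify=None) -> str:
    """Return "ok", "warn", or "fail" for one package's metadata."""
    if policy is Policy.IGNORE:
        return "ok"
    authority = project_meta.get("domain-authority")
    if authority not in approved:
        # Not published via an approved domain authority
        return "warn"
    if policy is Policy.VERIFY:
        # verify() queries the authority server directly; if that
        # server is down or denies the claim, the install fails
        if not verify(authority, project_meta["name"]):
            return "fail"
    return "ok"
```

Option 2 keeps installs working when authority servers are unreachable, at the cost of trusting the repository’s checks; option 3 trades that availability for independent verification.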
While I’m sure there are devils lurking in the details, the basic idea feels plausible to me, and more compatible with the historical free-for-all that is PyPI package naming. It also avoids putting any additional publisher review burdens on the PyPI admins (since that part of the problem is offloaded to certificate authorities and DNS registrars).
(There are also some echoes of the “maximum trust” TUF model here, but accepting substantially weaker security guarantees as a result of avoiding the key management logistics involved in enabling end-to-end artifact signing)