Ideas for client side package provenance checks

ncoghlan · September 22, 2024, 6:09am

With the hopefully-at-least-somewhat-neutral summary completed, my own current thoughts:

I think starting with prefix validation (as PEP 752 does) is a bad idea. I see potential value in the general notion of supporting namespace prefix grants on PyPI (especially given an explicit syntax for requesting only official packages within a namespace prefix), but without a way for users to indicate to PyPI clients which namespace owners they consider trustworthy, its potential value seems limited.
I really like the idea of using third party domain attestation as the root trust mechanism for publisher verification (specifically, defining a .well-known URL as @woodruffw suggested that clients can use to validate a claim like azure-cli from pyprojects.microsoft.com against the specified domain, rather than having to trust anything published by the repository server itself). Such a mechanism may also help solve the distribution problem for PEP 480 TUF signing keys.
There’s a valid concern that clients spidering out to check attestations against multiple domains could bring back the bad old days of find-links external repository metadata causing installation reliability problems before PEP 470 removed that feature. Implementation PEPs would need to address that concern (e.g. by having most packages continue to default to using PyPI as their sole publishing authority, and by leveraging PEP 740 to have PyPI host checkable metadata for cases where the third party attestation servers are unavailable). Having artifact level hashes pre-empt the need to check project level attestations is also aimed at avoiding this problem (since lock file users would only potentially face issues at lock file resolution time, not when installing from a lock that contains artifact hashes).
I like the general idea of “trust on first use” as the basis for adding new trusted publishers to a personal development environment, but I think any practical UX for that feature is going to need a way to provide a base set of trust rules, defined either by a specific organisation (for institutional use), or by the PSF (for new users in general). The difference between TLS usability (widespread) and SSH usability (niche, only for highly technical users) is a major influence on my thinking here.
The procedural efforts needed for the PSF to be able to administer the restricted namespace grants proposed in PEP 752 feel like they could be better invested in a more general “verified publisher” program that goes beyond the basic metadata validation that PyPI is already doing. Such a program could also potentially have multiple tiers, such as “verified” (the PSF is aware of the publisher’s legal identity, is satisfied it is a genuine identification, and that the user won’t intentionally publish malware), and “high assurance” (the PSF is confident the publishing access for that account is appropriately controlled with a low risk of malicious compromise).
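
To make the client side ordering of these checks a bit more concrete, here is a rough Python sketch of how they might fit together. Nothing in it comes from an existing tool or PEP: `TrustStore`, `check_provenance`, `verify_claim` and the result strings are all hypothetical names, and the actual `.well-known` lookup is deliberately left as a pluggable callable because its format has not been specified anywhere. The only things it tries to illustrate are that a locked artifact hash pre-empts any project level check, that base trust rules take priority over trust on first use, and that first use records the publisher for later comparisons.

```python
from dataclasses import dataclass, field


@dataclass
class TrustStore:
    """Hypothetical client-side record of which domain publishes which project."""

    # Rules supplied up front, e.g. by an organisation or by the PSF
    base_rules: dict = field(default_factory=dict)
    # Publishers recorded automatically the first time a project is seen
    first_use: dict = field(default_factory=dict)

    def expected_publisher(self, project):
        # Base rules win over anything learned via trust on first use
        return self.base_rules.get(project) or self.first_use.get(project)


def _assume_claim_is_valid(project, domain):
    # Placeholder only: a real client would validate the claim against the
    # publisher's domain (for example via a .well-known URL). That format is
    # not defined yet, so this stub simply accepts every claim.
    return True


def check_provenance(project, claimed_domain, artifact_hash, locked_hashes,
                     store, verify_claim=_assume_claim_is_valid):
    """Return a short status string describing why an artifact is (not) trusted."""
    # 1. A hash pinned in a lock file pre-empts all project level checks,
    #    so installing from a lock never needs to contact third party domains.
    if artifact_hash in locked_hashes.get(project, ()):
        return "locked-hash"

    # 2. Otherwise the publisher claim itself has to check out.
    if not verify_claim(project, claimed_domain):
        return "unverified-claim"

    # 3. Compare against what this environment already trusts.
    expected = store.expected_publisher(project)
    if expected is None:
        # Trust on first use: remember the publisher for next time.
        store.first_use[project] = claimed_domain
        return "trusted-on-first-use"
    if expected == claimed_domain:
        return "trusted-publisher"
    return "publisher-mismatch"
```

A client built along these lines would only hit the network in step 2, and only when resolving rather than when installing from a fully hashed lock file. A real UX would presumably prompt the user before recording a first-use publisher rather than accepting it silently as this sketch does.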