CCing @sethmlarson, since this came up as a (small) roadbump in PEP 740 (which is still in draft).
I’m filing this as a discussion thread, since there may be consequences to this idea that I haven’t thought through.
PyPI’s data model for packages looks like this:
Projects have names (e.g. requests)
Each project has zero or more releases, corresponding to versions (e.g. requests==2.31.0)
Each release has one or more files, corresponding to individual installable “distributions” of that release (e.g. requests-2.31.0.tar.gz for the sdist, requests-2.31.0-py3-none-any.whl for the wheel)
Many projects have >2 files per release, corresponding to wheel builds for various Python versions, architectures, host OSes, etc. A recent version of cryptography has 23 release files (1 sdist, 22 wheels).
Files within a release can “overlap” in terms of platform compatibility, and the package installer (e.g. pip) is responsible for selecting the “best” release file. I believe this “best” selection is not explicitly documented anywhere, but is probably something like “pick the most specific wheel for the current host, followed by any compatible wheel, followed by the source distribution.”
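That "most specific wheel first" behavior can be sketched as a toy model (this is not pip's actual implementation; the tag list and filenames are illustrative):

```python
# Toy sketch of installer file selection: prefer the wheel whose tag ranks
# earliest in a preference-ordered list of supported tags, fall back to the
# sdist. Real installers derive the tag list from the running interpreter.
def pick_best(files, supported_tags):
    """Return the preferred file for this (hypothetical) host."""
    def rank(fname):
        if not fname.endswith(".whl"):
            return len(supported_tags)  # sdist: lowest preference
        # Wheel filenames end in python-abi-platform tags.
        tag = "-".join(fname[:-4].split("-")[-3:])
        return supported_tags.index(tag) if tag in supported_tags else None
    ranked = [(rank(f), f) for f in files]
    return min((r, f) for r, f in ranked if r is not None)[1]

# Most-specific tags first, generic pure-Python last.
tags = ["cp312-cp312-manylinux_2_17_x86_64", "py3-none-any"]
files = [
    "spam-1.2.3.tar.gz",
    "spam-1.2.3-py3-none-any.whl",
    "spam-1.2.3-cp312-cp312-manylinux_2_17_x86_64.whl",
]
print(pick_best(files, tags))  # the cp312 manylinux wheel wins
```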
The data model above is fine, in terms of ensuring that installers can select a compatible non-source distribution.
However, PyPI’s implementation of it has a quirk: every release is open ended, in the sense that a maintainer of project spam can add new files to spam==1.2.3 long after the first (or last) previous file was uploaded.
This has a few arguably negative effects:
Maintainer confusion on uploads: a maintainer looking to publish a new version of an old package may forget to change the version string, and instead silently upload new wheels to an old package version. This is both hard to debug (since the index reports no error) and can cause downstream confusion, since the new distributions are not necessarily compatible with the old ones (and worse, may be silently preferred during installation due to their greater specificity).
Unnecessary pessimization during resolution: an optimal backtracking dependency resolver needs to pessimistically assume that the set of files for each release under consideration may change at any point, potentially causing a cascade of recomputed dependencies if a release gains an additional file in the middle of resolution. I’m not sure if current major installers attempt to be “optimal” in this way (it’s arguably not worth it!), but having a guarantee that the release’s files will not change would enable further optimizations/caching here, I think.
Security: in its current form, PyPI is uniquely vulnerable to a form of dependency compromise that other packaging indices are not: an attacker who wants to compromise users of spam without signaling their presence via a new release can instead create new more-specific distributions in current releases, relying on dependency resolvers to faithfully select the most specific wheel for use. This even affects users who pin (e.g. spam==1.2.3) since version pins have no effect on wheel resolution, although it doesn’t affect users with pre-existing hash pins.
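The reason hash pins are unaffected can be shown in a few lines: a later-uploaded "more specific" wheel necessarily has a different digest, so verification fails regardless of which file the resolver selects (a minimal illustration, mimicking pip's `--hash` checking rather than reproducing it):

```python
import hashlib

def check_hash_pin(artifact_bytes, pinned_sha256):
    """Reject any file whose digest differs from the pinned one,
    no matter how the resolver chose it."""
    return hashlib.sha256(artifact_bytes).hexdigest() == pinned_sha256

original = b"original wheel contents"
pin = hashlib.sha256(original).hexdigest()
print(check_hash_pin(original, pin))            # the pinned file passes
print(check_hash_pin(b"attacker wheel", pin))   # a newer upload is rejected
```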
My proposal: PyPI should put a cap on the “open ended” nature of each release. What that looks like needs further discussion, but here are some initial ideas to get the conversation started:
Prevent additional file uploads to a release, beginning X hours after the first file upload. This effectively gives packagers X hours to upload all files to a release, after which the release is “frozen” and can no longer be modified (in terms of attached files). The value of X would need to be decided.
Prevent additional file uploads to a release, beginning X hours after the last file upload. This is like (1), except that it keeps the release “open” potentially indefinitely, so long as the uploader continues to upload files within the window. The primary advantage of (2) over (1) is that it allows a little more flexibility in the event of long-running publish jobs or release mishaps.
Prevent additional file uploads to a release, as soon as a new stable (i.e. non-beta/rc/etc) release is made. In other words: once spam==1.2.4 is released, spam==1.2.3 is considered “frozen”.
Some combination of (1)+(3) or (2)+(3)?
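Options (1) and (2) amount to the same check with a different anchor point; a sketch of that logic (hypothetical names; X is the still-undecided window):

```python
# Sketch of the proposed freeze rules. Option (1) anchors the window on the
# first file upload; option (2) anchors it on the most recent upload.
from datetime import datetime, timedelta, timezone

def is_frozen(upload_times, now, window_hours, anchor="first"):
    """Return True once the release should reject new files."""
    if not upload_times:
        return False  # no files yet, nothing to freeze
    base = min(upload_times) if anchor == "first" else max(upload_times)
    return now - base > timedelta(hours=window_hours)

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
uploads = [t0, t0 + timedelta(hours=10)]
now = t0 + timedelta(hours=30)
print(is_frozen(uploads, now, 24, anchor="first"))  # True under option (1)
print(is_frozen(uploads, now, 24, anchor="last"))   # False under option (2)
```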
I’m curious to hear what people think of this idea! In particular, I’d like to hear about legitimate use cases for “open ended” releases, and whether this is something anybody in the community is currently relying on.
This is not technically indefinite, since the user would eventually run out of valid wheel tag combinations. But it would be much longer than the hard X hour window, if a user is determined to keep a release open.
A pretty frequent request is the ability to “stage” uploads. Right now as soon as I upload my first wheel, that’s the latest version and everyone will try to download that version, even though it’ll take at least a few seconds to upload the rest of the artifacts.
It seems to me that solving that problem is a great way to solve the open ended problem. If, instead of just uploading things one at a time that immediately went live, the workflow was something like:
Create “staged” release
Upload all artifacts to that release
Publish that release
Then it’d be straightforward and obvious to say “no new artifacts after that”.
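A hypothetical sketch of that workflow as a state machine (none of these states or calls exist on PyPI today):

```python
class StagedRelease:
    """Hypothetical 'staged' release: files may only be attached before the
    explicit publish step, after which the file set is frozen."""

    def __init__(self):
        self.published = False
        self.files = []

    def upload(self, filename):
        if self.published:
            raise RuntimeError("release already published; no new artifacts")
        self.files.append(filename)

    def publish(self):
        # From this point on the release is visible to installers and frozen.
        self.published = True

release = StagedRelease()
release.upload("spam-1.2.3.tar.gz")
release.upload("spam-1.2.3-py3-none-any.whl")
release.publish()  # any later upload() now fails
```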
At one point there was a discussion about “draft” releases, and as I recall one of the ideas was that any release would be in draft (essentially yanked-by-default) until it was “finished” by some explicit action (another API call). That would be a neat way to resolve the feature side of this, as you could then freeze file uploads after the finishing step.
You could also just allow an API call to lock the release, which is extensible to the full “draft” feature if that is ever agreed upon.
In terms of doing things automatically, most automated build/publish processes are going to do all the builds, then some basic testing, and then publish all the files together, so those could easily handle quite a short window after the first publish.
However, I’ve previously advocated/suggested that distributing a manual build process (like we do with CPython itself, where one person tags the sources and others produce the binaries) should be encouraged for cases where a maintenance team doesn’t feel confident covering all platforms. I’m not sure how much this is actually happening, but if I were doing it I’d want a 48 hour window at least. That feels long, and an opt-out option feels expensive.
I think (3) is ruled out by bugfix releases. People need to release 1.1.2 and 1.2.3 simultaneously, and you can’t consider 1.1.2 locked immediately just because there’s already a 1.2.2. The same window applies as above, I believe, so it doesn’t really gain anything.
What should definitely be available though is the publish date/time of each file, and I think it’s reasonable to highlight (either in PyPI directly, or dedicated scorecards/tools) when there’s a significant discrepancy between release times. It’s not necessarily bad, but it’s worth manually checking whether it’s expected.
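Those per-file publish times are already exposed by PyPI's JSON API (`https://pypi.org/pypi/<name>/<version>/json`, whose `urls` entries carry an `upload_time_iso_8601` field), so the discrepancy check could be sketched like this (operating on an already-fetched response; the threshold is an arbitrary illustration):

```python
from datetime import datetime, timedelta

def upload_spread(release_json):
    """Gap between the earliest and latest file upload in a release."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in release_json["urls"]
    ]
    return max(times) - min(times)

def flag_late_uploads(release_json, threshold=timedelta(days=7)):
    """True when the release is worth a manual look, per the suggestion above."""
    return upload_spread(release_json) > threshold

example = {"urls": [{"upload_time_iso_8601": "2024-01-01T00:00:00Z"},
                    {"upload_time_iso_8601": "2024-01-10T00:00:00Z"}]}
print(upload_spread(example))  # nine days between first and last file
```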
There are significant upsides to this proposal for security as well as caching, so I like the idea in principle. I’d like to point out, though, that it completely removes the use of build numbers, which let you deal with an issue in a single wheel - see What to do when you botch a release on PyPI. Since you also can’t delete individual wheels, all that’s left is yanking or deleting the whole release.
MarkupSafe (and probably other projects with platform wheels) currently takes advantage of this property. When a new version of Python is released, we can start a build for the wheels for that version. The publishing workflow is the same as the original wheels, using trusted publishing to upload to PyPI. In this way, we can add wheel support for the currently released version without having to create an entire new release. MarkupSafe doesn’t change much anymore, so an entire new version just to add a new Python version would just cause unneeded bumping and downloads for all other users.
Conceptually I’m in favor of moving in this direction for PyPI; the “every release is open-ended” property is a common gotcha for folks designing security systems and policy, since we’re one of the only ecosystems (if not the only one) that has it.
Draft releases would be the best solution since it offers benefits alongside the removal of capabilities.
IMO this is the use-case that’s most served by PyPI having open-ended releases that doesn’t have a replacement in “draft” releases or release windows expiring. I’m less convinced by the build numbers or “fixing a subset of wheels” use-cases, if something is actively broken then a new version seems like a good idea?
This doesn’t seem true - pretty much any ecosystem that deals with distributing and installing binaries has the option of updating builds. Linux distros, Conda-forge, Nix, Spack, etc. all work this way. Perhaps you’re only thinking of source releases on language-specific packaging indexes?
Do you think doing abi3 builds would help here? I see that MarkupSafe has a bunch of individual per-CPython-version wheel builds, which should be collapsible into singular abi3 wheels. That wouldn’t completely eliminate your matrix, but I think it’d avoid the need to push new files at all when new Python versions are released.
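For reference, an abi3 build is typically configured along these lines with setuptools (an illustrative sketch; the module and source names are hypothetical, and the minimum version chosen here is arbitrary):

```python
# Illustrative setup.py for a stable-ABI (abi3) extension module.
from setuptools import Extension, setup

setup(
    ext_modules=[
        Extension(
            "spam._speedups",                 # hypothetical module name
            sources=["src/spam/_speedups.c"],
            define_macros=[("Py_LIMITED_API", "0x03080000")],  # CPython 3.8+
            py_limited_api=True,
        )
    ],
    # Tags the wheel cp38-abi3-* instead of one wheel per CPython version.
    options={"bdist_wheel": {"py_limited_api": "cp38"}},
)
```

One such wheel then covers every later CPython release, which is what removes the need to push new files into old releases.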
In your estimation as a package maintainer, would you consider this an acceptable tradeoff to make? In other words: if PyPI were to (not that it necessarily will) go down this route, do you think your project would be the target of ire from users who receive the occasional spurious bump (which should be even more spurious with abi3)?
Yep, I was thinking of language-specific packaging indices in that original comment.
However, for more general indices: do those indices support new variants of the same release in the same way that PyPI does? My understanding is that many non-language indices support some kind of rebuild + reupload, but that the rebuild overwrites the existing artifact for a specific target. This is distinct from the dependency compromise vector I originally mentioned, which involves both artifacts still being present but the dependency resolver selecting the newer one because Python packaging allows for multiple compatible distributions.
Couldn’t this be solved easily by an extra prompt in twine:
Note: you are uploading a new distribution for an existing release. Is this what you want? [y/n]
whenever the user tries to upload a new distribution for a version that already has distributions which were all uploaded more than (say) a day ago? IMHO, this doesn’t require outright forbidding such uploads.
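The client-side check being suggested is small (a hypothetical sketch, not an existing twine feature):

```python
from datetime import datetime, timedelta, timezone

def should_prompt(existing_upload_times, now, window=timedelta(days=1)):
    """Prompt only when every file already on the release is older than the
    window, i.e. this looks like a forgotten version bump."""
    if not existing_upload_times:
        return False  # brand-new release: nothing to confirm
    return now - max(existing_upload_times) > window

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(should_prompt([now - timedelta(days=30)], now))  # True: stale release
print(should_prompt([now - timedelta(hours=2)], now))  # False: publish in progress
```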
If security is a concern, isn’t it necessary to check hashes anyway? To me, this sounds like the real solution here is to pick up the work on standardizing a lock file format.
It can be useful when you repackage software that you don’t have control upon. For example, Ninja is redistributed on PyPI by the scikit-build folks. The PyPI package is not maintained by the upstream Ninja authors. In such cases, having to ask for a new release upstream if you botch something is not great.
Wouldn’t overwriting the existing artifact for the specific target have the same compromise vector? A new artifact is uploaded and is selected, either because it’s been overwritten or because we’ve allowed multiple compatible distributions.
I think that this proposal does close a real capability that could be used to attack people in a way that is, by default, pretty “quiet”. I also think that capability can, at least in theory, be used for useful and positive reasons.
I don’t think that any real decision one way could be made here unless we get some real numbers behind how often people actually use that capability for useful and positive reasons, and even then we should consider if there are other mechanisms we can put into place to mitigate without removing that capability.
Lock files (with hashes) do solve this, but unless we get to a place where they are emitted and used by default, that provides a much more limited impact than disallowing open-ended uploads does. That’s not to say it is a bad solution; that’s just one of the trade-offs of that solution.
Similar, but not identical: one is arguably more surreptitious than the other, since the attacker in PyPI’s case can pinpoint specific targets to deploy a compromised file to, versus having to risk general discovery.
(This is arguably a pretty thin hair to split. But I think it’s worth pointing out as part of how PyPI’s model for names/releases/files is subtly different!)
Fully agreed! I don’t think an actionable PEP can come out of this discussion until we have concrete numbers (which I’ll work on).
You see it has a version number and a build number - in this case containing info on the exact Python interpreter built against, plus an increasing 0, 1, ... build number that should be bumped when rebuilding the same version. What exactly changes in a rebuild is flexible. You may add new architectures for example, e.g. via changing skip conditions in the recipe:
skip: true # [py<39]
skip: true # [(aarch64 or ppc64le) and python_impl == "pypy"]
It’s very similar though, both may silently change the result one may get from an install command. I wouldn’t worry about the differences here.
To be clear - is the vulnerability here in the situation where a malicious party has managed to gain the ability to upload to PyPI a new artifact for an existing project and release? Because if that’s the case, surely they already have sufficient power to do all sorts of damaging things, and this one is probably the least of our worries.
I’m unclear why this attack vector is considered significant, when (for example) we’re still awaiting an implementation of PEP 708, to address dependency confusion attacks (which it seems to me are a much more serious and realistic threat).
I don’t want to look like I’m arguing that because we haven’t fixed one problem, we should ignore all others. But on the other hand, if we’re trying to make progress with limited resources, maybe we’d be better focusing on existing work, and giving the PyPI maintainers a chance to keep up?
I think there’s still an important distinction here: a build number is part of an explicit ordering during dependency resolution. “Post-releases” play a similar role to build numbers in PEP 440, and have an explicit ordering.
In contrast, there is no explicit ordering between compatible release distributions. Package installers generally try to select the “most” specific distribution, but even specificity is not well defined (e.g., it’s unclear whether a wheel with a single tag set “should” be preferred over a different wheel with a compressed tag set that contains the former).
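That explicit build-tag ordering is visible right in the wheel filename (`{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl`, per the wheel filename convention); a minimal parser sketch, with hypothetical filenames (real installers use `packaging.utils.parse_wheel_filename` instead):

```python
# Build tags start with digits and sort numerically, giving an explicit
# ordering between otherwise-identical wheels of the same version.
def build_tag(wheel_filename):
    parts = wheel_filename[:-len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: 6 parts means a build tag.
    if len(parts) == 6:
        digits = ""
        for ch in parts[2]:
            if not ch.isdigit():
                break
            digits += ch
        return (int(digits or 0), parts[2][len(digits):])
    return (0, "")  # no build tag sorts lowest

wheels = ["spam-1.2.3-cp39-cp39-linux_x86_64.whl",
          "spam-1.2.3-1-cp39-cp39-linux_x86_64.whl"]
print(max(wheels, key=build_tag))  # the build-1 rebuild wins deterministically
```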
This is arguably very pedantic. But I think these kinds of quirks are worth enumerating, even if we ultimately don’t consider them sufficient justification for making changes to PyPI’s release behavior.
I think I’ve caused a lot of confusion by including a security point in my original post. To make things explicit: I do not think this is a significant attack vector, especially compared to other kinds of dependency attacks.
A summarized version of my position is this: the fact that PyPI releases are “forever open” feels like a quirk to me (one that the community has evolved practices around!), rather than something that was originally intended. The proximate reason for that quirk is a limitation in PyPI’s “legacy” upload API around single file uploads, not an explicit design decision to keep releases open forever. This ends up having weird consequences for Python packaging’s data model and ordering logic, a tiny part of which might have security consequences. But 99% of the “itch” here is it just being a bit quirky.
I don’t think fixing this is a significant priority, and I’m certainly not volunteering the PyPI maintainers (or myself) to do any immediate work on this here – there’s plenty of other stuff that needs doing.
This is a fair point. The wheel specification is unfortunately vague in a number of areas. It was created at a time when the stakes were a lot lower, and the level of precision that’s now necessary wasn’t the norm. A proposal to tighten up this area would be welcome.
Again, I think it’s more that it was a design that made sense when PyPI was developed (originally, PyPI didn’t even do the hosting - it was literally just an index - and it was almost entirely a source-only repository). Not so much a “quirk” as a design choice that was made under different constraints. And now, while we could change that design, we need to accept that backward compatibility concerns make it harder than it would be if we designed something from scratch.
It’s a fair point to make, but it’s certainly not something I can get worked up about, personally.
Do you have any examples of this happening in real life? This seems unlikely to happen, especially if the maintainer prepared wheels for the same platforms as for the previous version (as it seems unlikely that there would be no common OS and Python versions between two releases).
Using different data to resolve the same package in different places sounds more likely to cause unresolvable dependencies. If the resolver were to fetch details of a package X multiple times, what if the dependency specification was open-ended (no maximum/specific version) and the maintainer of X uploaded a new release while dependencies were being resolved?