Pre-Pep: Staged Releases separated from PEP-694

I want to be very precise about the terminology I’m using, which I’ve also been careful with in PEP 694. I think this will explain why I prefer to use the term “quarantine” for this particular feature rather than “staging”.

In 694, “staging” refers to an addressable pre-release container of release artifacts. My (rough) analogy is like staging a house you’re selling: you’re setting it up with furniture and knickknacks so someone can get a sense of what it will look like before they’ve bought it. 694’s stages are like that: you can see what a release will look like before it’s published. It’s “addressable” because it means you could point your installer at the stage and do live testing of the release, again before it’s published. You can also modify the contents before it’s published (“hmm, that couch would look better over there”).

The feature being discussed here isn’t a stage in that sense, hence my preference for the term “quarantine”. The scope of the quarantine isn’t important, but the fact that the artifact isn’t installable (or discoverable) is important.

Agreed, although s/staging/quarantine. Cooldowns are an important piece of the puzzle, but I agree that they are installer-facing, and thus under direct user control.

Staged or not (in the 694 sense), having an index scan uploads for known vulnerabilities is a very useful feature, but it’s separate from either the uploader or installer. It’s also an index-specific feature and thus may not need to be defined in a PEP. An index (like PyPI) could just say “there is an inherent delay of 5 minutes between uploading an artifact and its (implicit) publishing so that our index can scan it.” If there are no API or interoperability concerns, then it might not need a PEP.

What exactly is “the gatekeeping idea”? Is it just the scanning delay I described above?

First, in 694, the ability to create a stage as part of the multi-artifact upload process is not a requirement of all indexes. I wouldn’t be surprised if only PyPI implements staging in the 694 sense.

Second, I don’t see a way to implement staging with the implicit immediate publishing process of the legacy upload mechanism without some kind of new API. And if you’re going to implement a new API, then I think 694 is the right way to go. I’m open to suggestions for how to make the current legacy mechanism work with 694’s definition of stages, but I can’t see it. If we’ll need to add APIs, then do we want multiple likely different APIs? Probably not.

I can imagine an auto-quarantine for individual files uploaded using the legacy mechanism though and I think that would be a really useful feature for PyPI. I’d want to have a plan for the scanning side of the equation though, in order to make auto-quarantine useful.

Hmm, I feel scanning for vulnerabilities is actually yet another corner - weren’t we speaking about detecting malware? I think that’s a slightly different thing, e.g. there’s no need to hold a release because it contains a vulnerability.

I agree on that, but I again feel confused about the scope. If we want to delay 5 minutes for malware detection, I refer again to my question about scanning sources and their quality. But I totally agree this is an index feature that doesn’t need a PEP.

Exactly, what was also previously referred to as the “security gate”.

Fully agree, and to be clear - I was thinking about PyPI as the primary consumer.

I think I wasn’t precise here. When I talked about “being able to install”, the only thing I meant in this context was “the release file the index will later serve is public”. I’m aware this is a simplified situation and won’t fully work if the release depends on held dependencies, but my understanding was that we are thinking about a simplified solution.

+1

I think we really need to define the scope because we are mixing a few things that can, but do not have to, be integrated into one solution (and even if I try to separate them, I’m still guilty of trying to solve multiple problems at once). I was focused on “staging”, and thus pushing toward maintainer control and public visibility, but if the focus is on integrating scanning into the publishing process, then I see no point in messing too much with staged publishing.

(the only digression: as already said, I also see a publishing policy requiring manual approval as a significant feature, also index-specific and likely not requiring a PEP - but maybe it should be a separate discussion topic?)

What do you mean exactly by “auto-quarantine”? Do you mean the delay for scanning, or automatically putting the release into quarantine after qualified reports? It confuses me because PyPI uses this term for the second (and it already exists), and my vision for any pre-publishing scanning is that putting a release into quarantine would be one possible outcome of it.

Thanks for publishing this @cjames23 - I’ve read through it once but definitely need to digest and compare against 694 a bit more, so take my questions and comments as fumblings in the semi-dark.

I now think you are effectively using “staging” in the same sense as 694, to some common extent, if different around the edges. I think your PEP is also allowing for implicit creation of a stage, which would also allow for atomic publishing.

A client requests that an uploaded file enter a staged release, rather than being published immediately, by including an additional form field staged with the value true in the legacy multipart/form-data upload request. All files uploaded with staged=true for the same normalized project name and version join the same staged release.

So this isn’t just a per-artifact construct; you really are allowing for multiple artifacts to now live in a common staging area. That’s a change in behavior that 694 makes explicit, but this PEP adds implicitly. A related question: what happens if I have 5 artifacts for a particular name-version pair, and forget to add staged=true for one of them? I assume the one with the missing key would get published immediately and not to a stage. What happens when the stage gets published? Is it possible to upload two duplicate artifacts, one to the stage and one not? How would that conflict be resolved?
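To make the grouping and the “forgot staged=true” case concrete, here’s a rough sketch of how I read the quoted text. Everything in it is hypothetical: `ToyIndex`, the dict-based form, and the `filename` key are stand-ins, not the real upload API or PyPI’s data model, and the “publish immediately when the field is missing” branch is only my assumption from above. The name normalization is the PEP 503 rule.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Project name normalization as defined in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


class ToyIndex:
    """Hypothetical in-memory index, for illustration only."""

    def __init__(self):
        self.published = defaultdict(set)  # (name, version) -> filenames
        self.staged = defaultdict(set)     # (name, version) -> filenames

    def upload(self, form: dict) -> str:
        key = (normalize(form["name"]), form["version"])
        # Assumption: only the literal value "true" opts in to staging.
        if form.get("staged") == "true":
            self.staged[key].add(form["filename"])
            return "staged"
        # Assumption: a missing field falls back to immediate publishing.
        self.published[key].add(form["filename"])
        return "published"


index = ToyIndex()
index.upload({"name": "My_Pkg", "version": "1.0",
              "filename": "my_pkg-1.0.tar.gz", "staged": "true"})
index.upload({"name": "my.pkg", "version": "1.0",
              "filename": "my_pkg-1.0-py3-none-any.whl", "staged": "true"})
# The maintainer forgot staged=true on this one.
index.upload({"name": "my-pkg", "version": "1.0",
              "filename": "my_pkg-1.0-cp312-cp312-win_amd64.whl"})
```

Under those assumptions the same name-version pair ends up split across two places: two files in the stage and one already public, which is exactly the situation the questions above are asking about.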

694 explicitly wants to allow fully private releases, for testing and embargo purposes. This PEP doesn’t allow that IIUC. This PEP also isn’t explicit about some behavior, like, what if an artifact in the stage is corrupt? 694 explicitly allows mutability (overwrites, deletes) of artifacts in a stage, and then once published, all artifacts become immutable just like they are with the legacy upload.

The security gate described in this PEP is really an index-specific policy. The PEP could be clearer about the aspects which are protocol (and thus interop) requirements, and which are index-specific policy decisions for which PyPI or other indexes could make different decisions. An example of that would be this PEP’s fail-open timeout. PyPI could be 5 minutes, Artifactory could be 15 minutes, a GitLab package registry could be immediate. How would clients discover the policy and know how to drive the stage?


Another thought occurred to me as I read your PEP. We probably could turn this “inside out” so to speak. Meaning, 694 defines extensible upload mechanisms, with a default that every index must provide. I could almost see legacy upload as one of those mechanisms. Meaning, if I used 694 to create a stage and then used the stage token in a legacy upload, you could merge these ideas in a different way. You wouldn’t need two new endpoints, and since you are changing the legacy payload anyway (staged=true), it seems to me a 6/half-dozen difference[1].


  1. Mostly; there’s still the implicit stage creation aspect of your PEP.

I personally want us to be more careful about requiring web UI login for controlling certain behaviors. We already have a number of things that can only be done in the web UI (e.g. create a project name in an org or move a project to an org) and that causes a lot of scaling problems for enterprises with zillions of packages. We’ve talked elsewhere about scriptable (REST) APIs and there are some technical hurdles to get past first, but it would be super inconvenient if more functionality is locked behind a web UI login only.

There are important use cases for it though. Public scanning of staged releases is at odds with private pre-release testing. I think we can probably do both if we think about it more. Plus, we may not need to make all staged releases public, e.g. if the index itself will security scan or allow for trusted partner clients to access staged releases for scanning. We’d have to work out some permissions with the latter though. An enterprise that’s doing its own scanning and wants embargoed atomic release might not be happy allowing third parties, even trusted ones, to scan their uploads before publishing.

What are we trying to solve with workflows like this though? If I’m a bad actor, can’t I just never enable secure publishing? Secure publishing might be useful for a good actor who doesn’t have the resources to do their own scanning and wants some peace of mind that third parties will do it before their users get their hands on new releases.

There’s another gap that occurs to me: not all security issues are discoverable at the time of upload. What do you do about post-upload dependency attacks? I release foo 1.0 that depends on bar < 2. Today bar 1.9 is safe, but tomorrow a malicious actor uploads bar 1.10 with an attack. foo 1.0 hasn’t changed but is now vulnerable. It’s already been published so it can’t be scanned.
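A tiny sketch of that timeline, assuming plain dotted numeric versions and a toy resolver that just picks the highest version under the bound. `parse` and `best_match` are made-up helpers for illustration, not how any real installer resolves dependencies.

```python
def parse(version: str) -> tuple:
    """Parse a simple dotted numeric version such as '1.10' into (1, 10)."""
    return tuple(int(part) for part in version.split("."))


def best_match(available: list, upper_bound: str):
    """Pick the highest available version strictly below upper_bound."""
    candidates = [v for v in available if parse(v) < parse(upper_bound)]
    return max(candidates, key=parse) if candidates else None


# foo 1.0 declares "bar < 2" and never changes.
today = ["1.9"]
tomorrow = today + ["1.10"]  # the malicious upload still satisfies bar < 2

resolved_today = best_match(today, "2")
resolved_tomorrow = best_match(tomorrow, "2")
```

foo 1.0’s metadata is identical on both days, but a fresh install picks up bar 1.9 today and bar 1.10 tomorrow, so an upload-time scan of foo has nothing new to look at.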

When I wrote that I was thinking that new uploads would automatically get the quarantine flag and the effects would be just the same as the post-publishing quarantine. Only after whatever scans occur give the artifact/release a clean bill of health would the auto-quarantine flag be removed. It could still be applied after the fact for legitimate malware reports.
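Roughly, the lifecycle I had in mind looks like the sketch below. The `Artifact` class and the `record_scan` / `report_malware` helpers are invented names for illustration; this is not how PyPI’s existing quarantine is implemented.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    filename: str
    quarantined: bool = True  # assumption: every new upload starts quarantined

    @property
    def installable(self) -> bool:
        # Quarantined artifacts are neither installable nor discoverable.
        return not self.quarantined


def record_scan(artifact: Artifact, clean: bool) -> None:
    """Lift the auto-quarantine only on a clean bill of health."""
    if clean:
        artifact.quarantined = False


def report_malware(artifact: Artifact) -> None:
    """Re-apply quarantine after the fact for a legitimate report."""
    artifact.quarantined = True


artifact = Artifact("foo-1.0.tar.gz")
states = [artifact.installable]       # freshly uploaded
record_scan(artifact, clean=True)
states.append(artifact.installable)   # scan passed
report_malware(artifact)
states.append(artifact.installable)   # later malware report
```

The point is that it’s the same flag with the same effects before and after publishing; only the trigger for setting or clearing it differs.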

I do not want to put any words into your mouth here, but as I read this, it would imply a belief that the only way to discover malicious packages is to have someone become a victim of the attack. There are plenty of ways heuristic-based scanners can apply here. Let’s take Miasma as an example, which used .pth files to download bun and then run a credential-harvesting JavaScript file, whereas the previous versions of the infected libraries did not contain a .pth file at all. So we could at the very least use those types of heuristics here and check for known attack styles. Having an auto-quarantine with a heuristic-based scanner or even something like socket.dev, if Mike and others are open to it and want to try to get the OSS offering from them, would then catch these without putting any users at risk.
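As a minimal sketch of the kind of heuristic I mean, here is a check that flags .pth files appearing in a new wheel when no earlier wheel of the project shipped one. The helper names and file names are made up for illustration, the “wheels” are just in-memory zip files, and a real scanner would obviously look at far more than this one signal.

```python
import io
import zipfile


def make_wheel(names):
    """Build an in-memory zip standing in for a wheel (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    buf.seek(0)
    return buf


def pth_files(wheel) -> set:
    """Return the names of all .pth files inside the archive."""
    with zipfile.ZipFile(wheel) as zf:
        return {n for n in zf.namelist() if n.endswith(".pth")}


def newly_added_pth(previous_wheels, new_wheel) -> set:
    """Return .pth files in new_wheel that no previous wheel contained."""
    seen = set()
    for wheel in previous_wheels:
        seen |= pth_files(wheel)
    return pth_files(new_wheel) - seen


old = make_wheel(["foo/__init__.py", "foo-1.0.dist-info/METADATA"])
new = make_wheel(["foo/__init__.py", "foo_init.pth", "foo-1.1.dist-info/METADATA"])
suspicious = newly_added_pth([old], new)
```

A non-empty result would not prove malice on its own, but it is a cheap signal that could hold the upload in quarantine for a closer look.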

I will have to go through the rest of your response and try to answer your questions but I think this one is maybe critical to answer. I think there are multiple forms of staging and that they can all use the same underlying mechanisms. What this boils down to is that 0…N artifacts associated with a given package version are uploaded, whether via the Upload 2.0 API or legacy. Those artifacts are then put into a staged state, which can be explicit or implicit. I would argue that we can make going into staging implicit, but moving out of staging can be explicit or implicit depending on the type of staging happening. I would lean toward explicit, even with scanning, where we say we are staging this until either the scanner completes or a time window has elapsed.
(The reason for the time window here is to not be completely disruptive and accept that no signal should be treated the same as a good signal until proven otherwise.) Once that happens, a maintainer will need to explicitly take an action to move the artifacts into an available state. I do not think this needs to be a UI feature necessarily, but I did lean into Kamil’s idea of a separate token, so that compromising the process requires both credentials to be compromised.
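Here is a rough sketch of that flow, with the caveat that every name in it (`StagedRelease`, `scan_window`, `publish_token`) and the 300-second default are placeholders I made up, not anything from the draft. It models implicit entry into staging, a fail-open window where no signal counts as a good signal, and an explicit publish step gated on a second credential.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class StagedRelease:
    staged_at: float                    # seconds on any consistent clock
    scan_window: float = 300.0          # hypothetical fail-open window
    scan_result: Optional[bool] = None  # None means no signal yet
    published: bool = False

    def releasable(self, now: float) -> bool:
        if self.scan_result is False:
            return False                # a bad signal always blocks
        if self.scan_result is True:
            return True
        # No signal: treated as good once the window has elapsed.
        return now - self.staged_at >= self.scan_window

    def publish(self, now: float, token: str, publish_token: str) -> bool:
        """Explicit maintainer action using a separate publish credential."""
        if hmac.compare_digest(token, publish_token) and self.releasable(now):
            self.published = True
        return self.published


release = StagedRelease(staged_at=0.0)
too_early = release.publish(10.0, "publish-secret", "publish-secret")
wrong_token = release.publish(400.0, "upload-secret", "publish-secret")
ok = release.publish(400.0, "publish-secret", "publish-secret")
```

Nothing leaves staging on its own here: even after the window elapses, the artifacts only become available once someone presents the publish credential, which is distinct from the one used to upload.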