Feature Proposal for PyPI: Draft Releases

steve.dower · April 15, 2020, 9:20am

Good point, but can we change that? That doesn’t seem to be in the spec, and since the implementation isn’t finished yet presumably nobody is depending on it. (CDNs can be purged, I do it all the time)

I’m just trying to avoid us having so many disjointed UX flows for when users come to figure out how to use our stuff. We already have enough of those; we don’t need to deliberately design more.

FRidh · April 15, 2020, 2:53pm

I would very much like to be able to fetch artifacts using an SRI hash over their contents. E.g., one could fetch an artifact using https://pypi.org/project/artifacts/sha256-<insert hash>. Maybe it should still be prefixed or suffixed with a regular filename.

dstufft · April 16, 2020, 4:07pm

I started to touch on this before, but I wanted to bring it up explicitly. I don’t think I like the idea of this boolean flag. I think it’s likely to be a point of confusion for people as they switch between projects because they’re going to get vastly different behavior depending on some flag that isn’t obvious to them at the point of upload.

I can easily see people thinking they set it on a project when they actually haven’t, and accidentally publishing a release that they meant to be a staging release. Likewise I can imagine someone not realizing/remembering that this flag had been set, and just doing twine upload and thinking they were done and walking away.

In general, I don’t like behavior changes to be toggled by some out of band flag. We have some experience with this kind of mechanism (we used to have an “auto hide old releases” flag) and it was a semi regular source of confusion for people.

I would personally prefer it if we basically just always auto-published, but allowed passing along a flag in the upload API that says “this is for a draft release”, which would then do the right thing on the backend. This would require changes in twine, but I don’t see that as a major blocker; new features often require new versions of tooling.

I presume the implication here is that if I try to twine upload a second release version as a draft (as in I have 1.0 as a draft, and then upload 2.0 as a draft), it will generate an error? Or will it just build up a list of draft uploads that can all be published as a single entity?

I don’t think we want to do this redirection, I think once a release is no longer a draft, these URLs just stop working to highlight the transient nature of them. Nothing outside of a release process should really be relying on them. This isn’t something to get a semi private repository or something like that.

I’ve been thinking about this, and to start with I want to declare a few principles that I think we should ensure are satisfied by the ultimate solution here:

URLs should not be deterministic from any data available to the end user.
Draft releases should have a limited time frame of availability, and then the URL should stop working (but unless we expunge all of the released files, a new URL should be able to be retrieved that will be usable again for another period of time).

So given that, I’m going to riff on @EWDurbin’s suggestion here, and say that a format like /draft/{some hash}/ is what the base URL should be; this is what a user would put in their --extra-index-url flag. Inside of that would be a URL like /draft/{some hash}/{project name}/, so that it matches the PEP 503 repository URL structure.

So the biggest question to me here, then, is what should {some hash} be? I was thinking about this, and I don’t like any solution that uses something like Release.id, because I don’t like the idea of exposing primary keys like that. In my opinion, primary keys should be an implementation detail of the database, not a public-facing interface (and this includes anything that derives the hash from Release.id).

My suggestion here would be something like a Draft model, which is basically just a model that acts to store the {some hash}, and link it to the release that it is a draft for. We could also do it as a column on the existing Release model, like a Release.draft_slug or something, but I feel like a Draft.slug is a cleaner overall model.
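As a rough illustration of that separation, here is a minimal sketch using plain dataclasses rather than Warehouse’s actual ORM models. The Release and Draft classes, their fields, and the placeholder slug are all hypothetical; the only parts taken from the post are the /draft/{some hash}/ and /draft/{some hash}/{project name}/ URL shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Simplified stand-in for a release row (hypothetical fields)."""
    id: int          # primary key; never appears in any URL
    project: str
    version: str


@dataclass(frozen=True)
class Draft:
    """Hypothetical Draft model: maps an opaque slug to one release."""
    slug: str        # random value, generated independently of Release.id
    release: Release

    def index_url(self) -> str:
        # Base URL a user would pass to --extra-index-url.
        return f"/draft/{self.slug}/"

    def project_url(self) -> str:
        # Per-project page, mirroring the PEP 503 layout.
        return f"/draft/{self.slug}/{self.release.project}/"


release = Release(id=42, project="example-pkg", version="1.0")
draft = Draft(slug="placeholder-slug", release=release)
print(draft.index_url())
print(draft.project_url())
```

Because the slug is stored rather than computed, it can be regenerated or revoked without touching the release itself.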

As far as what that generated hash should be, I would just generate some random ID. Maybe something nice like generate 5 random words, and make the url something like https://pypi.org/drafts/ghost-due-concern-strip-fall/ to make it easier to pass these random URLs around between disparate systems (some of which may be manual). The other option is to just do:

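import secrets
import string

# Letters and digits only, so the slug contains no punctuation.
alphabet = string.ascii_letters + string.digits
# Draw each character with the cryptographically secure secrets module.
slug = ''.join(secrets.choice(alphabet) for _ in range(8))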

And get a URL like https://pypi.org/drafts/3AtfAvJgSMoT3wY/ which is both shorter, and “looks” more like an auto generated thing.

In the UI I would just make the URL as normal (http://pypi.org/p/{project}/{version}/), and add some appropriate UI elements to indicate that the URL you are currently viewing is for a draft release. I would not list draft releases on any other UI page available to the public, so you’d have to know the URL ahead of time to view it.

Yanking purposely solves a different use case, and has different public facing properties. Yanking a release is something that is publicly visible, publishing a draft release is not. Releases that are yanked or unyanked are subject to the immutability of files that PyPI currently has, draft releases would not be.

Could we change those things? Yea I mean it’s all just code, so it’s possible. However I think that would be a step backwards. It’s a good thing that we don’t let you replace foo-1.0.tar.gz with completely different contents the next day, and walking that back is not something I would be a fan of.

dstufft · April 16, 2020, 4:09pm

I think that this is unrelated to this proposal; it’s something worth discussing either in its own thread or as an issue on Warehouse.

njs · April 16, 2020, 9:20pm

+1 for twine upload --draft ....

(Possibly twine should also internally switch its “normal” publishing mode to work by creating a draft release, uploading all the given files into it, and then publishing it, so that all the files are published atomically. But that’s a tangent.)

Maybe the API should have verbs:

CreateDraftRelease(project) -> draft_release_id
AppendToDraftRelease(draft_release_id, artifact)
PublishDraftRelease(draft_release_id)
DiscardDraftRelease(draft_release_id)

?
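To make the four verbs concrete, here is a small in-memory sketch. It is only an illustration of the lifecycle, not Warehouse’s API: the DraftStore class, the method names, and the use of secrets.token_hex for the ID are all assumptions.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class DraftRelease:
    project: str
    artifacts: list = field(default_factory=list)


class DraftStore:
    """In-memory illustration only; every name here is hypothetical."""

    def __init__(self):
        self.drafts = {}      # draft_release_id -> DraftRelease
        self.published = []   # (project, [artifacts]) tuples

    def create_draft_release(self, project):
        draft_id = secrets.token_hex(8)
        self.drafts[draft_id] = DraftRelease(project)
        return draft_id

    def append_to_draft_release(self, draft_id, artifact):
        self.drafts[draft_id].artifacts.append(artifact)

    def publish_draft_release(self, draft_id):
        # All files become visible together, in one step.
        draft = self.drafts.pop(draft_id)
        self.published.append((draft.project, draft.artifacts))

    def discard_draft_release(self, draft_id):
        self.drafts.pop(draft_id)


store = DraftStore()
draft_id = store.create_draft_release("example-pkg")
store.append_to_draft_release(draft_id, "example_pkg-1.0.tar.gz")
store.append_to_draft_release(draft_id, "example_pkg-1.0-py3-none-any.whl")
store.publish_draft_release(draft_id)
print(store.published)
```

Publishing as a single step is what would give the atomic behaviour mentioned in the tangent above.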

It sounds like you’re saying that the draft release object and the draft release URL should have different, independent lifecycles? That seems confusing to me. I think it’s reasonable to say that unpublished draft releases get automatically discarded after some time (maybe one week?), but let’s keep the same URL for that whole time.

I think you want secrets.token_urlsafe(bytes_of_entropy)

dstufft · April 16, 2020, 9:39pm

Yea, I considered it, but didn’t really mention it, mostly because Warehouse certainly can’t change its default (at least in the existing API) so it’s not really super relevant. But you could imagine a twine upload --draft and twine upload --publish or something that gave explicit commands, and then the default could be managed to move from one to another.

I’m assuming a minimum number of changes to the existing upload API, so with that I’m basically suggesting adding a field to the current POST request that says “this is a draft upload”, and then Warehouse would determine if there is an in-progress draft, and if not create one, and if there is append to it. Publishing/discarding would likely be through the UI to start with. Longer term I can imagine the API having some verbs like what you said though.

Yea, I was kind of sketch on that idea to begin with TBH. Mentally I just didn’t like the idea of throwing away an upload implicitly, but it’s probably perfectly fine to say that after some period of time, a draft release just gets purged along with any relevant uploads.

Nope! I picked what I did carefully: I didn’t want the punctuation marks that exist in a urlsafe token, because using only alphanumerics (and possibly even just letters) creates a nicer-looking random slug than a urlsafe b64.
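For comparison, the two approaches can be put side by side. This is just a sketch: the 15-character length and the 11-byte token size are assumptions chosen so both slugs come out the same length, and neither is a value anyone in the thread settled on.

```python
import math
import secrets
import string

alphabet = string.ascii_letters + string.digits  # 62 symbols

# Alphanumeric slug: never contains punctuation.
alnum_slug = "".join(secrets.choice(alphabet) for _ in range(15))

# URL-safe base64 token: may contain "-" and "_".
b64_slug = secrets.token_urlsafe(11)  # 11 random bytes = 88 bits

# Entropy of the alphanumeric slug, in bits.
alnum_bits = 15 * math.log2(len(alphabet))

print(alnum_slug, round(alnum_bits, 1))
print(b64_slug)
```

At the same length the two carry roughly the same entropy, so the choice is mostly about how the slug looks and how easily it is copied around.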

njs · April 16, 2020, 9:44pm

It might end up being simpler to create a new clean-slate upload API for this, in order to keep the semantics boring and explicit and avoid the “check for an existing draft, check if the versions match, …” heuristics. But I’m not maintaining warehouse so I’ll defer to y’all on that.

dstufft · April 17, 2020, 4:34am

It’s pretty trivial; we already have to look up and/or create a Release object for every upload. Depending on how we modeled it, it would either be an extra query to look up the DraftRelease object for that release, or it would just already be there as an attribute on Release or something like that. It’s not at all difficult.

ncoghlan · April 19, 2020, 7:43am

Definite +1 for keeping “auto-publish” as the default for all projects, and offering an opt-out in the upload API to make use of the new draft release feature.

Regarding the URL, and the create-or-append semantics for draft releases, the combination of “only one release in draft at a time” and “create-or-append is implicit” seems problematic to me, as I could easily see a situation where a project with maintenance branches tags multiple branches at once while responding to a security issue, and then an automated CI pipeline kicks off making a separate release for each of those branches. With only one draft per project permitted at a time, any project with long-lived maintenance branches would need serialisation logic in place to ensure releases weren’t being made from multiple branches at the same time.

That concern gets significantly reduced if the restriction is “one draft in preparation per version per project” (likely accompanied by a cleanup algorithm like “unpublished drafts will expire after 14 days without modification”).
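A minimal sketch of that rule, assuming drafts are keyed by (project, version) and expire 14 days after their last modification. The dictionary layout and the upload_draft and expire_drafts helpers are hypothetical, used only to show that two maintenance branches can upload in parallel without colliding.

```python
from datetime import datetime, timedelta, timezone

DRAFT_TTL = timedelta(days=14)  # assumed value, taken from the suggestion above

# (project, version) -> {"files": [...], "modified": datetime}
drafts = {}


def upload_draft(project, version, filename, now=None):
    """Create the draft for this project/version if needed, then append."""
    now = now or datetime.now(timezone.utc)
    draft = drafts.setdefault((project, version), {"files": [], "modified": now})
    draft["files"].append(filename)
    draft["modified"] = now
    return draft


def expire_drafts(now=None):
    """Drop drafts that have not been modified within DRAFT_TTL."""
    now = now or datetime.now(timezone.utc)
    stale = [key for key, d in drafts.items() if now - d["modified"] > DRAFT_TTL]
    for key in stale:
        del drafts[key]
    return stale


t0 = datetime(2020, 4, 19, tzinfo=timezone.utc)
upload_draft("example-pkg", "1.0.1", "example_pkg-1.0.1.tar.gz", now=t0)
upload_draft("example-pkg", "2.0.1", "example_pkg-2.0.1.tar.gz", now=t0)
upload_draft("example-pkg", "2.0.1", "example_pkg-2.0.1-py3-none-any.whl",
             now=t0 + timedelta(days=10))
stale = expire_drafts(now=t0 + timedelta(days=15))
print(stale)
```

Here the 1.0.1 draft expires because nothing touched it for 15 days, while the 2.0.1 draft survives because its second upload reset the clock.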

For the URL itself, effective automation is going to need either a predictable URL (i.e. if you know which release you’re trying to test, you can work out what --extra-index-url you need) or else some kind of query API that lets you know the index URL for that draft. Given the former can also function as the latter to some degree (if https://pypi.org/draft/pip/20.0.2/ 404s, you know that draft isn’t in preparation), a predictable URL seems like the simplest automation-friendly approach to take, where one possible usage model would be:

upload draft sdist
trigger wheel builds and sdist install testing (using predictable draft index URL)
upload draft wheels
trigger wheel install testing (using predictable draft index URL)
publish release (this will presumably be UI driven in the initial version of this feature)
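The testing steps in that workflow could be driven by something like the sketch below. The /draft/{project}/{version}/ path layout is only the example shape used above, not an agreed format, and both helper functions are hypothetical; --extra-index-url is the existing pip option.

```python
def draft_index_url(project: str, version: str) -> str:
    """Predictable draft index URL; the path layout is an assumption."""
    return f"https://pypi.org/draft/{project}/{version}/"


def pip_install_command(project: str, version: str) -> list:
    """pip invocation that also consults the draft index."""
    return [
        "python", "-m", "pip", "install",
        "--extra-index-url", draft_index_url(project, version),
        f"{project}=={version}",
    ]


cmd = pip_install_command("pip", "20.0.2")
print(" ".join(cmd))
```

A CI job would only need the project name and version it is already building to construct this command, with no extra query to PyPI.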

penguin_brian · April 20, 2020, 9:54pm

I like this. I am not sure it was made clear what restrictions would apply after a revision is published. I would suggest:

Package cannot be unpublished and reverted back to draft.
New package files cannot be uploaded for that revision.
Existing uploads cannot be changed.

i.e. the revision is read only for all intents and purposes.

I would also suggest that a revision once published should not be deletable (it is going to break anybody that is referencing that revision) but admit that there are cases where this may be required (legal reasons, defamation, offensive content, unauthorized upload, revision found to contain malware, etc).

njs · April 20, 2020, 10:32pm

These two are already how things work, and I don’t think anyone would consider changing them.

This would be a new restriction, and I’m not sure it’s useful. For example: after the original manylinux PEP was accepted, the numpy project went back and uploaded manylinux wheels for several existing releases. This seems like a useful and harmless thing to allow?

penguin_brian · April 20, 2020, 11:01pm

The problem is that there is no guarantee that the new wheels uploaded actually correspond to the code or other files from that revision. If a malicious person got access to a well known account, he/she could upload wheels for old versions and suddenly everybody who thought they were using tested known good versions will start using this malicious code. Or a well meaning person might upload a new wheel accidentally containing the wrong version or maybe even deliberately - e.g. containing a bug fix that they consider critical. My feeling is once a version is uploaded and published it should never change.

pf_moore · April 21, 2020, 7:30am

I get your arguments, but as a counterpoint, how would you propose, under your stated restrictions, that a project add wheels for Python 3.9 when that version comes out? Are you suggesting that they must cut a full new release (potentially a significantly greater overhead for many projects, compared to uploading new wheels)? How would they handle making older versions (potentially needed to satisfy dependencies that other projects might impose) available for Python 3.9?

There are benefits in immutable releases, but also practical needs for backdated binaries to be uploaded. Balancing those two requirements isn’t easy.

dstufft · April 21, 2020, 2:16pm

We’re likely going to keep the same restrictions in place for projects released using the current, “immediate” mechanism and the proposed draft mechanism. Any new restrictions like this should probably be their own discussion and should apply to all projects, regardless of how they were uploaded (given that to consumers, it’s not going to be obvious if a project was uploaded using the draft feature or not).

That’s not to say these features are inherently bad or good, but they deserve their own discussion independent of any other feature, and should apply universally (which means we’d have to figure out a way to cope with the differences between a release uploaded as a draft, and one uploaded immediately).

alanbato · April 23, 2020, 6:17am

I want to start by saying thank you all for your feedback and ideas, as well as your time.

This discussion has gotten a bit lengthy, so I will try my best to summarize it so folks can hop back into the discussion without going through all the posts. Additionally, I’d like to restate some points so we can all be on the same page.

These things are inside the current scope, and we are mostly certain we need to consider them in the final implementation:

This feature will be opt-in. If the projects don’t specify they want to create draft releases before publishing, they won’t see any difference in their current pipelines and uploading behavior.
The way someone would install this draft release would be by providing the --extra-index-url parameter with an obfuscated simple index generated for that draft release.
i.e. (https://pypi.org/draft/{SOME_HASH}/simple)
This index is specific for a certain release of a certain project, to be able to support the following use case:

These draft indexes should only live until the release is published or a time limit is exceeded, and should return 404 or 410 thereafter to discourage misuse. After a set period of time (a week?), a draft release is purged along with any relevant uploads.
The URL for the release page of a draft release will have the same format we use now. ( http://pypi.org/p/{project}/{version}/ )
We would add UI elements to indicate that what you’re seeing is a draft release. These URLs would not be indexed or visible anywhere else, so only maintainers working on said release know how to access them.
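The “404 or 410” behaviour for draft indexes could look roughly like this sketch. The split between the two codes (404 for a slug that was never issued, 410 for a draft that was published or timed out) is an assumption, as are the one-week lifetime and the dictionary shape; the summary above leaves all of these open.

```python
from datetime import datetime, timedelta, timezone
from http import HTTPStatus

DRAFT_LIFETIME = timedelta(days=7)  # "a week?" in the summary; not decided


def draft_index_status(draft, now):
    """Pick a response status for a request to a draft index."""
    if draft is None:
        return HTTPStatus.NOT_FOUND   # slug was never issued
    if draft["published"] or now - draft["created"] > DRAFT_LIFETIME:
        return HTTPStatus.GONE        # draft existed but is over
    return HTTPStatus.OK


created = datetime(2020, 4, 23, tzinfo=timezone.utc)
draft = {"created": created, "published": False}
print(draft_index_status(draft, created + timedelta(days=1)))
print(draft_index_status(draft, created + timedelta(days=8)))
```

Returning 404 in both cases would be a reasonable alternative if revealing that a slug once existed is a concern.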

These other things are considered out of scope, or already solved by our current implementation/tools:

Using the same Simple API endpoint, with a flag or something else specifying that you intend to work with drafts. i.e. https://pypi.org/simple/pip/?draft=1
This would probably require going through the PEP process for an interoperability standard, and is not our goal.
Using this feature as a means of release yanking. This functionality is now provided by PyPI as of this PR (yay!)

These other things are still undetermined, and could use further discussion so we can approach a comfortable consensus if possible:

What the generated “hash” of the URL should be. These points were made.

How will package maintainers opt in to this feature?
By the use of a server-side flag (auto_publish_releases) set in the PyPI project settings page? The benefit being that the adoption of this feature by the upload clients wouldn’t require any extra work.
Or should we instead pass a flag at time of upload?
i.e. twine upload --draft ...
This lets projects specify the behavior per-release in a more explicit manner, and might make it easier to onboard maintainers on to this new feature.

Please feel free to voice your opinion on the above points if you haven’t already, but most importantly I’d like you to direct the discussion so that we can reach a decision regarding the two points above, as these are still open questions.

ncoghlan · May 4, 2020, 5:07pm

At this point, I think my main recommendation for addressing the open questions would be to have a sample repo that uses the new API for release automation.

My own bias is towards making it so that automation scripts don’t need to query PyPI to find out whether or not drafts are enabled, or what the draft URL is, but I don’t think requiring queries would be an enormous problem either.

alanbato · December 19, 2020, 2:58am

Hi friends, after working on this on and off these past several months, I’ve made a PR to the warehouse repo implementing this feature. So please take a look and let’s continue the conversation there.