PEP 694: Upload 2.0 API for Python Package Repositories

alanbato · July 13, 2024, 3:47am

This is great stuff!
I’d be happy to help in any way to gain more traction on this. I’ll answer what questions/comments I can regarding the PR that I made proposing an implementation for some of the features mentioned in this PEP.

Yes, the PR creates an md5 hash of the project name + the version number of the release you’re drafting. It could definitely be something else, but as you mentioned, it is still an open question how obscure or how reproducible we want that to be. Ideally, it’s something that only maintainers and the systems they control can use, without requiring authentication. I landed on the current hash as a middle point between “it’s always <my_project>-draft for ease of use!” and “it’s a random UUID string”.
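A minimal sketch of that idea, assuming the name and version are simply joined with a hyphen before hashing. The separator, the lack of name normalization, and the function name `draft_hash` are illustrative assumptions, not necessarily what the PR does:

```python
import hashlib


def draft_hash(project_name: str, version: str) -> str:
    """Return a reproducible hex digest identifying a draft release.

    Assumption: name and version are joined with "-" and UTF-8 encoded.
    """
    key = f"{project_name}-{version}".encode("utf-8")
    return hashlib.md5(key).hexdigest()


# Anyone who knows the name and version can recompute the same value,
# which is why this sits between "easy to guess" and "fully random".
print(draft_hash("my-package", "1.0.0"))
```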

One important difference is that in my implementation there were no proposed changes to the API responses, so there was no way to give that random string back to upload clients, and it had to be shared through the website. This PEP removes that constraint and opens up more possibilities for what that draft URL can look like.

If I recall correctly, in the PR implementation the project does get created, but with no versions listed on the simple index, and I think you’d get a 404 if you were to navigate to https://pypi.org/project/my-package. The project and release both exist, but the single draft release wouldn’t be discoverable.

Yes, using the current PR you’d be able to test the installation of your draft release for package A with the draft releases of B and C by supplying a --extra-index-url parameter for each package. I don’t recall if I tested this use case specifically, but I have a strong recollection of talking about it and it being supported under that implementation.
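As a rough illustration, the install command for that scenario could be assembled like this. `--extra-index-url` is an existing pip option that may be repeated; the draft index URLs below are placeholders, since the real URL layout depends on the implementation:

```python
def pip_install_command(requirements, draft_index_urls):
    """Build a pip command that adds one extra index per draft release."""
    cmd = ["pip", "install"]
    for url in draft_index_urls:
        # Each draft release lives behind its own index URL.
        cmd += ["--extra-index-url", url]
    cmd += list(requirements)
    return cmd


# Placeholder URLs: the real draft URLs would come from the index.
cmd = pip_install_command(
    ["A", "B", "C"],
    [
        "https://example.invalid/draft-a/simple/",
        "https://example.invalid/draft-b/simple/",
        "https://example.invalid/draft-c/simple/",
    ],
)
print(" ".join(cmd))
```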

If we were receiving the release metadata from warehouse in the form of draft:token, then it could be possible to pass a --draft <token> param to clients and have them or warehouse resolve that into the corresponding repository to get the draft release from. However, as I understand it this would mean a separate proposal for client implementations to agree on taking and using that new token through the --draft param. Although the UX is a bit less nice, the reuse of --extra-index-url gives us “free” compatibility with clients that already support it.

I believe there’s a big difference between both approaches due to the way Pyramid maps URLs to views using a Traversal algorithm, and if I recall correctly the draft hash needs to be present as part of the /path/to/release. But I might be wrong! It was my first time writing Pyramid code.

Yes, it’s incorrect! The deletion of draft releases is intended, and one of their main benefits. I remember testing the uploading, then deleting, then re-uploading of a release during development. I think I simply forgot to update that admonition text.

barry · July 16, 2024, 8:25pm

Thanks for the detailed response, Alan! In order to hopefully manage the responses a little better, I’ll respond to each topic in turn.

I think we can have our cake and eat it too, with a small modification to the session creation protocol as described in the PEP. Let’s say we add an optional nonce or token key to the POST payload, taking a string value. Then, if it’s missing, the server uses a hash of the name+version. If the token is provided, it contributes to the hash.

This way, the session owner is in control of the obscurity of the session URLs. If they don’t care, they can omit the token, and if they do care, they can provide the token and still calculate the hash fully on the client side. I don’t think we’d need to change the response payload, although it probably wouldn’t hurt. I think it’s enough that the PEP fully describes the algorithm used to create the hash.
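Here is a sketch of how both sides could compute the same value, assuming SHA-256 over the newline-joined fields. The hash function, the separator, and the name `session_token` are assumptions for illustration; the PEP would have to pin down the real algorithm:

```python
import hashlib
import secrets
from typing import Optional


def session_token(name: str, version: str, nonce: Optional[str] = None) -> str:
    """Hash name + version, mixing in the nonce only when one is given."""
    parts = [name, version]
    if nonce:
        # A secret nonce makes the resulting URL hard to guess.
        parts.append(nonce)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# Without a nonce: anyone who knows name and version can recompute it.
guessable = session_token("foo", "1.0")

# With a nonce: only whoever holds the nonce can recompute it.
nonce = secrets.token_urlsafe(16)
obscure = session_token("foo", "1.0", nonce)
print(guessable != obscure)
```

Because the client picks the nonce, it can calculate the token locally without needing anything extra in the response.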

Re: including the hash in the final path component for URLs vs in a query parameter:

It’s been quite a while for me, so I’m rebooting my baseline, especially with the hybrid approach in warehouse, so it’s something I’ll look into. I think the important point is that this lies on the implementation side of the equation, not the PEP standard side, since the server always returns the URLs to the client. Thus I don’t think we need to mandate this, and alternative PEP 694 implementations can choose otherwise.

barry · July 16, 2024, 8:27pm

That makes sense. I think we want the PEP to be explicit about this so that we don’t leave it up to the implementations to provide possibly incompatible behavior (making it trickier for clients to be conformant).

barry · July 16, 2024, 8:30pm

Maybe that’s good enough. Since we get it for free, it’s certainly a convenient first step. We should allow for clients to provide a better experience, but could leave it to them to decide on the UX, in which case…

…maybe it is better after all to explicitly return the hash in the response.

barry · July 16, 2024, 8:30pm

No worries, glad to know my understanding is correct!

Again, thanks for the great responses.

woodruffw · September 24, 2024, 3:49pm

Apologies for the necro-post here, but I want to leave a notice of intent so others in the community are aware of it: over the next few months, a few others from my team (CC @facutuesca) and I will be working on an implementation of PEP 694 for PyPI. We should have more details shortly, and we’ll post regular updates on the PyPI issue tracker as we have for other PEP implementations.

(We received funding to do this from Alpha-Omega, who are also funding a few other general “project lifecycle” maintenance tasks on PyPI.)

pf_moore · September 24, 2024, 4:00pm

As far as I can see, the PEP is still in draft status. Presumably you will not be releasing the implementation to live before the PEP is marked as final? Do you have any indication of what’s happening on that front? The PEP doesn’t even have a PEP-delegate assigned at the moment.

barry · September 24, 2024, 4:59pm

This is great news, though I should mention that I’ve also been working on this for a while. I posted a list of questions earlier and have been considering proposing concrete changes to the PEP. I also branched @alanbato’s original PR and have been working through merge conflicts as main advances, and repairing test failures. I’m pretty far along but of course would welcome collaboration.

I’m willing to be either a PEP sponsor or co-author (if my proposed changes are accepted, though I haven’t drafted a PEP PR yet).

pf_moore · September 24, 2024, 5:09pm

I should note that @dstufft would normally be PEP-delegate, as he’s responsible for PyPI-related PEPs, but as he’s the PEP author we need someone else. I’m not sure I have the necessary expertise to be a suitable delegate [1], so if someone is interested in volunteering, please speak up!

[1] Although if we’re stuck, I’d be willing to handle it as best I can.

woodruffw · September 24, 2024, 5:26pm

Yep, that was my thinking – we’d push for the PEP to become provisional (with changes as necessary) before exposing a public, stable version of the new upload endpoint.

Sorry for missing this! I would be extremely happy to collaborate on the implementation + have us help however we can with the PEP changes.

barry · September 24, 2024, 5:35pm

No worries, and yay! Let me make a pass at my proposed changes to the PEP and have you take a look. We can drive both for PEP acceptance and implementation PR in parallel. I’ll post the PEP PR here when ready.

pradyunsg · September 24, 2024, 5:53pm

FWIW, we could use https://upload.pypi.org/experimental-pep-694/ or a similar unwieldy and unambiguously “not forever” name before this PEP becomes provisional (restricting it to a hard-coded list of packages even) if we need to test out the design with a live thing on PyPI.

If we do that tho, ideally we’ll also have a clear plan to move to https://upload.pypi.org/[maybe, something path under that] directly once the PEP is provisional.

woodruffw · September 24, 2024, 6:08pm

Yep, that’s exactly what I was thinking! IIRC we did basically that for Trusted Publishing (first a closed beta, then a public beta). Technically that didn’t require a PEP process since it was unique to PyPI, but the process we followed there should be suitable here as well.

barry · September 24, 2024, 6:10pm

I’m not 100% sure we need to do this but if we decide we do, would we need to outline this transition plan in the PEP?

barry · September 24, 2024, 6:20pm

Actually, thinking about this a bit more, why not just propose https://upload.pypi.org/2.0 as the provisional-for-now-eventually-to-be-forever URL? This lets us potentially rev the upload protocol in the future while still maintaining the PEP 694 protocol in parallel for some time. Yes, the API version number is repeated in the POST JSON, but that seems fine.

Technically, in the terms the PEP uses, this would be the root URL for PyPI, while alternative indexes would remain free to choose any other root URL they want.
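To make the repetition concrete, here is a sketch of a session creation request against that root URL. The URL is only the one proposed here and is not live, and the payload key names (`meta`, `api-version`) and the plain JSON content type are my assumptions about the draft, so treat them as illustrative. The request is built but never sent:

```python
import json
import urllib.request

# Proposed in this thread; not a live endpoint.
ROOT_URL = "https://upload.pypi.org/2.0"


def build_session_request(name, version, root_url=ROOT_URL):
    """Build (but do not send) a POST that would create an upload session."""
    payload = {
        # Assumed key names: the version appears here and in the URL.
        "meta": {"api-version": "2.0"},
        "name": name,
        "version": version,
    }
    return urllib.request.Request(
        root_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


req = build_session_request("foo", "1.0")
print(req.get_method(), req.full_url)
```

An alternative index would pass its own `root_url` and keep the same payload.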

pradyunsg · September 24, 2024, 7:37pm

I don’t think so but I’ll defer to folks more familiar with pypi.org.

My thinking was that making bigger changes would be easier, expectations would be clearer, and it would be easier to sunset if needed.

Here’s the “if I had more time, I’d have written a shorter letter” part of this message:

“oh, a smaller change means existing clients will keep working” vs “let’s make the bigger change since we’re going to get rid of this anyway & this is for experimenting with”.
Having an experimental-and-will-go-away endpoint makes it significantly less likely that package authors will rely on it. Even when they do, it does a clear job of setting expectations that things can change.
It’ll be explicit when the API is deemed stable enough for “proper” usage.
It’ll be easier to provide a clear error message that “hey, you’re trying to use the old protocol”.
If this ends up having longer time scales due to volunteer availability, there’s a higher risk of earlier issues happening, especially package maintainers actually starting to rely on this functionality, and setting expectations clearly can help avoid certain kinds of frustration.
If we decide not to do this for some reason, yanking this out will be way easier when it was gonna be removed all along.

konstin · September 25, 2024, 3:57pm

Great to see this is moving forward!

From the perspective of a client implementer, I’d prefer starting with the final upload URL and leaving the provisional status handling to the client.

As a client, the upload interface isn’t different whether it’s https://upload.pypi.org/experimental-pep-694/ or https://upload.pypi.org/2.0. To support this feature, we’d add a --staged-publish flag that switches from legacy uploads to uploads 2.0 and switches the default publishing URL to the new upload URL. As long as the feature is in preview, it would ask users to also pass the --preview flag to ensure they opt in to this unstable feature, and when the feature gets stabilized we’d remove the --preview flag. It’s also convenient because nothing breaks when the provisional state gets stabilized without breaking changes; the --preview flag just becomes redundant.
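A sketch of that gating logic, with the flags modeled as plain booleans. The function name, the `stable` switch, and the 2.0 URL are assumptions for illustration; the legacy URL is PyPI’s existing upload endpoint:

```python
LEGACY_URL = "https://upload.pypi.org/legacy/"
UPLOAD_2_URL = "https://upload.pypi.org/2.0"  # proposed in this thread, not live


def resolve_upload_target(staged_publish, preview, stable=False):
    """Pick protocol and default URL from --staged-publish / --preview.

    `stable` stands in for "the feature has left preview" in a later
    client release.
    """
    if not staged_publish:
        return ("legacy", LEGACY_URL)
    if not stable and not preview:
        # While the feature is unstable, require an explicit opt-in.
        raise ValueError("--staged-publish is in preview; pass --preview too")
    # Once stable, --preview is accepted but has no effect.
    return ("2.0", UPLOAD_2_URL)


print(resolve_upload_target(staged_publish=True, preview=True))
```

The same server URL works before and after stabilization, so only the client’s opt-in check changes.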

barry · September 25, 2024, 11:35pm

I’ve just submitted a PR that updates PEP 694 with topics discussed here and elsewhere. Please take a look!

Changes include:

Formatting and phrasing.
Added myself as a co-author (though I don’t want to step on @dstufft’s toes so I’m happy to revert this).
Proposed the root URL for PyPI to be https://upload.pypi.org/2.0 although we may want a provisional root URL while the implementation is still in its experimental phase.
Added a nonce string to the session creation request JSON, which allows clients to decide whether staged previews are easily guessable or not.
In the session creation response JSON, rename the draft subkey of the urls key to stage.
In the session creation response JSON, add status and cancel subkeys to the urls key.
Describe the expected behavior when this API is used for the first upload of a project.
Fix the chunked upload header examples, and provide examples for both the first and second chunk upload.
Describe how to replace a partially or fully uploaded file in a staged release before the stage is published.
Describe that it is an error to publish a stage that has no files uploaded to it.
Elaborate on how the session token is calculated from the hash of the project name, version, and optional nonce.
Elaborate on how staged previews can work; make this optional for indexes to support.
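To illustrate the replace-before-publish and empty-stage rules from the list above, here is a toy in-memory model. The class, its method names, and its state names are hypothetical and say nothing about how an index would actually implement sessions:

```python
class StagedSession:
    """Toy model of a staged upload session (hypothetical names)."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.files = {}  # filename -> bytes
        self.state = "pending"

    def _require_pending(self):
        if self.state != "pending":
            raise RuntimeError(f"session is already {self.state}")

    def upload(self, filename, data):
        self._require_pending()
        # Uploading the same filename again replaces the staged file.
        self.files[filename] = data

    def publish(self):
        self._require_pending()
        if not self.files:
            # Publishing a stage with no files is an error.
            raise ValueError("cannot publish a stage with no files")
        self.state = "published"

    def cancel(self):
        self._require_pending()
        self.files.clear()
        self.state = "canceled"


session = StagedSession("foo", "1.0")
session.upload("foo-1.0.tar.gz", b"first attempt")
session.upload("foo-1.0.tar.gz", b"fixed build")  # replaces the first upload
session.publish()
print(session.state)
```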

merwok · September 26, 2024, 1:45am

Was there a thought about prefixing the non-standard headers with PyPI- or X-?

The latter was a general recommendation that is now deprecated, as there are non-standard protocol headers that become standard and migration concerns prevent renaming. But here you’re not defining generally useful protocol-level headers, rather custom application-level ones. Prefixing would avoid conflicts and send a signal to human readers.

barry · September 26, 2024, 4:53pm

I believe the Upload-* headers are taken from the tus.io specification. This is covered in the PEP’s FAQ.