PEP 694 -- PyPI upload API 2.0 (Round 2)

429 is another option.

(Apologies in advance for the long reply! You got me thinking, which is always dangerous. :sweat_smile:)

I’ve been mulling this over a lot, especially the out-of-band stage token/URL calculation and the nonce. I’m pretty much convinced at this point that we should let the server calculate the stage token/URL (if it supports stages) and give up on the out-of-band property, as cool as it might be.

Thinking this through then, the PEP would change roughly as follows:

  • In the session creation request payload, we get rid of the optional nonce key.
  • The session creation response body doesn’t have to change. It still returns the session-token and links.stage keys, and they MUST still be omitted if the index doesn’t support staged previews.
  • We remove the entire Publishing Session Token section from the PEP, and just say that if the server supports staged previews, the stage token/URL SHOULD (MUST?) be cryptographically unguessable[1]. We leave it up to the index to figure out the details.
  • This means that any client that wants access to the preview stage, either by direct URL or stage token (e.g. ${TOOL} install --staging ${TOKEN}) must either get it from the tool that created the session, or query the server directly (more on that below).

In exchange for giving up the out-of-band calculation, we gain a simpler protocol, and flexibility on the index side to use any algorithm it wants to calculate the stage token/URL, and this can change without needing any future PEP if the need arises. That seems like an overall win to me.

How do clients other than the original session-creating tool get the preview stage token for a package+version upload session? There are two ways in the current PEP.

It can POST to the session creation endpoint the same package name + version tuple as before. About this, the PEP currently says:

If a second session is created for the same name-version pair while a session for that pair is in the pending state, then the server MUST return the JSON status response for the already existing session, along with the 200 OK status code rather than creating a new, empty session. This is effectively a “get the session status” request.

It can also create a file upload session within the overall publishing session[2], and that response includes a links.publishing-session URL which is defined as

The endpoint where actions for the parent Publishing Session can be performed.

Doing a GET to this URL returns the session status with the exact same response as the session creation response payload described above, effectively giving you the links.stage URL and session-token as before.
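To make the first route concrete, here is a minimal in-memory sketch of the "second POST returns the existing session" rule quoted above. The `SESSION_STORE` dict, the `create_session` function, the 201 status for a fresh session, and the trimmed-down response body are all illustrative assumptions, not text from the PEP.

```python
import secrets

# Hypothetical in-memory store keyed by the (name, version) pair.
SESSION_STORE: dict[tuple[str, str], dict] = {}


def create_session(name: str, version: str) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for a session creation request."""
    key = (name, version)
    existing = SESSION_STORE.get(key)
    if existing is not None and existing["status"] == "pending":
        # Same name-version pair while pending: report the existing session
        # with 200 OK instead of creating a new, empty one.
        return 200, existing
    # The quoted rule only covers pending sessions; for anything else this
    # sketch simply starts a new session.
    session = {
        "status": "pending",
        "session-token": secrets.token_urlsafe(32),
    }
    SESSION_STORE[key] = session
    return 201, session  # 201 for a fresh session is an assumption here
```

Because both callers key off the same name and version, whichever request arrives second just sees the first one's session, which is why the race described below looks mostly harmless.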

This seems awkward to me for a couple of reasons:

  1. There’s more than one way to get the session status.
  2. There’s a possible race condition, although I think it’s mostly harmless. E.g. your uploading action and your CI testing action could race on the session creation request, and whichever one gets there first does the actual session creation. I think it’s harmless because session creation is idempotent and keyed off the same data, i.e. the package name and version (and of course, both are implicitly gated on having the same correct authentication credentials). But I could be missing something that makes this more serious.
  3. Getting the session status through the file upload session creation request’s links.publishing-session key could lead to creating a file upload session just to get this information, without actually having a file to upload. Sure, the file upload session can be immediately canceled if it’s only to get the session status URL, but that seems like a wart.

I don’t have any wonderful ideas for cleaning this up. Does anyone else think this is a problem?

I see further replies from @mgorny and @EpicWink so I’ll leave this response here, and will reply to them momentarily.


  1. is that even a term?

  2. yes, too many “sessions” but we haven’t found any better way to describe these bits of the protocol


Thanks @mgorny.

As you’ll see in my latest response to @woodruffw I’m now in the “let’s get rid of the nonce” camp, so the whole question of the gentoken() algorithm goes away.

Re: RFC 3339. Totally agree, and I already made this change in my draft PR. Thanks!

Good callout, thanks. I do think this needs tightening up, so I’ll try to address this in my PR (currently in draft).

I’d forgotten about the Location header when I posted my response to @woodruffw above[1]. I wonder if that gives us a way out of the “wart” I described? :thinking:

I like the text you propose, but now that I’m reading it, I think that “should” should become MUST.

429 Too Many Requests seems more aimed at rate limiting requests rather than semantically disallowing parallel updates. 409 Conflict doesn’t seem like a perfect match either. We already use 422 Unprocessable Content in this context for an unsupported upload mechanism, so I think 409 makes the most sense. I think the update there is just to switch the last two sentences of that paragraph, i.e.

The server MAY allow parallel uploads of files, but is not required to. If the server determines the upload cannot proceed, it MUST return a 409 Conflict.
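As a rough illustration of that wording, a server that does not support parallel uploads might gate file upload sessions like this. The `PublishingSession` class, its attributes, and the 202 success status are placeholders I made up for the sketch; only the 409 Conflict comes from the proposed text.

```python
from http import HTTPStatus


class PublishingSession:
    """Hypothetical server-side state for one publishing session."""

    def __init__(self, allow_parallel: bool) -> None:
        self.allow_parallel = allow_parallel
        self.active_uploads: set[str] = set()

    def start_upload(self, filename: str) -> HTTPStatus:
        """Decide whether a new file upload session may start."""
        if self.active_uploads and not self.allow_parallel:
            # Another upload is in flight and this server opts out of
            # parallel uploads, so the request cannot proceed.
            return HTTPStatus.CONFLICT
        self.active_uploads.add(filename)
        return HTTPStatus.ACCEPTED  # placeholder success status (assumption)
```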


  1. protocols are hard!


If you’re expecting requests to perform idempotent actions, would PUT be a better fit?

Getting status and session token sounds like GET to me: either a well-defined URL based on name and version which gets the session (or 404 if there isn’t any; the behaviour for historical sessions is TBD), or a well-defined URL based on name which gives a list of all sessions (with a query parameter to select only unfinished sessions), or both.
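A small sketch of what those two lookups could mean on the server side, purely to illustrate the suggestion. The record layout, the function names, and the reading of "unfinished" as pending or processing are assumptions; nothing here is defined by the PEP.

```python
from http import HTTPStatus

# Assumed meaning of "unfinished": not yet complete, failed, or canceled.
UNFINISHED_STATES = {"pending", "processing"}

# Hypothetical records; a real index would query its database.
SESSION_RECORDS = [
    {"name": "pkg", "version": "1.0", "status": "complete"},
    {"name": "pkg", "version": "1.1", "status": "pending"},
]


def get_session(name: str, version: str) -> tuple[int, dict]:
    """GET on a name+version URL: the session, or 404 if there is none."""
    for session in SESSION_RECORDS:
        if (session["name"], session["version"]) == (name, version):
            return HTTPStatus.OK, session
    return HTTPStatus.NOT_FOUND, {}


def list_sessions(name: str, unfinished_only: bool = False) -> list[dict]:
    """GET on a name URL, with an optional filter for unfinished sessions."""
    return [
        s
        for s in SESSION_RECORDS
        if s["name"] == name
        and (not unfinished_only or s["status"] in UNFINISHED_STATES)
    ]
```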


This all sounds great to me! My 0.02c would be for MUST on unguessable tokens – I think it’s nice to have that be a property of the protocol itself, and then let the server decide about disclosure. But it’s a relatively minor point :slightly_smiling_face:

I don’t have any great ideas for cleaning this up, but I agree with your appraisal about it being a bit of a wart. I like @EpicWink’s point about perhaps using PUT for this, to emphasize idempotency though!

Thinking out loud: maybe it’s OK for it to not be super easy to re-retrieve the session token? On the uploading side I would expect the uploader to only need to obtain it once (on session creation), after which they can pass it around internally like they’d do with e.g. a trusted publishing minted token. So perhaps it’d even be OK to allow only one POST, and then reject subsequent POST or PUT requests with the same tuple. But I don’t have a strong intuition about that!

Sounds good to me. Okay, the nonce is gone and I’ve rewritten this portion of the PEP to guarantee that both the session token and stage URL must be cryptographically unguessable, and that it must be possible to calculate the URL from the token, leaving the token generating algorithm and exact URL format up to the index.
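For anyone wondering what that might look like on the index side, here is one possible sketch. The URL template and function names are hypothetical, since the token algorithm and URL format are deliberately left to the index; `secrets.token_urlsafe` is just one standard-library way to get an unguessable value.

```python
import secrets

# Hypothetical URL layout; the index is free to choose its own.
STAGE_URL_TEMPLATE = "https://index.example/stage/{token}/"


def new_session_token() -> str:
    """Generate an unguessable token from the OS CSPRNG."""
    # 32 random bytes, URL-safe base64 encoded without padding.
    return secrets.token_urlsafe(32)


def stage_url(token: str) -> str:
    """Calculate the stage URL from the token alone."""
    return STAGE_URL_TEMPLATE.format(token=token)
```

The point of `stage_url` taking only the token is that a client holding the token does not need a second round trip to find the preview URL, provided it knows the index's URL format.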


Having built many REST APIs, this definitely came to mind, but as the Upload API 2.0 actually isn’t a RESTful API, the distinction between POST and PUT seems less, um, pedantically urgent :grin:[1].

Yah, thinking further, this leads to:

  • Disallow multiple session creation requests. If a package+version session is in the pending, processing, or complete state, a second session creation request with the same tuple returns a 409 Conflict instead of being accepted. We include a Location header in this response to point to the session status URL. I thought about returning a 3xx response in this case, but 409 Conflict seems best.
  • We remove the links.publishing-session key from the file upload session response. You just need to keep hold of the session token and links returned by the session creation request.
  • Both the Location header and the links.session key in the session creation response still point to a URL you can GET to get the session status, just as before.

I’m also rewording and moving the “multiple session creation requests” section to both describe the above, and to describe what happens for multiple requests when a session is in the error or canceled state. Essentially, the response is the same as if the previous session tuple was never given, except that the session status Location header, session-token, and links.stage URLs MUST be different.
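Roughly, the revised behavior could be sketched like this. The `SessionIndex` class, the URL layout, the empty 409 body, and the 201 status for a new session are assumptions for illustration; the state names, the 409 Conflict, and the Location header come from the description above.

```python
import secrets
from http import HTTPStatus

# States in which a repeated creation request is rejected.
BLOCKING_STATES = {"pending", "processing", "complete"}


class SessionIndex:
    """Hypothetical in-memory index; not an API defined by the PEP."""

    def __init__(self, base_url: str = "https://index.example") -> None:
        self.base_url = base_url
        self.sessions: dict[tuple[str, str], dict] = {}

    def create(self, name: str, version: str) -> tuple[int, dict, dict]:
        """Return (status, headers, body) for a session creation request."""
        key = (name, version)
        existing = self.sessions.get(key)
        if existing is not None and existing["status"] in BLOCKING_STATES:
            # Reject the duplicate, but point at the session status URL.
            headers = {"Location": existing["links"]["session"]}
            return HTTPStatus.CONFLICT, headers, {}
        # No session yet, or the previous one ended in error/canceled:
        # start over with fresh, different token and URLs.
        token = secrets.token_urlsafe(32)
        session = {
            "status": "pending",
            "session-token": token,
            "links": {
                "session": f"{self.base_url}/sessions/{secrets.token_urlsafe(16)}",
                "stage": f"{self.base_url}/stage/{token}/",
            },
        }
        self.sessions[key] = session
        return HTTPStatus.CREATED, {"Location": session["links"]["session"]}, session
```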


  1. I’d very likely be much more of a stickler for that if this was a RESTful API, but that ship has sailed for this PEP


I like all of this! I think this additionally makes implementing the PEP a bit easier as an uploader tool, which I’m of course for :wink:


I think I’ve addressed all outstanding feedback so far, all of which is greatly appreciated! I’ve flipped the PR from Draft to Ready.


Thank you all for your feedback. I’ve published an update which should address all known concerns and questions. I’m not starting a new thread since I hope at this point we’ve settled into something very close to, if not the final version of the PEP.

I’ve added a list of high level changes so you should be able to track what’s new in this version.


Sorry for the delayed bump!

I did another read through of PEP 694, and I think it’s great. One thing I wanted to call out: the current draft defines its own error response schema:

Clients in general should be prepared to handle HTTP response error status codes which MAY contain payloads of the following format:

{ "meta": { "api-version": "2.0" }, "message": "…", "errors": [ { "source": "…", "message": "…" } ] }

Instead of a custom schema here, do you have opinions on aligning this with an RFC, e.g. RFC 9457? I recently made a similar change to PEP 807 to use the standard “problem details” format, with the idea being that it’s uniform and HTTP clients can often handle it gracefully out of the box. I was also hoping to (soon) propose a change to the standard index APIs, to encourage error responses to use these “problem detail” formats too.

(The nice thing about that RFC is that it’s pretty flexible – as long as the overall shape conforms to a “problem detail,” we can include custom keys like meta to convey the api-version, etc.!)
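To sketch what that could look like: the `type`, `title`, `status`, and `detail` members and the `application/problem+json` media type are from RFC 9457, while carrying `meta` and `errors` over as extension members is only my guess at a mapping from the current schema, and `problem_response` is a made-up helper name.

```python
import json
from http import HTTPStatus

# Media type defined by RFC 9457 for problem details in JSON.
PROBLEM_JSON = "application/problem+json"


def problem_response(
    status: int, detail: str, errors: list[dict]
) -> tuple[int, dict, str]:
    """Build (status, headers, body) for an RFC 9457 style error."""
    body = {
        # "about:blank" means no extra semantics beyond the status code;
        # the title is then the standard HTTP status phrase.
        "type": "about:blank",
        "title": HTTPStatus(status).phrase,
        "status": status,
        "detail": detail,
        # Extension members (assumed mapping from the current schema).
        "meta": {"api-version": "2.0"},
        "errors": errors,
    }
    return status, {"Content-Type": PROBLEM_JSON}, json.dumps(body)
```

A generic HTTP client that understands problem details can still surface `title` and `detail` even if it ignores the extension members.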


Thanks for the suggestion and reference @woodruffw! I wasn’t aware of RFC 9457, but 100% agree we should align PEP 694 error schema to it. I’ll take a look through PEP 807 and work on a PR to update 694.


I think this PR does the trick, with the relevant errors section being defined.

The update has been published.
