PEP 694 -- PyPI upload API 2.0 (Round 2)

(Apologies in advance for the long reply! You got me thinking, which is always dangerous. :sweat_smile:)

I’ve been mulling this over a lot, especially the out-of-band stage token/URL calculation and the nonce. I’m pretty much convinced at this point that we should let the server calculate the stage token/URL (if it supports stages) and give up on the out-of-band property, as cool as it might be.

Thinking this through then, the PEP would change roughly as follows:

  • In the session creation request payload, we get rid of the optional nonce key.
  • The session creation response body doesn’t have to change. It still returns the session-token and links.stage keys, and they MUST still be omitted if the index doesn’t support staged previews.
  • We remove the entire Publishing Session Token section from the PEP, and just say that if the server supports staged previews, the stage token/URL SHOULD (MUST?) be cryptographically unguessable[1]. We leave it up to the index to figure out the details.
  • This means that any client that wants access to the preview stage, either by direct URL or stage token (e.g. ${TOOL} install --staging ${TOKEN}) must either get it from the tool that created the session, or query the server directly (more on that below).
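To make "leave it up to the index" a bit more concrete, here is a minimal sketch of one way an index could mint an unguessable stage token and build a stage URL from it. Nothing here comes from the PEP: the function names, the 32-byte size, and the `https://index.example/stage` base URL are all assumptions for illustration. The only real API used is Python's standard `secrets` module.

```python
import secrets


def new_stage_token(nbytes: int = 32) -> str:
    """Return a URL-safe token drawn from the OS CSPRNG.

    The 32-byte default is an assumption, not something the PEP requires.
    """
    return secrets.token_urlsafe(nbytes)


def stage_url(base_url: str, token: str) -> str:
    """Build a hypothetical stage URL by appending the token to a base URL."""
    return f"{base_url.rstrip('/')}/{token}/"


# Hypothetical base URL; a real index would pick its own layout.
token = new_stage_token()
print(stage_url("https://index.example/stage", token))
```

Because the token is random rather than derived from the package name and version, no client can compute it out-of-band, which is exactly the trade-off described above.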

In exchange for giving up the out-of-band calculation, we gain a simpler protocol and flexibility on the index side: the index can use any algorithm it wants to calculate the stage token/URL, and it can change that algorithm later without needing a future PEP. That seems like an overall win to me.

How do clients other than the original session-creating tool get the preview stage token for a package+version upload session? There are two ways in the current PEP.

It can POST the same package name + version tuple as before to the session creation endpoint. About this, the PEP currently says:

If a second session is created for the same name-version pair while a session for that pair is in the pending state, then the server MUST return the JSON status response for the already existing session, along with the 200 OK status code rather than creating a new, empty session. This is effectively a “get the session status” request.
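As a rough illustration of the behavior quoted above, here is a toy in-memory sketch of "create or return the pending session". The `Session` and `SessionStore` names are hypothetical, and returning 201 for a newly created session is my assumption for the sketch; only the 200 OK for an already pending session comes from the quoted text.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Session:
    name: str
    version: str
    status: str = "pending"
    # Server-chosen token; the algorithm is up to the index.
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))


class SessionStore:
    """Hypothetical in-memory store keyed on the (name, version) pair."""

    def __init__(self) -> None:
        self._sessions: Dict[Tuple[str, str], Session] = {}

    def create_or_get(self, name: str, version: str) -> Tuple[int, Session]:
        key = (name, version)
        existing = self._sessions.get(key)
        if existing is not None and existing.status == "pending":
            # Same name-version pair while pending: return the existing session.
            return 200, existing
        session = Session(name, version)
        self._sessions[key] = session
        return 201, session  # 201 here is an assumption of this sketch


store = SessionStore()
first_status, first = store.create_or_get("example-pkg", "1.0")
second_status, second = store.create_or_get("example-pkg", "1.0")
print(first_status, second_status, first.token == second.token)
```

The second call acts as the "get the session status" request: it hands back the same session, and therefore the same token, without creating anything new.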

It can also create a file upload session within the overall publishing session[2], and that response includes a links.publishing-session URL which is defined as

The endpoint where actions for the parent Publishing Session can be performed.

Doing a GET to this URL returns the session status with the exact same response as the session creation response payload described above, effectively giving you the links.stage URL and session-token as before.
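On the client side, whichever route is used to fetch the session status, pulling out the stage information could look something like this sketch. The `session-token` and `links.stage` keys are the ones discussed above; the `stage_info` helper and the sample payloads (including the URL) are made up for illustration, and a real response would carry more fields.

```python
from typing import Optional, Tuple


def stage_info(status: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (session-token, stage URL) from a session status response.

    Both values are None when the index omits the keys, i.e. when it
    doesn't support staged previews.
    """
    token = status.get("session-token")
    stage = status.get("links", {}).get("stage")
    return token, stage


# Hypothetical payloads, trimmed down to the keys that matter here.
with_stage = {
    "session-token": "opaque-token-from-server",
    "links": {"stage": "https://index.example/stage/opaque-token-from-server/"},
}
without_stage = {"links": {}}

print(stage_info(with_stage))
print(stage_info(without_stage))
```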

This seems awkward to me for a couple of reasons:

  1. There’s more than one way to get the session status.
  2. There’s a possible race condition, although I think it’s mostly harmless. E.g. your uploading action and your CI testing action could race on the session creation request, and whichever one gets there first does the actual session creation. I think it’s harmless because session creation is idempotent and keyed off the same data, i.e. the package name and version (and of course, both are implicitly gated on having the same correct authentication credentials). But I could be missing something that makes this more serious.
  3. Getting the session status through the file upload session creation request’s links.publishing-session key could lead to creating a file upload session just to get this information, without actually having a file to upload. Sure, the file upload session can be immediately canceled if it’s only to get the session status URL, but that seems like a wart.
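To sanity-check the "mostly harmless" claim in point 2, here is a small sketch that simulates the upload action and the CI action racing on session creation. It assumes the server serializes creation per name-version key (modeled with a single lock) and, as an assumption of the sketch, answers 201 to whoever creates the session and 200 to the other caller. All names are hypothetical.

```python
import secrets
import threading

_lock = threading.Lock()
_sessions = {}  # (name, version) -> stage token


def create_session(name, version):
    """Idempotent create: the first caller creates, later callers get the same token."""
    key = (name, version)
    with _lock:
        if key in _sessions:
            return 200, _sessions[key]
        _sessions[key] = secrets.token_urlsafe(32)
        return 201, _sessions[key]


results = []


def worker():
    # Stands in for either the uploader or the CI job.
    results.append(create_session("example-pkg", "1.0"))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

statuses = sorted(status for status, _ in results)
tokens = {token for _, token in results}
print(statuses, len(tokens))
```

Whichever thread wins, both end up holding the same token, so under these assumptions the race only decides who sees which status code. If an index did not serialize creation this way, the analysis would need to be redone.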

I don’t have any wonderful ideas for cleaning this up. Does anyone else think this is a problem?

I see further replies from @mgorny and @EpicWink so I’ll leave this response here, and will reply to them momentarily.


  1. is that even a term? ↩︎

  2. yes, too many “sessions” but we haven’t found any better way to describe these bits of the protocol ↩︎
