Since we’ve been talking about which URLs to use: it would be great if a future protocol supported a single (base) URL for both fetching metadata and uploading packages. Currently, it’s a major source of user confusion that publishing uses a different URL than the registry URL used for installation. This becomes more pronounced when using tools such as poetry and uv that integrate both installation and publishing.
Could you provide more detail on this confusion? I ask because I think in general, most users and use cases aren’t really presented with the download and upload endpoints. They just use `pip` or `uv` or whatever, and that takes care of things for them, modulo of course any additional indexes they may need to reference.
In my PR update to 694, when you create a session, the response returns a JSON structure to the client with explicit URLs to use for uploading artifacts, getting statuses, publishing the stage, etc.
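For illustration, the response shape might look something like this (the key names here are mine, not quoted from the PR):

```python
# Hypothetical session response; the actual field names live in the PR.
session_response = {
    "urls": {
        "upload": "https://example.org/session/<id>/upload/",
        "status": "https://example.org/session/<id>/status/",
        "publish": "https://example.org/session/<id>/publish/",
    },
}
```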
For PyPI this is abstracted away by being the default, but for alternative registries, the user currently needs to provide both URLs themselves, e.g. FR: Ability to publish to Azure Artifacts · Issue #7860 · astral-sh/uv · GitHub.
Currently, there is no standard for how authentication information is set on upload, which leads to incompatibilities between registries (Authentication for private registries · Issue #8221 · astral-sh/uv · GitHub). It would be great if the PEP could cover authentication. I hope that doesn’t feel like scope creep; this should only be a short section, while ensuring that we can write client documentation that applies to all registries.
A simple option is codifying the current warehouse implementation:

- Username and password authentication is supported through the standard `Authorization: Basic <base64_encode("{username}:{password}")>` header.
- Token authentication is supported through the same mechanism by setting the username to `__token__` and using the token as the password.
- All other authentication schemes (e.g. trusted publishing) produce tokens, to be used as specified above.

The only extension beyond HTTP basic authorization would be that `__token__` becomes official.
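As a minimal sketch of that scheme (the token value is a placeholder):

```python
# HTTP basic auth, with __token__ as the username and an API token as
# the password, exactly as described above.
import base64

def basic_auth_header(username: str, password: str) -> dict[str, str]:
    cred = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {cred}"}

basic_auth_header("alice", "hunter2")        # username/password auth
basic_auth_header("__token__", "pypi-XXXX")  # token auth, same mechanism
```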
It’s a reasonable request, so I think I’ll add it (to my PR, which I remind folks is still just a PR to PEP 694!). And I agree with your simple option, which essentially mirrors the situation in warehouse today IIRC.
I’ve been thinking more and more that PyPI/warehouse needs a proper, full REST API, and I’ve had the experience to help push that forward, so … I think I will! I’m still thinking about the way to do that, and it won’t block progress on 694, but my current thinking is that we need an Informational PEP laying out the principles for a REST API first, and authentication would be part of that PEP. I don’t mind repeating that in some future PEP though.
Actually, I don’t believe that’s supported for PyPI anymore. It’s probably better to create an API token anyway.
Tokens are clearly preferable to using your credentials with access to your entire account; the question is more about how to format this token in the upload request. This is relevant for implementing and documenting something like `twine upload --upload2.0 --token ...` (or, in my case, `uv publish --token ...`), where we don’t want to drop the user into username/password after they just generated a token.
Please share any thoughts you have!
I pushed a small update to my PEP update to talk about authentication, but it essentially just builds on what PyPI/warehouse already supports.
Caveat: I’m not a security expert, I’m writing this from the perspective of someone who writes docs and does user support for a client.
When a username and password are given, the standard[1] is Basic access authentication, i.e. the `Authorization: Basic ...` header. For tokens, there doesn’t seem to be any standard. For example, with trusted publishing, we pass the first token through `Authorization: bearer ...`, the second one as the JSON body `{"token": "..."}`, and then for the actual upload we use basic access authentication, `Authorization: Basic ...`, with the username set to `__token__`.
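To make the three hand-offs concrete, here is a sketch of the flow as I understand it for GitHub Actions and PyPI; the endpoint URLs and response keys are my best understanding, not part of any standard:

```python
import base64
import os
import requests

# 1. Fetch the OIDC token from the CI provider, passing the first
#    token as a bearer header.
resp = requests.get(
    os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"] + "&audience=pypi",
    headers={
        "Authorization": f"bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"
    },
)
oidc_token = resp.json()["value"]

# 2. Exchange the OIDC token for an API token, sent as a JSON body
#    (assumed endpoint and response key).
resp = requests.post(
    "https://pypi.org/_/oidc/mint-token",
    json={"token": oidc_token},
)
api_token = resp.json()["token"]

# 3. The actual upload uses basic access authentication with username
#    __token__ and the API token as the password.
cred = base64.b64encode(f"__token__:{api_token}".encode()).decode()
upload_headers = {"Authorization": f"Basic {cred}"}
```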
I have no particular stance on how the token should be encoded, other than that codifying what PyPI does seems convenient. But if we declared one encoding the canonical one, it would be great for writing a client that supports tokens for all registries.
I don’t know if it’s the only standard, but it’s the standard in the sense that all servers I’ve encountered are using it, so it’s safe to offer `--username` and `--password` options to the user and encode them this way ↩︎
`Authorization: Bearer <token here>` is seen in various places.
One ref: Bearer Authentication | Swagger Docs
I think request authentication can be specified independently from this upload specification. Servers with different security strategies shouldn’t have to support an authentication method which doesn’t make sense.
For example, servers inside a closed VPN may not need auth at all. Or, a server with no separate auth service will use one-shot auth via request signing with an asymmetric key.
Instead, I suggest a specification for just PyPI, or at most for any free-for-all servers, for the minimum set of supported auth methods (or even just “a standard, well-known, not-deprecated auth strategy”).
Auth methods are likely to evolve, while this standard (hopefully) doesn’t need to. Uploading clients will need to update to stay secure.
I’m not a security expert, but I have a lot of experience building API clients. My opinion below is probably pretty biased by my experience, which is basically all OAuth 1 & 2, plus custom… let’s generously call those “protocols”.
Disagree with your first statement but agree with the second.
Having a single spec makes it easier to implement consistently until/unless it’s proven that this is a big enough topic that it needs to be split off.
Here’s what I would expect to see from a new API spec:
- servers MAY require authentication via an `Authorization` header
- if present, that `Authorization` header MUST be of the form `Bearer {token}`
- if the server detects an invalid header, it MUST respond with an HTTP 401 Unauthorized
That covers authentication in an OAuth2-compatible way, but intentionally says nothing about authorization, roles, etc. Servers are not being required to implement OAuth2 or OIDC, but it’s possible for them to do so if they wish.
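As a minimal sketch of those three rules (with `check_token` as a hypothetical stand-in for whatever validation a real server performs):

```python
from http import HTTPStatus

def check_token(token: str) -> bool:
    # Hypothetical: a real server would verify a signature, look up a
    # database record, etc. The spec says nothing about token format,
    # so this is deliberately opaque.
    return token == "example-token"

def authenticate(headers: dict[str, str]) -> HTTPStatus:
    auth = headers.get("Authorization")
    if auth is None:
        # Servers MAY require authentication; this one does.
        return HTTPStatus.UNAUTHORIZED
    scheme, _, token = auth.partition(" ")
    if scheme != "Bearer" or not token or not check_token(token):
        # Invalid header -> MUST respond with 401 Unauthorized.
        return HTTPStatus.UNAUTHORIZED
    return HTTPStatus.OK
```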
OAuth2 is pretty widely deployed these days.[1] The only three header forms I’ve seen in the wild are `Bearer {token}`, `bearer {token}`, and `token {token}`. I’m pretty sure specs say it should be `Bearer`, but I’m not certain.
Nothing should be said about the token format, and many services use JWTs for their tokens, so there’s some encoded and signed payload data inside of the header.
I’m sure some folks will read this and feel that other auth methods, like basic auth, should be allowed. I think it’s better to standardize on tokens today, so that client implementations are simpler and truly compatible. If you have other needs, you can always encode data in the token.
Whether or not it’s the “best” spec, I have no idea. It’s at least flexible enough that many orgs have bent it to their needs. ↩︎
Practically speaking, Warehouse (PyPI) already supports three token authentication schemes:

- `Authorization: basic $BASE64_CRED_PAIR`
- `Authorization: token $API_TOKEN`
- `Authorization: bearer $API_TOKEN`

In practice, I suspect that ~99% of current uploads use `Authorization: basic`, with the credential pair being `__token__:$API_TOKEN`. But they should all be strictly equivalent when using any kind of API token.
Edit: Permalink: warehouse/warehouse/macaroons/security_policy.py at f978eecde8e4a38fe7ac0b731824960eb4e1ba2e · pypi/warehouse · GitHub
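For illustration, all three schemes carry the same API token (the token value is a placeholder):

```python
# Three equivalent ways, per the Warehouse behavior described above,
# to attach the same API token to an upload request.
import base64

api_token = "pypi-XXXX"  # placeholder

pair = base64.b64encode(f"__token__:{api_token}".encode()).decode()
headers_basic = {"Authorization": f"basic {pair}"}
headers_token = {"Authorization": f"token {api_token}"}
headers_bearer = {"Authorization": f"bearer {api_token}"}
```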
I agree! In my PR to update PEP 694, I document how[1] PyPI currently does auth, and mention that this is index-specific. I don’t believe auth is inherently tied to API design, whether PEP 694 or a future REST-like API, and as you say, auth methods will likely continue to undergo evolution.
to my best understanding ↩︎
I’m not sure that’s even possible in practice. Alternative indexes have their own backward compatibility and general security principles to consider, so it’s not certain that any auth requirement will even be possible to support in other index implementations.
I’ve returned to this PEP for the first time in 2? 3? years after dozens of conversations at PyCon US 2025 that somehow invoked features PEP 694 would provide.
Seems the recollection I shared each time was indeed accurate, which is that I would very much like to see us offload the multi-part upload work to an S3 compatible multi-part upload interface rather than implement it as part of warehouse.
I know this alternative is mentioned in the PEP, and that the majority of the content in the PEP is about the implementation of uploads… In the interest of motivating the PEP to move as quickly as possible, here’s what I’d like to propose:
PEP 694 is reduced in scope to remove the File Upload section (PEP 694 – Upload 2.0 API for Python Package Indexes | peps.python.org), a new key is added to the “Create an Upload Session” section for `upload-mechanism`, and a contract is defined for how to say “sorry dawg we don’t support that mechanism” (see the sketch below).

Further responses can then include any necessary URLs, keys, etc. that the user may need to execute that mechanism.

The PEP could? SHOULD? specify a `simple` mechanism that is equivalent to our current “one-shot” synchronous upload.
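A hypothetical shape of that exchange, to make the idea concrete (key names are illustrative, not taken from the PR):

```python
# Hypothetical session-creation exchange under this proposal; nothing
# here is specified anywhere yet.
requested = {
    "upload-mechanism": "s3-multipart",  # client asks for a mechanism
}

# A server that supports the mechanism responds with whatever that
# mechanism needs (URLs, keys, etc.):
accepted = {
    "upload-mechanism": "s3-multipart",
    "urls": {"part-upload": "https://...", "complete": "https://..."},
}

# A server that doesn't support it refuses via some agreed contract,
# e.g. an HTTP 4xx body naming the mechanisms it does support:
refused = {
    "error": "unsupported upload mechanism",
    "supported-mechanisms": ["simple"],
}
```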
Draft PR to better specify what the hell I’m talking about: PEP 694: Abstract file upload mechanisms by ewdurbin · Pull Request #4431 · python/peps · GitHub
I finally got a chance to review the changes. In general I like the direction you’re going in! I’m not sure my comments were completely coherent, given that I got distracted several times and ran out of gas at the end of a long week. Thanks for working on this! I’ll definitely take another look next week.
Reading through this PEP I was wondering why there was such an emphasis on resumable uploads. The only justification I can see is the following problem statement:
It does not support any mechanism for parallelizing or resuming an upload. With the largest default file size on PyPI being around 1GB in size, requiring the entire upload to complete successfully means bandwidth is wasted when such uploads experience a network interruption while the request is in progress.
Is wasted bandwidth due to network interruption a significant problem for python package indexes? Perhaps naively, I would have thought that the vast majority of python packages would be too small to benefit (e.g. < 1 MiB), and the few large packages, such as CUDA packages, would likely be uploaded from well connected infrastructure. Is there any data from PyPI on how many partially completed uploads they have, especially with respect to package size?
Or is the problem more about minimising state on the upload server, by allowing direct streaming of package uploads to an object store?
EWDurbin’s PR probably addresses this by making additional upload methods optional, but I’d be interested in hearing the original rationale.
I can’t speak to @dstufft’s original motivation, but you are probably right that most packages are small enough not to matter. I don’t have any data about failure rates for big packages, but I do think it’s worth providing the option for resumable uploads and chunked uploads, which is why I kept that option in my recent update, which represents the current published version.
We had some excellent conversations about 694 at PyCon, and @EWDurbin’s PR is a great improvement, because it makes the common case easy and allows for index-specific enhancements, e.g. supporting direct-to-S3 or other file upload protocols, without having to go through the PEP process.
I’ve been too busy lately to do the latest review of the PR, but plan to finish that up this week so it can hopefully get published. PEP 694 brings other, much needed improvements to the upload experience.