Since we’ve been talking about which URLs to use: it would be great if a future protocol supported a single (base) URL for both fetching metadata and uploading packages. Currently, it’s a major source of user confusion that publishing uses a different URL than the registry URL used for installation. This becomes more pronounced when using tools such as poetry and uv that integrate both installation and publishing.
Could you provide more detail on this confusion? I ask because I think in general, most users and use cases aren’t really presented with the download and upload endpoints. They just use `pip` or `uv` or whatever, and that takes care of things for them, modulo of course any additional indexes they may need to reference.
In my PR update to 694, when you create a session, the response returns a JSON structure to the client with explicit URLs to use for uploading artifacts, getting statuses, publishing the stage, etc.
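For illustration, the response shape might look something like this (the key names here are mine, not quoted from the PR):

```python
# Hypothetical session response; the actual field names live in the PR.
session_response = {
    "urls": {
        "upload": "https://example.org/session/<id>/upload/",
        "status": "https://example.org/session/<id>/status/",
        "publish": "https://example.org/session/<id>/publish/",
    },
}
```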
For PyPI this is abstracted away by being the default, but for alternative registries, the user currently needs to provide both URLs themselves, e.g. FR: Ability to publish to Azure Artifacts · Issue #7860 · astral-sh/uv · GitHub.
Currently, there is no standard for how authentication information is set on upload, which leads to incompatibilities between registries (Authentication for private registries · Issue #8221 · astral-sh/uv · GitHub). It would be great if the PEP could cover authentication. I hope that doesn’t feel like scope creep; this should only be a short section, while ensuring that we can write client documentation that applies to all registries.
A simple option is codifying the current warehouse implementation:

- Username and password authentication is supported through the standard `Authorization: Basic <base64_encode("{username}:{password}")>` header.
- Token authentication is supported through the same mechanism by setting the username to `__token__` and using the token as the password.
- All other authentication schemes (e.g. trusted publishing) produce tokens, to be used as specified above.

The only extension beyond HTTP basic authorization would be that `__token__` becomes official.
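As a minimal sketch of that scheme (the token value is a placeholder):

```python
# HTTP basic auth, with __token__ as the username and an API token as
# the password, exactly as described above.
import base64

def basic_auth_header(username: str, password: str) -> dict[str, str]:
    cred = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {cred}"}

basic_auth_header("alice", "hunter2")        # username/password auth
basic_auth_header("__token__", "pypi-XXXX")  # token auth, same mechanism
```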
It’s a reasonable request, so I think I’ll add it (to my PR, which I remind folks is still just a PR to PEP 694!). And I agree with your simple option, which essentially mirrors the situation in warehouse today IIRC.
I’ve been thinking more and more that PyPI/warehouse needs a proper, full REST API, and I’ve had the experience to help push that forward, so … I think I will! I’m still thinking about the way to do that, and it won’t block progress on 694, but my current thinking is that we need an Informational PEP laying out the principles for a REST API first, and authentication would be part of that PEP. I don’t mind repeating that in some future PEP though.
Actually, I don’t believe that’s supported for PyPI anymore. It’s probably better to create an API token anyway.
Tokens are clearly preferable to using your credentials with access to your entire account; the question is more about how to format this token in the upload request. This is relevant for implementing and documenting something like `twine upload --upload2.0 --token ...` (or, in my case, `uv publish --token ...`), where we don’t want to drop the user into username/password after they just generated a token.
Please share any thoughts you have!
I pushed a small update to my PEP update to talk about authentication, but it essentially just builds on what PyPI/warehouse already supports.
Caveat: I’m not a security expert, I’m writing this from the perspective of someone who writes docs and does user support for a client.
When a username and password are given, the standard[1] is Basic access authentication, i.e. the `Authorization: Basic ...` header. For tokens, there doesn’t seem to be any standard. For example, with trusted publishing, we pass the first token through `Authorization: bearer ...`, the second one as the JSON body `{"token": "..."}`, and then for the actual upload we use basic access authentication, `Authorization: Basic ...`, with the username set to `__token__`.
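To make the three hand-offs concrete, here is a sketch of the flow as I understand it for GitHub Actions and PyPI; the endpoint URLs and response keys are my best understanding, not part of any standard:

```python
import base64
import os
import requests

# 1. Fetch the OIDC token from the CI provider, passing the first
#    token as a bearer header.
resp = requests.get(
    os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"] + "&audience=pypi",
    headers={
        "Authorization": f"bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"
    },
)
oidc_token = resp.json()["value"]

# 2. Exchange the OIDC token for an API token, sent as a JSON body
#    (assumed endpoint and response key).
resp = requests.post(
    "https://pypi.org/_/oidc/mint-token",
    json={"token": oidc_token},
)
api_token = resp.json()["token"]

# 3. The actual upload uses basic access authentication with username
#    __token__ and the API token as the password.
cred = base64.b64encode(f"__token__:{api_token}".encode()).decode()
upload_headers = {"Authorization": f"Basic {cred}"}
```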
I have no particular stance on how the token should be encoded, other than that codifying what PyPI does seems convenient. But if we declared one encoding the canonical one, it would be great for writing a client that supports tokens for all registries.
I don’t know if it’s the only standard, but it’s the standard in the sense that all servers I’ve encountered are using it, so it’s safe to offer `--username` and `--password` options to the user and encode them this way ↩︎
`Authorization: Bearer <token here>` is seen in various places.
One ref: Bearer Authentication | Swagger Docs
I think request authentication can be specified independently from this upload specification. Servers with different security strategies shouldn’t have to support an authentication method which doesn’t make sense.
For example, servers inside a closed VPN may not need auth at all. Or, a server with no separate auth service will use one-shot auth via request signing with an asymmetric key.
Instead, I suggest a specification for just PyPI, or at most for any free-for-all servers, for the minimum set of supported auth methods (or even just “a standard, well-known, not-deprecated auth strategy”).
Auth methods are likely to evolve, while this standard (hopefully) doesn’t need to. Uploading clients will need to update to stay secure.
I’m not a security expert, but I have a lot of experience building API clients. My opinion below is probably pretty biased by my experience, which is basically all OAuth 1 & 2, plus custom… let’s generously call those “protocols”.
Disagree with your first statement but agree with the second.
Having a single spec makes it easier to implement consistently until/unless it’s proven that this is a big enough topic that it needs to be split off.
Here’s what I would expect to see from a new API spec:
- servers MAY require authentication via an `Authorization` header
- if present, that `Authorization` header MUST be of the form `Bearer {token}`
- if the server detects an invalid header, it MUST respond with an HTTP 401 Unauthorized
That covers authentication in an OAuth2-compatible way, but intentionally says nothing about authorization, roles, etc. Servers are not being required to implement OAuth2 or OIDC, but it’s possible for them to do so if they wish.
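As a minimal sketch of those three rules (with `check_token` as a hypothetical stand-in for whatever validation a real server performs):

```python
from http import HTTPStatus

def check_token(token: str) -> bool:
    # Hypothetical: a real server would verify a signature, look up a
    # database record, etc. The spec says nothing about token format,
    # so this is deliberately opaque.
    return token == "example-token"

def authenticate(headers: dict[str, str]) -> HTTPStatus:
    auth = headers.get("Authorization")
    if auth is None:
        # Servers MAY require authentication; this one does.
        return HTTPStatus.UNAUTHORIZED
    scheme, _, token = auth.partition(" ")
    if scheme != "Bearer" or not token or not check_token(token):
        # Invalid header -> MUST respond with 401 Unauthorized.
        return HTTPStatus.UNAUTHORIZED
    return HTTPStatus.OK
```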
OAuth2 is pretty widely deployed these days.[1] The only three header forms I’ve seen in the wild are `Bearer {token}`, `bearer {token}`, and `token {token}`. I’m pretty sure specs say it should be `Bearer`, but I’m not certain.
Nothing should be said about the token format, and many services use JWTs for their tokens, so there’s some encoded and signed payload data inside of the header.
I’m sure some folks will read this and feel that other auth methods, like basic auth, should be allowed. I think it’s better to standardize on tokens today, so that client implementations are simpler and truly compatible. If you have other needs, you can always encode data in the token.
Whether or not it’s the “best” spec, I have no idea. It’s at least flexible enough that many orgs have bent it to their needs. ↩︎
Practically speaking, Warehouse (PyPI) already supports three token authentication schemes:

- `Authorization: basic $BASE64_CRED_PAIR`
- `Authorization: token $API_TOKEN`
- `Authorization: bearer $API_TOKEN`

In practice, I suspect that ~99% of current uploads use `Authorization: basic`, with the credential pair being `__token__:$API_TOKEN`. But they should all be strictly equivalent when using any kind of API token.
Edit: Permalink: warehouse/warehouse/macaroons/security_policy.py at f978eecde8e4a38fe7ac0b731824960eb4e1ba2e · pypi/warehouse · GitHub
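For illustration, all three schemes carry the same API token (the token value is a placeholder):

```python
# Three equivalent ways, per the Warehouse behavior described above,
# to attach the same API token to an upload request.
import base64

api_token = "pypi-XXXX"  # placeholder

pair = base64.b64encode(f"__token__:{api_token}".encode()).decode()
headers_basic = {"Authorization": f"basic {pair}"}
headers_token = {"Authorization": f"token {api_token}"}
headers_bearer = {"Authorization": f"bearer {api_token}"}
```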
I agree! In my PR to update PEP 694, I document how[1] PyPI currently does auth, and mention that this is index-specific. I don’t believe auth is inherently tied to API design, whether PEP 694 or a future REST-like API, and as you say, auth methods will likely continue to undergo evolution.
to my best understanding ↩︎
I’m not sure that’s even possible in practice. Alternative indexes have their own backward compatibility and general security principles to consider, so it’s not certain that any auth requirement will even be possible to support in other index implementations.
I’ve returned to this PEP for the first time in 2? 3? years after dozens of conversations at PyCon US 2025 that somehow invoked features PEP 694 would provide.
Seems the recollection I shared each time was indeed accurate, which is that I would very much like to see us offload the multi-part upload work to an S3 compatible multi-part upload interface rather than implement it as part of warehouse.
I know this alternative is mentioned in the PEP, and that the majority of the content in the PEP is about the implementation of uploads… In the interest of motivating the PEP to move as quickly as possible, here’s what I’d like to propose:
PEP 694 is reduced in scope to remove the File Upload section (PEP 694 – Upload 2.0 API for Python Package Indexes | peps.python.org), a new key is added to the “Create an Upload Session” section for `upload-mechanism`, and a contract is defined for how to say “sorry dawg we don’t support that mechanism” (see the sketch below).

Further responses can then include any necessary URLs, keys, etc. that the user may need to execute that mechanism.

The PEP could? SHOULD? specify a `simple` mechanism that is equivalent to our current “one-shot” synchronous upload.
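A hypothetical shape of that exchange, to make the idea concrete (key names are illustrative, not taken from the PR):

```python
# Hypothetical session-creation exchange under this proposal; nothing
# here is specified anywhere yet.
requested = {
    "upload-mechanism": "s3-multipart",  # client asks for a mechanism
}

# A server that supports the mechanism responds with whatever that
# mechanism needs (URLs, keys, etc.):
accepted = {
    "upload-mechanism": "s3-multipart",
    "urls": {"part-upload": "https://...", "complete": "https://..."},
}

# A server that doesn't support it refuses via some agreed contract,
# e.g. an HTTP 4xx body naming the mechanisms it does support:
refused = {
    "error": "unsupported upload mechanism",
    "supported-mechanisms": ["simple"],
}
```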
Draft PR to better specify what the hell I’m talking about: PEP 694: Abstract file upload mechanisms by ewdurbin · Pull Request #4431 · python/peps · GitHub
I finally got a chance to review the changes. In general I like the direction you’re going in! I’m not sure my comments were completely coherent, given that I got distracted several times and ran out of gas at the end of a long week. Thanks for working on this! I’ll definitely take another look next week.
Reading through this PEP I was wondering why there was such an emphasis on resumable uploads. The only justification I can see is the following problem statement:
It does not support any mechanism for parallelizing or resuming an upload. With the largest default file size on PyPI being around 1GB in size, requiring the entire upload to complete successfully means bandwidth is wasted when such uploads experience a network interruption while the request is in progress.
Is wasted bandwidth due to network interruption a significant problem for python package indexes? Perhaps naively, I would have thought that the vast majority of python packages would be too small to benefit (e.g. < 1 MiB), and the few large packages, such as CUDA packages, would likely be uploaded from well connected infrastructure. Is there any data from PyPI on how many partially completed uploads they have, especially with respect to package size?
Or is the problem more about minimising state on the upload server, by allowing direct streaming of package uploads to an object store?
EWDurbin’s PR probably addresses this by making additional upload methods optional, but I’d be interested in hearing the original rationale.
I can’t speak to @dstufft’s original motivation, but you are probably right that most packages are small enough not to matter. I don’t have any data about failure rates for big packages, but I do think it’s worth providing the option for resumable uploads and chunked uploads, which is why I kept that option in my recent update, which represents the current published version.
We had some excellent conversations about 694 at PyCon, and @EWDurbin’s PR is a great improvement, because it makes the common case easy and allows for index-specific enhancements, e.g. supporting direct-to-S3 or other file upload protocols, without having to go through the PEP process.
I’ve been too busy lately to do the latest review of the PR, but plan to finish that up this week so it can hopefully get published. PEP 694 brings other, much needed improvements to the upload experience.