Pypitoken - A library for generating and manipulating PyPI tokens

ewjoachim · March 8, 2021, 9:19pm

TL;DR: I’ve made https://pypitoken.readthedocs.io/en/latest and I think it would fit nicely under PyPA and could be used in Warehouse. What do you think?

For a while now, I’ve had the opportunity to look at the PyPI API token (a.k.a. macaroon) implementation up close, from several angles:

Finding security issues
Working on a PR on warehouse to automatically disable tokens when seen in a public GitHub commit
Creating (& subsequently fixing) problems in Warehouse around the implementation of those
Suggesting improvements (1, 2) to the token implementation
and seeing other people’s work go unmerged too

Macaroons were chosen as PyPI tokens for multiple reasons, including a very interesting property: any bearer can add restrictions to a token to limit its scope. This can be done locally, by deriving a new macaroon with added caveats.
This is partly implemented in Warehouse, which can generate user-wide or project-scoped macaroons, but it has never really been communicated that you, as a user, can generate your own project-scoped macaroon from a user-wide macaroon. Also, Warehouse only allows creating single-project macaroons, but the implementation actually makes it possible for a macaroon to be valid for a list of projects.
Finally, many additional restrictions were suggested early on and still haven’t been implemented, I believe because of the complexity of adding them to Warehouse, which is already a big codebase with limited maintainer time.
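
To make the “any bearer can add restrictions” property concrete, here is a toy sketch of the HMAC chaining idea behind macaroons, using only the standard library. It is not PyPI’s actual token format and not the pypitoken or pymacaroons API; the function names, the key, the identifier and the caveat payload are all made up for illustration.

```python
import hashlib
import hmac


def _chain(signature: bytes, message: bytes) -> bytes:
    """One HMAC-SHA256 step: the previous signature is the key."""
    return hmac.new(signature, message, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes) -> dict:
    """Server side: create an unrestricted toy token."""
    return {
        "identifier": identifier,
        "caveats": [],
        "signature": _chain(root_key, identifier),
    }


def attenuate(token: dict, caveat: bytes) -> dict:
    """Bearer side: derive a narrower token. No root key is needed."""
    return {
        "identifier": token["identifier"],
        "caveats": token["caveats"] + [caveat],
        "signature": _chain(token["signature"], caveat),
    }


def signature_is_valid(root_key: bytes, token: dict) -> bool:
    """Server side: replay the chain from the root key and compare."""
    expected = _chain(root_key, token["identifier"])
    for caveat in token["caveats"]:
        expected = _chain(expected, caveat)
    return hmac.compare_digest(expected, token["signature"])


# Hypothetical values, for illustration only.
root_key = b"server-side-secret"  # never leaves the server
user_token = mint(root_key, b"token-id-1")
project_token = attenuate(user_token, b'{"projects": ["example-project"]}')
```

Because each signature is derived from the previous one, the holder of `user_token` can compute `project_token` offline, but the holder of `project_token` cannot drop the caveat again without the signature check failing. Checking the signature is only half of verification: the server must also evaluate each caveat against the request.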

I realized recently that we would gain a lot by moving the macaroon computations out of Warehouse:

This would help document the different restrictions (caveats) that one can add to a macaroon, and make it much easier to restrict one’s own macaroon
We could even implement this in tools such as twine
This would make the implementation in Warehouse much simpler and less error-prone
This would make implementing new kinds of restrictions much easier, at least for quite a few of them
This would ensure that the code for generating, adding caveats and checking macaroons is all consistent: it’s in the same codebase.

So I’ve gone ahead and created pypitoken:

It’s a small & well documented library, that I’ve written with the goal of it being the missing link described above. If you want more details, I encourage you to ~~crashtest~~ try out the documentation.

I’ll start the work of proposing an integration into Warehouse, but in parallel I wanted to get opinions and, if you folks are interested, start discussing putting this library under PyPA. As a PyPA member, according to PEP 609, I can ask a PyPA committer to put this to a vote. It might be worth gathering a first round of feedback before jumping to that, but if any of the PyPA committers reading this wants to go ahead and start the process, that works too.

We could wait for this package to be used in Warehouse before deciding whether PyPA adopts it, or we could do it the other way around. I believe each action would help the other make more sense. If this package isn’t used by Warehouse, it still makes sense to develop it as the client for adding restrictions to tokens, and thus it makes sense (to me) to have it under PyPA. And if Warehouse were to adopt it, I would be more comfortable with PyPA controlling a package linked to PyPI security, but that’s not an obligation (PyMacaroon, for example, is not under PyPA). So all in all, the 2 choices are not necessarily linked.

Lastly, whether this project is adopted by PyPA or not, if anyone is interested in contributing or helping to maintain it, please get in touch.

steve.dower · March 11, 2021, 9:51pm

This sounds amazing, assuming I’m reading your writeup correctly (and the docs, which are indeed very good!)

So PyPI already supports validating these tokens, and just not generating the more complex varieties? That’s great, and I’d enthusiastically support making this project as official as needed to help it.

Though I’d assume that integrating into Warehouse is more about UI? Presumably they can already add more caveats when generating tokens easily enough.

ewjoachim · March 11, 2021, 11:42pm

So PyPI already supports validating these tokens, and just not generating the more complex varieties?

Yes, but not exactly. PyPI supports generating:

tokens without restrictions,
tokens restricted to a single project for which the user has rights

And checking:

tokens without a restriction (of course)
tokens restricted to an arbitrary number of arbitrary projects (it just checks that the current upload is included in those projects)

A user doing a bit of reverse engineering, reading Warehouse code, OR using pypitoken can generate tokens in the slight gap between what Warehouse can generate and what Warehouse can verify, but this gap is really small.
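
A rough sketch of that gap, as a small verifier that accepts either a user-wide caveat or a caveat listing any number of projects. The JSON layout used here is an assumption made for the example, not something stated in this thread, and `caveat_allows` is a hypothetical name, not a Warehouse or pypitoken function.

```python
import json


def caveat_allows(caveat_json: str, project: str) -> bool:
    """Return True if the caveat permits an upload to `project`."""
    caveat = json.loads(caveat_json)
    permissions = caveat.get("permissions")
    if permissions == "user":
        # Unrestricted token: valid for every project the user can access.
        return True
    if isinstance(permissions, dict):
        # Restricted token: the upload target must be in the list.
        return project in permissions.get("projects", [])
    # Anything unexpected is rejected.
    return False


# Assumed payloads, for illustration only.
USER_WIDE = '{"permissions": "user"}'
SINGLE = '{"permissions": {"projects": ["project-a"]}}'
MULTI = '{"permissions": {"projects": ["project-a", "project-b"]}}'
```

In this sketch, the generating side would only ever emit `USER_WIDE` or `SINGLE`, while the checking side accepts `MULTI` just as well; that difference is the gap.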

For now, pypitoken doesn’t add any advanced restrictions. I’ve paved the way for those to be included, but that’s all. As of today, switching Warehouse token generation to pypitoken would neither add nor remove features. The fact that the lib exists as a local tool for users does add a few features around Warehouse (e.g. this); they were already possible before, but undocumented.

Your question did highlight some lack of clarity in the doc, so I’ve taken the opportunity to update this section.

Though I’d assume that integrating into Warehouse is more about UI? Presumably they can already add more caveats when generating tokens easily enough.

The important parts of integration into Warehouse are:

token generation: this part has the UI all done already
token verification: this is the part where each type of restriction must be included. More precisely: if we use pypitoken, Warehouse won’t need to know how tokens are verified, but it will need to provide the context needed for the verification, and the more restrictions we implement, the more context Warehouse will need to provide.
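
As a sketch of what “providing context” could look like: the caller hands over a plain dictionary describing the current request, and a table of per-restriction checkers does the rest. Everything here is hypothetical, including the `not_after` restriction, which only serves to show how a new restriction kind would require a new context key. This is not pypitoken’s actual interface.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Checker = Callable[[Any, Context], bool]


def check_projects(value: Any, context: Context) -> bool:
    """Needs context["project"]: the project being uploaded to."""
    return context["project"] in value


def check_not_after(value: Any, context: Context) -> bool:
    """Hypothetical expiry restriction; needs context["now"] (a timestamp)."""
    return context["now"] <= value


# One entry per supported restriction kind.
CHECKERS: Dict[str, Checker] = {
    "projects": check_projects,
    "not_after": check_not_after,
}


def verify(caveats: List[Dict[str, Any]], context: Context) -> bool:
    """Every restriction of every caveat must pass; unknown kinds fail."""
    for caveat in caveats:
        for kind, value in caveat.items():
            checker = CHECKERS.get(kind)
            if checker is None or not checker(value, context):
                return False
    return True
```

With this shape, the calling application never inspects the caveats itself. Supporting a new restriction means adding a checker to the library and one more key to the context the application builds.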

And then there’s the step of adding restrictions. This one, as you noted, needs more UI. Until now, the expectation was that adding a new restriction to PyPI also meant adding UI to generate it, but with pypitoken that’s no longer true. You can generate a “full” token and then apply restrictions locally in a Python shell or script. Of course, it’s still possible to implement applying those restrictions directly in the Warehouse UI, but what used to be a requirement is now just a possibility.

I hope I’ve answered your questions, and thank you very much for your kind message!

ewjoachim · March 20, 2021, 10:55pm

The Warehouse integration PR is here
Also, I’ve created a twine ticket to start discussing integration there too
Also, poetry

sigmavirus24 · March 21, 2021, 11:15am

I’ve weighed in on the twine discussion, but I’m not seeing the value here. This feels a lot like a library in search of a problem rather than a solution to a problem we have.

ewjoachim · April 26, 2021, 9:30pm

I realize I didn’t answer you here.

To me, the main problem we’re trying to solve is that the PyPI tokens were created with a lot of ideas in mind, and so far we haven’t been able to fully execute on those because of the complexity of writing them as part of Warehouse.

Having the tokens in their own lib, with supporting documentation and a simpler environment, should allow for easier integration of new kinds of restrictions such as the ones mentioned in the original ticket.

This is, to me, exactly the same reasoning that led to readme_renderer: putting independent & complex parts of PyPI into their own lib to ensure the changes we make are voluntary, meaningful and thoroughly tested (and to encourage reusability)…

And then, being able to add restrictions locally to your token is the icing on the cake. But even if we don’t follow the path of restricting tokens just before using them (as suggested in the twine issue), it still makes sense to add as many general restrictions as possible before storing the tokens, and this will be easier if we reduce the cost of developing new restrictions.

Is that clearer?