PyPI security work: multifactor auth progress & help needed

(Sumana Harihareswara) #1

Hi, Python packaging colleagues! As Ernest blogged last week, a team has kicked off work on improving Warehouse security, accessibility, and internationalization. See the blog post & links for more details and who’s working on what, but our first milestone is:

  • Support for two-factor authentication via TOTP and U2F/FIDO.
  • Application-specific tokens scoped to individual users/projects (this will also cover adding token-based login support to twine and setuptools).
  • Advanced audit trail of user actions beyond the current journal (allowing publishers to track all actions taken by third-party services on their behalf).
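
For readers unfamiliar with TOTP, here is a minimal sketch of how a time-based one-time code can be derived and checked, following RFC 6238 with HMAC-SHA1. It is not Warehouse's implementation; the function names (`hotp`, `totp`, `verify_totp`) and the one-step drift window are assumptions made for illustration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low nibble of the last byte picks the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the current time step."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step))


def verify_totp(secret: bytes, candidate: str, now: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side.

    The drift window is an illustrative assumption, not a stated PyPI policy.
    """
    if now is None:
        now = time.time()
    counter = int(now // step)
    for drift in range(-window, window + 1):
        if counter + drift < 0:
            continue
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(hotp(secret, counter + drift), candidate):
            return True
    return False
```

A real deployment would also need to rate-limit attempts and reject reuse of an already accepted code, which this sketch leaves out.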

As project manager, I’ll be sending progress reports about twice a month, and posting meeting notes on the wiki.

Engineer William Woodruff of Trail of Bits is working on TOTP support. UX designer and developer Nicole Harris is reviewing that work, working on relevant help text, and developing the user experience that multi-factor auth and our other objectives will require.

And today a few of us discussed several open issues. If you’d like to help out, we’d love volunteer help with:

Want to help? Check out Warehouse’s developer environment setup docs and tell us if you have trouble getting started!

And please tell me if you’re planning to join us at sprints at PyCon North America, May 6th-9th, so we can plan tasks.

More next month, including some schedule estimates.

Thanks to the Open Technology Fund for funding this work!

Sumana Harihareswara, Warehouse project manager


(Dustin Ingram) #2

I’d be particularly interested to hear if folks have thoughts about how to do this in an easy, secure way. We generally try to avoid doing introspection into the individual distributions, but it seems this would require at least some interaction with the file contents.


(Cooper Lees) #3

I don’t want to scope creep, but this could also be extended to make sure .whl files are actually zip archives as well, with a check that the mandatory files are present.
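
A rough sketch of what such a check could look like, assuming the mandatory files are the `METADATA`, `WHEEL`, and `RECORD` entries that the wheel format places in the `.dist-info` directory. The function name `looks_like_wheel` is hypothetical, and this is not the check Warehouse actually runs.

```python
import io
import zipfile

# Files the wheel format expects inside the "<name>-<version>.dist-info" directory.
REQUIRED_DIST_INFO_FILES = ("METADATA", "WHEEL", "RECORD")


def looks_like_wheel(data: bytes) -> bool:
    """Return True if `data` is a zip archive with a complete .dist-info dir."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        names = set(archive.namelist())
    # Collect top-level directories whose name ends in ".dist-info".
    dist_info_dirs = {
        name.split("/", 1)[0]
        for name in names
        if "/" in name and name.split("/", 1)[0].endswith(".dist-info")
    }
    return any(
        all(f"{directory}/{required}" in names
            for required in REQUIRED_DIST_INFO_FILES)
        for directory in dist_info_dirs
    )
```

This only inspects the archive's file listing; it does not decompress or execute anything from the upload.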

(Dustin Ingram) #4

We already do that:

(Brett Cannon) #5

More scope creep: can we just upload zip files everywhere, instead of two different archive types (.tar.gz for sdists and .zip via .whl)? :wink: (I know this isn’t going to happen, but I can dream …).


(Gregory P. Smith) #6

The code already pokes inside zip/whl/egg files as you’ve noticed (zip contents validation). PR created. I do agree with Brett, though: long term we should pick a single archive format and only support that on PyPI. An interim period where we auto-convert from the other formats to the desired one would make sense.

If we care about storage and network bandwidth costs, zip and .tar.gz are both less than ideal. But a single format is a larger conversation. #dream


(Nathaniel J. Smith) #7

There were some long threads about file formats on PyPI back in 2016:

That eventually led to PEP 527. The biggest change in PEP 527 was to slim down PyPI’s support matrix from a dozen+ file formats to just 3: wheel, .tar.gz sdist, and .zip sdist. There was a lot of debate about whether to reduce that further to just one format of sdist, and if so, whether it should be .tar.gz or .zip. @dstufft’s original PEP draft made .tar.gz the only supported format, IIRC based on it being by far the most common in actual usage, but it looks like the final draft relaxed that.

Then in PEP 517, we needed to specify a generic interface for producing sdists, and we made it use .tar.gz, basically because that’s what @dstufft liked better:

I think our options going forward are:

  • Continue to support both .zip and .tar.gz sdists on PyPI (status quo).
  • Drop support for .zip sdists. This would be pretty simple: we’d stop allowing .zip uploads, and that’s basically it. We’d be committed to handling both .zip and .tar.gz everywhere forever, for wheels and sdists respectively. This obviously isn’t a prohibitive cost (we do it now), but it is some cost.
  • Drop support for .tar.gz sdists. This would be more involved: we’d have to first update PEP 517, and change setuptools to default to generating .zip sdists. (And ideally make the same change to other build backends, like distutils and flit.) And after that was deployed for a bit and people adapted to .zip sdists being common, then we could stop allowing .tar.gz uploads. The upside would be that .zip is somewhat easier to work with programmatically (since it supports random access), and that it matches wheels. And the downside of course is the transition costs.
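
To make the random-access point concrete, here is a small sketch contrasting how a single file is read from each format with the standard library. The helper names are hypothetical and the archives are built in memory purely for demonstration.

```python
import io
import tarfile
import zipfile


def read_member_from_zip(fileobj, member: str) -> bytes:
    # A zip has a central directory mapping names to offsets, so only the
    # requested member needs to be located and decompressed.
    with zipfile.ZipFile(fileobj) as archive:
        return archive.read(member)


def read_member_from_tar_gz(fileobj, member: str) -> bytes:
    # A gzip-compressed tar has no index: members are visited in order, and
    # everything before the target has to be decompressed along the way.
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for info in archive:
            if info.name == member and info.isfile():
                return archive.extractfile(info).read()
    raise KeyError(member)


def build_archives(files: dict) -> tuple:
    """Demo helper: pack the same files into an in-memory .zip and .tar.gz."""
    zip_buffer, tar_buffer = io.BytesIO(), io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    with tarfile.open(fileobj=tar_buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    zip_buffer.seek(0)
    tar_buffer.seek(0)
    return zip_buffer, tar_buffer
```

Both helpers return the same bytes for the same member; the difference is in how much of the archive has to be processed to get there.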

(It’s an interesting side-effect of using Discourse that suddenly these conversations are drawing in folks who didn’t follow this history!)


(Gregory P. Smith) #8

I made a PR for this. I don’t have anything beyond normal-peon access to the warehouse repository so someone else will have to assign/review/merge if desired. This was my first time touching the warehouse codebase. “such docker. oh my.”


(Sumana Harihareswara) #9

Thank you @gpshead for your PR!


(Sumana Harihareswara) #10

Hi, Python packaging colleagues!

We continue to work towards our first goal: support for two-factor authentication on PyPI via TOTP and U2F/FIDO. William and Nicole are continuing their development and design work as I mentioned in the last update, with additional work by Mark Mossberg at Trail of Bits, plus Ernest, Dustin, and Donald advising and reviewing.

We are working out our rollout plans for multifactor auth, and so we don’t yet have an estimate for when we’ll deliver that and when we’ll start the API keys or audit trail work. But the existing work-in-progress PR for MFA is ready for you to try out and play with now, and we’ll have more for you to try out next month at the PyCon sprints.

Want to help?

Thank you @gpshead for your PR to validate whether uploaded packages ending in .tar.gz are actually tarballs!
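
For anyone curious what that kind of validation involves, here is a minimal sketch using only the standard library. It is not the code from the PR; the function name `is_gzipped_tarball` is an assumption, and a production check would likely also bound how much data it is willing to read.

```python
import io
import tarfile
import zlib


def is_gzipped_tarball(data: bytes) -> bool:
    """Return True if `data` opens as a gzip-compressed tar archive."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
            # Walk every header so truncated or corrupt archives are caught,
            # without extracting any file contents to disk.
            archive.getmembers()
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
    return True
```

Checking the contents rather than the filename is the point: a zip file renamed to end in .tar.gz fails here even though its extension looks right.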

And please speak up in this topic if you’re planning to come to sprints at PyCon North America, May 6th-9th, so we can plan tasks.

We’ll send another progress report around mid-month. That’s also when the PSF aims to announce another Request For Information for Warehouse security improvements: “highly requested security features in PyPI such as cryptographic signing and verification of files uploaded and installed from the index” (possibly using TUF).

Thanks to the Open Technology Fund for funding this work!

Sumana Harihareswara, Warehouse project manager


(Adam Englander) #11

Installation and setup were a breeze. I’ll be there for four days of sprints and would love to contribute. I am the architect and former security lead on a Python-based MFA API. I hope I’ll be able to lend some SME help along with knocking out tickets. :monkey_face:

(Wes Turner) #12

How does the plan to implement TUF (and maybe WebAuthn) dovetail with this work?

What issues need more attention from volunteers with which skills?

(Wes Turner) #13

What’s the GH Issue # for the API keys work; is the system being designed so that I can create a per-package key (so that I’m not delegating all privileges to my CI builds and CI build systems)?


(Justin Cappos) #14

We’re excited to help with the encryption support aspect, whatever is decided. Just point us at the best way to dive in. I can’t personally make it, but I can check to see if someone from our team can attend PyCon if being there in person would be a major help…