RFC: improving pip security with package signing (PEP-458)

I’m hoping to get some eyeballs on a draft pip PR of mine: it implements PEP-458 in pip.

In short, pypi.org will soon provide signed metadata that clients can use to verify that the package they download is what was originally uploaded to pypi.org and that the supply chain has not been tampered with. In the future, the same infrastructure can be used to verify developer signatures as well.
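
To make the idea concrete, here is a minimal sketch of the final check a client performs once it trusts the signed metadata: compare the downloaded bytes against the length and hash that the metadata lists for that file. This is not code from the PR and not the TUF library API; the function name and parameters are hypothetical, and the use of SHA-256 is an assumption for illustration.

```python
import hashlib


def matches_signed_metadata(data: bytes, expected_length: int, expected_sha256: str) -> bool:
    """Return True if the downloaded bytes match the length and SHA-256 digest
    recorded for the file in already-verified metadata (hypothetical helper)."""
    # A cheap length check first: a wrong size can never match.
    if len(data) != expected_length:
        return False
    # Compare hex digests case-insensitively.
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

The check is only meaningful if the expected length and digest come from metadata whose signatures have already been verified; that verification is the part TUF handles.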

My Pull Request has all the details, but this link is probably better for code review purposes (it does not include the vendored dependencies). I’m looking for:

  1. high-level review (I’m not a pip expert, so I may have integrated at the wrong levels of abstraction)
  2. discussion on a couple of design aspects
    2.1 progress indication: currently disabled
    2.2 control over downloads: TUF (a new vendored dependency) controls download details
    2.3 threads: multithreading (in pip list -o/-u) is disabled in the PR

I’ve tried to explain much more in the PR cover letter. I’m hoping to get some discussion going on the PR but am happy to talk here as well.

I’m hoping the feature can be tested soon, but as of now a test server is not yet available.

As of November 13 (https://github.com/pypa/pip/pull/9041#issuecomment-726758609) there are some questions from Jussi that could use replies from folks here.

Hi there,

PEP-458 talks about package signing that would be provided by PyPI. Are there any plans for, or a list of, “minimal requirements” that need to be met for self-hosted packages (outside of public PyPI)? Or is PyPI considered the only implementation for hosting Python packages that will support signed packages?

Thanks in advance for any replies!
Fridolin

Hi Fridolin,

I can share an overview. I’ve mostly looked at the client side issues, but as a high-level description this should match current plans for the backend too.

The work being done on the client side (pip) should be fully compatible with other repositories that provide signed metadata. The only thing that needs to happen in this case is out-of-band delivery of the initial signed metadata for the repository: the initial metadata for pypi.org will be shipped with pip, but 3rd party repositories would have to handle that in another way. In practice this would mean somehow installing a directory with a few JSON files on the client device in the pip user data directory (~/.local/share/pip/ on Linux).
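
As a rough illustration of that installation step, the sketch below copies the JSON files from a directory delivered out-of-band into a destination under the pip user data directory. Everything here is an assumption for illustration: the helper names are hypothetical, the directory lookup only covers the Linux case mentioned above, and the layout pip will actually expect below that directory is not defined in this thread.

```python
import os
import shutil
from pathlib import Path
from typing import List


def pip_user_data_dir() -> Path:
    """Linux-only guess at the pip user data directory (~/.local/share/pip/),
    honouring XDG_DATA_HOME if it is set. Hypothetical helper."""
    base = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / "pip"


def install_initial_metadata(source_dir: Path, dest_dir: Path) -> List[str]:
    """Copy every *.json file from source_dir into dest_dir and return the
    names of the files that were copied. Hypothetical helper."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for json_file in sorted(source_dir.glob("*.json")):
        shutil.copy2(json_file, dest_dir / json_file.name)
        installed.append(json_file.name)
    return installed
```

A 3rd party repository operator could call something like `install_initial_metadata(Path("initial-metadata"), pip_user_data_dir() / "example-repository")`, where both the source directory and the `example-repository` subdirectory name are made up for this example. How the source directory reaches the client securely is exactly the out-of-band problem described above and is not solved by the copy itself.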

On the repository side things change a bit more (as I assume many 3rd party repositories only serve the files and do not actually run Warehouse):

  • There is now a need to actually manage the keys used to sign the metadata – both to keep the online and offline keys secure and to re-generate keys as needed
  • The metadata that is served to clients is just JSON files, but parts of it need to be regenerated and re-signed when new packages are added to the repository
  • Also, there is an assumption that parts of the metadata get re-signed at intervals (to prevent an attacker serving old files)
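
The last point is the reason for periodic re-signing: each piece of TUF metadata carries an expiry time, and a client refuses metadata that has expired, so an attacker cannot keep serving an old snapshot indefinitely. A minimal sketch of that client-side check follows, assuming the TUF metadata layout where the expiry is an ISO 8601 UTC string under `signed.expires`. The function name is hypothetical, and signature verification (which must happen before trusting the `expires` value) is left out.

```python
from datetime import datetime, timezone


def is_expired(metadata: dict, now: datetime) -> bool:
    """Return True if the metadata's expiry time is not in the future.

    Assumes metadata["signed"]["expires"] looks like "2030-01-01T00:00:00Z"
    and that `now` is a timezone-aware datetime."""
    expires = datetime.strptime(
        metadata["signed"]["expires"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return now >= expires
```

For the repository operator this means a scheduled job has to re-sign the short-lived metadata before it expires; if that job stops, clients will start rejecting the repository even though no package has changed.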

Warehouse is going to handle all of the above on pypi.org (well, apart from managing offline keys). If the 3rd party repository also runs Warehouse, they should be able to use all of the code, but would have to take care of initial repository setup and key management. Implementing the same things in a smaller standalone component (to avoid running full Warehouse) might be possible but is definitely not trivial, and no one is currently doing that.