PEP 458: Secure PyPI downloads with package signing

tiran · December 11, 2019, 9:23pm

It’s similar for Red Hat. CoreOS added Notary support (Golang implementation of the TUF spec) to Quay (container registry) for image signing, based on the design from Docker Inc. Red Hat acquired CoreOS shortly before Red Hat was acquired by IBM. I’m not familiar with the details of Notary in Quay and there isn’t much documentation on the topic. The Quay registry seems to use an external TUF service to do the work.

Apparently there was some research to improve yum/dnf (package manager) based on ideas from TUF spec, too. The DNF specs don’t mention TUF. Fedora and RHEL rely on GPG signatures and hash files for packaging.

EWDurbin · December 12, 2019, 9:44pm

I just noted that a handful of @mnm678’s responses were marked as spam by the system. I’ve approved the posts and manually increased trust level to try to mitigate that in the future.

mnm678 · December 12, 2019, 9:45pm

To prevent any confusion, we can remove the mention of Microsoft.

This looks good. I’ll add it to the proposed text.

These are additional benefits that should be added to the intro.

I agree, these attacks should be mentioned.

Here is a quick rewrite with those changes. I think we could mention the attacks from @SantiagoTorres somewhere as well.

Attacks on software repositories are common, even in organizations with very good security practices (see the “Attacks on software repositories” page of the theupdateframework/pip wiki on GitHub). The resulting repository compromise allows an attacker to edit all files stored on the repository and sign these files using any keys stored on the repository (online keys). In many signing schemes (like TLS), this access allows the attacker to replace files on the repository and make it look like these files are coming from PyPI. Without a way to revoke and replace the trusted private key, it is very challenging to recover from a repository compromise. In addition to the dangers of repository compromise, software repositories are vulnerable to an attacker on the network (MITM) intercepting and changing files. These and other attacks on software repositories are detailed here. This PEP aims to protect users of PyPI from compromises of the integrity, consistency and freshness properties of PyPI packages, and enhances compromise resilience by mitigating key risk and providing mechanisms to recover from a compromise of PyPI or its signing keys. In addition to protecting direct users of PyPI, this PEP aims to provide similar protection for users of PyPI mirrors.

To provide compromise resilient protection of PyPI, this PEP proposes the use of The Update Framework (TUF). TUF provides protection from a variety of attacks on software update systems, while also providing mechanisms to recover from a repository compromise. TUF has been used in production by a number of organizations including Cloudflare, Datadog, DigitalOcean, Docker, Flynn, IBM, Kolide, LEAP, RedHat, and VMware. More details about TUF are included later in this PEP and in the specification.

This PEP describes changes to the PyPI infrastructure that are needed to ensure that users get valid packages from PyPI. These changes should have minimal impact on other parts of the ecosystem. The PEP focuses on communication between PyPI and users, and so does not require any action by package developers. Developers will upload packages using the current process, and PyPI will automatically sign these packages. In order for the security mechanism to be effective, additional work will need to be done by PyPI consumers (like pip) to verify the signatures and metadata provided by PyPI. This verification can be transparent to users (unless it fails) and provides an automatic security mechanism. There is documentation for how to consume TUF metadata in the TUF repository. However, changes to PyPI consumers are not required, and can be done according to the timelines and priorities of individual projects.

tiran · December 12, 2019, 10:47pm

Please drop the list of companies from the text and rather mention a list of technologies or products that are based on TUF spec. There are legal implications in using brand names and trademarks to endorse and promote a 3rd party product. Also you are still misspelling one of the brands in your list.

I care more about technologies than fancy company names. For example, you could mention that the TUF spec is used in Cloud Native Computing Foundation’s Notary service, which provides the infrastructure for container image signing in Docker Registry.

dstufft · December 12, 2019, 11:10pm

For what it’s worth, I think I got added to the authors list more as an honorarium due to some of my early feedback rather than actually being an author in any real part of this PEP.

mnm678 · December 13, 2019, 8:18pm

Fair enough. We can replace that sentence with:

TUF has been used in production by a number of organizations, including in the Cloud Native Computing Foundation’s Notary service, which provides the infrastructure for container image signing in Docker Registry. The TUF specification has been the subject of three independent security audits.

mnm678 · December 16, 2019, 7:23pm

@ncoghlan Would you be interested in sponsoring this PEP?

ncoghlan · December 16, 2019, 11:00pm

Aye, I’d be happy to sponsor both PEP 458 and PEP 480. (Historical context: Donald and I provided a lot of the original feedback that led to splitting the package signing design between PEP 458 & PEP 480, and I specifically called this out as a project that was gated on funding back in 2016, in “The Python Packaging Ecosystem” on Curious Efficiency.)

Regarding PEP titles, I would suggest the following:

PEP 458: Transport independent delivery assurance for PyPI packages
PEP 480: Opt-in end-to-end package signing for PyPI packages

Shorthand description of PEP 458: Publish TUF metadata from PyPI to allow validation of PyPI mirrors and detect attempted freeze attacks and TLS MitM attacks.
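To make the freeze-attack part concrete, here is a minimal sketch of the kind of freshness check a TUF-aware client performs on signed metadata. It assumes the signature on the metadata has already been verified; the function name, exception name, and parameters are hypothetical and are not taken from PEP 458 or any TUF library. The version comparison is a rollback check, which goes slightly beyond the freeze attack named above.

```python
from datetime import datetime, timedelta, timezone


class StaleMetadataError(Exception):
    """Raised when metadata is expired or older than what the client has seen."""


def check_freshness(new_version: int, expires: datetime,
                    trusted_version: int, now: datetime) -> None:
    # Freeze attack: a mirror keeps serving old, validly signed metadata.
    # An expiry time inside the signed metadata bounds how long that works.
    if now >= expires:
        raise StaleMetadataError("metadata expired; possible freeze attack")
    # Rollback: the version number must never go below the last trusted one.
    if new_version < trusted_version:
        raise StaleMetadataError("version went backwards; possible rollback")


now = datetime(2020, 1, 1, tzinfo=timezone.utc)
# Newer version, not yet expired: accepted without raising.
check_freshness(5, now + timedelta(days=1), 4, now)
```

The important design point is that the expiry time sits inside the signed metadata, so a mirror cannot extend it without access to the signing key.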

Shorthand description of PEP 480: Allow package publishers to sign their own TUF package metadata to reduce the risks associated with a compromise of the PyPI service.

Personal opinion:

I’ve long been convinced that PEP 458 is a good idea, as it’s high value (due to mirror validation), with minimal UX impact on either publishers or consumers (as it’s an automated client level check, like HTTPS).

I’m far more skeptical about PEP 480, as when it comes to detecting mutation of previously published packages, a system inspired by Certificate Transparency seems more viable (creating a public append-only log of artifact hashes would provide more comprehensive coverage with substantially less collective effort), and for publication of new malicious artifacts, experience suggests that attacks tend to focus on either clients (typosquatting, social engineering) or direct compromise of publisher systems (which would potentially grant access to the publisher signing keys anyway).
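As a rough illustration of the append-only log idea, the sketch below chains each artifact hash to the previous log head, so rewriting any earlier entry invalidates everything after it. This is a deliberately simplified hash chain, not the Merkle tree structure that Certificate Transparency actually uses, and the `ArtifactLog` class and its methods are hypothetical names invented for this example.

```python
import hashlib

GENESIS = "0" * 64  # fixed starting value for the chain


def _chain(prev_head: str, filename: str, artifact_hash: str) -> str:
    # prev_head and artifact_hash are fixed-width hex, so the join is unambiguous.
    return hashlib.sha256((prev_head + filename + artifact_hash).encode()).hexdigest()


class ArtifactLog:
    def __init__(self) -> None:
        self.entries = []  # list of (filename, artifact_hash, head)

    def append(self, filename: str, content: bytes) -> str:
        artifact_hash = hashlib.sha256(content).hexdigest()
        prev_head = self.entries[-1][2] if self.entries else GENESIS
        head = _chain(prev_head, filename, artifact_hash)
        self.entries.append((filename, artifact_hash, head))
        return head

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited entry breaks it.
        prev_head = GENESIS
        for filename, artifact_hash, head in self.entries:
            if head != _chain(prev_head, filename, artifact_hash):
                return False
            prev_head = head
        return True


log = ArtifactLog()
log.append("example-1.0.tar.gz", b"first release")
log.append("example-1.1.tar.gz", b"second release")
```

Anyone holding a copy of the latest head can rerun `verify()` over a downloaded log to detect mutation of previously published entries.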

mnm678 · December 18, 2019, 8:45pm

Thank you!

I like these titles; they make the goals of the PEPs clearer. However, it might be good to mention security in the title, maybe “Secure transport independent download integrity for PyPI packages” or something similar.

ncoghlan · December 18, 2019, 10:29pm

And “download integrity” is going to be immediately clear to more people than “delivery assurance”, so I like that as an updated title for PEP 458.

Regarding the summary earlier in the thread, I think that draft text makes the most sense as a new summary for PEP 480.

For PEP 458, the focus should be on answering the question “How can a package installation client ensure that a mirror is providing the same packages as PyPI itself?”.
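A minimal sketch of what answering that question looks like on the client side: once the client holds metadata it trusts (signature verification is assumed to have happened already and is not shown), a file fetched from any mirror is accepted only if its length and hash match. The function name and the `trusted` dictionary layout are hypothetical and chosen for illustration only.

```python
import hashlib


def matches_trusted_metadata(data: bytes, expected_length: int,
                             expected_sha256: str) -> bool:
    # Check the length first so an oversized download is rejected cheaply.
    if len(data) != expected_length:
        return False
    return hashlib.sha256(data).hexdigest() == expected_sha256


payload = b"example wheel contents"
# Stand-in for values read from already-verified metadata.
trusted = {"length": len(payload), "sha256": hashlib.sha256(payload).hexdigest()}

ok = matches_trusted_metadata(payload, trusted["length"], trusted["sha256"])
```

Because the check depends only on the metadata, it works the same whether the bytes came from PyPI or from a mirror.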

The key question for PEP 480 is different, as it’s “How can a package installation client ensure that PyPI is providing the same packages as the original publisher uploaded?”. While there’s merit in allowing publishers that actively want to manage their own signing keys to do so, we can also reasonably assume the majority of publishers are not going to take on that extra responsibility. So regardless of whether the full PEP 480 end-to-end signing support is implemented or not, I expect that we’re eventually going to want a secure public transparency log for artifact hashes anyway.

mnm678 · December 19, 2019, 11:40pm

I added the new title and summary to a fork of the PEP repository. I can continue to update that as we discuss it here.

This is a good way to illustrate the differences: PEP 458 covers the path from PyPI to the package installation client, while PEP 480 covers the path from the original publisher to PyPI.

I think the transparency log could be a good addition to PEP 480; it might allow faster discovery of compromises or other package issues, especially for publishers who decide not to implement end-to-end signing. However, having the option of end-to-end signing offers more automatic security guarantees where possible.

ncoghlan · December 20, 2019, 12:52pm

Aye, I agree that would be a good way of structuring it - improved security against mutation of old artifacts for everyone, and improved security against mutation of future artifacts for publishers that choose to opt in to that.

sumanah · January 3, 2020, 5:12pm

Per peps pull request #1247, the title of PEP 458 is currently “Secure PyPI downloads with package signing”. So I’m asking the Discourse admins to change the title of this thread accordingly.

sumanah · January 8, 2020, 8:39pm

PEP 458 now reflects a few further improvements that folks discussed in this thread, since sponsor @ncoghlan merged these pull requests over the last few days:

New section: Hash algorithm transition plan
Updated abstract
Updated Discussions-To and Post-History headers

I think the next step is for @mnm678 to do another sweep to find past comments to reply to/resolve. Take a look at the past distutils-sig discussions (start at the ones I linked to in this thread but also search the archive for “TUF” and “PEP 458” to find others), and at the PyCon 2019 sprint notes I mentioned:

And I’ve been pinging other acquaintances and colleagues who work in securing the package supply chain and asking them to comment on the PEP.

aixtools · January 9, 2020, 12:43pm

FYI: my two bits after reading all of this.

I have the impression I understand the objectives - whereas last year the goals were obscure to me.

Thanks for the clarification and simplification (in the discussions).

mnm678 · January 14, 2020, 1:10am

I went through the PyCon notes and previous mailing list discussions (especially this) to look for any open issues related to this PEP.

There are a couple of topics that could use some discussion here. These include:

Should packages uploaded before PEP 458 be back-signed? Are they considered trusted by default? This is a one-time concern for the initial deployment of PEP 458. For this reason, I would support back-signing all existing packages. Any packages currently on PyPI could be signed as part of the rollout of PEP 458. This would allow users to use only signed packages starting at the initial rollout, instead of keeping track of which packages are supported.
How is TUF different from transparency logs (like Certificate Transparency)? I see transparency logs as complementary to the security goals of PEP 458 and PEP 480. Transparency logs allow for faster detection of a compromise, while TUF (and the PEPs) aim to prevent and recover from a compromise.

In addition, many of the issues discussed have since been clarified or addressed in the PEP. For the purpose of a more open discussion, I paraphrased many of these issues and their current resolution (with links to the more detailed writeups in the PEP). These include:

Availability concerns
- Is a package available for immediate download when it is uploaded to PyPI? There is a long discussion of metadata availability issues in the PEP that ends with the note “Moreover, PyPI MAY serve distribution files to clients before the corresponding consistent snapshot metadata is generated. In that case the client software SHOULD inform the user that full TUF protection is not yet available but will be shortly.”
- Are packages available if for some reason PEP 458 is not working? Yes, but the user has to specify a flag so that they are aware that they are using a less secure option. More details are here
Computation and storage costs
- Can outdated files be purged? All metadata except root metadata can be regularly purged (see PEP 458).
- What are the storage costs of PEP 458? There is a table that describes the storage costs here
Key management
- How are online keys managed? PEP 458 allows the existing HashiCorp Vault to be used for online key management (details in the PEP).
- How are offline keys managed and stored? There is some advice given in the PEP about how this can be done. This is mostly an issue for the PyPI maintainers, and so they can decide how best to store their offline keys.

mnm678 · January 14, 2020, 5:00pm

Another thing to discuss is a transition plan for PEP 458. There was some previous discussion about this topic. Basically, the issue is how to transition users to verifying updates using PEP 458 and how they can report any issues during this process.

sumanah · January 14, 2020, 5:33pm

Thanks for asking about this, @mnm678. @pf_moore @ncoghlan @dstufft In your opinion, does a short “onboarding end users” paragraph along these lines need to be part of PEP 458? I think it does not, because the PEP does not prescribe how pip should use the new metadata:

This PEP does not prescribe how package managers, such as pip, should be adapted to install or update projects from PyPI with TUF metadata.

pf_moore · January 14, 2020, 5:52pm

I don’t think it’s needed. As you say, if it was present it would contradict the statement that the PEP doesn’t say how package managers should handle TUF.

On the other hand, it says so little (basically little more than “if you support TUF it should be on by default”) that I can’t really get excited either way.

More broadly, I don’t think that “transitioning users” should be a concern for the PEP. For all practical purposes, whether TUF is in use is intended to be transparent to end users (if the package manager they use supports it, things are safer with no visible impact - much like https vs http). So what’s to transition?

sumanah · January 14, 2020, 6:08pm

That makes sense to me, Paul! Thanks for confirming. Assuming Nick or Donald agrees, I think it would be fine for the PEP authors to go ahead and close that PR without merging (maybe move some of that thinking into the PEP 480 draft if desired).