PEP 458: Secure PyPI downloads with package signing

Thanks - that sounds loads better (to me at least!) :slightly_smiling_face:

2 Likes

I’ll also say this latest summary reads very well.

2 Likes

I still think this statement is vastly overstating what the PEP addresses.

  • legitimately published yet malicious packages are not in any way prevented or identified by this proposal
  • a compromise of PyPI’s storage system is the only compromise that would be protected against, assuming none of the keys were kept in compromised storage
  • there’s no recovery from a compromise of the root key
  • recovery implies restoration, as Paul mentioned, but all we can really do is fail validation for anything signed by a key that was never properly endorsed or that was (presumably compromised and) used after its expiration
  • (I assume that there’s no attack whereby the attacker forces a key rotation and resigning of a package that was injected without correctly signed metadata, but I haven’t worked this one through)

MITM attacks and client-side redirection attacks seem to be the primary vector being protected against. They should at least get a mention.

I’m not aware of anyone at Microsoft using TUF in production. Could you email me at steve(dot)dower(at)microsoft.com with either the team or a person you know who is involved? My understanding was that TUF does not meet our compliance requirements, so I’m interested to see how they made it work.

2 Likes

I agree that it can be refined. What about something along the lines of…?

“This PEP aims to protect users of PyPI from compromises of the integrity, consistency and freshness properties of PyPI packages, and enhances compromise resilience, by mitigating key risk and providing mechanisms to recover from a compromise of PyPI or its signing keys.”

Might be a bit long winded, but maybe someone can make something out of it.

  • legitimately published yet malicious packages are not in any way prevented or identified by this proposal

Correct. Do you think my suggestion above reduces ambiguity?

  • a compromise of PyPI’s storage system is the only compromise that would be protected against, assuming none of the keys were kept in compromised storage

It also protects against malicious CDNs/mirrors, which usually don’t have access to the signing keys. Furthermore, the PEP recommends storing some upper-level role keys (root, targets and bins) offline, which allows a seamless recovery from compromises of online keys (timestamp, snapshot, bin-n).

  • there’s no recovery from a compromise of the root key

There actually is, although it is a race between the legitimate holders of the root keys and the attacker. Whoever first publishes a new root metadata file with new keys wins. If the attacker is able to make clients replace the compromised root keys with keys that are controlled only by the attacker, then you are right, there is no in-band way to recover. But keeping the root keys separate from the metadata publishing infrastructure, i.e. PyPI, gives the legitimate holders of the keys an enormous advantage in this race. Also note that the attacker needs to compromise the required signing threshold of root keys (recommended to be stored offline in different locations) to even enter that race.
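To make the threshold condition concrete, here is a minimal sketch of how a client might decide whether to accept a new root metadata file. It is not the TUF reference implementation: the names `sign`, `count_valid` and `accept_new_root`, the dictionary layout, and the key IDs are all made up for illustration, and HMAC merely stands in for real public-key signatures. The rule it models is that a new root must carry a threshold of valid signatures from the currently trusted root keys and from the keys listed in the new root.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    # Assumption: HMAC-SHA256 stands in for a real public-key signature.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def count_valid(payload: bytes, signatures: dict, keys: dict) -> int:
    """Count how many distinct keys in `keys` have a valid signature."""
    valid = 0
    for key_id, key in keys.items():
        sig = signatures.get(key_id)
        if sig is not None and hmac.compare_digest(sig, sign(key, payload)):
            valid += 1
    return valid


def accept_new_root(payload, signatures, old_keys, old_threshold,
                    new_keys, new_threshold) -> bool:
    """Accept only if both the old and the new root key sets meet their threshold."""
    return (count_valid(payload, signatures, old_keys) >= old_threshold
            and count_valid(payload, signatures, new_keys) >= new_threshold)


# Hypothetical key sets: three current root keys, two replacement keys.
old_keys = {"a": b"key-a", "b": b"key-b", "c": b"key-c"}
new_keys = {"d": b"key-d", "e": b"key-e"}
all_keys = {**old_keys, **new_keys}
payload = b"root metadata, version 2"

# Legitimate rotation: signed by two old keys and both new keys.
legit_sigs = {kid: sign(all_keys[kid], payload) for kid in ("a", "b", "d", "e")}

# Attacker who stole a single old root key and proposes their own new keys.
attacker_keys = {"x": b"key-x", "y": b"key-y"}
attacker_sigs = {"a": sign(old_keys["a"], payload),
                 "x": sign(attacker_keys["x"], payload),
                 "y": sign(attacker_keys["y"], payload)}

legit_ok = accept_new_root(payload, legit_sigs, old_keys, 2, new_keys, 2)
attack_ok = accept_new_root(payload, attacker_sigs, old_keys, 2, attacker_keys, 2)
print(legit_ok, attack_ok)
```

With a threshold of two, the attacker holding one stolen root key cannot get a rotation accepted, which is the "needs to compromise the required signing threshold" point above.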

  • recovery implies restoration, as Paul mentioned, but all we can really do is fail validation for anything signed by a key that was never properly endorsed or that was (presumably compromised and) used after its expiration

This PEP does describe how to restore the repository after a compromise. But you are right that it does not describe what the client should do if the TUF metadata indicates a compromise, other than not installing or updating the invalid targets. I think that is out of scope, but we can probably brainstorm ideas.

  • (I assume that there’s no attack whereby the attacker forces a key rotation and resigning of a package that was injected without correctly signed metadata, but I haven’t worked this one through)

Not sure I understand. Would you mind elaborating?

MITM attacks and client-side redirection attacks seem to be the primary vector being protected against. They should at least get a mention.

Yes that and attacks against CDNs/mirrors. Furthermore, I see PEP 458 as a major stepping stone for PEP 480, but that’s a different discussion.

1 Like

8 posts were split to a new topic: Removing the mention of “Microsoft” from PEP 458

It’s similar for Red Hat. CoreOS added Notary support (a Go implementation of the TUF spec) to Quay (container registry) for image signing, based on the design from Docker Inc. Red Hat acquired CoreOS shortly before Red Hat was acquired by IBM. I’m not familiar with the details of Notary in Quay and there isn’t much documentation on the topic. The Quay registry seems to use an external TUF service to do the work.

Apparently there was some research to improve yum/dnf (package manager) based on ideas from the TUF spec, too. The DNF specs don’t mention TUF. Fedora and RHEL rely on GPG signatures and hash files for packaging.

I just noted that a handful of @mnm678’s responses were marked as spam by the system. I’ve approved the posts and manually increased trust level to try to mitigate that in the future.

2 Likes

To prevent any confusion, we can remove the mention of Microsoft.

This looks good. I’ll add it to the proposed text.

These are additional benefits that should be added to the intro.

I agree, these attacks should be mentioned.

Here is a quick rewrite with those changes. I think we could mention the attacks from @SantiagoTorres somewhere as well.

Attacks on software repositories are common, even in organizations with very good security practices (https://github.com/theupdateframework/pip/wiki/Attacks-on-software-repositories). The resulting repository compromise allows an attacker to edit all files stored on the repository and sign these files using any keys stored on the repository (online keys). In many signing schemes (like TLS), this access allows the attacker to replace files on the repository and make it look like these files are coming from PyPI. Without a way to revoke and replace the trusted private key, it is very challenging to recover from a repository compromise. In addition to the dangers of repository compromise, software repositories are vulnerable to an attacker on the network (MITM) intercepting and changing files. These and other attacks on software repositories are detailed here. This PEP aims to protect users of PyPI from compromises of the integrity, consistency and freshness properties of PyPI packages, and enhances compromise resilience, by mitigating key risk and providing mechanisms to recover from a compromise of PyPI or its signing keys. In addition to protecting direct users of PyPI, this PEP aims to provide similar protection for users of PyPI mirrors.

To provide compromise resilient protection of PyPI, this PEP proposes the use of The Update Framework (TUF). TUF provides protection from a variety of attacks on software update systems, while also providing mechanisms to recover from a repository compromise. TUF has been used in production by a number of organizations including Cloudflare, Datadog, DigitalOcean, Docker, Flynn, IBM, Kolide, LEAP, RedHat, and VMware. More details about TUF are included later in this PEP and in the specification.

This PEP describes changes to the PyPI infrastructure that are needed to ensure that users get valid packages from PyPI. These changes should have minimal impact on other parts of the ecosystem. The PEP focuses on communication between PyPI and users, and so does not require any action by package developers. Developers will upload packages using the current process, and PyPI will automatically sign these packages. In order for the security mechanism to be effective, additional work will need to be done by PyPI consumers (like pip) to verify the signatures and metadata provided by PyPI. This verification can be transparent to users (unless it fails) and provides an automatic security mechanism. There is documentation for how to consume TUF metadata in the TUF repository. However, changes to PyPI consumers are not required, and can be done according to the timelines and priorities of individual projects.

Please drop the list of companies from the text and instead mention a list of technologies or products that are based on the TUF spec. There are legal implications in using brand names and trademarks to endorse and promote a 3rd party product. Also, you are still misspelling one of the brands in your list.

I care more about technologies than fancy company names. For example, you could mention that the TUF spec is used in Cloud Native Computing Foundation’s Notary service, which provides the infrastructure for container image signing in Docker Registry.

For what it’s worth, I think I got added to the authors list more as an honorarium due to some of my early feedback rather than actually being an author in any real part of this PEP.

Fair enough. We can replace that sentence with:

TUF has been used in production by a number of organizations, including use in Cloud Native Computing Foundation’s Notary service, which provides the infrastructure for container image signing in Docker Registry. The TUF specification has been the subject of three independent security audits.

4 Likes

@ncoghlan Would you be interested in sponsoring this PEP?

Aye, I’d be happy to sponsor both PEP 458 and PEP 480 (Historical context: Donald and I provided a lot of the original feedback that led to splitting the package signing design between PEP 458 & PEP 480, and I specifically called this out as a project that was gated on funding back in 2016: https://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.html#making-pypi-security-independent-of-ssl-tls)

Regarding PEP titles, I would suggest the following:

PEP 458: Transport independent delivery assurance for PyPI packages
PEP 480: Opt-in end-to-end package signing for PyPI packages

Shorthand description of PEP 458: Publish TUF metadata from PyPI to allow validation of PyPI mirrors and detect attempted freeze attacks and TLS MitM attacks.
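As a rough illustration of what detecting a freeze attack can look like on the client side, here is a small sketch. It assumes the metadata signatures have already been verified, and the field names (`version`, `expires`), the exception name and the function `check_freshness` are hypothetical, chosen for this example rather than taken from the PEP or the TUF code.

```python
from datetime import datetime, timedelta, timezone


class StaleMetadataError(Exception):
    """Raised when metadata is expired or older than what was already seen."""


def check_freshness(meta: dict, last_seen_version: int, now: datetime) -> int:
    """Return the accepted version, or raise if the metadata looks frozen or rolled back."""
    expires = datetime.fromisoformat(meta["expires"])
    if now >= expires:
        # A mirror replaying old metadata eventually serves expired metadata.
        raise StaleMetadataError("metadata expired: possible freeze attack")
    if meta["version"] < last_seen_version:
        # The client has already seen newer metadata than this.
        raise StaleMetadataError("version decreased: possible rollback attack")
    return meta["version"]


now = datetime(2020, 1, 10, tzinfo=timezone.utc)

# Hypothetical metadata: one fresh copy, one replayed copy that expired days ago.
fresh = {"version": 8, "expires": (now + timedelta(days=1)).isoformat()}
frozen = {"version": 7, "expires": (now - timedelta(days=3)).isoformat()}

accepted = check_freshness(fresh, last_seen_version=7, now=now)

try:
    check_freshness(frozen, last_seen_version=7, now=now)
    frozen_rejected = False
except StaleMetadataError:
    frozen_rejected = True
```

The short expiry on frequently re-signed metadata is what bounds how long a mirror or network attacker can keep serving an old view of the repository.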

Shorthand description of PEP 480: Allow package publishers to sign their own TUF package metadata to reduce the risks associated with a compromise of the PyPI service.

Personal opinion:

I’ve long been convinced that PEP 458 is a good idea, as it’s high value (due to mirror validation), with minimal UX impact on either publishers or consumers (as it’s an automated client level check, like HTTPS).

I’m far more skeptical about PEP 480, as when it comes to detecting mutation of previously published packages, a system inspired by http://www.certificate-transparency.org/what-is-ct seems more viable (creating a public append-only log of artifact hashes would provide more comprehensive coverage with substantially less collective effort), and for publication of new malicious artifacts, experience suggests that attacks tend to focus on either clients (typosquatting, social engineering) or direct compromise of publisher systems (which would potentially grant access to the publisher signing keys anyway).
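For readers unfamiliar with the idea, here is a toy sketch of an append-only log of artifact hashes. Note that it uses a simple hash chain, not the Merkle tree structure that Certificate Transparency actually uses, and the class name `HashChainLog` and the file names are invented for this example.

```python
import hashlib

GENESIS = "0" * 64  # Assumed starting value for the chain.


def link(prev_head: str, name: str, artifact_hash: str) -> str:
    """Compute the next chain head from the previous head and a new entry."""
    return hashlib.sha256(f"{prev_head}:{name}:{artifact_hash}".encode()).hexdigest()


class HashChainLog:
    def __init__(self):
        self.entries = []  # (artifact name, artifact SHA-256) pairs
        self.heads = []    # chain head published after each entry

    def append(self, name: str, artifact: bytes) -> str:
        artifact_hash = hashlib.sha256(artifact).hexdigest()
        prev = self.heads[-1] if self.heads else GENESIS
        head = link(prev, name, artifact_hash)
        self.entries.append((name, artifact_hash))
        self.heads.append(head)
        return head

    def verify(self) -> bool:
        """Recompute the chain and compare against the published heads."""
        prev = GENESIS
        for (name, artifact_hash), head in zip(self.entries, self.heads):
            if link(prev, name, artifact_hash) != head:
                return False
            prev = head
        return True


log = HashChainLog()
log.append("example-1.0.tar.gz", b"original release contents")
log.append("example-1.1.tar.gz", b"next release contents")
intact = log.verify()

# Mutating a previously published entry breaks every head computed after it.
log.entries[0] = ("example-1.0.tar.gz", hashlib.sha256(b"tampered").hexdigest())
tampered_detected = not log.verify()
```

Anyone who recorded an earlier head can detect a rewrite of history, which is the property that makes such a log useful for spotting mutation of already published artifacts.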

5 Likes

Thank you!

I like these titles; they make the goals of the PEPs clearer. However, it might be good to mention security in the title, maybe “Secure transport independent download integrity for PyPI packages” or something similar.

3 Likes

And “download integrity” is going to be immediately clear to more people than “delivery assurance”, so I like that as an updated title for PEP 458.

Regarding the summary earlier in the thread, I think that draft text makes the most sense as a new summary for PEP 480.

For PEP 458, the focus should be on answering the question “How can a package installation client ensure that a mirror is providing the same packages as PyPI itself?”.
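A minimal sketch of that check, assuming the client has already obtained and verified signed metadata that records each file's length and SHA-256 hash. The dictionary layout and the function name `matches_trusted_metadata` are illustrative assumptions, not the format defined by the PEP.

```python
import hashlib


def matches_trusted_metadata(data: bytes, trusted: dict) -> bool:
    """Check a downloaded file against the length and hash from signed metadata."""
    if len(data) != trusted["length"]:
        return False
    return hashlib.sha256(data).hexdigest() == trusted["sha256"]


# Hypothetical package contents as published on PyPI.
original = b"contents of example-1.0.tar.gz"
trusted = {"length": len(original),
           "sha256": hashlib.sha256(original).hexdigest()}

# What an honest mirror and a tampering mirror might serve.
from_honest_mirror = original
from_bad_mirror = original.replace(b"1.0", b"6.6")

honest_ok = matches_trusted_metadata(from_honest_mirror, trusted)
bad_ok = matches_trusted_metadata(from_bad_mirror, trusted)
```

Because the trusted values come from metadata signed by PyPI rather than from the mirror, the mirror does not need to be trusted for this comparison to be meaningful.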

The key question for PEP 480 is different, as it’s “How can a package installation client ensure that PyPI is providing the same packages as the original publisher uploaded?”. While there’s merit in allowing publishers that actively want to manage their own signing keys to do so, we can also reasonably assume the majority of publishers are not going to take on that extra responsibility. So regardless of whether the full PEP 480 end-to-end signing support is implemented or not, I expect that we’re eventually going to want a secure public transparency log for artifact hashes anyway.

1 Like

I added the new title and summary to a fork of the PEP repository. I can continue to update that as we discuss it here.

This is a good way to illustrate the differences: PEP 458 covers the path from PyPI to the package installation client, while PEP 480 covers the path from the original publisher to PyPI.

I think the transparency log could be a good addition to PEP 480. It might allow faster discovery of compromises or other package issues, especially for publishers who decide not to implement end-to-end signing. However, having the option of end-to-end signing offers more automatic security guarantees where possible.

2 Likes

Aye, I agree that would be a good way of structuring it - improved security against mutation of old artifacts for everyone, and improved security against mutation of future artifacts for publishers that choose to opt in to that.

2 Likes

Per peps pull request #1247, the title of PEP 458 is currently “Secure PyPI downloads with package signing”. So I’m asking the Discourse admins to change the title of this thread accordingly.

2 Likes

PEP 458 now reflects a few further improvements that folks discussed in this thread, since sponsor @ncoghlan merged these pull requests over the last few days:

I think the next step is for @mnm678 to do another sweep to find past comments to reply to/resolve. Take a look at the past distutils-sig discussions (start at the ones I linked to in this thread but also search the archive for “TUF” and “PEP 458” to find others), and at the PyCon 2019 sprint notes I mentioned:

And I’ve been pinging other acquaintances and colleagues who work in securing the package supply chain and asking them to comment on the PEP.

2 Likes

FYI: my two bits after reading all of this.

I have the impression I understand the objectives - whereas last year the goals were obscure to me.

Thanks for the clarification and simplification (in the discussions).

1 Like