PEP 458: Secure PyPI downloads with package signing

pf_moore · December 4, 2019, 7:50pm

That’s precisely the sort of thing the PEP should make clearer.

To be clear, I have no problem with doing (1). But that doesn’t mean that the PEP shouldn’t be sufficiently clear to allow non-technical users to read and understand enough to know what it’s providing (to use your analogy, I don’t know how https is implemented, but I know what it’s for, what it protects against, roughly how it does it, and importantly, what it doesn’t protect against - the PEP should give the same level of understanding here).

There’s a somewhat new situation here that we’re having to navigate. We have got some volunteers, and we’ve got some money to let them do what they propose, but we still need to ensure (as a community) that what they are offering, and someone is willing to pay for, is something we want. Having known community specialists like yourself support the proposal is a good step in that direction, but it’s not the whole story.

Some other things that are typically covered in a PEP but which are missing here:

Review of how other ecosystems handle this issue. This data integrity issue isn’t unique to Python. How do other languages (rust, javascript, ruby) and distributions (Red Hat, Ubuntu, Homebrew, Microsoft (nuget)) handle it?
Discussion of how “PyPI consumers” should implement this. In view of our principle that we avoid implementation defined behaviour, I’d like to see an explanation of how a tool that wants to consume data from PyPI would implement the consumer end of the protocol. Presumably in terms of using the TUF library from PyPI. I don’t think it’s acceptable to expect tools to copy pip’s implementation. (An obvious example of a tool would be distlib, and we have a goal to make it easy to write new standards-based tools, so we should take that into account).

I’d also like to see the PEP title changed, as it’s currently basically meaningless. Something like “Implement (whatever it is we’re implementing) for PyPI using TUF” would much better explain the proposal - and would make searching for the relevant PEP a lot easier as well!

sumanah · December 5, 2019, 2:13pm

PEP title: I suggest “Secure PyPI downloads with package signing” which gets across:

this is about downloads (not uploads)
this is about PyPI (not other tools)
the way we do it is with something having to do with the signing of packages

joshuagl · December 5, 2019, 2:47pm

Trying to keep the PEP title short but descriptive, how about “Improved PyPI download integrity via signed packages”?

pf_moore · December 5, 2019, 3:12pm

The problem I have with “signing” is that it implies authors will do the signing, which is precisely what we want to avoid. To use @dstufft’s analogy again (which I like a lot) HTTPS uses signing, but internally - to the end user it’s not about signing at all, but about secure communication.

mnm678 · December 5, 2019, 4:23pm

I’m a coauthor of PEP 458 and a maintainer of TUF. I wanted to weigh in here on some of the issues raised about the PEP in this discussion, and present some ideas for moving forward.

First, I want to note that while I am a maintainer of the TUF project, my goal here is to enhance the security of PyPI. While I think TUF will help achieve the security goals specified in this PEP, if the community has other ideas or would like to pursue alternative strategies, I would be happy to contribute to and support those as well.

It seems that the title and introduction to the PEP are causing some confusion about the purpose and scope of this PEP, and that some of the prior discussion about this PEP did not make it into the text. To that end, I have some suggestions for areas to discuss here and in the PEP:

Change the title: We need something that better reflects the contents of the PEP. I suggest something like “Supporting end-user verification of PyPI Packages”, but am open to other suggestions.
Change the introduction to include a brief description of the problem the PEP solves, and what entities the document affects. As mentioned earlier in this thread, this PEP solves the problem of making sure that users get the specific packages that PyPI serves (even if they use a mirror), and providing some protection in case PyPI is compromised. These changes will affect only the PyPI infrastructure (not package maintainers). In addition, they allow PyPI consumers like pip to verify packages. Though most of this information is already in the PEP, the project goals are not clearly stated at the beginning of the document, and this should be addressed.
Include a description of how a client can interact with the solution proposed in PEP 458.
Contrast the proposed solution with other solutions. We need to explain early in the document how TUF is different from other solutions like TLS and GPG signing.
Consider including an out of scope section. There was some discussion here about the issue of typosquatting packages, and there are many other security issues this PEP does not cover. Rather than a long and incomplete list of what this PEP does not do, I would prefer a clearer definition of what the PEP does do as described above. However, if others think an out of scope section is necessary, please suggest what content belongs in this section.

trishankatdatadog · December 5, 2019, 6:14pm

First, let me acknowledge my conflict of interest, having written the PEP, and since I am also one of the lead researchers and developers for TUF.

Having said that, I cannot think of any other system that provides the level of security that it does while preserving usability for developers. We have put years of thought into this, especially as we learned from experience.

Other solutions (such as Certificate-Transparency-like solutions) may be simpler from a usability point of view, especially as there is literally nothing for developers to do (just like this PEP, actually), but it does not provide any trustworthy information whatsoever as to who wrote the original source code, and who or what built the source code into packages.

OTOH, you can combine TUF with other solutions to provide end-to-end guarantees that your package was not tampered with anywhere between, say, Django developers and end-users. Indeed, this is what we have done with the Datadog Agent integrations.

We thought that it would be good to see the same level of usable security on PyPI, which is why we originally wrote this PEP. Let us know if you have more questions.

EWDurbin · December 9, 2019, 8:48pm

Thanks for summarizing these points @mnm678!

Does anyone have anything to discuss regarding the suggestions? I notice that there have been a number of “Like” responses from those who have raised concerns, but those don’t seem particularly actionable.

pf_moore · December 9, 2019, 9:32pm

As someone who “liked” that post, my intention was to express support for the idea of doing those things. I don’t have a proposal as such on any of them, I’d like one of the PEP authors to respond with a proposal that can be considered and discussed. To be honest, as @mnm678 is one of the authors, I was hoping that the number of “likes” would encourage her to do so.

I was sort of hoping you’d provide answers to the questions already posed, before we started asking further questions. That way, the discussion would feel more like a dialogue, which is honestly where I currently feel the process is failing here.

I’m getting very frustrated at this point. There has been a lot of feedback on this thread, and little or no response from the PEP authors (other than @mnm678’s post, which was a good start, but @EWDurbin’s follow-up seems to imply that it’s not going anywhere without more input from the people who have already commented). There still seems to be this misunderstanding that the response to feedback should be a revised PEP - and that’s absolutely not the point here, we want a discussion first with consensus before the PEP gets updated.

Honestly, as things stand I don’t feel that I can support this PEP. Not for any technical reasons, but simply because it’s failing to follow the basic principle of community consensus. Obviously if the authors are ignoring the need for consensus, they can also ignore my views, and I guess there’s not much I can do about that. But it seems a shame if we can’t reach agreement here somehow.

SantiagoTorres · December 10, 2019, 12:22am

I am not a PEP author, yet I do think TUF is what we need.

As a bystander on the conversation, I’m somewhat confused as to what you mean with this: do you either mean that the suggestion should be somehow materialized somewhere before or after community consensus? Are you not supporting the PEP (RFP?!) because you don’t feel there’s consensus (yet?). It appears to me — again, only an outsider — that the process is somewhat unclear even to people inside of the community, and thus I’d expect more leniency. I’m hoping that you, as a more senior community member, could perhaps help with specific pointers on how to help drive community consensus (be it pro or against)?

As far as I’m aware, here are the questions (apologies if I missed any):

Does TUF help against typosquatting? (or any other signing solution for that matter?)
no it doesn’t, as far as I’m aware.
Should the name reflect that?
definitely, I think the authors could clearly outline the limitations of this and arguably any other repository/package signing/integrity solution.
Does this mean that developers must sign all packages when uploading to PyPI?
as far as I’m aware, no. That’s part of what’s discussed on PEP 458/480. Those who do not want to manage their keys will have packages signed by an automated element running in the PyPA infrastructure (again, I’m not in a position of authority to clarify this)
Review how other ecosystems handle the issue (also grouped up with “this has never happened to PyPI as far as we are aware”).
I think there’s quite some literature review around it, as well as other successful implementations in other ecosystems. What exactly were you thinking would pass the bar of “enough of a review”?
Backwards compatibility with non-signing solutions.
I think @dstufft’s HTTP metaphor pretty much outlines how this would behave

I believe it’s possible to work out a plan to drive community consensus around those questions. What I’m not completely aware of is how to actually turn this into something that’s actionable.

edit: for some reason I was trying to format this and got sent, apologies.

sumanah · December 10, 2019, 2:58am

Hey @pf_moore. I think – to second what @SantiagoTorres just said – the PEP authors here are genuinely trying to work with the community here and build consensus, and have benefited from and could use more mechanical guidance on how to do that, because their previous attempts evidently didn’t work quite right. (Examples of such guidance: “do as much work as possible in discussions on Discourse/mailing list, including things like suggesting specific textual changes and reviewing/asking for review, even though we’re generally used to doing that work in GitHub pull request reviews when we work together on code.” “When a community member asks a question or shares a criticism of the existing proposal, it’s better to have a back-and-forth conversation with lots of little iterative questions and answers, instead of the proposal author coming back with a multi-paragraph essay or proposal revision. Yes, even if that feels spammy to you at first and means the thread ends up being hundreds of posts.” Are those right? I’m inferring. PEP discussion experts – are these right? In particular, is there newish updated “what to do where” guidance on PEP stuff that reflects our new Discourse reality?) And here’s another example:

“Liking” a post on Discourse does not have clear semantics. It could mean “I like how you expressed this” or “I’m glad you spoke” or “welcome” – it does not clearly mean “yes, please do the things you have proposed.” So now I am glad that you have explicitly said to @mnm678 that you want her to go ahead and expand on those items.

In the absence of @dstufft as BDFL-Delegate explicitly guiding the discussion here, I suggest that @mnm678 go ahead and share – here in the thread, not as a pull request – some thoughts per her previous post, especially/starting with:

mnm678 · December 10, 2019, 4:43am

I will aim to answer the current questions in this post. In addition, I will start a list of action items and proposed changes to the PEP based on the discussion in this thread. These are meant as a starting place, so feel free to give feedback and discuss.

Here are my answers to some of the questions from the thread. It’s a long thread, so please let me know if I miss any pressing questions.

No. PEP 458 and TUF solve a different security problem (making sure that users get the packages that are hosted on PyPI)

Yes. There was some discussion earlier in the thread about options for new names for the PEP. I think more discussion here is needed to form a consensus.

PEP 458 does not provide any mechanism for developer signing of packages. It adds signatures after packages are uploaded to PyPI. An extension of this work (such as what is described in PEP 480), would allow for developer signing to further extend the chain of trust (developer to PyPI as well as PyPI to user)

Here is a current list of action items:

Change the title: This needs more discussion. Some earlier posts with various proposed titles were a good start, but there doesn’t seem to be consensus yet. Hopefully more clarification about the goal of the PEP will help here.
Add a couple paragraphs to the beginning of the PEP (probably the abstract) to give an overview of the PEP. Here is a draft of that section as a starting place:

Attacks on software repositories are common (Attacks on software repositories · theupdateframework/pip Wiki · GitHub). The resulting repository compromises allow attackers to replace popular packages with malware that looks like the original package. In addition, an attacker on a repository can use any online keys (like those provided by TLS) to validly sign arbitrary packages. These and other attacks on software repositories are detailed here. This PEP aims to protect users of PyPI from malicious packages and to provide a mechanism to recover from a repository compromise.

To provide compromise resilient protection of PyPI, this PEP proposes the use of TUF. TUF provides protection from a variety of attacks on software update systems, including arbitrary software installation and rollback attacks, while also providing mechanisms to recover from a compromise. It does this by generating signed metadata using threshold signatures and offline root keys that can be used to verify the accuracy and timeliness of software packages. More details about TUF are included later in this PEP and in the specification.

This PEP describes the changes to the PyPI infrastructure required to produce TUF metadata. These changes should have minimal impact on other parts of the ecosystem. The PEP focuses on communication between PyPI and users, and so does not require any action by package developers. Developers will upload packages using the current process, and PyPI will automatically sign these packages. In order for the generated TUF metadata to be effective, additional work will need to be done by PyPI consumers (like pip) to verify the metadata provided by PyPI. This verification can be transparent to users (unless it fails) and provides an automatic security mechanism. There is documentation for how to consume TUF metadata in the TUF repository. However, changes to PyPI consumers are not required, and can be done according to the timelines and priorities of individual projects.

pf_moore · December 10, 2019, 12:09pm

Yes, I probably let my frustration here get the better of me. I think your points are good, and would help a lot. This is probably one of the reasons the PEP process specifies that each PEP should have a sponsor, precisely to guide the authors through handling the process. I don’t have the time (too many RL things to handle right now) to offer to sponsor the PEP myself, but hopefully someone can.

Understood, that’s why I felt it was worth clarifying my position, as @EWDurbin seemed to think the number of likes implied that one of the people adding a like was planning on taking the next step.

For most of these, your answers sound fine to me, and if the PEP included those answers in a way that “casual readers” could get without getting overwhelmed with detail, then I’m good with that.

@sumanah suggested “Secure PyPI downloads with package signing” above, which has had 2 replies but no other followup yet, so that one’s in progress.

Something as simple as “TUF has been used successfully in X, Y and Z, and has provided the following benefits there” would be plenty. None of this should be a huge burden, it’s just a matter of remembering when writing the background in the PEP that things which are “obvious” to the authors are not obvious to the reader.

On the assumption that the HTTPS metaphor is accurate, I’d love to see the overview of “what TUF provides” rewritten in terms of that metaphor. In the interests of clarity, though, I hope it’s obvious that I can’t say myself how to do that, it needs someone who actually knows TUF to give the explanation.

Cool. That’s worth getting up front in the summary. But making it a positive statement (“this is what this PEP does provide”) would be better than phrasing it negatively (“this is what this PEP doesn’t provide”).

Again, if the explanations in the PEP were more focused on an overview of what it does (rather than what it doesn’t do) this would probably become much less of a cause of confusion naturally. A cleaner separation of the two PEPs would be really helpful.

Thanks for this. I’ll add some comments on my immediate impressions. Please don’t take them as more than “first impressions”. Ideally, I’d spend more time reviewing and offering suggestions, but I really just don’t have sufficient time at the moment to get into that level of discussion. So I’ll offer my impressions as just that - basic data on how this comes across to a non-specialist - and leave it at that. I’m sorry I can’t offer any more at the moment.

That’s a lot of words that mostly just make me glaze over. If I’m understanding the paragraph, the key points to me are:

Attackers can (somehow - not important how) cause a user to download code that’s different from what PyPI has stored as package X, version M.N. This is a good point, and AIUI, protecting against this is basically what this PEP is about.
Attackers can make it look like a package has a valid signature when it doesn’t. This is confusing, because at the moment, users of PyPI don’t interact with signatures at all. I’m pretty sure this is where the confusion about “is this PEP about package author signing” starts. Given that this PEP is not proposing to address this attack at all, IMO you’d be better not mentioning it here.
The phrase “replace popular packages” makes me think of (1) typosquatting, and (2) attackers with direct update access to the PyPI data store. As far as I know, the PEP doesn’t actually cover either of these cases, so this phrase should probably be amended. Maybe “intercept network traffic to substitute malicious code for popular packages”?
“Recover from a repository compromise” reads to me (as a database specialist by trade) like a backup and recovery solution. Also, I think this is less about “recovery from” than about “protection against”.

Most of this seems to be selling TUF in general, rather than focusing on what the PEP needs. Maybe reduce this to something like “TUF is a general protocol that provides a number of security mechanisms. This PEP proposes that PyPI use TUF to implement the measures discussed here.” In general, the PEP should avoid discussing general TUF capabilities, and focus very strongly on just the parts that solve the problems the PEP is aiming to address.

That’s a change of focus, and probably contributes to my confusion. I thought the PEP was describing a solution to a specific security risk, but now it appears that the proposal is to “generate TUF metadata”! Who moved the goalposts? I frankly couldn’t care less about whether we generate TUF metadata, but I do care about improving end to end security. I would personally avoid mentioning TUF metadata, or indeed any of TUF’s mechanisms, until you’re well into the “implementation details” section. As this paragraph stands, its main effect is to confuse me about what the goal of this PEP is, and put me off the PEP (because the “new goal” isn’t something I’m interested in).

As I said, this is only very basic impressions. I’m sorry I don’t have time to do more, but I hope it’s of some use nevertheless.

sumanah · December 10, 2019, 2:30pm

@EWDurbin can speak more to this in detail - I defer to him!

Some people talked about it during this year’s sprints (here are notes that should be incorporated into mailing list/Discourse discussion and potentially, subsequently, into the PEP).

Thanks for the clarification, extension of good faith, and suggestion, @pf_moore.

I see we ran into a problem here with sponsorship. I’ll go ahead and quote here from PEP 1:

If one or more of the PEP’s co-authors are core developers, they are responsible for following the process outlined below. Otherwise (i.e. none of the co-authors are core developers), then the PEP author(s) will need to find a sponsor for the PEP.

Ideally, a core developer sponsor is identified, but non-core sponsors may also be selected with the approval of the Steering Council. The sponsor’s job is to provide guidance to the PEP author to help them through the logistics of the PEP process (somewhat acting like a mentor). Being a sponsor does not disqualify that person from becoming a co-author or BDFL-Delegate later on (but not both). The sponsor of a PEP is recorded in the “Sponsor:” field of the header.

Originally @dstufft was one of the PEP coauthors, but per @dstufft’s comment a few days ago, he has not been doing the logistics-guiding. Therefore I suggest that Donald, @EWDurbin, and @mnm678 look at the list of core Python developers and ask some people whether they would be interested in sponsoring the PEP.

EWDurbin · December 10, 2019, 2:47pm

Indeed, the PSF Packaging WG have funds granted for the purposes of implementing cryptographically verifiable artifacts on PyPI, though the funding is not specific to TUF/PEP-458. We are not proceeding with any work on that until a PEP is finalized, but have reserved funding for an implementation.

In the meantime a subset of the funding will be used to begin implementation of automated systems for analyzing uploads to detect potentially malicious artifacts to mark them for administrator/moderator review.

mnm678 · December 10, 2019, 9:31pm

Thank you for the feedback. I rewrote the proposed intro to incorporate your comments. I tried to make it more clear what potential attacks this PEP is addressing. Here’s a new draft:

Attacks on software repositories are common, even in organizations with very good security practices (Attacks on software repositories · theupdateframework/pip Wiki · GitHub). The resulting repository compromise allows an attacker to edit all files stored on the repository and sign these files using any keys stored on the repository (online keys). In many signing schemes (like TLS), this access allows the attacker to replace files on the repository and make it look like these files are coming from PyPI. Without a way to revoke and replace the trusted private key, it is very challenging to recover from a repository compromise. In addition to the dangers of repository compromise, software repositories are vulnerable to an attacker on the network intercepting and changing files. These and other attacks on software repositories are detailed here. This PEP aims to protect users of PyPI from malicious packages and to provide a mechanism to recover from a compromise of PyPI or its signing keys.

To provide compromise-resilient protection of PyPI, this PEP proposes the use of The Update Framework (TUF). TUF provides protection from a variety of attacks on software update systems, while also providing mechanisms to recover from a repository compromise. TUF has been used in production by a number of organizations including Cloudflare, Datadog, DigitalOcean, Docker, Flynn, IBM, Kolide, LEAP, Microsoft, Red Hat, and VMware. More details about TUF are included later in this PEP and in the specification.

This PEP describes changes to the PyPI infrastructure that are needed to ensure that users get valid packages from PyPI. These changes should have minimal impact on other parts of the ecosystem. The PEP focuses on communication between PyPI and users, and so does not require any action by package developers. Developers will upload packages using the current process, and PyPI will automatically sign these packages. In order for the security mechanism to be effective, additional work will need to be done by PyPI consumers (like pip) to verify the signatures and metadata provided by PyPI. This verification can be transparent to users (unless it fails) and provides an automatic security mechanism. There is documentation for how to consume TUF metadata in the TUF repository. However, changes to PyPI consumers are not required, and can be done according to the timelines and priorities of individual projects.
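To make the consumer side a little more concrete, here is a minimal sketch of only the last step a tool like pip would perform: comparing a downloaded file against the length and hashes recorded in metadata it already trusts. It assumes the signature checks on that metadata have already passed (they are not shown), and the names `verify_target`, `package`, and `entry` are hypothetical, not taken from pip or the TUF reference implementation.

```python
import hashlib
import hmac


def verify_target(data: bytes, expected_length: int, expected_hashes: dict) -> None:
    """Raise ValueError unless data matches the trusted length and hashes.

    expected_hashes maps a hashlib algorithm name (e.g. "sha256") to a hex digest.
    """
    if not expected_hashes:
        raise ValueError("no hashes listed for this target")
    if len(data) != expected_length:
        raise ValueError("length mismatch")
    for algorithm, expected_hex in expected_hashes.items():
        actual_hex = hashlib.new(algorithm, data).hexdigest()
        # compare_digest avoids leaking where the digests first differ.
        if not hmac.compare_digest(actual_hex, expected_hex):
            raise ValueError(f"{algorithm} mismatch")


# Hypothetical file contents, and the metadata entry the repository
# would have published for them (illustrative values only).
package = b"example wheel contents"
entry = {
    "length": len(package),
    "hashes": {"sha256": hashlib.sha256(package).hexdigest()},
}

verify_target(package, entry["length"], entry["hashes"])  # passes silently

try:
    # Same length, different bytes: what a tampering mirror might serve.
    verify_target(package[:-1] + b"X", entry["length"], entry["hashes"])
except ValueError as error:
    print("rejected:", error)
```

The point of the sketch is that the file itself never needs its own signature: it is trusted only because signed metadata vouches for its exact length and digest.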

pf_moore · December 10, 2019, 9:43pm

Thanks - that sounds loads better (to me at least!)

brettcannon · December 10, 2019, 9:47pm

I’ll also say this latest summary reads very well.

steve.dower · December 11, 2019, 5:32am

I still think this statement is vastly overstating what the PEP addresses.

legitimately published yet malicious packages are not in any way prevented or identified by this proposal
a compromise of PyPI’s storage system is the only compromise that would be protected against, assuming none of the keys were kept in compromised storage
there’s no recovery from a compromise of the root key
recovery implies restoration, as Paul mentioned, but all we can really do is fail validation for anything signed by a key that was never properly endorsed or that was (presumably compromised and) used after its expiration
(I assume that there’s no attack whereby the attacker forces a key rotation and resigning of a package that was injected without correctly signed metadata, but I haven’t worked this one through)

MITM attacks and client-side redirection attacks seem to be the primary vector being protected against. They should at least get a mention.

I’m not aware of anyone at Microsoft using TUF in production. Could you email me at steve(dot)dower(at)microsoft.com with either the team or a person you know who is involved? My understanding was that TUF does not meet our compliance requirements, so I’m interested to see how they made it work.

lukpueh · December 11, 2019, 1:54pm

I agree that it can be refined. What about something along the lines of…?

“This PEP aims to protect users of PyPI from compromises of the integrity, consistency and freshness properties of PyPI packages, and enhances compromise resilience, by mitigating key risk and providing mechanisms to recover from a compromise of PyPI or its signing keys.”

Might be a bit long winded, but maybe someone can make something out of it.
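As a rough illustration of the “freshness” property in that sentence, the sketch below rejects metadata that has passed its expiry date or whose version number is lower than one the client has already seen (a rollback). It assumes metadata carrying a `version` integer and an `expires` timestamp in `YYYY-MM-DDTHH:MM:SSZ` form; the function name `check_freshness` and the sample values are hypothetical, and signature verification is deliberately left out.

```python
from datetime import datetime, timezone


def check_freshness(metadata: dict, last_seen_version: int, now: datetime) -> None:
    """Raise ValueError if metadata is expired or older than what we have seen."""
    expires = datetime.strptime(
        metadata["expires"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now >= expires:
        # An attacker replaying old (but validly signed) metadata is caught here.
        raise ValueError("metadata has expired")
    if metadata["version"] < last_seen_version:
        # Serving an older version than the client already trusts is a rollback.
        raise ValueError("metadata version went backwards")


# Illustrative values only.
metadata = {"version": 5, "expires": "2030-01-01T00:00:00Z"}
now = datetime(2029, 6, 1, tzinfo=timezone.utc)

check_freshness(metadata, last_seen_version=4, now=now)  # passes silently
```

A client would run a check like this every time it refreshes metadata, so a mirror that simply stops updating is eventually noticed even though everything it serves is correctly signed.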

legitimately published yet malicious packages are not in any way prevented or identified by this proposal

Correct. Do you think my suggestion above reduces ambiguity?

a compromise of PyPI’s storage system is the only compromise that would be protected against, assuming none of the keys were kept in compromised storage

It also protects against malicious CDNs/mirrors, which usually don’t have access to the signing keys. Furthermore, the PEP recommends storing some upper-level role keys (root, targets and bins) offline, which allows a seamless recovery from compromises of online keys (timestamp, snapshot, bin-n).

there’s no recovery from a compromise of the root key

There actually is, although it is a race between the legitimate holders of the root keys and the attacker. Whoever gets to first publish a new root metadata file with new keys wins. If the attacker is able to make clients replace the compromised root keys with keys that are only controlled by the attacker, then you are right, there is no in-band way to recover. But having the root keys separated from the metadata publishing infrastructure, i.e. PyPI, gives the legitimate holders of the keys an enormous advantage in this race. Also note that the attacker needs to compromise the required signing threshold of root keys (recommended to be stored offline in different locations) to even enter that race.
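The threshold requirement can be sketched as follows. This is a toy model, not TUF's actual signature scheme: HMAC with a shared secret stands in for a real public-key signature purely so the example runs with the standard library, and the key names, `sign`, and `meets_threshold` are all hypothetical. A real client performs further checks on new root metadata that are omitted here.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    # ASSUMPTION: HMAC is a stand-in for a real public-key signature.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def meets_threshold(payload: bytes, signatures: dict, trusted_keys: dict, threshold: int) -> bool:
    """Return True if at least `threshold` distinct trusted keys signed payload.

    signatures maps key id -> signature; trusted_keys maps key id -> key.
    """
    valid_key_ids = set()
    for key_id, signature in signatures.items():
        key = trusted_keys.get(key_id)
        if key is None:
            continue  # signatures from unknown keys count for nothing
        if hmac.compare_digest(sign(key, payload), signature):
            valid_key_ids.add(key_id)
    return len(valid_key_ids) >= threshold


# Three hypothetical root keys, any two of which must sign.
trusted = {"root-a": b"secret-a", "root-b": b"secret-b", "root-c": b"secret-c"}
new_root = b"new root metadata"

# An attacker who stole only root-a, plus a key of their own.
attacker = {
    "root-a": sign(trusted["root-a"], new_root),
    "root-x": sign(b"attacker-key", new_root),
}
# The legitimate key holders, two of whom sign.
legitimate = {
    "root-a": sign(trusted["root-a"], new_root),
    "root-b": sign(trusted["root-b"], new_root),
}

print(meets_threshold(new_root, attacker, trusted, threshold=2))    # False
print(meets_threshold(new_root, legitimate, trusted, threshold=2))  # True
```

With a threshold of two, stealing a single offline key is not enough to publish root metadata that clients accept, which is the sense in which the attacker cannot even enter the race.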

recovery implies restoration, as Paul mentioned, but all we can really do is fail validation for anything signed by a key that was never properly endorsed or that was (presumably compromised and) used after its expiration

This PEP does describe how to restore the repository after a compromise. But you are right, it does not describe strategies for the client, what to do if the TUF metadata indicates a compromise, other than not installing/updating the invalid targets. I think it is out of scope, but we can probably brainstorm ideas.

(I assume that there’s no attack whereby the attacker forces a key rotation and resigning of a package that was injected without correctly signed metadata, but I haven’t worked this one through)

Not sure I understand. Would you mind elaborating?

MITM attacks and client-side redirection attacks seem to be the primary vector being protected against. They should at least get a mention.

Yes, that and attacks against CDNs/mirrors. Furthermore, I see PEP 458 as a major stepping stone for PEP 480, but that’s a different discussion.

brettcannon · December 12, 2019, 7:12pm

8 posts were split to a new topic: Removing the mention of “Microsoft” from PEP 458