PEP 458: Surviving a Compromise of PyPI

That’s precisely the sort of thing the PEP should make clearer.

To be clear, I have no problem with doing (1). But that doesn’t mean that the PEP shouldn’t be sufficiently clear to allow non-technical users to read and understand enough to know what it’s providing (to use your analogy, I don’t know how https is implemented, but I know what it’s for, what it protects against, roughly how it does it, and importantly, what it doesn’t protect against - the PEP should give the same level of understanding here).

There’s a somewhat new situation here that we’re having to navigate. We have got some volunteers, we’ve got some money to let them do what they propose, but we still need to ensure (as a community) that we want what they are offering, and someone is willing to pay for. Having known community specialists like yourself support the proposal is a good step in that direction, but it’s not the whole story.

Some other things that are typically covered in a PEP but which are missing here:

  • Review of how other ecosystems handle this issue. This data integrity issue isn’t unique to Python. How do other languages (rust, javascript, ruby) and distributions (Red Hat, Ubuntu, Homebrew, Microsoft (nuget)) handle it?
  • Discussion of how “PyPI consumers” should implement this. In view of our principle that we avoid implementation defined behaviour, I’d like to see an explanation of how a tool that wants to consume data from PyPI would implement the consumer end of the protocol. Presumably in terms of using the TUF library from PyPI. I don’t think it’s acceptable to expect tools to copy pip’s implementation. (An obvious example of a tool would be distlib, and we have a goal to make it easy to write new standards-based tools, so we should take that into account).

I’d also like to see the PEP title changed, as it’s currently basically meaningless. Something like “Implement (whatever it is we’re implementing) for PyPI using TUF” would much better explain the proposal - and would make searching for the relevant PEP a lot easier as well!.

PEP title: I suggest “Secure PyPI downloads with package signing” which gets across:

  • this is about downloads (not uploads)
  • this is about PyPI (not other tools)
  • the way we do it is with something having to do with the signing of packages

Trying to keep the PEP title short but descriptive, how about “Improved PyPI download integrity via signed packages”?

The problem I have with “signing” is that it implies authors will do the signing, which is precisely what we want to avoid. To use @dstufft’s analogy again (which I like a lot) HTTPS uses signing, but internally - to the end user it’s not about signing at all, but about secure communication.

1 Like

I’m a coauthor of PEP 458 and a maintainer of TUF. I wanted to weigh in here on some of the issues raised about the PEP in this discussion, and present some ideas for moving forward.

First, I want to note that while I am a maintainer of the TUF project, my goal here is to enhance the security of PyPI. While I think TUF will help achieve the security goals specified in this PEP, if the community has other ideas or would like to pursue alternative strategies, I would be happy to contribute to and support those as well.

It seems that the title and introduction to the PEP are causing some confusion about the purpose and scope of this PEP, and that some of the prior discussion about this PEP did not make it into the text. To that end, I have some suggestions for areas to discuss here and in the PEP:

  • Change the title: We need something that better reflects the contents of the PEP. I suggest something like “Supporting end-user verification of PyPI Packages”, but am open to other suggestions.
  • Change the introduction to include a brief description of the problem the PEP solves, and what entities the document affects. As mentioned earlier in this thread, this PEP solves the problem of making sure that users get the specific packages that PyPI serves (even if they use a mirror), and providing some protection in case PyPI is compromised. These changes will affect only the PyPI infrastructure (not package maintainers). In addition, they allow PyPI consumers like pip to verify packages. Though most of this information is already in the PEP, the project goals are not clearly stated at the beginning of the document, and this should be addressed.
  • Include a description of how a client can interact with the solution proposed in PEP 458.
  • Contrast the proposed solution with other solutions. We need to explain early in the document how TUF is different from other solutions like TLS and GPG signing.
  • Consider including an out of scope section. There was some discussion here about the issue of typosquatting packages, and there are many other security issues this PEP does not cover. Rather than a long and incomplete list of what this PEP does not do, I would prefer a clearer definition of what the PEP does do as described above. However, if others think an out of scope section is necessary, please suggest what content belongs in this section.
6 Likes

First, let me acknowledge my conflict of interest, having written the PEP, and since I am also one of the lead researchers and developers for TUF.

Having said that, I cannot think of any other system that provides the level of security that it does while preserving usability for developers. We have put years of thought into this, especially as we learned from experience.

Other solutions (such as Certificate-Transparency-like solutions) may be simpler from a usability point of view, especially as there is literally nothing for developers to do (just like this PEP, actually), but it does not provide any trustworthy information whatsoever as to who wrote the original source code, and who or what built the source code into packages.

OTOH, you can combine TUF with other solutions to provide end-to-end guarantees that your package was not tampered with anywhere between, say, Django developers and end-users. Indeed, this is what we have done with the Datadog Agent integrations.

We thought that it would be good to see the same level of usable security on PyPI, which is why we originally wrote this PEP. Let us know if you have more questions.

1 Like

Thanks for summarizing these points @mnm678!

Does anyone have anything to discuss regarding the suggestions? I notice that there have been a number of “Like” responses from those who have raised concerns, but those don’t seem particularly actionable :slight_smile:

As someone who “liked” that post, my intention was to express support for the idea of doing those things. I don’t have a proposal as such on any of them, I’d like one of the PEP authors to respond with a proposal that can be considered and discussed. To be honest, as @mnm678 is one of the authors, I was hoping that the number of “likes” would encourage her to do so.

I was sort of hoping you’d provide answers to the questions already posed, before we started asking further questions. That way, the discussion would feel more like a dialogue, which is honestly where I currently feel the process is failing here.

I’m getting very frustrated at this point. There has been a lot of feedback on this thread, and little or no response from the PEP authors (other than @mnm678’s post, which was a good start but @EWDurbin’s follow-up seems to imply that it’s not going anywhere without more input from the people who have already commented :slightly_frowning_face:). There still seems to be this misunderstanding that the response to feedback should be a revised PEP - and that’s absolutely not the point here, we want a discussion first with consensus before the PEP gets updated.

Honestly, as things stand I don’t feel that I can support this PEP. Not for any technical reasons, but simply because it’s failing to follow the basic principle of community consensus. Obviously if the authors are ignoring the need for consensus, they can also ignore my views, and I guess there’s not much I can do about that. But it seems a shame if we can’t reach agreement here somehow.

1 Like

I am not a PEP author, yet I do think TUF is what we need.

As a bystander on the conversation, I’m somewhat confused as to what you mean with this: do you either mean that the suggestion should be somehow materialized somewhere before or after community consensus? Are you not supporting the PEP (RFP?!) because you don’t feel there’s consensus (yet?). It appears to me — again, only an outsider — that the process is somewhat unclear even to people inside of the community, and thus I’d expect more leniency. I’m hoping that you, as a more senior community member, could perhaps help with specific pointers on how to help drive community consensus (be it pro or against)?

As far as I’m aware, here are the questions (apologies if I missed any):

  • Does TUF help against typosquating? (or any other signing solution for that matter?)
    no it doesn’t, as far as I’m aware.
  • Should the name reflect that?
    definitely, I think the authors could clearly outline the limitations of this and arguably any other repository/package signing /integrity-solution.
  • Does this mean that developers must sign all packages when uploading to PyPI?
    as far as I’m aware, no. That’s part of what’s discussed on PEP 458/480. Those who do not want to manage their keys will have packages signed by an automated element running in the PyPA infrastructure (again, I’m not in a position of authority to clarify this)
  • Review how other ecosystems handle the issue (also grouped up with "this has never happened to PyPI as far as we are aware).
    I think there’s quite some literature review around it, as well as other successful implementations in other ecosystems. What exactly where you thinking would pass the bar of “enough of a review”?
  • Backwards compatibility with non-signing solutions.
    I think @dstufft’s HTTP metaphor pretty much outlines how this would behave :slight_smile:

I believe it’s possible to work out a plan to drive community consensus around those questions. What I’m not completely aware of is how to actually turn this into something that’s actionable.

edit: for some reason I was trying to format this and got sent, apologies.

Hey @pf_moore. I think – to second what @SantiagoTorres just said – the PEP authors here are genuinely trying to work with the community here and build consensus, and have benefited from and could use more mechanical guidance on how to do that, because their previous attempts evidently didn’t work quite right. (Examples of such guidance: “do as much work as possible in discussions on Discourse/mailing list, including things like suggesting specific textual changes and reviewing/asking for review, even though we’re generally used to doing that work in GitHub pull request reviews when we work together on code.” “When a community member asks a question or shares a criticism of the existing proposal, it’s better to have a back-and-forth conversation with lots of little iterative questions and answers, instead of the proposal author coming back with a multi-paragraph essay or proposal revision. Yes, even if that feels spammy to you at first and means the thread ends up being hundreds of posts.” Are those right? I’m inferring. PEP discussion experts – are these right? In particular, is there newish updated “what to do where” guidance on PEP stuff that reflects our new Discourse reality?) And here’s another example:

“Liking” a post on Discourse does not have clear semantics. :slight_smile: It could mean “I like how you expressed this” or “I’m glad you spoke” or “welcome” – it does not clearly mean “yes, please do the things you have proposed.” So now I am glad that you have explicitly said to @mnm678 that you want her to go ahead and expand on those items.

In the absence of @dstufft as BDFL-Delegate explicitly guiding the discussion here, I suggest that @mnm678 go ahead and share – here in the thread, not as a pull request – some thoughts per her previous post, especially/starting with:

1 Like

I will aim to answer the current questions in this post. In addition, I will start a list of action items and proposed changes to the PEP based on the discussion in this thread. These are meant as a starting place, so feel free to give feedback and discuss.

Here are my answers to some of the questions from the thread. It’s a long thread, so please let me know if I miss any pressing questions.

No. PEP 458 and TUF solve a different security problem (making sure that users get the packages that are hosted on PyPI)

Yes. There was some discussion earlier in the thread about options for new names for the PEP. I think more discussion here is needed to form a consensus.

PEP 458 does not provide any mechanism for developer signing of packages. It adds signatures after packages are uploaded to PyPI. An extension of this work (such as what is described in PEP 480), would allow for developer signing to further extend the chain of trust (developer to PyPI as well as PyPI to user)

Here is a current list of action items:

  • Change the title: This needs more discussion. Some earlier posts with various proposed titles were a good start, but there doesn’t seem to be consensus yet. Hopefully more clarification about the goal of the PEP will help here.
  • Add a couple paragraphs to the beginning of the PEP (probably the abstract) to give an overview of the PEP. Here is a draft of that section as a starting place:

Attacks on software repositories are common (https://github.com/theupdateframework/pip/wiki/Attacks-on-software-repositories). The resulting repository compromises allow attackers to replace popular packages with malware that looks like the original package. In addition, an attacker on a repository can use any online keys (like those provided by TLS) to validly sign arbitrary packages. These and other attacks on software repositories are detailed here. This PEP aims to protect users of PyPI from malicious packages and to provide a mechanism to recover from a repository compromise.

To provide compromise resilient protection of PyPI, this PEP proposes the use of TUF. TUF provides protection from a variety of attacks on software update systems, including arbitrary software installation and rollback attacks, while also providing mechanisms to recover from a compromise. It does this by generating signed metadata using threshold signatures and offline root keys that can be used to verify the accuracy and timeliness of software packages. More details about TUF are included later in this PEP and in the specification.

This PEP describes the changes to the PyPI infrastructure required to produce TUF metadata. These changes should have minimal impact on other parts of the ecosystem. The PEP focuses on communication between PyPI and users, and so does not require any action by package developers. Developers will upload packages using the current process, and PyPI will automatically sign these packages. In order for the generated TUF metadata to be effective, additional work will need to be done by PyPI consumers (like pip) to verify the metadata provided by PyPI. This verification can be transparent to users (unless it fails) and provides an automatic security mechanism. There is documentation for how to consume TUF metadata in the TUF repository. However, changes to PyPI consumers are not required, and can be done according to the timelines and priorities of individual projects.