PEP 458: Surviving a Compromise of PyPI

For the record, Donald has since replied. Thanks!

1 Like

I have merged PR #1203.

Please try to have discussions here, not on GitHub.

When a technical question is resolved here, one of the co-authors should submit a new PR to GitHub, which will be merged quickly by one of the PEP editors (Chris Angelico, @barry, @brettcannon and me).

The BDFL-Delegate (@dstufft) should guide or moderate the discussion here as he sees fit, until he is happy with the PEP, at which point he can approve it – and let the SC know. (Or he can reject the PEP or defer it or ask the authors to withdraw it.)

1 Like

Thanks @guido.

PEP 458 is ready for community review and – per these RFI threads and this Discourse discussion – the plan is for contractors to start work on PyPI this month, implementing the foundations for cryptographic signing (and for malware detection, which is not relevant to this PEP). @EWDurbin will be managing that.

1 Like

A bit of feedback: to a relative outsider of the packaging world, various references to “The RFI” (or “The RFP”?) are very mysterious. It took me way too long to understand what was going on (there’s funding from Facebook for PyPI and/or pip work?), and I’m still not sure! Next time can you all be a little clearer about that? E.g. “PyPI Q4 RFI” as a toplevel label is quite mysterious (Q4 of what year?).

4 Likes

I 100% agree with Guido here. Even as an insider of the packaging world, I can make little sense of this PEP. I just skimmed the current version of the PEP and was confronted with a wall of text covering security-related issues that I don’t really follow. And yet, from @sumanah’s comment, it sounds like we might have people lining up to implement this.

In principle, that’s fine - I’m not a security specialist and I’m happy to leave the security decisions to those who are - but I do have some fundamental questions here:

  1. As a package author, will this affect me? Will I be expected to generate/provide some sort of new “trust keys” when I publish my packages?
  2. As a member of a team working on a package (pip) will the answer to the above change if my project wants to “opt in” to something? Sorry - that question is confused (because I am). Basically, though, if the answer to (1) is “nothing new is required, it’s optional”, how easy will it be for me to conform if a project I work on decides to “opt in”? I can happily decide for my own projects that this is all too confusing and opt out, but who will clarify what I need to do if I work on a project that opts in?

I think what I’m saying is that the PEP needs a high level summary of how it affects various user groups:

  • Package consumers
  • Package authors
  • Members of teams of package authors, who may not agree with the “team view”

(also people like PyPI admins, but I consider them to be the target audience for the rest of the PEP!) If I missed such a summary, then it needs to be more prominent :slightly_smiling_face:

2 Likes

Very good feedback, thanks Guido and Paul!

We shall aim to fix these issues in our next PR:

  1. Clarify how PEP 458 is related to the PyPI Q4 2019 RFI.
  2. Clarify that PEP 458 does not require any action from package authors (unlike the case with PEP 480).
  3. Clarify that pip will use exactly the same metadata download and verification workflow regardless of whether a package is signed by PyPI (PEP 458) or package authors (PEP 480).
  4. Clarify that the changes are to PyPI and pip, and do not affect package authors yet.

Does this sound good, Guido and Paul? Do others have other feedback that they’d like us to address?

Cc @lukpueh

1 Like

I’d rather have a discussion here that results in a PR than have responses to the questions in a PR that I then either have to review on GitHub, or spot the differences in the next revision of the PEP.

“Does not require” is not quite what I asked. Is there any option for package authors to do something different based on this PEP?

I guess the implication here is that PEP 458 is about server-side signing whereas PEP 480 is about client-side signing? Which prompts me to say that the title of this PEP (“Surviving a compromise of PyPI”) gives me no clue that this is what it’s about.

“Yet” worries me here. Either this PEP affects package authors or it doesn’t. There’s no “yet” about it. If what you mean is that a different PEP (480) affects package authors, then that is irrelevant to this discussion. It’s entirely possible that this PEP gets accepted but PEP 480 doesn’t. Or the other way around. Either way, this specific PEP needs to stand on its own and if it doesn’t affect package authors, say so clearly.

Not really, to me, sorry…

I think you’re missing the point - posting here isn’t to gather feedback that you can go away and address, it’s to have a discussion, where you engage with the community and come to a consensus about the way to address people’s concerns. As it stands, this PEP seems to have been developed largely out of sight of the community, and that’s what bothers me. Maybe no-one will be interested in having an extensive discussion, given that this is a pretty specialised proposal. But people should still be given the opportunity to engage, and discussions on a GitHub PR in the PEPs repository aren’t, in my view, sufficient for that purpose.

In the interests of saying something about the actual proposal, rather than just about the process, I’ve just gone through the first part of the PEP, and I’ll add some comments.

The abstract does actually give a reasonable overview of what’s being proposed, as long as I take “implement TUF” as a goal in itself, and don’t ask why (there is more on why later in the PEP, which I’ll come to, though). But it does state that “This PEP does not prescribe how package managers, such as pip, should be adapted to install or update projects from PyPI with TUF metadata.” So you’d be perfectly happy if pip didn’t change, and continued to ignore the TUF metadata? That seems unlikely. Certainly I could imagine the PEP merely saying that installers should check the metadata (although I’d also expect a non-normative comment that the PEP assumes that pip will do this), but to merely say “we’ll do a bunch of work to add some metadata that no-one needs to use” seems to be going too far - or to look at it more cynically, to be avoiding the question of what the implications are for pip and other installers. One immediate question I’d ask is how much overhead, both in terms of speed and size of vendored libraries, this would add to pip. I think some idea of the cost to installers is entirely relevant to the PEP.

The motivation section adds some background, but I was somewhat confused by it. It talks about the wiki breach causing loss of data, but the description of how TUF would help PyPI doesn’t discuss data loss, it’s all about detecting (malicious) corruption, which is a different issue (albeit just as serious). So I was left feeling that while I agreed that doing something is warranted, I’d been subjected to some sort of “bait and switch” where the motivating event is unrelated to the proposed solution.

After this point, the PEP got too detailed for me, so I won’t go any further.

3 Likes

I have to agree with Paul – this response does not make me feel any better. It seems the RFI was handled largely out of sight of core devs and community, which feels odd given there’s a prize of $400,000 to be paid for implementation work that is incredibly important to the community (even though it’s technically not the PSF’s money). Even the start of this thread makes me feel left out: there’s one message asking when feedback will be received from Donald, followed by a note saying that Donald replied, but with no link to his reply. (If you had needed a private reply, why post here?)

I don’t have time today to even skim the PEPs, but it feels like there’s some kind of culture clash happening. Maybe PyPI and packaging should just not use the PEP process if you all aren’t interested in interacting with the community?

2 Likes

FTR, I feel like I’ve been sufficiently involved in the discussion (though my concerns were declared out of scope, and I’m not a highly-impacted pip or Warehouse maintainer, just someone who was hoping for a general/public solution to a problem I have to solve anyway).

The clash is with the people behind TUF, not the packaging community in general. There’s another discussion about splitting packaging standards out from the PEP process, but I don’t believe it is in any way connected to this one. (Brett and Nick are directly involved in the other thread on behalf of core-dev.)

There have been some miscommunications here but no one has meant to
bypass the community.

As I see it, the TUF team conversed with Python packaging leadership
quite a lot, but not all in one place (such as Discourse), and often in
conversational spaces that have far fewer core devs. I think this is
what happened:

The TUF team conversed with the distutils-sig list at some length when
originally developing the PEP in 2013-2015, and continued replying on
the list when people raised questions, e.g., “TUF, Warehouse, Pip, PyPA,
ld-signatures, ed25519” in 2018, or proactively, e.g., “Summary of PyPI
overhaul in new LWN article”.

The PEP remained in Draft status till March of this year, when Brett
consulted with Donald and updated its status to Deferred.

In 2018 Facebook gave us a gift (it’s $100,000 USD) to be used for PyPI
security work, specifically on cryptographic signing and malware
detection. (Sorry for the confusion here – the $400K you’re thinking of
is the funding we just got this year for pip dependency resolver work.)
Now that we had funding, it seemed more likely PEP 458 could be
implemented, so there was more discussion – for instance, at PyCon NA
this year, several TUF folks had a long conversation with several
packaging leaders to talk about feasibility and implementation.

Then, later this year, I publicized the Request For Interest (which
included seeking comment on PEP 458), and did not ever mean to handle it
out of the sight of core devs and community. I publicized it here and on
distutils-sig, and had one month earlier publicized to Discourse that
the RFI was coming.

In most of these cases, in retrospect, I did not use the phrase “PEP
458” in my subject line; I probably should have done so more often.

The TUF folks asked me for advice on getting further with the PEP and I
got confused and said (a month ago):

“Current status: python/peps#1203 is awaiting review from @dstufft to
revise PEP 458. After that, there needs to be a discussion on
[Discourse] to get the PEP from ‘Draft’ to ‘Accepted’.”

A few packaging maintainers had shared critique in the pull request and
the PEP authors were responding to it. I should have said that the
Discourse discussion needed to start right away and not wait for the PR
review. I’m sorry about that.

When we were waiting for a review from Donald, I suggested that Trishank
post his nudge publicly rather than needlessly do so privately –
Donald’s reply came in the form of a review on the PR that Trishank had
already posted. I’m sorry that my suggestion went badly and made you
feel left out, Guido.

I don’t mean here to take on the mantle of BDFL-Delegate from Donald, or
the role of TUF implementation manager from Ernest, but since my (in
some cases suboptimal) advice and publicity work is part of the reason
for the current situation, I figured I should share my assessment.

1 Like

It took me a while to realize what I don’t like about PEP 458. It mixes the issue “How to survive a compromise of PyPI” with a technical solution (TUF). It feels like the PEP is tailored for TUF without exploring alternatives or even verifying if the PEP is asking the right questions. TUF might be the only viable solution, but it’s impossible to gauge when the text is written as “Any security framework you like as long as it is TUF”.

I also feel like the PEP is not written for consumers of PyPI. It gets very technical very fast. Although I’m working in security engineering and deal with security on a daily basis, I find the PEP hard to read. Perhaps it would help if you split the PEP into a general, less technical and more user oriented PEP and a technical PEP for TUF+PyPI. @steve.dower did a really good job with PEP 551 and PEP 578.

I would like to see a general and user-oriented PEP about PyPI security to answer these questions:

  1. How is a package owner/maintainer able to verify that PyPI is serving correct and unmodified files?
  2. As a user of PyPI how can I make sure that pip installs correct and unmodified packages?
  3. As a user of PyPI how can I protect myself against typo-squatting attacks or compromised versions of a package?

Personally I see malicious content and package trust as a more pressing issue than a compromise of PyPI infrastructure. As a member of the Python security team (PSRT) I’m getting reports about typo squatting or malicious packages every week. (Fun fact: There were four email threads about malicious content on PyPI this month and today is just Dec 4.) There might be easier ways to detect a compromise of PyPI until TUF is implemented, e.g. PyPI could push SHA-256 hashsums of all files to a git repo or to an append-only database similar to Certificate Transparency?
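To make the “append-only database of SHA-256 hashsums” idea concrete, here is a minimal sketch of a hash-chained log. It is only an illustration under stated assumptions: the names `HashLog`, `file_digest` and `entry_hash` are hypothetical, nothing here is part of PyPI, pip or TUF, and a plain hash chain is a much simpler structure than the Merkle tree that Certificate Transparency actually uses.

```python
import hashlib
import json


def file_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def entry_hash(prev_hash: str, filename: str, digest: str) -> str:
    """Hash one log entry together with the hash of the previous entry."""
    record = json.dumps([prev_hash, filename, digest]).encode("utf-8")
    return hashlib.sha256(record).hexdigest()


class HashLog:
    """Hypothetical append-only log of (filename, digest) records."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # list of (filename, digest, chained_hash)

    def head(self) -> str:
        """Return the chained hash of the latest entry."""
        return self.entries[-1][2] if self.entries else self.GENESIS

    def append(self, filename: str, data: bytes) -> str:
        """Record a file's digest and return the new head of the chain."""
        digest = file_digest(data)
        chained = entry_hash(self.head(), filename, digest)
        self.entries.append((filename, digest, chained))
        return chained

    def verify(self) -> bool:
        """Recompute the chain and check every stored chained hash."""
        prev = self.GENESIS
        for filename, digest, chained in self.entries:
            if entry_hash(prev, filename, digest) != chained:
                return False
            prev = chained
        return True


log = HashLog()
log.append("foo-1.0.tar.gz", b"original sdist bytes")
log.append("foo-1.1.tar.gz", b"next release bytes")
print(log.verify())  # chain is intact at this point

# Simulate someone silently replacing the recorded digest of an old file.
name, _, chained = log.entries[0]
log.entries[0] = (name, file_digest(b"malicious bytes"), chained)
print(log.verify())  # the recomputed chain no longer matches
```

The sketch only detects edits that leave the stored chain hashes inconsistent. An attacker who can rewrite the whole log could recompute every hash, so a scheme like this depends on outside parties remembering an earlier `head()` value, which is roughly the role a public git repo or independent auditors would play.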

5 Likes

This is also my concern (and I’m also on the PSRT, and have insight into Microsoft’s tracking of certain attacks).

As far as I’m aware, nobody has managed to corrupt PyPI’s data stores in a way that TUF would detect. But every day people are uploading/downloading packages with malicious content that are “legitimate” according to the metadata. Without a publisher-attached signature, users have no way to determine trust based on who uploaded it.

The workaround most of us are going for is to create private PyPI mirrors with curated packages (on DevPI, Azure Artifacts or (IIRC) Artifactory). pip is slowly but surely getting the kinks out of its support for alternate indices and making this more viable, though many take the opportunity to switch to a multi-language package manager.

Fundamentally, package (and publisher) reputation is far more critical right now than Warehouse-internal tamper detection. And if there have been attacks that would justify this particular investment, it would be great to hear about them (even sent to just the PSRT).

4 Likes

Thanks for the clarifications, Sumana. I guess one thing I am asking then: When an important discussion like this is held on Discourse, occasionally post a link to python-dev to alert the core dev team and other interested parties of the process. I have no problem following a thread on Discourse, but I don’t regularly scan Discourse for new threads, and Discourse doesn’t send me useful emails. I do receive python-dev in my inbox and always scan it for important topics.

I think this is precisely right. In my view, the PEP process and discussions on Discourse and the mailing lists are a perfectly fine way to involve the community in packaging proposals (and I very definitely think the whole community should be involved - to the extent that they want to be - not just packaging specialists).

The problems here seem to be twofold:

  1. Discussions on a GitHub PR are not part of the normal PEP process, and many people don’t track them. IMO, PR reviews should be considered as private conversations, and always summarised back to a more public forum.
  2. There was a fairly substantial discussion (as I understand it) at the packaging summit at PyCon. Face to face discussions like this have always been controversial - they are very high bandwidth, which is a huge benefit, but they are also very exclusive - even if the results are summarised and posted to the lists, the summary doesn’t give any sort of feeling of the “flavour” of the discussion, and so readers miss out on a lot of context and nuance. And to be frank, most face to face discussions aren’t even summarised very well, so really “you had to be there” is the norm.

Add to that the fact that this topic is of interest to a limited group with pretty specialised skills, and it’s really hard to find a good way to engage with the community. But that’s precisely why it’s important to try even harder to do so. And IMO, that’s where the PEP authors here need to improve things. We’re now in a situation where there are available funds and a proposed implementation, and we’ll see a lot of pressure to “not let bikeshedding and debate get in the way of the opportunity” - whereas I think rushing something in without proper community oversight is actually worse.

As an example, @tiran made some significant and important comments here, and these deserve to be addressed. @steve.dower has similar concerns.

The history here really isn’t too important. No-one acted in bad faith, there’s just been a series of misunderstandings and miscommunications. But thankfully, we’ve exposed the issue and can now hopefully have a productive discussion here that will converge on a proper community-approved solution that addresses the issues. I trust that the PEP authors will engage with that discussion, and can be impartial in discussions over “TUF versus other solutions”, but if they feel that they have a conflict of interests and are unable to represent non-TUF solutions fairly, then I’m equally sure we can find other champions for such solutions. The PEP process has handled far more complex debates in the past, so there’s no reason to feel that it can’t handle this.

2 Likes

This confusion could easily have been avoided if the RFCs had been named more imaginatively.

To follow up some of the discussion here (and to jump around in order a bit to speak to the non technical stuff first):

While I’m one of the people in the other thread advocating to not use the PEP process, it’s got nothing to do with not interacting with the community. It was mostly my fault: I’m in the middle of a job search (well, tail end of it now, hopefully) and I simply got lazy (or more charitably, I was distracted) and answered inline on GitHub rather than asking to have it published here and answered here.

Most of the other discussion around this PEP did occur in the open, it was just quite some time ago and on the distutils-sig mailing list. I do not believe that we had an in-person discussion about it at PyCon (other than people saying they wanted package signing, and pointing them to PEP 458) but I could be misremembering that.

No. This PEP is completely transparent to package authors and to package consumers (except if it detects something going wrong, as it does add additional failure modes that may be exposed to end users).

In the general case, no (for pip yes but for different reasons).

No.

Fundamentally the impact to various groups (at least in this PEP) is roughly the same as going from HTTP to HTTPS was. For your average publisher/consumer of Python packages, it’s just something that the tooling handles for them, for PyPI it requires changes to the infrastructure and for a tool like pip it requires code to handle the new protocol.

It sort of isn’t, I think? Random consumers of PyPI basically shouldn’t care about the details of this PEP any more than they care about the details of how TLS works.

I don’t think a PEP like this is actually super useful; this feels to me more like something that should be documented either on PyPI or as part of packaging.python.org. I mean it could be documented as a PEP as well, but ultimately to me it’s not really asking for a proposal, it’s just documenting how-tos.

In particular, (3) isn’t related to PEP 458 at all, since the PEP doesn’t seek to solve that problem (I could think of some ways to make it part of the solution, but it would be a pretty minor part overall). For (1) and (2), the answer as far as this PEP is concerned is “you don’t have to do anything, it gets handled the same way TLS ensures that an attacker with a privileged network position can’t change the content of responses”.

While we could discuss the relative merits of different problems to focus on solving, we don’t really get to dictate what exactly we have people willing to work on. This particular PEP was written by volunteers (if memory serves, they were grad students at the time) and wasn’t directed work from the PSF or the community. We don’t get to tell volunteers what to work on (unless they ask us), so we have to judge contributions as they come in, not what we’d rather they work on. So in terms of whether we ultimately decide to accept this PEP, we don’t really get to decide that the effort spent writing/discussing it would be better spent on something else.

We do have some funds that we plan on using to implement this PEP if it is accepted, and perhaps @sumanah or @EWDurbin could better answer this question. Those funds were given to us with the understanding we’d use them to implement, among other things, “cryptographic signing” (though it was left open-ended what exactly that entailed). So even in that case we have limitations on how we’re able to direct work to be done, since part of it needs to be implementing a cryptographic signature scheme for PyPI. (Part of it is also implementing malicious package detection, but that doesn’t have a PEP because it’s just a new feature of the PyPI code base and doesn’t really have ramifications for projects beyond PyPI.)

So AIUI, the current state of things is we’re funding both things, but we need to do both things, or we need to give back (or something? I don’t really know how it works) a portion of the money since we’d ultimately be deciding not to move forward with implementing a feature that satisfies the conditions it was given under.

So in terms of package signing we have 3 options:

  1. Discuss/refine this PEP until it’s ready to be accepted and then move forward with it.
  2. Someone else volunteers to write a competing PEP and we discuss the relative merits of them both and choose between them.
  3. We do nothing and we let the PSF figure out what the fallout of not implementing package signing would be.

Personally I think we should do either 1 or 2, but unless someone steps forward to do (2) our only real options are 1 and 3. Since I both agree with the idea of getting package signing onto PyPI and think that TUF is, to me, the best option I’ve seen, I’d love for that answer to be (1) (barring someone coming forward to do 2 and coming up with something even better).

PyPI? Not yet. Other similar repositories? Yes, they have (see for example Rubygems).

There’s also a wider impact to this as well. We currently rely on TLS to ensure that the bits that a user gets are what PyPI is trying to serve them. That works reasonably OK if a user is talking to PyPI itself, but what about mirrors of PyPI? Right now there’s no mechanism in place to ensure that if you ask for “foo-1.0.tar.gz” from a mirror of PyPI, you’re actually getting what PyPI thinks foo-1.0.tar.gz is. This is a big problem in places like China where access to PyPI is incredibly slow (I’ve had private reports of pip install taking days to complete if you’re not using a mirror there, just due to the bandwidth limitations of trying to go through the Great Firewall). Currently the only real solution for those people is to either suck it up, or install from a semi-random mirror inside of the Great Firewall. We could try to have the PSF put a presence inside of China, but that only solves the problem for China, not for other locations, and specific to China I believe the legalities of doing that are kind of tricky (but that’s a better question for the PSF itself, not me).
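The mirror problem described above can be sketched in a few lines: if a client holds a record of the expected length and SHA-256 digest that it trusts (in the PEP’s scheme that trust would come from signed metadata, which this sketch does not implement), it can check bytes from any mirror before using them. `TRUSTED_TARGETS` and `verify_download` are hypothetical names for illustration only, not pip’s or TUF’s API.

```python
import hashlib
import hmac

# Hypothetical trusted record of what PyPI says each file is. In a real
# system this would come from signed metadata, not a hard-coded dict.
GENUINE = b"genuine release bytes"
TRUSTED_TARGETS = {
    "foo-1.0.tar.gz": {
        "length": len(GENUINE),
        "sha256": hashlib.sha256(GENUINE).hexdigest(),
    },
}


def verify_download(filename: str, data: bytes, trusted: dict) -> bool:
    """Check bytes fetched from an untrusted mirror against trusted metadata."""
    expected = trusted.get(filename)
    if expected is None:
        return False  # the trusted metadata does not list this file
    if len(data) != expected["length"]:
        return False  # cheap check before hashing
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison; not strictly needed for public digests,
    # but a harmless habit.
    return hmac.compare_digest(actual, expected["sha256"])


print(verify_download("foo-1.0.tar.gz", GENUINE, TRUSTED_TARGETS))
print(verify_download("foo-1.0.tar.gz", b"tampered release bytes", TRUSTED_TARGETS))
```

The first call accepts the genuine bytes and the second rejects the tampered ones. The hard part, which this sketch deliberately skips, is how the client comes to trust the record itself when it too may have been fetched from a mirror.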

Like I said above, this isn’t really related to this particular PEP, unless your argument is this idea isn’t useful and we should only focus on that.

2 Likes

I’m not trying to make the argument now, just wanted to add the context for the new participants that this was the argument I was making when (allegedly) we hadn’t decided to fund this PEP.

I am concerned that we need to be clear that this feature doesn’t help protect against legitimately uploaded malicious packages, which is currently 100% of the malicious packages we’re aware of. As long as that’s communicated clearly (including to all the media sites who will no doubt write this up), I’m fine.

The last thing I want is for people to get the sense that PyPI security is “solved” and stop seeking to invest in it (and I don’t think this is overblown - I’ve heard people legitimately call PyPI “fundamentally insecure” because of a typosquatting attempt).

1 Like

We have limited ability to control how other people communicate this potential change, but I know I certainly don’t consider this “solving” PyPI security (or that security is something that can be “solved”; it’s like saying performance is “solved”: you can move the needle in one direction or the other or solve specific issues, but you can’t, as a broad statement, “solve” it).

I’m personally unlikely to be the one doing much communication around this, but I do think it’d be great to call out in the PEP itself that this particular problem is out of scope for this PEP (not that it would be an exhaustive list of what’s out of scope, since there is lots of security related work that this PEP does nothing for, but since that is a pretty big topic and there’s a decent chance of confusion, it would be great to be explicit in the PEP).

2 Likes

I also do want to be explicit that this PEP isn’t a foregone conclusion. While I do currently believe the premise of it is a good idea, if others disagree with that I want that discussion so we can figure out if it is in fact a good idea or not.

2 Likes

That’s precisely the sort of thing the PEP should make clearer.

To be clear, I have no problem with doing (1). But that doesn’t mean that the PEP shouldn’t be sufficiently clear to allow non-technical users to read and understand enough to know what it’s providing (to use your analogy, I don’t know how https is implemented, but I know what it’s for, what it protects against, roughly how it does it, and importantly, what it doesn’t protect against - the PEP should give the same level of understanding here).

There’s a somewhat new situation here that we’re having to navigate. We have got some volunteers, we’ve got some money to let them do what they propose, but we still need to ensure (as a community) that we want what they are offering, and what someone is willing to pay for. Having known community specialists like yourself support the proposal is a good step in that direction, but it’s not the whole story.

Some other things that are typically covered in a PEP but which are missing here:

  • Review of how other ecosystems handle this issue. This data integrity issue isn’t unique to Python. How do other languages (Rust, JavaScript, Ruby) and distributions (Red Hat, Ubuntu, Homebrew, Microsoft (NuGet)) handle it?
  • Discussion of how “PyPI consumers” should implement this. In view of our principle that we avoid implementation defined behaviour, I’d like to see an explanation of how a tool that wants to consume data from PyPI would implement the consumer end of the protocol. Presumably in terms of using the TUF library from PyPI. I don’t think it’s acceptable to expect tools to copy pip’s implementation. (An obvious example of a tool would be distlib, and we have a goal to make it easy to write new standards-based tools, so we should take that into account).

I’d also like to see the PEP title changed, as it’s currently basically meaningless. Something like “Implement (whatever it is we’re implementing) for PyPI using TUF” would much better explain the proposal - and would make searching for the relevant PEP a lot easier as well!