PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks

jamestwebber · July 6, 2023, 2:51am

This at least is trivial to solve with a -y flag, which related tools have (e.g. conda, apt install).

One could even introduce this as entirely optional, with an -i/--interactive flag that prompts the user before installing. That mode could be recommended for specific users who are most concerned about this sort of thing.

It doesn’t solve most of the concerns in this thread, but it’s a common feature in this space.
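
As a rough illustration of the prompt idea above (a sketch only, not an existing pip option): `confirm_install`, its `assume_yes` parameter, and the injectable `ask` callable are all hypothetical names chosen for this example.

```python
from typing import Callable, Iterable


def confirm_install(
    packages: Iterable[str],
    assume_yes: bool = False,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the install should proceed.

    Hypothetical helper: `assume_yes` plays the role of a -y flag, and
    `ask` is injectable so the prompt can be scripted or tested.
    """
    names = sorted(packages)
    if assume_yes or not names:
        # -y style behaviour (or nothing to install): skip the prompt.
        return True
    print("The following packages would be installed:")
    for name in names:
        print(f"  {name}")
    answer = ask("Proceed? [y/N] ").strip().lower()
    # Anything other than an explicit yes, including an empty answer, declines.
    return answer in {"y", "yes"}
```

Defaulting to "no" on an empty answer is a design choice in this sketch; tools differ on that.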

dstufft · July 6, 2023, 3:19am

Well that solves the interactivity part of it, but it means that any security benefit that relies on someone inspecting the list prior to install is gone.

jamestwebber · July 6, 2023, 3:27am

For sure. I don’t think it’d be a particularly great security feature ^[1] but it might be a nice feature in general for some people. I’ve definitely pip installed in the wrong environment and only realized as it started to resolve more dependencies than I expected.

[1] and is probably off-topic for that reason

pelson · July 6, 2023, 11:12am

I appreciate the analogy, but as always, analogies can only go so far. In this case, the analogy breaks down since the possibility of DoS attacks wasn’t introduced by the design of the secure transport layer. What I’m trying to get across is that PEP-708 is trading a severe (arbitrary code execution) exploit for a (significantly) less severe developer productivity one (which can’t easily be solved by the developer). The thing that I’m focusing on is that this may not necessarily be a trade-off we have to make in order to solve the underlying problem.

I don't want to get sucked into solutioneering in this thread, since it is explicitly about PEP-708, but I don't see that this statement HAS to be true... (this is where I get sucked in)

If the index is treated as the configured namespace provider, then every namespace-less project name (incl. dependency definitions) can be reasonably assumed to default their namespace to that index. In other words: “if you’ve uploaded it to pypi, we assume your dependencies are also on pypi”. The constraint would be that no index has the right to define new projects on a different namespace (so the torchtriton case would have failed in the client from the outset, and would have required that the namespace of the extending index be used for torchtriton, and an explicit namespace declaration would be required to use it as a dependency; at least until the point that torchtriton was registered on pypi.org). This is perhaps your rejected “Require all projects to exist in the ‘default’ repository” option, with the addition of a namespacing concept and replacing the word “default” with “configured index” (pypi.org by default, defined by index-url).

From the original “dependency confusion” use case’s perspective (a pypi.org proxy + a local index), you would then expect to be running an internal index (configured in pip via index-url) which is its own namespace AND is a broker of other namespaces (e.g. pypi.org). That index would be responsible for tracking the namespace of a project, and this would be the default value used for its dependencies (declared without a namespace).

I don’t know if it is necessary to prohibit projects of the same name with different namespaces from being installed together - it isn’t obvious that this is solving the underlying problem that project != package name(s), though it is a reasonably strong indication that pypi.org::prjA and my-index::prjA are likely to have package name collisions. Ultimately, this is another (and existing) form of “dependency confusion”, which would require core language-level (i.e. import mechanism) changes to solve properly (possibly re-using the namespace concept). Though we could avoid making the situation worse by explicitly prohibiting projects of the same name from different namespaces being installed together.

From a user perspective, it would be entirely reasonable to take the default name from the index-url. Similar syntax to what you propose would be necessary only to retrieve projects from a different namespace.
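
To make the defaulting idea above concrete, here is a minimal sketch. It is not part of PEP 708 or of any existing tool: the `Requirement` class and both function names are hypothetical, and the `::` separator is simply borrowed from the `pypi.org::prjA` notation used earlier in this post.

```python
from dataclasses import dataclass

SEPARATOR = "::"  # borrowed from the "pypi.org::prjA" notation above


@dataclass(frozen=True)
class Requirement:
    namespace: str
    name: str


def qualify(dependency: str, parent_namespace: str) -> Requirement:
    """Attach a namespace to a dependency string.

    An explicit "namespace::name" is kept as written; a bare name
    inherits the namespace of the project that declared it.
    """
    namespace, sep, name = dependency.rpartition(SEPARATOR)
    if not sep:
        return Requirement(parent_namespace, dependency)
    return Requirement(namespace, name)


def check_no_cross_namespace_duplicates(requirements: list[Requirement]) -> None:
    """Raise if one project name is requested from two namespaces."""
    seen: dict[str, str] = {}
    for req in requirements:
        previous = seen.setdefault(req.name, req.namespace)
        if previous != req.namespace:
            raise ValueError(
                f"{req.name!r} requested from both {previous!r} and {req.namespace!r}"
            )
```

For example, `qualify("prjA", "my-index")` yields the `my-index` namespace, while `qualify("pypi.org::prjA", "my-index")` keeps `pypi.org`; passing both results to the duplicate check raises, which is the optional prohibition discussed above.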

This would also be fully backwards compatible - existing project dists would continue to work with updated build and client tools (until you reach a dependency confusion, at which point they would error). To support namespacing the build tools and clients would both need to be extended to support namespacing (esp.) when declaring and requesting dependencies. However, newly built packages with namespace declarations would most likely not be compatible with old clients.

I would be happy to engage on this, perhaps that should be in a separate topic though?

The objective for me in this thread is not to propose a solution, rather to highlight that the PEP is:

very nuanced (i.e. it is easy to misunderstand, which can lead to index misconfiguration and… dependency confusion)
not like other solutions out there in the wild (e.g. scopes / groupId) - this will need special casing for each of the “software repository” tools (e.g. artifactory, nexus, azure artifacts, devpi, etc.) if they want to support this “extra-index” use case (perhaps they won’t bother, since you can just create a repo which does the index grouping in a configurable way)
doesn’t fully “solve” dependency confusion (since index operators still need to use mechanisms such as priority ordering, not mechanisms proposed in PEP-708, to actually resolve name conflicts), it simply prevents the code execution part of dependency confusion
introduces its own (significantly less severe) “dependency confusion” problem (one day you can install internal-project-x, and the next day somebody registers internal-project-x on pypi and it suddenly stops working, with no remedy proposed in this PEP).

A few questions I ask myself regarding this PEP:

Does it prevent a real and important problem? Yes, by making the client secure by default
Does it iterate us towards a solution for the original dependency confusion problem? No, I don’t think so (even though it clearly is better to raise/stop than it is to blindly install dependency confused projects, and we shouldn’t let perfection be the enemy of the good). I don’t believe that the PEP will be useful to solve the underlying problem (but does provide additional tools to solve the problem for the --extra-index-url case with today’s index name ambiguity).
What would I do if I were the PEP delegate (I’m not, and grateful that such a tough decision is Paul’s to take)? I would want high confidence that the effort that would go into implementing and following-up with the PEP couldn’t be instead invested in solving the underlying project name ambiguity through repo-level namespacing (of some kind). Perhaps it has been discussed and rejected in detail elsewhere (happy to be pointed to a canonical reference if it exists! FWIW, it isn’t Namespace support in pypi, as that is about having multiple namespaces within a single repo). I will happily start a new discussion on this topic, if that would be worthwhile?

pf_moore · July 6, 2023, 1:49pm

I’m going to respond just to this point.

I disagree with you that “dependency confusion attacks” include making projects fail to install. My understanding of the term is that it’s about allowing an attacker to install malware by publishing a project whose name shadows that of an existing project. On that basis, preventing installation is the solution, not the problem!
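
For readers less familiar with the term, the shadowing attack described above can be shown with a toy model. This is a deliberately naive sketch, not how pip is implemented: the index names, the project name, and the version numbers are made up, and versions are parsed as plain dotted integers.

```python
Version = tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a simple dotted version such as "1.2" (toy parser only)."""
    return tuple(int(part) for part in text.split("."))


def naive_pick(project: str, indexes: dict[str, dict[str, list[str]]]) -> tuple[str, str]:
    """Merge every index and return (index_name, version) of the highest version."""
    candidates = [
        (parse_version(version), index_name, version)
        for index_name, projects in indexes.items()
        for version in projects.get(project, [])
    ]
    if not candidates:
        raise LookupError(project)
    _, index_name, version = max(candidates)
    return index_name, version


# Hypothetical data: the same name exists privately and was later
# registered publicly by someone else with a very high version number.
EXAMPLE_INDEXES = {
    "internal": {"internal-project-x": ["1.0", "1.2"]},
    "public": {"internal-project-x": ["99.0"]},
}

print(naive_pick("internal-project-x", EXAMPLE_INDEXES))
```

With indexes merged blindly, the public `99.0` upload wins over the internal `1.2`. Refusing to install in that situation is the outcome being called "the solution" here.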

For what it’s worth, I do have the high confidence that you demand, that implementing a namespacing solution would be a much bigger effort, involving many more parties, than PEP 708. The effort to implement PEP 708 would be a drop in the ocean compared to that work, so I don’t see blocking PEP 708 in case we want to invest that effort in an as-yet-undefined alternative proposal as being a reasonable trade-off. If there were an actual PEP proposing a solution that in addition to preventing the installation of malware, also allowed “normal service” to continue uninterrupted even during a dependency confusion attack, I would definitely take that into consideration. I’d be particularly interested in its answers to the following questions:

Will this fix the issue for all projects, or only for projects that have been modified to participate in the solution (use namespaces for their dependencies, or whatever)?
What is the transition plan, and how long would it take to complete?

But as it is, there’s no such PEP, and I don’t intend to let vague assertions that “we might be able to do better” without any supporting evidence influence my decision.

To put it another way, the responsibility is on you to convince me that there’s a practical alternative, not on me to demonstrate that there isn’t one. After all, adding “step 1 - revert PEP 708” to any new proposal is hardly going to break it.

trishankatdatadog · July 6, 2023, 1:52pm

The PEP does go into why such a solution was rejected (for now).

steve.dower · July 6, 2023, 2:02pm

A future namespacing or scoping feature would easily layer on top of PEP 708 anyway, if you view it as a way to inform the installer which index should be preferred for a given package. I still think it works better as a constraint than a namespace, given that dependencies between packages already exist,^[1] but it is likely more logical in a PEP 708 world than it would be without this initial change.

[1] And would be less ambiguous under “require foo to come from private-index” compared to “install private-index:foo and fail(?) if any other libraries depend on pypi:foo”.

dstufft · July 6, 2023, 2:21pm

The developer can solve any “DoS” introduced by PEP 708 if installers follow the recommendations in the PEP.

This PEP avoids dictating or recommending a specific mechanism by which an installer allows an end user to configure exactly what repositories they want a specific package to be installed from. However, it does recommend that installers do provide some mechanism for end users to provide that configuration, as without it users can end up in a DoS situation in cases like torchtriton where they’re just completely broken unless they resolve the namespace collision externally (get the name taken down on one repository, stand up a personal repository that handles the merging, etc).

This configuration also allows end users to pre-emptively secure themselves during what is likely to be a long transition until the default behavior is safe.

It doesn’t define the specifics of how that configuration should work, because we generally try to avoid dictating UX concerns in PEPs, since individual projects are better situated to figure out how to integrate something into their project in a way that makes sense.

The intention is definitely though that the workflow ends up being roughly:

Users ignore that PEP 708 exists, and are happily pip installing stuff as they do today.
A dependency conflict happens (whether an attack or it being inadvertent), triggering the protections in PEP 708.
Users decide which repository they want to install X thing from, and use whatever configuration option their installer has exposed to say “install X from Y”.
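
The three steps above can be sketched roughly as follows. Since the PEP deliberately leaves the configuration mechanism to installers, everything here is an assumption made for illustration: the `pins` mapping, the `RepositoryConflict` exception, and the function name are hypothetical, and the sketch ignores the case where repository metadata allows two repositories to be merged.

```python
class RepositoryConflict(Exception):
    """Raised when a project is served by several repositories and no choice was configured."""


def choose_repository(project: str, found_in: list[str], pins: dict[str, str]) -> str:
    """Pick the repository to install `project` from.

    `found_in` lists the repositories that serve the project, and `pins`
    is the user's "install X from Y" configuration (hypothetical format).
    """
    if project in pins:
        pinned = pins[project]
        if pinned not in found_in:
            raise LookupError(f"{project!r} is not available from {pinned!r}")
        return pinned
    if not found_in:
        raise LookupError(f"{project!r} was not found in any repository")
    if len(found_in) == 1:
        # Step 1: the common case, nothing for the user to think about.
        return found_in[0]
    # Step 2: the same name appears in several places, so stop instead of guessing.
    raise RepositoryConflict(
        f"{project!r} is served by {', '.join(found_in)}; configure which one to use"
    )
```

Step 3 then amounts to adding an entry such as `{"torchtriton": "pytorch"}` to `pins`, after which the same call succeeds.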

pradyunsg · July 14, 2023, 8:10pm

OK, this PEP has been lingering for quite some time and there hasn’t been any movement in the technical design of this lately.

Do we know of any stakeholders that we might want to consider, before putting this up for a decision?

If no one says anything, @dstufft here’s your nudge.

dstufft · July 19, 2023, 8:49pm

Yup sorry.

I’m now asking for a pronouncement on this PEP again.

pf_moore · July 19, 2023, 10:47pm

Cool. I’ll take a look as soon as I can. I have a pip release and some personal business this weekend, but if I’ve not made a decision by the end of next week, feel free to chase me.

pf_moore · July 24, 2023, 12:00pm

I’m going to provisionally accept PEP 708.

The specific reason for the “Provisional” status is that the PEP as written simply specifies how tracking data is made available. It imposes no requirements on installers to actually enforce that data, or to even use it. All of the installer semantics are in the “Recommendations” section. While this is a legitimate approach, it does mean that the PEP can only succeed in its goals if the tracking data can be used successfully in a real-world installer.
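
To give a feel for what "using the tracking data" could mean in an installer, here is a heavily simplified sketch. It does not reproduce the PEP's recommended algorithm: the `ProjectPage` class, its `tracks` field, and the single-root rule are assumptions standing in for the real metadata and checks, and the URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectPage:
    """One repository's view of a project (hypothetical model)."""

    url: str
    # URLs of the same project elsewhere that this page says it tracks.
    tracks: frozenset[str] = frozenset()


def may_merge(pages: list[ProjectPage]) -> bool:
    """Simplified rule: merging is allowed only if exactly one page tracks
    nothing (the "root") and every other page tracks that root's URL."""
    if len(pages) <= 1:
        return True  # a single source cannot be confused with another
    roots = [page for page in pages if not page.tracks]
    if len(roots) != 1:
        return False
    root_url = roots[0].url
    return all(root_url in page.tracks for page in pages if page.tracks)
```

Under this toy rule, a mirror that declares it tracks the PyPI project page may be merged with it, while an unrelated repository that merely shares the project name may not. Whether a rule like this holds up in a real installer is the open question behind the provisional status.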

It seems to me that there will be a number of hard implementation choices to be made when designing the pip implementation for this feature (for example, how will the wheel cache interact with this?) and it would be inadvisable to freeze the specification before we have a working client for it.

In particular, I would like to see the following completed before the PEP is made Final:

An implementation of the PEP in PyPI (Warehouse) including any necessary UI elements to allow project owners to set the tracking data.
An implementation of the PEP in at least one repository other than PyPI, as you can’t really test merging indexes without at least two indexes.
An implementation of the PEP in pip, which supports the intended semantics and can be used to demonstrate that the expected security benefits are achieved. This implementation will need to be “off by default” initially, which means that users will have to opt in to testing it. Ideally, we should collect explicit positive reports from users (both project owners and project users) who have successfully tried out the new feature, rather than just assuming that “no news is good news”.

I don’t want to leave this PEP in “provisional” status indefinitely, so I would like to see a proposed timeline for getting the above criteria addressed. But that’s critically dependent on whoever will be doing the work - @dstufft are you able to propose a realistic deadline for this?

In spite of the above reservations, I’d like to thank @dstufft for putting this PEP together. It does a good job of addressing a complex situation. I’d like to call out in particular the following statement from the PEP:

This is made particularly tricky in that there is no “right” answer; there are valid use cases both for wanting two repositories merged into one namespace and for wanting two repositories to be treated as distinct namespaces.

The underlying problem here is not something that admits simple solutions, and I think this PEP provides a very good balance of safety vs convenience (something that IMO many “security” proposals do badly on, by assuming that safety is so important that convenience is largely irrelevant).

Thanks to everyone who provided feedback and discussion, both here and on previous proposals in this area. The PEP is significantly better for all of the input.

Congratulations, @dstufft!

steve.dower · July 31, 2023, 1:33pm

Would this concern be addressed if the section I proposed earlier were added?

pf_moore · July 31, 2023, 3:34pm

It’s not so much about the intended behaviour of installers not being clear, more about it being unproven in real life. I think the PEP is probably right to leave installer behaviour in the “Recommendations” section - it’s very hard to draw the line between “proposing a standard” and “dictating tool UX” when there’s only one tool involved (i.e., installers, where that’s basically pip).

The point of provisional status here is to give us some flexibility based on real-world implementation experience. Defining the required installer behaviour doesn’t help with that, in fact the opposite, it just gives us something else that we might want to review if we hit implementation questions.

EpicWink · July 31, 2023, 9:57pm

Proxy indexes are potentially another consumer: they would be free to reject results which fail a test in this PEP. I intend proxpi to implement this as a consumer in the future.

dstufft · August 7, 2023, 3:15pm

Thanks!

I’m hoping to get the Warehouse and pip implementation done in the next couple of weeks.

Agreed, I think the final result came out much stronger than the initial proposal.

cofiem · August 8, 2023, 9:43am

Thanks all for the work on this - it looks like it should improve the default security for package repositories.

I was reading through the PEP, and I found this sentence fragment a bit confusing:

“so this cannot rely on any information that isn’t present in the “extending” repository itself”

Would it be reasonable to say that this can be re-phrased as:

“so this must rely on information that is present in the “extending” repository itself”?

If these two aren’t equivalent, could someone explain the differences?

pf_moore · October 23, 2023, 8:45pm

I was just reminded of this - what’s the current status here? Presumably the big issue is finding time to work on it?

pf_moore · November 5, 2023, 9:27pm

@dstufft any news here? Or anyone else? Is anyone working on implementing this PEP?

As the PEP is only accepted conditional on being implemented in Warehouse, pip, and “another index”, it’s now been in limbo for over 3 months, and there’s no sign of any activity towards final status. Given that this is intended to mitigate dependency confusion attacks, I don’t think we can reasonably leave the PEP in a “provisional and not yet implemented” state indefinitely.

I’m not happy about the possibility of marking this PEP as rejected, because if we can’t get the resource to implement this solution, it seems unlikely that any other solution will get implemented either. But nor am I happy with the message that leaving this outstanding indefinitely would send.

At a minimum, I’d like someone to commit to a date (any date, at this point) when we will either have made significant progress in implementing this PEP^[1], or we reject it. If no-one can do so, I guess I’ll have to pick a date myself (which is not ideal, as I don’t know what the resourcing constraints are).

[1] By which I mean at a minimum, complete PRs against pip and Warehouse, that are ready to merge, plus evidence of at least one other index implementation starting to look at adding support.

dstufft · November 6, 2023, 1:23am

Sorry, I am working on getting it implemented, I’ve just had some personal things come up the last few months that have taken up a fair amount of my free time.