PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks

pradyunsg · November 27, 2023, 6:41pm

If this isn’t on a volunteer’s radar to implement at this point, let’s add this to psf/fundable-packaging-improvements on GitHub (“Packaging improvements that could be funded”).

I think it is quite useful for institutional users, and clearly marking it as “this needs someone to throw money at the problem” could help resolve the resource/developer availability problem.

EpicWink · December 4, 2023, 3:06am

I’m thinking about implementing this for proxpi as a consumer, and I’m wondering about how to deal with project versions and file size and upload-time from API v1.1 (specified in PEP 700 (Additional Fields for the Simple API for Package Indexes)).

As far as my understanding goes, PEP 700 says that if the response says API version 1.1, then it must have versions and size, but if the response has any version above 1.1, then those fields are optional. This seems to be the case in the examples for PEP 708.

pradyunsg · December 4, 2023, 5:49am

What is this based on?

PEP 700 only modifies the JSON view’s schema, and for that it states:

The api-version must specify version 1.1 or later.

So, for API version 1.1 (or later):

With the JSON view, those keys are mandatory.
With the HTML view, those keys are not permitted.

EpicWink · December 4, 2023, 10:06am

Taking PEP 700’s text literally, it says if you implement API v1.1 [1], then you must set api-version to 1.1 or later; it does not say if you specify api-version as 1.1 or later, then you must implement API v1.1.

While this seems hyper-pedantic, and I suspect the intention was for all future minor versions to require API v1.1, this does highlight a point of confusion.

The examples in PEP 708 don’t include any of the required fields from PEP 700. In addition, I can see scenarios where I want to use PEP 708 to say I’m tracking another index, but where I don’t have project versions or file sizes.

Yes, I’m only talking about JSON responses, sorry.

[1] By this, I mean the required versions and file size on the project response.

pf_moore · December 4, 2023, 11:00am

OK, as PEP author of PEP 700, and PEP delegate for PEP 708, I will officially say that if you implement version 1.2 of the API, you MUST implement the fields required by version 1.1 as well. And as a general requirement, if you implement any version of the API, you MUST implement all previous versions.

In other words API versions are cumulative.
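The cumulative rule can be sketched as a check a consumer might run on a JSON project page. This is a minimal sketch, not code from any index or installer: the function names, the example page, and the example.org URL are hypothetical, and it only looks at the two fields discussed above (the top-level versions key and the per-file size key from PEP 700).

```python
def parse_api_version(text):
    """Turn an api-version string such as "1.2" into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def missing_pep700_fields(page):
    """List the PEP 700 fields that a JSON project page should have but lacks.

    Assumes versions are cumulative: any page declaring 1.1 or later
    (including 1.2 from PEP 708) must carry the 1.1 fields.
    """
    if parse_api_version(page["meta"]["api-version"]) < (1, 1):
        return []  # PEP 700 does not apply before 1.1
    missing = []
    if "versions" not in page:
        missing.append("versions")
    for index, file in enumerate(page.get("files", [])):
        if "size" not in file:
            missing.append(f"files[{index}].size")
    return missing


# Hypothetical 1.2 page that uses PEP 708 metadata but omits the PEP 700 fields.
page_1_2 = {
    "meta": {"api-version": "1.2", "tracks": ["https://pypi.org/simple/holygrail/"]},
    "name": "holygrail",
    "files": [
        {
            "filename": "holygrail-1.0.tar.gz",
            "url": "https://example.org/files/holygrail-1.0.tar.gz",
            "hashes": {},
        }
    ],
}

print(missing_pep700_fields(page_1_2))  # ['versions', 'files[0].size']
```

Under the cumulative reading, a page like this one is incomplete even though it declares version 1.2 rather than 1.1.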

I consider this sufficiently self-evident that if you want to modify the text to explicitly state any of the above, I’ll accept it as a textual clarification (in the sense of our process - I personally consider it as nothing more than a “readability improvement” but I’ll approve it as a “text-only change” if people think it needs that).

At some stage, the various PEPs that define the index API should be consolidated into a proper document on packaging.python.org, and that would clear this up once and for all. But no-one has yet found the time to do that. Contributions would be welcomed, as usual.

jeanas · December 4, 2023, 11:22am

For reference, this is “Ensure that all current Python packaging interoperability standards are on packaging.python.org” (Issue #1093, pypa/packaging.python.org).

jeanas · December 10, 2023, 10:22am

@EpicWink has done that work! It’s awaiting review at “Add simple repository API specification” by EpicWink (Pull Request #1442, pypa/packaging.python.org).

pf_moore · December 10, 2023, 2:50pm

The PR doesn’t seem to include this PEP (708, API version 1.2). I don’t think provisional status is a reason to omit it from the documentation - the PEP is accepted, provisional status just allows for the possibility that implementation experience might result in changes (and the docs can easily be changed if this happens).

EpicWink · December 11, 2023, 2:23am

That PR is just adding a spec from the existing referenced PEPs. I have another PR which adds PEP 708 (in draft right now)

pf_moore · March 1, 2024, 5:19pm

It’s now over 6 months since this PEP was provisionally accepted, and there has, as far as I’m aware, been no progress on implementing it in either PyPI or pip. Furthermore, solutions based on index priority have been implemented [1] in poetry and PDM, and are being planned in uv. While index priority is a less effective solution to the dependency confusion issue, it is nevertheless a solution that actually exists, and which does not depend on resource being found to add a new feature to PyPI, and so right now it seems much more useful in practice.

At this point, I’m seriously considering changing the status of this PEP to “Rejected”. This is not something I want to do, but I think it harms the credibility of the standards process if projects are ignoring a standard and creating their own solutions. If anyone has any suggestions on how we can move this standard forward I’d really appreciate hearing them.

@dstufft @dustin @EWDurbin Realistically, what is the likelihood of getting a PyPI implementation in a reasonable timescale?

As a pip maintainer, I will also say that I’m not aware of anyone coming forward with the intention of implementing this in pip, and I don’t believe any of the maintainers have expressed an interest in working on it (with the exception of @dstufft and I’d rather he puts any time he has available into the PyPI side of the work). So this isn’t just a PyPI resourcing issue.

[1] They may have already been available before this PEP was provisionally accepted - I didn’t check.

notatallshaw · April 19, 2024, 9:35pm

uv’s default behavior is now safe, but there was enough demand for more pip-like behavior that --index-strategy unsafe-any-match was added, and there is now an open PR for unsafe-highest to match pip’s unsafe behavior even more closely.

So while clients can make their default behavior safe, there is clearly significant user demand for the old unsafe behavior, and when users turn it on the clients do not mitigate this kind of attack.

cofiem · June 30, 2024, 5:38am

I’ve just finished the first pass for a PR in PyPI to implement the Alternate Repository Location metadata.

I think the next step is for the PR to be reviewed by a PyPI maintainer.

cofiem · August 3, 2024, 5:31am

I’ve got a work-in-progress PR for implementing PEP 708 in pip.

While it is still in progress, I think I’ve got enough in place to make it worthwhile getting feedback from anyone interested in this topic.

Specifically, I’m interested in feedback on:

implementation: is there anything that makes my approach unlikely to be accepted or cause problems?
docs: at the moment, I’ve got a bunch of TODOs in the code, many of which require decisions or highlight nuances that will need to be documented somewhere. My best guess is the explanation should end up in the package-finding page?

EWDurbin · September 19, 2024, 2:02pm

Provisional implementation of PEP 708 in PyPI is now merged and live on test.pypi.org and pypi.org. Thanks @cofiem!

You can view the UI for managing Alternate Locations Metadata for projects at https://pypi.org/manage/project/{project_name}/settings/

steve.dower · October 3, 2024, 6:34pm

While responding to the PEP 759 discussion I came up with a scenario that I’m not 100% sure is handled well by this PEP. It doesn’t appear to be covered by the text, so I’m wondering whether it’s deliberate or just handled in some way that I don’t see:

If I publish my own packages on my own index, and expect users to specify both my index and PyPI, but I haven’t claimed my package names on PyPI, a malicious/naive user could claim those names and break installs. There doesn’t seem to be any metadata my index can provide to say “I am the real one, ignore other sources” (and clearly it couldn’t). This situation is probably okay, but it’s more the context than the problem.

Now, the obvious way to avoid this happening is to also claim my packages on PyPI and set the “tracks” metadata. That way when they’re both listed, it will be allowed (because metadata says they’re the same package), and so I can ensure that the one on PyPI is a lower version and it will never get installed.
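To make the scenario concrete, here is a rough sketch of the kind of check a client could run when the same name shows up on two indexes. It is a deliberate simplification and an assumption on my part, not the algorithm PEP 708 specifies: it accepts a pair of project pages if either one names the other in meta.tracks or alternate-locations, and it does no URL normalization. The function names and the packages.example.com index are hypothetical.

```python
def declared_links(page):
    """URLs that a project page declares a relationship with (PEP 708 keys)."""
    tracks = set(page.get("meta", {}).get("tracks", []))
    alternates = set(page.get("alternate-locations", []))
    return tracks | alternates


def may_merge(pages):
    """Decide whether project pages from several indexes may be combined.

    `pages` maps each project URL to its parsed JSON page. Simplified rule
    (an assumption, looser than the PEP): every pair must be linked by
    metadata on at least one side.
    """
    urls = list(pages)
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            linked = (
                second in declared_links(pages[first])
                or first in declared_links(pages[second])
            )
            if not linked:
                return False
    return True


PRIVATE = "https://packages.example.com/simple/mypkg/"  # hypothetical index
PYPI = "https://pypi.org/simple/mypkg/"

# I claimed the name on PyPI and marked it as tracking my own index.
claimed = {
    PRIVATE: {"meta": {"api-version": "1.2"}, "name": "mypkg"},
    PYPI: {"meta": {"api-version": "1.2", "tracks": [PRIVATE]}, "name": "mypkg"},
}

# Someone else claimed the name on PyPI and declared nothing.
squatted = {
    PRIVATE: {"meta": {"api-version": "1.2"}, "name": "mypkg"},
    PYPI: {"meta": {"api-version": "1.2"}, "name": "mypkg"},
}

print(may_merge(claimed))   # True
print(may_merge(squatted))  # False
```

In the second case the install fails rather than silently picking a file, which is the “break installs” outcome described above.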

But assuming that I can’t put my actual package on PyPI, what should the contents of that one be? Users who forget my index will see a successful install without real content, lockfiles generated without my index will be nonsense, and if my PyPI credentials get exposed there’s still a path even to users who are specifying the index.

The best ideas I’ve come up with are to put a setup.py on PyPI that prints a helpful error and fails (but I’d love a better system for “fail to install with a custom error message”), or to have some way [1] to treat PyPI as a fallback index, and so packages that appear on two indexes where one is PyPI (or another “fallback index”) will be taken from the other index regardless of any metadata.

Other thoughts? Or did I miss something that actually makes this a non-issue?

[1] I’m hand-waving on purpose because I don’t much care how. If you force me to choose, I’m saying to make --index-url for “essential” indexes and --extra-index-url for “fallback” indexes. But other ideas are welcome.

EpicWink · October 3, 2024, 8:52pm

I fear that this would be considered a “stub package”, which PyPI frowns upon.

The correct solution I’ve heard is to set up simpleindex with custom rules for the relevant packages, which is a fairly heavyweight solution.

ncoghlan · October 4, 2024, 4:41am

While disapproval is the default stance, there’s some leeway based on why the stub packages exist. For example, I published stub packages for a few Fedora-related names for years before the folks running those projects finally took the PyPI names over and started publishing real packages (and yes, I checked via other Red Hat sources to confirm that those contacts were legitimate).

Having tracks metadata pointing at the true source repository seems like a case to me where leeway is likely to be granted.

Given that PEP 708 operates at the project API level, I believe you should be able to publish a stub package, and then yank it (or even delete it outright).

The client UX of that would then depend on how client tools handle indexes which list tracks metadata with no associated files.

steve.dower · October 4, 2024, 10:38am

Ooh, yanking it is an idea. If it’s the only release, it may not even show up in the resolution query and so I’ve protected the name on PyPI without even conflicting with the entry in my own index.
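A small sketch of why a yanked-only project would contribute nothing to resolution, assuming the client drops files whose yanked key is truthy (in the JSON API the value is either a boolean or a reason string). The helper name, the stub page, and the packages.example.com URL are made up for illustration. Note that PEP 592 still lets an installer use a yanked file when the user pins that exact version, so this is not a complete block.

```python
def installable_files(page):
    """Return the files a resolver would normally consider: those not yanked."""
    return [file for file in page.get("files", []) if not file.get("yanked")]


# Hypothetical stub project on PyPI whose only release has been yanked.
stub = {
    "meta": {
        "api-version": "1.2",
        "tracks": ["https://packages.example.com/simple/mypkg/"],
    },
    "name": "mypkg",
    "versions": ["0.0.1"],
    "files": [
        {
            "filename": "mypkg-0.0.1.tar.gz",
            "size": 1024,
            "yanked": "Placeholder only; install from the private index",
        }
    ],
}

print(installable_files(stub))  # []
```

The project page and its tracks metadata still exist, so the name stays claimed while no candidate file competes with the private index.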

(Deleting is no good, that would free up the entire project for someone else to claim again.)

hclark · October 23, 2024, 2:35pm

Basically, you have hit the core of the matter. Most of the discussion has skirted round this point, suggesting various crazy workarounds, or suggesting that wanting to have a private repo with private packages is a dirty corporate thing to do.

I think the reluctance is based on the fact that pip was built on the basic assumption that all repos are equivalent, and their package lists can, and should, be pooled, with the newest one taking precedence. This works fine when the aim is to provide redundancy and resilience for the Python package delivery system formed from an arbitrary list of mirroring peers. The requirement to make some repo sources not be peers is “very hard” and so no-one is interested in solving it. And so, we are left with this security hole.
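The difference between pooling and prioritising can be shown with a toy model. This is only an illustrative sketch: the index contents, the packages.example.com URL, the 99.0.0 upload, and the function names are all invented, and versions are plain tuples rather than real version objects.

```python
# Indexes in the order the user listed them; versions are (major, minor, patch).
indexes = [
    ("https://packages.example.com/simple/", {"mypkg": [(1, 2, 0)]}),
    # Hypothetical attacker upload of the same name with a huge version number.
    ("https://pypi.org/simple/", {"mypkg": [(99, 0, 0)], "requests": [(2, 31, 0)]}),
]


def pick_pooled(name):
    """Pool candidates from every index and take the newest version."""
    candidates = [
        (version, url)
        for url, projects in indexes
        for version in projects.get(name, [])
    ]
    return max(candidates, default=None)


def pick_by_priority(name):
    """Use only the first index that has the project at all."""
    for url, projects in indexes:
        versions = projects.get(name)
        if versions:
            return max(versions), url
    return None


print(pick_pooled("mypkg"))          # ((99, 0, 0), 'https://pypi.org/simple/')
print(pick_by_priority("mypkg"))     # ((1, 2, 0), 'https://packages.example.com/simple/')
print(pick_by_priority("requests"))  # ((2, 31, 0), 'https://pypi.org/simple/')
```

With pooling, whoever publishes the highest version wins regardless of which index it came from. With priority, the later index is only consulted for names the earlier one does not serve.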

notatallshaw · January 17, 2025, 4:33pm

An update: the author of the open PR to implement PEP 708 in pip has withdrawn from working on it: https://github.com/pypa/pip/pull/12813. So pip needs someone to work on an implementation.

I’m still very unclear on the specifics of the UX flow of this PEP, so I can’t do a full PR review, but I am willing to help with any common friction of submitting a PR to pip.