PEP 766: handling multiple indexes (Index Priority)

UPDATE: This is PEP 766 – Explicit Priority Choices Among Multiple Indexes | peps.python.org

Greetings Pythonistas. In discussions about how to extend the metadata that we can support, the topic of index priority came up. @pf_moore recommended that index priority be submitted as a standalone topic that metadata might possibly build on. My NVIDIA coworkers and I humbly submit an attempt at codifying index priority behavior, such that tools can share a common vocabulary and common behavior.

Index priority has already been rejected as part of PEP 708 as a way of ameliorating the dependency confusion attack problem. To seed this discussion, we would like to answer the reasons why Index Priority may still be helpful in other ways, even though it was rejected as a fix for dependency confusion attacks. This text is not in the PEP, but I’d be happy to incorporate it if you think it fits.

Thank you for your time. I’m looking forward to a vigorous discussion.

Reconsidering PEP 708 rejection of index priority

PEP 708 rejected index priority for several reasons. The bulleted items below are copied from the reasons that PEP 708 gave for rejecting index priority as a solution for the dependency confusion attack problem; each is followed by my response.

  • We’ve spent 15+ years educating users that the ordering of repositories being specified is not meaningful, and they effectively have an undefined order. It would be difficult to backpedal on that and start saying that now order matters.

The time spent educating users that ordering is not meaningful was not wasted. The lack of ordering is an essential part of pip’s behavior. That behavior is useful in many use cases, but some use cases need the ability to order indexes. There is a tradeoff that users must make when they use multiple indexes. It’s not that order of indexes never matters, nor that it should always matter. Users need to be able to choose when it matters, and they need to know when they are making that choice.

  • Users can easily rearrange the order that they specify their repositories in within a single location, but when loading repositories from multiple locations (env var, conf file, requirements file, cli arguments) the order is hard coded into pip. While it would be a deterministic and documented order, there’s no reason to assume it’s the order that the user wants their repositories to be defined in, forcing them to contort how they configure pip so that the implicit ordering ends up being the correct one.

Configuration hell is nothing new. Because this PEP makes it more viable to have extra indexes permanently listed in configuration files instead of as one-off command line arguments, it dramatically improves the user experience and simplifies docs. It seems likely that the time spent figuring out the right configuration pattern would be outweighed by the time saved in not having to debug what strange things were brought in from an extra index URL or what desirable extra index URL packages were replaced with a pip command that didn’t include the extra index URL.

  • The above can be mitigated by providing a way to explicitly declare the order rather than by implicitly using the order they were defined in; however, that then means that the protections are not provided unless the user does some explicit configuration.

This is more PEP 708 territory, and I think PEP 708 does this in a better way. PEP 708 handles this on a global repo level, but is harder to configure for individual users. Index priority is for the sake of improving the predictability of where a package will come from, and it absolutely implies some necessary configuration. Then again, to use a non-default repo or multiple repos already implies some explicit configuration.

  • Ordering assumes that one repository is always preferred over another repository without any way to decide on a project by project basis.

Right now the repo that a package comes from is nominally undefined, but it is predictable as an implementation detail. Changing this so that the predictability is user-configurable is an improvement. If a person needs to decide on a per-package basis, that’s an argument for allowing the repo as part of a spec. This is probably roughly equivalent to the namespaces idea. It was mentioned as the optimal solution in the PEP 708 discussion, but seems onerous to express with single-package granularity.

  • Relying on ordering is subtle; if I look at an ordering of repositories, I have no way of knowing or ensuring in advance what names are going to come from what repositories. I can only know in that moment what names are provided by which repositories.

If you need specific things from specific repos, then you need a way to specify that. Index priority is not that granular. That doesn’t mean that index priority doesn’t improve the overall situation.

What you do get from index priority is confidence that whatever set of packages you get is going to be a self-consistent set, to the limit that any given index is complete. By saying that there can’t be any order among indexes, it is ambiguous what mixture of packages from any given index you get, and it varies by published package versions. Published package versions are rarely directly under the user’s control when using PyPI, but they are often under control with custom indexes. This is the same idea as projects like devpi, artifactory, or simpleindex, except that index priority uniquely makes this configurable as part of the client, not as a service that must be run and configured separately. This facilitates greater flexibility with per-environment configuration.

  • Relying on ordering is fragile. There’s no reason to assume that two disparate repositories are not going to have random naming collisions—what happens if I’m using a library from a lower priority repository and then a higher priority repository happens to start having a colliding name?

What happens today without priority? Which index wins in this situation? The version numbers are meaningless, because they’re for separate projects. However, the version will probably be the decider here, and moreover there’s no way to pick one repo over the other, aside from a version constraint that is really conflating version with package identity. Predictability is an improvement, not a weakness. Configurability is an improvement.
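To make the collision scenario concrete, here is a minimal sketch of what "the version will probably be the decider" means in practice. The index names, project name, and versions are made up, and both helper functions are hypothetical illustrations rather than any installer's actual code.

```python
# Hypothetical candidates for one project name served by two unrelated indexes.
# Versions are plain tuples to keep the sketch dependency-free.
candidates = [
    {"index": "internal", "name": "spam", "version": (1, 2, 0)},
    {"index": "public", "name": "spam", "version": (3, 0, 0)},
]


def pick_highest(cands):
    """Treat all indexes as equal: the highest version wins, wherever it lives."""
    return max(cands, key=lambda c: c["version"])


def pick_by_priority(cands, order):
    """Use the first index in `order` that offers the name, then its highest version."""
    for index in order:
        from_index = [c for c in cands if c["index"] == index]
        if from_index:
            return max(from_index, key=lambda c: c["version"])
    return None


print(pick_highest(candidates)["index"])  # public
print(pick_by_priority(candidates, ["internal", "public"])["index"])  # internal
```

Without an order, the unrelated "public" project wins purely because its version number is larger. With an order, the outcome no longer depends on what either index happens to publish.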

  • In cases where ordering does the wrong thing, it does so silently, with no feedback given to the user. This is by design because it doesn’t actually know what the wrong or right thing is, it’s just hoping that order will give the right thing, and if it does then users are protected without any breakage. However, when it does the wrong thing, users are left with a very confusing behavior coming from pip, where it’s just silently installing the wrong thing.

Why is it silent? Is it not showing which index it is installing things from? We are currently not keeping track/showing which index a package came from, but this proposal notes that we should be doing that.

CC @charliermarsh

3 Likes

I think a PEP establishing a suggested way index priority should be handled by installers is a great idea.

Quickly reading over the PEP I have some initial feedback:

  1. Too many pip-specific details

It specifies a lot of the pip specific API (find-links, extra-index-url, requirements.txt, pip.conf, pip environmental variables etc.), but none of these are standards and there’s no reason to think they should be shared amongst installers. What if my installer configuration looks like this:

[tool.myinstaller]
extra-index = ["foo", "bar"]
find-links = ["baz"]

I don’t think the PEP is compatible with my installer configuration and I don’t think the PEP should be telling installers how they should lay out their configuration, preventing them from innovating.

I think the PEP should try and avoid installer specific APIs and recommend to use an implicit order from the configuration if possible (and maybe give examples) and optionally allow for the user to provide an explicit order. Which brings me on to:

  2. This PEP only allows for an implicit order

This PEP only suggests order is built implicitly from the CLI, environmental variables, and config. But installers should be allowed to offer configuration that sets it explicitly, e.g. in their config:

[tool.myinstaller]
index-order = ["baz", "bar", "foo"]

extra-index = ["foo", "bar"]
find-links = ["baz"]

  3. No information on how installers should handle errors

One thing that’s not well defined right now is what HTTP responses should produce what action. Should it be any 4xx that forces the installer to look at the next index or should it be wider or more narrow than that? What about a lower level network error? I’m sure Charlie will have stuff to say about that.

This to me seems like it would be the most helpful part to standardize.
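As a way to frame that question, here is a sketch of one possible policy written as a single function. Nothing here is specified by the PEP or taken from any installer: the `Action` names, the `action_for` helper, and above all the choice that only a 404 falls through are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    USE_RESPONSE = "use response"
    TRY_NEXT_INDEX = "try next index"
    ABORT = "abort"


def action_for(status=None, network_error=False):
    """Map the outcome of one index request to an installer action (assumed policy).

    `status` is the final HTTP status after any redirects have been followed.
    """
    if network_error:
        # Assumption: an unreachable index is not evidence the project is absent.
        return Action.ABORT
    if 200 <= status < 300:
        return Action.USE_RESPONSE
    if status == 404:
        # Assumption: only "not found" means "this index does not have the project".
        return Action.TRY_NEXT_INDEX
    # Assumption: everything else (401, 403, 5xx, ...) stops the install rather
    # than silently falling through to a lower-priority index.
    return Action.ABORT


print(action_for(404))
print(action_for(403))
```

A wider policy (any 4xx falls through) or a narrower one would only change the branches, which is why agreeing on this table seems like the part most worth writing down.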

  4. No mention of the current use case of mirrors

At the moment a user can use extra-index urls to specify mirrors of their main index. In fact it’s the only use case that’s actually safe in pip.

How will this affect that use case?

  5. Wheels vs. Sdists mentions

I’m not sure whether this section is meant to apply to all resolutions, or whether it’s saying that even if a preference like “--prefer-binary” is enabled, the installer should still prefer an sdist from a higher priority index?

  6. Not clear enough that it is orthogonal to PEP 708

While it mentions this is not solving the same problems as PEP 708, it should be clear that this should happily co-exist with PEP 708. And maybe give an example?

I have lots more thoughts, but I will wait for others to give feedback also rather than writing an overly long response.

1 Like

Here is the preview:

1 Like

Good point. I should be clearer that all of the pip stuff is an example, not a standard, and that other installers are free to have different configuration names, concepts, et al. What they should provide is a way that users can control the order in which channel/repository/index/whatever are considered.

As much as pip should not be the limit of what other tools can do, I think pip should be the lowest common denominator of support. Other tools may go beyond this PEP, but this PEP should not recommend behaviors that pip can’t reasonably implement.

My reservation on this is that you seem to be defining the available indexes twice. I would be in favor of a design that allows association of names to indexes, like Poetry does, and then using those names in an explicit order configuration.

The reservations that I have are:

  • What happens if something specified in index order does not exist elsewhere? Is its behavior undefined? Do you assume its behavior by trying different possibilities?
  • What happens if something is only specified in extra-index/find-links? Do you fall back to the implicit order definition?
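To show what answering those two questions might look like, here is a minimal sketch that assumes sources are given names (as suggested above) and that an optional explicit order refers to those names. The `resolve_order` helper, the example URLs, and the two policy choices marked in the comments are assumptions, not anything the PEP specifies.

```python
def resolve_order(sources, index_order=None):
    """Return source names in the order they should be consulted.

    sources: dict mapping a name to a URL, in the order they were configured.
    index_order: optional list of names giving an explicit priority.
    """
    if index_order is None:
        return list(sources)  # implicit order only

    unknown = [name for name in index_order if name not in sources]
    if unknown:
        # One possible answer to the first question: refuse to guess.
        raise ValueError(f"index-order names undefined sources: {unknown}")
    if len(set(index_order)) != len(index_order):
        raise ValueError("index-order lists a source more than once")

    # One possible answer to the second question: unlisted sources keep their
    # configured order and rank below every explicitly ordered one.
    leftover = [name for name in sources if name not in index_order]
    return list(index_order) + leftover


sources = {
    "foo": "https://foo.example/simple",
    "bar": "https://bar.example/simple",
    "baz": "https://baz.example/wheels",
}
print(resolve_order(sources, ["baz", "bar"]))  # ['baz', 'bar', 'foo']
```

Because the order refers to names rather than URLs, each source is defined exactly once and the ordering stays a separate, optional piece of configuration.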

I think of extra-index and find-links as parameters that define the type of a source, or the mode of interaction with that source. I like your idea of explicit ordering, because it separates the type of a source from explicit ordering.

A design that allows URLs in both places seems prone to confusion about where to set a source identifier and why.

There is some discussion of that here: PEP 766: Define terms for priority strategies among multiple indexes (index priority) by msarahan · Pull Request #4123 · python/peps · GitHub. Unfortunately, the design choices that make sense for pip are the opposite of the design choice I’d recommend for index priority.

I’m not sure what the most intuitive choice is here. Do you choose the more trusted index, accepting the inconvenience and arbitrary code execution risk of building its sdist, over the convenience of a binary from a less-trusted index? I err on the side of index trust over anything else, but I can definitely see use cases where that would not be intuitive.

When trying to reason through the ordering of these things, I definitely had the feeling that finer granularity of these choices would be nice. It may be worth trying to come up with a general scheme that is totally configurable, but I fear it would be too deep of a rabbit hole.

Good point, I will work on developing this content.

1 Like

My initial thought is that this PEP doesn’t actually specify any behaviour. It defines a number of terms, and gives a broad description of how each of them works, but doesn’t actually make any requirements on tools. So maybe it would be better as an informational PEP rather than a standards-track one?

Like @notatallshaw I’m not particularly comfortable about the level of pip specific details in here. The package selection rules need to be defined in implementation-neutral terms. For example, the whole idea of find-links is not standardised, and shouldn’t be assumed to work the same across tools. I wouldn’t even say that pip’s implementation is necessarily the right definition - it’s grown over time and probably has a bunch of historical quirks that make no sense for a standard.

The whole business about merging information from CLI options, environment and config is far too implementation dependent. The principle that standards shouldn’t dictate UI applies here - a PEP has no business saying how a tool collects index search configuration, it should simply say what information needs to be available.

I disagree. This PEP can, and should, specify behaviours that are useful[1]. It should do that without reference to any tool. Specifically, unless the PEP mandates that all tools must implement certain behaviours, you shouldn’t assume that pip (or uv, or PDM) will change in response to the PEP at all.

That’s entirely pip-specific. For a standard, you need to start from standardised concepts. For that, what you have is that installers will have the following available to them:

  1. One or more indexes supporting the index protocol.
  2. A bunch of distribution files made available by unspecified means with no way of imposing any sort of organisation on them beyond “they can be used if you want to”.
  3. One or more explicitly specified distribution files or source trees that the user has specifically stated are to be installed.
  4. A list of requirements (in the sense of dependency specifiers) that the user has requested be installed.

That’s all you can assume in advance. Everything else needs to be specified in the PEP.
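For readers who think in data structures, the four inputs above can be written down as a plain record. This is only an illustrative sketch: the class and field names are invented here and do not come from any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class InstallerInputs:
    """The four inputs listed above; all names are illustrative only."""

    # 1. Indexes supporting the index protocol.
    index_urls: list[str] = field(default_factory=list)
    # 2. Distribution files made available by unspecified means, with no organisation.
    loose_files: list[str] = field(default_factory=list)
    # 3. Distribution files or source trees the user explicitly asked to install.
    explicit_targets: list[str] = field(default_factory=list)
    # 4. Dependency specifiers the user has requested be installed.
    requirements: list[str] = field(default_factory=list)


inputs = InstallerInputs(
    index_urls=["https://index.example/simple"],
    requirements=["spam>=1.0"],
)
print(inputs)
```

Note that nothing in this record carries an order or a priority; any such notion would have to be added, and defined, by the PEP itself.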

This is why the PEP shouldn’t be based on what pip does, but on what you believe to be the right behaviour (based on the use cases and justifications you include in the PEP).

Oh, boy is it too deep of a rabbit hole. This is very much why we’ve been discussing these questions for literally years on the pip tracker. There’s no magic answer here, and writing a PEP doesn’t make the problem any easier - it just makes the stakes higher if you make the wrong decision :slightly_frowning_face:


  1. One of the other issues with the PEP is that currently there’s not a lot of justification for any of the behaviours, and basically no use cases to motivate anything

1 Like

My reservation on this is that you seem to be defining the available indexes twice. I would be in favor of a design that allows association of names to indexes, like Poetry does, and then using those names in an explicit order configuration.

My example was purposefully ambiguous as whether it was names or urls.

I don’t think the PEP should limit installers on how they let the users choose the order, only that the installer resolves from the user some unique ordered list.

IMO the questions of how the installer allows and validates the configuration should be left to the installer.

There is some discussion of that here: PEP 766: Define terms for priority strategies among multiple indexes (index priority) by msarahan · Pull Request #4123 · python/peps · GitHub. Unfortunately, the design choices that make sense for pip are the opposite of the design choice I’d recommend for index priority.

I think the reason for the difference in design choices stems from whether you are using the extra index as a mirror or as a location for alternative packages.

As some users use these features for mirror behavior right now I think that use case must be addressed by this PEP. Will mirror-like functionality stop working? Should installers provide a separate option for mirrors? Etc.

I think this question becomes a lot more relevant with PEP 708 (the one that makes installers fail when multiple indexes offer the same package name), since I expect there’s almost no chance that all indexes will have consistent metadata to make it succeed (all it takes is one index not specifying that their six is the same as PyPI’s six to break an install).

In this context, I think it does make sense to have at least two tiers of priority, such that an installer can decide either “if multiple feeds have package spam, always take indexA if it’s one of them” or “if multiple feeds have package spam, ignore PyPI and see if it’s only one feed now”.

In other words, a top priority and/or a fallback index, and only for excluding candidates when there are naming conflicts. This approach isn’t specified in 708, but it didn’t need to be as it falls under “installer UX affordances”[1]. And as both options could take multiple values (at risk of errors due to conflicts), configuration merging ought to not be any different from handling the current list of indexes.
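A rough sketch of that two-tier idea follows, applied only when more than one index offers the same name. The `choose_index` helper and its parameters are hypothetical, and the sketch deliberately ignores the server-side metadata that PEP 708 uses to declare that two indexes serve the same project.

```python
def choose_index(offering, preferred=frozenset(), fallback=frozenset()):
    """Pick the single index to use for a name, or raise if it is still ambiguous.

    offering: set of index names that serve the project.
    preferred: indexes that win a conflict if they are among the offering.
    fallback: indexes that are ignored when any other index serves the name.
    """
    remaining = set(offering)
    if len(remaining) > 1 and remaining & preferred:
        remaining &= preferred  # "always take indexA if it's one of them"
    if len(remaining) > 1 and remaining - fallback:
        remaining -= fallback  # "ignore PyPI and see if it's only one feed now"
    if len(remaining) != 1:
        raise LookupError(f"cannot choose between {sorted(remaining)}")
    return remaining.pop()


print(choose_index({"indexA", "pypi"}, preferred={"indexA"}))  # indexA
print(choose_index({"corp", "pypi"}, fallback={"pypi"}))  # corp
```

If two preferred indexes (or two non-fallback ones) both serve the name, the function still raises, which matches the "at risk of errors due to conflicts" caveat above.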

But my main point is that PEP 708 breaks a lot of the risks around this proposal, and that any new proposal probably ought to be written as a design document for a specific installer (rather than a PEP) in terms of PEP 708.


  1. Or whatever term we used for this.

1 Like

Thank you all for the helpful feedback. I have revised the draft. Specifically:

  • Removed most mentions of pip implementation
  • Added mention of how mirroring might work for index priority
  • Changed to Informational PEP type

I tried to clarify some PEP 708 stuff, but I think I still need to spend more time describing how this PEP would interact with PEP 708.

@steve.dower I’m afraid I don’t understand what you mean by “PEP 708 breaks a lot of the risks around this proposal”. PEP 708 is more server-side than this. It is limited to erroring out when it sees possible confusion. This is much more about giving end-users a tool that they can use to express preference, especially in the presence of confusion.

1 Like

If there’s no confusion, there’s no need for a preference :slight_smile:

PEP 708 is primarily a client-side change that requires either client-side configuration or server-side configuration to resolve. The PEP focuses mostly on the server-side configuration, because the change itself is very simple (refuse to choose between the same package name on multiple indexes without additional info) and the client-side design is up to individual tools.

So don’t be fooled by the text’s focus on server-side details. It’s only because those are the things that need specification that they take up so much of the text.

I’m curious how this proposal would affect your thinking on PEP 766? I feel like the two proposals could be very complementary, but also might drive some changes to how you’d go about implementing certain aspects of 766?

I had an idea for how to implement index priority for pip in a non-disruptive way. PR for a partial implementation is at partial implementation of source groups by msarahan · Pull Request #13210 · pypa/pip · GitHub - comments and criticism are most welcome.

3 Likes

I think standardising the terminology has a lot of value, as the reality is that index priority is the established and only(?) practical way to solve dependency confusion today (PEP 708 being unimplemented other than in PyPI, from what I can see), unless you are very proactive in name squatting (which simply doesn’t scale globally).

Perhaps it is worth noting that the approach taken in Artifactory is also to use index priority (very tersely documented at JFrog Help Center).

Ways that a request falls through to a lower priority index

  • All distributions from higher priority index filtered out due to version specifier

This sounds very dangerous to me. As does:

wheel vs sdist: Should the installer use an sdist from a higher-priority index before trying a wheel from a lower-priority index

more platform-specific wheels before less specific ones

Personally I see no argument for either of these. If you specify a priority, it is because you shouldn’t bleed through the lower priority’s packages. If you want blending, then give the two repos equal priority.

if a reference implementation to a Python-based tool is necessary, we, the authors of this PEP, will provide one

If helpful, there is an implementation of “index priority” (named PrioritySelectedProjectsRepository in the implementation) available at simple-repository (in simple_repository.components.priority_selected) which is fairly self contained (naturally the data model comes from simple-repository). This implementation has been running in production for a number of years, with several hundred users accessing an internal repository backed by this daily. I also note that it is entirely possible to use this exact implementation client side (e.g. in an installer), as we do in simple-repository-server (despite the project being called simple-repository-server, it is actually also a repository client/consumer).

FWIW, my overall take on the PEP is that having something like this standardised is a good thing, even if the PEP itself is more verbose than I would have liked.

2 Likes

Putting effort into implementing PEP 708 in installers is another way. There’s a PR for pip which needs someone to pick it up. I don’t know whether uv has started on this, but I’m sure they would welcome contributions as well.

I’m not sure putting effort into a new PEP on index priority, with all the discussions that would be involved, as well as following that up with implementation work on installers[1], is likely to be any less work than completing the implementation work on PEP 708.

My personal view is that it will be incredibly difficult to come up with a PEP that actually defines something that we can standardise. Tools will quite reasonably push back hard on being told what their UI must look like, and without a UI, it’s hard to see how you’d even set index priorities.

Speaking specifically for pip, the blockers around index priority are all to do with user interface and backward compatibility - a PEP won’t make any difference to those issues, it will simply change the problem from “pip doesn’t implement index priorities” to “pip doesn’t implement PEP 766” until they are resolved. We’re perfectly willing to discuss index priorities as a feature, but we have to acknowledge the current reality that the “all indexes are equal” model is baked into pip at a very fundamental level, and (like basically everything in pip) changing that will break some users[2].

And has anyone even asked uv if they would be willing to change how they implement index priority if a new standard conflicted with their current model? Or is PEP 766 expected to be nothing more than “make uv’s behaviour a standard”? Because if that’s the case, a standard is pointless - much better to just go to pip with a proposal “implement uv’s model for index priority” on the basis that it’s proven useful and popular in practice.

If I’m absolutely honest, I think the motivation behind trying to make index priority into a PEP is mostly to avoid having those conversations with the individual tool maintainers…


  1. and note that we can’t assume that changes wouldn’t be needed to uv, as there’s no guarantee that the final form of any PEP would match uv’s current approach

  2. When we implemented the new resolver, a big part of the work was managing the impact on users who were relying on old behaviour that was flat-out wrong. How much worse will it be with index priority, where the current behaviour is perfectly valid?

As usual Phil, your comments are well-timed and insightful. Coincidentally, there were some in-depth discussions at the “WheelNext” summit, which NVIDIA and Meta organized this past week to coincide with the GTC conference. @charliermarsh and @konstin were in attendance, among many others. I have been working with @atalman on some index priority proofs of concept. @atalman put up a demonstration of index priority in pip that matches uv’s behavior. We both agree that changing pip’s behavior is too disruptive, and not likely to succeed. What I came up with instead is a generalization of Poetry’s primary and supplementary package source “Groups”.

I note that self-hosted indexes that implement virtual indexes made up of other indexes have been proposed often in this context, but usually people aren’t enthusiastic about needing to run their own servers. I know that’s not what you’re saying here, I’m only acknowledging a lot of past discussion on the topic.

I probably didn’t express this well enough. I think that maybe you are thinking of conda’s strict channel priority, where the presence of a given package name on a given channel means that the search will not look on other channels. I can see the use case for that. I also think that there is a use case in-between that and treating indexes equally. I see it as

  • Maximum lock-down - for example, ONLY use IT-supported packages, except where they don’t have a supported package
  • Soft fall-back - for example, prefer IT-supported packages unless they don’t meet your requirements
  • Equal priority - combine all indexes at once, choose “optimal” version, regardless of where it comes from

As a user, I think the middle one is “least surprise.” Under-constrained packages will be safely preferred from the IT-approved pool. More advanced users who specify constraints will have their escape hatch if necessary.
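The difference between the three modes is easiest to see side by side. The sketch below is illustrative only: the `select` function, the strategy names, and the "it"/"pypi" index names are assumptions, and versions are plain tuples rather than real version objects.

```python
def select(candidates, order, satisfies, strategy):
    """Pick one candidate for a single project name.

    candidates: list of (index_name, version_tuple) offered for the name.
    order: index names, highest priority first.
    satisfies: predicate telling whether a version meets the user's constraint.
    strategy: "lock-down", "soft-fallback" or "equal".
    """
    if strategy == "equal":
        ok = [c for c in candidates if satisfies(c[1])]
        return max(ok, key=lambda c: c[1], default=None)

    for index in order:
        from_index = [c for c in candidates if c[0] == index]
        if not from_index:
            continue  # this index does not offer the name at all
        ok = [c for c in from_index if satisfies(c[1])]
        if ok:
            return max(ok, key=lambda c: c[1])
        if strategy == "lock-down":
            return None  # the name exists here, so lower indexes are never consulted
        # "soft-fallback": nothing suitable here, so try the next index
    return None


candidates = [("it", (1, 0)), ("pypi", (1, 0)), ("pypi", (2, 0))]
order = ["it", "pypi"]
at_least_2 = lambda v: v >= (2, 0)

print(select(candidates, order, lambda v: True, "soft-fallback"))  # ('it', (1, 0))
print(select(candidates, order, at_least_2, "lock-down"))  # None
print(select(candidates, order, at_least_2, "soft-fallback"))  # ('pypi', (2, 0))
```

With no constraint, both prioritized modes stay on the IT-approved index; only when the user asks for a version that index lacks does soft fall-back reach further, while lock-down fails instead.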

I wholeheartedly invite you to channel your inner lumberjack/Elon and take a chainsaw to it. Just stay away from government buildings. I will say that I have learned a lot from my efforts to implement “index groups” and I hope to cut down the PEP because the idea of index groups is simpler than trying to switch entire UIs at once.

I agree on this point completely. I don’t think that PEP 708 is sufficiently configurable for users, so while I definitely respect it, I think that PEP 766 is not duplicating that work, nor intending to compete with it, although they share significant motivation. In my mind:

  • PEP 708 allows package maintainers to express their official servers that are safe, and provides a way for installers to handle the case where non-safe servers get utilized.
  • PEP 766 allows users to express trust hierarchy among their sources, but does not delineate safe/not-safe servers

Indeed, each tool has different options, even if the core idea of what an index or find-links location consists of is shared. We’re also finding it quite hard to come up with a sane UI for this naturally hierarchical data in pip’s existing ini format configuration. What we broadly agreed on in “meat space” was that:

  • It is convenient to have one chunk of index priority configuration that users can carry between environments
  • It is useful to have a configuration format that allows expression of multiple “things” and separate expression of the order of those “things” - opentelemetry is one example that comes to mind

I believe that this is so core to pip that any change to this behavior would be non-viable. The index group idea that I have mentioned a couple of times now preserves this behavior, while providing index priority in an orthogonal way.

@charliermarsh asked this same question at the WheelNext summit. My answer was that if we can come up with a common expression of multiple indexes and options, then the ecosystem would benefit greatly from un-splitting some of our mental load. He didn’t outright say “Yes, uv will definitely do this work,” but he did seem agreeable to the idea.

In any case, the idea is NOT to make uv’s behavior a standard, because it is NOT PEP 766’s goal to cause changes in the daily workflow of existing users. It is PEP 766’s goal to provide a new way for users to express trust hierarchies between sources, and to have their installers respect those relationships.

Historically speaking, index priority started out for me as an implementation detail for variant metadata.

1 Like

I would very strongly request that those discussions get written up and published, with notes on who was present, as in the past, face to face discussions like this have been incredibly productive, but also difficult to build on. The biggest problem is that it’s necessary to re-explain and justify all of the insights established in the face to face meeting with the wider community - and the difficulty of doing that is precisely why the face to face meeting was productive in the first place. We’ve seen this in the past with discussions at the packaging summit.

Cool. As long as PEP 766 makes that point, and doesn’t set itself up as being in competition with PEP 708, I don’t see a problem[1]. Ideally, PEP 766 should discuss how it integrates with PEP 708 to provide “defense in depth” for dependency confusion attacks. It will also need to discuss how it impacts the semantics of PEP 708 - for example, the file discovery algorithm in PEP 708 will need modifying to take account of index priorities. If installers other than pip have done any work on implementing PEP 708, they may have insights into what is needed here (and if they haven’t, maybe PEP 766 should be deferred until they have, so we have some real-world experience integrating PEP 708 with index priorities!)

Is this https://github.com/pypa/pip/pull/13210? That’s had no feedback from the pip maintainers, and while I can’t speak for the other maintainers, for me it’s hard to know what to say because there are no examples of what it would be like to use the proposed feature in practice - it’s not like we’re short of people with use cases for index priority, it would be nice to see how index groups would handle those use cases.

Thanks for the link reminding me of that previous discussion. I think I’m still of the same view - if we’re putting constraints on how resolution algorithms must work, in terms of prioritising indexes, that would definitely need a PEP. But I think the problem here is that people assume it’s possible to avoid the need to deal with resolution algorithms by working at the “how do we map project name/version to the file we’ll install?” level. And the problem is that that level is already controlled by tool UI (things like pip’s --prefer-binary and --only-binary/--no-binary flags, as well as flags like --platform and --abi, which influence wheel selection). So we can’t write a standard that says “for foo-1.0, pick a file from index X” without discussing how that interacts with the existing tool UI around what file gets picked. And we can’t do that because there’s no reason to assume tools have implemented the same (or even compatible) UIs. There’s also uv’s options to prefer newer or older versions of a package when resolving - how will that interact with index priority? Do you prefer the index over the age, or vice versa? These sorts of questions can’t be answered in a PEP because they relate to behaviour that’s not the subject of standards.


  1. Clearly it doesn’t achieve that yet, as @pelson got that impression

My personal opinion is that I’m somewhat skeptical that this should be a PEP, and, instead, that this problem should be solved in pip or uv or other tools as they see fit. I don’t know that I see significant value in standardizing the behavior, and that standardization will come at a cost (namely, that it removes room for tools to innovate or explore or impose their own design constraints on the problem).

Reading through the PEP itself (and some of the discussion above), though, there’s kind of a spectrum of goals. If the PEP is mostly about defining terminology, then it seems less objectionable but also less impactful (e.g., the PEP might decide that tools should give users an ability to define index priority; but in that case, if pip implements that behavior, it could look totally different from uv’s implementation or configuration schema, etc.). If the PEP is instead about defining specific behaviors that tools should implement, then it brings me back to my concerns above.

(As an example: in uv, one of the most important features we’ve implemented here is “index pinning”, whereby users can say that a certain package must come from a certain index (and, similarly, that a certain index should only be used if packages are explicitly pinned to it, and never implicitly). So, how does that fit into the goals of the PEP? Does the PEP need to be adjusted to account for it? Are we in violation of the PEP if we keep this behavior?)
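For readers unfamiliar with the pinning idea described above, here is a generic sketch of the concept. It is not uv's implementation or configuration schema: the `indexes_for` helper, the "explicit" flag name, and the example project and index names are assumptions made for illustration.

```python
def indexes_for(project, indexes, pins):
    """Return the index names that may be searched for `project`.

    indexes: list of dicts with "name" and "explicit" keys, in configured order.
    pins: dict mapping a project name to the one index it must come from.
    """
    if project in pins:
        return [pins[project]]
    # Unpinned projects never see indexes marked as explicit-only.
    return [i["name"] for i in indexes if not i["explicit"]]


indexes = [
    {"name": "default", "explicit": False},
    {"name": "gpu-builds", "explicit": True},
]
pins = {"fastmath": "gpu-builds"}

print(indexes_for("fastmath", indexes, pins))  # ['gpu-builds']
print(indexes_for("requests", indexes, pins))  # ['default']
```

Pinning answers "which index for this project?" per package, whereas index priority answers it per index, so a standard would need to say how the two compose.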

(Similarly, at least as of now, I don’t want to add a “groups” concept to uv’s index design – I prefer to solve that problem with index pinning. If a PEP enforces a design on us, what benefits is it providing in return for that cost?)

5 Likes