PEP 766: handling multiple indexes (Index Priority)

UPDATE: This is PEP 766 – Explicit Priority Choices Among Multiple Indexes | peps.python.org

Greetings Pythonistas. In discussions about how to extend the metadata that we can support, the topic of index priority came up. @pf_moore recommended that index priority be submitted as a standalone topic that metadata might possibly build on. My NVIDIA coworkers and I humbly submit an attempt at codifying index priority behavior, such that tools can share a common vocabulary and common behavior.

Index priority was already rejected in PEP 708 as a way of ameliorating the dependency confusion attack problem. To seed this discussion, we would like to respond to the reasons PEP 708 gave, and explain why index priority may still be helpful in other ways even though it was rejected as a fix for dependency confusion attacks. This text is not in the PEP, but I’d be happy to incorporate it if you think it fits.

Thank you for your time. I’m looking forward to a vigorous discussion.

Reconsidering PEP 708 rejection of index priority

PEP 708 rejected index priority for several reasons. Each bulleted paragraph below is copied from PEP 708’s reasons for rejecting index priority as a solution to the dependency confusion attack problem, and my response follows each one.

  • We’ve spent 15+ years educating users that the ordering of repositories being specified is not meaningful, and they effectively have an undefined order. It would be difficult to backpedal on that and start saying that now order matters.

The time spent educating users that ordering is not meaningful was not wasted. The lack of ordering is an essential part of pip’s behavior. That behavior is useful in many use cases, but some use cases need the ability to order indexes. There is a tradeoff that users must make when they use multiple indexes. It’s not that order of indexes never matters, nor that it should always matter. Users need to be able to choose when it matters, and they need to know when they are making that choice.

  • Users can easily rearrange the order that they specify their repositories in within a single location, but when loading repositories from multiple locations (env var, conf file, requirements file, cli arguments) the order is hard coded into pip. While it would be a deterministic and documented order, there’s no reason to assume it’s the order that the user wants their repositories to be defined in, forcing them to contort how they configure pip so that the implicit ordering ends up being the correct one.

Configuration hell is nothing new. Because this PEP makes it more viable to keep extra indexes permanently listed in configuration files instead of passing them as one-off command line arguments, it dramatically improves the user experience and simplifies docs. It seems likely that the time spent figuring out the right configuration pattern would be outweighed by the time saved in not having to debug what strange packages were brought in from an extra index URL, or which desirable packages from an extra index were replaced when someone ran a pip command that didn’t include the extra index URL.

  • The above can be mitigated by providing a way to explicitly declare the order rather than by implicitly using the order they were defined in; however, that then means that the protections are not provided unless the user does some explicit configuration.

This is more PEP 708 territory, and I think PEP 708 does this in a better way. PEP 708 handles this on a global repo level, but is harder to configure for individual users. Index priority is for the sake of improving the predictability of where a package will come from, and it absolutely implies some necessary configuration. Then again, to use a non-default repo or multiple repos already implies some explicit configuration.

  • Ordering assumes that one repository is always preferred over another repository without any way to decide on a project by project basis.

Right now the repo that a package comes from is nominally undefined, but it is predictable as an implementation detail. Changing this so that the predictability is user-configurable is an improvement. If a person needs to decide on a per-package basis, that’s an argument for allowing the repo as part of a spec. This is probably roughly equivalent to the namespaces idea. It was mentioned as the optimal solution in the PEP 708 discussion, but seems onerous to express with single-package granularity.

  • Relying on ordering is subtle; if I look at an ordering of repositories, I have no way of knowing or ensuring in advance what names are going to come from what repositories. I can only know in that moment what names are provided by which repositories.

If you need specific things from specific repos, then you need a way to specify that. Index priority is not that granular. That doesn’t mean that index priority doesn’t improve the overall situation.

What you do get from index priority is confidence that whatever set of packages you get is going to be a self-consistent set, to the limit that any given index is complete. By saying that there can’t be any order among indexes, it is ambiguous what mixture of packages from any given index you get, and it varies by published package versions. Published package versions are rarely directly under the user’s control when using PyPI, but they are often under control with custom indexes. This is the same idea as projects like devpi, artifactory, or simpleindex, except that index priority uniquely makes this configurable as part of the client, not as a service that must be run and configured separately. This facilitates greater flexibility with per-environment configuration.

  • Relying on ordering is fragile. There’s no reason to assume that two disparate repositories are not going to have random naming collisions—what happens if I’m using a library from a lower priority repository and then a higher priority repository happens to start having a colliding name?

What happens today without priority? Which index wins in this situation? The version numbers are meaningless, because they’re for separate projects. However, the version will probably be the decider here, and moreover there’s no way to pick one repo over the other, aside from a version constraint that is really conflating version with package identity. Predictability is an improvement, not a weakness. Configurability is an improvement.
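To make the “version will probably be the decider” point concrete, here is a minimal Python sketch contrasting the two behaviours. It is not pip’s resolver: the `listing` snapshot, the index names `internal` and `pypi`, and the project `mylib` are hypothetical, and versions are plain integer tuples rather than full PEP 440 versions. `pick_merged` pools every index and takes the highest version; `pick_by_priority` takes the first index, in the user’s order, that has the project at all. Both return the index the choice came from, which is the provenance information an installer could report to the user.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
# Hypothetical snapshot of what each index serves: index -> project -> versions.
Listing = Dict[str, Dict[str, List[Version]]]


def pick_merged(order: List[str], listing: Listing, project: str) -> Optional[Tuple[Version, str]]:
    """No priority: pool all indexes; the highest version wins wherever it lives.

    Equal versions fall back to comparing index names, which is arbitrary.
    """
    pool = [
        (version, index)
        for index in order
        for version in listing.get(index, {}).get(project, [])
    ]
    return max(pool) if pool else None


def pick_by_priority(order: List[str], listing: Listing, project: str) -> Optional[Tuple[Version, str]]:
    """Index priority: the first index in `order` that has the project wins."""
    for index in order:
        versions = listing.get(index, {}).get(project, [])
        if versions:
            return max(versions), index
    return None


# "mylib" on PyPI is an unrelated project that happens to share the name.
listing: Listing = {
    "internal": {"mylib": [(1, 0)]},
    "pypi": {"mylib": [(3, 1)], "six": [(1, 16, 0)]},
}
order = ["internal", "pypi"]

print(pick_merged(order, listing, "mylib"))       # ((3, 1), 'pypi')
print(pick_by_priority(order, listing, "mylib"))  # ((1, 0), 'internal')
print(pick_by_priority(order, listing, "six"))    # ((1, 16, 0), 'pypi')
```

Without priority, an unrelated `mylib` that happens to carry a higher version number wins; with priority, the collision is resolved by the user’s ordering, and the outcome no longer depends on what gets published elsewhere.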

  • In cases where ordering does the wrong thing, it does so silently, with no feedback given to the user. This is by design because it doesn’t actually know what the wrong or right thing is, it’s just hoping that order will give the right thing, and if it does then users are protected without any breakage. However, when it does the wrong thing, users are left with a very confusing behavior coming from pip, where it’s just silently installing the wrong thing.

Why is it silent? Is it not showing which index it is installing things from? We currently don’t track or show which index a package came from, but this proposal notes that we should.

CC @charliermarsh


I think a PEP establishing a suggested way index priority should be handled by installers is a great idea.

Quickly reading over the PEP I have some initial feedback:

  1. Too many pip-specific details

It specifies a lot of pip-specific API (find-links, extra-index-url, requirements.txt, pip.conf, pip environment variables, etc.), but none of these are standards and there’s no reason to think they should be shared amongst installers. What if my installer configuration looks like this:

[tool.myinstaller]
extra-index = ["foo", "bar"]
find-links = ["baz"]

I don’t think the PEP is compatible with my installer configuration, and I don’t think the PEP should be telling installers how to lay out their configuration, preventing them from innovating.

I think the PEP should try and avoid installer specific APIs and recommend to use an implicit order from the configuration if possible (and maybe give examples) and optionally allow for the user to provide an explicit order. Which brings me on to:

  2. This PEP only allows for an implicit order

This PEP only suggests that the order is built implicitly from the CLI, environment variables, and config. But installers should be allowed to offer configuration that sets it explicitly, e.g. in their config:

[tool.myinstaller]
index-order = ["baz", "bar", "foo"]

extra-index = ["foo", "bar"]
find-links = ["baz"]

  3. No information on how installers should handle errors

One thing that’s not well defined right now is what HTTP responses should produce what action. Should it be any 4xx that forces the installer to look at the next index or should it be wider or more narrow than that? What about a lower level network error? I’m sure Charlie will have stuff to say about that.

This to me seems like it would be the most helpful part to standardize.
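As a strawman for that discussion, one possible policy is sketched below in Python. Nothing in the thread settles these rules, and this is not a description of how pip or any other installer behaves: the names `classify` and `first_serving_index` and the choice of which statuses fall through are assumptions. Only “this index has no such project” (404 or 410) moves on to the next index; authentication failures, server errors, and network errors stop the search.

```python
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    FOUND = "found"        # use this index's project page
    TRY_NEXT = "try_next"  # project is absent here; consult the next index
    ABORT = "abort"        # index could not be consulted reliably; stop


def classify(status: Optional[int]) -> Outcome:
    """One possible policy. `status` is None for a network-level failure."""
    if status is None:
        return Outcome.ABORT  # DNS, TLS or connection error: don't silently skip
    if 200 <= status < 300:
        return Outcome.FOUND
    if status in (404, 410):
        return Outcome.TRY_NEXT  # only "no such project" falls through
    return Outcome.ABORT  # 401/403, 5xx, etc.: surface the error to the user


def first_serving_index(order: List[str], fetch: Callable[[str], Optional[int]]) -> Optional[str]:
    """Walk indexes in priority order; `fetch` returns a status code or None."""
    for index in order:
        outcome = classify(fetch(index))
        if outcome is Outcome.FOUND:
            return index
        if outcome is Outcome.ABORT:
            raise RuntimeError(f"index {index!r} could not be consulted")
    return None


statuses = {"internal": 404, "pypi": 200}
print(first_serving_index(["internal", "pypi"], statuses.get))  # pypi
```

The reasoning behind treating 401/403 and outages as fatal is that falling through on them would quietly hand the project to a lower-priority index, which is exactly the silent wrong-thing behaviour PEP 708 warned about.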

  4. No mention of the current use case of mirrors

At the moment a user can use extra index URLs to specify mirrors of their main index. In fact, it’s the only use case that’s actually safe in pip.

How will this affect that use case?

  5. Wheels vs. sdists mentions

I’m not sure whether this section is meant to apply to all resolutions, or whether it’s saying that even if a preference like "--prefer-binary" is enabled, the installer should still prefer an sdist from a higher-priority index?

  6. Not clear enough that it is orthogonal to PEP 708

While it mentions this is not solving the same problems as PEP 708, it should be clear that this should happily co-exist with PEP 708. And maybe give an example?

I have lots more thoughts, but I will wait for others to give feedback also rather than writing an overly long response.


Here is the preview:


Good point. I should be clearer that all of the pip stuff is an example, not a standard, and that other installers are free to have different configuration names, concepts, et al. What they should provide is a way that users can control the order in which channel/repository/index/whatever are considered.

As much as pip should not be the limit of what other tools can do, I think pip should be the lowest common denominator of support. Other tools may go beyond this PEP, but this PEP should not recommend behaviors that pip can’t reasonably implement.

My reservation on this is that you seem to be defining the available indexes twice. I would be in favor of a design that allows association of names to indexes, like Poetry does, and then using those names in an explicit order configuration.

The reservations that I have are:

  • What happens if something specified in index order does not exist elsewhere? Is its behavior undefined? Do you assume its behavior by trying different possibilities?
  • What happens if something is only specified in extra-index/find-links? Do you fall back to the implicit order definition?

I think of extra-index and find-links as parameters that define the type of a source, or the mode of interaction with that source. I like your idea of explicit ordering, because it separates the type of a source from explicit ordering.

A design that allows URLs in both places seems prone to confusion about where to set a source identifier and why.
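One way to avoid defining URLs in two places is to name each source once and let the explicit order refer only to names. The Python sketch below picks one answer to each of the two questions above: a name in the order that matches no source is a configuration error, and sources left out of the explicit order follow in their definition order. The `sources` mapping, the `index-order` setting, and the example URLs are hypothetical, not an existing tool’s configuration.

```python
from typing import Dict, List


def resolve_order(sources: Dict[str, str], index_order: List[str]) -> List[str]:
    """Return source URLs in priority order.

    sources: hypothetical name -> URL mapping, kept in definition order.
    index_order: hypothetical explicit priority list, by name.
    """
    unknown = [name for name in index_order if name not in sources]
    if unknown:
        # Undefined name in the explicit order: treat it as a configuration error.
        raise ValueError(f"index-order names undefined sources: {unknown}")
    if len(set(index_order)) != len(index_order):
        raise ValueError("index-order lists a source more than once")
    # Sources not mentioned in the explicit order follow, in definition order.
    rest = [name for name in sources if name not in index_order]
    return [sources[name] for name in index_order + rest]


sources = {
    "foo": "https://foo.example/simple",
    "bar": "https://bar.example/simple",
    "baz": "https://baz.example/simple",
}
print(resolve_order(sources, ["baz", "bar"]))
# ['https://baz.example/simple', 'https://bar.example/simple', 'https://foo.example/simple']
```

Rejecting unlisted sources outright would be an equally defensible choice; either way the installer ends up with one unique, ordered list.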

There is some discussion of that here: PEP 766: Define terms for priority strategies among multiple indexes (index priority) by msarahan · Pull Request #4123 · python/peps · GitHub. Unfortunately, the design choices that make sense for pip are the opposite of the design choice I’d recommend for index priority.

I’m not sure what the most intuitive choice is here. Do you choose the trust of an index with the inconvenience and risk of arbitrary code execution with the sdist over the convenience of a binary from a less-trusted index? I err on the side of index trust over anything else, but I can definitely see use cases where that would not be intuitive.
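The two readings of the wheels vs. sdists question can be stated as two sort keys. In the minimal Python sketch below, which assumes every candidate file is for the same project version and uses a hypothetical `Dist` record and `choose` function, `prefer_binary=False` ranks index trust above file format (the option favoured above), while `prefer_binary=True` ranks format above index order.

```python
from typing import List, NamedTuple, Optional


class Dist(NamedTuple):
    index: str  # which index serves this file
    kind: str   # "wheel" or "sdist"


def choose(dists: List[Dist], order: List[str], prefer_binary: bool) -> Optional[Dist]:
    """Pick one file for a single project version offered by several indexes."""
    rank = {name: position for position, name in enumerate(order)}
    usable = [d for d in dists if d.index in rank]
    if not usable:
        return None
    if prefer_binary:
        # Format first: any wheel beats any sdist; index order breaks ties.
        return min(usable, key=lambda d: (d.kind != "wheel", rank[d.index]))
    # Index trust first: the highest-priority index wins; within it, prefer a wheel.
    return min(usable, key=lambda d: (rank[d.index], d.kind != "wheel"))


dists = [Dist("internal", "sdist"), Dist("pypi", "wheel")]
order = ["internal", "pypi"]
print(choose(dists, order, prefer_binary=False))  # Dist(index='internal', kind='sdist')
print(choose(dists, order, prefer_binary=True))   # Dist(index='pypi', kind='wheel')
```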

When trying to reason through the ordering of these things, I definitely had the feeling that finer granularity of these choices would be nice. It may be worth trying to come up with a general scheme that is totally configurable, but I fear it would be too deep of a rabbit hole.

Good point, I will work on developing this content.


My initial thought is that this PEP doesn’t actually specify any behaviour. It defines a number of terms, and gives a broad description of how each of them works, but doesn’t actually make any requirements on tools. So maybe it would be better as an informational PEP rather than a standards-track one?

Like @notatallshaw I’m not particularly comfortable about the level of pip specific details in here. The package selection rules need to be defined in implementation-neutral terms. For example, the whole idea of find-links is not standardised, and shouldn’t be assumed to work the same across tools. I wouldn’t even say that pip’s implementation is necessarily the right definition - it’s grown over time and probably has a bunch of historical quirks that make no sense for a standard.

The whole business about merging information from CLI options, environment and config is far too implementation dependent. The principle that standards shouldn’t dictate UI applies here - a PEP has no business saying how a tool collects index search configuration, it should simply say what information needs to be available.

I disagree. This PEP can, and should, specify behaviours that are useful[1]. It should do that without reference to any tool. Specifically, unless the PEP mandates that all tools must implement certain behaviours, you shouldn’t assume that pip (or uv, or PDM) will change in response to the PEP at all.

That’s entirely pip-specific. For a standard, you need to start from standardised concepts. For that, what you have is that installers will have the following available to them:

  1. One or more indexes supporting the index protocol.
  2. A bunch of distribution files made available by unspecified means with no way of imposing any sort of organisation on them beyond “they can be used if you want to”.
  3. One or more explicitly specified distribution files or source trees that the user has specifically stated are to be installed.
  4. A list of requirements (in the sense of dependency specifiers) that the user has requested be installed.

That’s all you can assume in advance. Everything else needs to be specified in the PEP.

This is why the PEP shouldn’t be based on what pip does, but on what you believe to be the right behaviour (based on the use cases and justifications you include in the PEP).

Oh, boy is it too deep of a rabbit hole. This is very much why we’ve been discussing these questions for literally years on the pip tracker. There’s no magic answer here, and writing a PEP doesn’t make the problem any easier - it just makes the stakes higher if you make the wrong decision :slightly_frowning_face:


  1. One of the other issues with the PEP is that currently there’s not a lot of justification for any of the behaviours, and basically no use cases to motivate anything.


My reservation on this is that you seem to be defining the available indexes twice. I would be in favor of a design that allows association of names to indexes, like Poetry does, and then using those names in an explicit order configuration.

My example was purposefully ambiguous as to whether it was names or URLs.

I don’t think the PEP should limit installers on how they let the users choose the order, only that the installer resolves from the user some unique ordered list.

IMO the questions of how the installer allows and validates the configuration should be left to the installer.

There is some discussion of that here: PEP 766: Define terms for priority strategies among multiple indexes (index priority) by msarahan · Pull Request #4123 · python/peps · GitHub. Unfortunately, the design choices that make sense for pip are the opposite of the design choice I’d recommend for index priority.

I think the difference in design choices stems from whether you are using the extra index as a mirror or as a location for alternative packages.

As some users use these features for mirror behavior right now I think that use case must be addressed by this PEP. Will mirror like functionality stop working? Should installers provide a separate option for mirrors? Etc.
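One way an installer could keep mirror behaviour working under index priority is to let the user group indexes into tiers, where the indexes inside a tier are treated as interchangeable mirrors and the tiers themselves are searched in priority order. The Python sketch below assumes that model; `tiers`, `offers`, `serving_tier`, and the index names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def serving_tier(tiers: List[List[str]], offers: Dict[str, Set[str]], project: str) -> Optional[List[str]]:
    """Return the indexes in the first tier that serve `project`.

    tiers: hypothetical priority tiers; indexes in one tier are treated as
    mirrors of each other, so any of them may be used for the project.
    offers: index -> names of the projects it serves.
    """
    for tier in tiers:
        serving = [index for index in tier if project in offers.get(index, set())]
        if serving:
            return serving
    return None


tiers = [["internal"], ["pypi", "pypi-mirror"]]
offers = {
    "internal": {"mylib"},
    "pypi": {"six", "mylib"},
    "pypi-mirror": {"six", "mylib"},
}
print(serving_tier(tiers, offers, "mylib"))  # ['internal']
print(serving_tier(tiers, offers, "six"))    # ['pypi', 'pypi-mirror']
```

Within a tier the installer is free to merge or fail over between mirrors as it does today; priority only applies between tiers.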

I think this question becomes a lot more relevant with PEP 708 (the one that makes installers fail when multiple indexes offer the same package name), since I expect there’s almost no chance that all indexes will have consistent metadata to make it succeed (all it takes is one index not specifying that their six is the same as PyPI’s six to break an install).

In this context, I think it does make sense to have at least two tiers of priority, such that an installer can decide either “if multiple feeds have package spam, always take indexA if it’s one of them” or “if multiple feeds have package spam, ignore PyPI and see if it’s only one feed now”.

In other words, a top priority and/or a fallback index, and only for excluding candidates when there are naming conflicts. This approach isn’t specified in 708, but it didn’t need to be as it falls under “installer UX affordances”[1]. And as both options could take multiple values (at risk of errors due to conflicts), configuration merging ought to not be any different from handling the current list of indexes.
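The two tiers described above act only as tie-breakers when a name is offered by more than one index. Below is a minimal Python sketch under that reading; the `top` and `fallback` parameters and the `narrow`/`pick_index` helpers are hypothetical names, and none of this is specified by PEP 708. Anything still ambiguous after narrowing is an error rather than a guess.

```python
from typing import Iterable, Set


def narrow(offering: Set[str], top: Iterable[str], fallback: Iterable[str]) -> Set[str]:
    """Drop candidate indexes only when more than one index offers the project."""
    if len(offering) <= 1:
        return set(offering)  # no naming conflict, nothing to decide
    preferred = offering & set(top)
    if preferred:
        return preferred  # "always take indexA if it's one of them"
    remaining = offering - set(fallback)
    return remaining or set(offering)  # "ignore PyPI and see if it's only one feed now"


def pick_index(offering: Set[str], top: Iterable[str] = (), fallback: Iterable[str] = ()) -> str:
    eligible = narrow(offering, top, fallback)
    if not eligible:
        raise LookupError("project not found on any index")
    if len(eligible) > 1:
        # Still ambiguous: refuse to guess, in the spirit of PEP 708.
        raise LookupError(f"project offered by {sorted(eligible)}")
    return next(iter(eligible))


print(pick_index({"indexA", "pypi"}, top=["indexA"]))  # indexA
print(pick_index({"corp", "pypi"}, fallback=["pypi"]))  # corp
```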

But my main point is that PEP 708 breaks a lot of the risks around this proposal, and that any new proposal probably ought to be written as a design document for a specific installer (rather than a PEP) in terms of PEP 708.


  1. Or whatever term we used for this.


Thank you all for the helpful feedback. I have revised the draft. Specifically:

  • Removed most mentions of pip implementation
  • Added mention of how mirroring might work for index priority
  • Changed to Informational PEP type

I tried to clarify some PEP 708 stuff, but I think I still need to spend more time describing how this PEP would interact with PEP 708.

@steve.dower I’m afraid I don’t understand what you mean by “PEP 708 breaks a lot of the risks around this proposal”. PEP 708 is more server-side than this. It is limited to erroring out when it sees possible confusion. This is much more about giving end-users a tool that they can use to express preference, especially in the presence of confusion.


If there’s no confusion, there’s no need for a preference :slight_smile:

PEP 708 is primarily a client-side change that requires either client-side configuration or server-side configuration to resolve. The PEP focuses mostly on the server-side configuration, because the change itself is very simple (refuse to choose between the same package name on multiple indexes without additional info) and the client-side design is up to individual tools.

So don’t be fooled by the text’s focus on server-side details. It’s only because those are the things that need specification that they take up so much of the text.
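A heavily simplified Python sketch of that client-side rule follows: with one index offering a name there is nothing to decide, and with several the installer refuses unless it has extra information. It ignores the server-side repository metadata that PEP 708 defines for this purpose (tracks and alternate locations) and instead assumes a hypothetical client-side `pins` mapping from project name to index as the “additional info”; `select_index` is likewise a hypothetical name.

```python
from typing import Dict, Set


def select_index(project: str, offering: Set[str], pins: Dict[str, str]) -> str:
    """Refuse to choose between indexes unless the user supplied extra information.

    offering: indexes whose project page lists `project`.
    pins: hypothetical client-side config mapping a project to the index to use.
    """
    if not offering:
        raise LookupError(f"{project}: not found on any index")
    if len(offering) == 1:
        return next(iter(offering))  # no conflict
    pinned = pins.get(project)
    if pinned in offering:
        return pinned  # the user resolved the conflict explicitly
    raise LookupError(
        f"{project}: offered by {sorted(offering)}; refusing to choose without more information"
    )


print(select_index("six", {"pypi"}, {}))                           # pypi
print(select_index("mylib", {"corp", "pypi"}, {"mylib": "corp"}))  # corp
```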