UPDATE: This is PEP 766 – Explicit Priority Choices Among Multiple Indexes (peps.python.org)
Greetings, Pythonistas. In discussions about how to extend the metadata that we can support, the topic of index priority came up. @pf_moore recommended that index priority be submitted as a standalone topic that metadata might build on. My NVIDIA coworkers and I humbly submit an attempt at codifying index priority behavior, so that tools can share a common vocabulary and common behavior.
Index priority has already been rejected as part of PEP 708 as a way of ameliorating the dependency confusion attack problem. To seed this discussion, we would like to respond to the reasons given there and explain why index priority may still be helpful in other ways, even though it was rejected as a fix for dependency confusion attacks. This text is not in the PEP, but I’d be happy to incorporate it if you think it fits.
Thank you for your time. I’m looking forward to a vigorous discussion.
Reconsidering PEP 708’s rejection of index priority
PEP 708 rejected index priority for several reasons. Each bulleted item below is quoted from PEP 708’s reasons for rejecting index priority as a solution to the dependency confusion attack problem, and is followed by our response.
- We’ve spent 15+ years educating users that the ordering of repositories being specified is not meaningful, and they effectively have an undefined order. It would be difficult to backpedal on that and start saying that now order matters.
The time spent educating users that ordering is not meaningful was not wasted. The lack of ordering is an essential part of pip’s behavior. That behavior is useful in many use cases, but some use cases need the ability to order indexes. There is a tradeoff that users must make when they use multiple indexes. It’s not that order of indexes never matters, nor that it should always matter. Users need to be able to choose when it matters, and they need to know when they are making that choice.
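To make the choice concrete, here is a minimal sketch of the two behaviors being contrasted: pooling candidates from every index and taking the highest version, versus stopping at the first configured index that has the package. It is a toy model, not pip's implementation. The index names, package names, versions, and both function names are hypothetical, and versions are plain tuples to keep the example self-contained.

```python
# Hypothetical in-memory indexes, listed in the order the user configured them.
# Index names, package names, and versions are made up for illustration.
INDEXES = [
    ("internal", {"numpy": [(1, 26, 0)], "mylib": [(2, 0, 0)]}),
    ("pypi", {"numpy": [(1, 26, 0), (2, 1, 0)], "requests": [(2, 32, 3)]}),
]


def pick_by_version(name, indexes):
    """Order does not matter: pool all candidates, take the highest version."""
    candidates = [
        (version, index_name)
        for index_name, packages in indexes
        for version in packages.get(name, [])
    ]
    if not candidates:
        return None
    # In this sketch, ties on version go to the earlier index, because
    # max() returns the first maximal element it encounters.
    version, index_name = max(candidates, key=lambda c: c[0])
    return index_name, version


def pick_by_index(name, indexes):
    """Order matters: stop at the first index that has the package at all."""
    for index_name, packages in indexes:
        versions = packages.get(name)
        if versions:
            return index_name, max(versions)
    return None


print(pick_by_version("numpy", INDEXES))  # ('pypi', (2, 1, 0))
print(pick_by_index("numpy", INDEXES))    # ('internal', (1, 26, 0))
```

Neither answer is wrong in general. The point is that the user should be able to pick which rule applies, and know which one is in effect.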
- Users can easily rearrange the order that they specify their repositories in within a single location, but when loading repositories from multiple locations (env var, conf file, requirements file, cli arguments) the order is hard coded into pip. While it would be a deterministic and documented order, there’s no reason to assume it’s the order that the user wants their repositories to be defined in, forcing them to contort how they configure pip so that the implicit ordering ends up being the correct one.
Configuration hell is nothing new. Because this PEP makes it more viable to keep extra indexes permanently listed in configuration files instead of passing them as one-off command line arguments, it dramatically improves the user experience and simplifies docs. The time spent figuring out the right configuration pattern would likely be outweighed by the time saved in not having to debug what strange things were pulled in from an extra index URL, or which desired packages from an extra index were replaced by a pip command that omitted that extra index URL.
- The above can be mitigated by providing a way to explicitly declare the order rather than by implicitly using the order they were defined in; however, that then means that the protections are not provided unless the user does some explicit configuration.
This is more PEP 708 territory, and I think PEP 708 does this in a better way. PEP 708 handles this on a global repo level, but is harder to configure for individual users. Index priority is for the sake of improving the predictability of where a package will come from, and it absolutely implies some necessary configuration. Then again, to use a non-default repo or multiple repos already implies some explicit configuration.
- Ordering assumes that one repository is always preferred over another repository without any way to decide on a project by project basis.
Right now the repo that a package comes from is nominally undefined, but it is predictable as an implementation detail. Changing this so that the predictability is user-configurable is an improvement. If a person needs to decide on a per-package basis, that’s an argument for allowing the repo as part of a spec. This is probably roughly equivalent to the namespaces idea. It was mentioned as the optimal solution in the PEP 708 discussion, but seems onerous to express with single-package granularity.
- Relying on ordering is subtle; if I look at an ordering of repositories, I have no way of knowing or ensuring in advance what names are going to come from what repositories. I can only know in that moment what names are provided by which repositories.
If you need specific things from specific repos, then you need a way to specify that. Index priority is not that granular. Even so, index priority still improves the overall situation.
What you do get from index priority is confidence that whatever set of packages you get is going to be a self-consistent set, to the extent that any given index is complete. When there can’t be any order among indexes, it is ambiguous what mixture of packages you get from each index, and that mixture varies with the package versions that happen to be published. Published package versions are rarely directly under the user’s control when using PyPI, but they are often under the user’s control with custom indexes. This is the same idea as projects like devpi, artifactory, or simpleindex, except that index priority uniquely makes this configurable as part of the client, not as a service that must be run and configured separately. This facilitates greater flexibility with per-environment configuration.
- Relying on ordering is fragile. There’s no reason to assume that two disparate repositories are not going to have random naming collisions—what happens if I’m using a library from a lower priority repository and then a higher priority repository happens to start having a colliding name?
What happens today without priority? Which index wins in this situation? The version numbers are meaningless, because they’re for separate projects. However, the version will probably be the decider here, and moreover there’s no way to pick one repo over the other, aside from a version constraint that is really conflating version with package identity. Predictability is an improvement, not a weakness. Configurability is an improvement.
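As a rough illustration of the collision scenario when no priority is available, the sketch below models a "pool everything, highest version wins" rule. It is a toy model of the behavior described above, not pip's actual code. The index names, the `toolbox` package, its versions, and the `highest_version` helper are all hypothetical.

```python
def highest_version(name, indexes):
    """Pool candidates from all indexes; the largest version number wins."""
    candidates = [
        (version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    # Tuples compare by version first; default=None covers "not found".
    return max(candidates, default=None)


# Hypothetical: "toolbox" on the company index and "toolbox" on the public
# index are unrelated projects that happen to share a name.
before = {
    "company": {"toolbox": [(1, 4, 0)]},
    "public": {},
}
after = {
    "company": {"toolbox": [(1, 4, 0)]},
    "public": {"toolbox": [(3, 0, 0)]},  # the unrelated project appears
}

print(highest_version("toolbox", before))  # ((1, 4, 0), 'company')
print(highest_version("toolbox", after))   # ((3, 0, 0), 'public')
```

In this model the selection flips to the unrelated project purely because its version number is larger, and the only lever available to the user is a version constraint standing in for package identity.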
- In cases where ordering does the wrong thing, it does so silently, with no feedback given to the user. This is by design because it doesn’t actually know what the wrong or right thing is, it’s just hoping that order will give the right thing, and if it does then users are protected without any breakage. However, when it does the wrong thing, users are left with a very confusing behavior coming from pip, where it’s just silently installing the wrong thing.
Why is it silent? Is pip not showing which index it is installing things from? We currently neither track nor show which index a package came from, but this proposal notes that we should be doing that.