Dependency notation including the index URL

As far as I know, pip does not allow attaching the --index-url flag to an individual requirement line in requirements.txt files, so it is not possible to write something like the following:

Alpha
--index-url https://pep503.bravo.dev/simple/ Bravo
--index-url https://pep503.charlie.dev/simple/ Charlie
Delta

Also tox is deprecating its indexserver setting, which allows something like:

[tox]
indexserver =
    bravo = https://pep503.bravo.dev/simple/
    charlie = https://pep503.charlie.dev/simple/
[testenv]
deps =
    Alpha
    :bravo:Bravo
    :charlie:Charlie
    Delta

But it seems there is a need to be able to force specific dependencies to be fetched from specific servers. There is always the issue that if there are two (or more) projects with the same name on different servers, it is rather hard to control which one will ultimately be fetched and installed. Usually it is the project with the distribution carrying the highest version number, but that is not necessarily what the user wants. Often the user wants a private dependency to be fetched from their own private index, even if there is a project with the same name and a higher version number on the public index.
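
To make the failure mode concrete (the project name, versions, and private index URL below are made up), this is roughly how the shadowing plays out with pip's existing flags:

# Illustrative only: "Library" 1.4 lives on a private index, while an
# unrelated "Library" 2.0 has been published on PyPI.
python -m pip install \
    --index-url https://pypi.org/simple/ \
    --extra-index-url https://pypi.internal.example/simple/ \
    Library
# pip treats both indexes as equivalent sources for "Library" and normally
# picks the highest version it finds anywhere, so the public 2.0 wins over
# the private 1.4 the user actually wanted.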

I do not have the links right now, but I have already seen multiple questions to that effect (on Stack Overflow and elsewhere):

I feel like it could be useful to have a dependency notation that includes the URL of the index. Does one already exist? Is there any other solution to these kinds of issues?


I feel like maybe a dependency notation like the following could help:

Library @ https://pep503.tango.dev/simple/

which would translate to:

python -m pip install --index-url 'https://pep503.tango.dev/simple/' Library

As with PEP 440’s direct references, packages containing such dependencies should be rejected by public servers.
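
A minimal sketch of how a front end could expand this hypothetical notation into today's pip invocations (nothing here is existing pip or PEP 508 behaviour, and installing one requirement at a time like this forgoes unified dependency resolution):

# Sketch only: expand "Name @ index-URL" entries into per-index pip calls.
import subprocess
import sys

requirements = [
    "Alpha",
    "Bravo @ https://pep503.bravo.dev/simple/",
    "Charlie @ https://pep503.charlie.dev/simple/",
    "Delta",
]

for requirement in requirements:
    name, _, index = (part.strip() for part in requirement.partition("@"))
    command = [sys.executable, "-m", "pip", "install"]
    if index:
        command += ["--index-url", index]
    command.append(name)
    subprocess.check_call(command)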

I do not know if something like that has already been considered (please provide links if yes).


Personally I feel this goes against the design purpose of indexes. The index mechanism is designed around the idea that indexes are (at some level) interchangeable for a given package, to enable index mirroring and proxying. This configuration therefore belongs at the application/user level, and allowing a package to enforce that its dependencies can only be downloaded from a specific index would defeat the purpose of the entire design. I would say that if any code requires this mechanism, it is a sign that the code should not be distributed as a Python package in the first place. Not all Python code needs to be distributed as Python packages.


I would say that if any code requires this mechanism, it is a sign that the code should not be distributed as a Python package in the first place. Not all Python code needs to be distributed as Python packages.

Let’s remember the origin of that need: the issue is caused by the inability to define the precedence of indices, which is something really fundamental, so people are just looking for any alternative to handle it. The above solution of course has a wider scope of use cases, but the problem I mentioned could be the most critical one.

But as noted, indices should be interchangeable, so precedence doesn’t make sense.

Maybe there is a need for a mechanism (not necessarily an index) that has a precedence, but I honestly don’t understand the requirement there (at least, not in any way that goes beyond the “not all Python code needs to be distributed as Python packages” point that @uranusjr mentioned).

They should be, by the current design, but the current design does not have to be infallible. I see you were involved in both of the GitHub issues mentioned, especially this one:


so I assume you are familiar with the use cases and with why just recommending devpi is not always a solution. I can add another example: in our case we use a private PyPI repository, but without caching of PyPI due to platform limitations, so all the issues of package name overlap etc. can occur. Of course, in a perfect world everyone would name their packages with proper prefixes, so the probability of being “overwritten” by a package on PyPI would be much lower (but still not zero; it could be quite an interesting way of hacking or disrupting systems, by the way), but… we are still on Earth :wink:

What I take away from this: use devpi or pydist (or something similar). I will try to push this solution harder the next time similar questions come up.


What if PyPI allowed anyone to simply reserve a namespace such as myusername-* and/or myusername.*? Is that possible? Wouldn’t that solve a big portion of such issues?


Not all Python code needs to be distributed as Python packages

I am not sure how it relates to the current topic. Maybe it was meant as “not all Python code needs to be distributed on PEP 503 indices”. That I would understand: there are probably other ways to distribute Python code than PyPI, PEP 503, etc., but this is what the community has right now, take it or leave it. Fair enough :slight_smile:

Personally I’m in favour of a solution along this line. There were multiple threads around this exact topic:

The discussion has died down, but I think people are still interested in seeing that happen. The biggest roadblock is likely finding someone with enough incentive to drive the discussion forward.


Yes, that may be a better characterisation. Generally, these discussions start with a comment along the lines of “my application does X”. But PyPI is designed around distributing packages (i.e., libraries) and applications are different.

I’m not in a position to complain, as pip is an application distributed via PyPI, but I do think that we need to acknowledge that applications are a different situation, with different requirements, and face up to the fact that the story for distributing Python applications is pretty bad. I’d love to see a really good solution for that, and honestly I think that trying to make package distribution channels work as application distribution channels is holding us back in finding one.

Simple example: I have a one-file .py script that uses requests and Beautiful Soup. How do I distribute it to users? Why is “bundle it as a library and tell them to install it in a virtualenv and use python -m myscript” the best answer we have??? The problems only get worse as the application gets bigger…


I’d be all in favor of myusername.*. Namespace packages are very easy nowadays. People would surely complain that it is more to type (or whatever), but import acme.library as lib is also very easy anyway. For reasons I cannot pinpoint right now, in my mind acme-library feels inferior to acme.library, probably because people would be more tempted to have a top-level package library in a project named acme-library, and imports would clash at some point later. It is of course also possible to have a top-level package library in an acme.library project, but it is maybe less tempting.
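
For reference, a minimal sketch of such a layout using PEP 420 implicit namespace packages (the acme names are placeholders):

acme-library/
    setup.cfg
    acme/            <- no __init__.py here: "acme" is the shared namespace
        library/
            __init__.py

Any number of acme.* projects can then be installed side by side and used via import acme.library.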

Is it worth opening a new discussion or should it be added to the existing one “PyPI as a Project repository vs. Name registry (a.k.a. PyPI namesquatting, e.g. for Fedora packages)”?

As far as I could see the issues raised are very much valid for libraries (I am probably missing some knowledge to realize how the distinction between applications and libraries is relevant here).


Side note regarding applications (it is worth a discussion of its own):

It seems to me like it is more of a packaging and installation issue than a distribution issue (i.e., PyPI and indices in general would still play a relatively passive role here).

My wish is to be able to upload to PyPI something along the lines of a zipapp (*.pyz file) with the same tags as wheels. It could be pex, or shiv, or whatever. It would not solve all use cases, but for many of them it would be nice enough. Probably this has already been considered somehow (as always, I am grateful for links pointing me to past or current discussions on the topic).
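
As an aside, building such an archive locally is already possible with the standard library’s zipapp module; it is the tagging and distribution story on PyPI that is missing. A rough sketch (the script and package names are made up, and this only covers pure-Python dependencies):

# Vendor the dependencies and the script into one directory, then zip it up.
python -m pip install requests beautifulsoup4 --target build/myapp
cp myscript.py build/myapp/
python -m zipapp build/myapp -m "myscript:main" -o myapp.pyz
python myapp.pyz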

I don’t know if I’ve ever written anything up on this (partially because I am not especially motivated to do it any more), but my “simple” solution to namespacing private-only packages was to add to PEP 508 the possibility of private namespaces in the package name, for things that only appear on your own private index.

The idea would be that you could do pip install mycompany::blah to install a package blah taken from the mycompany namespace. PyPI would have no top-level namespace, so any query to PyPI starting with a namespace would fail. You don’t have to worry about clashes between these namespaces, because they’re only intended to work on private indexes anyway. The minimum that pip would need to do would be to allow specifying dependencies that contain ::, though one could also imagine a future where pip is smart enough to know the mapping between a namespace and an index server (and refuse to send any queries to one that doesn’t match). PyPI would not need to do anything.

This doesn’t solve the problem of namespace clashes within a program: if you make mycompany::requests and then try to use the upstream requests, you will get conflicts, and of course there’s always the possibility that you create a package called glorb and then someone creates a popular open-source project called glorb. But if you use a namespace package for your company and use index-server namespaces, the chances of real conflicts are minimal (and easy to solve).

The benefit of this is that you don’t have to register anything with PyPI, and you are not subject to an attack where someone uploads malicious code to PyPI in a package called glorb and a misconfigured pip then pulls in the malicious code: with this scheme, even a misconfigured pip would simply fail to download mycompany::glorb.
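
A minimal sketch of the client-side mapping described above (the :: separator, the mapping, and every name here are hypothetical; none of this is existing pip behaviour):

# Hypothetical namespace-to-index mapping for "namespace::name" requirements.
NAMESPACE_INDEXES = {
    "mycompany": "https://pypi.mycompany.internal/simple/",
}
DEFAULT_INDEX = "https://pypi.org/simple/"

def resolve_index(requirement):
    """Split 'namespace::name' and return (name, index allowed to serve it)."""
    namespace, sep, name = requirement.partition("::")
    if not sep:
        # No namespace: an ordinary requirement, served by the default index.
        return requirement, DEFAULT_INDEX
    try:
        return name, NAMESPACE_INDEXES[namespace]
    except KeyError:
        # A misconfigured client fails outright instead of silently falling
        # back to PyPI, which is exactly the attack this scheme prevents.
        raise LookupError(f"no index configured for namespace {namespace!r}")

print(resolve_index("mycompany::glorb"))  # served only by the private index
print(resolve_index("requests"))          # served by the default public index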


The namespaces should be standard URIs. There is no need to develop a different parser and escaping mechanism.

Perhaps orgname/blah instead of orgname::blah. PyPI routes for groups:

One problem: { Groups, index servers, signed collections of namespaced collections of versions of packages }


How will this work with TUF keys? Are there groups for those ACLs?

Is this fundamentally different from just managing requirements.txt/environment.yml/Pipfile.lock in a namespaced git repo and creating a package to install those?


Does this not work (anymore?):

# requirements.txt
-i https://localhost:8001/index/server
pip
requests
pdbpp

-i https://devpi.localdomain
ourlocalapp

I would recommend:

  • Add a setup.cfg, setup.py, README.md (by hand or from a cookiecutter)
  • Create a console-scripts entry point that calls a function that returns nonzero on error (or raises SystemExit); see the setup.cfg sketch below
  • pip install twine; twine check
  • pip install -U pipx
  • pipx install myonefilescript

And then try and remember to run this periodically:

  • pipx list; pipx upgrade-all
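
For the entry-point step, a minimal setup.cfg sketch might look like the following (the project, module, and function names are placeholders):

# setup.cfg (minimal sketch; names are placeholders)
[metadata]
name = myonefilescript
version = 0.1.0

[options]
py_modules = myonefilescript
install_requires =
    requests
    beautifulsoup4

[options.entry_points]
console_scripts =
    myonefilescript = myonefilescript:main

With that in place, pipx install . (run in the project directory) should give users a myonefilescript command on their PATH.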

The alternative is to package for e.g. brew, chocolatey, deb, rpm, conda so that things get upgraded when the system package manager UPGRADE_ALL command gets run.

I am interested in knowing a bit more about that. Do you mean that deploying something like devpi would not help? Or do you mean that it is simply not feasible to deploy a custom server?

I assume that if --extra-index-url is involved, it means there is an internal private repository somewhere. Why could that repository not be configured so as to enforce the installation of a specific named dependency from a specific source?

FWIW, this is called a “Per Project Index” in the packaging.python.org glossary:

A private or other non-canonical Package Index indicated by a specific Project as the index preferred or required to resolve dependencies of that project.

https://packaging.python.org/glossary/#term-Per-Project-Index

PEP 508 is not a URI and already needs its own parser, but I guess I don’t really care that much what the separator is.

In my proposal PyPI would do nothing except return an error. This would just be a standard for custom servers to implement (and the easiest implementation is just to have the custom server have a single namespace that is acceptable to it and return an error if a different namespace is specified).

(I deleted the quoted post, because it’s basically just wrong)

Thinking about this a bit further, the fundamental issue with “prefer a particular index” is that we’re breaking the key invariant in packaging, that any copy of version X.Y of package foo is functionally identical. That is baked into many areas of packaging, and changing it could cause all sorts of weird breakages (for example, pip’s cache is keyed on name/version).

If you’re saying that it’s important to you that we get foo-1.0 from your index, as opposed to somewhere else where it may exist, then you’ve misunderstood that fundamental principle.

It’s also worth pointing out that anyone with a problem that is solved by preferring a local index over PyPI can (as far as I am aware) just as easily solve it by using a devpi instance as their pip index and configuring it to serve “private” projects first, falling back to a PyPI mirror. So it’s never true that anyone “must” prefer one index over another.
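
Roughly, and hedging on the exact commands since devpi’s CLI has changed across versions (the host, index name, and credentials below are placeholders), that setup looks something like:

# Create a private index that inherits from devpi's bundled PyPI mirror
# (root/pypi); projects uploaded to it take precedence over same-named
# public projects.
devpi use http://devpi.example.internal:3141
devpi login root --password ''
devpi index -c root/private bases=root/pypi
devpi use root/private
devpi upload    # run inside the private project's directory

# Point pip at the private index only; devpi transparently falls back to
# the PyPI mirror for everything else.
python -m pip install --index-url http://devpi.example.internal:3141/root/private/+simple/ some-private-project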

That’s not to say that there aren’t problems for which namespace reservation would be the right solution, just that it’s not needed for the “prefer a particular index” issue that typically starts these discussions.


Depends what you mean by “work”. It adds the two indexes to the list to be searched, and looks for the given requirements. It doesn’t use particular indexes for particular requirements, in the way that your layout suggests you’re hoping it will.

Is the local version tag meant to help disambiguate in this situation?

Local versions do help a lot in private package settings when combined with constraints, but they fall short when versions cannot be pinned exactly. Say I have foo-1.0+1 and foo-1.1+1 locally; there’s no good way to specify that only these versions are valid in a way that guards against a potential public foo-1.1 release on PyPI. Maybe it would help if we introduced OR conditions in specifiers, but that might still only solve the use case partially, while opening the gate to new problems.
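
To make the gap concrete (the project name and versions are illustrative), under PEP 440 matching rules:

# constraints.txt (illustrative)
foo==1.1+1    # an exact pin only matches the private build
# foo>=1.0    # a range would also be satisfied by a public foo 1.1 on PyPI,
#             # and there is no way to express "1.0+1 or 1.1+1, nothing else"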