But it seems there is a need to be able to force specific dependencies to be fetched from specific servers. There is always the issue that if there are 2 (or more) projects with the same name on different servers, it is rather hard to control which one will ultimately be fetched and installed. Usually it is the project that has a distribution with the highest version number, but that is not necessarily what the user wants. Often the user wants a private dependency to be fetched from their own private index, even if there is a project with the same name and a higher version number on the public index.
I do not have the links right now, but I have already seen multiple questions to that effect (on Stack Overflow and elsewhere):
Personally I feel this is against the design purpose of indexes. The index mechanism is designed around the idea that indexes are (at some level) interchangeable for a given package, to enable index mirroring and proxying. This configuration therefore belongs at the application/user level, and enabling a package to force its dependencies to be downloadable only from a specific index would defeat the purpose of the entire design. I would say that if any code requires this mechanism, it is a sign that the code should not be distributed as a Python package in the first place. Not all Python code need to be distributed as Python packages.
I would say that if any code requires this mechanism, it is a sign that the code should not be distributed as a Python package in the first place. Not all Python code need to be distributed as Python packages.
Let’s remember the origins of that need - the issue is caused by the inability to define the precedence of indices, which is something really fundamental, so people are just looking for any alternative to handle that. The above solution of course covers a wider range of use cases, but the problem I mentioned could be the most critical one.
But as noted, indices should be interchangeable, so precedence doesn’t make sense.
Maybe there is a need for a mechanism (not necessarily an index) that has a precedence, but I honestly don’t understand the requirement there (at least, not in any way that goes beyond “Not all Python code need to be distributed as Python packages” that @uranusjr mentioned).
Should be by current design, but current design doesn’t have to be infallible. I see you were involved in both mentioned GitHub issues, especially that one:
so I assume you are familiar with the use cases and why just recommending devpi is not always a solution. I can add another example: we can use a private PyPI repo, but without caching of PyPI due to platform limitations, so all the issues of package overlapping etc. can occur in our case. Of course in a perfect world everyone would name their packages with proper prefixes, so the probability of being “overwritten” by a package on PyPI would be much lower (but still not zero - it could be quite an interesting way of hacking / interrupting systems btw), but… we are still on Earth.
What I take out of this: use devpi or pydist (or something similar). I will try to push this solution harder next time similar questions come up.
What if PyPI allowed anyone to simply reserve a namespace such as myusername-* and/or myusername.*? Is that possible? Wouldn’t that solve a big portion of such issues?
Not all Python code need to be distributed as Python packages
I am not sure how it relates to the current topic. Maybe it was meant as “not all Python code needs to be distributed on PEP 503 indices”. That I would understand: there are probably other ways to distribute Python code than PyPI, PEP 503, etc. This is what the community has right now, take it or leave it. Fair enough.
Yes, that may be a better characterisation. Generally, these discussions start with a comment along the lines of “my application does X”. But PyPI is designed around distributing packages (i.e., libraries) and applications are different.
I’m not in a position to complain, as pip is an application distributed via PyPI, but I do think that we need to acknowledge that applications are a different situation, with different requirements, and face up to the fact that the story for distributing Python applications is pretty bad. I’d love to see a really good solution for that, and honestly I think that trying to make package distribution channels work as application distribution channels is holding us back in finding one.
Simple example, I have a 1-file .py script that uses requests and beautiful soup. How do I distribute it to users? Why is “bundle it as a library and tell them to install it in a virtualenv and use python -m myscript” the best answer we have??? The problems only get worse as the application gets bigger…
I’d be all in favor of myusername.*. Namespace packages are very easy nowadays. People would surely complain that it is more to type (or whatever), but import acme.library as lib is also very easy anyway. For reasons I cannot pinpoint right now, in my mind acme-library feels inferior to acme.library. Probably because people would be more tempted to have a top level package library in project acme-library, and imports would clash at some point later. It is of course also possible to have a top level package library in an acme.library project, but maybe less tempting.
As far as I could see the issues raised are very much valid for libraries (I am probably missing some knowledge to realize how the distinction between applications and libraries is relevant here).
Side note regarding applications (it is worth a discussion of its own):
Seems to me like it is more of a packaging and installation issue than a distribution issue (i.e. PyPI and indices in general would still play a relatively passive role here).
My wish is to be able to upload on PyPI something along the lines of a zipapp (*.pyz files) with the same tags as wheels. Could be pex, or shiv or whatever. Would not solve all use cases, but for many use cases it would be nice enough. Probably this has already been considered somehow (as always I am grateful for links pointing me to past or current discussions on the topic).
I don’t know if I’ve ever written anything up on this (partially because I am not especially motivated to do it any more), but my “simple” solution to namespacing private-only packages was to add into PEP 508 the possibility of private namespaces in the package name for things that only appear on your own private index.
The idea would be that you could do pip install mycompany::blah to install a package blah taken from the mycompany namespace. PyPI would have no top-level namespace, so any query to PyPI starting with a namespace would fail. You don’t have to worry about clashes between these namespaces, because they’re only intended to work on private indexes anyway. The minimum that pip would need to do would be to allow specifying dependencies that contain ::, though one could also imagine a future where pip is smart enough to know the mapping between a namespace and an index server (and refuse to send any queries to one that doesn’t match). PyPI would not need to do anything.
This doesn’t solve the problem of namespace clashes within a program — if you make mycompany::requests and then try and use the upstream requests, you will get conflicts, and of course there’s always the possibility that you create a package called glorb and then someone creates a popular open source project called glorb, but if you use a namespace package for your company and use index-server namespaces, the chances of real conflicts are minimal (and easy to solve).
The benefit of this is that you don’t have to register anything with PyPI and it makes it so that you are not subject to an attack where someone uploads malicious code in a package called glorb to PyPI, then someone has a misconfigured pip that pulls in the malicious code — if you have a misconfigured pip, the download for mycompany::glorb would simply fail.
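To make the client-side idea a bit more concrete, here is a minimal sketch of how a tool could split a `namespace::name` requirement and refuse to query an index that does not match. Everything in it is hypothetical: the `::` handling, the `NAMESPACE_INDEXES` mapping, the example URLs and the choice to send un-namespaced names to the public index are assumptions for illustration, not anything pip or PEP 508 does today.

```python
# Hypothetical mapping from a private namespace to the only index allowed to serve it.
NAMESPACE_INDEXES = {"mycompany": "https://pypi.mycompany.example/simple"}
# Assumption: names without a namespace keep going to the public index.
PUBLIC_INDEX = "https://pypi.org/simple"


def split_namespace(requirement_name):
    """Split 'mycompany::blah' into ('mycompany', 'blah'); plain names get None."""
    namespace, sep, name = requirement_name.partition("::")
    if not sep:
        return None, requirement_name
    if not namespace or not name:
        raise ValueError(f"malformed namespaced requirement: {requirement_name!r}")
    return namespace, name


def index_for(requirement_name):
    """Return (index_url, project_name), failing if the namespace is unknown."""
    namespace, name = split_namespace(requirement_name)
    if namespace is None:
        return PUBLIC_INDEX, name
    if namespace not in NAMESPACE_INDEXES:
        # A misconfigured client fails here instead of silently asking PyPI.
        raise LookupError(f"no index configured for namespace {namespace!r}")
    return NAMESPACE_INDEXES[namespace], name
```

With this, `index_for("mycompany::glorb")` only ever points at the private server, and a namespace that is missing from the mapping is an error rather than a fallback to the public index.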
I am interested in knowing a bit more about that. Do you mean that deploying something like devpi would not help? Or do you mean that it is simply not feasible to deploy a custom server?
I assume that if --extra-index-url is involved, it means there is an internal private repository somewhere. Why could that repository not be configured so as to enforce the installation of a specific named dependency from a specific source?
PEP 508 is not a URI and already needs its own parser, but I guess I don’t really care that much what the separator is.
In my proposal PyPI would do nothing except return an error. This would just be a standard for custom servers to implement (and the easiest implementation is just to have the custom server have a single namespace that is acceptable to it and return an error if a different namespace is specified).
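As a rough illustration of the server side, the check could be as small as the sketch below. The function names, the status codes and the decision to let un-namespaced names through are my own assumptions; the proposal only says that a server rejects a namespace it does not accept, and that PyPI (which has no namespace) rejects all of them.

```python
def split_name(requested):
    """Return (namespace, name); namespace is None when there is no '::'."""
    namespace, sep, name = requested.partition("::")
    return (namespace, name) if sep else (None, requested)


def handle_request(requested, accepted_namespace):
    """Toy request handler for an index that serves at most one namespace.

    accepted_namespace=None models an index like PyPI with no namespace at all.
    """
    namespace, name = split_name(requested)
    if namespace is not None and namespace != accepted_namespace:
        return 404, f"namespace {namespace!r} is not served here"
    # Assumption: requests without a namespace are handled as they are today.
    return 200, name
```

So `handle_request("mycompany::glorb", None)` fails the way a query to PyPI would, while the same request against a server configured with `"mycompany"` succeeds.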
(I deleted the quoted post, because it’s basically just wrong)
Thinking about this a bit further, the fundamental issue with “prefer a particular index” is that we’re breaking the key invariant in packaging, that any copy of version X.Y of package foo is functionally identical. That is baked into many areas of packaging, and changing it could cause all sorts of weird breakages (for example, pip’s cache is keyed on name/version).
If you’re saying that it’s important to you that we get foo-1.0 from your index, as opposed to somewhere else where it may exist, then you’ve misunderstood that fundamental principle.
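A toy model of why that invariant matters for a cache keyed on name/version (this is only an illustration of the principle, not pip's actual cache code, and the URLs and `fake_download` helper are made up):

```python
cache = {}


def fetch(name, version, index_url, download):
    """Return the artifact for (name, version), downloading it at most once."""
    key = (name, version)  # note: the index URL is not part of the key
    if key not in cache:
        cache[key] = download(index_url, name, version)
    return cache[key]


def fake_download(index_url, name, version):
    # Stand-in for a real download; records where the file came from.
    return f"{name}-{version} from {index_url}"


first = fetch("foo", "1.0", "https://pypi.org/simple", fake_download)
second = fetch("foo", "1.0", "https://private.example/simple", fake_download)
# 'second' is the copy fetched from the first index, because the cache assumes
# every foo-1.0 is the same thing regardless of where it was downloaded.
```

If the two indexes really served different files under the same name and version, a cache like this would hand back whichever one happened to be fetched first.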
It’s also worth pointing out that anyone with a problem that is solved by preferring a local index over PyPI can (as far as I am aware) just as easily solve their problem by using a devpi instance as their pip index, and configuring it to serve “private” projects first, falling back to a PyPI mirror. So it’s never true that anyone “must” prefer one index over another.
That’s not to say that there aren’t problems for which namespace reservation would be the right solution, just that it’s not needed for the “prefer a particular index” issue that is typically how these discussions start.
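The "private first, fall back to a mirror" behaviour amounts to something like the lookup below. This is a sketch of the ordering only; the data and the `list_versions` helper are hypothetical and say nothing about how devpi is actually implemented or configured.

```python
# Hypothetical contents of the two sources behind a single proxy index.
PRIVATE_PROJECTS = {"glorb": ["1.0", "1.1"]}
PUBLIC_MIRROR = {"glorb": ["9.9"], "requests": ["2.31.0"]}


def list_versions(project, private_projects, public_mirror):
    """Versions the proxy exposes to the client for one project name."""
    if project in private_projects:
        # A private project shadows any public project with the same name,
        # so the client never even sees the public versions.
        return list(private_projects[project])
    return list(public_mirror.get(project, []))
```

Because the client talks to one index only, it still sees a single consistent set of files per project name, which is why this does not break the interchangeability assumption.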
Depends what you mean by “work”. It adds the two indexes to the list to be searched, and looks for the given requirements. It doesn’t use particular indexes for particular requirements, in the way that your layout suggests you’re hoping it will.
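Roughly, the behaviour described here can be modelled as "pool the candidates from every index, then pick the best version". The sketch below is a simplification under stated assumptions: the index contents and URLs are invented, and `version_key` only handles plain dotted numeric versions, whereas real installers follow the full PEP 440 ordering.

```python
PUBLIC = "https://pypi.org/simple"
PRIVATE = "https://private.example/simple"

# Hypothetical index contents: index URL -> {project name: [versions]}.
INDEXES = {
    PUBLIC: {"foo": ["1.0", "2.0"]},
    PRIVATE: {"foo": ["1.0", "1.1"]},
}


def version_key(version):
    """Simplified ordering for versions such as '1.10.2' (not full PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def find_best(project, index_urls):
    """Pool candidates from all indexes and return (version, index_url) of the highest."""
    candidates = []
    for url in index_urls:
        for version in INDEXES.get(url, {}).get(project, []):
            candidates.append((version_key(version), version, url))
    if not candidates:
        raise LookupError(f"no candidates found for {project!r}")
    _, version, url = max(candidates, key=lambda c: c[0])
    return version, url
```

Here `find_best("foo", [PRIVATE, PUBLIC])` and `find_best("foo", [PUBLIC, PRIVATE])` both pick 2.0 from the public index: the order in which the indexes are listed does not give either of them priority.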
Local versions do help a lot in private package settings when combined with constraints, but fall short when versions cannot be pinned exactly. Say I have foo-1.0+1 and foo-1.1+1 locally: there’s no good way to specify that only these versions are valid, which is what would guard against a potential public foo-1.1 release on PyPI. Maybe it would help if we introduced OR conditions in specifiers, but that might still only partially solve the use case, while opening the gate to new problems.