I appreciate the analogy, but analogies can only go so far. In this case, it breaks down because the possibility of DoS attacks wasn’t introduced by the design of the secure transport layer. What I’m trying to get across is that PEP-708 trades a severe exploit (arbitrary code execution) for a significantly less severe developer-productivity problem, and one which the developer can’t easily solve. My point is that this may not be a trade-off we have to make in order to solve the underlying problem.
I don't want to get sucked into solutioneering in this thread, since it is explicitly about PEP-708, but I don't see that this statement HAS to be true... (this is where I get sucked in)
If the index is treated as the configured namespace provider, then every namespace-less project name (incl. dependency definitions) can reasonably be assumed to default its namespace to that index. In other words: “if you’ve uploaded it to pypi, we assume your dependencies are also on pypi”. The constraint would be that no index has the right to define new projects in a different namespace. The torchtriton case would therefore have failed in the client from the outset: the namespace of the extending index would have had to be used for torchtriton, and an explicit namespace declaration would have been required to use it as a dependency, at least until torchtriton was registered on pypi.org. This is perhaps your rejected “Require all projects to exist in the ‘default’ repository” idea, with the addition of a namespacing concept and with the word “default” replaced by “configured index” (pypi.org by default, defined by index-url).
From the original “dependency confusion” use case’s perspective (a pypi.org proxy + a local index), you would then expect to be running an internal index (configured in pip via index-url) which is its own namespace AND is a broker of other namespaces (e.g. pypi.org). That index would be responsible for tracking the namespace of a project, and this would be the default value used for its dependencies (declared without a namespace).
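To make the defaulting rule concrete, here is a minimal sketch of how a client might qualify dependency names. It is only an illustration of the idea above, not a proposal for real syntax or an existing pip feature: the `namespace::project` spelling, the `QualifiedName` type, and the `qualify` / `qualify_dependencies` helpers are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedName:
    """A project name together with the namespace (index) that owns it."""

    namespace: str
    project: str


def qualify(requirement: str, default_namespace: str) -> QualifiedName:
    """Attach a namespace to a requirement name.

    Assumption: "ns::name" is a hypothetical explicit-namespace spelling.
    A bare "name" inherits the default namespace passed in.
    """
    namespace, sep, project = requirement.partition("::")
    if not sep:
        # No explicit namespace: fall back to the default.
        return QualifiedName(default_namespace, requirement)
    return QualifiedName(namespace, project)


def qualify_dependencies(
    owner: QualifiedName, dependencies: list[str]
) -> list[QualifiedName]:
    """Namespace-less dependencies default to the namespace of their owner."""
    return [qualify(dep, owner.namespace) for dep in dependencies]


if __name__ == "__main__":
    owner = QualifiedName("my-index", "internal-app")
    for dep in qualify_dependencies(owner, ["internal-lib", "pypi.org::requests"]):
        print(f"{dep.namespace}::{dep.project}")
```

Under this sketch, a project tracked in the internal namespace has to spell out `pypi.org::requests` to reach across namespaces, while a project tracked in the pypi.org namespace never needs to qualify anything.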
I don’t know if it is necessary to prohibit projects of the same name with different namespaces from being installed together - it isn’t obvious that this is solving the underlying problem that project != package name(s), though it is a reasonably strong indication that pypi.org::prjA and my-index::prjA are likely to have package name collisions. Ultimately, this is another (and existing) form of “dependency confusion”, which would require core language level (i.e. import mechanism) changes to solve properly (possibly re-using the namespace concept). Though we could avoid making the situation worse by explicitly prohibiting projects of the same name from different namespaces being installed together.
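If a client did choose to prohibit same-named projects from different namespaces, the check itself would be small. The sketch below assumes the requested set is available as (namespace, project) pairs and compares names using the PEP 503 normalization rule; the function names are hypothetical and nothing here reflects existing pip behaviour.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of -, _ and . collapse to a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_cross_namespace_collisions(requested):
    """Return {normalized project name: sorted namespaces} for every project
    that is requested from more than one namespace.

    `requested` is an iterable of (namespace, project) pairs.
    """
    namespaces_by_project = defaultdict(set)
    for namespace, project in requested:
        namespaces_by_project[normalize(project)].add(namespace)
    return {
        project: sorted(namespaces)
        for project, namespaces in namespaces_by_project.items()
        if len(namespaces) > 1
    }


if __name__ == "__main__":
    requested = [
        ("pypi.org", "prjA"),
        ("my-index", "prjA"),
        ("pypi.org", "requests"),
    ]
    for project, namespaces in find_cross_namespace_collisions(requested).items():
        print(f"{project} is requested from multiple namespaces: {namespaces}")
```

This only detects project-name clashes. It says nothing about whether the import package names inside the distributions collide, which is the part that would need import-level changes.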
From a user perspective, it would be entirely reasonable to take the default name from the index-url. Similar syntax to what you propose would be necessary only to retrieve projects from a different namespace.
This would also be fully backwards compatible - existing project dists would continue to work with updated build and client tools (until you reach a dependency confusion, at which point they would error). To support namespacing, both the build tools and the clients would need to be extended, especially for declaring and requesting dependencies. However, newly built packages with namespace declarations would most likely not be compatible with old clients.
I would be happy to engage on this, though perhaps that should be in a separate topic?
The objective for me in this thread is not to propose a solution, rather to highlight that the PEP is:
- very nuanced (i.e. it is easy to misunderstand, which can lead to index misconfiguration and… dependency confusion)
- not like other solutions out there in the wild (e.g. scopes / groupId) - this will need special casing for each of the “software repository” tools (e.g. artifactory, nexus, azure artifacts, devpi, etc.) if they want to support this “extra-index” use case (perhaps they won’t bother, since you can just create a repo which does the index grouping in a configurable way)
- doesn’t fully “solve” dependency confusion (since index operators still need to use mechanisms such as priority ordering, not mechanisms proposed in PEP-708, to actually resolve name conflicts), it simply prevents the code execution part of dependency confusion
- introduces its own (significantly less severe) “dependency confusion” problem (one day you can install internal-project-x, and the next day somebody registers internal-project-x on pypi and it suddenly stops working, with no remedy proposed in this PEP).
A few questions I ask myself regarding this PEP:
- Does it prevent a real and important problem? Yes, by making the client secure by default
- Does it iterate us towards a solution for the original dependency confusion problem? No, I don’t think so (even though it clearly is better to raise/stop than it is to blindly install dependency confused projects, and we shouldn’t let perfection be the enemy of the good). I don’t believe that the PEP will be useful to solve the underlying problem (but does provide additional tools to solve the problem for the --extra-index-url case with today’s index name ambiguity).
- What would I do if I were the PEP delegate (I’m not, and grateful that such a tough decision is Paul’s to take)? I would want high confidence that the effort that would go into implementing and following-up with the PEP couldn’t be instead invested in solving the underlying project name ambiguity through repo-level namespacing (of some kind). Perhaps it has been discussed and rejected in detail elsewhere (happy to be pointed to a canonical reference if it exists! FWIW, it isn’t “Namespace support in pypi”, as that is about having multiple namespaces within a single repo). I will happily start a new discussion on this topic, if that would be worthwhile?