We somewhat frequently see problems like this one arise, and proposals like Archspec (and I’ve seen more that have never made it public). But I honestly can’t see any of them being integrated into every install tool that we may care about.
Essentially, the problem comes down to this: based on something about the target environment, users should select a different wheel.
To use the example in the first link above, they should have selected fluidsim[pythran, mpi] if they have Pythran and MPI support, and fluidsim[purepy] otherwise. Other examples we’ve seen include GPU-specific packages and packages chosen based on the availability of certain CPU features.
The fundamental problem is that users can’t just “install (package)” - they have to read some documentation and go and figure out which of a set of packages they require and then specify that name. (See https://pytorch.org/get-started/locally/ for a very concrete example.) This prevents specifying the package as a dependency, since you cannot reliably acquire it by name alone.
So here’s an alternative proposal:
A selector package is a wheel tagged with a new platform (e.g. “mypackage-1.0.0-selector.whl”) that indicates it is a selector.
Selector packages are not installed into the target environment and do not specify any install dependencies (meaning they are trivial to resolve when specified as a dependency of other projects).
During overall dependency resolution (at some point that I need help finding), the selector package is extracted and executed in the target environment’s interpreter. It prints a set of requirements to the console, which is captured and fed back into the dependency resolution process in place of the selector package. The selector package files are then deleted. (Update: It’s probably better to always use the latest selector package and pass in the requested version, as this means that finding the selector only requires the name and not full dependency resolution.)
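To make that flow a bit more concrete, here is a rough sketch of what the script inside a selector package might look like. Everything specific in it is an assumption for illustration, not part of the proposal: the calling convention (package name and requested version on argv), the one-requirement-per-line output, the variant names, and especially the `nvidia-smi` lookup, which is only a crude stand-in for real hardware detection.

```python
import shutil
import sys


def select_requirements(name, version, has_cuda):
    """Return the requirement strings that should replace the selector."""
    # Hypothetical variant names; a real project would choose its own.
    variant = f"{name}-cuda-9.2" if has_cuda else f"{name}-cpu"
    return [f"{variant}=={version}"]


def main(argv):
    # Assumed calling convention: <script> <package name> <requested version>
    name, version = argv[1], argv[2]
    # Stand-in for real detection: is the nvidia-smi tool on PATH?
    has_cuda = shutil.which("nvidia-smi") is not None
    # One requirement per line on stdout, for the installer to capture.
    for requirement in select_requirements(name, version, has_cuda):
        print(requirement)


# Only run when invoked with the two assumed arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Since the script only uses the standard library and prints plain requirement strings, the installer never has to import anything from it, only run it and read the output.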
This allows a reasonable amount of flexibility to choose a specific package based on what is currently (or about to be) installed, and anything that can be determined through the standard library or vendored code. The resulting environment does not depend on the selector package, which means if you freeze and reproduce the env elsewhere, you’ll bypass it completely.
Structurally, this would mean that anyone using a selector package will actually have multiple packages, e.g. “pytorch” (the selector), “pytorch-cpu”, “pytorch-cuda-9.2”, etc., where one of the latter is required to actually get the pytorch module. In some ways, it operates similarly to extras, and also similarly to environment markers, but allows people to handle the more complex cases that those don’t.
Things mentioned above that I don’t have strong preferences about:
- exactly how we mark a package as a selector package (I suggested a platform tag)
- how the package is executed (executing a well-known .py filename with package name/version/other reqs on argv seems reasonable)
- exact format of the requirements (obviously needs to be specified though)
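
For the installer side of those open questions, one possible shape is sketched below. It assumes the selector has already been extracted to a script on disk, that the package name and requested version are passed on argv (other requirements are left out for brevity), and that the output format is one requirement per line. `run_selector` is a hypothetical helper, not an existing pip API.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_selector(script_path, name, version, interpreter=sys.executable):
    """Run a selector script and return the requirement lines it prints."""
    result = subprocess.run(
        [interpreter, str(script_path), name, version],
        capture_output=True,  # capture stdout rather than showing it
        text=True,
        check=True,  # a failing selector raises CalledProcessError
    )
    # Assumed format: one requirement per line, blank lines ignored.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Throwaway selector standing in for an extracted selector package.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "selector.py"
        script.write_text(
            "import sys\n"
            "print(f'{sys.argv[1]}-cpu=={sys.argv[2]}')\n"
        )
        print(run_selector(script, "pytorch", "1.0.0"))
    # The temporary directory, and the selector with it, is deleted here.
```

A real installer would pass the target environment's interpreter rather than `sys.executable`, and would feed the returned lines back into resolution in place of the selector's own entry.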
Thoughts?