Selecting variant wheels according to a semi-static specification

msarahan · May 30, 2024, 1:22am

How about “variant”? Not trolling. Seriously. I can’t come up with anything better.

I’ve been assuming that these are additions to the platform tag, like you describe above:

I’ve been assuming that publishers should be able to define their own. In trying to cobble together some examples, I’m not so sure it makes sense. I have been tinkering with a few different use cases:

“accelerators” - this is the CUDA/ROCm/TPU use case. My assumption here is that users can specify one or more accelerator families to enable, and that they are not mutually exclusive.
“cpu” - the SIMD stuff that Oscar has described so well.
MPI - where implementations are mutually exclusive. Users must specify which implementation they want.
OpenMP - implementations are mutually exclusive, but users probably don’t need to specify which implementation they want. Instead, the first implementation installed should “win” and activate that variant, after which packages built against any other implementation could no longer be installed.
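
To make the differences between those four cases concrete, here is a rough sketch of how a selector might carry its own policy. Everything here is made up for illustration (the `Policy` names, `Selector`, `may_install`); it is not an existing API, just one way to model the behaviors described above.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    MULTI = "multi"              # several values may be enabled at once (accelerators)
    EXCLUSIVE = "exclusive"      # the user must pick exactly one value (MPI)
    FIRST_WINS = "first-wins"    # the first installed value locks the choice (OpenMP)


@dataclass(frozen=True)
class Selector:
    name: str
    policy: Policy


def may_install(selector, wanted, requested=frozenset(), installed=frozenset()):
    """Return True if a wheel built for value `wanted` is acceptable.

    requested: values the user asked for explicitly
    installed: values already active in the target environment
    """
    if selector.policy is Policy.MULTI:
        return wanted in requested
    if selector.policy is Policy.EXCLUSIVE:
        if len(requested) != 1:
            raise ValueError(f"{selector.name}: exactly one value must be requested")
        return wanted in requested
    # FIRST_WINS: anything goes until something is installed, then only that value
    return not installed or wanted in installed
```

With this model, `may_install(Selector("openmp", Policy.FIRST_WINS), "libomp", installed={"libgomp"})` is rejected, while the same call against an empty environment is accepted.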

In these use cases, I started out with a meta-selector for “accelerators” and then had selectors for CUDA version and arch. I don’t think that separation really makes sense. I think it makes sense to lump in CUDA, ROCm, TPU and whatever else into one logical unit that people collaborate on. Maybe I’ll regret that idea, but there’s also nothing stopping anyone from making their own selector.

I think you mean some kind of advertisement of variants that are available across the collection of packages. Is that accurate? The installer workflow that I’ve had in mind goes something like:

Installer requests index metadata
Index metadata contains list of variants (dimensions, axes, labels, whatever, NOT values).
Installer uses that list to initialize variant values from the system/environment
Installer uses variant values to create tag candidates
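
As a minimal sketch of those four steps, assuming the index metadata is just a dict with a `"variants"` list and that each variant name maps to a detector callable. The `platform+name_value` tag spelling is invented here purely to have something to print; no tag format has been agreed on.

```python
import itertools


def variant_names(index_metadata):
    # Steps 1-2: the index advertises variant *names* only, never values.
    return list(index_metadata.get("variants", []))


def detect_values(names, detectors):
    # Step 3: ask a detector for each name; unknown names get no values.
    # Each detector returns its values in order of preference.
    return {name: list(detectors[name]()) if name in detectors else []
            for name in names}


def tag_candidates(platform_tag, values):
    # Step 4: one candidate per combination of detected values,
    # followed by the plain platform tag as the final fallback.
    active = [(name, vals) for name, vals in values.items() if vals]
    candidates = []
    if active:
        names = [name for name, _ in active]
        for combo in itertools.product(*(vals for _, vals in active)):
            suffix = ".".join(f"{n}_{v}" for n, v in zip(names, combo))
            candidates.append(f"{platform_tag}+{suffix}")
    candidates.append(platform_tag)
    return candidates


index = {"variants": ["accelerators", "cpu"]}
detectors = {
    "accelerators": lambda: ["cuda12"],
    "cpu": lambda: ["avx512", "avx2"],
}
values = detect_values(variant_names(index), detectors)
print(tag_candidates("linux_x86_64", values))
```

This deliberately skips partial combinations (for example an accelerator-only wheel with no cpu variant); a real scheme would need to decide how those rank.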

If variants are encoded in the system platform tags, I’m not sure this is relevant. There’s no such thing as depending on a variant (unlike what we have today with something like cudf-cu12).

As I mentioned above, I think there are several behaviors that we’d need to account for. I think these would need to be accounted for in the detection program, and as such, that detection program would need to be aware of the environment that it was running in.

I’m operating under the assumption that the core metadata file must be similar across all wheels. I expect that differences in requirements will have to be expressed with environment markers. That does seem to point to a need to extend the environment marker scheme so that it is also using information from the variant detector programs.

Users definitely should have ultimate overrides over anything, and I also think that overriding detected values will be a key part of package building. I was imagining the usual environment variable schemes, along with maybe entries in pip.conf or some new config, since the idea is to be independent of any single installer. This new config should be shared by every “detector program”.
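
Roughly what I have in mind for precedence, as a sketch: environment variable beats shared config, which beats whatever the detector found. The `WHEEL_VARIANT_<NAME>` variable name and the dict-shaped config are placeholders I made up; nothing like this exists yet.

```python
import os


def resolve_variant(name, detected, config=None, environ=None):
    """Return the values to use for variant `name`.

    Precedence: environment variable > shared config > detected values.
    """
    environ = os.environ if environ is None else environ
    config = config or {}

    env_key = f"WHEEL_VARIANT_{name.upper()}"  # hypothetical naming scheme
    if env_key in environ:
        # Comma-separated list; empty items are dropped.
        return [v for v in environ[env_key].split(",") if v]
    if name in config:
        return list(config[name])
    return list(detected)
```

Passing `environ` explicitly keeps this easy to exercise in a build script, which is where I expect overrides to be used most.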

I don’t think there’s anything special here - locking tools can use the variant detector stuff the same as installers. There is potential for override, though, or for locking with some matrix of values, such that a lockfile might have many variants to choose from, and only one gets used when creating an environment with the lockfile. This is somewhat like the multi-platform lockfiles that conda-lock does.
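
A sketch of the “matrix of values” idea, assuming a lockfile that stores one entry per variant combination. The entry layout and the wheel filenames are hypothetical, not any existing lockfile format.

```python
def pick_locked_entry(entries, current):
    """Return the first locked entry whose variant values all match `current`."""
    for entry in entries:
        if all(current.get(k) == v for k, v in entry["variants"].items()):
            return entry
    return None


# Hypothetical lockfile content: same package, locked for two variant sets.
locked = [
    {"variants": {"accelerators": "cuda12", "cpu": "avx2"}, "wheel": "pkg-cuda12-avx2.whl"},
    {"variants": {"accelerators": "rocm6", "cpu": "avx2"}, "wheel": "pkg-rocm6-avx2.whl"},
]

chosen = pick_locked_entry(locked, {"accelerators": "rocm6", "cpu": "avx2"})
print(chosen["wheel"])
```

Only the matching entry is used when the environment is created; the others just sit in the lockfile.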

Is an sdist available? If so, try to build it. If not, provide an error message that clearly states which package(s) did not have a matching set of variants, and also provide the current values for the variants.
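
Sketched out, with made-up names (`plan_fallback`, and a plain mapping of package name to “has an sdist”), that fallback could look like:

```python
def plan_fallback(unmatched, current_values):
    """Decide what to do with packages that had no matching variant wheel.

    unmatched: mapping of package name -> True if an sdist is available
    current_values: mapping of variant name -> value in effect
    Returns the packages to build from sdist, or raises if any has none.
    """
    to_build = [name for name, has_sdist in unmatched.items() if has_sdist]
    failed = sorted(name for name, has_sdist in unmatched.items() if not has_sdist)
    if failed:
        values = ", ".join(f"{k}={v}" for k, v in sorted(current_values.items()))
        raise LookupError(
            "No wheel matching the current variants and no sdist for: "
            f"{', '.join(failed)}. Current variant values: {values}"
        )
    return to_build
```

The error names every package that failed to match and echoes the variant values in effect, so the user can see what to override.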