Implementation variants: rehashing and refocusing

My ideal answer would be that users don’t have to care about variants and the “right” ones are selected automatically when they use pip or uv or whatever to install a wheel. There will undoubtedly be cases where users have to specify some input manually (whether because they encounter a selector bug or because they are doing something like using a lock file to select a variant other than what would be picked by default).

The installer should use the algorithm it’s already using to select candidates based on versions and tags. Then it should look at how many variants are available for a candidate, and use the variant rules to select the correct or best one. We should expand on what that part of the algorithm looks like, and maybe that’s what you’re asking?

When there is no compatible variant, for simplicity we could treat that case as the version being incompatible with the current host/environment. Although earlier Oscar had proposed feeding the variant information into the build tool when building that version from source, I think that complicates things even further and introduces more edge cases. I think we’ll get further by focusing on the use cases of making it easy for publishers to provide variants and for users to get the right one as a pre-built artifact.

2 Likes

Yeah, exactly. I think everyone’s ideal is “the installer does the optimal thing automatically” :wink: but I’m unclear on the specification for that behavior. Importantly, what kind of burden does this put on the installers?

Is there a fixed set of variants under discussion, or can any project introduce a new type of variant? If the set is fixed, is the decision tree for the installer well-defined and easy to express? If it’s not fixed…how the heck does this work?

1 Like

I made one proposal above - basically, represent the “available features” as already installed packages (i.e. installing those packages manually defines the available features), and then packages can specify “I require this existing package but if it’s not there, exclude me rather than installing it”.

Then what we need is the ability for installers to choose from a broader set of packages - they’d all have the same wheel name today, and that needs to change, but without forcing the metadata that’s currently embedded in the wheel name to change. So some new kind of index field that allows wheels to be ordered, but then excluded based on the new constraints. First one that isn’t excluded gets used. If they all get excluded, the installer errors out.
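
To make that concrete, here’s a rough sketch of the selection loop I have in mind (plain Python, with made-up data shapes; a real implementation would work against whatever the new index field ends up being):

def select_wheel(candidates, installed_features):
    # candidates: wheels in the order the index listed them, each a dict
    # with a hypothetical "requires_existing" key naming the packages
    # that must already be installed for the wheel to be usable.
    # installed_features: set of distribution names already installed.
    for wheel in candidates:
        if set(wheel["requires_existing"]) <= installed_features:
            return wheel
    raise LookupError("all candidate wheels were excluded for this environment")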

2 Likes

I think this is also going to be a good way to illustrate why new syntax and new metadata fields are going to be desirable from a backwards compatibility point of view. This may seem counterintuitive, but it makes sense if the following guiding principles are adopted:

  • the semantics of all existing fields remain unchanged
  • new syntax is only permitted in new fields (at least on PyPI, but potentially everywhere)
  • the end-to-end process of publishing and installing default variants should be unchanged

That way projects would be able to publish metadata for their default build variants that was still compatible with older client tools, even while adding the new build variant metadata for newer clients that could make use of it.

The design sketch below follows these principles. While we wouldn’t necessarily do things exactly the way it proposes, I think it’s illustrative enough to suggest that something along these lines could get us quite a long way (just as extras have).

Also, to be clear, it would be possible with the design sketch below to make installation fail by being overly specific with build variant dependencies. This isn’t massively different than the status quo, where overly specific version dependencies are likely to result in unresolvable dependency trees. I see that as a quality of implementation concern at the level of individual projects, and believe it can be handled the same way overly specific version dependencies are: either avoiding affected projects (if there are viable alternatives with better dependency management), or else discussing the problem with the projects involved, and perhaps offering them PRs to improve the situation (if they’re amenable to that).

Dependency declarations

Dependency declarations would change to allow a new (variant) field to appear between the distribution name and the [extras] field.

Depending on name and depending on name() mean different things: name will accept any build variant, while name() explicitly requires the default variant. It will primarily make sense to depend on name() when that’s just a shorthand for a particular named build variant declared as Default-Variant in the underlying project’s metadata. (Edit: inadvertently omitted this paragraph in the initial writeup)

Environment markers would also gain access to a new variant == ... clause.
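
To illustrate (hypothetical distributions and versions, spelled the way this sketch proposes):

# accept any build variant of numpy
Requires-Dist: numpy >= 1.26
# explicitly require numpy's default build variant
Requires-Dist-Variant: numpy() >= 1.26
# require the mkl build variant, but only when installing the mkl variant of this project
Requires-Dist-Variant: numpy(mkl) >= 1.26; variant == mkl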

Using a new dependency syntax and environment marker clause means that older clients will fail fast when given a set of requirements that depends on the client understanding build variants for it to be correct. The source metadata rules below are also designed such that older clients won’t be inadvertently exposed to the new syntax via transitive dependencies.

It also means that the installers themselves know they need to check for Provides-Variant metadata if the package is already installed in the target environment.

By contrast, if we tried to reuse the existing extras syntax and metadata fields for this purpose, we’d be forced to bump the major metadata version to keep older clients from being exposed to it. That would force projects to choose between supporting older installation clients and making use of the new build variant support, and we know from past experience that imposing such a choice is enough to kill a new capability before it ever gets anywhere (if anyone is fortunate enough not to know what I’m referring to: explore the fate of the old metadata 2.0 PEP, which languished for years before finally being put out of its misery by the publication of metadata 2.1, which actually came with a viable transition plan).

Metadata updates

Source metadata

New fields:

  • Provides-Variant: like Provides-Extra, but for build variants. To avoid confusion, default would be disallowed as a variant name (being reserved as the name of the otherwise anonymous variant indicated by an empty build variant string).
  • Default-Variant: normally a dependency on distname() would refer to the anonymous default variant; this field allows it to mean something else (e.g. numpy() might be equivalent to numpy(openblas))
  • Requires-Dist-Variant: allows the new name(variant) syntax in dependency declarations and the new variant == ... expression in environment markers (so older clients will never see the new syntax).

The idea here would be to allow projects to publish source metadata where their default variants continued to be backwards compatible with old installation clients, while still allowing new variants to be more specific.

NumPy, for example, might declare:

Provides-Variant: openblas
Provides-Variant: mkl
Default-Variant: openblas

While SciPy might declare:

# Default build variant continues to bundle its own copy of OpenBLAS and works with any NumPy
Provides-Variant: openblas
Provides-Variant: mkl
Requires-Dist-Variant: numpy(openblas); variant == openblas
Requires-Dist-Variant: numpy(mkl); variant == mkl

When it comes to combining build variants for optional features that aren’t mutually exclusive, I think it could reasonably be handled by allowing comma-separated lists anywhere a variant name is specified. When compared, these lists would be converted to sets first so the order didn’t matter (i.e. Provides-Variant: FeatureA,FeatureB would match a dependency declared as name(FeatureB,FeatureA)), and the check would be that the request is for a subset of the provided features (i.e. Provides-Variant: FeatureA,FeatureB would match dependencies declared as name(FeatureA) and name(FeatureB)). Combined variants would also satisfy variant == ... clauses for any of the features they contain. (Edit: clarified that the actual compatibility check would be for subsets, not equality)

However, if the distribution metadata doesn’t explicitly list a combination of features as supported, installation tools would assume those features are mutually exclusive.
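
As a rough illustration of that comparison (plain Python, purely for the sake of the example):

def variant_request_satisfied(provided: str, requested: str) -> bool:
    # Both sides are comma-separated variant lists; order is ignored,
    # and the request only has to be a subset of what the wheel provides.
    provided_set = {name.strip() for name in provided.split(",")}
    requested_set = {name.strip() for name in requested.split(",")}
    return requested_set <= provided_set

# variant_request_satisfied("FeatureA,FeatureB", "FeatureB,FeatureA") -> True
# variant_request_satisfied("FeatureA,FeatureB", "FeatureA") -> True
# variant_request_satisfied("FeatureA", "FeatureA,FeatureB") -> False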

Wheel metadata

Same fields as the source metadata, except:

  • Provides-Variant: only the variant matching the wheel is kept
  • Requires-Dist-Variant: any entry with a variant == ... clause that doesn’t match the wheel is omitted

Default-Variant is retained (if set) so name() dependencies against already installed packages can be checked by installers.

For backwards compatibility, wheels for the default variant would omit the variant portion of the wheel name (however that ends up being spelled).

Installation METADATA file

Matches the wheel metadata (and is the actual reason for defining the wheel metadata that way)

Installation process

If a dependency set ends up requiring a specific build variant of an already installed package, other than the one currently installed, the behaviour would depend on whether package upgrades/replacements were allowed or not (similar to what happens when an installed package doesn’t meet the determined version requirements).

If replacement is allowed, swap out the installed variant for a variant that satisfies the requirement set. If replacement is not allowed, fail the install and report the conflict.

4 Likes

This feels like a Do-What-I-Mean requirement, and I don’t think it’s feasible (at least in the general case). “Best” isn’t well-defined, since different use cases will want to optimise for different things. For example, if I’m building an environment that I intend to package up and deploy as a monolithic bundle to multiple machines with a variety of hardware, then I’m going to want lowest-common-denominator solutions that work on the widest range of systems, while if I’m running on an HPC cluster, then I’m probably going to want to wring every last ounce of performance I can out of that cluster and tailor everything to my specific situation.

Once the base metadata system is able to at least express which build variants are compatible with which other build variants, then it may be possible to build dependency auto-selectors on top of that now well-defined compatibility metadata to optimise selections for various purposes.

2 Likes

There are at least 4 concrete use cases that I’m aware of, and they each have different ideal user interactions:

  • CPU features - as Oscar mentions, these exist happily with multiple values in one “context” - where the context is likely a (virtual) environment. The value for one package does not affect the needs of any other package. There may still be mutual exclusivity within classes of variants. For example, even though an ARM CPU feature-set version and an x86_64 CPU feature-set version aren’t fundamentally incompatible, it wouldn’t make sense to have more than one CPU type in an environment.

  • GPU API/ABI version, such as CUDA version. This must have one value, or at least a range of compatible values, and there is a strong notion of incompatibility or exclusivity. If a package is already present in an environment with a particular value, then anything subsequently added needs to match that value or be excluded, as Steve describes. As a side note, the notion of changing a variant value in an environment is conceptually easy, but in practice is extremely difficult with transitive dependencies. This is one of the worst user experiences in the Conda ecosystem.

  • GPU microarchitecture, which in CUDA-speak is the SM or arch version, sometimes also called “compute capability.” This is likely more constrained, in that a given SM or arch must be supported all the way throughout the software stack, and must match the hardware that is present. Updating hardware would likely break an environment with some value here, and a graceful migration path would be ideal.

  • Standard interfaces, such as OpenMP and MPI, where there are multiple implementation options. Although it is technically feasible to have multiple implementations in one environment, if the symbols are not mangled, one implementation can unexpectedly be used in place of another, which can cause crashes or worse (unpredictable numerical results). For simplicity’s sake, these should be aligned and mutually exclusive within an environment. BLAS fits in here: it is a standard interface, and although implementations have several places where they do their own thing in an isolated way, there are still standard interfaces that can collide.

I should mention that while it is possible to mix implementations of standard interfaces with name mangling, which was key to @njs’s pynativelib proposal, we should aim to avoid these mixes and promote a single variant within the bounds of a given process.

In my mind, this was the job of the package that computes custom tags. That idea (using platform tags for the variant metadata and relying on the existing finder, anyway) has since been proven nonviable, I think, but the idea of putting this responsibility in the hands of an installable extension that handles the extra chunk of metadata still holds. In Steve’s words:

Something has to do that excluding, and in my mental model, that has been some (admittedly abstract, not well-formed) process that comes prior to the finder, which can use the existing resolver code, but fed by variant values and constraints instead.

The installer doesn’t do the optimal thing automatically - it provides a framework for how extensions can express exclusivity and priority, and then it gives the package builder and package consumer ways to use those extensions to express what they provide/desire. Packages don’t provide arbitrary variants, they use arbitrary variant packages (which is kind of splitting hairs, but it’s about defining variants as behaviors that live outside of any single package that uses them).

The fly in the ointment is that the package builder/user must somehow manually specify/add extensions, and the user experience here seems invariably suboptimal.

@pf_moore you keep on wanting a step-by-step worked example. Is that something you want to see in code? I can talk through any (or all) of the 4 examples above in greater detail. I think this discussion board is probably not the right medium for it, but I will start up either different threads here, or a Google doc, or a GitHub repo to work through questions and answers. The streaming conversations here are just very poor for iteratively revising a model in place.

2 Likes

The design is only going to be future-proof if variants can be arbitrarily defined by package publishers.

It would be the responsibility of anyone defining a variant to provide a tool for determining the selector value and setting that as a selection-time dependency for the target package.

I’ve proposed that selector values should be variables that can be used in the same expression syntax as requirements markers, and that we use that syntax for defining the rules for choosing a variant. There has been more exploration of using the tag system, instead, but I think the number of combinations of tags is going to make that untenable. Using tags also means that plugins modifying the list of tags supported by a system have to somehow cooperate for prioritization. Using variables and rules lets the package publisher provide that prioritization on a per-package basis, based on what they know about how they’ve built the wheels.

So, the PyTorch community might define a variant for “hardware_accelerator” and publish packages with “hardware_accelerator=cuda” and “hardware_accelerator=rocm” (these are over-simplified examples). They would then be responsible for building something that pip, uv, etc. could install (or require to be pre-installed) to answer “what is the value of ‘hardware_accelerator’ for the current environment?”.
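
A minimal sketch of what that selector code might look like (the function name, the probing approach, and the returned strings are all illustrative assumptions, not part of any proposal):

import ctypes.util
import shutil

def detect_hardware_accelerator():
    # A real selector would probe properly via driver/vendor APIs; this
    # just checks for a couple of well-known artifacts as a stand-in.
    if shutil.which("nvidia-smi") or ctypes.util.find_library("cuda"):
        return "cuda"
    if shutil.which("rocminfo"):
        return "rocm"
    return "none"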

We also know we have cases where the user needs to provide that value directly (lock files, build systems, etc.). When the value is provided directly, its value should override any discovered value (discovery could probably be skipped entirely, but it’s not clear if we can link the selector package requirement to specific variables).

Paul has expressed a desire that the tools be separate executables, and that they run in an isolated environment. I wonder if it would be easier if we required them to be pre-installed, and the installer exited with an error if it couldn’t find the tool? It’s less fully automatic, but maybe it’s an early version of the implementation?

Oscar’s original proposal had some more detail about how to prioritize for cases where multiple matches happen (as is the case with CUDA versions, for example).

1 Like

This seems like the best option if it can work. I could also imagine a centralized registry for variants would work–it would protect against typos, merge redundant names, etc. But then someone has to run it. That situation is reminiscent of PEP 725 external dependencies (which has a lot of overlap with this discussion).

Yeah, unfortunately I don’t think a central registry scales (to size or over time) because of the need for moderators to manage it.

The example you’ve given is for SciPy linking against a different underlying library. That’s one axis of variation. In a lot of the other cases, though, variants will have multiple reasons for being different from each other (GPU type & version as well as CPU optimizations, for example). So to select the right one, the tools need to know both of those values separately, and to pick the variant for which both values of a specific distribution match the current system.

Maybe that looks something like

NumPy:

Provides-Variant: math_library=openblas
Provides-Variant: math_library=mkl
Default-Variant: math_library=openblas

and SciPy

# Default build variant continues to bundle its own copy of OpenBLAS and works with any NumPy
Provides-Variant: math_library=openblas
Provides-Variant: math_library=mkl
Requires-Dist-Variant: numpy(math_library=openblas); math_library == openblas
Requires-Dist-Variant: numpy(math_library=mkl); math_library == mkl

and for something like torch, which has multiple axes of variation

Provides-Variant: accelerator=cuda cpu_optimization=v1
Provides-Variant: accelerator=cuda cpu_optimization=v2
Provides-Variant: accelerator=cuda cpu_optimization=v3
Provides-Variant: accelerator=rocm cpu_optimization=v1
...
Provides-Variant: accelerator=gaudi cpu_optimization=v1
...
Provides-Variant: accelerator=none cpu_optimization=v1
Default-Variant: accelerator=none cpu_optimization=v1
1 Like

Do the axes of variation actually need names in the general case, though?

For projects with only one axis of variation that they care about, they can use those values as build variants directly.

Higher level projects with multiple axes of variation could just prefix each one with a common string (e.g. numpy_openblas, numpy_mkl, cpu_avx, cpu_avx2) and declare the valid combinations.

When a particular feature is a superset of another, that could be declared in the publishing metadata as:

Provides-Variant: feature_baseline
Provides-Variant: feature_baseline,feature_extended

Note that I’m not disputing the existence of multiple axes of variation, I’m just wondering if they need dedicated syntax, or if emergent use of common prefixes on variant names will be sufficient.

This is somewhat headed down the train of thought that Steve had of just lumping new variants in with existing platform tags. In that thought, I was inclined to obey some kind of systematic order in forming the extended platform tags, such that the existing filtering by tag ordering would still be useful (and presumably, fast). Given the information Paul has provided that there is no recursion back to alternate variant selection after a failed resolver step, I now think we need to avoid such a composite take on things, and instead treat variant values as individual entries in an optimizable set of values.

I think the emergent use of common prefixes on variant names you propose may be sufficient, but I think treating the problem more generally as an optimization problem will be simpler by keeping them separate.

1 Like

The different values are orthogonal. Combining them into arbitrary strings binds them together in a way that either needs a tightly defined standard (like wheel filenames) or else every package author is going to have to publish their own unique selector package, so they can own all of the definitions of the variants.

If we use separate names for the unique values, (edit, typo) making the expression parseable, then the code that runs locally to identify what GPU is present and the code that runs locally to identify what level of CPU optimization is supported can be completely independent. Ideally, the ecosystem would end up with one “GPU identifier” selector package and it would be reused by publishers, as would the “CPU optimization” selector package. That’s not a guaranteed thing, but it would be nice if it was possible.

(edit, adding example)

Suppose we have 3 packages, each with compiled C extensions. Those extensions might be compiled for CPU optimization and for hardware acceleration like GPUs. Package A only supports CPU optimization, so it expresses variants using that value. Package B only supports GPU accelerators, so it expresses variants using that value. Package C supports both, and expresses both.

If the variants are arbitrary strings, then each publisher needs their own selector package to express only the axis/axes of variation for their package. If the variants are described as an expression made up of named values, the ecosystem as a whole could exist with only 2 selector packages, one for each axis, and all 3 publishers could reuse it.

Now, perhaps we would end up with 1 selector package that knew about both CPUs and GPUs and it could express the combination of those tags. Then add another package like SciPy that needs the math library axis. That means 2 (or more) variations of each (or some subset) of the existing tags from that selector package combined with the accelerator. So, combining them also introduces the combinatorial effect that trying to add variants to the existing tags structure would have.

Keeping each value standalone in the variant definition avoids both issues.

2 Likes

I think of it more as a “do your best to pick the right things” requirement. I agree the problem space is hard. I don’t believe we have to solve all of it to make useful improvements.

We definitely need the escape hatch of the user explicitly specifying information to override dynamically determined values, up to picking the exact file to install. If we assume that we have the escape hatch, and focus on solving for the basic case that was originally described, of someone just wanting to install one of these libraries or applications, we could make a huge improvement over what we have today.

3 Likes

OK, I can see that. Building on the design sketch I posted earlier, I’d suggest allowing each project to have a default anonymous axis of variation, while also allowing the “category=variant” syntax.

In environment markers, “variant == …” would cover the default variant axis, while “variant.category == …” would cover named axes.
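
Adapting the earlier SciPy example to show both forms (still placeholder spelling):

Requires-Dist-Variant: numpy(openblas); variant == openblas
Requires-Dist-Variant: numpy(math_library=mkl); variant.math_library == mkl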

1 Like

I think the key thing for me is that the various proposals or ideas always seem to gloss over critical details, and it’s making it difficult (for me, at least) to get a good feel as to how things would work in practice. So we’re ending up with a variety of similar but distinct proposals, all with different gaps.

You explained the 4 use cases you know of, and that helped, but if I try to work through in my head how any of the proposals would handle all 4 of these, things start falling apart. For example, we got quite deep into the numpy/scipy/BLAS scenario a few messages back, but if I look at what was being said there, I can’t see how any of it would work for a CPU feature example, where there’s no need for tight coupling, and in many (most) cases, any variant would be fine, it’s just a matter of getting the one that’s best optimised for my system.

So what I’m really looking for (and maybe the problem is that it’s too soon to have this) is for a proposal to be described in sufficient detail that it covers which parts of the machinery it provides are needed for each of the 4 cases you mention, and for the practical issues like “who writes the code to detect the CUDA version, how does that code get onto the end user’s PC, what happens if it’s not there, etc, etc” to be explained, and not just left as background that’s to be filled in later (because I fear that when “later” arrives, we’ll be committed to a proposal, and finding out that it’s unworkable at that point would be problematic).

What I’d actually like is for someone to propose something (and to an extent, I don’t care what) in sufficient detail that we can debate implementation details and difficulties. Yes, I pre-emptively stated that I don’t like an in-process hook that requires isolation the way PEP 517 hooks do (because experience implementing PEP 517 showed that it was harder than we’d thought). I’m certainly not insisting on hooks as executables - that has its own problems around how we pass structured data, and how we locate hooks. I think what’s important is that we get a sufficiently detailed proposal on the table to allow us to point out issues and pitfalls without that immediately triggering a re-think of the approach. It might be too early for that level of detail, but conversely I’m not sure that glossing over the details is productive at this point either.

This is a perfect example of what I mean by “provide a worked example”. Can you show what tools that determine the value of selectors for CPU features, GPU ABI, GPU microarchitecture, and OpenMP/MPI interfaces would look like? To the level of pseudo-code and application/library structure at least? Those are real examples of use cases, and someone is going to need to be able to write the selectors - if the people writing the proposal can’t do so, what hope do 3rd party developers have?

The way I’m hearing things, variants aren’t owned by individual projects, they are a shared context that exists independently of the projects that rely on them. If anyone “owns” the variant name, it’s the maintainers of the tool that detects if that variant is present. Whether we have a central registry, or an organically developed set of well-known names, isn’t that important. What is important is that there needs to be a clearly identifiable authority for what the variant “openblas” means. Maybe a project like numpy acts as its own “variant issuing authority”[1] but there still has to be one owner for a name.


  1. to be excessively grand about it :slightly_smiling_face: ↩︎

2 Likes

What is the benefit of codifying ‘variant’ as a special name? How does the installer decide what the value of ‘variant’ is? Remember, for most cases we don’t want the user to have to tell the installer, we want it to be a discoverable thing. How does the installer know what selector package to use to do that discovery?

Does the dotted syntax for the category names make parsing easier somehow?

I’ve been assuming what may be a rather naive implementation model, where the installer installs all selector packages required by the target package, then invokes them all to get all of the values they provide (possibly more than one per selector package), and then feeds all of those values into the marker expression evaluation logic to evaluate rules in order. When a rule evaluates to true, that file would be selected.
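
In rough pseudo-Python (all names hypothetical, just to pin down the shape of the model):

def choose_file(files, selectors):
    # selectors: callables that each return a dict of variable values.
    # files: (rule, filename) pairs in the publisher's priority order;
    # each rule stands in for a marker expression over those variables.
    variables = {}
    for selector in selectors:
        variables.update(selector())
    for rule, filename in files:
        if rule(variables):
            return filename
    raise LookupError("no file matched the detected variant values")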

For the accelerator detection, the code would have to iterate over the list of devices visible to the OS, identify any it knows about, and then turn those into the defined values for the variable. I’m not actually a hardware expert, so I don’t know exactly how that would be implemented. But I do know there are ways to ask an OS what devices it has, and I know that it’s possible to write code to do that, because if you install the wrong one of these optimized wheels you get an error about a missing device. I trust that the details are solvable.

For CPU optimization features, it would do something similar. Look at what CPU it’s running on and use a table built into the code (or try to use some instructions to see what it’s capable of) and map the results to the well-defined strings that the author of the selector package documents for wheel publishers to use.
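
As a sketch of that mapping (Linux-only and deliberately oversimplified; the level names just echo the earlier examples):

def detect_cpu_optimization():
    # Read the CPU flags and map them to a coarse optimization level.
    try:
        with open("/proc/cpuinfo") as f:
            flags = set()
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
    except OSError:
        return "v1"
    if "avx512f" in flags:
        return "v4"
    if "avx2" in flags:
        return "v3"
    if "sse4_2" in flags:
        return "v2"
    return "v1"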

Library compatibility detection between 2 packages is harder. I could see the selector examining what’s already installed, but that would be more challenging if it runs in an isolated environment.

Yes, this. Again, ideally that selector package/tool could be used by multiple “real” packages, to avoid re-implementation. In practice, we’ll have to see if people are willing to do that.

The author of the target package will specify which selector to use (whoever its author is), and that choice couples their package to the metadata produced by the selector package. So there is no need for a central authority, or large-scale agreement on names. There only needs to be an agreement between the author of the target package and the author of the selector package.

If the community creates 50 different packages for detecting which CPU optimizations are used, that shouldn’t confuse anything because only 1 will be specified as a selector package for any given target package, and therefore only the metadata from that single tool will be used to choose the wheel for the target package.

(edit for an additional example) Think of this like the build backends. Each tool does similar things but takes inputs in different ways. It’s confusing for package authors to have to choose one, but once they do, they just have to follow the instructions for that tool to get their source turned into a wheel. The end user of the package doesn’t have to care about that at all, though, only those of us trying to rebuild them. :slight_smile:

So the variable has multiple values? Or the detector returns a list of compatible selectors? I wasn’t so much thinking about the technical details of detection (as you say, those are likely solvable, although may be difficult in pure Python), more about the API of the selector, how it’s called, etc.

This is precisely the sort of detail I was unclear about - does the selector require access to the environment where things are being installed? If so, does it need to be installed in that environment itself (which has some pretty messy implications if the selector has dependencies of its own) or can it be installed in an isolated environment and then query the target environment? Does it matter what order things get installed, and is it necessary to re-query every time (so that when installing numpy and then scipy, scipy can see what numpy variant is installed[1])?

How would a selector handle changing the variant names it managed, in a backward compatible way? To give a somewhat frivolous example, if Intel rebranded MKL as AILib (:wink:) would it be possible to rename the variant from mkl to ailib? I suspect this isn’t any worse in practice than renaming or forking a popular package, but that is something that occasionally happens.

Thanks, that helps. Although people complaining about all the choices and confusion around build backends isn’t something I think we should be consciously emulating, so maybe we need to think a little about whether there’s a less confusing way? And (again somewhat like build backends) I imagine end users will get exposed to variants, like it or not - someone who can’t install a package because there’s no compatible variant is likely to be frustrated if they can see that wheels exist, and weren’t picked - we get that right now with packages that don’t have wheels for the user’s Python version, for example. And as a pip maintainer, I can easily imagine having to tell people whose packages won’t install that this isn’t an issue with pip, it’s an issue with the selector (maybe not detecting the user’s environment correctly) and they need to report the issue to the selector. That’s not the best UX (we see it a lot with build backend errors) so again it would be good to minimise the need for it.


  1. Note that this is incompatible with how pip actually does installs at the moment ↩︎

1 Like

It is possible for multiple cards to support multiple different levels of optimization, yes. So the selector code may return an ordered set of values for a given variable, and the installer should use that order as a precedence preference. That way the selector can prefer the most optimized code, but if the packager didn’t supply that then a later value can provide a match using a less optimized version. The examples given elsewhere for both CUDA and CPU optimization levels would use this, for example.
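
A sketch of that precedence handling (data shapes invented for the example):

def pick_by_preference(preferred_values, available_variants):
    # preferred_values: values from the selector, most preferred first,
    # e.g. ["v3", "v2", "v1"] for CPU optimization levels.
    # available_variants: mapping of value -> wheel the publisher built.
    for value in preferred_values:
        if value in available_variants:
            return available_variants[value]
    return None  # nothing matched; fall back to the default variant or error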

I think it would be possible for a selector installed anywhere (pre-installed system wide, installed by the installer into an isolated environment, whatever) to probe the target environment, if it’s allowed access. In the case of library compatibility, for example, it could probe the .so file directly to find what it’s linked to.

Whether that’s a good idea is another question, and one more of policy, I think, than a technical decision. Maybe someone more familiar with the SciPy case can expand on how they think that could be made to work.

It matters if one package is installed before another, which was one of the use cases for this. If NumPy isn’t installed, I don’t know how the selector would indicate which version of the package should be selected.

Perhaps this library version compatibility case is a different class of variants? Maybe the user must always specify the backend? I’m definitely not as familiar with this case, so I don’t know what the best user experience is.

I think this would be handled in the same way as other requirements. Ideally the selector code would be backwards compatible and emit all of the various brand names. Package publishers would need to understand which names to use, and if they choose the new ones to ensure they specify a minimum version of the selector that can emit that name. If the selector author broke compatibility at some point by removing the old names, that would break installation of packages relying on the selector until they updated that minimum requirement and the metadata they use when publishing (or set a cap, I suppose).

I understand that it’s more likely to look like the installer is having the issue, but I don’t know how much a standard can do to eliminate that problem. Reporting detailed reasons for not selecting packages would help at least point to the cause of the problem not being the installer itself.

Sure, users will be exposed to them. Again, detailed reasons in output messages would help. Something like “File X was not selected because it does not have compatible settings accelerator=a cpu_optimization=b…”.

Maybe part of the selector API standard needs to be a way to ask for a message to tell the user for when package selection fails? That way pip, uv, etc. could add to generic error messages with helpful text written by someone who knows about the selector space and can include messages like “you might not have installed your device driver” or whatever.

Encouraging package authors to publish a “default” variant might help, too, since it would mean there would always be some option. Is there a way to make that a requirement as part of the standard? I know that could be made to work in some of the hardware selection cases, but I don’t know about that library compatibility case with SciPy.

And encouraging the ecosystem to share selector implementations, as much as possible, will go a long way because then there will be much more focus on making that code robust. My gut says some of the selectors based on hardware type are going to be complex enough to discourage casual re-implementation.