WheelNext & Wheel Variants: An update, and a request for feedback!

@jonathandekhtiar In reading through the PEP-to-be, this stands out:

Dependency Management

  • How should dependencies be expressed when a package depends on a variant of another package?

I am not understanding the intended relationship between package variants, the install-time matching mechanism, and the solver. These things are coupled.

The matching algorithm that the PEP describes says it’ll search for the “best” compatible variant available, and that it’ll fall back to, e.g., a CPU version if a GPU version is not available.

Doesn’t this assume that if I have several packages that support variants in my DAG, any combination of variant configurations for those packages must be compatible? And is that always true?

Say the CPU and GPU versions of some package are not binary compatible, and you have to use the CPU version of one with the CPU version of the other, and the GPU version of one with the GPU version of the other. If you have packages A → B, B releases a GPU variant, but A doesn’t have one yet (maybe b/c it first needs to be tested with C’s?), won’t the matching semantics described in the PEP give you a bad installation, with the CPU version of A depending on the GPU version of B? I would think you’d want to be able to specify to use CPU versions of both if GPU versions of both are not available. I could think of other combinations here that might fail if, say, wheels aren’t released with GPU variants in lockstep.
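
To make that failure mode concrete, here is a toy sketch of independent per-package “best variant” selection. This is not the PEP’s actual matching algorithm, and the preference order and package names are invented:

```python
PREFERENCE = ["gpu", "cpu"]          # hypothetical priority order from detection

AVAILABLE = {
    "A": {"cpu"},                    # A hasn't shipped a GPU variant yet
    "B": {"cpu", "gpu"},             # B has both
}

def pick_variant(package):
    """Greedy per-package choice: the first preferred variant that exists."""
    for variant in PREFERENCE:
        if variant in AVAILABLE[package]:
            return variant

chosen = {pkg: pick_variant(pkg) for pkg in AVAILABLE}
print(chosen)  # {'A': 'cpu', 'B': 'gpu'} -- a mixed install, even though
               # A(cpu) may only be compatible with B(cpu)
```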

I think this kind of decision really needs to be made in the solver along with version selection, but that would mean you’d need to modify pip’s solver to handle variants, and you’d need to get the provider plugins up front to figure out your target arch (so the solver would know). The second problem seems easier than the first.

4 Likes

I’m afraid that if you’re looking for a 1:1 solution that would work for Python, there simply isn’t one. The Python packaging ecosystem is different from other ecosystems facing similar problems, notably because:

  1. It has very high stability / backwards compatibility expectations. Ideally, we are looking for a solution that would continue to work for years to come, with few changes required. What we really want to avoid is introducing a breaking change for a “local” solution that may turn out to be insufficient in 2-3 years and require further breaking changes.
  2. It has very little centralization in terms of maintenance, with lots of stakeholders maintaining their own packages with little central coordination. This is very much unlike Linux distributions or conda-forge where generally others can update your packages when necessary.
  3. And finally, it also has very high compatibility requirements for old package versions. Unlike your average distribution which generally expects only a subset of recent package versions to work, on PyPI we generally want wheels published in the past to work too; and often with no ability to “fix” them the way, say, you could add a new build of an old package version to conda-forge.

What makes sense here is to evaluate existing solutions within their specific context and use case, and learn from their successes and deficiencies. However, in the end we are to some degree sailing uncharted waters, and we need to explore new ideas that are better suited to wheels and PyPI.

3 Likes

Some critical context that appears to be assumed in that doc is that conda will choose packages based on what’s already installed, whereas most installers in PyPI-land will try to change what’s already installed to match the best available package.[1] So it doesn’t (currently) quite work the same to tell PyPI-land installers “resolve against this imaginary __cuda package” because they’ll (currently) decide to upgrade __cuda :wink:


  1. Requires-Python is the closest thing available. ↩︎

There’s two things I don’t think this response adequately captures.

  1. Why do other ecosystems not need 15+ different axes (with the assumption that there will be more in the future), requiring selectors?
  2. How does this impact other concerns beyond performance targets, such as security, or the ability for users to reason about this and troubleshoot when a download doesn’t go as they expect?

I can’t say I have answers to that which match why certain decisions were made, but I do think that this is an unreasonable amount of build variation to support in prebuilt binary packages, and that any gains from this, rather than from picking a few different support levels and committing to updating those when there is demand for them, are not worth the costs as presented.

If we take just the “there are at least 15 dimensions” bit at face value, and say that each dimension has only 2 values or is unused (which is far fewer than people have suggested), that’s still over 14 million combinations in the worst case, even at the initial minimum set of supported dimensions.
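
For what it’s worth, the arithmetic behind that number holds under that reading; a quick sanity check, treating each of the 15 dimensions as independently being either unused or one of 2 values:

```python
# 15 dimensions, each either unused or one of 2 values => 3 states per dimension
states_per_dimension = 3
dimensions = 15
print(states_per_dimension ** dimensions)  # 14348907 -- a bit over 14 million
```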

This needs to be pared down; that is not a reasonable explosion of potentially downloaded files. It could also be abused: someone could upload one malicious binary release matching a potential victim’s configuration while all the other files are built normally, and with this amount of build variation I don’t think the ecosystem would catch that happening.


It also has a cost on people’s expectations.

People have said things like “it isn’t download+unpack that’s changing, we’re just changing how wheels are selected”, but that misses the point. Downloading the most appropriate wheel now involves running code provided by the package to be downloaded, whether that code runs in the download step itself or not. We are regressing on something wheels improved.

Currently, while restricting to wheels and excluding sdists, people have a reasonable assumption of where remote code may be executed and where it will not. The authors have presented this seemingly presuming that it is okay to change that, and that it should be opt-out, not opt-in. I cannot agree with this.

Both of these concerns are addressable if we can agree on a handful of community supported hardware target levels based on real hardware configurations that people have demand for, and commit to refreshing that set of configurations as common hardware changes. Falling back to the closest supported target while still allowing the user to build from source to maximize performance more closely aligns with other ecosystems, and there is likely a reason for that.

The cost of the combinations here is not just on those building and serving the artifacts.

9 Likes

@pitrou I wrote that too quickly and mixed things up, sorry about that. IIRC there was a distro handling this through multiarch but I cannot find it anymore (or I was just wrong). The actual thing I should have pointed at is glibc-hwcaps, which is newer (glibc >=2.33, 2021). That’s what some distros are using for package-level switching between psABI levels. See, e.g., here for OpenSUSE, here and here for Guix. It’s possible for Debian as well, see this overview and this blog post. @mgorny already posted how Gentoo does this kind of package-level switching. For Fedora it seems in the works, see this proposal. Fedora, RHEL 9 and SUSE also moved to x86-64-v2, dropping support for v1 CPUs completely. Ubuntu didn’t yet, but is experimenting.

I hope that clarifies the current status. I’ll edit my previous post to correct the mistake and link to this summary.

1 Like

There’s two major patterns that I’ve seen. One is the virtual package approach that @h-vetinari pointed to. This is the best example for hardware-based dispatch, because the virtual packages are immutable, and they rely on detection of a property that is external to the package manager. This detection is the part that requires execution of code that people are concerned about.

The other major pattern is that the user specifies some particular variant value, and that then becomes a constraining property of the environment. This is used when there’s more than one implementation choice for a package, such as MPI, where the environment should use only one implementation. This does not require execution of code, other than the user needing to specify the variant to the solver. The state is encoded in which packages are installed. Conda-forge has documentation for how MPI is implemented. It is a convention, not a standard, but it’s a long-established pattern that is a good reference.

As Steve pointed out:

This precludes or at least complicates the second major pattern. I don’t think __cuda is a concern, because it is not a package that PyPI-land installers would directly control. Anything like keeping the MPI implementation choice consistent would definitely require new behavior for PyPI-land installers. My humble opinion is that solving the environment completely, including constraints for all installed packages, would be helpful to users regardless.

One design that was discussed along the way was to not do install-time detection of “virtual packages” or the equivalent. Instead, the user would manually choose to install some number of hardware detection plugins, and the user would manually run a tool that used those plugins to detect, collect, and write the variant information to a file. The PyPI-land installer would then use this file to choose variant packages. This file would also be human-editable. I haven’t been following closely enough to know why this approach was discarded, but I think it is a larger ask of users than the dynamic approach. Again, it comes down to user experience (minimizing necessary end-user action) vs. (reasonable) concerns about code execution and the principle of least surprise.

5 Likes

No single package needs 15 axes. The entire system may have that much variance, but any single package will use a much smaller subset. Conda has had this since 2017, and it has not exploded. Most often it is not a means of supporting several values of several dimensions, but rather a way to differentiate packages that are built against different dependencies. For example, when the ecosystem is transitioning from one major version of a library to another, a variant allows “stable” environments to use the newer build with an older dependency, while also providing a path for cutting-edge environments.

This hasn’t been a concern for many PyPI-land users because it’s mostly a problem with binary compatibility. The AI/ML/data science space feels this much more keenly than people who do not rely on compiled extensions.

If you try to collapse some set of libraries or virtual packages or whatever (“a handful of community supported hardware target levels…”), you end up with something like manylinux. Manylinux isn’t bad, but it does not capture the inherent complexity of packaging; it requires hacks, as discussed ad nauseam in prior DPO threads, to shoehorn in additional dimensionality. “Community supported hardware target levels” will never be complete, they put the maintenance burden of defining those levels on someone like PyPA, and they do not provide sufficient descriptive flexibility to support the use cases that I’ve seen with Conda.

4 Likes

ISTM that these approaches are compatible, as long as we specify the client-side variant preference configuration file format. You’re an experienced user who knows exactly what you want in your environment? You can hand-write this config file and no variant plugins are needed.

You’re a beginner who doesn’t really understand the complex software stack, and you just want to start your deep learning journey and trust your toolchain or environment to keep you safe? Let the installer run with its plugins to generate that variant description file automatically. Or run that script out of band to generate the file, and the installer can just use it without having to run any variant plugins. I could imagine such a script using PEP 723 inline metadata for an easy experience[1].
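
As a very rough illustration of what such an out-of-band script could look like: the property names, file name, and file format below are all invented, and the plugins are only hinted at in a comment; none of this is specified by the proposal.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     # whichever detection plugins the user opts into would be listed here;
#     # this sketch hard-codes values instead of calling any real plugin
# ]
# ///
"""Detect hardware properties and write them to a human-editable file."""
from pathlib import Path


def detect_variants() -> dict[str, str]:
    # In a real flow each value would come from an installed detection plugin;
    # hard-coded placeholders are used here instead.
    return {"x86_64_level": "x86-64-v3", "gpu": "none"}


lines = [f'{key} = "{value}"' for key, value in detect_variants().items()]
Path("variants.toml").write_text(
    "# Generated by the detection script -- edit by hand if it guessed wrong.\n"
    + "\n".join(lines)
    + "\n"
)
print("Wrote variants.toml")
```

The installer would then only ever read that file, with no plugin code running at install time.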


  1. and educational too, especially if that tool included some comments to tell you why it chose the variants it chose ↩︎

4 Likes

Yes, they are compatible. The sticking point is the question of whether the installer is allowed to run these without the user’s explicit approval.

4 Likes

Can I ask, what evidence would you want to see that would change your mind? Specifically, I’m wondering if there’s evidence (or elaborations RE why the current solutions are insufficient) that should be included in the PEP’s motivation section that you or other readers would find compelling.

4 Likes

This question concerns me. The selector process has been presented as only being involved in the step where the “correct” wheel for version X.Y of package A is picked. But I’m now wondering whether that’s too simplistic.

Suppose we request numpy and scipy. And suppose that there are BLAS and MKL[1] variants of both, and the BLAS and MKL variants are incompatible (so you can’t install BLAS numpy and MKL scipy). Now suppose scipy only publishes the BLAS version, but the MKL version is preferred if available. Let’s assume the user’s environment is compatible with both BLAS and MKL.

(Side note - if any of the above makes no sense for numpy/BLAS/MKL, imagine a different selector that works like this. I’m trying to keep the example from being too abstract, but I don’t want the underlying problem to be ignored because “MKL and BLAS don’t work like that”).

Now, the resolver comes along and selects numpy to consider first. It picks the MKL version of numpy. So far, so good. Now it looks at scipy, and finds that the only scipy wheel is a BLAS one. There’s nothing in the dependency metadata that says this isn’t compatible with the numpy version selected, so how is the resolver (as opposed to the wheel selector[2]) to know that there’s a problem here? And even if it does, what can it do? The resolver only has the ability to backtrack to an older version of numpy, not to try a different wheel.

And here’s a second potential issue. Suppose the user wants to install A==1.0 and B==2.0. A comes in BLAS and MKL variants. And suppose that the MKL variant depends on B==1.0, but the BLAS variant has no dependency on B. Once again the finder will pick the MKL A, and then report that no resolution is possible, because two conflicting versions of B have been requested. This again isn’t fixable without the resolver knowing about variants, and taking them into account in its backtracking process.
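
Here is a toy sketch of that second scenario; the wheel metadata and preference order are invented purely for illustration, and this is not how any real resolver is implemented:

```python
user_request = {"A": "==1.0", "B": "==2.0"}

# Hypothetical metadata for the two wheels of A 1.0 -- the standards allow
# wheels of the same version to declare different dependencies:
a_wheels = {
    "mkl":  {"requires": {"B": "==1.0"}},   # preferred by the selector
    "blas": {"requires": {}},               # no dependency on B at all
}
preference = ["mkl", "blas"]                # invented plugin-provided priority

chosen = next(v for v in preference if v in a_wheels)
requires = a_wheels[chosen]["requires"]
conflict = "B" in requires and requires["B"] != user_request["B"]
print(chosen, "conflict on B:", conflict)   # mkl conflict on B: True
# The resolver sees B==1.0 (from A's chosen wheel) and B==2.0 (from the user)
# and fails, even though the "blas" wheel of A 1.0 would have satisfied both.
```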

I should note with the second example that the standards allow two distinct wheels for the same version of a package to have different metadata (including dependencies). I believe that uv makes a simplifying assumption that this doesn’t happen in practice, and while I agree that’s a sensible engineering tradeoff for a tool to make, it’s not something we can do in a standard. So we need to consider how the second example should work, no matter how unlikely we think it might be in practice :slightly_frowning_face:

A final note here - I’m sensing a certain level of frustration from some of the proposal authors, that people are trying to find fault with the proposal. I hope that isn’t the case, but I can see how it might feel that way after all the work that’s been put into developing this. Please understand that I’m not trying to wreck the proposal here. Quite the opposite - my goal is to make sure that whatever standard we end up with is as robust and comprehensive as we can make it. But in order to do that, I want to ensure that all the possible edge cases have been considered and addressed - which means that I hope the proposal authors can take these issues seriously, and not simply dismiss them as “never going to happen in practice”[3].


  1. Excuse me if I’ve got the terminology wrong. ↩︎

  2. the finder, in pip’s terminology ↩︎

  3. If I had a penny for every issue that I’ve seen that was never going to happen in practice… ↩︎

10 Likes

I think they are not just compatible but necessarily go together. You can have explicit selection without automatic selection, but I don’t think you can have automatic selection without providing the option for explicit selection. Especially since, as noted above, some choices like OpenBLAS vs MKL cannot be made automatically.

It seems to me that some way of selecting the variants explicitly is a prerequisite for any automatic selection mechanism. Maybe it makes more sense to focus on getting the pieces needed for explicit selection first.

3 Likes

Do they need to be complete though? Realistically, if we said the dimensions we were adding were 3 GPU library versions for each of ROCm and CUDA, x86-64-v1 through v4, and 4 BLAS implementations, plus the logic for how to prefer them, wouldn’t just that matrix cover the vast majority of cases not currently covered? We can pretty reasonably help a large portion of users at a much lower cost to the ecosystem if we can limit the scope more.

I don’t think there’s any reasonable way to reconcile the “open index, open selectors, per package variants” version here with existing concerns and expectations.

With some of what has been presented about BLAS, and with packages depending on other packages, I’m not sure there’s a meaningful difference here. The entire system having it, and it being per package, means either dependency hell or requiring resolvers to backtrack over this and rerun selectors as they eliminate options.

1 Like

Change my mind about what? That I’d be OK with the current state of affairs? Short of me needing to do far more advanced work than I currently do, I don’t see that happening.

I think you’re missing my point though. I’m saying that as a pip maintainer, my personal experience doesn’t give me a basis for deciding what makes a good UX for pip users who need this functionality. And I think the other pip maintainers are in a similar situation. That’s not a problem as such - we cannot expect maintainers of a tool as general as pip to understand every aspect of their users’ business[1]. But it’s why I want standards to be as explicit as possible over what’s needed - as a substitute for the experience the maintainer group(s) lack. Because my responsibility as a pip maintainer includes keeping the code base maintainable, and part of that is not including extra complexity when I don’t see a need for the functionality it introduces.

As the (potential[2]) PEP delegate, my position is different. In that context, I want the standards to say what’s needed because if everything is a tool choice, I foresee yet another endless UI debate on the pip tracker, with no usable functionality available to users for ages. And I don’t think that’s a good result for the ecosystem or the user community.

It’s hard juggling two roles when one of them forces me to be annoyed with myself in the other one :roll_eyes:


  1. Astral may be a special case - you have money :slightly_smiling_face: My point applies for volunteer projects, though. ↩︎

  2. everything could change if the Packaging Council proposal gets approved, of course! ↩︎

6 Likes

It isn’t too simplistic I think. To answer @tgamblin’s question: that is not a problem that we intend to solve here. I’m going to quote from way higher up to answer why:

Steve has got it right here I think.

To illustrate with the numpy-mkl + scipy-openblas example:

  • Both packages should depend on the same blas-provider that defines the blas property and mkl/openblas values for it,
  • If numpy and scipy are installed at the same time with a variant-aware installer (e.g., [uv] pip install numpy scipy), then independently they’ll end up with the same variant choice (both mkl or both openblas)
  • If either the installs are separate and in the meantime something changed (e.g., a new blas-provider was released which changes the priority around) or the user somehow forces choosing numpy-mkl and scipy-openblas, then that’s what that user gets in their environment.
    • That still does not break things for the BLAS example: those variants are not actually mutually exclusive (just suboptimal, just like today)
  • For another hypothetical case (a realistic example from anyone would be great) where variant builds are indeed mutually exclusive: the solution is to construct the environment at once, ideally declaratively, which will avoid the problem.
    • The conda ecosystem has that same problem (to a lesser extent). It was always possible to get a suboptimal solve when doing multiple sequential conda install xxx invocations; for a very long time the advice has been to avoid that. At some point, environments degrade if you continue to install packages one by one - just don’t do that.

Taking the variant info of all already installed packages into account for resolving new variants has not been implemented, and isn’t seriously considered. It’s something that can be done at the installer level of course - a uv add xxx will resolve the whole environment I believe, and in some cases that may get one a more optimal solve. But it’s not necessary to improve on the status quo.

2 Likes

Thanks for the reply. I’m going to have to think about it some more, as I’m not yet completely convinced that it’s a problem that currently exists (and it’s a very different situation for the proposal to choose not to solve an existing problem, versus to not solve a problem that the proposal itself introduced). But I’ll come back with more detailed comments when I’ve thought things through some more.

I think I was focusing too much on my personal use cases here. To expand a little more, the evidence I’d want to see is issues raised by pip users, ideally on the pip tracker, that show that missing functionality in pip in this area is making things harder for them. Right now, as far as I can see this is a non-issue on the pip tracker - the only thing I can find that is even remotely related is the question of providing a way to prioritise indexes. That’s more general than this issue, but one of the cited use cases is more robust selection of the right index for torch packages. And the impression I get from the pip issue isn’t that the current approach of using an index per GPU configuration needs replacing, but rather that the UI around selecting the right index needs improvement.

My view is that we (the pip maintainers) already have far more feature requests than we have bandwidth to handle - not just to implement, but even to review PRs. So we need to be very careful to direct our efforts towards features that are the most important to our users. And there’s no suggestion at the moment that wheel variants (or any change to the status quo around fine grained binary choices) are a high priority to our users.

It’s entirely reasonable to argue that without a practical way to distribute multiple variants of wheels, pip users won’t have a need to install them. But that still means that we’d be directing effort at something that might be useful to our users in the future, rather than working on something that’s definitely of use to our users right now. And that’s a hard trade-off to justify. (To be fair, even contributing to this discussion is using time that could otherwise be spent on new pip features, so it’s not a black and white choice…)

1 Like

I wonder how much this is influenced by users just switching to other tools when they realize pip doesn’t do what they need. That’s how I switched to conda all those years ago. It didn’t occur to me to write an issue on the pip tracker to request that they solve a large ecosystem problem that wasn’t under their control.

11 Likes

This seems like it’s kinda broken? The world we already live in is one where multiple wheels for the same version can be compatible with the target environment, and each of those wheels can have different dependencies.

I may be missing something here, but how do you guarantee that they will do this, unless all simultaneously installed packages are released in lockstep, with all possible variant values, without removing or adding any values?

Suppose above that scipy decides to remove MKL support and makes a release, independent of numpy. The user specifies that their favorite BLAS is MKL, but if uv pip install numpy scipy selects variants independently, numpy will come back with MKL enabled and scipy will come back with openblas, right? How do you deal with variant values (or variants) being added or removed over time? We do see issues like this in Spack, but we can specify these types of dependencies.

Put differently – I’m not sure this is an issue only when considering installed dependencies.

1 Like

Let me prefix this by saying I’m also not trying to wreck this PEP, but I have a few more concerns based on how our packages have evolved in Spack.

The main one is here:

BLAS support isn’t really a property, it’s a dependency. You could argue it’s an option on a package (enable blas, or +blas as we’d say in spack world) plus a link-time dependency that’s required when the option is enabled. In Spack we define both – one’s a package in the DAG and the other is a variant on the dependent(s) of that package. So if you ask for foo +blas the solver picks a BLAS for you. If you ask for foo +blas ^intel-mkl, you get intel-mkl and so must everyone else.
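
For readers not familiar with Spack, that pattern looks roughly like this in a package recipe; the package and option names below are invented, not copied from a real recipe:

```python
from spack.package import *


class Foo(Package):
    """Hypothetical package with an optional BLAS-backed feature."""

    homepage = "https://example.com/foo"

    # The option lives on the dependent ("+blas")...
    variant("blas", default=True, description="Enable BLAS-accelerated kernels")

    # ...while the implementation is its own node in the DAG: "blas" is a
    # virtual package that concrete libraries provide, and the solver picks
    # exactly one provider for the whole runtime graph.
    depends_on("blas", when="+blas")
```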

It’s that link dependency that forces unification for us – the solver enforces that there is one version of any given package (or virtual package, like blas) in any given runtime graph (basically your transitive link/run dependencies, excluding pure build dependencies). Put differently, we ensure that there is only one configuration of any link dependency for any packages that might end up using it in the same process, and we ensure that the packages that need it are built against it.

Unification is key for C++ and many other compiled languages – it’s what keeps you from violating the one definition rule, and it’s what keeps your ABI consistent.

In Spack, for a very long time (well, until like a month ago), compilers were attributes on nodes, i.e., a node would say that it was compiled with gcc@13.0.1, and that was all the metadata we had about that. We had heuristics that would try to match compilers across nodes to make things consistent, but there were all sorts of corner cases. Users would try to mix gcc and intel compilers, something you often want to do, but we couldn’t ensure that intel@x.y.z and gcc@a.b.c were using the same runtime library, and we couldn’t unify the runtime library across all nodes in the same process. Runtime libraries are unified dependencies, which have different semantics from non-unified node attributes.

To fix this has been a lot of work over many years, and we finally merged a solver that could do it earlier this year. The compiler (or really any package) can say at solve time what runtime libraries need to be linked with its dependents, based on the languages the compiler provides to its dependent. I think we get this right (finally) because it abstracts both the attributes (“cxx or fortran is enabled on this package” – which model the undefined symbols / ABI calls needed by a package) and the dependencies that could satisfy them (which could be different libraries that need to be unified across different packages).

All this is to say that there are a lot of holes when modeling ABIs/runtime libraries/etc. as attributes. You need some way to unify dependencies, and you likely can’t cram all that into attributes.

Maybe for CUDA this stuff works ok – there are a lot of steps NVIDIA has taken to ensure compatibility across compute capabilities. I do not think the same can be said for ROCm at this point. And BLAS implementations are notoriously bad for this.

I do not disagree that this proposal would improve the status quo a lot. What I worry about is whether this will require a huge breaking change to fix in the long run for backward compatibility, when users do want this stuff handled more automatically. It did for us, and I think it was worth it, but our users will put up with more than the typical pip user. I think you should consider how variants would evolve (especially as hardware, particularly AI hardware becomes more diverse) and what it will take to handle the next bit of complexity after more packages start using wheel variants. I am not convinced that attributes alone can manage this in the limit, without a much more sophisticated solver.

6 Likes