WheelNext & Wheel Variants: An update, and a request for feedback!

tgamblin · August 18, 2025, 10:48pm

This view concerns me, as I think the issue this PEP is trying to solve is fundamental to future hardware needs of AI/data analysis/modsim/etc. users. That’s a set of users that is rapidly growing, and fixing this problem would be transformative for them. The problem has been a fundamental issue for HPC for a long time, and HPC has always been easy to dismiss as a niche, but it’s real this time – AI is HPC and hardware actually matters.

I like to think that something could just come along and replace pip and solve this problem, and maybe uv or spack or something could do that, but on the Spack side at least we cannot keep up with the pace of package development in Python. To scale, we will likely need to rely on PyPI metadata in the long run. That metadata comes right from the package developers. We can (clearly) make tools that replace pip but it is much harder to replace the metadata that pip uses. If the metadata lacks the information needed to pick the most optimized versions of packages, other tools (not just pip) are going to have a hard time meeting users’ needs.

Hardware is not getting any less diverse – AMD GPUs are coming on the scene and there are zillions of accelerators that also need optimized libraries to work… so I don’t see the combinations decreasing. You’ve also got x86_64, ARM, and RISC-V on the horizon, so there are CPU dimensions to choose from as well. With Moore’s law on the way out, specialization (in hardware) is going to be how people get performance.

So, I think it’s worth some short-term pain to address longstanding tech debt, especially when interest rates (on hardware support) are going up. Without this, things are going to be much more painful for users in the future, enough to outweigh the benefits of a lot of smaller short-term improvements. This is really a strategic decision, not a tactical one.

charliermarsh · August 18, 2025, 10:54pm

pf_moore:

I think I was focusing too much on my personal use cases here. To expand a little more, the evidence I’d want to see is issues raised by pip users, ideally on the pip tracker, that showed that missing functionality in pip in this area is making things harder for them. Right now, as far as I can see this is a non-issue on the pip tracker - the only thing that I can see which is even remotely related is the question of providing a way to prioritise indexes. That’s more general than this issue, but one of the cited use cases is more robust selection of the right index for torch packages. And the impression I get from the pip issue isn’t that the current approach of using an index per GPU configuration needs replacing, but rather that the UI around selecting the right index needs improvement.

Thanks, that’s helpful, though I’m wondering if it’s unproductive, then, for me to try and argue that this is a problem, and to explain why (in my opinion) “selecting the right index” doesn’t solve it. As a different tack, we get a lot of issues in uv that ultimately derive from these problems, and those issues are not specific to uv, and would also affect pip users. If that sounds relevant, we could compile a list.

tgamblin · August 18, 2025, 11:03pm

There are constant issues in Spack around this set of problems, on both the source configuration and binary installation sides. As we move to make binaries “more” default there are going to be more. We introduced ABI substitution (splicing) (see here and stay tuned for the arXiv version) in 0.23 and plan to use it more. MPI will soon have an ABI and I suspect people will start to want to package MPI things in Python.

So, we could try to compile a list as well.

willingc · August 18, 2025, 11:23pm

Thanks @tgamblin for repeating this.

Meeting the needs of science, data science, and AI is “not a nice-to-have”; it’s a “must-have” for Python to continue to grow and evolve. If Python fails to address these fundamental needs, we will see greater attrition to Rust.

Does that mean that this solution is perfect? No, it’s not, but right now it’s the best solution on the table. A cross-section of our community has put great thought and effort into an approach and prototype. I recognize that readers will need some time to digest this information since it is new to many in this thread.

@pf_moore @dstufft and others, you have done a great labor of “love/necessity” in maintaining pip. I am incredibly grateful for the 75% of my work that I can get done using pip. Yet, there is still 25% of the time where I must reach for tools other than pip.

From the bottom of my heart, the past 2 years have given me hope that we can create packaging standards that go beyond being tool-centric.

I encourage everyone to keep finding common ground and iterating toward something better than the “status quo”.

dstufft · August 19, 2025, 1:34am

I’ve barely been involved in pip’s development in a long time now, others have done a lot more there than me

I do agree though ^[1] though that we need to figure out a solution to this, but I doubt it’s something that’s likely going to be surfaced in the pip issue tracker, because I don’t think it’s a pip issue really, and I think the lack of a solution is creating issues throughout the ecosystem.

One part of that is the proliferation of “variant” packages on PyPI. If we look at the PyPI stats page ^[2], we can see a lot of -gpu, -cpu, -cuNN, -rocm, -intel, etc. This not only makes PyPI harder to use because the same project is being spread over several projects based entirely on compiler flags ^[3], but it trains users to expect things that make certain forms of typo squatting easier. It’s becoming normal for projects to have a bunch of adhoc-ly named clones of themselves on PyPI, which makes it easier for an attacker to go publish something like tensorflow-cu14 and appear to be legitimate ^[4].

It also makes more work for PyPI, as each new variant of these projects often end up needing to have it’s limits increased to match the previous variants limits.

I was curious, so I did a google search for pip gpu site:reddit.com, and these are some of the results I got.

User who can’t get a package to compile against CUDA and attempting to use the prebuilt binaries always uses the cpu. (link)
A user who couldn’t get GPU support enabled for llama-cpp-python, and one of the commenters indicate that they dropped Python completely due to how painful it was to get it installed. (link)
A user who is using the custom indexes, who can’t seem to get the GPU enabled version of PyTorch to install at all. (link)
A user that wrote a guide to actually getting all of this installed, including a script he runs on venv activation just to make sure GPU support didn’t accidentally get uninstalled (and includes pip install to make it work). (link)
A user that has spent 5 hours trying to get things working. This wasn’t solely Python packaging issues, but was partially caused by that. This includes several people commenting that they use docker containers to completely avoid having to figure out how to install this software in a way that works. (link)
Another user, whom I’m not entirely sure what the problem actually ended up being, but mentioned that this was the most annoying library to try and install. (link)
Another user asking why installing tensorflow is so hard, mentions that pip “worked” but GPU support wasn’t enabled. Several commenters indicating they either used docker to side step it or complaining getting everything setup and working is difficult. (link)
Someone posting how they got their tensorflow to have GPU support, and commenter mentions that their post helped them after 10 hours of trying to figure it out. (link)
Someone who is very angry about how hard it is to install all of these things, and listed a lot of the problems he had (not all of which are Python packaging). (link)

Some quotes:
- is it “sageattion” is it “sageattn_qk_int8_pv_fp8_cuda” is it “sageattn_qk_int8_pv_fp16_cuda”? etc.
- Now you need to get WHEELS and install them manually
- Cuda and PyTorch is absolute bananas.
- The only sane way is to use docker
- I only got Triton to work by manually downloading and installing pre-compiled wheel.
- For torch, I keep few different .whl versions of it on drive. [describes bypassing repositories and using these downloaded wheels directly].
- “Fucking hell.” Perfectly sums up my experience with python and torch as well.

At this point I stopped looking, but I think it’s fair to say people are struggling with the current situation, and those people are pip’s users as well. I think they don’t typically associate it as a problem with pip, but as a problem with the given library they’re trying to install since pip generally works fine for them for every other kind of library they’re trying to install.

And as I said earlier, I am employed by NVIDIA, but this is my personal opinion. ↩︎
Chosen because a lot of these packages are also the biggest packages, so it makes it easier to find them. ↩︎
Or what is effectively compiler flags ↩︎
Of course this was always something that they could do, but it used to be “weird” to do that, so it looked unexpected, but now we’re training people that it’s normal and expected. ↩︎

rgommers · August 19, 2025, 2:22am

@tgamblin I understand exactly what you’re asking and why, and could write an essay-length response, but I’m afraid that it’s going to raise even more questions from others who aren’t as deep into this stuff as you are. So I’ll keep it to some essential points here; we may want to jump on a call to talk through some of this and how this will relate to Spack possibly consuming wheels in the future, and if and how that may need to be addressed in design and/or PEP, or summarized for the audience on this thread.

Re BLAS: please have a look at the first figure on BLAS, LAPACK and OpenMP - pypackaging-native . The “wheels vendor a BLAS library” isn’t assumed/required to change with wheel variants (it could, but unvendoring is an orthogonal topic/choice to which BLAS library is used at build time). To do that, we’d indeed either need a BLAS package which acts as a mutex somehow that guarantees uniqueness (like conda-forge), or direct support in the resolver for achieving the same (like Spack’s +blas), rather than only a variant property. In this design/PEP, we definitely cannot say or prescribe anything for resolvers in this regard. And a package can be written, but there’s no mutex concept nor an OR operator in the grammar for dependencies to express something like blas_impl_mkl OR blas_impl_openblas OR blas_impl_accelerate).

Also note that we actually do have OpenBLAS wheels (scipy-openblas32/scipy-openblas64 packages) on PyPI. But we’re not using them as runtime dependencies, yet at least, only as a distribution mechanism to consume as a build-dependency. That is an orthogonal topic to wheel variants. I’d really rather not dive deeper into that here - that gets us back to Enforcing consistent metadata for packages .

ABI consistency is a broader example of the need for uniqueness, but otherwise similar to BLAS. We don’t tackle that, and my take is that it’s inherently unsolvable since that’s just not how PyPI and wheels work, see PyPI's author-led social model and its limitations - pypackaging-native . Spack, Conda, Linux distros, etc. all solve this problem in very different ways but do all end up with a consistent set of packages leaning on shared libraries. PyPI/wheels do not, they “solve” it by hiding the ABI and vendoring the shared libraries. Trying to change that requires things like mutable metadata, being able to rebuild packages in a centralized fashion, and other such topics that are each individually very hard and/or already vetoed by this community multiple times in the past (often for good, PyPI-specific, reasons).

mgorny · August 19, 2025, 3:44am

To be honest, I don’t really see why it would be. Users who are less versed in the ways of Python packaging are more likely to seek help on PyTorch support forums, etc. — after all, they wanted to install PyTorch, and they’re having a problem with that. Users who are more versed realize this is an ecosystem problem, not a pip problem.

mgorny · August 19, 2025, 3:48am

This also touches on my pet peeve: it further encourages publishing multiple packages that install the same files and overwrite one another. This is bad for security (think malicious leaf packages overwriting commonly used dependencies), and usually for UX too (installed files depending on installation order). And it seems to have reached the “normal way of doing things“ status on PyPI.

pf_moore · August 19, 2025, 7:25am

I have a feeling that my comments are being taken out of context here. It’s important to realise that the sub-thread that started this was based on my insistence that the PEPs need to specify required installer behaviour, and not leave it to tools to decide how to support an abstract wheel variant mechanism. I was explaining why, as a pip maintainer, I don’t feel qualified to make such decisions.

In the ecosystem context, I 100% understand that this is an important problem that needs to be solved. And with my PEP delegate hat on, I want to see it solved. But I’m strongly of the belief that in order to solve it, we have to state explicitly what the user experience will be for all installation scenarios, regardless of which tool the user is using.

willingc · August 19, 2025, 4:21pm

In the ecosystem context, I 100% understand that this is an important problem that needs to be solved. And with my PEP delegate hat on, I want to see it solved. But I’m strongly of the belief that in order to solve it, we have to state explicitly what the user experience will be for all installation scenarios, regardless of which tool the user is using.

Thanks for clarifying your agreement that this is a significant problem. On reading the comments, I felt that the importance was being questioned, not just by you but by others as well. I’m glad that we have common ground here.

Do we have to state the user experience for all installation scenarios explicitly? Or is it sufficient to state what the expected result should be if the standard is followed?

I have a feeling that my comments are being taken out of context here. It’s important to realise that the sub-thread that started this was based on my insistence that the PEPs need to specify required installer behaviour, and not leave it to tools to decide how to support an abstract wheel variant mechanism. I was explaining why, as a pip maintainer, I don’t feel qualified to make such decisions.

Thanks for explaining this vital point that, as a pip maintainer, you don’t feel qualified to make decisions about how pip would support an abstract wheel variant mechanism. If I’m understanding your words correctly (writing is more challenging than speaking in person): we, the packaging community, need to provide better rules for “if an abstract wheel variant is used, then what is the expected result (independent of which tool is used)”.

I feel this thread may be overfocusing on the “how” at the expense of building more common understanding of the “what”.

Thanks @pf_moore for sharing your thoughts. It definitely helped me toward understanding your perspective.

Liz · August 19, 2025, 4:59pm

Not for nothing, but if this behavior is standardized, rather than left up to per tool behavior, it significantly helps address my concerns about user expectations and ensuring existing use cases remain valid, but it would require continuing to value things people have put a lot of effort into.

There are at least 3 things in opposition to each other here.

Some people want this to be automatic.
Some people want static resolution for multiple reasons.
The same people that want it to be automatic don’t seem open to standardizing enough static info to preserve use cases other than theirs.

There’s really not much room to bend on static resolution. It either is or isn’t static, and it being static or not has a direct impact on both security posture and methods for reproducibility. Whether people think this is only for the “high security” use case or not, it does also impact tools meant for static analysis because it changes where arbitrary execution happens.

If it’s expected that variant selectors must have stable behavior when run on the same host system, then there is no reason this needs to involve downloading code to happen other than trading convenience for some use cases against reproducibility and security concerns. All of the characteristics that variants are allowed to select using should be possible to standardize.

It’s also not possible to be sure that users are aware their tools are changing ahead of time, and even if some are, how does someone write a script that works today and on whatever future version they need different behavior? Current tools reject flags they don’t know about, and the user would need to know the flags in advance.

I’m fine with it being automatic and static.
I’m fine with it being opt-in, and require agreeing to run a selector, otherwise requiring the user specify (The default without either opt-in or specifying being to error informing the user of what they need to do.)
I’m not fine with it being automatic and running a selector automatically.

It seems like standardizing on what people can select on is the easiest way to still solve the hardware specialization use case, while also continuing to acknowledge existing use cases, but there are other ways to preserve existing use cases.

ncoghlan · August 19, 2025, 7:34pm

This reminded me of one of the main answers to the question “Why introduce wheel variants as a new concept, instead of publishing multiple projects?”

One of the constraints on wheel variants is that unlike separately named projects (and even formally unlike wheels for different platforms), variants of a wheel are all required to declare the same dependencies.

steve.dower · August 19, 2025, 7:39pm

This sounds like a restriction we’d come to regret, or more likely, a number of publishers will force themselves to work around it and we’ll never hear about their issues until it’s so embedded that it can’t be changed.

It seems pretty obvious that some variants are going to need different dependencies? This limitation would make unbundling OpenBLAS from numpy/scipy probably impossible (or highly complicated, as I mentioned).

notatallshaw · August 19, 2025, 7:57pm

I’m pretty sure universal resolvers are going to require that the wheels variants have the same dependencies for them to work well.

Currently for different platforms universal resolvers require the wheels to use environment markers to express when different requirements need different dependencies to work well with universal resolution.

I assume the same will be true for variants, and environment markers appear to be part of the propsal: Wheel Variant Support - WheelNext

steve.dower · August 19, 2025, 8:15pm

An environment marker that can’t be resolved until a variant is selected may as well be independent metadata.

And an environment marker that can be resolved prior to selecting a variant means the installer must have prior knowledge of all possible variant dimensions and values.

I think we’ll regret the first case (for being unnecessarily complicated), and regret the second case (for putting too much burden on the installer).

dstufft · August 19, 2025, 8:35pm

Unless my brain isn’t working correctly (which is possible!) I don’t think you can universally resolve dependencies if variants are in play?

Existing environment markers are effectively immutable for a given target, but the same isn’t true for variants.

It appears the proposal tries to get around this by having the environment markers evaluate against the variant data from within the wheel– but that also seems wrong to me? A single wheel artifact may be able to handle any of multiple axis of variation, but the system itself may only be able to support a single of those options.

This might be a wrong example, but I imagine you could have a wheel that works for cu10, cu11, and cu12 (using dynamic dispatch or something), so it would satisfy the variants for all 3 of those within the wheel itself. However, the system itself might only support cu11, and if that wheel needs to depend on a different dependency based on whether cu10, cu11, or cu12 is being selected… they just can’t?

TBH having the environment markers use the contents of the file selected feels very off to me (at that point it’s no longer an “environment” marker anymore, since the artifact isn’t part of the environment) and feels like a violation of what environment markers are.

notatallshaw · August 19, 2025, 8:36pm

I’m not sure how this affects the case of universal resolution?

For example if I have a platform marker now in a requirements file:

foo
bar; platform_system == "Windows"

Universal resolution doesn’t need to “resolve” that environment marker, it can fork the resolution so that bar, and all of bar’s dependencies and transitive dependencies include the marker platform_system == "Windows", and then it merges the forks at the end of resolution.

How would that change with variants as part of environment markers? Maybe I’m missing something about this proposal.

steve.dower · August 19, 2025, 8:45pm

Okay, fair, it probably doesn’t (it just massively bloats the solution). Universal resolution is something that I’ve never had need to care about, so I tend to need it to be mentioned twice before I catch on that we’re not doing a specific resolve

Any PEP is going to need a fully worked through example of both how the resolver will solve this, and how an installer will interpret the solution in light of the reality of the system it’s installing into. If only to show that the process (and implicitly, the debugging of failures of the process) is of reasonable complexity.

pf_moore · August 19, 2025, 9:20pm

I will note that universal resolution is far from being a well-understood concept anyway. Pip doesn’t do universal resolution at all, and from what I recall of the lockfile discussions, the tools that do didn’t particularly agree on the process either (some needed a resolver at install time, others didn’t). Maybe now we have lockfiles that standardise what it means for a lock (resolution) to apply to multiple environments, things are clearer? I haven’t been keeping up with how tools have implemented the new standard lockfile.

That’s not to say it can’t be well defined, just that if we want to ensure that users get the same results regardless of what tool they use (and that feels to me like it should be self-evidently a minimum requirement) then we’re getting into very murky territory.

Actually, this leads on to another question, because a universal resolution generates a lockfile, and the lockfile format was explicitly designed to impose minimal complexity on a compliant installer (not needing a resolver was a key example of this). But the lockfile standard requires that selection of packages to install is done purely using environment markers. I don’t see how wheel variants will fit into that - and indeed the linked proposal document notes that lockfiles are still an open issue. I’d be concerned if adding wheel variants into the lockfile spec meant that we now needed a bunch of complexity around running selectors, managing environments to install those selectors into, etc. Because this would violate the lockfile design principle of being installable with simple installation tools.

mgorny · August 20, 2025, 6:00am

I admit that there are probably valid use cases for what you’re proposing. However, the proposal focused on the specific use cases we had at hand, and these use cases were focused around defining dependencies per specific variant wheel rather than per arbitrary data from provider plugin.

Think of it like this: if NumPy didn’t link statically, the OpenBLAS variant would have a dependency on scipy-openblas, and mkl variant would have a dependency on mkl. Technically, both variants are supported, but you only need the dependency for the variant you’re installing, and asking people to do enumerations like ”blas::variant::openblas” in variant_properties and not “blas::variant::mkl” in variant_properties and not … is not really good UX.