WheelNext & Wheel Variants: An update, and a request for feedback!

Yes, that’s what I’m suggesting. But the difference is that I’m suggesting changing the name of the package, not adding additional opaque fields or metadata.

The main thing I care about is not having to change the filename format, and being able to fully and precisely express a resolved environment (not a universal lock, a specific lock) without having to update every tool that already exists.

Yeah, this is basically what I meant by “integrate the selection logic”. We can lock everything, but still need to re-run the selection logic every time the lockfile is installed in order to select the actual package that’s going to be used. That’ll upset the “never arbitrary code” people, but they don’t have to use universal locks :man_shrugging:

I think this is misinterpreting what I was intending by my statement - I was suggesting we should force some people to do more stuff manually; specifically, the package developers. Making them specify “when these conditions are present on the system, choose my package xyz” is making them do manual work, which some suggestions seem to imply is unnecessary (standard suffixes being used without the package developer’s “approval”, or opt-in vs. opt-out at install time).

It’s not about requiring everything to be fully automatic; it’s that we don’t need to pretend to be fully automatic, and should quite happily make (some) people do manual labour. There just aren’t that many packages that need this functionality, so the burden isn’t a real worry - those packages are all under a burden already, and we’d just be redirecting it into (hopefully) more sustainable effort than what they’re currently having to do.

2 Likes

I think that it’s too subtle of a nuance for the average user to understand. Most people think (entirely reasonably, given the current promises we make) that if they prevent sdists, they are safe from arbitrary code execution. Whether they “block sdists” by using --only-binary :all:, or by having a local package index that only hosts wheels, or by some other means altogether, is irrelevant here (and that’s precisely why I don’t think the comments about --only-binary :all: are relevant - it’s only one way of limiting installs to wheels). What matters is “if I only have wheels, I have achieved a baseline level of security”.

My biggest concern here is the dynamic selection of plugins. IMO, the “never runs arbitrary code” install path is a critical requirement for many people, and for those people, having the installation of plugins triggered by package metadata will be unacceptable.

Why couldn’t we simply require the user to manually install the needed plugins before doing an install? There might need to be some means of pre-scanning an install request to determine which plugins are needed, or maybe a set of “well known” plugins that people would expect to install for common use. And if an install needs a plugin that isn’t available, it fails, asking the user to install that plugin. This isn’t like PEP 517 build backends - sdist building already allowed arbitrary code to run, so installing and running a build backend based on metadata was no less secure than the status quo. But selecting and installing wheels has always been a purely static operation until now, so adding automatic installation and running of plugin code is a fundamental change to the security characteristics of that operation.
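
For what it’s worth, a minimal sketch of what that “fail and ask” pre-scan could look like, assuming the installer can learn the required plugin names from static metadata - every name here is invented for illustration, not from the pre-PEP:

# Hypothetical pre-scan: given the set of selector plugins an install
# request needs, fail before running anything if any of them is missing.
# (Name normalization is ignored for brevity.)
from importlib.metadata import distributions

def precheck_selector_plugins(required: set[str]) -> None:
    installed = {dist.metadata["Name"] for dist in distributions()}
    missing = required - installed
    if missing:
        raise SystemExit(
            "This install needs selector plugin(s) that are not present: "
            + ", ".join(sorted(missing))
            + ". Install them explicitly and re-run."
        )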

I don’t want to (nor do I have the time to) review all of the discussions that took place offline, on the wheel-next discussion group, but I find it hard to have confidence in the design that has come out of that group when a security question as basic as “how do I ensure that I can install this package without running any code downloaded off the internet” doesn’t seem to have been considered.

I apologise if the above sounds harsh. I still haven’t had the time to read the pre-PEP - there’s a lot of concepts and terminology I need to get up to speed with before it makes any sense to me[1]. But surely locked down environments aren’t so esoteric that these questions weren’t considered?

Maybe what’s needed here is a “toy” example, one that’s unrealistic but doesn’t require understanding the complexities of GPU variants, SIMD instruction sets, or anything like that. Maybe just a plugin that checks “does the user have gcc on their path?” (shutil.which("gcc") is not None). What would various user interactions look like for something like that?
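
As a starting point, a sketch of what the plugin itself might look like - the hook name and the returned property format are pure invention on my part, not anything from the pre-PEP:

# Toy "variant provider" plugin: reports whether gcc is on PATH.
# The hook name and property format here are invented for illustration.
import shutil

def get_supported_properties() -> dict[str, str]:
    has_gcc = shutil.which("gcc") is not None
    return {"toolchain.gcc": "yes" if has_gcc else "no"}

The interesting part would then be walking through the interactions: when does an installer discover it needs this plugin, who installs it, and what happens if the user has said “no plugins”?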


  1. One reason I prefer designs being thrashed out on Discourse is that it allows participants to get familiar with concepts and terminology gradually, as the discussion progresses. By doing the design offline, and coming in with an established proposal, all of that learning curve needs to be navigated at once by new participants. ↩︎

4 Likes

Apologies - messages are arriving as I’m typing, and it’s hard to keep up. I’m going to take a pause and try to let things settle down so I can respond in a batch, but I wanted to address this point.

It’s really important to remember that the idea of “resolution” isn’t part of any standard. Nor, for that matter, is anything to do with “how to install a set of packages”. In fact, the only things that are defined by standards are:

  1. How to install a wheel.
  2. How to build a wheel from a sdist.
  3. How to check if a single wheel is valid in a given environment[1].

Everything else that installers, lockers, and other tools do is based on these three standards (and given that building a wheel from a sdist is by far the hardest of these, I imagine that many tools only use (1) and (3), and don’t support sdists - we mustn’t make the mistake of assuming that uv and pip are the only installers; there’s the installer project, as well as who knows how many custom, special-purpose tools).

This proposal appears to be extending (3). That’s all (as far as I can tell - if I’m wrong, please say so!). But we are expecting every tool that relies on (3) to be modified as a result of this proposal.

@steve.dower seems to be arguing for an alternative approach that standardises something else - basically, other parts of the resolution process. I’m not sure how viable that is, or whether it would prove to be less problematic than the wheel variant approach of just[2] modifying the “is this wheel valid” logic. And I don’t know how it would fit with specialised tools.


  1. And even this is only defined to the point of saying that the installer has to know what sets of tags it considers compatible with the environment. ↩︎

  2. That word is doing a lot of work in this sentence :slightly_smiling_face: ↩︎

3 Likes

Frankly, this comment is unnecessary. Please don’t include personal attacks - you can just say the design does not sufficiently address this question.

While you may disagree with the conclusions, the design does consider this question in the Security Implications section. There are several possible routes to ensure installing a package will not download third-party code listed there. Hopefully through discussion here, we can determine which of those is appropriate, or come up with other ways to harden the design if they are not.

4 Likes

I apologise. While my comment wasn’t intended as a personal attack, I did word it badly. I let my frustration - over the fact that a huge amount of work has happened “offline”, and that we’re now having to catch up on all of it - get reflected in my wording.

Thanks for the link. I did take a look to see if I could find an obvious place where this was discussed, but I missed this. I’ll read it now - as you say, I may well disagree with the conclusions, but I was clearly wrong to say that it hadn’t been considered and I apologise once again.

Edit: OK, having read that section, the points in there are reasonable. As you expected, I disagree that “flexibility and user convenience” are more important than security here, and I’d want the approach switched around, with the secure options being the required approach, and dynamic plugins left as a possible extension that must be explicitly requested by the user.

4 Likes

Thanks Paul, I appreciate that a lot (and am sure other people working on the project will too).

For what it’s worth, the intent was not to hide work from anyone. The goal was to reach a prototype, which, for a problem this complex, required significant iteration and collaboration from multiple stakeholders. As Charlie said, I’d expect any proposal to go through significant iteration and I hope it’s clear that consensus here remains essential.

6 Likes

I would like to address the “why” behind this - I think it’s important. You may not agree with our decision to create “WheelNext”, but I hope to provide some perspective.

WheelNext was created to engineer a “collaborative space”, modeled after the “Faster CPython community”, that is able to focus on some packaging challenges that the Scientific Python community feels the weight of every day.

Now - could we have done this differently? For sure.

We believe our approach was necessary to properly design and refine a proposal as complex as the “Wheel Variant proposal”, which touches so many aspects of the packaging ecosystem. And maybe most importantly: to convince ourselves first that “it works and correctly addresses the problem for some of the most difficult use-cases the community is facing”.

We kept everything in the open - not one commit, not one discussion, is happening behind closed gates. We gave many talks: PyCon 25 (talk & packaging summit), the WheelNext Summit, etc., and every time we presented our work it triggered a fairly extensive rework / redesign of our proposals. And that’s awesome! I have no doubt publishing the PEP will be a long process of refinements and adjustments, very possibly changing some fundamental assumptions.

This ability to quickly pivot and focus on something that works - we see it as an essential step towards coming up with a good first draft of the PEP. I hope we will soon convince most of you that we have put significant effort into writing a great starting point for this discussion, while proving that the “Wheel Variant concept” does work and can work effectively, so that we will be able to focus the conversation on how to make it work best with the ecosystem’s assumptions (like no remote code execution).

WheelNext is only the vehicle that allows us to create a community of dedicated people focused on refining and proving these ideas to themselves first. The ultimate goal is to publish a PEP. We do not anticipate you having to go to WheelNext channels and unearth months-old discussions (which might already be outdated). It is our job to communicate a summary of these conversations to the DPO community and make them digestible to everybody. And if there is anything we can do to make it easier for you and others to “catch up” - please let us know, and I’m sure we will all do what we can to make it happen.

For now - we are focused on regrouping all our “bits of docs” and “conversations” into a PEP that is shorter than a 7-volume space odyssey saga :smiley:.

Now - if anyone is very eager to have access to the PEP draft I’ll be pleased to forward it - though please understand this is a WIP and absolutely not ready to be publicly released.

I hope this provides some much needed context.

4 Likes

Given that the package name namespace is almost entirely unconstrained, how do you prevent “variant squatting” in that scenario? I share @zanie’s concern about that, and I’m not sure how the suggested approach helps prevent it, unless you’re thinking about something along the lines of PEP 752 style namespaces. One of the major benefits IMHO of the variant proposal is that you’re not changing anything about how package naming or package ownership works.

That might be true[1], but I think one of the main reasons to use --only-binary :all: is to prevent the highly likely failures during the package build process. Extension module builds are increasingly complex, and we have no standard interface for them, so it’s more likely than not that trying to build an sdist with an extension module is just going to fail. Better to fail fast and uncryptically in those cases.

Surely that’s something installer tools could solve, right? Options include hardcoding the exact set of provider plugins they’ll allow, vendoring, command line options to control behavior, etc.
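
For instance, a sketch of the “hardcoded set” option - the provider names and the check itself are invented for illustration:

# Hypothetical installer-side allowlist of variant provider plugins.
# A tool could vendor these, or expose a command line option to extend
# the set; anything else is refused up front.
ALLOWED_PROVIDERS = {"cpu-feature-provider", "cuda-provider"}

def require_allowed(provider: str) -> None:
    if provider not in ALLOWED_PROVIDERS:
        raise RuntimeError(
            f"refusing to run unvetted variant provider {provider!r}; "
            "allow it explicitly to opt in"
        )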

Your installer tool of choice could require that.

I’ll note the wheel-stub package, which piggybacks on the sdist build backend machinery to effectively do variant resolution at sdist “build” time, and utilizes external indexes to resolve to the appropriate variant wheel. While a very clever hack given the realities of today’s packaging standards, it’s ultimately suboptimal because it’s inscrutable, with no ability to statically analyze or reason about what’s going on. And of course, because it uses build backends, it executes arbitrary code.


  1. I can’t speak to what most people think here ↩︎

2 Likes

There’s more than one problem with this approach. The most obvious one is that if any package could be a selector without the user knowing, then they can’t even download the wheel with existing tools to inspect it before trusting and running it, as downloading would imply running the selector to get the actual package.

I don’t think the problem space affects enough packages to warrant this. All of the existing “big packages” with this problem would have enough info statically if we had just the following information in the distribution info, available to resolvers (see the sketch after this list):

  • preferred order of variants
  • required cpu instructions per variant
  • gpu requirement per variant
  • required version of bundled libraries (mapping of name to version)
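
Something like the following, say, with every field name invented for illustration:

# Hypothetical static variant metadata for one distribution, as a
# resolver might see it after parsing - no plugin code involved, and
# all field names here are invented.
PREFERRED_ORDER = ["cu128", "cpu"]
VARIANTS = {
    "cu128": {
        "cpu_instructions": ["avx2"],
        "gpu": "cuda>=12.8",
        "bundled_libs": {"cudnn": "9.1"},
    },
    "cpu": {
        "cpu_instructions": [],
        "gpu": None,
        "bundled_libs": {},
    },
}
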
2 Likes

Well, we’ve been doing that with CUDA for a while, and we have these cu11 and cu12 tags all over the place. The problem is that names turn into extra arguments for every downstream dependency.

For example, there are two dozen cupy-named projects on PyPI. Now, if I’m a simple library that selects between algorithms, like nvmath-python, I could just have one dependency with variants; instead they have a dozen different extra selectors. Thus anything that wants to have nvmath-python as a dependency now has to carry which of the 24 × 12 different combinations it needs to support, just to call a faster matrix multiply.

So yes I think it hurts user experience quite a bit to just rely on names.

Because foo literally contains the names to be used instead:

# foo.py - or some clever metadata that I'd rather not invent, but I'm sure people would prefer to have a Turing-complete TOML file rather than Python code
def select_package_name():
    if detect_cuda_12_8():  # illustrative check, defined elsewhere
        return "foo-cu128"
    else:
        return "foo-cpu"

The only way a package could resolve to a squatted name is by putting squattable logic into the main package (e.g. return f"foo-{os.environ['PROCESSOR']}"). But I’d say that’s a bad idea and you shouldn’t do it :wink:

(I guess I should also point out the similarity to the idea of hosting wheels off of PyPI and using an sdist to “build” the wheel by downloading the right one. That was safe enough because the URLs were hard-coded into the build script. If I were going to mock up a demo of this, it’s more or less exactly how I’d do it - a build backend that determines its dependencies dynamically and produces a “wheel” that depends on the specific name you want.)

2 Likes

In my proposal they’d just rely on the selector, and the variant is chosen on install. If the library has a specific requirement, they can depend on the variant directly (because it’s just a package name). If it has to dynamically choose which variant it should depend on, then it needs its own variant selector (but this seems like the least common case, whereas depending on cupy and letting the selector logic sort it out seems most common).

I believe this is exactly the same as the variant proposal given here, apart from the variant proposal not trivially being able to express a dependency on a specific variant (which my scheme allows, independent of platform - depending on cupy_cu11 is going to work on Windows, macOS, and manylinux just fine, while always getting the CUDA 11 variant).

It does, and I’m not saying that the decision to work the way you did wasn’t justified. But it does have its costs, and one of those is that for people like me who don’t have the time to follow another discussion forum alongside Discourse, there’s a lot of hidden context that we have to discover after the fact.

I guess so? But that results in an inconsistent user experience (if pip disallows all plugins, users will get different results when using pip rather than uv). Plus, what does a library like installer do? It doesn’t have the luxury of picking one solution, as its users are tools that quite possibly want to make their own UX choices.

So if pip chose that route, what would happen with the proposal? We’d be saying “pip supports wheel variants”, and I’d fully expect to be faced with people claiming that what we had wasn’t “proper support”. There’s a bit of a dilemma here - while I’m the first to point out that we can’t assume that there’s just pip and uv, it’s still true that in reality, what pip and uv do is what users see as the implementation of new standards. In practice, people don’t have an “installer tool of choice”, so leaving decisions about key functionality to tools ends up transferring decisions from the standards process to a small group of installer maintainers. I’d rather, where possible, have the standards make the decisions.

I was almost going to propose that approach as an alternative solution. I was aware of wheel-stub, and I knew it wasn’t well-liked as a solution, but I don’t think the objections you raise necessarily stand up to scrutiny. Particularly if it’s being considered as an alternative to the existing wheel variant proposal which also has people objecting to it :wink:

For example, you could enable static analysis and remove the inscrutability problem by defining a new static metadata file, which will always appear in the sdist alongside pyproject.toml (you could even use pyproject.toml itself, if you wanted). The build backend doesn’t have any “inscrutable” logic; it simply implements the selection criteria defined by the metadata.
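
As a sketch of that idea - with an invented [tool.selector] table, since no such standard exists - the backend’s “selection” could be as dumb as:

# Hypothetical build backend logic: apply static selection rules read
# from pyproject.toml. The [tool.selector] table is invented here.
# (tomllib is stdlib on Python 3.11+.)
import tomllib

def select_wheel(env: dict[str, str]) -> str:
    with open("pyproject.toml", "rb") as f:
        selector = tomllib.load(f)["tool"]["selector"]
    for rule in selector["rules"]:
        # A rule matches when all of its conditions hold in the environment.
        if all(env.get(key) == value for key, value in rule["when"].items()):
            return rule["wheel"]
    return selector["default"]

Everything a static analyser needs is in the metadata; the backend adds no logic of its own.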

And of course, “because it uses the build backends it executes arbitrary code” is hardly an objection to this approach when it’s being offered as an alternative to wheel variants, which execute arbitrary code. The advantage of a build backend is that sdists are already well known to allow arbitrary code execution, so this matches people’s expectations much better than wheel variants.

This isn’t a completely serious suggestion - I’m sure that if it were viable, wheel-stub would have had more success. But it’s certainly a credible counter argument to the idea that we have to allow arbitrary plugin execution during the wheel selection process.

Also, if you want a middle ground, how about taking that metadata file I mentioned and packaging it as a new type of distribution artefact - call it a “selector” for now - which can be published like a wheel but is clearly documented as needing to download and run plugins when used. That avoids the negative connotations of a sdist, while keeping wheels secure. This may be similar to Steve’s selector idea, I’m not entirely sure?

That looks like code that has to get executed to me :wink:. Where, when, and by which component would that conditional get executed?

Time machine to the rescue! That’s basically how wheel-stub works[1].


  1. mentioned above, but possibly buried in a longer response ↩︎

IIRC, we’ve been down that road in other discussions[1], with the line between what should be standardized and what should be left to the tool being murky at best. Maybe we can’t boldify that line any better than it is now, but it might help to express the principles and thought processes that go into which side of that line a particular behavior belongs. Kind of like a Zen of Packaging? :smiley:

Perhaps, but I think when the arbitrary code is executed is important here, and given that there are ideas about how to make the variant selection process much more (entirely?) static it might be possible under variants to eliminate code execution for paranoid use cases[2].


  1. default extras? ↩︎

  2. which I agree are completely valid to worry about ↩︎

Would you consider this to be a problem in the opt-in case?

If you have something like

$ pip download foo
downloading foo...
package foo needs selector bar
Run selector bar [y/n]?

then is that acceptable?

This reminds me of how manylinux played out. There was manylinux 1 in PEP 513, then PEP 571 for manylinux 2010, then PEP 599 for manylinux 2014 and then finally PEP 600 for “perennial manylinux”. It seemed that it was possible to enumerate the possibilities but then the PEP process became too much and we needed a PEP that specified a mechanism that could be extended without a PEP.

I agree that the possibilities you enumerated cover the main cases right now but I think it is bad to get into a situation where those special cases have to be hard-coded into a specification or into individual tools without a specification. This is a dynamic problem space but we also need coordination across tools. I think it is better to have an extensible but specified mechanism as is proposed here.

The tension here is between secure-by-default and maximising convenience in the common case especially for novice users. These conflicting aims are qualitatively different though:

  • Secure-by-default is binary and absolute
  • Convenience is a sliding scale

The proposal as specified right now maximises convenience and then concludes that secure-by-default is not possible but I think it is better to start the other way round:

  • Assume secure-by-default from the outset
  • Consider how to maximise convenience without compromising security.

I can imagine that the convenience starts with an opt-in prompt:

Run selector bar [y/n]?

There can also be flags like --unsafe-run-selectors. Later on there can be blessed selectors that are vendored into the installers, and at that point it is appropriate to consider the particular issues of the “big packages”, to maximise convenience for common user situations while still maintaining secure-by-default.

There are various comments above about the fact that existing installers will run arbitrary code because they install from sdists. With my long memory I can say that abolishing this has been a longstanding goal, going back to when wheels were invented. Over time, as more packages have wheels, we get closer to --only-binary by default being a possibility. The proposal here as it stands suggests a backwards move on this front, where arbitrary code execution would be promoted as the new way of doing things for some of the most popular packages, rather than being a legacy mode for older packages that is retained for backward compatibility.

Some of the packages that provide a sdist but not wheels do so because they have simply not been updated to provide wheels. Others have legitimate reasons for using a sdist, because the wheel format’s lack of arbitrary code execution means it does not provide what they need. It is possible that the proposal here actually solves some of those cases, by providing sufficient capability in wheel installation that they can do what is needed. That might bring us closer to the possibility of --only-binary being a plausible default. I think it would be better to consider whether this proposal makes it possible to move to secure-by-default, rather than using the non-secure status quo to justify introducing the proposal here in an insecure way as well.

Lastly, the proposal here discusses using the variant mechanism for CPU capabilities. The proposal seems perhaps inspired by a previous thread in which I suggested exactly this. I just want to be clear that I think micro-architecture variants like x86_64-v2 etc. should really be handled as part of the CPU tag rather than through the variant mechanism proposed here. As a temporary or unusual mechanism, variants can make sense for CPU capabilities, but this should not replace efforts to improve the way that CPU architecture is represented in the main static metadata.

2 Likes

Note: I’m now employed by NVIDIA, but these are my own thoughts - I’m still trying to decide how I feel about specific parts of this design.

Perhaps a useful thought experiment.

If the variant plugins were opt-in and users ran into a package that used variants, would we expect them to typically just allow it?

I’m thinking back to PEP 438, and how people really, really hated the requirement to explicitly allow certain packages to install. The feedback was almost universally negative, and in most cases people just blindly did whatever it took to make the thing work.

One thing I’d worry about with an opt-in approach is the chance that the opt-in flags just become random cruft that users end up cargo-culting into their environments to make things work as they would expect.

3 Likes

Full disclosure: @dstufft’s disclosure reminded me that I should also mention that I have a paid interest in this topic via LM Studio. Handling parallel CUDA stacks more gracefully in venvstacks is a not-yet-solved problem. It’s not impossible to do with things as they are, just awkward and inconvenient, due to the need to keep assorted direct URL references both up to date and internally consistent.

A further bit of related background that’s likely to be useful for folks that haven’t previously encountered it is the documentation for uv’s existing variant selection logic for PyTorch: Using uv with PyTorch | uv

The situation in the status quo that the wheel variant proposal is aiming to get away from is the one where a naive pip install project installs a dependency stack for project that technically “works” (in that it runs without crashing), but is in fact so slow that the claim of “working” is debatable. We also don’t want to get into a situation where package installer authors have to be experts in hardware acceleration technologies, nor have every project that publishes hardware accelerated extension modules require such expertise.

Getting to a point where there are a handful of common selector modules, preferably available as both pure Python modules (for potential vendoring in pip and other Python-based installers) and as Rust crates (for potential static linking in uv), isn’t going to be easy; it just seems a more tractable problem than directly baking that selection logic into package installers, or distributing variations of the logic across every project with a hardware accelerated extension module.

When downloading for introspection, there’s no need to only analyse the optimal wheel for the current hardware - you would probably want to bypass the selector logic entirely and analyse all the wheels available for the platforms you care about. (That may actually be a reasonable default behaviour for the pip download use case)

Taking a step back from the plugin debate for a moment, this proposal seems to imply a new wheel version (the file name format changes, for example, which is incompatible with the current spec). How have you addressed the well known issues with transition to a new wheel version?

2 Likes

Yeah, I’m pretty sure they would just run it. What do you get otherwise? A non-working package?

I know you’re pushing towards “malicious code bad”, but the reality is that people who care about malicious code care about the entire package, and so it’s all treated as risky. Whether it runs at install time or at first use (and so much malware on PyPI runs on first use, not on install) isn’t that big a difference.

So my answer is “just like a build backend”, and the reason it’s better than setup.py is that we can start with cleaner designs (we aren’t cargo-culting 20 years of dealing with compilers/etc.), and it’s a special case that will immediately attract attention. Metadata would have to indicate a selector package, so that it can be executed immediately instead of waiting for a complete resolution, which means scanners will immediately look at their contents and detect things that aren’t sensible for that scenario. Network access is a good example - a big difference from wheel-stub is that a selector package shouldn’t need network access, so any use of it is a red flag, whereas wheel-stub requires at least some, making it harder to flag the bad examples.

The combination of scanning Python code for sandbox detection and then running it in a sandbox/detonation chamber to detect malware is perfect for the scale and scope of selector packages.

I can only assume that most people aren’t aware of the multiple teams who scan everything that gets uploaded to PyPI, usually reporting new malicious packages within hours of upload. I see no reason to assume that this wouldn’t continue to happen for selector packages, and it gets easier if we’re encouraging the logic to be simple, tightly scoped, networkless, and uploaded to PyPI. (This is another example of my “it doesn’t have to be automatic” position - we don’t have to prevent malware by specification; we can instead trust people, and then verify.)

1 Like