WheelNext & Wheel Variants: An update, and a request for feedback!

pf_moore · August 15, 2025, 7:19pm

There’s no standard that says any isn’t a valid Python tag. I already said this was a pathological case, so I’m not trying to defend the example. Just trying to say that it’s possible to have a wheel filename which can’t be reliably identified as being in the old or new format, based on existing standards.

As I said, I did a lot of scanning of PyPI data at one time. There’s some very broken (but still valid) cases out there.

No, it doesn’t. The one I’m thinking of right now was a bulk scan of all of PyPI, and downloading wheels just to verify the version was impractical. I only wanted the components of the wheel filename, and I did the bare minimum the spec allowed (split on hyphen, assign the 5 or 6 components to the relevant fields). I didn’t expect my code to be bulletproof, but I did ensure that it was correct, and wouldn’t be arbitrarily broken by future changes to the ecosystem, by carefully following the standards. I don’t feel supported by the community if it turns out that care was in vain. Don’t misunderstand - the code is pretty easy to change, and I can adapt it to new standards without much effort (if I even do - my need for the data is not as pressing these days). But like @Liz, I feel that changing the guarantees without a proper transition feels like a broken promise.

OK, but a standard needs to ensure that we don’t break any working code without a justification and a transition plan. As @Liz pointed out, users’ trust in the Python packaging ecosystem is fragile, and we cannot afford to be seen as not caring about stability and compatibility.

The old standards (like the wheel spec) were written in a very different time, when Python was nowhere near as popular, and packages were far less complex. As a result, those standards fall far short of the levels of precision and tightness that we’d expect today. That’s a huge problem, but it’s one we have to deal with. It’s very reasonable to argue that we need a compatibility break in order to fix problems caused by the state of older standards, but we still have to provide our users with reasonable warnings and transition processes. It’s one thing to say “this will hurt, but it’s necessary”, and I’m fine with that - but it’s something completely different to say “we’re not going to help you manage that necessary pain”.

barry · August 15, 2025, 7:30pm

I argued early in the variants discussion that this makes a centrally managed solution untenable, and I still think that’s the case.

I don’t maintain installer tools, but I maintain enough open source that I am always extremely hesitant^[1] to accept changes that I don’t understand or can’t test, and I suspect the pip maintainers would be especially so, given the impact on the ecosystem of a bad change.

I also don’t see this reasonably as being added to the Python stdlib, where again, the risk aversion is very high.

Besides all that, as was mentioned, the release cycle and long tails of both projects just increase the infeasibility of a centrally maintained solution (at least at the interpreter or installer level).

A middle ground could be a central repository, maybe something along the lines of typeshed where the collection of variant resolvers is centrally and collaboratively maintained. This could mean that an installer would only have to trust one provider which would basically be an amalgam of all the individual system capability providers. That would at least reduce the surface area to a single dependency which likely could use modern supply chain guarantees.

to the point of outright rejection ↩︎

barry · August 15, 2025, 7:31pm

They’re also the ones least able to navigate the complexity of system capabilities to craft an efficient and functioning environment to run this complex software stack. There lies the tension.

barry · August 15, 2025, 7:36pm

I think we also have to recognize that we as a community here on DPO are probably unaware of the majority of code out there. There’s just no way to even know if a change you’re making is going to break something. Or better yet, you have to assume it probably will, for someone.

That’s just a risk we have to internalize by building software that millions of other people are going to use, most of whom are completely detached from all these discussions.

steve.dower · August 15, 2025, 7:40pm

That’s closer to what Antoine was suggesting (“runtime dispatch”), but I’ve suggested it in the past, and I think it’s a great way to set the whole thing up. But I’d expect the “fat” wheel to often be a set of partial wheels, so they can all install simultaneously and produce a “fat” install (that dispatches at runtime). e.g. if you have a range of CUDA versions to support and each is ~1GB of wheel, then don’t put them all in a single wheel, but do let them be installable simultaneously and include the internal logic to import the right one (e.g. by making the wrong ones fail with a nice ImportError and catching that).

My main suggestion here was in using the name of the package as the variant identifier, not adding a new field. So rather than foo-1.0.0-cp314-win_amd64-WEIRD_HASH.whl we’d just have foo_cu12-cp314-win_amd64.whl (and packages can then choose whatever granularity they want, because they fully control the “cu12” part - @mgorny covered this well in an earlier post, but trying to control all the dimensions upstream is not going to work, and the easiest way to push it all down to the publisher is to just put it in the name).

In case it’s not obvious,^[1] foo_cu12 doesn’t literally install foo_cu12 and require users to import foo_cu12. It might install site-packages/foo/_cu12 and then site-packages/foo/__init__.py can from ._cu12 import ... with a fallback if the module/package isn’t there. So neither source code nor the list of requirements has to specify anything other than foo, but the package developer can decide what code import foo should actually execute (as today, no change here) and what pip install foo should actually install based on the target system (this is the new bit).

I think it’s very obvious, but the responses suggest that it’s not. ↩︎

barry · August 15, 2025, 8:56pm

I think there are two problems with this approach. One is more global in the sense that even if you could make this work for one capability dimension (e.g. cuda version), it quickly becomes untenable when you have to deal with multiple dimensions, e.g. gpu version, cpu instruction set, etc. across many multiple platforms.

The other problem is more “local”, meaning, let’s say this even worked. Imagine every release now includes dozens if not hundreds of individual component wheels implemented as separate packages, and a “meta” package that had all the right dependencies and did the runtime dispatch. How do you even release this stack in a consistent, coherent way? It might take hours or days to get everything uploaded and tested, and one bad wheel in that stack will be a major headache to fix. PEP 694 could help, but even with stageable upload mechanism, there’s no way to do atomic releases across more than one package^[1]. So now you’ve for sure got race conditions which will break your environments if you’re unlucky.

In a variants world, at least you’re localizing your uploads to a single package so 694 could help a lot in that scenario.

and I’m not even sure PyPI/warehouse could possibly support that even if there was a protocol for expressing in, which 694 definitely isn’t ↩︎

BrenBarn · August 16, 2025, 7:17am

For me this is another case where the desire to improve incrementally handicaps the proposal because the incremental change is not beneficial enough to be worth the disruption (even though the disruption is smaller). As in, an improvement of 100 “goodness units” may be worth paying 100 “disruption points”, but an improvement of only 10 goodness units may not be worth paying 10 disruption points.

I’ll avoid restating a bunch of examples of why I think this, but just to stick to one thing that’s specifically part of this proposal: It is just obvious to me that using the filename to store important metadata is entirely unsustainable. I will go out on a limb and guarantee that down the line we will have some other bit of metadata that we think we really really need, and an 8-character string at the end of the wheel filename will not be enough.

There is no realistic path forward without shifting to a system which is based around having metadata stored, served, and processed separately from the package. This makes some things more complicated because you can no longer “just look at the filename”, but that is the point: the amount of information we eventually may want is definitely going to be larger and more complicated than any single hyphen-delimited string we can “just look at”.

pitrou · August 16, 2025, 8:20am

But on the flip side, shipping separate variants for each CPU level will increase their storage footprint on PyPI even more. I don’t have numbers, but most packages have a relatively small (or even tiny) proportion of dynamically-dispatched code - usually in low-level performance-critical loops.

pitrou · August 16, 2025, 8:36am

Isn’t this an argument against wheel variants as well? Do you really want to generate and host ^[1] the hundreds of slightly different builds of your package for all supported CUDA versions and CPU support levels? It might be much worse than adding dynamic dispatch for the few select portions of code where it matters.

But at the end of the day, the problem might be the existence of mammoth packages that try to include every potentially useful functionality under a single namespace and a single convenient install ^[2]. The “single convenient install” aspiration might be better served by well-advertised meta packages, rather than shipping everything into a single package.

Or let PyPI host ↩︎
Which might be not be that convenient given its size! ↩︎

mgorny · August 16, 2025, 10:20am

The proposal doesn’t store any “important information” in the filename. All the variant information is stored inside the wheel’s .dist-info directory, and replicated into a separate JSON file on the index for dispatching. The only thing stored in the filename is a unique identifier, whose sole purpose is being able to actually have multiple files to dispatch from.

pf_moore · August 16, 2025, 10:59am

Serious question - why not create a brand new format, independent of the wheel format, and use that for the variant proposal (and all the other new wheel features we want). Tools can learn to treat this new format as a second binary format, and we don’t have to stick with the constraints that the wheel format imposes on backward compatibility.

We shouldn’t need a new build backend interface - the wheel format remains fine for builds that target a single environment, and the new format can have the capability to “point to” wheels, just like the existing variant proposal does. We’d need new tools to take a set of environment-specific wheels and assemble a new-wheel file with variant pointers, but that’s fine - this capability is only needed by a limited number of projects, and making them do a little more work by adapting their workflow to add a new “assemble variant distribution” step doesn’t seem unreasonable.

There would be a transition cost, of course, but as I said in an earlier message, I don’t think there’s any solution that can realistically avoid a transition cost, no matter how much we might like to hope that we can.

mgorny · August 16, 2025, 11:28am

In my opinion, creating an entirely new format for a minor addition like this would be an overkill. Of course, if we collected many more features than that, it would perhaps be justified — as in a “Wheel 2.0”. However, with what this proposal involves right now, the new format would basically be “just like wheel, except for this additional file”.

I would also like to repeat that one of the points here was to actually preserve a degree of backwards compatibility. Yes, we do not want variant wheels to be installed accidentally. However, there is no reason to unnecessarily break compatibility with other tools. Admittedly, this is unavoidable with the changed filenames, but at least “dumb” tools that don’t validate the filename should not be affected. And since the “guts” of the format don’t really change, they may continue working just fine, or require absolutely minimal changes.

Admittedly, we can’t predict all possible use cases. There are tools that could work with variant wheels just fine, but will reject them because of the filename. There are also probably tools that won’t handle variant wheels properly, yet will accept them because they don’t validate filename. There is always a risk, but the real question is: is breaking all backwards compatibility really worth it? Being careful may be a good thing, but it may also cause a lot of unnecessary friction and pain.

dstufft · August 16, 2025, 2:24pm

We already can look at the metadata and do not need to rely on just the filename.

The METADATA can be lifted up and fetched independently of the artifact itself (and on PyPI this is happening for all wheel files).
The Simple API has several properties that have been lifted out of the artifact and into the API response itself.
The proposal in this thread adds a variants response to the simple API to allow querying the variant metadata.

From a technical POV, we need something in the filename because PyPI requires unique filenames, but that’s something we could relax if we wanted to.

However, I think it’s still useful to have something in the filename, if for no other reason than it helps provide some indication to a human looking at a list of 50 files what’s different about them.

I have no idea what the % would be here, so I can’t really speak on it. If the increase is tiny then that might be workable. If the increase is large then it may not be.

Currently PyPI has a 1GB hard limit on the size of an individual wheel that cannot be raised IIRC, but bandwidth is our most expensive “bill” so whatever minimizes transferring unneeded “stuff” the most is a win from PyPI’s POV.

My guess is that if dynamic dispatch worked well enough for these use cases that would already be used rather than the awful hacks that exist today, but I have nothing to base that on besides a guess? Someone who understands the specific problem space better would have to answer.

mgorny · August 16, 2025, 3:04pm

Dynamic dispatch cannot solve all the problems users are facing. The specification specifically points to: manylinux_2_34 x86-64 builds produce binaries that are not compatible with all x86-64 CPUs #1725.

You can use dynamic dispatch when your only concern is a specific inline code focusing on additional extensions. I suppose you could even build the whole library twice with different -march values, ship both variants and use dynamic dispatch to switch between them, though it’s going to get really messy.

However, you can’t use dynamic dispatch not to support older architectures. The best you can get is failing at runtime.

barry · August 16, 2025, 4:58pm

I could totally be wrong, and perhaps @EWDurbin can shed light on this, but I think that 1GB hard limit is mostly due to upload request limitations. With PEP 694 providing a mechanism for indexes to provide alternative upload protocols, it could be possible then to e.g. return pre-signed S3 URLs to directly upload, in chunks, files up to the S3 maximum object size of 5 TiB. I’m not saying we should just that we could.

pitrou · August 16, 2025, 5:04pm

This seems to be a red herring. Why would wheel variants be better equipped to solve that problem than standard wheels? If you are able to change the CFLAGS for a wheel variant build, then surely you can also change the CFLAGS for a standard wheel build?

mgorny · August 16, 2025, 5:18pm

Because standard wheels cannot express “this wheel can only work with x86_64-v2 whereas variant wheels provide the ability.

If you’re using prebuilt static dependencies that are built with CFLAGS you have no control over, then no, you can’t. And even if you could, there is no guarantee that you actually want; say, if the compiler optimization is actually causing a significant performance gain.

pitrou · August 16, 2025, 5:41pm

You are moving the goalposts, aren’t you? If your goal is to produce wheels that work on x86-64-v1 machines, then you are free to do so. Any package author can change the compilation flags for their wheel builds, they don’t need a mythical wheel spec change.

What you can’t do currently is produce different wheels for different CPU support levels, but that is an entirely distinct problem from “manylinux_2_34 x86-64 builds produce binaries that are not compatible with all x86-64 CPUs”. Please don’t try to masquerade one as the other.

Ok, so wheel variants are not able to solve the problem, either? In the end, what is your point exactly?

oscarbenjamin · August 16, 2025, 6:07pm

This is where wheel variants aren’t very suitable for CPU architecture levels. It should be possible for all packages to set a minimum CPU architecture level and publish e.g. v2 wheels without needing to have multiple variant wheels or use dynamic installer plugins. In this day and age x86_64-v2 is a reasonable baseline and it should be fine to use a manylinux image that produces such wheels.

dstufft · August 16, 2025, 6:08pm

The goal is to produce a binary that works with v2 (for instance) without having an older CPU that doesn’t support v2 select it and attempt to use it.