PEP 817 - Wheel Variants: Beyond Platform Tags

I’ve been ruminating on the PEP and the various threads about it, and I think there’s a fundamental question that needs to be answered outside of the specifics of the PEP itself.

Determining which wheels are compatible with a given machine/environment along a given axis requires some amount of code to execute. This is categorically true regardless of PEP 817 and “Variants”, and as such it exists today for the existing axes of compatibility we have. There’s code that installers implicitly execute to determine whether a machine supports amd64 or aarch64, whether it supports Windows or macOS, whether it supports manylinux, etc.

From a “nuts and bolts” perspective, there’s not really any inherent difference between running some code to determine CUDA version and running some code to determine glibc version.
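As a rough illustration of the kind of probing installers already do implicitly today (a minimal, stdlib-only sketch of my own; real installers derive full platform tags, e.g. via the packaging library):

```python
import platform
import sysconfig

# The raw facts an installer already derives from the running system when
# it computes platform tags; a CUDA or GPU axis would need the same kind
# of environment probing, just against different facts.
machine = platform.machine()       # e.g. "x86_64" or "arm64"
system = platform.system()         # e.g. "Linux", "Darwin", "Windows"
plat = sysconfig.get_platform()    # e.g. "linux-x86_64", "macosx-11.0-arm64"

print(machine, system, plat)
```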

So the fundamental question is: how does an installer enumerate the axes, know what code it needs, get that code, and do all of that in a way that is performant and doesn’t risk arbitrary code execution?

For our existing axes, the answer to that is basically that the spec defines what the axes are and leaves it up to the installers to figure out the rest [1], although typically someone invested ends up implementing the code and getting it into the major installers (either directly, or via libraries like packaging).

For the primary use cases that are driving variants, I think they could all be handled the same way. There could be a PEP to add an axis for Nvidia GPUs, another for AMD GPUs, another for CPU versions, etc. Each of them would describe its axis and then leave it up to the installers to figure out how to support it, though as with the existing axes, I would expect someone invested to implement the code and get it into the major installers.

I think that is technically feasible, and if we chose to handle it that way, assuming those PEPs were accepted, something akin to the code in the various provider plugins would have to be added to each installer, just like similar code was added for manylinux, iOS, CPU architectures, etc.

Which is why, fundamentally, there’s no option for an axis that doesn’t involve code that the installer executes [2].

So, if it’s technically feasible for the use cases that motivate variants to be handled like all of the existing axes we’ve introduced, then why does PEP 817 exist? Why not just do that and use the existing process for each of these new axes?

I think there are a few answers to that question.

  1. I think the PEP process itself (and the manual integration of a completed PEP into all the various ecosystem tools) is seen as burdensome, heavyweight, and slow, particularly for things that are going to be basically a copy/paste of “like X, but for Y”. I think that sentiment is pretty hard to disagree with; it was one of the motivating factors for the perennial manylinux PEP, and new hardware “things” seem to be popping up at least as often as the old date-based manylinux releases were supposed to.
  2. It was an attempt to remove burden from OSS developers and reduce the friction between them and the corporations [3]. For instance, many of the tooling authors have stated they don’t really know anything about GPUs, but the existing model would require them to explicitly handle them (both in terms of discussing/approving a PEP for a given GPU, and in reviewing and maintaining the code that supports them).
  3. A realization that “does a given wheel work on this machine/environment along some axis” is the underlying question behind both the platform tag [4] and the motivating use cases for variants, and that they are fundamentally the same (run some code to look at the system, generate a list of what works on the machine, compare it to the given wheel). That means an interface for answering it could be created, which would make the whole process easier and less manual in the future.

I think that those reasons are pretty understandable, and hard to disagree with, or at least hard to dismiss as unreasonable or “out there”.

I also think it’s entirely reasonable to say that you don’t think those reasons are enough justification to change how we’ve historically handled defining our axes!

But I don’t think it’s reasonable to suggest that adding an “expected by default” axis through the variant interface is somehow a drastically different trust relationship than adding that same axis through the previous mechanisms [5].

This all brings us back to our fundamental question. No matter what axes we want to support (including the existing ones!), installers have to execute some code per axis to handle them, so how do we do that?

An earlier draft of this PEP had the installers auto-downloading and then executing arbitrary Python packages during resolution. That would have maximally removed the friction of introducing new axes, and allowed both the installers and their end users to be fully “out of the loop” for managing those axes (as they currently are for build backends), which I think in the abstract is really great! Unfortunately, the cost of doing that was way too high: it effectively enabled arbitrary code execution at resolve time, which is a no-go for both security and performance.

So someone has to be “in the loop” to decide which axes are OK to include, which means either end users need to do that or the packaging community needs to do that [6]. For all of our existing axes, the answer has been “the PEP process defines the axis, and the installers decide what code to trust to implement it”.

Personally, I think that asking end users to decide which axes to trust for anything remotely common is the wrong approach, from both a UX perspective and a security perspective.

As a thought experiment, imagine if pip only supported the `any` platform tag natively, and any other platform tag required doing something like `pip install cpu-selectors`, or passing an `--allow-cpu-selector` flag, or having the end user pass around a JSON file or something. Even if you say that win32 only requires the stdlib so it’s foundational, the question still applies for manylinux.

I don’t think anyone would reasonably argue that that UX would be fine for win32 or manylinux, and I don’t think it’s reasonable for any axis that is likely to impact a large number of users if we can at all help it!

Why does the manylinux axis not require any sort of opt-in or gymnastics from end users?

Because installers have chosen to depend on and trust some code that looks at an existing system and determines its manylinux support. There’s nothing inherently different between an installer choosing, on behalf of its users, to trust packaging and choosing to trust foobar-variant-provider.
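For a sense of what that trusted code amounts to, the glibc side boils down to something like the following (a simplified, stdlib-only sketch of mine; the actual implementation in packaging is considerably more careful):

```python
import os

def glibc_version():
    """Best-effort glibc version of the running system, or None.

    A simplified sketch of the kind of check behind manylinux tags;
    returns a (major, minor) tuple on glibc-based Linux.
    """
    try:
        # Returns e.g. "glibc 2.31" on glibc-based systems; os.confstr
        # only exists on Unix, hence the AttributeError guard.
        version_string = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, OSError, ValueError):
        return None
    if not version_string:
        return None
    name, _, version = version_string.partition(" ")
    if name != "glibc":
        return None
    major, _, rest = version.partition(".")
    minor = rest.split(".")[0]
    try:
        return (int(major), int(minor))
    except ValueError:
        return None
```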

What is inherently different in the variants proposal is the notion of arbitrary axes, and I do not think there is any reasonable mechanism by which anyone other than the end user can decide to trust an arbitrary piece of code.

That being said, I don’t think that all of the ideas in the PEP require arbitrary axes.

  • The idea of standardizing an interface for handling axes seems wholly reasonable to me.
  • The idea of moving some of this data out of wheel file names and into metadata seems wholly reasonable to me as well [7].

I’m not yet sure how I feel about the specifics of the implementation of those ideas, though!


Putting all of the questions of the exact shape of defining an axis aside, I think that leaves us here:

How do we decide what axes an installer is expected to handle by default? Is the existing approach of a PEP per axis defining it and leaving it up to the installers to figure out how to implement it the right one? Should each installer decide on its own which axes are expected? Should we offload that decision to a central repository (either in the form of a library or a list of axes)?

Should we provide a mechanism for arbitrary axes to be supported (presumably on an opt-in basis), or should we only allow axes that some central-ish body (be that the existing PEP process, the installers deciding, or a blessed library/list) has approved?

Should we standardize an interface for axes (whether we allow arbitrary axes or not) to make it easier to add new ones across the ecosystem, or is there too much overhead in doing that and manual integration is fine?

Should the “by default” axes and the “arbitrary” axes (assuming we add them) share a mechanism, or should they be independent from each other?


For me, I think that the ability to have arbitrary axes is pretty useful! If for no other reason than that it allows experimentation and iteration on axes, and it means that esoteric axes have a method of support that doesn’t require the ecosystem as a whole to care about supporting them.

I also think that sharing the mechanism between arbitrary and “by default” axes provides a reasonable glide path for standardization. Something like manylinux could have been an arbitrary “opt in” axis at first, iterated on with real-world use to discover where it fell short, and then, as it gained popularity, transparently migrated to being a “by default” axis. That reduces the risk of “getting it wrong” in a PEP, since many of our PEPs have to define something without much chance to experiment with it first in “real world” situations.

I don’t have a strong opinion on what the process should be for something to become a “by default” axis, but I am sympathetic to the idea that the PEP process is too heavyweight for it [8], and I think it’s probably perfectly fine if we offload that to some blessed library or central list with some basic guidelines for what is acceptable to become a “by default” axis.

And for all of that to work, we kind of need a defined interface that is shared between the “arbitrary” axes and the “by default” axes.

So for me, I think we should have a defined interface (whether it’s variants or not) for “axes”, allowing arbitrary axes to be distributed, but in an opt-in manner, and that interface should be our preferred mechanism for new axes [9]. We should pick a mechanism for selecting “by default” (or maybe “standard” is a better word) axes, and any axis on that list should be expected to work out of the box with any compliant installer [10].

Thanks for coming to my TED talk [11]


  1. Some of the specs call out explicit stdlib APIs as how you get them, but some of those APIs have gone away or are more generally used as an example of what that data is. ↩︎

  2. At a conceptual level, I’m lumping in schemes that require an external process to execute some code and pass the result into the installer in some way (à la JSON files or CLI flags, etc.) as still fundamentally being about the installer (or at least the install/resolve “process”) executing that code. ↩︎

  3. Since corporations are often the organizations that are bringing new hardware into existence. ↩︎

  4. And you could pretty easily argue the same for the Python tag and the ABI tag as well! ↩︎

  5. Remember, the perennial manylinux PEP made no mention of how an installer should interrogate the system to determine what glibc version was present, it was up to each installer to figure out how to do that, and I believe most just implicitly trusted packaging to do that for them. ↩︎

  6. This could mean that the installers handle it, it could mean PyPI handles it, it could mean some selected group of people handles it. Basically just some group of people deciding on the end user’s behalf. ↩︎

  7. If we were starting over from scratch, the variant mechanism could have handled platform tags for us entirely, which could mean that instead of a wheel like foo-1.0-py3-cp38-platform1.platform2.platform3.platform4.whl, it could have just been foo-1.0-py3-cp38-$variant.whl, and then the mapping of $variant to what platforms it supports is handled in metadata.

    Arguably you could even use it for the Python tag and the ABI tag too! ↩︎

  8. Although the lack of ability to experiment and get real-world experience beforehand makes the PEP process ultimately more conservative, for fear of having to live with a mistake “forever”, so it’s possible that the PEP process for a “by default” axis would become more streamlined in the world I envision. ↩︎

  9. And if it works, maybe even try to transition previous axis? ↩︎

  10. And the installer could pick whether they wanted to implement that “by default” by vendoring the “arbitrary” provider, or by having an allow list of “arbitrary” providers that they’ll use in an opt-out fashion, or some other, completely different mechanism if they so desire. ↩︎

  11. Some day I’ll figure out how to say things with fewer words, but not today. ↩︎

17 Likes

I agree, and I think that proposal is essentially the same as PEP 817, but without setting any requirements for the installer UI. A PEP that defines the standard interface and lets people experiment with tools.

Realistically I would expect that some installers immediately support some common variant selectors out of the box[1]. Other installers can take the “install your selectors up front if you want them to work” approach. Distributors might use the second type, but decide on a set of selectors they trust and bundle those alongside it. This is all fine!

I think such a proposal might have been accepted and implemented by the time this thread ultimately concludes. :sweat_smile:


  1. given that Charlie Marsh is a PEP author, I bet uv will! ↩︎

3 Likes

I agree with this, but I think this PEP should be scaled back to the first sentence, and the second sentence should become a future PEP, to be discussed after the first PEP is implemented, proven, and tested. All the discussion about vendoring etc. should be postponed to that second PEP, when the questions that would need to be answered would be answerable.

5 Likes

Surprise :wink:

Though I think there are many ways you could implement that general idea, of which PEP 817 is one!

I’m not really worried about the PEP itself at the moment. It seems pretty easy to conclude that, in its current state, it’s not ready to be accepted.

I think that between the various threads, there has been a lot of talking past one another and folks generally getting frustrated with each other, causing a breakdown in communication. I think that is making things much more tense and making everyone feel unhappy.

With all of that happening, people tend to get entrenched and snippy, which makes things even harder to progress on, but also makes it difficult to sort out where people actually stand on what’s being proposed.

So I’m trying to step back, and figure out, at a high level, is this general, but purposefully vague idea something that folks are on board with?

If it’s not, then the specifics in PEP 817 as they stand today don’t really matter, because PEP 817 is a possible implementation of that general idea, and if that general idea isn’t OK with folks, then 817 will need to become something wholly different (or just be abandoned as not going to happen).

If it is, then it’s possible that PEP 817 could be (or become) a reasonable solution that folks are happy with, and we can start to peel back the onion on 817 specifically to figure out what needs to change (whether those are changes to the actual implementation or to the process or what) to make it be the right solution.

But as it stands today, it’s really hard for me (and I would guess others) to really get a good read through the talking past each other on how far apart everyone actually is.

9 Likes

Thank you for taking the time to bring it to a higher level. I think that is important as I agree that it’s been hard to gather how far apart everyone is right now.

So for me, I think we should have a defined interface (whether it’s variants or not) for “axes”, allowing arbitrary axes to be distributed, but in an opt-in manner, and that interface should be our preferred mechanism for new axes. We should pick a mechanism for selecting “by default” (or maybe “standard” is a better word) axes, and any axis on that list should be expected to work out of the box with any compliant installer.

If the PEP were more focused on this, I think I would be a yes vote. As of now, after digesting it more and rereading it over and over again, I would be a no. As written, the PEP prescribes too many implementation details to tool owners, instead of defining an interface for “axes” that tool owners can discuss, and then deciding whether a centralized implementation makes sense, either per axis or, more generally, for a list of axes that should be supported by default.

3 Likes

I’m sorry, but I don’t understand what you’re asking here. Could you please rephrase and / or illustrate with an example?

Well, the answer is rather simple: we were trying not to impose a particular implementation.

Oh, and since this keeps coming up (I suppose we should put it in the rejected ideas), I believe that requiring users to install plugins manually (i.e. `pip install variant-gpu-provider`) to enable them would be a very bad idea, and for two reasons.

Firstly, it’s bad UX. You’re requiring users to actively maintain the installation of the provider alongside their production software, and to keep it up to date themselves.

Besides, how would you integrate that with, say, `poetry sync`? Let me clarify that this command removes any packages not in the package list.

Secondly, it’s an increased supply chain attack risk. I mean, we’re talking about a case where the installer runs code from any installed provider package. Any compromised dependency of a benign package can maliciously add a provider that will afterwards be executed by the installer.

I admit, it’s not like this isn’t an attack vector already. I mean, unless I’m mistaken, pip will gladly let any installed package (including a compromised dependency of a benign package) overwrite its own files. But that’s not a reason to proliferate that.

I disagree; this is the exact model that hatch has with plugins, for both hatch and hatchling.

Requiring plugin execution at installation time is absolutely imposing an implementation design.

By contrast, defining a static data file format to describe target deployment environments completely frees up the design space for how that data file gets generated:

  • implicitly during the installation process itself (what the PEP currently describes)
  • running an entirely separate tool
  • running a separate installer subcommand
  • specifying it directly without running a tool at all (and then externally ensuring the deployment target matches that specification)

With such a data format defined, the plugin execution requirements would all shift to the deployment target analysis step, and the requirement on installers would be reduced to having a way to accept the output of the target analysis step as an input to variant selection.

Any target based variant selector (install time selectors in PEP terms) that wasn’t included in the target analysis file would fall back to the non-variant version.

The ahead of time plugins (which I’m now thinking of as “variant consistency” selectors) feel like they could potentially be handled by adding “publisher” metadata to the variants that are exporting an interface, and “consumer” metadata to variants that are consuming those interfaces, so installers can just match them up. The difference from what’s currently in the PEP is ensuring there is a clear way for the variant metadata in a package to say “variant X requires a variant of package A that provides features M & N”. That may even be possible with the already defined metadata, but the PEP doesn’t define that process (instead, it’s delegated to being an install time plugin operation, but without really giving the plugins the information they would need to do it reliably)

Edit: as far as making pip install pytorch or uv install pytorch “just work with optimal results” goes, that would require either implicit target analysis in the tool, or pregeneration of a target analysis file in a defined location that tools could opt-in (either implicitly or explicitly) to reading

3 Likes

FWIW, I don’t think folks are that far apart in general terms, but it’s such a big concept that the opportunities for talking past each other are abundant.

I do think your separation of the problem definition (and its relation back to the status quo) is genuinely helpful, so perhaps a version of it could be turned into an Appendix to the PEP itself?

Oh, I didn’t realize that the PEP wording may imply that. Of course we assumed that users are permitted to provide the data externally, via a static file, and that is definitely present in the PEP. However, admittedly I didn’t think too deeply about how that can conflict with all the MUSTs and SHOULDs.

What I meant to say is that we don’t want to force a specific file format, and would rather let installers go with what’s convenient to them. Particularly given that many tools already have their own file formats.

But a specific file format is precisely the sort of thing that is crucial to interoperability. How else will analysis tools and installers work together?

2 Likes

Some of the authors felt that it was easier to “plan for the behavior” but not specify the file format / structure.

If there’s consensus that this should be part of the PEP, I’m sure that won’t be a problem.

The current format we came up with is the following:

```toml
[metadata]
created-by = "tool-that-created-the-file" # Freeform string
version = "0.1" # PEP 440 version

[[provider]]
resolved = ["cpu_level_provider==0.1"]
plugin-api = "cpu_level_provider"
namespace = "cpu_level"

[provider.properties]
x86_64_level = ["v1", "v2"]

[[provider]]
resolved = ["nvidia-variant-provider==0.0.1"]
plugin-api = "nvidia_variant_provider.plugin:NvidiaVariantPlugin"
namespace = "nvidia"

[provider.properties]
cuda_version_lower_bound = ["12.8"]
sm_arch = ["100_real", "120_real", "70_real", "75_real", "80_real", "86_real", "90_real"]
```

It has a few characteristics / design intents:

  • The name of the tool that generated the file, and its version
  • Trying to follow/respect the same structure you would find in pyproject.toml
  • Being easy for humans to read: hence TOML over JSON
  • The simpler, the better

I do personally share your feeling @pf_moore, and I would prefer to formalize that format now rather than end up in a long & painful situation like the pylock.toml one a couple of years from now.

1 Like

I think it’s worth making a distinction between code that works from constraints that are passed in (i.e. “give me the Python 3.13 manylinux wheels that support glibc version 2.25 and higher”) and code that detects the state of the system it is running on (i.e. “use ctypes to find out which CUDA version is installed”). The former can likely be formally specified (and notably is required in order to create an installation for a different system than the one the code is running on, with containers being a clear example of this), whilst the latter must be effectively arbitrary and does not natively handle creating an install in one place and using it somewhere else. From what I can see of the PEP, only the latter is discussed, which sadly implies to me that to get anything reliable out of such a system (given there appears to be no control over the detection logic), one must resort to source builds and disable wheels entirely?
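That split could be sketched as follows (all names and shapes here are mine, purely illustrative): the constraint-based half is a pure function over declared data, so it works for cross-target installs, while the detection half must execute on the machine it describes:

```python
# Illustrative sketch (names invented): constraint-based selection is a
# pure function over declared data, so it needs no access to the target.
def wheel_matches(wheel_glibc_floor: tuple, target_glibc: tuple) -> bool:
    """'Give me wheels supporting glibc >= X' against a described target."""
    return target_glibc >= wheel_glibc_floor

# A manylinux2014-style wheel (glibc floor 2.17) against a target
# declared as glibc 2.25, e.g. a container image, without probing anything:
print(wheel_matches((2, 17), (2, 25)))  # True
print(wheel_matches((2, 28), (2, 25)))  # False

# By contrast, the detection half ("use ctypes to find the CUDA version")
# has to run on, and is inherently tied to, the machine being described.
```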

Reading through this, would it be helpful to remove the plugin requirement and let installer implementors decide based on their preferences around security and maintainability?

It seems we focused on plugins as a simple way for us to support the wider community, but I see now that removing them wouldn’t substantially change things.

Agreed.

Also agreed. And thanks for your efforts to try to break out of that situation.

At the extremely high level, yes, I’m on board with the idea of variants. I do have some high level reservations[1], but in a broad sense I want a solution to the problems this is targeting.

It’s possible, yes. But it’s going to be a difficult process. One issue I see is that the PEP has eleven authors. That’s a huge number, and we’re already seeing signs that getting consensus among the PEP authors when the community suggests changes is part of the problem[2].

Community consensus forms slowly. That’s a big part of the difficulty with any standards discussion. When coupled with a large group of proposal authors who naturally take significantly more time to react to suggestions than a single author would, keeping discussions moving forward becomes an enormous task.

Agreed. And specifically, I have no good feeling for how unified the PEP authors are in their reactions to the feedback we’ve had so far.

I’m becoming more and more interested in this approach. As you say, it offers a lot more flexibility in designing approaches for generating the data, and it also offers a solution for the problem of describing target systems other than the one the installer is running on.

What I don’t see is how this squares with the PEP authors’ stated requirement that pip install torch “just works”. Are they willing to accept that the user runs a tool to generate the description file as a one-off exercise? Do they expect pip to pick a “preferred” tool and run that automatically? Who is expected to notice that the user has installed a new graphics card and the tool needs to be rerun? Or am I completely misunderstanding how others view this option?

(And in the light of what I said above, I’m less interested in @mgorny or @jonathandekhtiar giving their personal views, and more in what the consensus of the PEP authors as a group is. I’m willing to wait for a consensus answer, although I doubt that discussion will stop while that consensus is achieved :slightly_smiling_face:)


  1. For example, given that this is part of the “Wheel Next” project, why are we trying to do this with the existing wheel standard, rather than getting a newer, more flexible format in place which can then integrate variants with far less difficulty? ↩︎

  2. For example, I’m taking your posts as personal opinions, not necessarily reflective of what the PEP authors as a group think ↩︎

6 Likes

I’m trying to envision a scenario in which I would be on board with it. I don’t dispute that hardware detection is a win, but if those interested in it had proposed something narrow that only handled optional hardware acceleration, and wasn’t as complicated to reason about in all of its impacts as this PEP is, I think there would have been broad community support over a year ago, and we probably would already have solved the primary problem this PEP is attempting to solve.

For that specific need, we only need a list of hardware accelerators common enough for the community to agree on supporting, and a canonical way to detect that hardware. I also don’t see updating that list, along with its canonical detection methods, as anything more than a maintenance task; it should be possible to define it in such a way that it’s expected to be updated periodically, so long as updating detection doesn’t require anything that seems absurd by comparison to existing detection.

By contrast, the plugin mechanism significantly changes people’s existing expectations about when code will be run and what level of isolation exists at different stages; it complicates lockfiles, installers that manage multiple environments, and installers not written in Python (even if uv, the big existing one there, is OK with it, this complicates any future endeavor); and it hasn’t been presented in a way where things that were previously dealbreakers have been solved.

I see the technical argument for coming up with a solution that doesn’t require another new solution in the future, but this particular mechanism seems to have been abstracted to the point where it isn’t even a suitable technical design to get uniform tool behavior, let alone minimize the complexity for those looking at auditing code.

At the very core of it, there’s a problem with the plugin design: plugins are installed into the environment being modified by the install command; they aren’t differentiated from normal packages intended for use at application runtime; packages can provide importable top-level modules other than their name on PyPI, any of which might be used in the future; and these plugins may need to run after some dependencies have started being installed, but before installation is done.

If the authors are willing to change this so that plugins must be in an isolated environment and can only query information available from the installer, rather than running in the target environment, and to start with not requiring arbitrary plugins, there might be a path forward that preserves the plugin mechanism.

Whether seeking a modification to the plugin design that keeps it a plugin-based solution and solves enough of the issues to reach agreement is worth it is another question. To me, sticking with the plugin mechanism feels like a sunk-cost fallacy. How many different possible axes could we define narrowly with less time spent? Right now, we have two proposed axes: optional hardware accelerators (primarily GPUs), and swapping out which underlying library a package is built against.

3 Likes

Folks shouldn’t underestimate the significance of the feature namespace mechanism that the PEP defines. It isn’t something we had when the variant discussion first started, and its definition is the reason I see PEP 817 as a big step forward, even if the specific plugin invocation design in the current iteration proves unacceptable.

We’re now equipped with a tool that allows us to concisely describe not only potential axes of variation, but also ordered preferences for selection along each of those axes.

That’s huge, and is precisely what has allowed the discussion to advance from questions like “How do we describe which hardware a variant wheel supports?” to the current questions like “How do installers determine what hardware is available in a given deployment environment?”

6 Likes

This seems entirely orthogonal to whether or not it’s implemented as a plugin system to me. I don’t see a reason why a tool wouldn’t be allowed to offer something here to skip the canonical detection mechanism and target something specific, in the same way that installers already allow targeting other platforms.

1 Like

I’m not sure I follow this exactly.

Manylinux compatibility is determined by running some code that (ultimately) spits out a list of strings indicating which manylinux platforms a system supports. An installer can allow you to bypass that detection and provide that compatibility information yourself (and I think most, maybe all, of them do).

I think a similar thing is reasonable for an installer to provide for any mechanism we have of implementing a compatibility axis.

I’m skipping the comments on the design in the PEP for the reasons I’ve mentioned previously, but I want to answer this one for a minute, because I think it speaks to the question of “is our existing mechanism for adding an axis good enough, or do we need a new thing?”.

I think that your statement is kind of operating at too high of a level to be super accurate.

For instance, I don’t think “GPUs” are a singular axis.

The different GPU makers are very different from each other in terms of what you have to consider. I happen to know Nvidia better (though I’m still sketchy on it), but for Nvidia you need to worry about:

  • The CUDA kernel version.
  • The CUDA driver version.
  • The “architecture” (roughly the GPU generation, but technically different).

I’ve got no idea what any of the other GPUs care about, but I believe they’re very different.

There’s also things like TPUs, ASICs, FPGAs, NPUs, VPUs, DSPs.

There’s more and more different types of hardware acceleration happening constantly! GPUs are the elephant in the room today, but the other kinds are getting more important as well.

Another axis that is important today is CPU architecture levels. Right now you can’t ship wheels that take advantage of more modern CPU instruction sets like AVX-512 in a good way (there are workarounds, but they all kind of suck).
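To make that axis concrete, detection along it could look roughly like this (a Linux-only, deliberately simplified sketch of my own; the flag sets only approximate the real x86-64-v2/v3/v4 definitions, and real implementations are far more thorough):

```python
# Simplified, Linux-only sketch of classifying an x86-64 CPU into a
# microarchitecture level by reading /proc/cpuinfo. The flag sets below
# only approximate the actual x86-64-v2/v3/v4 definitions.
def x86_64_level():
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    break
            else:
                # No "flags" line (e.g. non-x86 hardware): give up.
                return None
    except OSError:
        return None
    levels = [
        ("v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
        ("v3", {"avx2", "bmi1", "bmi2", "fma", "movbe"}),
        ("v2", {"sse4_2", "ssse3", "popcnt", "cx16"}),
    ]
    # Report the highest level whose required flags are all present.
    for name, required in levels:
        if required <= flags:
            return name
    return "v1"
```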

1 Like