WheelNext & Wheel Variants: An update, and a request for feedback!

The closest equivalent in Gentoo are USE flags. Besides controlling package features, USE flags have some overlap with the intended use case for variants, such as CPU instruction sets.

In Gentoo, the flag choice is entirely manual, with some “reasonable” defaults provided at distribution level. Say, if you don’t set CPU_FLAGS_X86, packages using explicit SIMD code would be limited to SSE2 on amd64, or SSE on i686. We provide a tool, cpuid2cpuflags, that users can use to detect their CPU features and set a good value.

At this moment, it supports ARM, PowerPC and x86, and the logic is already quite complex. Mind you, it only needs to support Linux. When packages start needing new CPU-related flags, we need to update it in lockstep with the distribution flag definitions, to ensure that flag names match. And of course, it happened more than once that developers added new flags without actually pinging me to update the tool.
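To give a rough idea of what that detection involves, here is a heavily simplified, hypothetical Python sketch of the x86-on-Linux case only (the real cpuid2cpuflags queries CPUID directly and handles the flag-name mapping that this toy version ignores):

def detect_x86_cpu_flags():
    # Toy detection: intersect /proc/cpuinfo flags with a hand-picked set of
    # names that happen to match Gentoo's CPU_FLAGS_X86 values. The real tool
    # uses CPUID and maps kernel flag names to distribution flag names.
    wanted = {"mmx", "sse", "sse2", "sse3", "ssse3", "sse4_1", "sse4_2",
              "popcnt", "avx", "avx2"}
    with open("/proc/cpuinfo") as cpuinfo:   # Linux-only, like the real tool
        for line in cpuinfo:
            if line.startswith("flags"):
                present = set(line.split(":", 1)[1].split())
                return sorted(wanted & present)
    return []

print('CPU_FLAGS_X86="{}"'.format(" ".join(detect_x86_cpu_flags())))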

Another problem with that approach is that since the tool needs to be run manually, people don’t realize that they need to run it again after an update, so they end up running their systems with incomplete flag sets.

And let’s not forget that Gentoo is what you’d call a “harder than average” distro. Aiming for the more general population of PyPI users means that we need to aim for a simpler solution.

2 Likes

Ok, but what does this have to do with SIMD support levels? Here on Ubuntu 24.04:

$ dpkg --print-architecture
amd64

Ok, also of note in that page:

Preferably, the project should rely on runtime dispatch for arch-specific optimizations.
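For readers unfamiliar with the term: runtime dispatch means shipping all code paths in a single artifact and choosing between them when the program runs, rather than choosing a wheel at install time. A minimal, hypothetical Python sketch (the feature probe and function names are invented for illustration):

def _cpu_has(feature):
    # Crude Linux-only feature probe, purely for illustration.
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            return any(feature in line.split()
                       for line in cpuinfo if line.startswith("flags"))
    except OSError:
        return False

def _dot_generic(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def _dot_avx2(xs, ys):
    # A real library would call into an AVX2-optimised extension here.
    return sum(x * y for x, y in zip(xs, ys))

# The dispatch decision happens once, at import time, on the user's machine.
dot = _dot_avx2 if _cpu_has("avx2") else _dot_generic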

3 Likes

I think this is only possible if the plugins are in some kind of controlled registry, or it’s an opt-in choice to download a plugin[1].

The way the original proposal was phrased, it seemed like pip install foo would automatically download whatever plugin foo asked for and then run it, no questions asked. An open index like PyPI[2] doesn’t have the security resources to stop malicious plugins from sneaking in, even if only for a brief time window.


  1. so the auditor can inspect it before making that choice, or set up their own plugin repository, or whatever they want ↩︎

  2. or even worse, something like GitHub ↩︎

2 Likes

Taking a step back, are there any good examples of prior art in this area? It sounds like this is very much a universal issue, not closely tied to Python, so I don’t think we should be innovating here, but rather we should build on what others have already done.

In particular, are there other package management systems that download and run package selection code at install time? How do those systems ensure that the selection code is auditable and controllable? Do they have a curated package repository (something that Python doesn’t have)? One specific concern I have here is how much reliance this might put on PyPI not hosting malicious code.

@jonathandekhtiar claimed there could be 20-30 selector namespaces. Are there any examples of other package management systems that allow packages to be defined to that level of granularity? Being able to control 20-30 different variables when differentiating packages seems incredibly unmanageable (even if “most” packages only use a few of that number) and I’d like to know how other package managers handle that.

6 Likes

Is there an analogy to platform triples with variant namespaces? From reading the above, it seems the current suggestion is that each vendor (eg ARM, NVIDIA, etc for GPUs) would have their own dedicated namespace. Could we instead express this as gpu_api? For CPUs, we have far more variety but a standard way of expressing it, without new namespaces for Intel, ARM, AMD, RISC, etc.

I might be misunderstanding things, or this might’ve been considered and rejected already, but I’m not sure I understand why the number of variant axes is so high — I had naïvely expected eg GPU API version, BLAS type, perhaps supported chipset instructions — single digit ‘namespaces’, but each with many different possible properties that a resolution algorithm could match up.


Yes there are MANY examples of prior art.

If you want to look in the Python world: conda virtual packages are one of the reasons people adopted conda-forge in the scientific computing space.

Spack was another attempt, coming at it from the opposite side (what if we just rebuild everything at install time?), with compiler flags specific to the local machine.

Docker had to implement a similar concept (Multi-platform | Docker Docs) because “just container tags” was absolutely not enough. And ironically this is something that became really critical when Apple switched to ARM processors. Overnight, so many containers had to be rebuilt in “multi-platform mode” (Apple including x86 emulation immensely helped to smooth the transition).

We will be including an entire section on prior art inside the PEP. We took a lot of design inspiration from how these systems work and how we can best adapt them to the Python ecosystem. No need to reinvent the wheel (or maybe we shall :winking_face_with_tongue:)


What other package managers have done is include that code / vendor it inside the installers.

We always assumed pip maintainers would be firmly opposed to that idea. Maybe we are wrong about that.

For many reasons:

  • It increases the maintenance load on the installer side.
  • You get GitHub issues on code you didn’t write and don’t really know how to fix (I have no idea how to maintain FPGA-related code, and I assume most people don’t either).
  • Deciding where to set the bar on what should or should not be included inside the installer can create real tensions.

So with all of these points considered, we always assumed it was better to just “build something separately”. Now, if we wanted to build an allow list of “approved plugins” or vendor them inside installers:

  1. I don’t think the PEP needs to go onto that ground. The PEP is defined around the technical “mechanism”. Installers can totally have freedom of design in how they deal with them. I wouldn’t be surprised if different installers took different strategies.
  2. If we really want to go onto the “governance ground” in the PEP, I would prefer to split it up in two, as you and @barry suggested a little earlier.
3 Likes

I would just like to remind everyone that we should be taking everyone’s posts in good faith.

That people have real concerns about how this interacts with existing expectations and the complexity of this solution.

And that people haven’t put all this effort into developing this proposal and prototyping it to solve a non-issue, nor are they just ignoring existing solutions.

The worst outcome for me would be that this creates a split, where variants could be hosted on alternative indexes to solve real problems customers might have, but then standard tools can’t interact with those variants.

It might make sense to wrap up the existing discussions soon so that the proponents have time to think about and integrate feedback, rather than expecting them to defend all the details continuously across dozens and dozens of posts in a short time frame.

13 Likes

At what level of hardware feature complexity should we just tell users “build from source if you need the most optimal option?”

This is a serious question: this many axes, should people insist they really are needed, will result in build outputs that are nearly impossible for a human to audit and test.

4 Likes

Note that this problem exists regardless of whether it’s due to vendoring or dynamic plugins (since it is the wrapping installer that reports the runtime error either way).

At least with vendoring, the installer authors know which of their selector plugins have been updated recently.

2 Likes

I believe it’s an artifact of the design. If you design plugins to be independent from each other (which, as @rgommers pointed out, you pretty much have to, at least for legal reasons), then the only way to guarantee no name clash is the namespace :: name :: value design.
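To make that concrete, a handful of hypothetical variant properties under that scheme could look like this (all namespaces, feature names and values below are invented for illustration, not taken from the proposal):

# Invented examples of "namespace :: feature :: value" properties, each
# namespace owned by an independent plugin so names cannot clash.
variant_properties = [
    ("x86_64", "level", "v3"),       # CPU baseline, from a CPU plugin
    ("nvidia", "cuda",  "12"),       # GPU runtime, from a GPU plugin
    ("blas",   "impl",  "openblas"), # BLAS flavour, from yet another plugin
]

for namespace, feature, value in variant_properties:
    print(f"{namespace} :: {feature} :: {value}")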

This is very much what Spack is for. And exactly why people have adopted it.

I personally believe the PEP should give people tools to do what they need and try to avoid - when possible - telling them what to do, because ultimately these limits are artificial. If anything, our prototype proves it’s technically possible to support any wild idea people may have (though that doesn’t mean it should be supported - but we do need a good reason to actively stop it).

I do like the concept of “allow listing” the most prominent use cases as they emerge. I sincerely believe variants will open doors and use cases we cannot foresee today, and having a design that can evolve and adopt these new use cases as they grow is a really strong argument in favor of the design (or something similar) we currently have.

I really think we (WheelNext) collectively wanted to avoid a proposal that would increase the maintenance burden on installers or organizations like the PyPA / maybe-soon-to-be-adopted Packaging Council. Now honestly, if the community believes it’s the way forward, then so be it :ok_hand:

Very fair point. I think it was more a question of respect for us not to propose a design that increases the work for someone else, but rather to let them offer that option themselves if that’s something they prefer. I’m not sure I would personally have much appreciated someone proposing something and telling me “because of my proposal you have to do X much more now”. I’d have much more appreciated being the person fronting the idea - if that’s something that mattered to me.

3 Likes

Sorry for the repeated posts - I’m on my phone, and it’s really hard to reply to multiple people in one go.

First of all, thank you Damian. On a personal level I sincerely appreciate your message and participation in this thread.

I will be trying to produce a summary of the different points made in this thread - if anything for purely selfish reasons (but not only) - it helps immensely to write a PEP that already starts providing some answers to the concerns highlighted here.

And yes, at some point we will need to focus on the actual PEP, which is reasonably hard to do when DPO is at full speed. Though to be honest, I consider this discussion an essential step in understanding what the community is worried about and how we can make a fair, best attempt to address those concerns.

This PEP will without any doubt have many, many iterations. So it’s a matter of finalizing a first draft that we can push to GitHub and start refining over time.

I’m even willing to organize a live Q&A or a recorded session where we go over the PEP and highlight the key points. I do believe it will help people quickly ramp up on the topic without reading a very long document. Let us know what will help - and we’ll try to make it happen.

6 Likes

For the record, I don’t think there’s that much technical difference between “allowlisting” plugins and expecting installers to deal with automation. In the end, I don’t think installer authors would be writing all the logic themselves (except perhaps for the most common / easily available use cases), but rather relying on third-party code to supply that logic. So what we’re effectively talking about is the difference between each installer vendor separately auditing all the libraries needed to implement the logic vs. having a central body that audits provider plugins.

2 Likes

Selectors seem reminiscent of build backends, which is a system that works pretty well but has some points of friction I’d like the PEP authors to think about. Most particularly, you don’t control what version of a build backend a consumer may use, so new build backend versions can break things (Cython 3.0 comes to mind), and trying to specify reproducible installs requires pinning.

I’m therefore interested in the way that build backends may be unpinned in metadata, but then pinned to specific versions for install reproducibility via PIP_CONSTRAINT.

How are selector plugins expected to evolve and change over time, if they are packages?
Are selectors expected to execute in an isolated environment? Will we expect a --no-build-isolation option for this for installers?

The security considerations are partially addressed if it is at least possible to constrain the “selection environment”.


Looking at pip-compile, there are options for extracting the build dependencies for exactly these sorts of needs. I would expect to need similar options to extract selector plugin packages.

6 Likes

It goes beyond being “reminiscent” of build backends… We purposefully decided to adopt as many design intents and cues as possible from the build backend design and API - specifically because it’s already a design that was approved by the community and something people are used to.

[variant.default-priorities]
namespace = ["custom_namespace"]

[variant.providers.custom_namespace]
requires = ["package-name>=0.0.1,<1.0.0"]
plugin-api = "package_name.module:PluginCls"
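To illustrate the analogy (and only as a guess at the shape, not the actual API the proposal defines), the object that plugin-api points at could be imagined as something like:

# Hypothetical sketch only: the method name is a placeholder, not the
# interface specified by the proposal. The idea mirrors a build backend
# object that the installer loads from "package_name.module:PluginCls".
class PluginCls:
    namespace = "custom_namespace"

    def supported_values(self):
        # Report what this machine supports for this namespace,
        # most preferred first, so the installer can pick a wheel variant.
        return ["feature_b", "feature_a"]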

We even took inspiration from how build isolation works :slight_smile:

So if anything - I’m glad it “feels reminiscent” because that’s very much a purposeful intent.

4 Likes

I think a standard approach to pinning should be via lock files, though the standard lock file still needs to be extended to cover build backends.

The exact UX of constraints should likely be left up to the tools; for example, pip is likely to decouple install-time and build-time constraints soon: Add build constraints by notatallshaw · Pull Request #13534 · pypa/pip · GitHub. I would imagine if this proposal passed as is, it would make sense to add additional variant constraints.

3 Likes

Yes, I agree with all that has been said about the spec not dictating the implementation. But the proposal itself should have some notion of how tools might handle the situation.

Regarding locks, a multi-environment lock could hold many variants, so the selector would be needed at install time. Controlling which version of the selector gets used determines whether or not the lock really results in a reproducible install.

And multiple packages in such a lock could have conflicting requirements for their selectors. I think there’s some nontrivial gap in the current DX around build backends. Doubling down on it is actually fine, IMO – the spec doesn’t have to solve everything – but I’d like people to look at these problems and at least think through what happens when PIP_CONSTRAINT becomes a de-facto standard way to pin build backends and selectors.

2 Likes

I know very little about conda. Could you give me a pointer to some sort of example that shows how conda handles the sort of “20-30 different namespaces” problem that you referred to here?

Isn’t that precisely the opposite of this proposal - they explicitly avoid dealing with binary distribution, so they don’t have the problems we’re trying to solve here?

From a very quick read of that document, it looks much more like the “fat wheel”/“dynamic dispatch” approach that @pitrou was suggesting (which the wheel variant people seem to disagree with).

Awesome - I look forward to seeing it (and sorry if the above felt like a point-by-point rebuttal of the examples you gave, I’m genuinely glad to know this has been thought about, and I’ll be glad to have the gaps in my knowledge around prior art filled :slightly_smiling_face:). In the interests of keeping things a manageable size, I’d recommend focusing mainly on prior art that implements solutions similar to what you’re proposing (dynamic selector plugins invoked at runtime to choose which binary artifact to download and install) - those would be much more helpful (IMO) than examples that demonstrate alternative approaches which wouldn’t suit Python[1].

Speaking personally, I’d be against including the code (in the sense of being responsible for it ourselves) but vendoring is much more plausible.

Your points all apply, but if I’m trading them off against the risks and problems around adding a mechanism whereby pip downloads and installs plugins on demand, based on package metadata, then I’d be willing to consider vendoring.

There would be some concerns, of course:

  1. How much would this increase the size of the pip wheel (both in bytes, and in “number of vendored dependencies”)? You were talking about 20-30 possible variables - adding 20-30 new vendored dependencies to pip is a lot.
  2. The libraries would need to conform to pip’s vendoring requirements (most critically, only pure Python code is allowed).

We could avoid these issues, at the cost of a less user friendly approach, by requiring selectors to be 3rd party packages, and expecting the user to install any needed selectors before running pip.

I’d like to believe that, but in practice I think it’s going to be impossible to avoid a pressure for installers to implement some level of “minimum expected” behaviour. And I think that’s entirely reasonable. Therefore, not saying what that “minimum expectation” is, simply pushes the responsibility onto installer maintainers to make that judgement - and you’ve already made the point that installer maintainers don’t have the expertise necessary to make informed choices in this area (a statement that I agree with!)

To be specific, on a purely personal basis, I don’t understand why all of this is such a big deal. I’ve never found the current state of affairs with numpy/scipy and BLAS to be a problem, and while I’ve not used torch or any of the other GPU-intensive libraries, I feel that if I did, needing to point pip at the correct index to pick up the libraries that are optimised for my system wouldn’t be that much of an ask. So for me, as a user and as a pip maintainer, I’d be inclined to do the bare minimum to support wheel variants. And in the absence of a standard that says “this is the absolute minimum needed to provide a good user experience for a broad range of users, and installers must implement at least this level of functionality” I’d push back hard on any PRs that added significant extra complexity to pip in order to support extra wheel variant functionality.

I don’t think that is where we want to be - I think the proposal should require specific designs from installers. I’m not talking about UI (command line flags, defaults, etc.) but I do mean everything else - should selectors be downloaded and invoked dynamically, or should they be static based on a fixed whitelist? Should the user pre-generate a “selector values” metadata file for the environment, or should installers generate this on demand?

I’m happy for the PEPs to be split up into “technical mechanism” and “installer features”, but I’m not comfortable with the latter being merely advisory or informational. Apart from anything else, users have a right to know what they can expect from standards-compliant installers, without having to research every tool individually.


  1. Which is what the list you posted above felt like, if I’m honest ↩︎

9 Likes

Here’s some context about virtual packages work in conda-land[1]. The situation is simpler because it’s a single tool[2] for a single ecosystem, and those detection capabilities are vendored into conda itself. In turn, conda doesn’t cover nor distinguish the full set of 20-30 different dimensions that would conceivably become necessary if this is solved through plugins, and in a larger ecosystem to boot.

We really have all the freedom to design this in a way that satisfies people’s concerns. I think people are overstating the severity; it’s not like any single organisation or user can reasonably audit what’s in pytorch v2.8.0 and all its dependencies. The corporate solutions I’ve seen at best do some basic filtering (+caching) and CVE scanning.

So if (for example) pytorch adds another dependency on a plugin that does CUDA-variant detection, how does that change the calculus? I don’t see how it does TBH. The only annoying thing would be that the docs have to tell you to install the plugin first, so that when you install pytorch, the plugin can actually do its job[3].


  1. though it isn’t merged, it describes the situation very well, including e.g. how to override things where necessary. It’s a specification after the fact, in the sense that this is implemented in all conda clients and in heavy use throughout our ecosystem. ↩︎

  2. counting conda, mamba and pixi as one for argument’s sake; they all do the same ↩︎

  3. and then display a warning if you installed pytorch without the plugin that you should install the plugin and then reinstall pytorch to get the best-suited option to your system, etc. ↩︎

3 Likes

FWIW, if you’d asked me a couple of years ago (before I started on the Python component distribution prototype for LM Studio that became venvstacks), I would have felt much the same way (that is, why isn’t an approach like piwheels.org adequate?).

In practice, the combinatorial explosion ends up being substantially worse, so even if a project like PyTorch can feasibly host variant wheels across at least some of the feature combinations they care about, and have at least some users put in the effort to configure their systems to use those, it’s impractical for most projects to do the same (and even if they did, we’d be getting back into an --allow-external type situation, just aggregated in local user index config instead of centrally in PyPI file hosting links).

Relying on separate indexes also shifts the private index hosting situation from “run a private index for your organisation” to “run one for each hardware acceleration configuration you care about optimising for”.

Edit: Discourse mobile post editor lost the plot, so this was initially posted half finished

6 Likes

FWIW, we don’t explicitly avoid binary distribution. We support it. See, e.g., cache.spack.io. Spack’s solver finds binaries from build caches (or from existing installations) and preferentially reuses them as long as they satisfy the user’s request (which can include constraints on variants). If it has to build, it solves for a build configuration that satisfies the user’s request. There is more on that in this paper.

We also built and spun off a library for describing microarchitecture compatibility – archspec. We use it to deal with arch constraints in the Spack solver (“will this binary be compatible with the host?”), and I believe conda now uses it for virtual packages.
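For anyone who hasn’t tried it: archspec is on PyPI and its Python API is small. The snippet below follows its documented usage, but please check the archspec docs for the exact current interface:

# Based on archspec's documented Python API; treat this as a sketch and
# check the archspec docs for the authoritative interface.
import archspec.cpu

host = archspec.cpu.host()   # detected microarchitecture of this machine
print(host.name)             # e.g. "zen3" or "icelake"
print("avx2" in host)        # feature query on the detected target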

I would say in the Spack universe we’re at the point where we support too much combinatorial-ness in the public build cache, and the current direction we’re taking is to pare it down so that we have a decent matrix of builds hitting most major CPU and GPU platforms without losing performance. Users can still obviously build from source if that doesn’t work for them. If our survey is accurate, currently around 50% of users use their own local build caches and around 15% use the public ones. We want to increase that.

Anyway, this is very much a problem we care about in the Spack world – enabling binary installs in most environments without losing performance.

7 Likes