Idea: selector packages

This was the scenario I had in mind when first proposing this, and I think there’s a way you can emulate it already with a bit of refactoring (and potentially an extra manual step for users, which is the step a “selector package” would be needed to automate):

  • Assuming you have package spam-{discriminator} today, where users will install spam-cu12 or spam-cu11 or spam-avx2 etc.
  • Refactor as much hardware-independent code as possible into spam-core
  • Refactor hardware-specific code into spam-{discriminator} (where it presumably is today, along with the independent code)
  • Make spam-{discriminator} require spam-core
  • Teach spam-core to attempt to import and initialize the supported modules in some priority order
  • Create spam which requires spam-core and spam-{lowest common denominator}


  • users can install spam and have a working CPU-only environment
  • they can later install spam-cu12 to “light up” CUDA 12 support
  • distributors/admins can install as many spam-{discriminator} as they like for broadest support
  • users might get a warning on first import if spam-cu12 is there but their system has no CUDA 12 support. But you’re in control of your user experience here, so can suggest specific actions
  • later, if selector packages ever happen, you replace spam with one that can check and install the specific package needed.
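To make the priority-order step concrete, here’s a minimal sketch of what the spam-core loader could look like. All the `_spam_*` names are hypothetical stand-ins for the scheme above, and the demo falls back to a stdlib module purely so the sketch actually runs:

```python
import importlib

def load_backend(candidates):
    """Import and return the first backend that loads, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed (or failed to load); try the next
    return None

# spam-core would list its real variants, best capability first and the
# lowest common denominator last, e.g.:
#     load_backend(["_spam_cu12", "_spam_cu11", "_spam_avx2", "_spam_cpu"])
# Demonstrated with a stdlib name so the sketch runs as-is:
backend = load_backend(["_spam_cu12", "math"])
print(backend.__name__)  # → math
```

This is also where the warning mentioned above would live: if a higher-priority backend is installed but fails to load, spam-core can report it and fall through to the next candidate instead of dying.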

Now, I admit I’m not intimately familiar with how or why this may be a bad idea for CUDA. I have the impression that if it were this easy, someone else would’ve done the pattern already. But I can’t imagine there isn’t some sensible boundary that can be drawn to put the lowest-level Python APIs on one side, and keep the hardware- and version-independent code on the other.[1]

It should get you out of totally horrible dynamic names as well, and into the realm of not-so-horrible dynamic names :smiley: It may work out nicer to have each specific platform as a separate package anyway - that’s what I’ve seen others do. Most contributors, you’d hope, are not having to touch them and so can just install the ones they need for their own development/test.

  1. Much like how numpy/scipy refactored many of their FORTRAN dependencies into a CPython-independent binary. ↩︎


Thanks for the thoughts, Steve! I don’t think that model fits CUDA very well. Let me elaborate on the two kinds of discriminators I can see right now.

  • Ones that can coexist, like CPU optimization levels, as mentioned above. I think these fit well with your particular scheme.
    • You could also potentially have a GPU option in the mix of CPU options.
    • This kind needs a sort of priority. Even though they can coexist in a given environment, only one can be active for a given package at one time.
  • The other kind can’t coexist. CUDA versions are in this class. CUDA 11 packages can’t be used with CUDA 12 packages. You only have 1 CUDA runtime, and it’s only going to be compatible with at most one of them.

There are a couple of ideas conflated here:

  • capability
  • requirement

The “capability” aspect gives implementation flexibility within one package. The “requirement” aspect enforces conformance across multiple packages in an environment.

Are these similar enough to share one conceptual implementation (“selectors”) or should there be distinct implementations?


I guess I’m not sure what goes into a CUDA 11 vs a CUDA 12 package. Can they not both be installed (i.e. extracted from a wheel) onto the same machine, even if one will not load? And can we detect that load failure at runtime, or is it a guaranteed process termination? (They have different package names, of course, but the user-visible spam module handles from _spam_cu12 import ... vs from _spam_cu11 import ... and users don’t have to think about the specific one apart from installing it.)


Your suggestion assumes that there is a valid CPU-only version of the package. If that doesn’t exist, i.e. there is no lowest common denominator that can be installed by default, spam can’t depend on anything useful and the normal installation won’t work.

Then normal installation won’t work. There’s not really anything anyone else can do about it either.

But what would work with this structure in those circumstances is a distro/admin that wants to just include all the varieties, so that if any of them will work, the distro will work.

Right now, this option doesn’t exist, so we can’t even create a container image with everything that will work with whatever hardware is there when it runs. We have to create one container image per possible hardware configuration, and multiply that out by every package that has different needs.

The key change here is selecting the right driver set at runtime from whatever is installed, rather than forcing a user to figure it out pre-installation.


The file structure is identical between the two packages. They differ in that one is linked to CUDA 11, and the other is linked to CUDA 12. For example: cudf-cu11 · PyPI and cudf-cu12 · PyPI. As such, you can’t have both - they would clobber one another in an environment.

You could certainly make each of these have a different import name, and create a new “loader” frontend that would try different options. I think the general pattern of having the same python file layout linked to different library options is common enough that the loader frontend boilerplate would be cumbersome.

That’s why I mentioned “refactoring” a few times :wink: I was under no illusions that you could just do this with the existing package layouts.

Those packages you linked are empty, which doesn’t help me give you better suggestions of how a refactoring might look, but I assume there are some .pyd/.so modules in there that have the actual native dependency. If you distribute those separately and all[1] the pure Python in the “loader” frontend, you’re really just looking at your internal imports coming from the different packages.

If you’re currently instructing users to directly import from native modules, they may end up with a level of indirection. No big deal, it only happens at import time, and I’ll gladly guarantee that most users would rather spend an extra millisecond or so importing names than have to choose the right package themselves.

  1. All, most, or some. ↩︎


Sorry, here are the files for cudf-cu12 and cudf-cu11

If I’m understanding you correctly, you still need runtime detection of which package to use, which assumes that said package must be present on the system. Users would either need to allow the loader to download the correct package, or otherwise manually download the correct package ahead of time. You could think about installing all variants and letting the selector choose, but you’re talking about 500 MB-1 GB per package. It would be much smaller if they were using common shared libraries (hello PEP 725!).

Having each implementation be a different package is also making a mess on PyPI. Can there be a way to group implementation packages underneath their common “loader” project? This feels like info for a platform tag, but I can see why that isn’t open-ended. Is there another place we can stuff arbitrary distinguishing metadata, not in the project name?

I think the project name is fine, because you can have a dependency back to the platform-neutral part and so installing spam-cu12 gets you spam as well. There’s a rough convention of namespacing like that, so as long as you have the same “prefix” it’s not really a big deal - you wouldn’t want anyone else taking it anyway, it’s real tempting for users to trust things that start with your name.

Yeah, so without the “selector packages”[1] proposed here, users would have to manually install it. So pip install spam spam-cu12 or potentially just pip install spam-cu12. I personally wouldn’t make spam do the install itself - hence suggesting you would bring in a lowest-common-denominator helper by default, so that it works, but installing an additional package provides acceleration.

With the proposal here, that could switch to let the initial install run some code to determine which other package it needs and add it to the requirements list. So pip install spam instructs pip to actually install spam-cu12, because it has a chance to look at the machine and see that -cu12 would be the best choice.

But without this proposal, users already have to install a specific package. There’s no change to their experience by splitting up the packages to install side-by-side. However, there is a change for those who are more sensitive to install matrices than disk space - a few extra GB in a fixed virtual machine or container image is often preferred over a dynamic install on first use.

  1. A packaging/install-time concept, not an import-time concept. ↩︎


Can you factor out those shared libraries into a separate, common wheel that isn’t linked against a specific CUDA version? Then, by “simply” controlling the loading order of those libraries (on Linux and Windows), everything should work. At least that’s my understanding after reading this post describing how the RobotPy project does it.

Thanks for that reference, Jeremy! I had missed it back then. There was so much great knowledge in there from @virtuald @steve.dower @njs and the rest. My ultimate conclusion from that thread was summed up well by @barry-scott and @rgommers

It’s not that conda, Fedora, or Debian have better tools. It’s that conda, Fedora, Debian, et al. are coherent, organized groups with consistent build practices. There’s a lot of compatibility “metadata” tied up in consistent build practices. Have you ever tried to mix more than a couple of PPAs or similar that have package overlap?

Can NVIDIA get away with the approach of pushing CUDA into a separate wheel, and then using RPATH or similar to make RAPIDS libraries work? Probably, since NVIDIA controls all the parts. What does it mean for other projects that might want to utilize CUDA?

My motivation ultimately is how to loosen the need for tight control of package systems and allow more interchangeable parts. I worked on the Conda and Conda-forge ecosystems for a while, and slowly developed tools for improving compatibility. The notion of “run_exports” (originally named “pin_downstream”, originating in WIP Build customization by msarahan · Pull Request #1585 · conda/conda-build · GitHub) was especially helpful. It basically does the dependency injection described by @njs:

Even with all of that, it was very easy and common to have accidental breakage with package updates, especially when both Anaconda defaults channel and conda-forge were involved. My conclusion from all of this is that we were/are not capturing compatibility data adequately. I see selector packages, PEP 725, et al. as tools to either capture or utilize “fingerprint” metadata that first and foremost will prevent using incompatible things together. Things like selectors are ways to make decisions based on metadata. If we can understand and feed all of the different compatibility facets into the solver (pip or otherwise), I believe we can improve the shared library situation in a safe and reliable way.

Worst case scenario is that you just end up not being able to share libraries broadly, but at least you’ll understand why they aren’t compatible.

EDIT: I was remiss in not mentioning ABI Laboratory, by Andrey Ponomarenko, which was truly essential in figuring out what compatibility bounds should be for each package.