Idea: selector packages

This was the scenario I had in mind when first proposing this, and I think there’s a way you can emulate it already with a bit of refactoring (plus an extra manual step for users, which is the step a “selector package” would automate):

  • Assuming you have package spam-{discriminator} today, where users will install spam-cu12 or spam-cu11 or spam-avx2 etc.
  • Refactor as much hardware-independent code as possible into spam-core
  • Refactor hardware-specific code into spam-{discriminator} (where it presumably is today, along with the independent code)
  • Make spam-{discriminator} require spam-core
  • Teach spam-core to attempt to import and initialize the supported modules in some priority order (see the sketch after this list)
  • Create spam which requires spam-core and spam-{lowest common denominator}
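
As a rough sketch (the backend module names and the initialize() hook are purely illustrative assumptions, not anything spam actually ships), spam-core’s loader could look something like this:

```python
# Hypothetical loader inside spam-core. The backend module names and the
# initialize() hook are illustrative assumptions, not a real API.
import importlib

_BACKENDS = ["_spam_cu12", "_spam_cu11", "_spam_avx2", "_spam_cpu"]  # priority order

def _load_backend():
    for name in _BACKENDS:
        try:
            backend = importlib.import_module(name)
            backend.initialize()  # assumed hook that raises on unsupported hardware
        except (ImportError, RuntimeError):
            continue  # backend not installed, or not usable on this machine
        return backend
    raise ImportError("no spam backend available; install at least spam-cpu")

_backend = _load_backend()
```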

Now:

  • users can install spam and have a working CPU-only environment
  • they can later install spam-cu12 to “light up” CUDA 12 support
  • distributors/admins can install as many spam-{discriminator} as they like for broadest support
  • users might get a warning on first import if spam-cu12 is there but their system has no CUDA 12 support. But you’re in control of your user experience here, so you can suggest specific actions
  • later, if selector packages ever happen, you replace spam with one that can check and install the specific package needed.

Now, I admit I’m not intimately familiar with how or why this may be a bad idea for CUDA. I have the impression that if it were this easy, someone else would’ve done the pattern already. But I can’t imagine there isn’t some sensible boundary that can be drawn to put the lowest-level Python APIs on one side, and keep the hardware- and version-independent code on the other.[1]

It should get you out of totally horrible dynamic names as well, and into the realm of not-so-horrible dynamic names :smiley: It may work out nicer to have each specific platform as a separate package anyway - that’s what I’ve seen others do. Most contributors, you’d hope, are not having to touch them and so can just install the ones they need for their own development/test.


  1. Much like how numpy/scipy refactored many of their FORTRAN dependencies into a CPython-independent binary. ↩︎

4 Likes

Thanks for the thoughts, Steve! I don’t think that model fits CUDA very well. Let me elaborate on the two kinds of discriminators I can see right now.

  • Ones that can coexist, like CPU optimization levels, as mentioned above. I think these fit well with your particular scheme.
    • You could also potentially have a GPU option in the mix of CPU options.
    • This kind needs a sort of priority. Even though they can coexist in a given environment, only one can be active for a given package at one time.
  • The other kind can’t coexist. CUDA versions are in this class. CUDA 11 packages can’t be used with CUDA 12 packages. You only have one CUDA runtime, and it’s only going to be compatible with at most one of them.

There are a couple of ideas conflated here:

  • capability
  • requirement

The “capability” aspect gives implementation flexibility within one package. The “requirement” aspect enforces conformance across multiple packages in an environment.

Are these similar enough to share one conceptual implementation (“selectors”) or should there be distinct implementations?

3 Likes

I guess I’m not sure what goes into a CUDA 11 vs a CUDA 12 package. Can they not both be installed (i.e. extracted from a wheel) onto the same machine, even if one will not load? And can we detect that load failure at runtime, or is it a guaranteed process termination? (They have different package names, of course, but the user-visible spam module handles from _spam_cu12 import ... vs from _spam_cu11 import ... and users don’t have to think about the specific one apart from installing it.)

2 Likes

Your suggestion assumes that there is a valid CPU-only version of the package. If that doesn’t exist, i.e. there is no lowest common denominator that can be installed by default, spam can’t depend on anything useful and the normal installation won’t work.

Then normal installation won’t work. There’s not really anything anyone else can do about it either.

But what would work with this structure in those circumstances is a distro/admin that wants to just include all the varieties, so that if any of them will work, the distro will work.

Right now, this option doesn’t exist, so we can’t even create a container image with everything that will work with whatever hardware is there when it runs. We have to create one container image per possible hardware configuration, and multiply that out by every package that has different needs.

The key change here is selecting the right driver set at runtime from whatever is installed, rather than forcing a user to figure it out pre-installation.

3 Likes

The file structure is identical between the two packages. They differ in that one is linked to CUDA 11, and the other is linked to CUDA 12. For example: cudf-cu11 · PyPI and cudf-cu12 · PyPI. As such, you can’t have both - they would clobber one another in an environment.

You could certainly make each of these have a different import name, and create a new “loader” frontend that would try different options. I think the general pattern of having the same python file layout linked to different library options is common enough that the loader frontend boilerplate would be cumbersome.

That’s why I mentioned “refactoring” a few times :wink: I was under no illusions that you could just do this with the existing package layouts.

Those packages you linked are empty, which doesn’t help me give you better suggestions of how a refactoring might look, but I assume there are some .pyd/.so modules in there that have the actual native dependency. If you distribute those separately and all[1] the pure Python in the “loader” frontend, you’re really just looking at your internal imports coming from the different packages.

If you’re currently instructing users to directly import from native modules, they may end up with a level of indirection. No big deal, it only happens at import time, and I’ll gladly guarantee that most users would rather spend an extra millisecond or so importing names than have to choose the right package themselves.
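
To make that concrete (the names here are entirely hypothetical), the pure-Python frontend might do little more than forward to whichever backend spam-core resolved:

```python
# Hypothetical pure-Python frontend in the spam package; the only cost to users
# is one extra level of indirection, paid once at import time.
from spam_core import _backend  # resolved once, in priority order

def frombuffer(data):
    # Forward to whichever native backend (spam-cu12, spam-cu11, ...) was loaded.
    return _backend.frombuffer(data)
```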


  1. All, most, or some. ↩︎

2 Likes

Sorry, here are the files for cudf-cu12 and cudf-cu11

If I’m understanding you correctly, you still need runtime detection of which package to use, which assumes that said package must be present on the system. Users would either need to allow the loader to download the correct package, or otherwise manually download the correct package ahead of time. You could think about installing all variants and letting the selector choose, but you’re talking about 500 MB–1 GB per package. It would be much smaller if they used common shared libraries (hello PEP 725!).

Having each implementation be a different package is also making a mess on PyPI. Can there be a way to group implementation packages underneath their common “loader” project? This feels like info for a platform tag, but I can see why that isn’t open-ended. Is there another place we can stuff arbitrary distinguishing metadata, not in the project name?

I think the project name is fine, because you can have a dependency back to the platform-neutral part and so installing spam-cu12 gets you spam as well. There’s a rough convention of namespacing like that, so as long as you have the same “prefix” it’s not really a big deal - you wouldn’t want anyone else taking it anyway, it’s real tempting for users to trust things that start with your name.

Yeah, so without the “selector packages”[1] proposed here, users would have to manually install it. So pip install spam spam-cu12 or potentially just pip install spam-cu12. I personally wouldn’t make spam do the install itself - hence suggesting you would bring in a lowest-common denominator helper by default, so that it works, but installing an additional package provides acceleration.
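
To sketch the dependency shape (using setuptools here just for illustration; the same thing can be expressed in pyproject.toml, and every name is the hypothetical spam example):

```python
# spam-cu12/setup.py - the accelerated backend depends back on the shared core,
# so installing spam-cu12 pulls in spam-core automatically.
from setuptools import setup

setup(name="spam-cu12", install_requires=["spam-core"])
```

```python
# spam/setup.py - the default package depends on the core plus a
# lowest-common-denominator backend, so plain `pip install spam` always works.
from setuptools import setup

setup(name="spam", install_requires=["spam-core", "spam-cpu"])
```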

With the proposal here, that could switch to let the initial install run some code to determine which other package it needs and add it to the requirements list. So pip install spam instructs pip to actually install spam-cu12, because it has a chance to look at the machine and see that -cu12 would be the best choice.

But without this proposal, users already have to install a specific package. There’s no change to their experience by splitting up the packages to install side-by-side. However, there is a change for those who are more sensitive to install matrices than disk space - a few extra GB in a fixed virtual machine or container image is often preferred over a dynamic install on first use.


  1. A packaging/install-time concept, not an import-time concept. ↩︎

2 Likes

Can you factor out those shared libraries into a separate, common wheel that isn’t linked against a specific CUDA version? Then, by “simply” controlling the loading order of those libraries (on Linux and Windows), everything should work. At least that’s my understanding after reading this post describing how the RobotPy project does it.
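
For reference, my understanding is that the load-order trick usually boils down to something like the sketch below; the package layout, library name, and extension module are all made up for illustration:

```python
# Hypothetical: preload a shared library from a common, CUDA-version-neutral
# wheel before importing the extension module that is linked against it.
import ctypes
import os
import sys

# Assumed location of the bundled library inside the common wheel's package.
_LIB_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), "_libs"))

if sys.platform == "win32":
    # Let extension modules imported later resolve the bundled DLLs.
    os.add_dll_directory(_LIB_DIR)
else:
    # RTLD_GLOBAL makes the library's symbols visible to extensions loaded afterwards.
    ctypes.CDLL(os.path.join(_LIB_DIR, "libspam_cuda.so"), mode=ctypes.RTLD_GLOBAL)

import _spam_native  # noqa: E402  (extension linked against the preloaded library)
```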

Thanks for that reference, Jeremy! I had missed it back then. There was so much great knowledge in there from @virtuald @steve.dower @njs and the rest. My ultimate conclusion from that thread was summed up well by @barry-scott and @rgommers:

It’s not that conda, Fedora, or Debian have better tools. It’s that conda, Fedora, Debian, et al. are coherent, organized groups with consistent build practices. There’s a lot of compatibility “metadata” tied up in consistent build practices. Have you ever tried to mix more than a couple of PPAs or similar that have package overlap?

Can NVIDIA get away with the approach of pushing CUDA into a separate wheel, and then using RPATH or similar to make RAPIDS libraries work? Probably, since NVIDIA controls all the parts. What does it mean for other projects that might want to utilize CUDA?

My motivation, ultimately, is to loosen the need for tight control of package systems and allow more interchangeable parts. I worked on the Conda and Conda-forge ecosystems for a while, and slowly developed tools for improving compatibility. The notion of “run_exports” (originally named “pin_downstream”, originating in WIP Build customization by msarahan · Pull Request #1585 · conda/conda-build · GitHub) was especially helpful. It basically does the dependency injection described by @njs.

Even with all of that, it was very easy and common to have accidental breakage with package updates, especially when both Anaconda defaults channel and conda-forge were involved. My conclusion from all of this is that we were/are not capturing compatibility data adequately. I see selector packages, PEP 725, et al. as tools to either capture or utilize “fingerprint” metadata that first and foremost will prevent using incompatible things together. Things like selectors are ways to make decisions based on metadata. If we can understand and feed all of the different compatibility facets into the solver (pip or otherwise), I believe we can improve the shared library situation in a safe and reliable way.

Worst case scenario is that you just end up not being able to share libraries broadly, but at least you’ll understand why they aren’t compatible.

EDIT: I was remiss in not mentioning ABI Laboratory, by Andrey Ponomarenko, which was truly essential in figuring out what compatibility bounds should be for each package.

3 Likes