What to do about GPUs? (and the built distributions that support them)

InstallCommand won’t work as expected in a modern installer, because the setuptools install subcommand is never used in the PEP 517 process flow.

1 Like

Cross-linking Implementation variants: rehashing and refocusing, but feel free to ignore that thread if it is not a productive contribution to this topic.

Would it be such a bad thing to have the selector package but let it do its work at runtime rather than during a pip install? I know this is against the status quo, but say we had a pip-installable bootstrap package which doesn’t contain CUDA but instead contains some initialize_cuda(version) function. Dependent code calls the function as boilerplate at the top of its code; the first time, the function downloads the appropriate libraries into somewhere external to the Python environment (and writable!) like ~/.cache/python-cuda/{version}, and on subsequent calls it just returns immediately.
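To make that concrete, here’s a minimal sketch of what such a bootstrap function might look like — the name initialize_cuda, the cache layout, and the download URL are all hypothetical, and the actual hardware detection is elided:

```python
# Minimal sketch only: initialize_cuda, the cache layout and the URL are
# illustrative assumptions, not an existing package or API.
import urllib.request
from pathlib import Path

CACHE_ROOT = Path.home() / ".cache" / "python-cuda"

def initialize_cuda(version: str) -> Path:
    """Ensure CUDA libraries for `version` exist in a user-writable cache.

    The first call downloads and unpacks them outside the Python environment;
    later calls return immediately.
    """
    target = CACHE_ROOT / version
    marker = target / ".complete"
    if not marker.exists():
        target.mkdir(parents=True, exist_ok=True)
        # Real hardware/driver detection would happen here (e.g. via a small
        # native extension); this placeholder just picks a URL.
        url = f"https://example.invalid/cuda/{version}/linux-x86_64.tar.gz"
        urllib.request.urlretrieve(url, target / "cuda.tar.gz")
        # ... unpack and verify the archive into `target` ...
        marker.touch()
    # The caller (or the framework's import machinery) would then load the
    # shared libraries from `target`.
    return target
```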

The benefits would be:

  • The bootstrap package is free to be non-pure-Python and can therefore call the magic C++ APIs to figure out what hardware it has available (as before)
  • Pip/packaging/importlib/PyPI don’t need to know or care what’s going on with GPU tags nor invent a new tag standard to accommodate them
  • The CUDA binaries don’t need to live on PyPI
  • The bootstrap process will do the right thing and rerun itself if someone swaps out their GPU for an incompatible one
  • If NVIDIA ever finds a way to slice the CUDA binaries up so that you can have just the bits you need without dead weight or compatibility code (at least I’m guessing that’s why they’re currently so unbelievably huge), then the bootstrap package should be easily adjustable to the refined DLL selection process
  • This package should be narrowly scoped enough to be comfortably maintainable by a small group of people who know their way around the world of GPU compatibility
  • If multiple versions of a CUDA-dependent package use the same CUDA version then you get a nice bandwidth saving, since you don’t need to redundantly re-download CUDA (I’m somewhat biased on this one since, in my rural world of sheep and hay fever, I only get ~500kB/s bandwidth)
  • End-user packaging tools (e.g. PyInstaller/Nuitka/py2exe) don’t then immediately inherit the problem of having to process these new GPU tags, along with the inevitable request from packagers to combine the contents of multiple CUDA wheels into one fat package so that it’s not locked to the same hardware as whoever built the application (I’m very biased on this one since I co-maintain PyInstaller and I already want to crawl into a dark corner as soon as I hear the word GPU)

It does mean that something like pip freeze isn’t giving you the full picture anymore though…

(Apologies if this is all very naive)

2 Likes

Maybe I’m misunderstanding the intent, but the way I’m reading this is that users would be expected to install an sdist, so that the build backend it specifies could either build or download the appropriate wheel/binary to get the most optimized build without downloading some fat binary?

And because I’m still catching up on stuff after parental leave, I’m going to be a bit lazy and just ask: how does conda determine what to install GPU-wise? Does it have its own algorithm that gets updated for each GPU vendor and which it runs at install time?

2 Likes

Welcome back! Your laziness is totally understandable, at least until the kid(s) starts sleeping through the night. You may find my summary post helpful as a starting point, but there’s a ton of good discussion that you’ll eventually need to read.

That’s the way that NVIDIA is doing things now. Looking forward, it seems like there is a general desire to avoid sdists as much as possible, so we need to find a different way. The different way most likely means more environment markers recognized by installers. Because it is desirable for these markers to not be hard-coded into install tools, most (all?) designs here include some kind of component that users must install, which would provide custom environment markers (and maybe other metadata).
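For a rough idea of what that could look like, here is a purely hypothetical dependency specifier using a custom marker that such a user-installed component might provide (neither the package name nor the marker exists today):

```
somepkg-cuda12 ; gpu_vendor == "nvidia" and cuda_driver_version >= "12"
```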

Conda represents system state metadata with “virtual packages”: Managing virtual packages — conda 24.7.2.dev36 documentation

These behave the same as packages and participate in solving environments. These started out as implementations in conda itself (conda/conda/plugins/virtual_packages at 82bcb12633cbb3fd0c0837c6f8fc89a5918d0c7e · conda/conda · GitHub), but a plugin mechanism was developed later: Plugin mechanism for virtual packages · Issue #10131 · conda/conda · GitHub
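As a rough illustration (recipe abbreviated, details vary), a conda package can depend on a virtual package such as __cuda, so the solver only selects that build when a compatible driver has been detected:

```yaml
requirements:
  run:
    - __cuda >=11.2   # satisfied by the driver version conda detects at solve time
```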

Pip might be able to take this approach, but pip would probably need to take all constraints in an environment into account before this would work (Implementation variants: rehashing and refocusing - #63 by pradyunsg)

2 Likes

Just to expand on that, there is a desire to avoid sdists in general, because they allow arbitrary code execution.

Given that the “selector” proposal has included suggestions that might add arbitrary code execution to wheel installs, I for one would be interested in comparing it with a sdist-based solution (which wouldn’t involve a change to the risk profile).

1 Like

The main regression/downside of the sdist approach is that it would almost certainly add a reliance on servers other than PyPI[1] and would probably complicate creation of a wheelhouse for transfer/installation elsewhere (i.e. pip wheel -r reqs.txt would require some extra override to instruct the sdist to choose the desired option, though this may be true regardless, hence “probably”).

Downloading runtime files on first use (or with a manual command after install) is likely a better option than the sdist. This is how nltk has handled its datasets for a long time now. But all the options suck a little bit, which is why we’re trying to find something better.
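For reference, the nltk pattern is just an explicit runtime download on demand, e.g.:

```python
import nltk
nltk.download("punkt")  # fetches the dataset into ~/nltk_data the first time it is needed
```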

The difference in multiplexing is interesting though:

  • installer considers all possible variants, eliminates those that won’t work (aka virtual packages/dependencies)
  • installer considers one package, which then pulls in the variant that will work (aka NVIDIA’s sdist, selectors, nltk)

  1. Unless you abused PyPI for file storage, I guess… ↩︎

1 Like

Yep, I’m still working my way through the backlog here (and on GitHub), although I just finished the other thread.

Who chooses the plug-in implementations? The packages themselves, the users, or conda?

That’s the trend I’m seeing in the comments as well. For me the question is who chooses what code to run and when does it run?

I’m not aware of any virtual package plugin that is outside of the set shipped with conda. The conceptual ideal being discussed here is that installer maintainers should not be saddled with maintaining any virtual package capability, and that means providers like NVIDIA maintaining plugins, which means end users choosing to install the NVIDIA plugin.

Having that extra step is a pretty major sticking point among people we’ve talked to at the JAX and PyTorch projects. Some way of having an “approved set” or a default set that can ship with installers would be nice. I think it’s worth discussing whether this approach is even remotely workable, and if so, how we would approach the problem of who approves “plugins” (term used loosely, it may not be an actual plugin) and what the approval process would involve.

I think I covered it above, but to reiterate: no one has come up with a plan that avoids needing the end user to make an explicit decision to install a “plugin.” This is good from the standpoint of someone who wants more control of what their computer does, but bad from the standpoint of publishers who want the absolute lowest barrier to entry for their users.

4 Likes

This is a distro, and anyone can (and should!) do them. We encourage it, we just really don’t have a mechanism to endorse all of them, and so we play it safe by carefully endorsing none of them.

I’d love to see a world where the things you get from python.org are just the first building block, and you’re taking responsibility to build the rest of the environment yourself, including adding any plugins you want (or your users will want). This is already the situation we have for Linux, and since you don’t have to be a full OS distributor to make a “sumo installer” for Windows or macOS, it ought to be easier.

(To go one step deeper, I personally believe the best success would be for distros based around the major frameworks, rather than trying to remain “independent”. So you would get a full installer for PyTorch that includes everything up to the tools that you would use to run it, whether GUI or CLI. That seems to best serve users’ needs while also keeping the veneer of “official” that people seem to crave.)

1 Like

The wheel selector suggestion was intended to make the process automatic, much like the prior selector package idea that it references. It was pointed out, though, that in many situations there is a need for users to control the selection.

Just having variant wheels and an explicit way to select between them sort of seems like a prerequisite to designing anything that would give automatic wheel selection, so discussion moved off in that direction. I think it is very much the case, though, that there needs to be a way to handle this automatically for most users.

Over in the other thread I propose defining a config file (TOML) and a recording file (JSON) as the sources of truth. I’ll leave the details to that other post, but if we define file formats, and potentially leave it to venv creation time to generate the “static” parts of these, then maybe we don’t need to define a plugin API, and running some platform detection tool could be left as a convenience (one that likely still requires installing some code, but you wouldn’t necessarily have to do that if you knew what you wanted).
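As a very rough sketch (the file name and keys here are invented for illustration, not part of that proposal), the idea is something like a per-environment TOML file recording the selected variants:

```toml
# e.g. <venv>/variant-config.toml, generated at venv creation or by a detection tool
[variants]
cuda = "12.4"          # detected or user-pinned accelerator runtime
x86-64-level = "v3"    # CPU microarchitecture level
```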

Some perspective: RHEL9 already requires x86-64-v2 hardware.

Hardware that tops out at x86-64-v2 is over a decade old. v3 hardware became the norm 7-8 years ago. v2 is a quite conservative, safe choice for anyone shipping binaries today.

I expect some PyPI wheels are already being built using x86-64-v2 today without indicating it anywhere. Unresearched speculation: some may have gone ahead and already ship wheels built with x86-64-v3 flags.
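For context, checking whether a machine can run v3 binaries is cheap. Here is a rough Linux-only sketch that compares /proc/cpuinfo against (most of) the flags the x86-64-v3 level requires — flag names follow the kernel’s spelling (e.g. LZCNT shows up as "abm"), so treat this as an approximation rather than an authoritative detector:

```python
V3_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}

def supports_x86_64_v3() -> bool:
    # Each "flags" line lists the features of one logical CPU; checking the
    # first one is enough in practice.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return V3_FLAGS <= set(line.split(":", 1)[1].split())
    return False

print("x86-64-v3 capable:", supports_x86_64_v3())
```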

3 Likes

Marginally more researched speculation: if someone builds wheels with their distro’s flags incorporated, they’ll start building v3 wheels once their distro drops support for v2 hardware (I haven’t checked, but it wouldn’t surprise me if Arch or another leading-edge distro was already assuming v3)

We also can’t rule out gcc distributors bumping their default x86-64 arch up to v2 (or even v3) instead of keeping their own default build flags elsewhere:

I don’t know if there is an easy way to get this kind of target info out of ELF binaries (short of scanning every opcode looking for the newer ones).

When Hatch downloads Python distributions on Linux, the default is v3 when detection fails.

2 Likes