Would it be such a bad thing to have the selector package but let it do its work at runtime rather than during a pip install? I know this is against the status quo, but say we had a pip-installable bootstrap package which doesn’t contain CUDA but instead contains some initialize_cuda(version) function. Dependent code can call the function as boilerplate at the top of their code; the first time, the function downloads the appropriate libraries into somewhere external to the Python environment (and writable!) like ~/.cache/python-cuda/{version}, and on subsequent calls it just returns immediately.
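A minimal sketch of what that bootstrap function could look like, assuming the cache layout described above (the download URL scheme and the GPU-detection helper are placeholders, not any real package’s API):

```python
# Hypothetical sketch only -- function name, cache location and URL scheme
# come from the idea above, not from an existing package.
import os
import pathlib
import tarfile
import urllib.request

CACHE_ROOT = pathlib.Path.home() / ".cache" / "python-cuda"


def _detect_gpu_arch() -> str:
    """Placeholder for the compiled helper that would query the driver/hardware."""
    raise NotImplementedError("a real package would call the CUDA driver API here")


def initialize_cuda(version: str) -> pathlib.Path:
    """Download the CUDA libraries for *version* on first call, then no-op."""
    target = CACHE_ROOT / version
    if not (target / ".complete").exists():
        target.mkdir(parents=True, exist_ok=True)
        arch = _detect_gpu_arch()
        url = f"https://example.invalid/cuda/{version}/{arch}.tar.gz"  # illustrative
        archive, _ = urllib.request.urlretrieve(url)
        with tarfile.open(archive) as tar:
            tar.extractall(target)
        (target / ".complete").touch()
    # Make the unpacked shared libraries visible to this process.
    os.environ["LD_LIBRARY_PATH"] = os.pathsep.join(
        filter(None, [str(target / "lib"), os.environ.get("LD_LIBRARY_PATH")])
    )
    return target
```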
The benefits would be:
The bootstrap package is free to be non-pure-Python and can therefore call the magic C++ APIs to figure out what hardware it has available (as before)
Pip/packaging/importlib/PyPI don’t need to know or care what’s going on with GPU tags nor invent a new tag standard to accommodate them
The CUDA binaries don’t need to live on PyPI
The bootstrap process will do the right thing and rerun itself if someone swaps out their GPU for an incompatible one
If NVIDIA ever find a way to slice the CUDA binaries up so that you get only the bits you need without dead weight or compatibility code (at least I’m guessing that’s why they’re currently so unbelievably huge), then the bootstrap package should be easily adjustable to the refined DLL selection process
This package should be narrowly scoped enough to be comfortably maintainable by a small group of people who know their way around the world of GPU compatibility
If multiple versions of a CUDA-dependent package use the same CUDA version then you get a nice bandwidth saving since you don’t need to redundantly re-download CUDA (I’m somewhat biased on this one since, in my rural world of sheep and hay fever, I only get ~500kB/s bandwidth)
End-user packaging tools (e.g. PyInstaller/Nuitka/py2exe) don’t then immediately inherit the problem of having to process these new GPU tags, along with the inevitable request from packagers to combine the contents of multiple CUDA wheels into one fat package so that it’s not locked to the same hardware as whoever built the application (I’m very biased on this one since I co-maintain PyInstaller and I already want to crawl into a dark corner as soon as I hear the word GPU)
It does mean that something like pip freeze isn’t giving you the full picture anymore though…
Maybe I’m misunderstanding the intent, but the way I’m reading this is that users would be expected to install an sdist so that the build backend it specifies could either build or download the appropriate wheel/binary to get the most optimized build without downloading some fat binary?
And because I’m still catching up on stuff after parental leave, I’m going to be a bit lazy and ask how does conda determine what to install GPU-wise? Does it have its own algorithm that gets updated for each GPU vendor and which it runs at install time?
Welcome back! Your laziness is totally understandable, at least until the kid(s) starts sleeping through the night. You may find my summary post helpful as a starting point, but there’s a ton of good discussion that you’ll eventually need to read.
That’s the way that NVIDIA is doing things now. Looking forward, it seems like there is a general desire to avoid sdists as much as possible, so we need to find a different way. The different way most likely means more environment markers recognized by installers. Because it is desirable for these markers to not be hard-coded into install tools, most (all?) designs here include some kind of component that users must install, which would provide custom environment markers (and maybe other metadata).
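To make the “component that provides custom environment markers” idea concrete, here is a rough sketch of the shape such a plugin might have. Nothing here is standardized; the function name and marker names are invented for illustration:

```python
# Hypothetical shape of what a vendor-provided "marker plugin" might supply;
# no such interface exists today and the marker names are made up.
def gpu_environment() -> dict[str, str]:
    """Extra marker values an installer could merge into the PEP 508 environment."""
    return {
        "cuda_version": "12.4",   # a real plugin would query the driver here
        "gpu_vendor": "nvidia",
    }

# A requirement could then, in principle, be gated on those markers, e.g.:
#   nvidia-cublas-cu12 ; cuda_version >= "12.0"
```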
Just to expand on that, there is a desire to avoid sdists in general, because they allow arbitrary code execution.
Given that the “selector” proposal has included suggestions that might add arbitrary code execution to wheel installs, I for one would be interested in comparing it with an sdist-based solution (which wouldn’t involve a change to the risk profile).
The main regression/downside of the sdist approach is that it would almost certainly add a reliance on servers other than PyPI[1] and would probably complicate creation of a wheelhouse for transfer/installation elsewhere (i.e. pip wheel -r reqs.txt would require some extra override to instruct the sdist to choose the desired option, though this may be true regardless, hence “probably”).
Downloading runtime files on first use (or with a manual command after install) is likely a better option than the sdist. This is how nltk has handled its datasets for a long time now. But all the options suck a little bit, which is why we’re trying to find something better.
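For reference, the nltk pattern looks roughly like this (a sketch of typical usage, not nltk’s exact recommended boilerplate): the pip-installed package is small, and the large data files are fetched on demand into a cache outside the environment.

```python
# Typical "download on first use" pattern as practised by nltk.
import nltk

try:
    nltk.data.find("tokenizers/punkt")
except LookupError:
    nltk.download("punkt")  # one-time download, stored outside the Python environment
```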
The difference in multiplexing is interesting though:
installer considers all possible variants, eliminates those that won’t work (aka virtual packages/dependencies)
installer considers one package, which then pulls in the variant that will work (aka NVIDIA’s sdist, selectors, nltk)
[1] Unless you abused PyPI for file storage, I guess…
I’m not aware of any virtual package plugin that is outside of the set shipped with conda. The conceptual ideal being discussed here is that installer maintainers should not be saddled with maintaining any virtual package capability, and that means providers like NVIDIA maintaining plugins, which means end users choosing to install the NVIDIA plugin.
Having that extra step is a pretty major sticking point among people we’ve talked to at the JAX and PyTorch projects. Some way of having an “approved set” or a default set that can ship with installers would be nice. I think it’s worth discussing whether this approach is even remotely workable, and if so, how would we approach the problem of who approves “plugins” (term used loosely, may not be an actual plugin), and what process does approval mean?
I think I covered it above, but to reiterate: no one has come up with a plan that avoids needing the end user to make an explicit decision to install a “plugin.” This is good from the standpoint of someone who wants more control of what their computer does, but bad from the standpoint of publishers who want the absolute lowest barrier to entry for their users.
This is a distro, and anyone can (and should!) do them. We encourage it, we just really don’t have a mechanism to endorse all of them, and so we play it safe by carefully endorsing none of them.
I’d love to see a world where the things you get from python.org are just the first building block, and you’re taking responsibility to build the rest of the environment yourself, including adding any plugins you want (or your users will want). This is already the situation we have for Linux, and since you don’t have to be a full OS distributor to make a “sumo installer” for Windows or macOS, it ought to be easier.
(To go one step deeper, I personally believe the best success would be for distros based around the major frameworks, rather than trying to remain “independent”. So you would get a full installer for PyTorch that includes everything up to the tools that you would use to run it, whether GUI or CLI. That seems to best serve users’ needs while also keeping the veneer of “official” that people seem to crave.)
The wheel selector suggestion was intended to make the process automatic, much like the prior selector-package idea that it references. It was pointed out, though, that in many situations there is a need for users to control the selection.
Just having variant wheels and an explicit way to select between them seems like a prerequisite to designing anything that would give automatic wheel selection, so discussion moved off in that direction. I think it is very much the case, though, that there needs to be a way to handle this automatically for most users.
Over in the other thread I propose defining a config file (TOML) and recording file (JSON) as the sources of truth. I’ll leave it to that other post for details, but if we define file formats, and potentially leave it to venv creation time to generate the “static” parts of these, then maybe we don’t need to define a plugin API, and running some platform detection tool could be left as a convenience (one that likely still requires installing some code, but you wouldn’t necessarily have to do that if you knew what you wanted).
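As a rough illustration of the “detection as a convenience” part, a small tool run at venv-creation time could write the recording file along these lines. The file name, keys and location here are hypothetical; the actual formats are the ones proposed in the other thread, not this sketch:

```python
# Hypothetical convenience tool that writes a JSON "recording" of static
# platform facts at venv creation time; a GPU plugin could append to it later.
import json
import platform
import sys
import sysconfig
from pathlib import Path


def write_recording(venv_root: Path) -> None:
    record = {
        "platform": sysconfig.get_platform(),
        "machine": platform.machine(),
        "python": ".".join(map(str, sys.version_info[:3])),
        # e.g. an installed GPU plugin might add {"cuda_version": "12.4"} here.
    }
    (venv_root / "variant-record.json").write_text(json.dumps(record, indent=2))


if __name__ == "__main__":
    write_recording(Path(sys.prefix))
```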
Some perspective: RHEL9 already requires x86-64-v2 hardware.
x86-64-v2-only hardware is over a decade old; v3 hardware became the norm 7-8 years ago. v2 is a quite conservative, safe choice for anyone shipping binaries today.
I expect some PyPI wheels are already being built using x86-64-v2 today without indicating it anywhere. Unresearched speculation: some may have gone ahead and already be shipping wheels built using x86-64-v3 flags.
Marginally more researched speculation: if someone builds wheels with their distro’s flags incorporated, they’ll start building v3 wheels once their distro drops support for v2 hardware (I haven’t checked, but it wouldn’t surprise me if Arch or another leading-edge distro were already assuming v3).
We also can’t rule out gcc distributors bumping their default x86-64 arch up to v2 (or even v3) instead of keeping their own default build flags elsewhere:
I don’t know if there is an easy way to get this kind of target info out of ELF binaries (short of scanning every opcode looking for the newer ones)
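The host side is at least easy to approximate. A rough, Linux-only sketch based on the feature lists in the x86-64 psABI (flag names as they appear in /proc/cpuinfo; v4 and any binary-side inspection are out of scope here):

```python
# Approximate check of which x86-64 microarchitecture level the *host CPU*
# supports, per the psABI feature lists. Says nothing about what a wheel
# was actually compiled for.
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
V3_FLAGS = V2_FLAGS | {
    "abm", "avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "movbe", "xsave",
}


def x86_64_level() -> int:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
        else:
            return 1
    if V3_FLAGS <= flags:
        return 3
    if V2_FLAGS <= flags:
        return 2
    return 1
```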