What to do about GPUs? (and the built distributions that support them)

Notably, GitHub Packages does not support distributing Python packages at the time of writing. GitLab may provide a more meaningful reference point.

The current specification (PEP 503) for a Python package index makes it really easy to have the actual files (package wheels and archives) hosted somewhere different from the index server. As long as the hash is uploaded once, the file’s location can be pointed to anywhere (and pointer changes can come with a re-validation against the hash).

This would lose the guarantee on the availability of these files in the long-term, but I’m sure Google, Facebook and Nvidia have the resources to permanently host files.

This could be implemented by having some way to store a link in place of a file (i.e. in some database entry for each package).
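To make the idea concrete, here is a minimal sketch of checking an externally hosted file against a hash that the index recorded once. It assumes the link carries a `#sha256=...` style URL fragment, as PEP 503 describes; the host name, the file name, and the helper names `split_link` and `verify_download` are made up for illustration.

```python
import hashlib
from urllib.parse import urldefrag


def split_link(href):
    """Split an index link into (file_url, algorithm, expected_hex_digest)."""
    url, fragment = urldefrag(href)
    algorithm, _, digest = fragment.partition("=")
    if not algorithm or not digest:
        raise ValueError("link carries no hash fragment")
    return url, algorithm, digest.lower()


def verify_download(href, payload):
    """Return True if the downloaded bytes match the hash stored in the link."""
    _, algorithm, expected = split_link(href)
    actual = hashlib.new(algorithm, payload).hexdigest()
    return actual == expected


# Hypothetical example: the hash is recorded once, the file lives elsewhere.
payload = b"pretend this is a wheel"
href = (
    "https://files.example.com/pkg-1.0-py3-none-any.whl#sha256="
    + hashlib.sha256(payload).hexdigest()
)
print(verify_download(href, payload))         # file is intact
print(verify_download(href, payload + b"!"))  # file was altered
```

If the pointer is later changed to a different host, the same check can be rerun against the new location before the index accepts it.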

After reading PEP 470, it seems like this idea was supported in the past, but withdrawn? Or is PEP 470 about hosting the index as well?

That would probably require having nvcc on the host system (in addition to a standard C/C++ compiler), which probably negates much of the “story” for having a wheel in the first place.

FWIW, the conda-forge pytorch GPU builds are ~450MB: https://anaconda.org/conda-forge/pytorch/files

There are a bunch of options for compiling CUDA (depending also on how many GPU arches are being built for, and how/whether JIT compilation is supported) - it’s a pretty complicated issue (and I don’t claim to understand everything), but it helps a lot that a bunch of NVIDIA / RAPIDS folks are actively working with conda-forge on the GPU side of things.

Another relevant question for build tag discussion might be that CUDA should be generally compatible within major versions (except nvRTC), cf. discussion here.

With Nix users can state in their configuration what “features” their system supports, like the tags you suggest. Packages can then define what “features” they require. This helps with the additional restrictions you at times may have, on top of the hard constraints from the CPU type.
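As a rough illustration of that matching (not how Nix implements it), the check boils down to a subset test: a package is eligible only if everything it requires is offered by the system. The feature and package names below are hypothetical, not standardized tags.

```python
def can_build(required_features, system_features):
    """A package is eligible only if every required feature is offered."""
    return set(required_features) <= set(system_features)


# Hypothetical feature names declared by the user for this machine.
system_features = {"kvm", "big-parallel", "cuda"}

# Hypothetical packages and the features each one requires.
packages = {
    "plain-lib": [],
    "gpu-lib": ["cuda"],
    "mkl-lib": ["mkl"],
}

eligible = sorted(
    name for name, required in packages.items()
    if can_build(required, system_features)
)
print(eligible)  # ['gpu-lib', 'plain-lib']
```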

Note we have not standardized any GPU features yet because the Nixpkgs CI does not test on GPUs. However, given the number of scientific computing users we have, and the push to build also for things like MKL, I think it will happen soon.

I think the idea of tags or user-defined markers is a good idea, but there needs to be a forum/community for standardizing markers or it will become a mess.

PEP 508 already defines “in” and “not in” as valid operators; they just have no practical use cases right now.
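For reference, a marker such as `"arm" in platform_machine` is a plain substring test on a marker variable. The sketch below mimics that with the stdlib only; `marker_holds` is a hypothetical helper, and real tools would normally evaluate markers with the `packaging` library rather than code like this.

```python
import platform
import sys


def marker_holds(needle, op, variable, environment):
    """Evaluate a string marker of the form: "<needle>" <op> <variable>."""
    value = environment[variable]
    if op == "in":
        return needle in value
    if op == "not in":
        return needle not in value
    raise ValueError(f"unsupported operator: {op!r}")


# A small part of the PEP 508 marker environment, taken from the stdlib.
environment = {
    "platform_machine": platform.machine(),
    "sys_platform": sys.platform,
}

# Equivalent to the marker:  "arm" in platform_machine
print(marker_holds("arm", "in", "platform_machine", environment))

# The same checks against a fixed, made-up environment.
fixed = {"platform_machine": "armv7l"}
print(marker_holds("arm", "in", "platform_machine", fixed))      # True
print(marker_holds("arm", "not in", "platform_machine", fixed))  # False
```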

There’s a set of related problems here with non-Python dependencies in addition to just file size. I wrote a blog post about key Python packaging issues from the point of view of scientific, data science and machine learning projects and users two weeks ago: Python packaging in 2021 - pain points and bright spots | Quansight Labs.

The PyPI request that triggered this thread was from a small, relatively obscure package. The same issue will show up for many other ML/AI projects needing CUDA though. A few related issues/observations:

  • Statistics · PyPI says that TensorFlow alone uses about 1 TB.
  • PyTorch, MXNet, CuPy, and other such libraries will all have similar problems. Right now they tend to use CUDA 10.2, and end up with wheel sizes of close to 1 GB (already very painful). Soon they all want to have CUDA 11 be the default. See for example torch · PyPI, mxnet-cu102 · PyPI.
  • RAPIDS - a large GPU ML project from NVIDIA - already gave up on PyPI completely due to issues around PyPI and wheels.
  • PyTorch self-hosts most wheels in a wheelhouse on S3, which makes it impossible for downstream libraries to depend on those wheels.
  • Large parts of the scientific computing, data science, machine learning, and deep learning communities rely on GPUs. This is a significant and growing fraction of all Python users.
  • A bit further into the future: ROCm (the CUDA equivalent for AMD GPUs) may end up being in the same position.
  • ABI tags do not solve the problem that there’s no separate package that can be depended on; every package author must just bundle in whatever libraries they need into their own wheel. That’s not limited to CUDA or GPUs, the same is true for other key dependencies like OpenBLAS or GDAL whose authors don’t have a special interest in Python.

CUDA specifically

The issue with CUDA 11 in particular is not just that CUDA 11 is huge, but that anyone who wants to depend on it needs to bundle it in, because there is no standalone cuda or cuda11 package. It’s also highly unlikely that there will be one in the near to medium future because (leaving aside practicalities like ABI issues and wheel tags) there’s a social/ownership issue. The only entities that would be in a position to package CUDA as a separate package are NVIDIA itself, or the PyPA/PSF. Both are quite unlikely to want to do this. For a Debian, Homebrew or conda-forge CUDA package it’s clear who would own this - there’s a team and a governance model for each. For PyPI it’s much less clear. And it took conda-forge 2 years to get permission from NVIDIA to redistribute CUDA, so it’s not like some individual can just step in and get this done.

Interfacing with other package managers

One potential solution that @dustin did not list - and which may be the most desirable one, since it would also solve related issues like GDAL for the geospatial stack and other non-Python dependencies - is: create a mechanism where Python packages uploaded to PyPI can declare that they depend on a non-PyPI package. There are clearly a lot of potential issues to work out there and it’d be a large job, but if it could be made to work it would be a major win. It would help define the scope of PyPI more clearly, and prevent more important packages from not being on PyPI altogether in the future.

This is kind of similar to @steve.dower’s “selector package”, but the important difference is that it allows you to actually install (or check for installation of) the actual dependency you need, built in a way you can rely on and test. A selector package is much more limited: if you specify cuda-10.2 or openmp-gnu then that will solve the PyTorch self-hosting issue, but not the file size and “bundle libs in” issues. If you get your external dependency from conda, Homebrew or your Linux package manager, they may not have the same ABI or even functionality (for, e.g., GDAL or FFmpeg there are many ways of building them, and two package managers are unlikely to provide the same things for each library).

Possibility of a solution fully within PyPI

None of the solutions @dustin listed are a structural fix. If the interfacing with external package managers won’t fly, then a solution within PyPI alone would have to have at least these ingredients:

  • An ownership model where the PyPA or another well-defined team can be in charge of packages like cuda, gdal, openblas in the absence of the original project or company packaging their own non-Python code for PyPI.
  • A way to rebuild many packages for a new compiler toolchain, platform or other such thing.

In other words, have a conda-forge like organization and tech infra focused on packaging for PyPI. That is a way larger exercise than interfacing with an external package manager, and would still leave a large question mark around the scope of what should be packaged for PyPI. I assume people would not be unhappy with standalone CUDA, OpenBLAS or GDAL packages if they just worked - but where do you stop? Scientific computing and ML/AI depend on more and more large and exotic libraries, hardware and compilers - and I can imagine packaging all of those is neither desirable nor sustainable.

Expertise & resources

Maybe I can help, either personally or (more likely) with getting other people and companies with the right expertise and resources involved. I am a maintainer of NumPy and SciPy where I’ve been involved in packaging them for a long time. I also lead a team working on PyTorch.

At Quansight Labs, which I lead together with Tania Allard, one of our main goals is to tackle PyData ecosystem-wide problems like these. We have already decided to spend part of our budget for this year to hire a packaging lead (currently hiring), who could spend a significant amount of time on working on topics like this one. If there’s interest in, for example, working out a PEP on interfacing with other package managers, we could work on that - and also on getting funding for the no doubt significant amount of implementation work. Possibly also including funded time for one or more PyPA members to help work through or do an in-depth review of the key issues for PyPI and pip/wheel/etc.

In my opinion this is the preferred solution. Going the conda way is a tremendous amount of work, and work that is already being done by for example Linux distributions who, I think, will typically do a better job at that. By declaring dependencies, anybody can integrate this into their distribution of choice on whatever operating system/architecture they would like to.

Note that Haskell (Cabal) allows for specifying pkg-config style dependencies. A certain Haskell on Nix distribution maintains mappings from pkg-config style dependencies to Nixpkgs “system” dependencies.

The package manager of the D language, DUB, also allows for specifying pkg-config style dependencies.

A short but interesting discussion for Rust/Cargo also covers pkg-config. It notes that pkg-config is a limited solution (for example, it is not helpful on Windows), so there should be a way to describe dependencies for different platforms using different tools. A module exists for Cargo that offers pkg-config support.

What is needed is to declare:

  • native build inputs (tools)
  • build inputs, that is, run-time dependencies one links against and for which, for example, the headers need to be available at build time.

Ideally we’d also declare:

  • run-time dependencies one needs to invoke, or that by any other means need to be available at run time only
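The three categories above could be modelled roughly as follows. This is only a sketch to show how they differ by phase; the class name, field names, and example entries are hypothetical and not part of any existing metadata format.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalRequirements:
    # Tools that are run on the build machine.
    native_build_inputs: list = field(default_factory=list)
    # Libraries linked against: headers at build time, library at run time.
    build_inputs: list = field(default_factory=list)
    # Things that are only invoked or loaded at run time.
    runtime_only: list = field(default_factory=list)

    def needed_for(self, phase):
        """Return the external requirements relevant to 'build' or 'run'."""
        if phase == "build":
            return self.native_build_inputs + self.build_inputs
        if phase == "run":
            return self.build_inputs + self.runtime_only
        raise ValueError(f"unknown phase: {phase!r}")


# Illustrative entries only.
reqs = ExternalRequirements(
    native_build_inputs=["cmake", "pkg-config"],
    build_inputs=["openblas"],
    runtime_only=["ffmpeg"],
)
print(reqs.needed_for("build"))  # ['cmake', 'pkg-config', 'openblas']
print(reqs.needed_for("run"))    # ['openblas', 'ffmpeg']
```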

Some of these are already handled by backends such as meson and cmake. Therefore, we should maybe not duplicate such efforts and instead bless (a) certain backend(s) that handle these well. With meson one can for example invoke meson introspect --scan-dependencies /path/to/meson.build to get all declared dependencies. The metadeps package allows specifying pkg-config dependencies in a Cargo.toml.
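A front-end could consume that Meson output along these lines. This is a sketch under assumptions: it expects `meson` on PATH, and it assumes the scan prints a JSON list of objects with `name` and `required` keys, which is how I understand the output but is worth checking against the Meson documentation. The helper names are hypothetical.

```python
import json
import subprocess


def scan_dependencies(meson_build):
    """Run Meson's dependency scan on a meson.build file and parse the JSON."""
    result = subprocess.run(
        ["meson", "introspect", "--scan-dependencies", meson_build],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def required_names(entries):
    """Names of dependencies marked as required (assumed 'required' key)."""
    return sorted(e["name"] for e in entries if e.get("required", True))


# Made-up sample in the assumed output shape; no Meson call is made here.
sample = json.loads(
    '[{"name": "openblas", "required": true},'
    ' {"name": "cuda", "required": false}]'
)
print(required_names(sample))  # ['openblas']
```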

To be quite honest, I fail to see the value in having things (like CUDA and MKL) packaged both for conda-forge (where these shared dependencies are already packaged) and for PyPI.

Without instigating, really, is there any specific advantage to python wheel packaging over conda packages, other than TUF?

Notably, RAPIDS packages install with conda.

cibuildwheel to conda-smithy may be the migration most appropriate for packaging non-Python dependencies.

In order to reference shared libraries installed as {other python packages, conda packages, os distro packages}, we’d need to specify those as optional dependencies to be chosen from by specifying an install strategy at package install time.

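# in setup.py (pseudocode; "third_party_deps" is a placeholder name, not an existing setuptools option):
third_party_deps = [
    {
        "match": {},            # a JSON expression object
        "requirementsset": [],  # left unspecified in this sketch
    },
]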

That additional (nested) metadata could be added to setup.py if we can choose a set of platform strings, in the style of a configuration management system, to use as match constraints that determine which non-pip dependencies should be considered when applying the strategy.
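To show how such match constraints might be applied, here is a minimal sketch that picks the first entry whose constraints all agree with facts about the install. Everything in it is an assumption for illustration: the `system` and `strategy` keys, the `select_requirements` helper, and the requirement names.

```python
import platform

# Hypothetical metadata in the spirit of the snippet above.
third_party_deps = [
    {"match": {"system": "Linux", "strategy": "os"},
     "requirementsset": ["libopenblas-dev"]},
    {"match": {"strategy": "conda"},
     "requirementsset": ["openblas"]},
    {"match": {"strategy": "pip"},
     "requirementsset": ["example-openblas-wheel"]},
]


def select_requirements(deps, facts):
    """Return the requirement set of the first entry whose match fits."""
    for entry in deps:
        if all(facts.get(key) == value for key, value in entry["match"].items()):
            return entry["requirementsset"]
    return []


# The install strategy would be chosen by the user at install time.
facts = {"system": platform.system(), "strategy": "conda"}
print(select_requirements(third_party_deps, facts))  # ['openblas']
```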

As for strategies: I don’t run pip as root now, which it would need in order to run the OS package manager or non-rootless container(s).

Thanks @FRidh, interesting. Good to know about the Haskell-Nix dependency mapping, I’ll read up on it.

I was thinking more about declaring runtime dependencies; build-time will be more tricky for the kind of packages we’re talking about.

Can we please not go there? I really do not want this to turn into a conda vs. wheel/pip/whatever discussion. Clear scoping would be needed before diving into the details, but certainly the outcome should not be specific to conda.

I intentionally left out proposing a specific method of specifying dependencies. That’s too detailed to start with, I’m looking for high level feedback/discussion (prior art, potential blockers, better alternatives, etc.).

I’ve updated my post with what I could quickly find about other ecosystems.

This would be a good first step. I do want to emphasize that in the examples I gave (Haskell, Rust, …) the package manager and the build system are the same, whereas with Python we have that now decoupled into a front-end and back-end. Since other build systems typically allow/require you to declare dependencies, I think we should use their functionality, and extend PEP 517 with additional hooks for obtaining the non-Python dependencies from the build system. Of course, build systems that don’t support it could use a certain key in pyproject.toml. Anyway, now I am going too far into the details!

One such potential issue: Python packages would be coupled to the availability of their dependencies. For example (originally: “TensorFlow bundles CUDA”; see the edit below), if TensorFlow were to depend on CUDA such that CUDA was installed from somewhere else during TF’s installation, every finished package would be dependent on the CUDA install method itself, and all of our popular old versions could suddenly be permanently broken if that method stopped working. It also might be hard to merge with system-installed CUDA packages.

Edit: TF doesn’t bundle CUDA itself in its Python packages; users must have it installed externally.

But if CUDA somehow came from PyPI (and not some external host) and this used normal dependency resolution, would that be acceptable?

Does it? According to the TensorFlow docs, the tensorflow packages on PyPI have GPU support, and the largest wheel on tensorflow · PyPI is <400 MB. CUDA itself is a lot larger - I think you expect the user to have it installed already. That’s also what GPU support | TensorFlow says. And those docs also say only CUDA 11 is supported, so the problem is narrowed down a little compared to what CuPy, PyTorch et al. do.

Shoot, that’s right; I edited my post. I was thinking of all the CUDA-related compiled stuff TensorFlow includes, but that doesn’t include CUDA itself. Thank you.

Still, depending on external packages via PyPI could have the problems I described. I clarified since my original example was wrong.

But if CUDA somehow came from PyPI (and not some external host) and this used normal dependency resolution, would that be acceptable?

CUDA from PyPI seems like it would be helpful.

I don’t have super relevant insight into this, but I do want to point out it doesn’t have to be pure python, just that if it can’t be pure python it needs to change relatively infrequently, and we need to get it into Python itself.

Or into, say, a selector package that can contain some pure Python code to try and load its native code, which presumably will fail for unsupported platforms, and so it can then request a different actual package.
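A selector along those lines might look like the sketch below: try to load each candidate's native library, and fall back when loading fails. The package names and the `choose_backend` helper are hypothetical, and `libcuda.so.1` is just one example of a library to probe for.

```python
import ctypes


def choose_backend(candidates, fallback=None, load=ctypes.CDLL):
    """Return the first package whose native library can be loaded.

    candidates is a list of (package_name, library_name) pairs; the
    fallback is returned when none of the libraries load.
    """
    for package, library in candidates:
        try:
            load(library)
        except OSError:
            # Loading failed, so this platform cannot use that package.
            continue
        return package
    return fallback


# Hypothetical package names; the result depends on the machine.
print(choose_backend([("example-cuda", "libcuda.so.1")], fallback="example-cpu"))
```

The `load` parameter is only there so the probing step can be swapped out, for example in tests; how the selector would then request the chosen package from an installer is left open here.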

You’re right, I should have said it either has to be pure Python (so libraries like pypa/packaging can detect GPUs without needing to ship multiple wheels) or it needs to be in the stdlib.

Just noting that if anyone wants to follow up on the specification of external dependencies, a draft PEP for that was started a few years back that could be dusted off and modernised to account for pyproject.toml et al, rather than having to start from a blank page: Adding the draft status PEP for external dependency expression by tleeuwenburg · Pull Request #30 · pypa/interoperability-peps · GitHub

Thanks @ncoghlan! There are some parts of that which are indeed useful, in particular the “reasonable communications layer by which information can be shared between those two separated ecosystems”. I wouldn’t want to reuse the build-related parts, e.g. 'include!libblas.h' is not a healthy idea. Back in 2015 the “build from source” problem was still a lot more relevant than it is now. Today all important libraries provide wheels on at least Windows/Linux/macOS, and aarch64 and ppc64le wheels are starting to get traction too. So really runtime dependencies are what matters.

There’s also a lot more detail that’s needed. For example, CUDA and MKL are single-vendor and you should be able to rely on them as runtime dependencies independent of whether they were installed with, e.g., apt or mamba. For other runtime dependencies that won’t be true necessarily. So do they need to be treated differently, yes or no?

Writing a PEP should come later I believe; a clear description of use cases, current packaging practices, and problems to be solved seems needed to refer to and get people on the same page first.