What to do about GPUs? (and the built distributions that support them)

dustin · February 9, 2021, 6:53pm

I just rejected a request for a 2.5GB file size limit on PyPI and think it’s probably time we have a discussion about why and the future of the ecosystem here.

File sizes on PyPI have been slowly increasing for a while now, mostly driven by certain projects’ needs to support specific GPU ABIs. Newer ABIs, like CUDA 11, also seem to be resulting in even larger distributions than before, so this problem is getting worse over time.

You can get a sense for overall size of these large projects at https://pypi.org/stats/. (Note that not all of these projects are large due to individual file size – some just release small distributions very frequently and are unaffected by these issues.)

Why are large files challenging for PyPI?

There are a couple reasons why the PyPI maintainers are currently unwilling to raise the limit above 1GB:

CDN constraints:
- We likely have an upper bound on cache size that our CDN provider holds for us (it’s hard to know for sure what it is, but they almost definitely are not holding all 7TB of PyPI in memory). The more large files in this cache, the less overall # of files it can hold, the less likely a given file will be in the cache, and thus the more churn our cache experiences (which leads to more backend requests, longer response times, increased bandwidth to our backends)
- Our current CDN “costs” are ~$1.5M/month and not getting smaller. This is generously supported by our CDN provider but is a liability for PyPI’s long-term existence.
Networking / bandwidth constraints:
- These packages are already a large drain on PyPI’s non-CDN infrastructure (bandwidth from backends to the CDN and storage). Our current infrastructure “costs” are ~$10K/month and also not getting smaller. This is also supported by one of our cloud providers, but is a liability (albeit smaller than our CDN liability).
- Larger overall size of PyPI on disk makes it harder to host mirrors, requiring mirrors like bandersnatch to implement features to block certain large projects from being mirrored.
Upload experience:
- the current PyPI upload API is synchronous, and a >1GB upload is one long blocking request to that endpoint, and is more likely to fail or consume excess resources.
Download experience:
- End users trying to download large files get a poor user experience in terms of install-time and reliability, especially if they are on poor connections.

What are our options?

There are a few options that have been proposed or considered, I’ll try to list them here as well as their challenges/downsides:

GPU tags

Similar to the existing platform and CPU architecture tags, we could introduce a new GPU ABI tag for wheels that corresponds to the GPU ABI that the wheel supports. Something like:

my_project-2.2.0-cp38-cp38-manylinux2010_x86_64-cuda90.whl

The challenge here is that there needs to be a reliable, cross-platform (and pure-Python) way to detect GPU ABIs, something like platform.gpu(), that can be used by installers like pip to determine what architectures the host supports. There isn’t a standard for detecting the various GPU ABIs in a reliable way.
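To illustrate why detection is hard, here is a minimal sketch of one vendor-specific probe: asking the NVIDIA driver library for its CUDA version through ctypes. The function names are hypothetical, the probe only covers CUDA on systems where the loader can find the driver library under the name "cuda" (typically Linux), and it says nothing about other GPU vendors. It is an assumption-laden illustration, not a proposal for what platform.gpu() should do.

```python
import ctypes
import ctypes.util


def decode_cuda_version(raw):
    """Split the driver's integer encoding (1000 * major + 10 * minor)."""
    return raw // 1000, (raw % 1000) // 10


def detect_cuda_driver_version():
    """Return (major, minor) for the installed CUDA driver, or None."""
    library_name = ctypes.util.find_library("cuda")
    if library_name is None:
        return None  # no driver library visible to the loader
    try:
        driver = ctypes.CDLL(library_name)
        raw = ctypes.c_int(0)
        # cuDriverGetVersion returns 0 (CUDA_SUCCESS) on success.
        if driver.cuDriverGetVersion(ctypes.byref(raw)) != 0:
            return None
    except (OSError, AttributeError):
        return None  # library failed to load or lacks the symbol
    return decode_cuda_version(raw.value)
```

Even this small probe depends on a vendor library name and a vendor-specific version encoding, which is the kind of detail a cross-platform standard would have to settle for every GPU ABI.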

Another challenge is that we haven’t added new tags to the Wheel spec, and it’s unclear how the addition of a new tag would be supported by the various tooling that produces/consumes wheels.

[edit]: As @pradyunsg notes, it’s unlikely this would work as-is, because wheels already have optional build tags.

Environment markers

As @pradyunsg notes, an option in the vein of PEP 496 is to have environment markers like:

install_requires=["packagename > 1.0 : sys.gpu == 'cuda11'"],

This still requires the same reliable, cross-platform (and pure-Python) way to detect GPU ABIs as above.

A downside here is that maintainers need to publish N+1 different projects for every N GPU ABIs they want to support (with one ‘parent’ project that requires all the ABI-specific projects, with markers), which has other challenges discussed in the next option.
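A rough sketch of how the N+1 layout might resolve, assuming a hypothetical sys_gpu marker variable and hypothetical project names (packagename-cuda10, packagename-cuda11). The requirement strings use the PEP 508 ";" separator, and the evaluator below only understands a single "variable == 'value'" comparison; it is not how pip evaluates markers.

```python
import re

# Requirements a hypothetical 'parent' project might declare, one per GPU ABI.
# "sys_gpu" is NOT a real marker variable; it stands in for the missing standard.
PARENT_REQUIREMENTS = [
    "packagename-cuda10 ; sys_gpu == 'cuda10'",
    "packagename-cuda11 ; sys_gpu == 'cuda11'",
]

_SIMPLE_MARKER = re.compile(r"^\s*(\w+)\s*==\s*'([^']*)'\s*$")


def select_requirements(requirements, environment):
    """Return the project names whose marker matches the given environment."""
    selected = []
    for requirement in requirements:
        name, _, marker = requirement.partition(";")
        match = _SIMPLE_MARKER.match(marker)
        if match is None:
            raise ValueError("unsupported marker: " + marker.strip())
        variable, expected = match.groups()
        if environment.get(variable) == expected:
            selected.append(name.strip())
    return selected
```

With an environment of {"sys_gpu": "cuda11"} this selects only packagename-cuda11, and with no detected GPU ABI it selects nothing, so the parent project would install without any GPU-specific dependency.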

Tell publishers to split their projects up

Some projects split their project namespace up based on the GPU ABI they are providing. For example, the cupy-cuda* projects are split into cupy-cuda80, cupy-cuda90, etc, with one for each GPU ABI.

The downsides here are that while this looks like it’s reducing average package size, it’s still basically the same on disk. And while this works for cupy-cuda*, it likely will not work for bigger frameworks like the project whose limit request I rejected. It’s also not very friendly to the user, as they need to figure out themselves which project to install.

Tell publishers to host elsewhere

One option is to just tell publishers they must self-host and tell their users how to install. This is what pytorch does, for example, and they tell their users to install with something like:

pip install torch==1.7.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

The downside to this is that it fractures the ecosystem, adds extra steps for the end user, and means that publishers need to set up and maintain their own PEP 503-compliant repository (or pay a third party to do so). It also means that someday this external index could go away or become compromised, as it doesn’t have the same support that projects published to PyPI have.
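For a sense of what "PEP 503-compliant" involves at minimum, here is a hedged sketch that normalizes a project name the way PEP 503 describes and renders a bare-bones project page with hash fragments on each link. The function names and the page layout are assumptions for illustration; a real repository also needs a root index, hosting, and ongoing maintenance.

```python
import html
import re


def normalize_project_name(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_project_page(project, files):
    """Render a minimal project page.

    `files` maps each filename to a (url, sha256 hex digest) pair.
    """
    rows = []
    for filename, (url, sha256_hex) in sorted(files.items()):
        href = html.escape("{}#sha256={}".format(url, sha256_hex), quote=True)
        rows.append('<a href="{}">{}</a><br/>'.format(href, html.escape(filename)))
    title = "Links for {}".format(html.escape(normalize_project_name(project)))
    return "<!DOCTYPE html>\n<html>\n<body>\n<h1>{}</h1>\n{}\n</body>\n</html>\n".format(
        title, "\n".join(rows)
    )
```

The page itself is simple; the ongoing cost for publishers is in keeping the files available and the index trustworthy over time.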

Allow external hosting on PyPI

We get a lot of requests for this, but most folks aren’t aware that this used to be a feature of PyPI that was removed via PEP 470 because in practice it wasn’t great for end-users, similar to reasons mentioned above.

I think we’re fairly unlikely to add this back, but perhaps if we consider “fat GPU wheels” as a special case (i.e., not make this available to all packages) it could be worth reconsidering.

Charge for large file hosting on PyPI

PyPI is currently entirely free-to-use. It’s possible that publishers that want larger file sizes would be willing to pay to cover costs of infrastructure as well as improvements to tools to support larger file sizes (e.g. making a new, asynchronous upload API) which mitigate the infrastructure liabilities of hosting larger files.

Challenges here are that PyPI currently has no infrastructure to handle any paid features or payments. We also don’t have a great sense of what this would be worth to publishers.

Selector packages

This was proposed in Idea: selector packages but ultimately is attempting to solve a much bigger problem than just the “fat GPU wheels” problem.

Downsides to this are that it introduces more “dynamic” dependencies, which we are generally trying to move away from (e.g. with setup.py). More details are in that thread.

What other challenges do we have?

I think one additional challenge here is that we (the Python Packaging Authority) don’t seem to have many folks with a ton of experience / deep knowledge about GPUs and the needs of these frameworks. I personally don’t have more than what I’ve outlined here.

If you feel like you do, I’d appreciate your thoughts here, but otherwise it seems like we probably can’t fix this on our own and will also need to work with multiple competing projects as well as GPU providers themselves to find a solution that works for everyone.

uranusjr · February 9, 2021, 8:01pm

I wondered about the possibility to have user-defined markers, which is basically dynamic dependencies but explicitly structured. The rough idea is to allow tools (build backends?) to register custom markers and their associated logic. I haven’t thought very deeply about this and maybe it won’t work at all, but it’s an idea…

brettcannon · February 9, 2021, 10:16pm

The CPU and OS is part of the platform tag, so if we were going to do something like this I would advocate to make it a part of the platform tag and not an entirely new tag on its own.

I’ve heard this suggestion before, ranging from just providing a way to specify extra wheel tags to consider valid to being able to register code to do some of the work. I’m not sure if having an e.g. PYTHONPLATFORMTAGS environment variable which was a comma-separated list of tags would be enough to cover this, but it would be simple enough to add to packaging.tags.sys_tags().
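A minimal sketch of the environment-variable idea, using only the standard library. PYTHONPLATFORMTAGS is the hypothetical variable named above, the function names are made up, and putting user-supplied tags ahead of detected ones is an assumption about priority. It does not call or modify packaging.tags.sys_tags().

```python
import os


def extra_platform_tags(environ=None):
    """Parse the hypothetical PYTHONPLATFORMTAGS comma-separated list."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPLATFORMTAGS", "")
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def accepted_platform_tags(detected, environ=None):
    """Return user-supplied tags first, then detected ones, without duplicates."""
    ordered = extra_platform_tags(environ) + list(detected)
    return list(dict.fromkeys(ordered))
```

For example, with PYTHONPLATFORMTAGS set to "cuda110, cuda102" and a detected tag of manylinux2010_x86_64, the accepted list would be cuda110, cuda102, manylinux2010_x86_64 in that order.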

dustin · February 9, 2021, 10:47pm

Note that unlike platforms or CPU architectures, GPU ABI providers need not be mutually exclusive – it’s possible that an installation environment could support multiple. Making it part of the platform tag would have the potential to really overload that tag, IMO.

Also worth mentioning that support for certain features within a single GPU ABI (like onednn/mkldnn or cudnn) is also a distinguishing feature that we may want to indicate via a tag.

pradyunsg · February 10, 2021, 12:04am

FWIW, this doesn’t work as-is because we have optional build tags in wheels. To quote PEP 427:

The wheel filename is {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl.
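To make the conflict concrete, here is a small sketch that splits a wheel filename positionally according to that pattern. The helper name is hypothetical and it does no validation beyond counting components.

```python
def split_wheel_filename(filename):
    """Split a wheel filename positionally, following the PEP 427 pattern."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # Six components means the third one is read as the build tag.
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected number of components: " + filename)
    return {
        "distribution": distribution,
        "version": version,
        "build": build,
        "python": python,
        "abi": abi,
        "platform": platform,
    }
```

Applied to my_project-2.2.0-cp38-cp38-manylinux2010_x86_64-cuda90.whl, the first cp38 lands in the build tag slot, manylinux2010_x86_64 is read as the ABI tag, and cuda90 as the platform tag. PEP 427 also requires build tags to start with a digit, so a stricter parser would likely reject the name outright.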

Environment markers might be a better mechanism for this. It preserves the deterministic conditional dependencies state as we have right now, pushes for smaller packages that only contain the libraries that are necessary while still being a fairly straightforward pip install <package> (where <package> has those conditional dependencies, perhaps even in an extra).

@uranusjr’s reference to user-defined markers reminded me of this, although I do think that we’d likely need someone with a better understanding of this to pitch in on how well this’ll work for such projects.

It’s probably worthwhile to ask those projects (who likely have the understanding of what might work vs what might not work) to pitch in as well; although we’d need to be clear around the expectations there – that this is early stages of “understanding the problem and ideating on a solution” at this point and all this is volunteer driven (unless someone has a pile of money big enough to drive this).

I also feel that folks who worked on conda’s handling of GPU packages would likely have insights they’d be willing to share, and we should definitely ask them for their thoughts (either here, or over any other communication channel and loop that info back here; I don’t know how/who to ask on their end tho).

pradyunsg · February 10, 2021, 12:12am

Oh, and if we really want to, we can also add “in” as an operator for environment markers and make every downstream redistributor wave their fists at the sky w.r.t. the complexity of our markers mechanism.

dustin · February 10, 2021, 12:39am

@pradyunsg Great points. I added your note about wheel tags to that option, I had forgotten about build tags. I also added the environment markers option, but note that it has similar challenges to the “GPU tags” option and the “Tell publishers to split their projects up” option.

pradyunsg · February 10, 2021, 12:42am

Nit-pick: One important difference is that the burden of figuring out the right package name is not on the end user and the current language doesn’t make that clear.

njs · February 10, 2021, 2:12am

This might be pie-in-the-sky but… can we ask the GPU manufacturers to fix the underlying issue that these libraries contain multiple gigabytes of code? That’s like 99% automatically generated, right? Can’t they do the automatic generation on the final install system or something instead of asking everyone to ship it back and forth across the internet all day?

westurner · February 10, 2021, 9:31am

Packages larger than 1GB could be asked to upgrade to a paid plan. Nonprofits can sell goods and services. Can we show per-project bandwidth information at least to logged-in warehouse users?

What does GitHub packages offer in terms of max free package size? Are they planning to implement TUF?

Does e.g. find-packages for external hosting still work?

uranusjr · February 10, 2021, 10:02am

GitHub packages notably does not support distributing Python packages at the time of writing. GitLab may provide a more meaningful reference point.

EpicWink · February 10, 2021, 10:08am

The current specification (PEP 503) for a Python package index makes it really easy to have the actual files (package wheels and archives) hosted somewhere different from the index server. As long as the hash is uploaded once, the file’s location can be pointed to anywhere (and pointer changes can come with a re-validation against the hash).

This would lose the guarantee on the availability of these files in the long-term, but I’m sure Google, Facebook and Nvidia have the resources to permanently host files.

This could be implemented by having some way to store a link in place of a file (ie in some database for each package).
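A hedged sketch of the re-validation step: given the SHA-256 digest recorded when the file was first registered, check that whatever is served from the external location still matches. The function names are hypothetical, and how the index would store the link and the digest is left open.

```python
import hashlib
import hmac


def sha256_of_chunks(chunks):
    """Compute the SHA-256 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(chunks, recorded_sha256):
    """Return True if the streamed content matches the recorded digest."""
    return hmac.compare_digest(sha256_of_chunks(chunks), recorded_sha256.lower())
```

Taking chunks rather than a whole file means a multi-gigabyte wheel can be checked while streaming, without holding it in memory.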

After reading PEP 470, it seems like this idea was supported in the past, but withdrawn? Or is PEP 470 about hosting the index as well?

h-vetinari · February 10, 2021, 10:28am

That would probably require having nvcc on the host system (in addition to a standard C/C++ compiler), which probably negates much of the “story” for having a wheel in the first place.

FWIW, the conda-forge pytorch GPU builds are ~450MB: Files :: Anaconda.org

There’s a bunch of options for compiling CUDA (depending also on how many GPU arches are being built for, and how/whether JIT compilation is supported) - it’s a pretty complicated issue (and I don’t claim to understand everything), but it helps a lot that a bunch of NVIDIA / RAPIDS folks are actively working with conda-forge on the GPU side of things.

Another relevant question for build tag discussion might be that CUDA should be generally compatible within major versions (except nvRTC), cf. discussion here.

FRidh · February 10, 2021, 12:26pm

With Nix users can state in their configuration what “features” their system supports, like the tags you suggest. Packages can then define what “features” they require. This helps with the additional restrictions you at times may have, on top of the hard constraints from the CPU type.
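The matching described here amounts to a set comparison. A minimal sketch, with hypothetical function names and made-up feature strings (this is not Nix's actual implementation):

```python
def missing_features(required, supported):
    """Return the required features the system does not declare, sorted."""
    return sorted(set(required) - set(supported))


def can_install(required, supported):
    """A package is installable when every required feature is declared."""
    return not missing_features(required, supported)
```

For instance, a package requiring "cuda" and "avx2" on a system that only declares "avx2" would report "cuda" as missing.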

Note we have not standardized any GPU features yet because the Nixpkgs CI does not test on GPUs. However, given the amount of scientific computing users we have, and the push to build also for things like MKL, I think it will happen soon.

I think the idea of tags or user-defined markers is a good idea, but there needs to be a forum/community for standardizing markers or it will become a mess.

uranusjr · February 10, 2021, 12:31pm

PEP 508 already defines “in” and “not in” as valid operators; they just have no practical use cases now.
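As a sketch of how these operators might apply to an environment that supports several GPU ABIs at once: PEP 508 gives them the same meaning as Python's string operators, so they test substring containment. The sys_gpu value and its comma-separated format below are assumptions, not part of any spec.

```python
def supports_by_substring(abi, sys_gpu):
    """Marker-style 'in': plain substring containment on strings."""
    return abi in sys_gpu


def supports_by_exact_entry(abi, sys_gpu):
    """Stricter check: the ABI must be one whole comma-separated entry."""
    return abi in [entry.strip() for entry in sys_gpu.split(",")]
```

With a hypothetical sys_gpu of "cuda10,cuda11", both checks accept "cuda11", but the substring form also accepts "cuda1", so the value format would need careful design.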

rgommers · February 10, 2021, 2:24pm

There’s a set of related problems here with non-Python dependencies in addition to just file size. I wrote a blog post about key Python packaging issues from the point of view of scientific, data science and machine learning projects and users two weeks ago: Python packaging in 2021 - pain points and bright spots | Quansight Labs.

The PyPI request that triggered this thread was from a small, relatively obscure package. The same issue will show up for many other ML/AI projects needing CUDA though. A few related issues/observations:

- Statistics · PyPI says that TensorFlow alone uses about 1 TB.
- PyTorch, MXNet, CuPy, and other such libraries will all have similar problems. Right now they tend to use CUDA 10.2, and end up with wheel sizes of close to 1 GB (already very painful). Soon they all want to have CUDA 11 be the default. See for example torch · PyPI, mxnet-cu102 · PyPI.
- RAPIDS - a large GPU ML project from NVIDIA - already gave up on PyPI completely due to issues around PyPI and wheels.
- PyTorch self-hosts most wheels in a wheelhouse on S3, which makes it impossible for downstream libraries to depend on those wheels.
- Large parts of scientific computing, data science, machine learning, and deep learning rely on GPUs. This is a significant and growing fraction of all Python users.
- A bit further into the future: ROCm (the CUDA equivalent for AMD GPUs) may end up being in the same position.
- ABI tags do not solve the problem that there’s no separate package that can be depended on; every package author must just bundle in whatever libraries they need into their own wheel. That’s not limited to CUDA or GPUs, the same is true for other key dependencies like OpenBLAS or GDAL whose authors don’t have a special interest in Python.

CUDA specifically

The issue with CUDA 11 in particular is not just that CUDA 11 is huge, but that anyone who wants to depend on it needs to bundle it in, because there is no standalone cuda or cuda11 package. It’s also highly unlikely that there will be one in the near to medium future because (leaving aside practicalities like ABI issues and wheel tags), there’s a social/ownership issue. The only entities that would be in a position to package CUDA as a separate package are NVIDIA itself, or the PyPA/PSF. Both are quite unlikely to want to do this. For a Debian, Homebrew or conda-forge CUDA package it’s clear who would own this - there’s a team and a governance model for each. For PyPI it’s much less clear. And it took conda-forge 2 years to get permission from NVIDIA to redistribute CUDA, so it’s not like some individual can just step in and get this done.

Interfacing with other package managers

One potential solution that @dustin did not list - and which may be the most desirable solution, since it would also solve related issues like GDAL for the geospatial stack and other non-Python dependencies - is: create a mechanism where Python packages uploaded to PyPI can declare that they depend on a non-PyPI package. There are clearly a lot of potential issues to work out there and it’d be a large job, but if it could be made to work it would be a major win. It would help define the scope of PyPI more clearly, and prevent more important packages from not being on PyPI altogether in the future.

This is kind of similar to @steve.dower’s “selector package”, but the important difference is that it allows you to actually install (or check for installation of) the actual dependency you need built in a way you can rely on and test. A selector package is much more limited: if you specify cuda-10.2 or openmp-gnu then that will solve the PyTorch self-hosting issue, but not the file size and “bundle libs in” issues. If you get your external dependency from conda, homebrew or your Linux package manager, they may not have the same ABI or even functionality (for, e.g., GDAL or FFMpeg there are many ways of building them, two package managers are unlikely to provide the same things for each library).

Possibility of a solution fully within PyPI

None of the solutions @dustin listed are a structural fix. If the interfacing with external package managers won’t fly, then a solution within PyPI alone would have to have at least these ingredients:

- An ownership model where the PyPA or another well-defined team can be in charge of packages like cuda, gdal, openblas in the absence of the original project or company packaging their own non-Python code for PyPI.
- A way to rebuild many packages for a new compiler toolchain, platform or other such thing.

In other words, have a conda-forge like organization and tech infra focused on packaging for PyPI. That is a way larger exercise than interfacing with an external package manager, and would then still leave a large question mark around the scope of what should be packaged for PyPI. I assume people would not be unhappy with standalone CUDA, OpenBLAS or GDAL packages if they just worked - but where do you stop? Scientific computing and ML/AI depend on more and more large and exotic libraries, hardware and compilers - and I can imagine packaging all of those is something that’s neither desirable nor sustainable.

Expertise & resources

Maybe I can help, either personally or (more likely) with getting other people and companies with the right expertise and resources involved. I am a maintainer of NumPy and SciPy where I’ve been involved in packaging them for a long time. I also lead a team working on PyTorch.

At Quansight Labs, which I lead together with Tania Allard, one of our main goals is to tackle PyData ecosystem-wide problems like these. We have already decided to spend part of our budget for this year to hire a packaging lead (currently hiring), who could spend a significant amount of time on working on topics like this one. If there’s interest in, for example, working out a PEP on interfacing with other package managers, we could work on that - and also on getting funding for the no doubt significant amount of implementation work. Possibly also including funded time for one or more PyPA members to help work through or do an in-depth review of the key issues for PyPI and pip/wheel/etc.

FRidh · February 10, 2021, 4:54pm

In my opinion this is the preferred solution. Going the conda way is a tremendous amount of work, and work that is already being done by for example Linux distributions who, I think, will typically do a better job at that. By declaring dependencies, anybody can integrate this into their distribution of choice on whatever operating system/architecture they would like to.

Note that Haskell (Cabal) allows for specifying pkg-config style dependencies. A certain Haskell on Nix distribution maintains mappings from pkg-config style dependencies to Nixpkgs “system” dependencies.

The package manager of the D language, DUB, also allows for specifying pkg-config style dependencies.

A short but interesting discussion for Rust/Cargo is also about pkg-config and states how it is a limited solution, for example not helpful on Windows, and so there should be a way to describe dependencies for different platforms using different tools. A module exists for Cargo that offers pkg-config support.

What is needed is to declare:

- native build inputs (tools)
- build inputs, that is, run-time dependencies one links against and for which, for example, the headers need to be available at build time.

Ideally we’d also declare:

- run-time dependencies one needs to invoke, or that by any other means need to be available at run time only
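The three categories above could be recorded in a small structure like the one sketched here, together with a check that asks pkg-config whether the build inputs are present. The class, field, and function names are hypothetical and not part of any existing metadata standard. The check relies on the pkg-config command-line tool being installed, which (as noted above) is not a given on every platform.

```python
import shutil
import subprocess
from dataclasses import dataclass, field


@dataclass
class ExternalDependencies:
    """Hypothetical declaration of non-Python dependencies."""

    native_build_inputs: list = field(default_factory=list)  # tools run at build time
    build_inputs: list = field(default_factory=list)  # libraries linked against
    runtime_inputs: list = field(default_factory=list)  # needed at run time only


def missing_pkg_config_modules(modules):
    """Return modules pkg-config cannot find, or None if pkg-config is absent."""
    if shutil.which("pkg-config") is None:
        return None
    return [
        module
        for module in modules
        if subprocess.run(["pkg-config", "--exists", module]).returncode != 0
    ]
```

A distribution could map each declared name to its own package, which is roughly what the Haskell-on-Nix mapping mentioned above does for pkg-config names.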

Some of these are already handled by backends such as meson and cmake. Therefore, we should maybe not duplicate such efforts and instead bless (a) certain backend(s) that handle these well. With meson one can for example invoke meson introspect --scan-dependencies /path/to/meson.build to get all declared dependencies. The metadeps package allows specifying pkg-config dependencies in a Cargo.toml.

Somewhat related topics:

westurner · February 10, 2021, 5:01pm

To be quite honest, I fail to see the value in having things (like CUDA and MKL) packaged for conda-forge (where these shared dependencies are already packaged) and for PyPI.

Without instigating, really, is there any specific advantage to python wheel packaging over conda packages, other than TUF?

Notably, RAPIDS packages install with conda.

cibuildwheel to conda-smithy may be the migration most appropriate for packaging non-Python dependencies.

westurner · February 10, 2021, 5:34pm

In order to reference shared libraries installed as {other python packages, conda packages, os distro packages}, we’d need to specify those as optional dependencies to be chosen from by specifying an install strategy at package install time.

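# in setup.py (hypothetical metadata, not an existing setuptools option):
third_party_deps = [
    {
        "match": {},  # JSON-style match expression object (left unspecified here)
        "requirementsset": [],  # requirements to apply on a match (left unspecified here)
    },
]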

That additional (nested) metadata could be added to setup.py if we can choose a configuration management system -style set of platform strings to specify as match constraints for which extra-pip dependencies should be considered in applying the strategy.

As far as strategies go, I don’t run pip as root now (which it would need in order to run the OS package manager, or containers that are not rootless).

rgommers · February 10, 2021, 10:09pm

Thanks @FRidh, interesting. Good to know about the Haskell-Nix dependency mapping, I’ll read up on it.

I was thinking more about declaring runtime dependencies; build-time will be more tricky for the kind of packages we’re talking about.

Can we please not go there? I really do not want this to turn into a conda vs. wheel/pip/whatever discussion. Clear scoping would be needed before diving into the details, but certainly the outcome should not be specific to conda.

I intentionally left out proposing a specific method of specifying dependencies. That’s too detailed to start with, I’m looking for high level feedback/discussion (prior art, potential blockers, better alternatives, etc.).