What to do about GPUs? (and the built distributions that support them)

Just noting that if anyone wants to follow up on the specification of external dependencies, a draft PEP for that was started a few years back that could be dusted off and modernised to account for pyproject.toml et al, rather than having to start from a blank page: Adding the draft status PEP for external dependency expression by tleeuwenburg · Pull Request #30 · pypa/interoperability-peps · GitHub


Thanks @ncoghlan! There are some parts of that which are indeed useful, in particular the “reasonable communications layer by which information can be shared between those two separated ecosystems”. I wouldn’t want to reuse the build-related parts, e.g. 'include!libblas.h' is not a healthy idea. Back in 2015 the “build from source” problem was still a lot more relevant than it is now. Today all important libraries provide wheels on at least Windows/Linux/macOS, and aarch64 and ppc64le wheels are starting to get traction too. So really, runtime dependencies are what matters.

There’s also a lot more detail that’s needed. For example, CUDA and MKL are single-vendor, and you should be able to rely on them as runtime dependencies regardless of whether they were installed with, e.g., apt or mamba. For other runtime dependencies that won’t necessarily be true. So do they need to be treated differently, yes or no?

Writing a PEP should come later I believe; a clear description of use cases, current packaging practices, and problems to be solved seems needed to refer to and get people on the same page first.

Would you say that support for external dependencies would create 10X more configuration combinations for package authors to diagnose?

Could we write a tool to list all versions of all visible libs and packages? What could the lurking variable(s) be? It should probably also show the LD_PRELOAD list.
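As a rough illustration of what such a tool could start from, here is a minimal sketch that lists the Python distributions visible to the current interpreter together with the dynamic-loader variables. The function name `environment_report` is made up for this example, and the sketch only covers what Python itself can see; shared libraries installed by apt, mamba, or a vendor installer would need separate, platform-specific probing.

```python
import os
from importlib import metadata


def environment_report():
    """Collect installed distribution versions and loader-related variables."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip directories with broken or missing metadata
            packages[name] = dist.version
    # LD_PRELOAD and LD_LIBRARY_PATH influence which shared libraries get loaded.
    loader_env = {
        var: os.environ.get(var, "")
        for var in ("LD_PRELOAD", "LD_LIBRARY_PATH")
    }
    return {"packages": packages, "loader_env": loader_env}


if __name__ == "__main__":
    report = environment_report()
    for name, version in sorted(report["packages"].items()):
        print(f"{name}=={version}")
    for var, value in report["loader_env"].items():
        print(f"{var}={value!r}")
```

Attaching output like this to a bug report would at least make the Python side of a configuration reproducible.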

Are you expecting that pip will run these other package managers as root (in order to use package managers that unarchive as root and then set permissions and extended file attributes as root)?

It’s possibly worth mentioning here that virtualenvwrapper has add2virtualenv and toggleglobalsitepackages.

When you use add2virtualenv, that’s no longer a “hermetically sealed” build/install: you then forfeit the build/install isolation that is the whole point of virtualenv. You’re then expanding the “attack surface” of available gadgets, for example; and, for a production deployment, pip doesn’t yet (?) fix permissions of internal or external dependencies.

https://virtualenvwrapper.readthedocs.io/en/latest/command_ref.html#path-management

I’m not sure that any of the proposed solutions are even close to attaining “silver bullet status”. Almost all of the CUDA software in question is (a) large and (b) needs to support a wide variety of GPUs if it is to be broadly applicable. Otherwise the user is forced to navigate the GPU version namespace manually and/or suffer long PTX JIT times at startup, assuming that PTX is even an option for all of the GPU kernels you want to publish. The price of hiding such details is large wheels; it’s like a speed-of-light constant.

I would propose that one solution might be wildcard redirects for very specific vetted vendors. That is, rather than doing per-file redirects, which caused the QoS problems that PEP 470 addressed, PyPI would accept that certain families of packages identified as part of a pre-agreed namespace (for example: ^cuda-.*|^nvidia-.*) get a bulk redirect to the vendor in question, that obviously being Nvidia in my example.
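To make the wildcard idea concrete, here is a minimal sketch of how an index might map a project name to a vetted vendor. Everything in it is hypothetical: the `VETTED_REDIRECTS` registry, the `redirect_target` function, and the placeholder URL are invented for illustration, and PyPI has no such mechanism today. Only the name normalization follows an existing rule (PEP 503).

```python
import re

# Hypothetical registry: namespace pattern -> vetted vendor base URL.
# The pattern mirrors the example namespace in the post; the URL is a placeholder.
VETTED_REDIRECTS = [
    (re.compile(r"^(cuda|nvidia)-.*"), "https://vendor.example/simple/"),
]


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def redirect_target(project):
    """Return the vendor index URL for a project, or None to serve it locally."""
    normalized = normalize(project)
    for pattern, base_url in VETTED_REDIRECTS:
        if pattern.match(normalized):
            return f"{base_url}{normalized}/"
    return None
```

Because the decision is made per namespace and not per file, the index only has to maintain a short list of patterns, which is what keeps the blast radius small.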

The “vetting” part would also probably involve agreeing to certain QoS obligations. Files covered under a registered wildcard wouldn’t be removed before years, would never be updated in place, would be made available with SLAs on latency and average global bandwidth, and so on. If the agreement also specified “trust but verify”, it would be easy enough to expose certain CDN statistics, or to have bots randomly download targeted files from various parts of the world and report on whether the external provider was any worse than the PSF’s designated CDN. If an external vendor started failing to meet their obligations, they would be under threat of losing their wildcard redirect.

TL;DR: I am suggesting that the blast radius of redirection be limited to a small handful of large entities who can pay their CDN bills at scale and meet the overall QoS needs of PyPI while providing large file support.

I could also suggest more radical solutions like IPFS being adopted as a global data store for PyPI and allow this to be sharded across the internet as a whole, but now we’re departing the realm of science and getting more into science fiction. :slight_smile:


This is beginning to sound like adding provides metadata elements, where multiple packages can fulfill the same requirement. When the installer can see that there are multiple options available to the user, it obligates the user to choose one.


FWIW, we have the provides-dist metadata key that no one is using. Making various packaging tools use it is a whole other beast that no one has yet poked. :slight_smile:


Note that for the Provides mechanism to be practical, PyPI needs to sort of bring back the register-a-name-without-releases mechanism, so a package name can be designated as virtual to use in the field. Otherwise we’d have an issue similar to dependency confusion if a name both exists as a real package and is listed as another package’s Provides. The current approach of requesting a name reservation from PyPI admins would not scale well if Provides gets wide usage.
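A minimal sketch of the selection logic under discussion may help. It assumes a toy index that maps each distribution name to the names it declares through Provides-Dist; the `resolve` function and the package names in the usage example are hypothetical, and no current installer works this way.

```python
def resolve(requirement, index, choice=None):
    """Pick the distribution that satisfies `requirement`.

    `index` maps each distribution name to the list of names it provides.
    `choice` is the user's explicit pick when several providers exist.
    """
    providers = sorted(
        name for name, provides in index.items() if requirement in provides
    )
    if requirement in index and providers:
        # The dependency-confusion case: a real project and a virtual
        # name collide, so refuse to guess.
        raise LookupError(f"{requirement!r} is both a project and a provided name")
    if requirement in index:
        return requirement
    if not providers:
        raise LookupError(f"nothing provides {requirement!r}")
    if len(providers) == 1:
        return providers[0]
    if choice not in providers:
        raise LookupError(f"choose one of {providers} for {requirement!r}")
    return choice
```

With `index = {"blas-openblas": ["blas"], "blas-mkl": ["blas"], "numpy": []}`, calling `resolve("blas", index)` raises until the user passes `choice="blas-mkl"` or `choice="blas-openblas"`. Adding a real project named `blas` to the index triggers the collision error, which is the situation a reserved virtual name would prevent.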


I agree - there’s no silver bullet. The only things that can be shaved off are the things that people now statically link or bundle in DLLs, and maybe 1-2 fewer SIMD/PTX variants.

This will apply to a very small group of vendors/projects, so I’m not sure it’s all that helpful. For example just raising the limit to 1.5 GB or 2 GB for that small group and asking them for a reasonable size PyPI sponsorship to cover the costs will be much more effective for everyone than some complicated redirect mechanism.

And regarding sizes, it sounded like the overall size of PyPI is at least as important as a set file size limit. Setting an overall per-project limit, so people are forced to stop uploading nightlies for such large packages (or really, all packages), would have a bigger positive effect than the per-wheel limit. Example: the top project by sum-of-package-size has wheels in the 20-50 MB range; the problem is that it publishes almost daily pre-releases: lalsuite · PyPI.

We already have an overall project size limit, it’s 10GB per project, though we have made some exemptions.

Thanks for pointing that out @dustin. It looks like we automatically got limit increases, so I never noticed. I just reduced the total size numpy takes up by 6% by cleaning up some very old pre-releases. We host our nightlies elsewhere, because uploading them to PyPI feels like a bit of an abuse of the system. Out of interest, why don’t you make the largest users of space with pre-releases clean up after themselves, or implement an automatic cleanup policy for dev releases after a given period of time? It looks like this can reduce the size of PyPI by 10-20% fairly quickly.
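For anyone who wants to check how much of a project’s footprint comes from pre-releases, here is a small sketch. It assumes the input has the shape of the `releases` mapping in PyPI’s JSON API response (version string to a list of file entries, each with a `size` in bytes); the `size_breakdown` name is made up, and the regular expression is only a rough heuristic for PEP 440 dev and pre-release versions, not a full parser.

```python
import re

# Rough heuristic (an assumption, not a PEP 440 parser): matches ".devN"
# releases and aN / bN / rcN pre-releases such as "1.1rc1".
PRE_RELEASE = re.compile(r"(\.dev\d*|\d(a|b|rc)\d+)", re.IGNORECASE)


def size_breakdown(releases):
    """Return (total_bytes, pre_release_bytes) for a project's releases."""
    total = 0
    pre_release = 0
    for version, files in releases.items():
        size = sum(entry["size"] for entry in files)
        total += size
        if PRE_RELEASE.search(version):
            pre_release += size
    return total, pre_release
```

Dividing the second number by the first gives the share of storage that a pre-release cleanup could free for that project.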

We already ask this for projects that have exemptions on total project size, or are close to the 10GB limit.

I think I can speak on behalf of the other PyPI maintainers: generally our goal is for PyPI to have whatever exists on PyPI exist exactly as it was uploaded, ~indefinitely (unless the owner chooses to remove it).

This is why we don’t allow releases to be overwritten, don’t change metadata on existing releases, and don’t have a “cleanup” policy like this. There are undoubtedly some users for whom such a policy would be disruptive.

I’m proposing a “small group” here just to keep the implementation and risk/reward ratio tractable and predictable for PyPI, but the assumption is also that this would be corporate participants who are also contributing some of the largest wheels. PyPI’s hosting is donated but certainly not “free” in the sense that it can just keep scaling indefinitely. This proposal would allow some of the asymmetric load to be shared and distributed across additional (paid) CDNs. More importantly, however, this would also allow commercial entities (who have multiple incentives to do so) to continue to contribute to the global PyPI index while also continuing to control their own file sizes and destinies, so to speak.

Hello all, I’m a maintainer of the CuPy project.
Thank you very much for all your efforts in keeping the PyPI ecosystem healthy!

Although this is not a direct solution to the “large files on PyPI” issue, let me share the recent news related to GPU & Python.

  • CUDA now follows the CUDA Enhanced Compatibility policy introduced in CUDA 11.1 (September 2020). This provides binary compatibility within the same CUDA major version, e.g., a binary built with CUDA 11.1 can run on CUDA 11.1, CUDA 11.2, … but not on CUDA 12.0.
    In general, packages built with CUDA 11.1 will work with CUDA 11.2, so this should contribute to reducing the number of packages on PyPI. There was a technical limitation (related to “NVRTC” module) in CUDA 11.2 so we had to release cupy-cuda111 (for CUDA 11.1) and cupy-cuda112 (for CUDA 11.2) separately, but I heard that they’re fixing this issue in upcoming releases.

  • NVIDIA is about to release the CUDA Python module, which provides a Python/Cython wrapper for CUDA Toolkit libraries. I’m not sure how this library is going to be packaged, but some Python packages may rely on this unified library instead of releasing a package for each CUDA version.

  • AMD GPUs (ROCm Platform, which is a similar concept to NVIDIA’s CUDA Toolkit) are becoming popular. CuPy, PyTorch, and TensorFlow now all provide ROCm wheels. So solutions like environment markers may need to be designed in a vendor-independent way.
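The compatibility rule in the first bullet can be written down as a short check. This is only a sketch of the rule as stated in this post: the function names are invented, and requiring an exact version match for builds older than CUDA 11.1 is a conservative assumption of this example, not something NVIDIA’s policy text is quoted on here.

```python
def parse_version(version):
    """Turn a string such as "11.2" into a (major, minor) tuple of ints."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runs_on(built_with, runtime):
    """Return True if a binary built with CUDA `built_with` should run on `runtime`."""
    built = parse_version(built_with)
    target = parse_version(runtime)
    if built < (11, 1):
        # Assumption: before Enhanced Compatibility, require an exact match.
        return built == target
    # Same major version, and the runtime is at least as new as the build.
    return built[0] == target[0] and target[1] >= built[1]
```

Under this rule a single wheel built against CUDA 11.1 would cover every later 11.x runtime, which is why the policy could reduce the number of per-version packages.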


In conda-forge, there was a naming discussion on the previous scheme of using cpu / gpu, and while it’s not uniformly rolled out yet, cuda builds now use a cuda extension, leaving room also in the future for rocm builds or others.



If GPU tags are considered, will CPU tags also be considered? AFAIK, pip cannot decide between versions of packages that have been compiled with different optimizations (e.g. SSE4, AVX, AVX512). It would be great if that could come at the same time.

Looks like this discussion got posted to Hacker News, there’s a fair amount of comments about it here: What to do about GPU packages on PyPI? | Hacker News

CPU architectures are already included in tags. There is no current plan to extend this to include various CPU optimizations as well.
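To show where the architecture currently lives, here is a minimal sketch that splits a wheel filename into its compatibility tags, following the filename convention from the wheel specification (`name-version[-build]-python-abi-platform.whl`). The `wheel_tags` helper is written for this example; real tools should use an existing parser such as the one in the `packaging` library.

```python
def wheel_tags(filename):
    """Split a wheel filename into its python, abi, and platform tag sets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name, version, optional build tag, then the three compatibility tags
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Compressed tag sets use "." to separate alternatives.
    return {
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }
```

For a filename such as `numpy-1.20.3-cp39-cp39-manylinux2014_x86_64.whl`, the platform tag carries the architecture (`x86_64`), but none of the three tags has a slot for CPU features like AVX512 or for a CUDA version.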

I meant CPU feature or version tags, not arch tags indeed. That seems analogous to these suggested CUDA version tags.

I know neither are planned, but I would propose that the CUDA version tag proposal be amended with a CPU feature tag.

Dustin Ingram via Discussions on Python.org <python1@discoursemail.com> wrote on 22 May 2021 07:10:30 CEST: