Yes, although as usual it is not the only issue.
Requiring that sdists and wheels have the same compatibility constraints is problematic, but reducing the size of wheels is only one part of this. I will describe the situation as it concerns python-flint (which I maintain) and gmpy2, which is closely related. I don’t know as much about pytorch etc. as Ralf does, but ultimately I think many of the issues are similar, just more extreme in terms of file size because of the larger CUDA binaries.
The C-level dependencies for gmpy2 are GMP, MPFR and MPC. For python-flint the dependencies are GMP, MPFR and Flint, so GMP and MPFR are shared dependencies. The primary purpose of both gmpy2 and python-flint is to expose the functionality of the underlying C libraries so that it can be used from Python.
In every other packaging system (conda, apt, homebrew etc.) there would be a single shared copy of the GMP and MPFR libraries used by both gmpy2 and python-flint (that’s why they’re called shared libraries!). Only one GMP package would need to be maintained. The build farm would only build it once. A user would only download it once. It would only be in one location on disk and one location in memory at runtime. Projects like python-flint and gmpy2 could just use that GMP package as it comes and could build directly against its binaries.
In the case of the PyPI ecosystem it isn’t easily possible to have a GMP package that gmpy2 and python-flint can share, because any binary wheels would need to be ABI compatible. Instead both gmpy2 and python-flint need to build GMP and bundle it. Building GMP is especially difficult on Windows: both projects have had to figure out ways of solving that problem, along with all of the associated CI tooling to make it happen, and both projects need to maintain that going forwards. Both projects have to carry patches for GMP. Both projects have to run slow CI jobs that build all of these dependencies from scratch. Both projects upload binary wheels containing effectively duplicate `libgmp.so` etc. files.
Many users of python-flint are also users of gmpy2, so they have to `pip install` larger wheels containing duplicate libraries. After install those bundled libraries are duplicated on disk in every virtual environment, and the duplicated libraries are loaded separately into memory at runtime within each process. Note that on Linux, for example, a single system-wide `libgmp.so` would be shared in physical memory by all running processes that use the library. When you install manylinux wheels, though, you get different copies in each venv, and even duplicates across different packages within a venv, and each of those uses separate physical memory at runtime.
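To make the duplication concrete, here is a rough sketch (Linux only, and assuming both gmpy2 and python-flint were installed from manylinux wheels into the active environment) that finds the vendored copies of libgmp and shows that each one is mapped into the process separately. The `*.libs` directory layout is what auditwheel produces; the exact hashed file names vary from wheel to wheel.

```python
# Sketch: find vendored libgmp copies in this venv and load them to show
# that each copy gets its own mapping in the process (Linux only).
import ctypes
import sysconfig
from pathlib import Path

site_packages = Path(sysconfig.get_paths()["purelib"])

# e.g. gmpy2.libs/libgmp-<hash>.so... and python_flint.libs/libgmp-<hash>.so...
copies = sorted(site_packages.glob("*.libs/libgmp*.so*"))
for path in copies:
    print(f"{path}  ({path.stat().st_size / 1e6:.1f} MB)")

# Load each vendored copy; because they are distinct files they are mapped
# separately, which /proc/self/maps makes visible.
handles = [ctypes.CDLL(str(path)) for path in copies]
maps = Path("/proc/self/maps").read_text()
mapped = {line.split()[-1] for line in maps.splitlines() if "libgmp" in line}
print(f"{len(mapped)} distinct libgmp files mapped into this process")
```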
From the perspective of the python-flint and gmpy2 maintainers the primary issue here is duplication of effort: it is a lot of work for each project to package the same libraries. It would be easier if we could share the same build of the same libraries. We would need either separate wheels that literally just bundle the C libraries, or for python-flint to depend on gmpy2. For that to work with wheels on PyPI the dependency between the wheels would be an ABI dependency and would need to be an exact constraint between particular wheels, like `gmpy2==<hash of a gmpy2 wheel file>`.
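To illustrate what that forces you into in practice, here is a hypothetical import-time guard. Nothing like this exists in python-flint today, and `_GMPY2_BUILT_AGAINST` is a made-up constant that a build script would have to stamp in when it linked against gmpy2’s bundled libraries:

```python
# Hypothetical sketch: with an ABI dependency, the only safe constraint is
# "exactly the build I compiled against", which current metadata cannot
# express, so it would end up as a runtime check like this.
from importlib.metadata import version

_GMPY2_BUILT_AGAINST = "2.2.1"  # stamped at build time (made up for illustration)

installed = version("gmpy2")
if installed != _GMPY2_BUILT_AGAINST:
    raise ImportError(
        f"python-flint was built against gmpy2 {_GMPY2_BUILT_AGAINST} but "
        f"gmpy2 {installed} is installed; the bundled libraries may be "
        f"ABI-incompatible"
    )
```

Even this is weaker than what is actually needed, because two gmpy2 wheels with the same version number can bundle different library builds, hence the "hash of a wheel file" above.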
It is also possible to link Flint with a BLAS library in order to accelerate some operations. I imagine that 99% of python-flint users already have a BLAS library from NumPy, but it just seems too difficult to leverage that with python-flint wheels because it would again introduce an ABI dependency. Even closely coupled projects like NumPy and SciPy struggle to share a BLAS library in their wheel builds. So python-flint is left with the option of either building and bundling yet another duplicate BLAS library or forgoing those optimisations even though the user has the necessary library installed.
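As a rough illustration of why this is hard, this is roughly where NumPy’s vendored BLAS ends up on Linux. The location and name pattern here are assumptions about current manylinux NumPy wheels (macOS and Windows wheels lay things out differently):

```python
# Sketch: locate NumPy's vendored BLAS on Linux.  The auditwheel-mangled
# file name (hash suffix) changes between wheel builds, so there is no
# stable name or SONAME for another project to link against at build time.
from pathlib import Path
import numpy

site_packages = Path(numpy.__file__).parent.parent
for path in sorted(site_packages.glob("numpy.libs/lib*openblas*.so*")):
    # Typically something like libscipy_openblas64_-<hash>.so
    print(path.name, f"{path.stat().st_size / 1e6:.0f} MB")
```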
In the case of GMP, file size is not such a major issue, but to me it seems completely absurd to have multiple copies of libgmp in the same virtual environment: it is a clear sign that something went wrong somewhere. This is the best that package authors have been able to come up with, though, while working within the constraints of Python’s packaging standards. All ABI dependencies have to be avoided because otherwise tools like pip won’t know how to install compatible binaries.
In the case of things like pytorch etc. all of the same considerations apply, except that the duplicated shared binaries are massive. It would be much better if they could be split out into separate wheels, but you would still need to be able to encode the ABI constraints somehow when doing so.
Also, I might be wrong about this, but I think the reason the CUDA binaries are so large in the first place is that they are really provided as a bundle of subpackages built for many particular GPU architectures. The user likely has only one particular GPU, but they end up downloading a gigabyte of code for other GPUs because with wheels there is no way to choose the right subpackage at install time based on something like which GPU they have.
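For example, with a CUDA build of PyTorch installed you can compare the one architecture your GPU actually has against the list of architectures whose compiled kernels are bundled in the wheel (illustrative sketch, assuming an NVIDIA GPU is present):

```python
# Sketch: one GPU on this machine vs. the list of GPU architectures the
# installed PyTorch binary ships kernels for.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"This machine's GPU: compute capability sm_{major}{minor}")
    # Architectures compiled into the wheel, e.g. ['sm_50', 'sm_60', ...]
    print("Architectures bundled in the wheel:", torch.cuda.get_arch_list())
```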