Linking to Python-provided libraries?

Has anyone looked at linking to the Python-provided versions of libraries (e.g. zlib, expat) in wheels, rather than vendoring them inside the wheel? h5py is looking for a longer-term solution for zlib on Windows (see Update zlib in Windows wheels · Issue #2261 · h5py/h5py · GitHub and CVE-2023-45853 - zlib.dll is installed as a dependency for h5py · Issue #2354 · h5py/h5py · GitHub), but it would seem better to me if, rather than potentially insecure versions of libraries being distributed via PyPI, wheels could rely on Python to provide access to the libraries it comes with.

If it isn’t vendored inside the wheel, and isn’t distributed via PyPI, and isn’t part of the standard library (or else why is there a problem to solve?) - then whence would Python provide access?

For libraries with a relatively simple C API, such as zlib, it would be possible to reexport the API as a PyCapsule wrapping a set of functions, like datetime does. However, that wouldn’t be practical for larger APIs such as OpenSSL.
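
For the curious: datetime’s capsule is even visible from Python, and C consumers fetch it with PyCapsule_Import("datetime.datetime_CAPI", 0) and then call through the struct of function pointers it wraps. The address below is illustrative:

>>> import datetime
>>> datetime.datetime_CAPI
<capsule object "datetime.datetime_CAPI" at 0x7f3a9c2d4e10>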

If it isn’t vendored inside the wheel, and isn’t distributed via PyPI, and isn’t part of the standard library (or else why is there a problem to solve?) - then whence would Python provide access?

In this case, HDF5 (which is a C library) depends on zlib (a C library that CPython requires, and which it exposes a Python interface to as the zlib stdlib module), so users already have zlib. There are other C libraries included with CPython (e.g. expat, openssl, libffi) which other C libraries (or other Python extension modules) also depend on. Having a single instance of these libraries (as is the default on Linux) would make it simpler for users to handle security issues in these libraries for wheels: just update the version of Python you already have.

For libraries with a relatively simple C API, such as zlib, it would be possible to reexport the API as a PyCapsule wrapping a set of functions, like datetime does. However, that wouldn’t be practical for larger APIs such as OpenSSL.

I’m not sure that reexporting is required for external libraries like zlib (that’s partially why I’m asking, as I’m not familiar enough with how CPython can be built, but I had assumed these external libraries were always dynamically linked). Ideally, though, the headers/shared objects/DLLs would be in the same place as the CPython ones, so it’s relatively easy to link to them. I’m not thinking of modules that are written in C; I’d imagine you would want to write a C extension to bridge between any external library and any stdlib module.
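
For example, sysconfig already reports where a given CPython’s headers and library directory live, which is roughly what I mean by “the same place as the CPython ones” (the values below are from an Ubuntu system Python and will differ per installer):

>>> import sysconfig
>>> sysconfig.get_path("include")
'/usr/include/python3.10'
>>> sysconfig.get_config_var("LIBDIR")
'/usr/lib/x86_64-linux-gnu'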

CPython usually doesn’t ship the zlib library, except perhaps on Windows. It links to an externally provided one.

For example, the Ubuntu system Python executable has the zlib module built in and links to the system’s zlib:

$ /usr/bin/python3 -c "import zlib; print(zlib)"
<module 'zlib' (built-in)>
$ ldd /usr/bin/python3
	linux-vdso.so.1 (0x00007f56c3110000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f56c2a15000)
	libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f56c29e4000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f56c29c8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f56c27a0000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f56c3112000)

while conda-forge’s Python has a distinct zlib module linking to the conda-forge zlib:

$ /home/antoine/mambaforge/envs/pyarrow/bin/python -c "import zlib; print(zlib)"
<module 'zlib' from '/home/antoine/mambaforge/envs/pyarrow/lib/python3.10/lib-dynload/zlib.cpython-310-x86_64-linux-gnu.so'>
$ ldd /home/antoine/mambaforge/envs/pyarrow/lib/python3.10/lib-dynload/zlib.cpython-310-x86_64-linux-gnu.so 
	linux-vdso.so.1 (0x00007ffd755fb000)
	libz.so.1 => /home/antoine/mambaforge/envs/pyarrow/lib/python3.10/lib-dynload/../../libz.so.1 (0x00007f615d123000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f615d0f8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f615ced0000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f615d14c000)

The official answer is: at your own risk.

We make a small amount of effort not to drastically change the install layout of things bundled in the installer (and a much higher effort for our own files), which means that if you discover there’s a copy of libffi or OpenSSL bundled as a DLL and you decide to use it, chances are it’ll keep working for that version of Python.

But we treat those as internal implementation details, which means we could change them at any time. They may be patched from their original versions, and you’ll just have to find that out by yourself because we don’t document it for you. They may not even match what CPython uses (for example, we statically link zlib on Windows, but Tcl/Tk need a dynamically linked copy, and nobody checks whether they’re the same one).
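
If you want to see what you’re getting, you can list what a python.org install puts in its DLLs directory; the exact set below is illustrative and changes between Python versions:

>>> import sys, pathlib
>>> sorted(p.name for p in pathlib.Path(sys.base_prefix, "DLLs").glob("*.dll"))
['libcrypto-3.dll', 'libffi-8.dll', 'libssl-3.dll', 'sqlite3.dll', 'tcl86t.dll', 'tk86t.dll', 'zlib1.dll']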

As Antoine pointed out, for most other platforms we don’t distribute anything at all, but rely on distributors to provide the libraries and compile CPython against them. You should probably use the platform versions of these libraries, just like CPython does.

An alternative is to use conda, which allows distribution of non-Python packages, so you could just specify zlib as a dependency and not have to vendor it. It looks like there is a conda-forge package for zlib although I can’t speak to its reliability.

Given that it’s currently used by 453 other conda packages, you can certainly assume that package to be reliably maintained.

>>> import subprocess, json
>>> 
>>> out = subprocess.check_output(['mamba', 'repoquery', 'whoneeds', '-c', 'conda-forge', 'zlib', '--json'])
>>> d = json.loads(out)
>>> rdepends = set(o['name'] for o in d['result']['pkgs'])
>>> len(rdepends)
453
>>> sorted(rdepends)[:10]
['abinit', 'adios', 'afterimage', 'aiokafka', 'ambertools', 'aria2', 'arrow-cpp', 'assimp', 'astrometry', 'atari_py']

Thanks, this was the answer I was looking for. On non-Windows we’re doing fine (and we’re familiar enough with how linking works to handle the issues that arise there); it’s Windows where we’re having more difficulties (as none of the h5py devs use Windows).

We are currently using nuget to get zlib on Windows, but it looks like all the options there are quite out of date (so we need to look at alternatives).

One thing you may want to keep in mind is that there are different ways of installing Python on Windows, and you want your wheels to work with at least all the common ones. The python.org installers and those from the Windows store may have zlib in the same place, but it’s unlikely that for example the Python shipped in conda-forge does - and it’s not uncommon for users to install only Python (or Python + other non-Python tools/libraries) with conda, and all Python packages from PyPI. There may be other installers out there (Enthought’s suite, is ActiveState Python still a thing?, PyBIs, Spack may gain Windows support, etc.) with different layouts as well.

It wasn’t entirely clear to me from your answer whether you’d obtain zlib from a python.org installer during the h5py wheel build process and still vendor it, or whether you’d want to dynamically preload zlib at runtime from a private CPython location. The former should be fine; the latter will quite likely run into issues with differences in how Python installers are built.
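
To make the fragility of the latter concrete: a runtime preload has to hardcode layout assumptions, something like this sketch (which assumes the python.org installer’s DLLs directory with a zlib1.dll in it - neither of which other distributions guarantee):

import ctypes, os, sys

# Sketch: preload the zlib DLL bundled with a python.org install before
# importing an extension linked against it. conda-forge, the Store
# build, etc. lay things out differently (or ship no zlib DLL at all).
dll = os.path.join(sys.base_prefix, "DLLs", "zlib1.dll")
if os.path.exists(dll):
    ctypes.WinDLL(dll)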

I should also point out that because the zlib DLL in CPython is only there for Tk, if someone chooses to leave out Tk, it won’t even be installed.

vcpkg is probably the alternative you want.

What about the zlib and zipimport modules? Am I misunderstanding you?

There’s also conan (never used it, I just know it exists :smiley:).

Those are statically linked. They don’t use the zlib1.dll that Tk requires.

(And Tk only gets it because they have a single build flag for shared/static, which applies to all components, including their vendored zlib. If I could’ve, we’d statically link zlib into the Tk DLL, but that just isn’t an option - all shared or all static. But one day they/we might fix it, and so the zlib DLL could disappear overnight in a Tk update.)
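
You can see the static linking from the interpreter itself: on a python.org Windows build the stdlib zlib module is a builtin, living inside the main Python DLL rather than using zlib1.dll:

>>> import zlib
>>> zlib
<module 'zlib' (built-in)>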
