Hi! When we switched python-build-standalone to statically linking libpython into bin/python, we ran into some code that made assumptions that libpython was dynamically linked and expected to be able to open libpython3.x.so.1 and have that be the running libpython. We had to introduce some poor workarounds to maintain compatibility with such code. I want to propose that CPython actively monitor for this assumption and raise DeprecationWarnings with the intention of breaking such code in the future.
As background, the Python interpreter executable, which I’ll call “bin/python”, is a short C program that just calls Py_BytesMain(argc, argv). The actual Python runtime is in libpython, which can be either statically or dynamically linked into bin/python. Static linking is better for performance, especially on the free-threaded build for reasons involving thread-local storage access, which is why we switched over. However, if you need to ship a libpython.so anyway (e.g. for use by embedders), this now doubles your install size, so some distributors prefer dynamic linking. (Note that this has nothing to do with “static linking” / “a static executable” in the sense of a fully-static binary that doesn’t use the system libc. In general, a fully-static binary cannot later dynamically load anything, so the problems in this post don’t arise, so this discussion is about dynamic executables.)
If your bin/python dynamically links libpython, that means that by the time the interpreter starts, your libc has already loaded a library named libpython3.x.so.1, and that’s where the interpreter’s implementation of Python comes from. Any later request to load libpython3.x.so.1 (i.e., directly or indirectly via dlopen) will be satisfied by the existing one, and return the functions and data in the real running Python interpreter.
However, if your bin/python statically links libpython, a request to load libpython3.x.so.1 will cause a separate copy of libpython to be loaded from disk. Depending on the details of how it is requested, libpython symbols might be returned from the main interpreter or might be returned from this second copy.
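To see which situation a given process is in, one option on Linux is to look at what is actually mapped into the process. The following is a minimal sketch, not anything CPython ships: the helper name `mapped_libpythons` and the file-name pattern are assumptions made for illustration, and it relies on the Linux-specific `/proc/self/maps` file.

```python
import re

# Assumed pattern for names such as libpython3.12.so.1.0 or libpython3.13t.so.
_LIBPYTHON_PATH = re.compile(r"libpython3\.[0-9]+[a-z]*\.so(\.[0-9]+)*$")


def mapped_libpythons(maps_path="/proc/self/maps"):
    """Return the set of libpython shared objects mapped into a process.

    Linux only: reads the memory-map listing, whose sixth column (when
    present) is the path of the file backing each mapping.
    """
    found = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                fields = line.split(None, 5)
                if len(fields) == 6:
                    path = fields[5].strip()
                    if _LIBPYTHON_PATH.search(path):
                        found.add(path)
    except OSError:
        # No /proc (non-Linux) or unreadable file: report nothing found.
        pass
    return found
```

An empty result in a fresh interpreter suggests libpython is linked statically into bin/python. More than one entry, or an entry that appears only after importing some module, suggests a second copy has been loaded.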
The two specific cases we’ve seen are:
- Extension modules that themselves declare a dependency on libpython3.x.so.1, because they were (incorrectly) built with -lpython3. The upstream python3-config command and pkg-config files properly distinguish two cases: the (default) extension module use case, where you shouldn’t use -lpython3 because you’re expecting to be loaded into an existing Python, and the embedding use case (a non-Python application or library that is pulling in Python), where you should. But there are various third-party build systems that don’t get this quite right.
- Pure-Python code that is accessing the CPython API via something like ctypes.PyDLL(f"libpython3.{sys.version_info[1]}.so.1").
On distributions like Fedora that have bin/python dynamically link libpython, such code works properly. On distributions like Debian that have bin/python statically link libpython but also ship a libpython3.x.so.1 on the default library search path, such code mostly works properly. Specifically, in case 1, the Linux/ELF ecosystem does not actually associate symbols with the libraries they came from, so the extension module has a request to load libpython3.x.so.1 and a totally unrelated request to find e.g. PyArg_ParseTuple from wherever it can be found, which happens to be found first in the main executable. Even in case 2, as long as you’re calling functions that do not operate on global data and have found the right libpython, you’re executing the same code and things mostly work. But “mostly” is of course pretty dangerous.
On distributions that do not ship a libpython3.x.so.1, this code will all fail to work—if you’re lucky. If you’re unlucky, you have a libpython3.x.so.1 on the search path from some other installation of Python, and things can get quite weird. We ran into this problem almost immediately where extension modules in case 1 worked in python-build-standalone provided that you happened to have that same version of Python installed systemwide, which was quite difficult to debug. Even though we did still ship a libpython3.x.so.1 for embedders to use, it was no longer loaded in by bin/python and the binary didn’t declare an rpath that listed its directory (because it didn’t need to), so the linker would find and load /usr/lib/libpython3.x.so.1 if it existed.
We’re currently working around this by setting an rpath on our bin/python to point at the directory containing our libpython, simply to make this class of code “work”—we don’t actually need that rpath ourselves in normal operation. But this only gets us to the status quo of builds like Debian, where such code is still loading a second copy of libpython and hoping for the best. There are other possible hacks, e.g., placing a fake libpython.so with no actual code on the search path, so that one is found but symbols must be resolved from the main executable. The upstream effort for prebuilt-cpython will need to figure out whether to adopt one of these hacks, risk breaking this type of code, or dynamically link libpython (and take the performance hit).
It seems to me that the best option is to try to get maintainers of such code to fix it, so I propose the following deprecations:
- When loading an extension module, we can parse the shared-object headers ourselves to see if the module declares a dependency on libpython3.*.so*. If it does, we can raise a warning that the module was compiled incorrectly, perhaps with a link to some docs explaining the problem, and then continue to attempt to import it and hope for the best.
- ctypes.pythonapi already exists, and is set (on non-Android UNIX) to ctypes.PyDLL(None), which means “resolve symbols from what’s already loaded into the process”. There is basically no correct use of ctypes.PyDLL("libpython3.x.so.1"). Either you’re loading the current Python interpreter, in which case your code should use None for the reasons described above, or you’re loading a different Python interpreter (another build, another version, etc.), which has no guarantee of working well because of symbol collisions. Doing ctypes.CDLL("libpython3.x.so.1") is also certainly wrong for GIL reasons. So I think we should have the CDLL constructor check if the basename of the argument is of the form /libpython3\.[0-9]*\.so(\.[0-9]*)?/ (or something like that) and raise a warning telling you to use ctypes.pythonapi instead.
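The ctypes-side check could look roughly like the sketch below. This is not existing ctypes behavior. The function names `warn_if_libpython` and `running_interpreter_version`, the warning text, and the exact pattern (which also allows an ABI suffix such as the `t` of free-threaded builds) are assumptions for illustration. The second helper shows the suggested replacement: resolving a C API symbol through `ctypes.pythonapi`.

```python
import ctypes
import os
import re
import sys
import warnings

# Assumed pattern; the post only says "or something like that".
_LIBPYTHON = re.compile(r"libpython3\.[0-9]+[a-z]*\.so(\.[0-9]+)*")


def warn_if_libpython(name):
    """Warn when `name` looks like a request to load libpython by file name.

    Returns True if a warning was issued. A hypothetical CDLL/PyDLL
    constructor could call this before loading the library.
    """
    if not isinstance(name, str):
        return False  # None means "search what is already loaded"
    base = os.path.basename(name)
    if _LIBPYTHON.fullmatch(base) is None:
        return False
    warnings.warn(
        f"loading {base!r} by name may load a second copy of libpython; "
        "use ctypes.pythonapi instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return True


def running_interpreter_version():
    """Call Py_GetVersion() in the interpreter that is actually running."""
    get_version = ctypes.pythonapi.Py_GetVersion
    get_version.restype = ctypes.c_char_p
    return get_version().decode()
```

Because `ctypes.pythonapi` never names a file, `running_interpreter_version()` returns the same string as `sys.version` whether libpython is linked statically or dynamically.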
In a few release cycles I’d like to see both of these be errors.
The downside of this change, of course, is we’re raising warnings and eventually hard errors on code that largely works right now. But I think all of this code has an alternative way of being written that would also work fine and works more reliably, so while it’s a migration cost, I think that’s defensible because that code currently has undefined behavior. There is also a small risk that people have private code that they know will only be run on a specific environment where these concerns don’t apply. The cases I’m most worried about are a setup with a library that matches the libpython3.x.so.1 pattern but is, in fact, not CPython at all, and some very careful code that loads and drives a different-version libpython via ctypes.CDLL("libpython3.y...") whose authors have convinced themselves that there’s no risk of symbol conflicts in their use case. I’m not sure if these setups even exist; I suggest that if we get bug reports during the deprecation period, we can add some sort of flag to override the safety check, since I think this type of thing is much less common than code that wants to access its own libpython.
As a data point, auditwheel now complains about extension modules that load libpython, so for any extension modules that have been run through (a recent version of) auditwheel, this problem shouldn’t arise.
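Whether it runs in a packaging tool or at import time, a check of this kind comes down to reading the DT_NEEDED entries of the shared object. The following is a minimal sketch that assumes a 64-bit little-endian ELF file. It is not how auditwheel or CPython implements anything, and the names `elf_needed` and `links_libpython` and the name pattern are hypothetical.

```python
import re
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5
# Assumed pattern for libpython file names.
LIBPYTHON = re.compile(r"libpython3\.[0-9]+[a-z]*\.so(\.[0-9]+)*")


def elf_needed(data):
    """Return the DT_NEEDED names of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    loads, dynamic = [], None
    for i in range(e_phnum):
        p_type, _, p_offset, p_vaddr, _, p_filesz, _, _ = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize
        )
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return []  # no dynamic section, so no declared dependencies

    name_offsets, strtab_vaddr = [], None
    for pos in range(dynamic[0], dynamic[0] + dynamic[1], 16):
        d_tag, d_val = struct.unpack_from("<qQ", data, pos)
        if d_tag == DT_NULL:
            break
        if d_tag == DT_NEEDED:
            name_offsets.append(d_val)
        elif d_tag == DT_STRTAB:
            strtab_vaddr = d_val
    if strtab_vaddr is None:
        return []

    # DT_STRTAB holds a virtual address; map it back to a file offset.
    for vaddr, offset, filesz in loads:
        if vaddr <= strtab_vaddr < vaddr + filesz:
            strtab = strtab_vaddr - vaddr + offset
            break
    else:
        raise ValueError("DT_STRTAB is not inside any PT_LOAD segment")

    names = []
    for off in name_offsets:
        end = data.index(b"\0", strtab + off)
        names.append(data[strtab + off:end].decode())
    return names


def links_libpython(data):
    """True if the ELF image declares a dependency on a libpython file."""
    return any(LIBPYTHON.fullmatch(name) for name in elf_needed(data))
```

For an extension module on disk, the call would be something like `links_libpython(open(path, "rb").read())`, where `path` is the module's `.so` file. A True result corresponds to the first case described in this post.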
(For clarity, all of this discussion refers to UNIX-shaped platforms, namely Linux, macOS, etc. I’ve written .so but this generally applies to .dylib or the Python framework on macOS. I am pretty sure that this class of problem doesn’t arise on NT because symbol resolution works very differently. I see that ctypes.pythonapi on Android is defined by explicitly loading libpython3.x.so, and I don’t yet know why it differs from other Linux and will figure that out before proposing a PEP or PR.)