Native-lib-loader: Documentation and best practices on using native libraries in Python wheels

Hi everyone, as part of the wheelnext effort I’d like to share the “Sharing libraries between wheels” documentation with you. It’s inspired by @rgommers’s pypackaging-native, with a narrower focus on the mechanics of how the underlying system libraries (namely the dynamic loader) handle loading libraries, and on the various approaches that projects have taken over the years to handle these libraries (auditwheel-style mangling, ctypes.CDLL, etc.). Many subtle details are discussed in comments in various setup.py files or similar places, but are not captured anywhere more permanent or in a form that new packagers can easily use. The new page aims to fill that gap. In addition to these docs, I have a current reference implementation up on GitHub at wheelnext/native_lib_loader, and I am happy to see development proceed there as we discuss and converge on solutions.

I am hoping that this project can serve as a way for us to collect the best practices available to us and better understand the edge cases where they don’t work. In an ideal world I would love for us to work towards proposing standards (PEPs), but given the immense variety of systems on which people run Python extension modules I think that it’s valuable to not let the perfect be the enemy of the good and at least document practices that are the best we can do across a wide range of platforms.

There has been plenty of preexisting discussion around this topic. Some of the most comprehensive discussions I’ve found include @njs’s pynativelib proposal and this previous DPO thread from @virtuald. In addition, @FFY00’s dynamic-library project aims to provide another very similar implementation. I’d be happy to see other links as well if anyone can point me to them. For those of you who want to continue this discussion, feel free to join the dynamic-library channel on the PyPA Discord. Thank you to everyone who has already contributed to this project, and thanks in advance to everyone who participates as we go forward!

That’s a very nice write-up. Quick comment: it lacks a link to the repository where the source code lives.

Very nice indeed, thanks for sharing @vyasr.

Something that we found very useful when working on the scipy_openblas32|64 wheels was doing both the symbol and library renaming that you describe, and then providing some utility functions that handle that rename easily at build time for the package linking against the renamed library. get_pkg_config may be of particular interest, especially how it supports preloading, auditwheel-style mangling, and use as a regular shared library for local development:

Perhaps something like that could be generalized and included in dynamic-library? The libdir and RPATH handling in particular seem generally applicable.
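
To make the idea concrete, here is a minimal sketch of what a generalized helper could look like. It is not the scipy-openblas implementation: the function name `render_pkg_config`, the `DEMO_SYMBOL_SUFFIX` macro, and the `lib`/`include` layout are all assumptions made for illustration. It only renders the text of a pkg-config file that points a consumer at a renamed library and embeds an rpath.

```python
from pathlib import PurePosixPath


def render_pkg_config(package_dir, lib_name, version, symbol_suffix=""):
    """Return the text of a pkg-config (.pc) file for a wheel-bundled library.

    All names here are hypothetical; this is not the scipy-openblas API.
    """
    prefix = PurePosixPath(package_dir)
    cflags = "-I${includedir}"
    if symbol_suffix:
        # Hypothetical macro telling the consumer which suffix was applied.
        cflags += f" -DDEMO_SYMBOL_SUFFIX={symbol_suffix}"
    lines = [
        f"libdir={prefix / 'lib'}",
        f"includedir={prefix / 'include'}",
        "",
        f"Name: {lib_name}",
        "Description: renamed library shipped inside a wheel (illustrative)",
        f"Version: {version}",
        f"Cflags: {cflags}",
        # -L lets the linker find the library at build time; the rpath entry
        # lets the dynamic loader find it at run time.
        f"Libs: -L${{libdir}} -l{lib_name} -Wl,-rpath,${{libdir}}",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_pkg_config("/opt/site-packages/demo_blas", "demo_blas64_", "0.1.0", symbol_suffix="64_")` produces a file whose `Libs:` line carries both the `-L` and the rpath entry. An absolute rpath like this is not relocatable, so it probably only makes sense for local development builds.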

Thanks for pointing that out. I wasn’t very clear that the reference implementation I mentioned lives in the same place as the documentation: the wheelnext/native_lib_loader repository on GitHub.

This looks interesting. I see that OpenBLAS supports a SYMBOLPREFIX. Does the current approach depend on setting that when building BLAS, and then using BLAS_SYMBOL_PREFIX inside SciPy to map to the prefixed names when SciPy builds against BLAS? AFAICT there used to be a generated header, blas64-prefix-defines.h, produced by this build utils file to provide the mappings, but I’m guessing that changed during the migration to Meson as the build system, and I haven’t gone digging for the new approach yet.

Producing a pkg-config file (or a CMake config for CMake projects) to handle this would certainly be nice. It simplifies the handshake between the producer and the consumer, letting them agree on what the final symbol names look like without having to rewrite code.
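
For illustration, a generated mapping header of the kind mentioned above could be produced along these lines. This is only a sketch: `render_prefix_header` and the include guard name are hypothetical, and it does not reproduce what SciPy's old build utilities or its current Meson build actually emit.

```python
def render_prefix_header(symbols, prefix="", suffix=""):
    """Render a C header mapping plain symbol names to their renamed forms.

    Hypothetical helper: the consumer includes the header, keeps writing the
    plain names in its source, and the preprocessor substitutes the names
    that the renamed library actually exports.
    """
    guard = "DEMO_SYMBOL_DEFINES_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    for name in sorted(symbols):
        lines.append(f"#define {name} {prefix}{name}{suffix}")
    lines += ["", f"#endif /* {guard} */"]
    return "\n".join(lines) + "\n"
```

With `render_prefix_header(["dgemm_", "daxpy_"], prefix="demo_", suffix="64_")`, the header contains one `#define` per symbol, so the producer and consumer only need to agree on the prefix and suffix.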

Yes, that’s how it is working indeed. The relevant pieces are:

Got it, so this approach is viable, but it requires either that the underlying library support this form of symbol mangling or that we implement something like machomangler that also works for ELF and PE. The latter seems like we’d be biting off more than we can chew. I don’t know that we want to rely on any tool that requires knowing the full specifications of these formats. Even patchelf has bugs, and when we run into them they are challenging to diagnose for all but a vanishingly small percentage of users. Conversely, modifying the original libraries to support symbol prefixing at build time is pretty straightforward but requires manual intervention for each library. Maybe we should advocate that approach, and if it turns out not to scale, we revisit building a tool for mangling after the fact?

I agree with your assessment @vyasr: modifying the code to support symbol renames is cleaner. Another advantage over a tool doing the mangling is that it allows migrating to separate wheels, with, say, packages B and C both depending on the same package A that uses a known symbol prefix or suffix. With random mangling at build time, that isn’t possible.
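
A toy sketch of that difference, under stated assumptions: `mangled_name` imitates content-hash-based renaming (roughly in the spirit of auditwheel-style mangling, not its exact scheme), while `prefixed_symbol` stands in for a prefix agreed on in the library's source. Both function names are made up for this example.

```python
import hashlib


def mangled_name(soname, contents):
    """Name a vendored copy after a hash of its bytes (illustrative only)."""
    digest = hashlib.sha256(contents).hexdigest()[:8]
    stem, _, ext = soname.partition(".")
    return f"{stem}-{digest}.{ext}"


def prefixed_symbol(symbol, prefix):
    """Name a symbol using a prefix that producer and consumers agree on."""
    return f"{prefix}{symbol}"


# Two wheels that each vendor and mangle their own build of library A end up
# with names that depend on the bytes of that particular build.
name_in_b = mangled_name("libA.so", b"bytes of A as built for wheel B")
name_in_c = mangled_name("libA.so", b"bytes of A as built for wheel C")

# With an agreed prefix, B and C can both target the same exported name from
# a single shared wheel A without inspecting each other's builds.
symbol_for_b = prefixed_symbol("dgemm_", "demo_")
symbol_for_c = prefixed_symbol("dgemm_", "demo_")
```

Here `name_in_b` and `name_in_c` differ because the inputs differ, whereas `symbol_for_b` and `symbol_for_c` are identical by construction, which is what lets B and C link against one shared copy of A.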

I think the other approach that could work in principle would be for the build tool itself (Meson, CMake, etc.) to support this directly. I expect that would be the hardest solution to implement, but it might be the most powerful if it could be done.

Very useful write-up!

Regarding Option 3b (Environment Variables): that’s the approach venvstacks uses, but it’s only possible because venvstacks ensures that both assumptions in this qualifier hold: “If we were always in a virtual environment … and if virtual environments had well-defined entry points for hooking in to set environment variables we could embed the information in the environment.”

As you say, it’s not a general-purpose solution; it requires the adoption of a specific deployment and execution model to enable it.
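
As a rough sketch of why a controlled entry point matters (this is not how venvstacks is implemented; `env_with_library_dirs` and the example paths are hypothetical, and `LD_LIBRARY_PATH` assumes Linux): the dynamic loader reads the variable when a process starts, so a launcher has to build the environment before the interpreter is spawned.

```python
import os


def env_with_library_dirs(base_env, lib_dirs, var="LD_LIBRARY_PATH"):
    """Return a copy of base_env with lib_dirs prepended to the search path.

    The caller's mapping is left untouched; the result is meant to be passed
    to a child process rather than applied to the current one.
    """
    env = dict(base_env)
    existing = [p for p in env.get(var, "").split(os.pathsep) if p]
    env[var] = os.pathsep.join([*lib_dirs, *existing])
    return env
```

A launcher would then run something like `subprocess.run([sys.executable, "-m", "app"], env=env_with_library_dirs(os.environ, ["/venv/lib/demo"]))`, where the module name and directory are placeholders. Without control over how the interpreter is started, there is no reliable place to do this, which is the limitation noted above.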

Right, yeah, the challenge is that most of the documented approaches can work in some contexts but break down in others. It’s very difficult to come up with something that works even “most” of the time.
