`pkgconfig` specification as an alternative to `ctypes.util.find_library`

Hi all,

I’ve posted this to Poetry, but was directed to PyPA, so I’m reposting here.

Background

Python has historically relied on runtime functions like ctypes.util.find_library, which invoke ldconfig and gcc to discover the flags and libraries for native dependencies.

Not only does this cause performance issues on startup, but it’s also extremely inconvenient to provide such information at runtime when it’s primarily a configuration matter.

There are tons of patches in nixpkgs to work around this. Most importantly, it makes the whole Python development experience on Nix(OS) nearly unusable.

Moreover, setting LD_LIBRARY_PATH to discover the paths at runtime causes numerous issues. As a result, we’re exploring using manylinux at devenv.sh, but we’d like to address this issue properly within Python packaging.

Implementation

At package install time, the pkg-config executable would be invoked to collect metadata from .pc files.

The resulting metadata would be stored in a parseable format as the primary source for ctypes.util.find_library (retaining the current logic as a fallback). This would also serve importlib_metadata for querying the information at runtime.
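As a rough sketch of the install-time step (the function and file names here are hypothetical, not part of any existing tool):

import json
import subprocess

def collect_pkgconfig_metadata(modules, out_path):
    # Hypothetical install-time hook: query pkg-config for each declared
    # module and persist the results for runtime lookup.
    metadata = {}
    for module in modules:
        def query(*args):
            return subprocess.run(
                ["pkg-config", *args, module],
                capture_output=True, text=True, check=True,
            ).stdout
        metadata[module] = {
            "version": query("--modversion").strip(),
            "libs": query("--libs").split(),
            "cflags": query("--cflags").split(),
        }
    with open(out_path, "w") as f:
        json.dump(metadata, f)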

Proposal

We propose supporting a pkgconfig section in pyproject.toml itself, inspired by the autotools interface.

[pkgconfig.dependencies]
gtk-2.0 = {version = "^3.0", platform = "darwin"}
zlib = "*"

Dependency alternatives

At times, there might be a need to fall back to another library, so we can accommodate that as well:

[pkgconfig.dependencies]
libsodium = {version = "*", alternative = "openssl = *"}

Runtime querying

importlib_metadata would be augmented to include pkgconfig metadata.
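The exact runtime API is an open design question; the sketch below is entirely hypothetical (no such function exists in importlib_metadata today), just to show the shape of what I have in mind:

# Hypothetical future API, not part of importlib_metadata today:
from importlib_metadata import pkgconfig

info = pkgconfig("gtk-2.0")
info.version  # e.g. "3.24.0", as reported by pkg-config --modversion
info.libs     # e.g. ["-L/usr/lib", "-lgtk-2.0"]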

Open Questions

How should we address cases where pkg-config is unavailable? Consider how this impacted the Cabal implementation.

Note that I’d like to gather feedback; hopefully we can turn it into a PEP.

Domen

2 Likes

pkgconfig · PyPI exists to locate and parse the files. I’m not sure what putting this in the metadata gives you, though (were you expecting something would install these packages?), and you probably want to use the pkg-config file for specific shared libraries, not for the whole package. I would presume that the backends which wrap CMake, Meson and similar can already do this (via their native support), and the above PyPI package could be used for the pure-Python backends (e.g. setuptools)?
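For example, something like this could be used from a setuptools build script (from memory, so treat the exact API as approximate):

import pkgconfig  # the pkgconfig package from PyPI

if pkgconfig.exists("zlib"):
    print(pkgconfig.cflags("zlib"))  # e.g. "-I/usr/include"
    print(pkgconfig.libs("zlib"))    # e.g. "-lz"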

1 Like

I’d like the specification to be part of pyproject.toml, which would give us:

  • A clearly documented, declarative (parseable) way of specifying native dependencies. Tooling like Nix can then automatically parse it, convert the library to a Nix-specific one, and provision it.
  • Vendoring of the pkgconfig library, simplifying setup.
  • A runtime API to discover what metadata pkgconfig returned, so that the library can be loaded on the fly using Python code, replacing the current find_library logic.

The use case is any Python code that depends on any native dependencies, for example libcairo.

3 Likes

Hello!

I think your proposal needs a lot more text to explain the problem in details and each part of the proposed solution.

OK, this is a clear start. Sounds like it could make sense to add support for pkg-config to ctypes.
(Even though ctypes is intrinsically fragile and not recommended for production code)

These change (or extend) the meaning of the string parameter passed to find_library/LoadLibrary, so I’m unsure whether they could be taken upstream as is.

I don’t really get the meaning of the code in the ticket linked here, and haven’t read the «numerous issues».
It would be useful to mention them briefly in your proposal, and to explain what manylinux is adding or solving for using ctypes.

This is a jump! As you probably know, in Python the packaging / install machinery is quite independent from the import system (importlib.metadata and importlib.resources are quite recent packaging helpers in the importlib namespace), and ctypes currently has zero specific relationship with packaging (contrary to cffi for example which has utilities for people packaging their native lib). The proposal is now much bigger than the length of the post suggests: it wants to define a new packaging metadata, and add a level of integration between native libs and Python side that has no precedent, and create requirements for install tools and a runtime library, and also define the input for the metadata in pyproject. This simple proposal is actually a very tall order, and could be 3 different PEPs.

But I am not saying this to discourage you, rather to consider a change of tactics!
Can a custom build backend (see The Packaging Flow — Python Packaging User Guide) be written to run pkg-config (if available) and save the pc file as data?
Can a runtime library query that data and wrap the call to LoadLibrary?
In short: could this be done as third-party tools to demonstrate success of the concept and implementation first before trying to write new specifications?

5 Likes

Would this in any way support Windows?

2 Likes

Commenting here to say that I’m interested in identifying binary/vendored dependencies in built wheels, so this seems related to that! Will be following along with the discussion when I have more time next week :slight_smile:

1 Like

I don’t think the SBOM-esque information would be encoded in this format. This is information about “what to look for”, roughly analogous to an unpinned requirement. What an SBOM-esque need for binaries in dependencies needs is the logical equivalent of a pinned dependency in this context.

(I’m assuming the interest in identifying binaries in a wheel is coming from wanting SBOM-style information)

That can’t really be generated by this information outside of some cooperation by the build backend, at which point it’s better to investigate that angle directly instead IMO. :slight_smile:

2 Likes

find_library is problematic because:

  • It requires a compiler or linker at run-time. This inflates the run-time closure size by a large margin (think containerized environments or embedded devices). Yes, I know it falls back to searching LD_LIBRARY_PATH, but…
  • There is no way to specify which version of a library to use. If you want to use a custom libfoo.so, different from the “system libfoo”, then some serious black magic is required.

The workaround for this in Nix and Guix is to patch out ctypes.util.find_library calls with absolute /bar/baz/libfoo.so references at build time. This is a high barrier to entry for someone who just wants to package their favorite Python library.
(it’s possible to automate this patching of course, but that’s beside the point…!)

One solution to this would entail expanding pyproject.toml with a section for native dependencies (similar to Meson dependency('foo')). These would be probed at build time, saved as metadata (with absolute file names), and queried by find_library at run-time.

The probing for libraries at build-time is probably best served by pkg-config, as it includes things like CFLAGS if that should be necessary, but something akin to the existing find_library logic would work for the majority of cases (provided it can return an absolute file name).

Perhaps pkg-config support could be added as a separate PEP.

There is also the issue of building Python packages with native extensions and making those discoverable to other packages (one suggestion in the Poetry thread was to expand site-packages with a pkgconfig directory), but that is probably again a separate PEP.

1 Like

We definitely need a solution for this with nix/guix. Aside from what has already been mentioned:

  • find_library is part of the standard library. It should also be usable by code not distributed as a package.
  • there are other functions doing something similar, e.g. cffi.FFI.dlopen. Those should ideally be covered as well

Also, as mentioned in the poetry thread:

  • pkgconfig is not meant for this

All we really need is a mapping where the key is some identifier and the value is the library name. At build time that mapping would be written to some file and stored in the wheel. At run time, a function uses the key to obtain the library name. In nixpkgs we would patch the created file when creating the wheel. (A sketch follows the list below.)

What does this technically require?

  1. defining the format for declaring the mapping
  2. build backends to support this and create the file
  3. a library function to retrieve the path given the identifier
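A minimal sketch of items 1 and 3, assuming the mapping is a JSON file named native-libraries.json shipped inside the package (all names here are hypothetical):

import json
from importlib import resources

def find_native_library(package, key):
    # Item 3: look up the library name/path that the build backend
    # (item 2) recorded at build time in the mapping file (item 1).
    mapping = json.loads(
        resources.files(package).joinpath("native-libraries.json").read_text()
    )
    return mapping[key]

# A backend might write {"z": "libz.so.1"}, which nixpkgs would patch
# to an absolute store path such as "/nix/store/...-zlib/lib/libz.so.1".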

But I think this is the least of our problems. The biggest issue I see is, what does this give the average user? We need this for nixpkgs, but for non-nix/guix users, what’s in it for them? Why would they go through the hassle of declaring a mapping and using some alternative function for retrieving the filename if they can just use find_library directly?

Thus, while this is an issue, I think this is more of a nix/guix issue and not a general Python packaging issue. And looking at how often it occurs, I don’t think it is such a big issue anymore. Once a package uses a native dependency, it doesn’t often change. It’s just that new contributors/users need to know about it.

Note that declaring non-Python dependencies for packaging is a related but different issue. That would have more value for users in my opinion, for both non-nix/guix and nix/guix users.

1 Like

A question here, given that @domenkozar’s initial post touched on both ctypes specifically and the wider problem of dealing with native dependencies. If the problem is specifically ctypes, then I think mixing in pkg-config as a potential solution is probably premature. However, if the problem is primarily “allow for better build-time dependency resolution of native libraries, so we can stop using more fragile runtime solutions”, then yes pkg-config is the most important part of the solution puzzle here. And ctypes is only an example - and not even the most important one. Which of these two problems is it for Nix?

pkg-config works fine on Windows, and can be installed with at least Chocolatey and conda-forge (there may be other standalone binaries or other distributors too, not sure). In fact, SciPy uses pkg-config for dependency detection of BLAS and LAPACK on all platforms, and would like to extend that usage to other shared/static libraries it needs, e.g. the ones shipped by NumPy.

The primary problem with using pkg-config right now is that there is no useful place for wheels to install .pc files to. For context, .pc files are simple metadata files read by pkg-config; they contain info like the include and library directories, and the flags needed to link to the library when you (for example) build a Python extension module. pkg-config has a search path, but wheels don’t install files to anywhere on that path. The path can be extended with a PKG_CONFIG_PATH env var though, so any well-defined place will help here. It’s basically the same problem as for header files, and also for .cmake files.
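For illustration, a minimal .pc file looks roughly like this (all names and paths are placeholders):

prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include

Name: foo
Description: An example library
Version: 1.2.3
Libs: -L${libdir} -lfoo
Cflags: -I${includedir}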

The structure is normally (under an install prefix - and please ignore for now that Windows is a little different, that doesn’t matter yet here):

include/
lib/
  pkgconfig/
  cmake/

Here is an example of a .pc file from the SciPy docs: BLAS and LAPACK — SciPy v1.12.0.dev Manual

Okay, @pradyunsg and I have a PEP for this that’s basically been 95% ready for several months. I really should go do the last 5% and submit it for review now - hopefully next week, after the “get out Python 3.12 pre-release wheels” rush is over. It already contains an example of depending on pkg-config. I don’t want to jump the gun here and link to it now before the last bits are done, but I’ll send you a link @domenkozar to see if it addresses part or all of what you had in mind.

So that is the “declare a dependency” part. The “use of pkg-config” is a build system thing - build systems like Meson and CMake already have support for this. I don’t think setuptools does natively, but it’s easy to do ad-hoc (e.g. run pkg-config --cflags zlib in a subprocess call and it will return a string with the flags you need).
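A sketch of such an ad-hoc call:

import subprocess

# Ask pkg-config for zlib's compile flags; a CalledProcessError here
# means no zlib.pc was found on the search path.
cflags = subprocess.run(
    ["pkg-config", "--cflags", "zlib"],
    capture_output=True, text=True, check=True,
).stdout.split()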

The “how to expose the metadata for pkg-config to use as a Python package providing a library” is the unsolved part. Mostly the .pc file install location, but I think there’s a small twist in that the package installer should write the absolute path of the install location into the .pc file since they’re not relocatable.

The ctypes part is then probably more an adoption thing; even if everything works smoothly with pkg-config, the use of find_library & friends isn’t going to go away overnight. However, my impression is that its use is only a small part of the problem; most Python packages that depend on native libraries do not use ctypes to make that work.[1]


  1. They do use a host of other hacks that involve running Python code at build time, and this usually breaks cross-compilation. Some examples at Cross compilation - pypackaging-native. ↩︎

7 Likes

I think for nix’s specific problem that has been raised here (handling ctypes.util.find_library), the correct solution would be for nix to provide patched Python nix packages which override the ctypes.util.find_library function to call out to a nix search function and return that path (the function already shells out, so I’m not sure there’s much of an issue there). I’d imagine that if you can easily check whether you’re on nix or not, such a patch could be upstreamed into ctypes.util.find_library. It would also not couple a fix to the timeline of a native dependency PEP.

1 Like

Hey!

OK, this is a clear start. Sounds like it could make sense to add support for pkg-config to ctypes.
(Even though ctypes is intrinsically fragile and not recommended for production code)

Relying on pkg-config would make it less fragile; I do agree it’s fragile to introspect file system state at runtime, though. Note that CFFI is more or less in the same messy world.

This is a jump! As you probably know, in Python the packaging / install machinery is quite independent from the import system (importlib.metadata and importlib.resources are quite recent packaging helpers in the importlib namespace), and ctypes currently has zero specific relationship with packaging (contrary to cffi for example which has utilities for people packaging their native lib). The proposal is now much bigger than the length of the post suggests: it wants to define a new packaging metadata, and add a level of integration between native libs and Python side that has no precedent, and create requirements for install tools and a runtime library, and also define the input for the metadata in pyproject. This simple proposal is actually a very tall order, and could be 3 different PEPs.

The proposal decouples two stages:

  • collecting what dependencies we need (configure/build stage)
  • using the dependencies (runtime stage)

I understand that would need to be multiple proposals with the current process, but that’s suboptimal as these two things belong together.

But I am not saying this to discourage you, rather to consider a change of tactics!
Can a custom build backend (see The Packaging Flow — Python Packaging User Guide) be written to run pkg-config (if available) and save the pc file as data?
Can a runtime library query that data and wrap the call to LoadLibrary?
In short: could this be done as third-party tools to demonstrate success of the concept and implementation first before trying to write new specifications?

This could be done completely separately, but I’d like us to solve this at the very core of Python.

If that requires an existing implementation, we can make one of course.

This is really good news, thank you for all the work here.

I’m not sure about the benefit of decoupling native dependencies from the method by which they are discovered.

If you declare generic/openssl, there should be a specified way for that information to be collected; otherwise you end up with a leaky abstraction that will yield another set of issues.

I’m not sure why a separate build backend is needed for things like native dependencies (pkg-config); it’s about how dependencies are collected, not how they are built.

That won’t quite work; it needs to be a two-step process.

Imagine you’re building a container: you want to collect the native dependency information at build/configure time, so that when you do build the container, the correct dependencies are injected.

A search function at runtime means you’ve delegated dependency resolution from build time to runtime, which is what I’d like to avoid.

I’m confused then. I thought the issue was resolving which library to load (as I’m aware of nix’s design of not having a single /lib), whereas you seem to be saying the issue is dynamic resolution itself. I’d suggest then that you’re trying to replace ctypes with something else, since I’m not sure why (given the alternatives that exist) someone would use ctypes unless they were interested in resolving and loading libraries dynamically?

No, I was not referring to find_library, but to the fact that ctypes is very easy to mess up (jokingly called the fastest way to make Python segfault).

There are several reasons to do this:

  • Keeping proposals scoped tightly and on a single topic makes it easier to move them forward,
  • There are a number of types of users of metadata on native dependencies; things like static analysis tools will only need the metadata itself, not the dependency discovery at build time or runtime,
  • There are multiple ways of doing dependency discovery. pkg-config is the most prominent one, but certainly not the only one. I think it should be better supported, but I wouldn’t want to venture into making its use mandatory at this point. As a package author, I should be able to write something like dependency('mylib') in my build config files, and have the build system use pkg-config, CMake, configtool, or its own custom code to act as the “dependency provider” to discover mylib.

A separate backend isn’t needed for using pkg-config at build time. Any good build system/backend dealing with compiled code should be able to do this. If it’s about using pkg-config with ctypes.util.find_library at runtime specifically, then no build backend should be involved at all. find_library already has platform-specific behavior, and its semantics are pretty vague, basically “do your best to find this library”. pkg-config could be added to that directly as one of the ways to try and locate the library in case pkg-config is already installed. That would be backwards compatible and a sensible thing to do.

I agree, with the note that if the patch tries to use pkg-config when installed, it should be more generic and work for Nix and a number of other distros/scenarios.

I had a look at the list of patches @domenkozar linked to. The first one I recognized is psycopg, and that’s a great example here. It ships only a pure Python package/wheel, does not declare any dependencies in its pyproject.toml, but in its README has sudo apt install libpq5 and does this inside the package:

libname = find_libpq_full_path()  # uses ctypes.util.find_library
if not libname:
    raise ImportError("libpq library not found")

pq = ctypes.cdll.LoadLibrary(libname)

That is indeed pretty fragile. However, I think we indeed have all the ingredients to fix the issue:

  • PostgreSQL (which provides the needed libpq) has to be declared as an external dependency in pyproject.toml[1],
  • distros can then use that info to have their packaging tooling add postgresql as a runtime dependency of psycopg (or at least, the packager has an easier time when reading the metadata and handling it manually),
  • this may already be enough for many distros (and indeed, on Arch Linux, ctypes.util.find_library('pq') returns a path to libpq.so.5),
  • distros with non-standard paths can implement a rule to automatically add pkg-config as a runtime dependency too if there are any external dependencies in pyproject.toml,
  • that will fix the issue for Nix once find_library actually tries to use pkg-config in addition to its current methods.

I think it’d be feasible to extend find_library now. Where it hits return None because it can’t find any library, the code could check if pkg-config is installed and, if so, run pkg-config --libs libname in a subprocess call; then if that returns a -L/path/to/some/libdir, check that dir for the library. It’s pretty straightforward and should be backwards compatible, so would that even require a PEP?
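A sketch of what that fallback could look like (simplified; real code would also need to match versioned sonames like libpq.so.5):

import os
import shutil
import subprocess

def _findLib_pkgconfig(name):
    # Hypothetical fallback for ctypes.util.find_library: ask pkg-config
    # for linker flags and search the reported -L directories.
    if shutil.which("pkg-config") is None:
        return None
    try:
        out = subprocess.run(
            ["pkg-config", "--libs", name],
            capture_output=True, text=True, check=True,
        ).stdout
    except subprocess.CalledProcessError:
        return None  # no .pc file found for this name
    for flag in out.split():
        if flag.startswith("-L"):
            candidate = os.path.join(flag[2:], "lib%s.so" % name)
            if os.path.exists(candidate):
                return candidate
    return None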


  1. I’m not very familiar with PostgreSQL; it’s possible libpq is its own thing in all distros and that should be the name of the dependency instead. I see that in Ubuntu, it’s libpq5. ↩︎

3 Likes