Shipping common libraries in a dedicated wheel

Hi,

A coworker noticed the tbb wheel provides a set of libraries for use in other wheels. We’d like to replicate that with one of our projects (the vtk wheel) so that other wheels can be published without duplicating VTK’s shared libraries (duplication would not go well with how VTK’s object factories work, and would invite symbol collisions galore…). Is there any guidance on how to make such a wheel? There’s also the question of how to provide the SDK so that other wheels can be built on top of it (e.g., a paraview wheel), and instructions on how those wheels should be built.

Thanks,

–Ben

2 Likes

NumPy might serve as a sort-of example. It doesn’t use a separate wheel, but it does use a separate package and supports being linked against.

You’re unlikely to find clean, officially endorsed solutions to this. The inability to link across wheels is one of the reasons why conda exists (and has to dismantle and reassemble so much to make it happen).

4 Likes

FWIW, we are working on pkgconf-pypi (GitHub - pypackaging-native/pkgconf-pypi: Repository to build pkgconf binaries for distribution on PyPI, plus Python environment-aware pkg-config) to help with this. It allows packages to register pkg-config paths via entry points, and provides pkgconf/pkg-config scripts that automatically add the registered paths to PKG_CONFIG_PATH.
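To make that concrete, here is a rough sketch of what such a wrapper script could do. The entry-point group name ("pkg_config") and the "package:subdir" registration format are illustrative assumptions, not pkgconf-pypi’s actual API:

# Illustrative sketch only: the "pkg_config" group name and the
# "package:subdir" registration format are assumptions.
import os
import subprocess
import sys
from importlib.metadata import entry_points
from importlib.resources import files

def registered_pc_dirs():
    # Each entry point is assumed to point at a directory of .pc files
    # shipped inside an installed package, e.g. mypkg = "mypkg:pkgconfig".
    dirs = []
    for ep in entry_points(group="pkg_config"):
        pkg, _, subdir = ep.value.partition(":")
        dirs.append(str(files(pkg) / subdir) if subdir else str(files(pkg)))
    return dirs

def main():
    env = os.environ.copy()
    paths = registered_pc_dirs() + [env.get("PKG_CONFIG_PATH", "")]
    env["PKG_CONFIG_PATH"] = os.pathsep.join(p for p in paths if p)
    # Delegate to the real pkgconf binary with the augmented search path.
    sys.exit(subprocess.call(["pkgconf", *sys.argv[1:]], env=env))

if __name__ == "__main__":
    main()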

4 Likes

scikit-build-core provides a way to do something similar to the example @FFY00 shared, for projects using CMake.

From the scikit-build-core docs (link)

Finding other packages

Scikit-build-core includes the site-packages directory in CMake’s search path,
so packages can provide a find package config with a name matching the package
name - such as the pybind11 package.

Third party packages can declare entry-points cmake.module and cmake.prefix,
and the specified module will be added to CMAKE_MODULE_PATH and
CMAKE_PREFIX_PATH, respectively.

That’s helpful for other CMake-based builds to find files shipped in wheels at build time. It’s a pattern used in several of the RAPIDS libraries. For example, RAPIDS ships a libcudf wheel (PyPI link) containing shared libraries, with this in its pyproject.toml.

[project.entry-points."cmake.prefix"]
libcudf = "libcudf"

(libcudf / pyproject.toml)
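As a rough illustration of what a build backend can do with such a registration (a sketch, not scikit-build-core’s actual code): treat each cmake.prefix entry-point value as an importable package whose installed directory is appended to CMAKE_PREFIX_PATH before invoking CMake.

# Sketch of consuming "cmake.prefix" entry points; not scikit-build-core's
# real implementation.
from importlib.metadata import entry_points
from importlib.resources import files

def cmake_prefix_args():
    prefixes = []
    for ep in entry_points(group="cmake.prefix"):
        # e.g. libcudf = "libcudf": the value names a package whose
        # installed directory contains a CMake package config file.
        prefixes.append(str(files(ep.value)))
    # CMake expects a semicolon-separated list.
    return [f"-DCMAKE_PREFIX_PATH={';'.join(prefixes)}"] if prefixes else []

# These arguments would be passed to the cmake configure invocation.
print(cmake_prefix_args())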

1 Like

That would work for compiling against such libraries, but does it address the use case of shipping a binary wheel that links to C/C++ libraries in another wheel?

No. You would either need to set an rpath and hope the packages get installed into the same site-packages, which I believe is what NumPy and SciPy do, or mess with the way the library is linked, e.g. by dlopen-ing it with RTLD_GLOBAL before the extension module is loaded.
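For the second option, a minimal sketch of the preload trick (the package and library names here are made up) looks roughly like this:

# Preload a shared library shipped in another wheel with RTLD_GLOBAL so
# that an extension module linked against it can resolve its symbols.
# "vtk_libs" and the library name are hypothetical.
import ctypes
from importlib.resources import files

_handle = ctypes.CDLL(
    str(files("vtk_libs") / "libvtkCommonCore.so"),
    mode=ctypes.RTLD_GLOBAL,  # make the symbols visible to later loads
)

# Only after this runs is it safe to import the consumer's extension:
# import my_vtk_consumer._ext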

NumPy relies on being imported first and exposes its own C API as a PyCapsule of function pointers, which entirely avoids the issue of loading the DLL from the right path. This approach is not practical if one wants to expose e.g. an existing C++ library.

Quick clarification on NumPy and SciPy: there are two unrelated things being discussed above:

  1. NumPy exposes a C API for other libraries to use.
  2. NumPy and SciPy use a scipy-openblas32 or scipy-openblas64 wheel at build time only, in CI jobs; no dependency is present in any metadata. The library is then still vendored in through auditwheel & co. (see the sketch after this list).
    • This is quite useful because pip install scipy-openblas32 is much easier than the custom “download a tarball with a prebuilt OpenBLAS binary and get it found by the build system” that we had before.
    • The reason that we don’t use it as a runtime dependency is mostly that it’s not expressible with static metadata (we sometimes need OpenBLAS, sometimes not).
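To make (2) concrete, a build-time-only use in a CI job looks roughly like this (a sketch; it assumes the scipy_openblas64 module exposes get_pkg_config(), and the paths are examples):

# Build-time only: write out a pkg-config file for the OpenBLAS shipped
# in the scipy-openblas64 wheel so the build system can find it.
# Assumes scipy_openblas64.get_pkg_config() exists; paths are examples.
import os
import scipy_openblas64

os.makedirs("openblas-pc", exist_ok=True)
with open("openblas-pc/scipy-openblas.pc", "w") as f:
    f.write(scipy_openblas64.get_pkg_config())

# The CI job would point PKG_CONFIG_PATH at "openblas-pc" while building
# NumPy/SciPy; auditwheel/delocate/delvewheel then vendor the library
# into the built wheel, so no runtime dependency is recorded.
print(os.path.abspath("openblas-pc"))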

What @ben.boeckel seems to be after is more like (2), but without the vendoring step. This is possible; however, I’d caution that you should only do it if you’re in charge of all the wheels involved, and if you can depend on the shared library with an == pin or a tight version range. Changing the API or ABI becomes effectively impossible once multiple independent packages that you do not control depend on the shared-library wheel.

5 Likes

Thanks for all the feedback, I’ll reply to some bits here:

Yes, I suggested conda as well when I was asked about this, but pip install is far easier for users.

While neat, VTK’s proper usage patterns are far too complicated for pkgconf to express in static .pc files. So maybe that’d work for simpler projects, but it just isn’t in the cards for VTK.

This could be promising.

Last I looked, I could not use this because VTK’s setup.py is generated during its configure step. I imagine ParaView would be similarly restricted. The issue is that VTK’s capabilities are not known until it is configured, and implementing this in setup.py would mean, AFAIK, reimplementing “what does this set of configure flags mean”, which is…not feasible. I don’t think a static pyproject.toml holds much hope either, but maybe there are some dark arts available.

Yes, this sounds close at least. VTK has…terrible ABI stability given its oodles of configure flags that give a single wheel varying contents. API also gets dropped fairly aggressively (e.g., 9.3-deprecated APIs can be removed in 9.5). I would expect consuming packages to always use == dependencies against the wheel version being provided. This means that downstream packages depending on both VTK and, say, PyVista would need to coordinate which one to use based on the other…somehow.

It is technically possible to put a shared library in a wheel installed anywhere on PYTHONPATH, and then for other wheels to use it. I think it would be cool if someone put together the pieces to make it happen. So far no-one has bothered to invest the time. But https://github.com/njsmith/wheel-builders/blob/pynativelib-proposal/pynativelib-proposal.rst lays out a chunk of how to do it; when I wrote it I didn’t yet know how to make it work on macOS, but https://pypi.org/project/machomachomangler/ has the necessary obscure nonsense to make it possible there.

3 Likes

Right, but pip is easier because it doesn’t support this stuff. That was the point. So you’ll inevitably end up with something that’s a bit unreliable and feels like a bit of a hack.

IMHO, the best you can do is write (and maintain!) a wrapper library and ship that in a wheel (statically linked to the actual one). That way you can preserve the interface while updating the underlying library, and consumers will link to your DLL/so or use your extension module rather than the original one. It’s more work for you, the publisher, but it absorbs a lot of inevitable work/stress from the libraries that depend on you or the users who try to get them to all work together.
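At the Python level, the idea looks roughly like this (names are made up; the real work is the C/C++ shim and the static linking, which this sketch only gestures at): downstream packages import the wrapper’s small, stable surface and never touch the underlying library directly, so it can be swapped or re-vendored without breaking them.

# Hypothetical "vtk_shim" wrapper wheel: it bundles its own copy of the
# real library and exposes only a curated, stable API.
import ctypes
from importlib.resources import files

class _Backend:
    def __init__(self):
        self._lib = ctypes.CDLL(str(files("vtk_shim") / "libvtk_shim.so"))
        self._lib.vtk_shim_version.restype = ctypes.c_char_p

    def version(self) -> str:
        # Stable entry point; the underlying VTK symbols stay private.
        return self._lib.vtk_shim_version().decode()

backend = _Backend()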

3 Likes

My contention is that people consider it easier only because people like you keep saying that line. There is no actual reason to consider it easier. conda install is not harder than pip install, and in both cases you would have created an environment first to host those dependencies (that’s conda create vs. python -m venv).

So, please step back a bit and consider that you could just stop spreading misinformation about conda.

4 Likes

It’s a bit unfair to assume that someone only says conda is harder because other people say it is. I personally consider conda to be much harder to use than pip. I don’t want to derail the conversation with why, but I will say it’s from both my own experience and experience in coaching other people out of the messes they have gotten themselves into – nothing to do with blindly echoed hearsay.

Ok, but were those issues reported upstream [1]? Or are those even issues with conda, rather than with conda-packaged software? And, if the latter, can they be attributed to flaws in the very design of how conda packaging works?

People are having tons of issues with pip-installed software as well, including because of flaws or limitations of the wheel format, so it could be easy to wrongly attribute responsibilities.

I could point you to the latest illustration of flaws in the wheel format, which led me to spend days grinding on Windows wheel production for PyArrow, where I ended up extensively patching delvewheel to finally solve the issue. Yet I did not see users go around and claim that “pip is difficult to use” because of that (even though they were getting inscrutable crashes due to a problem that by construction does not exist with conda packages).

So, yeah, I do think it’s a matter of perception and heavily-biased discourse by people for whom pip’s flaws are just natural facts of life that don’t deserve mentioning, while anything that conda does imperfectly is used as an excuse to disparage it publicly.

(note: I do not dispute that it’s possible for users to have difficulties with conda. I’m disputing the way these difficulties are represented publicly, especially when compared with pip’s own failings)


  1. I’m not claiming that conda is perfect, and I do think that some things could be improved. ↩︎

4 Likes

Basically, it comes down to “customers would prefer to use wheels for reasons X, Y, Z”. This thread is about investigating the feasibility of that. Personally, I find it easier because venv is “everywhere” whereas conda usually takes more steps to set up. I’ve also found it easier to use venv to deal with the “and just one more” problem, whereas conda (historically) has more or less asked “please rebase all of your deps through conda’s packages”. Wheels not supporting SDKs is probably a key part of whether this is workable in practice. Other things that may be related to this sentiment:

  • setting up a Python Package Registry is well-supported in GitLab and GitHub; I’ve not seen the equivalent for conda myself
  • updates are easier (maybe? I’ve never submitted a conda package update)

But if the answer is “sorry, wheels just aren’t up to the task of shipping a usable VTK SDK”, then that’s the current status of the world. Perhaps it’s just then down to docs and education of “sorry, conda accommodates VTK better” at that point.

Well, at that point, we may as well just use that API as the VTK API. But there are almost 3500 headers in VTK…there’s no way we can just stop things to make “stable” wrappers for that. Though I do want to stabilize the API more (e.g., I finally got a proper deprecation treadmill implemented and used tree-wide), ABI stability is not something the codebase is even remotely prepared for.

That said, I would love to have generated C bindings someday so that we can hide all kinds of ABI machinations (e.g., moving data members to PIMPL) at least at that level, but I have no idea when time and opportunity to do so might arise.

With my mod hat on, I’m now officially declaring all posts about “conda is better/worse than pip” as off-topic. I will hide any future posts here that go down this route. Please start a separate topic if people really want to further the discussion.

5 Likes

It sounds to me like what you’re actually trying to do is package VTK for PyPI (in the conda/homebrew/Linux distro sense)? Perhaps what you should do is look at how it is packaged by other distros, and see what features they enable and how they handle the shifting API/ABI? There may be design decisions they’ve made based on their experiences with the VTK upstream which would provide some guidance on what to expose and what not to. If I were to do this, I’d try to copy them and map their concepts to wheel concepts.

There’s also the question of who the target users for this are (and what their use cases are). On shared systems users may want to use (or be required to use) the system VTK, and making that hard to do (because you’ve built an ecosystem which forces your VTK version) may cause more issues than providing an sdist which fails in predictable ways (and hence can give meaningful error messages).

It’s also worth remembering there are other ways of getting non-Python dependencies than conda: Spack, Homebrew/Fink/MacPorts, or the underlying Linux distro likely also have VTK, and for tools like ParaView they may be a better place to source it?

1 Like

I have an implementation of this. Packages can register libraries via entry points; a startup hook then dlopens them using their absolute paths, putting the libraries in the loader’s table and making them available for future symbol resolution.
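A minimal sketch of that mechanism (the "native_libs" group name, the registration format, and the package names are assumptions, not the actual implementation):

# Startup hook that dlopens entry-point-registered libraries with
# RTLD_GLOBAL; the group name and registration format are hypothetical.
import ctypes
from importlib.metadata import entry_points
from importlib.resources import files

_handles = []  # keep references so the libraries stay loaded

def load_registered_libraries():
    for ep in entry_points(group="native_libs"):
        # e.g. vtk_libs = "vtk_libs:libvtkCommonCore.so"
        pkg, _, libname = ep.value.partition(":")
        # An absolute path plus RTLD_GLOBAL puts the library's symbols in
        # the process's global table, so extension modules loaded later
        # can resolve against it.
        _handles.append(
            ctypes.CDLL(str(files(pkg) / libname), mode=ctypes.RTLD_GLOBAL)
        )

# Typically run from a .pth file or sitecustomize, before any consumer
# extension module gets imported.
load_registered_libraries()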

This seems to be working pretty much everywhere other than musl, which needs a workaround because objects dlopened by absolute path are not made available for DT_NEEDED resolution (see musl - Satisfying DT_NEEDED from previous dlopens with explicit path).

3 Likes

Huh, I hadn’t thought of using entry points for this, just because “I need a specific library available” made me think of a plain “import”. What extra stuff does using entry points get you?