Enforcing consistent metadata for packages

That is about right, yes.

@oscarbenjamin’s problem of “this is a lot of effort and suboptimal” is very much related indeed, but at least the “effort” part won’t go away by dropping the sdist-wheel metadata equivalence constraint. By the way Oscar, your problem is quite interesting/complex; if you wanted to add it to a subpage of Native dependencies - pypackaging-native, that would be quite nice I think.

Agreed, at least for platforms for which wheels are also available. On niche platforms (PowerPC, IBM-z, AIX, armv7, etc.), there are very few wheels and users tend to know enough to help themselves.

If we wanted to, we could still attempt to resolve that by separating those two out. I’ve already gotten into the habit of annotating dependencies with code comments to spell out these differences. E.g.:

# Upper bounds in release branches must have notes on why they are added.
# Distro packagers can ignore upper bounds added only to prevent future
# breakage; if we add pins or bounds because of known problems then they need
# them too.
build-backend = 'mesonpy'
requires = [
    # The upper bound on pybind11 is pre-emptive only.
    # The <2.3 upper bound is for matching the numpy deprecation policy;
    # it should not be loosened.
]

It’s a small step from there to a second set of metadata.

It’s initially about size, but maybe the core issue is the difficulty of managing the duplication of hard-to-build shared native dependencies with fragile ABIs as wheels, while at the same time having to support them as sdists in OSS.

The difficulty for numpy and scipy of keeping metadata consistent between sdists and wheels makes it impossible for them to unbundle things like BLAS. Nvidia and CUDA packages don’t have the same constraint because they don’t ship sdists.


Yes, that’s a great summary @groodt. We can avoid the problem by either not publishing sdists at all, or marking dependencies as dynamic. Neither would be a good outcome, I believe.


Sorry, I’m still trying to paint the mental picture here. I feel like I’ve talked around in circles, and I don’t know how much of that is me confusing myself vs. feeling like I haven’t made myself understood.

The idea, as I understand it, is that hypothetically there would be a separate package that’s just BLAS, pre-built, no Python wrappers or anything, so that both Numpy and Scipy can depend on it (using their wrappers to interface to it). Similarly, one could imagine a separate package that’s just pre-built binary for libgmp, so that both gmpy2 and python-flint can depend on that.

And this separate package would be wheel-only, since there’s nothing that could practically be built on the user side; essentially the wheel exists as a way to get a specific DLL into place in the user’s environment. And the library would be built to however many different ABI standards, and each result would be in a separate wheel, and the wheel tag is enough information to know which ABI it’s for. Right?

And then, there’s already a process to create these built-for-different-ABIs versions of the DLL (and then use auditwheel/cibuildwheel to bundle them), and indeed there’s already proof of concept for the scipy-openblas32 wheel.

But the problem is that Pip can’t be instructed (via the metadata, which is why it’s come up in this thread) to choose the correct, i.e. ABI-compatible wheel?

I’m going around in circles trying to understand why wheel tags are or are not sufficient information to decide ABI compatibility. It really seems like they ought to be, but then the tag names don’t look like what I’d expect them to look like for that purpose. They say things like cp312 and abi3 (i.e., describing CPython-specific symbols), not glibc or musl (i.e., describing calling conventions, type sizes, struct alignment and padding rules etc.). I guess Pip is relying on the platform tag for the latter: i.e. manylinux is an abstract platform that assumes glibc’s conventions, which is why it isn’t anylinux.
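For what it’s worth, the raw ingredients those tags are derived from can be inspected from the interpreter itself. A minimal sketch using only the standard library (the third-party packaging library is what actually expands values like these into the full, ordered set of accepted tags):

```python
import sysconfig

# The interpreter's own ABI identifier, e.g. "cpython-312-x86_64-linux-gnu".
# May be None on some platforms (notably Windows builds).
soabi = sysconfig.get_config_var("SOABI")

# The platform string that wheel platform tags derive from,
# e.g. "linux-x86_64" or "macosx-14.0-arm64".
platform = sysconfig.get_platform()

print("SOABI:", soabi)
print("platform:", platform)
```

If the packaging library is available, `packaging.tags.sys_tags()` enumerates the full tag set the current interpreter accepts, in priority order.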

But when I look at it that way, the confusion persists. Suppose I could hypothetically pip install scipy-blas-sold-separately; Pip knows what my platform and Python environment require, and chooses the specific wheel for that project that was built with wrappers that will work for me. Now it needs to find a corresponding blas-in-wheel-form dependency that will be ABI-compatible with the wrappers. Why wouldn’t using the same platform/environment-based logic (along with specifying a specific version for that wheel, and setting up a correspondence between wheel versions and versions of the underlying BLAS library) necessarily find the right wheel? There’s only one set of calling conventions my environment can possibly support, right? And the actual ABI symbols (or at least the subset that the main package will actually use) are 1:1 with the functions and structures defined in the API (same caveat), which in turn are determined by the BLAS version, right?

This is basically what conda is doing, as I understand it. But that required taking responsibility for the packaging story of far more than PyPI. And even then, I think there are lots of awkward corner cases (e.g. I have numpy and openblas installed in an env, but I don’t know if they’re talking to each other or if numpy brought its own copy anyway).

That’s correct. It is not really a pip limitation but rather a packaging standards limitation since pip just follows the standards here. The standards do not provide a way to encode ABI compatibility.

There is no way to encode ABI compatibility except to build the binaries together. When you build GMP, the first step is ./configure --optionA --optionB, and the options passed determine the binaries and hence the ABI that results. Afterwards, compatible binaries can be built using the generated header files, but there is no usable way to describe which ABI you have besides compiling a C program.
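A small illustration of how little can be introspected from a built library at runtime: libgmp exports its configured limb size via the `__gmp_bits_per_limb` variable, and essentially nothing else about its ABI. A hedged sketch using ctypes, which degrades to a no-op when libgmp isn’t installed:

```python
import ctypes
import ctypes.util

# Locate libgmp if one is installed on this machine; None otherwise.
path = ctypes.util.find_library("gmp")

if path is not None:
    lib = ctypes.CDLL(path)
    # __gmp_bits_per_limb is an exported int reflecting how this particular
    # libgmp build was configured (32- vs 64-bit limbs).
    bits = ctypes.c_int.in_dll(lib, "__gmp_bits_per_limb").value
else:
    bits = None  # nothing to probe against

print("limb bits:", bits)
```

Even when this probe works, it says nothing about struct layouts, configure flags, or which other libraries the build links against, which is exactly the point being made above.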


That’s the part I don’t understand. If ABI compatibility is not encoded by the wheel tags, then how is it possible for Pip to obtain a wheel for anything with non-Python code at all that is ABI compatible with my Python installation? Why doesn’t an identical problem apply there, to the one you describe? Equivalently: why doesn’t the fact that Pip can solve this problem, prove that yours is solved the same way?

I don’t understand this. My understanding is that the things that actually matter - the variables that define the “ABI” - are the data structure definitions and the calling conventions. But every wheel that depends on version X of GMP should know what the struct members are (and their order) because they’re unique to the GMP version. And the calling conventions, type sizes/endianness etc. should be determined by the platform (OS, compiler, hardware). And between the struct members (defined by the header) and the type sizes/endianness/struct member padding, the data structure is completely defined at the binary level. And the wheel tag says which platform it is. Therefore, if you choose the wheel for the current platform, for the version you want, it must be the same ABI, because everything that matters is as it needs to be.

Where exactly is that wrong?

But the ABI may change based on compiler flags or configure flags as well.

What the various tools like auditwheel do is mangle the ABIs of non-Python libraries such that each wheel is self-contained, so there is (almost) no cross-package ABI to take account of. Numpy is a rare example where there is cross-package ABI use, and the transition to numpy 2.0 has been a massive coordination task.


It might be worth explicitly sketching out the relevant parts of a concrete wheel METADATA file and an sdist PKG-INFO file for a hypothetical scipy → scipy-openblas32 scenario.

A scipy wheel → scipy-openblas32 wheel scenario and a scipy sdist → scipy-openblas32 wheel scenario for completeness is probably enough to get everyone on the same page.


There is an identical problem there and that is what the ABI tag in wheel filenames is for. I don’t understand exactly what is required to solve that but CPython jumps through a lot of hoops to make it work in so far as it does.

I should probably add here that libraries like GMP and BLAS are potentially more difficult because unlike cpython they do not use “generic C”. Core routines use hand-rolled assembly code for different CPU architectures. Both gmpy2 and python-flint build GMP with --enable-fat which will bundle machine code for as many architectures as possible and libgmp will select the right code at runtime.

These things are not unique to the version. The configure script can change the struct definitions. For example, it is possible to build MPFR and Flint with --with-mpir to use the MPIR library instead of GMP. Then all of MPFR’s and Flint’s structs will use the MPIR types instead of the GMP ones, and even the DLLs would reference different other DLLs. This is in fact what is done for the conda-forge python-flint package, unlike the PyPI wheels, so that particular ABI incompatibility could emerge in the wild if someone mixes packages from conda and pip.

In general the configure script can do anything though just like a setup.py could open up the .py files and change them at build time. If you want you can even use the configure script to insert a backdoor into a user’s SSH daemon. Downstream users might complain about backdoors but if you want to fiddle with struct layouts then that’s considered normal.

I’m very sympathetic to reducing binary size (the huge data science/ML wheels are a huge problem for us too!), but from uv’s perspective, it’s all but impossible to write reasonable lockfiles if we can’t assume consistent metadata between the source dist and the wheels of a release, and I think that applies to poetry and pdm too.

I’m happy to add additional logic for sharing libs between packages, but this would need to be on top of the agreement on metadata consistency (like PEP 725); I believe this should be discussed separately from this topic (I already have trouble following this thread for things that are actual blockers regarding agreeing on metadata consistency).


Precisely this - wheel tags “sort of” solve this by a combination of careful definitions, compromises, and an acceptance that sometimes “it sort of works” is good enough. The “perennial manylinux” spec (PEP 600) is a good example of this - it doesn’t encode everything about the platform - instead it defines a lowest common denominator and leaves it to projects to deal with anything beyond that. Some of how that gets dealt with is by bundling, which has its own problems as we’re discussing here, but it’s certainly possible to do something similar at other levels.

This is the sort of thing I mean - a GMP wheel might use --enable-fat and state that it’s only valid when that option works. We can’t encode that in tags, so we’re stuck with a compromise, where it’s “good enough” when it works, but gets messy when it doesn’t.

I don’t know if this is something we’ll ever be able to fix. ABI compatibility is a hugely complex problem, and as far as I am aware no-one has ever solved it without some form of “take control of everything in the stack” solution. We already have solutions like linux distros and conda which take that approach, and the fact that people still want to use wheels means that there’s a significant part of the user community (too significant to ignore) who aren’t willing to accept those solutions.

There’s a social issue here as well, which is that users are a lot less tolerant these days of solutions which don’t always work. And that translates into pressure on maintainers to solve everything, and corresponding pressure on the ecosystem to be perfect.[1]

Please understand - I’m not dismissing the difficulties here, or claiming in any way that the problems being described are overstated or easy to solve. This is a hard, possibly even impossible, problem to solve. And the impact on the original question here is noted.

I think the direction of this discussion suggests that you can assume that, but we’re unlikely to come up with a standard in the near future that guarantees that your assumption will never be wrong. So you may well need to consider how you deal with that possibility.

  1. A historical anecdote here - I used to spend a lot of time hunting for Windows equivalents of all the Unix standard tools - coreutils, etc. I was forever hitting crashes and weird bugs because libintl was distributed in a variety of binary-incompatible forms. This was particularly frustrating to me as I didn’t personally even need internationalised versions of the tools. In those days, that was just “business as usual” and I learned how to hack around the problems. Nowadays, it would probably trigger bug reports and “why can’t they get this right” posts :slightly_frowning_face: ↩︎


I think it’s worth making a distinction between the metadata being identical between all wheels (and the sdist that generated them), even when dynamic is specified, and wheels having different metadata where the only differences are fields marked as dynamic. The former seems to undercut the purpose of dynamic, and does not seem reasonable in the presence of non-pure-Python code (and will only make metadata worse if enforced, as packages will not include correct metadata), whereas the latter would seem to give installers a chance to correctly handle the situation.

There would seem to be 4 cases if we choose the latter form of “consistent” metadata:

  1. The metadata format used in the sdists and wheels is new enough such that the generator (such as a build backend) of said metadata would have been able to accurately reflect and check for the use of dynamic (and such checks could be done by PyPI as well), and the sdist has no dynamic data. This allows installers to assume that all the metadata will be identical, and can use appropriate fast paths.
  2. The metadata format used in the sdists and wheels is new enough such that the generator of said metadata would have been able to accurately reflect and check for the use of dynamic, and the sdist does have dynamic used. Installers could error if they want identical metadata (but that should be documented by the installer, and users may then choose alternatives), or use the appropriate wheel (if available), and if locking track the different known paths on different system.
  3. The metadata format used in the wheels is new enough such that the generator of said metadata would have been able to accurately reflect and check for the use of dynamic, but there is no matching sdist, and the wheels differ in their metadata. This is similar to case 2 (and could be handled in a similar way), but there’s no way to confirm whether the metadata was supposed to be dynamic or not.
  4. The metadata format used in the wheel (and sdist if available) is pre-2.2, and so the metadata may or may not be dynamic. Similar to case 3, but is effectively the “legacy releases” option that will be a smaller part of the ecosystem as times goes on (whereas the other cases are effectively a choice of the maintainer of the uploaded package).
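My attempt to summarise those four cases as installer logic. This is a sketch of this reading, not any tool’s actual behaviour; the function name and return values are made up for illustration:

```python
# A sketch of the four cases above as installer logic. "metadata_version"
# is the Metadata-Version of the release; PEP 643 (Metadata 2.2) is what
# introduced the Dynamic field, hence the (2, 2) threshold.
def lock_strategy(metadata_version, has_sdist, sdist_dynamic_fields):
    new_enough = tuple(map(int, metadata_version.split("."))) >= (2, 2)
    if not new_enough:
        # Case 4: legacy metadata; anything might be dynamic.
        return "assume-dynamic"
    if has_sdist and not sdist_dynamic_fields:
        # Case 1: fully static; safe to reuse metadata across all wheels.
        return "fast-path"
    if has_sdist:
        # Case 2: declared dynamic; lock per platform, or error loudly.
        return "per-wheel"
    # Case 3: wheels only; intent can't be confirmed, treat like case 2.
    return "per-wheel"
```

An installer taking this view only gets the fast path when the generator could, and did, declare everything static.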

Fair enough. I’d like to see at least an agreement in this thread that in principle some form of extra metadata to allow splitting large binary wheels in separate parts is desirable. Otherwise we’re just going further down a road that makes dealing with this probably more important issue later more difficult.

Neither this thread nor PEP 643 has any details on this, so let me ask a few questions:

  • Don’t you have problems anyway if you produce a lockfile that ends up including an sdist for any non-pure-Python package? When you install from such a lockfile, the build process is likely to fail or produce a broken wheel.
    • Is there a way to prevent such lockfiles from being created by the user (like an --only-binary flag)?
    • Or is the problem not that you’re using sdists, but that the sdist’s metadata is the thing you end up using through the PyPI JSON API?
  • I’m curious what happens when the set of packages to install contains a single package where dependencies are marked as dynamic. Can you still produce a lockfile at all?

Agreed, it’s probably unfixable in general, and a bit distracting in this thread. I’m only interested in splitting wheels in such a way that the wheel/package author takes responsibility for both (or multiple) pieces and coordinates keeping those pieces in sync. This is hard enough already - but way easier than dealing with the general ABI compat issues.


It is not clear to me what constitutes a reasonable lockfile. Is the lockfile expected to pin exact file hashes for all wheels that would be installed?

Let’s take the simple case and assume that we are only talking about packages that have wheels and sdists on PyPI. If running pip install A B C can resolve the dependencies and install the wheels then why is it not possible to put the file hashes of the wheels in a lockfile?

If the intention is that the lockfile should work for all platforms then the locker will need to resolve separately for each OS etc anyway because of environment markers. So is the problem just about knowing where to look for conditionality that would affect the resolution?

If the issue is just that locker tool wants the sdist to contain all information needed for locking then is it not sufficient for the sdist to declare some things as dynamic? Could the locker tolerate something in between fully static and just an open-ended “dynamic”?

Is the intention that the lockfile should be usable to build everything from source? That is going to require a lot more than pinning PyPI artifacts for the packages that we are talking about.

While I’m fine with getting a bit more explanation on this question, can I just ask that we don’t go too far down this route? How lockfiles work is not the topic of this thread, and I don’t want to end up with important information about lockfiles being spread over multiple threads (the lockfile proposal thread is already complex enough :slightly_smiling_face:)

It’s true that one of the prompts for this question was the fact that lockfile producers are assuming consistent metadata, and hence “can we make that a guarantee” is a natural thought. But if we can’t, then how lockfile producers can do their job is a question for the lockfile proposal. And the answer may be simply “well, assume that if you need to but the risk’s on you because the standards don’t give you that guarantee”.

I’ll make a similar comment here. I’m happy for this thread to come to the conclusion we shouldn’t try to enforce consistent metadata without finding a way for that rule to co-exist with a solution for splitting large binary wheels. But any discussion on what options there are for splitting large wheels is a separate matter, and should have its own thread.

I’m currently trying to work out whether any other reason has been given yet in this discussion for why enforcing consistent metadata wouldn’t work. I think all of the problems boil down to some variation on bundling shared library dependencies, but I plan on going back through the discussion once things have settled down to see if I’ve missed anything else.


Sure, let me answer them piece by piece:

When you install from such a lockfile, the build process is likely to fail or produce a broken wheel.

In my experience, no. On Debian/Ubuntu, installing build-essential, git and cmake is normally sufficient to build source dists. There are some famous different examples, e.g. database drivers that need the database to be installed for the headers, but they are usually well documented. Every now and then you have to install some other -dev package for a header. I never had the build process produce a broken wheel.

Is there a way to prevent such lockfiles from being created by the user (like an --only-binary flag)?

Yes, --only-binary :all:.

I’m curious what happens when the set of packages to install contains a single package where dependencies are marked as dynamic. Can you still produce a lockfile at all?

We build the source dist once, and we assume the metadata is consistent with future builds. If we can’t make this assumption, we can’t produce lockfiles at all: imagine we locked foo with a dep on bar, but at install time foo suddenly wants baz too; we don’t have a version for baz, and baz’s deps might conflict with the lock.

Note that these problems apply to all resolvers that cache, even pip-tools and to some extent even pip!

pip-tools creates a requirements.txt assuming that the metadata from a built source distribution remains consistent. While a requirements.txt is technically only valid for the exact PEP 508 environment it was created in, in practice it is often used across Python versions and platforms, enforcing a certain level of consistency.

For pip, let’s take mysqlclient 2.1.1 as an example. This package has only source dists on Linux. If I do pip install mysqlclient==2.1.1 on Python 3.11 on my Linux machine, it works, even though I’ve already removed the MySQL header needed to compile the package. Pip uses a wheel from the cache, matched (as far as I know) only by the wheel tags. If the metadata were truly dynamic, pip would need to stop caching wheel builds and rebuild on every single installation. Any sort of backtracking resolver would stop working, since we can’t assume that the build has the same deps in the resolved env as in the build env during a resolution step. I believe this is why the assumption holds in practice: pip already effectively enforces source dist consistency.
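The foo/bar/baz failure mode from earlier in this post can be reduced to a toy model; this is purely illustrative, not how uv or pip actually represent locks:

```python
# Toy model of lock-time vs install-time metadata drift.
def lock(deps_seen_at_lock_time):
    # The locker builds the sdist once and records the deps it saw.
    return set(deps_seen_at_lock_time)

def unpinned_deps(lockfile, deps_seen_at_install_time):
    # Anything the install-time build wants that the lock never resolved
    # has no pinned version and may conflict with the locked set.
    return set(deps_seen_at_install_time) - lockfile

lockfile = lock(["bar"])
assert unpinned_deps(lockfile, ["bar"]) == set()           # consistent: fine
assert unpinned_deps(lockfile, ["bar", "baz"]) == {"baz"}  # drift: unlockable
```

The second case is the one where no sound answer exists: the locked artifact set simply does not contain a version of baz to install.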

Is this truly blocked on the resolver? What blocks you from specifying a shared library provider or other large file provider as a regular dependency and adding an install-time mechanism for hooking them up? (I think that’s effectively what conda does: it declares shared lib deps on packages and then controls the library loading path in the env, doesn’t it?)

Fair enough. Let me just clarify the reason that I asked and how it relates to this thread:

I don’t think that the situations that Ralf and I are describing are ones that should be complicated for a locker/installer tool in principle. It is not as simple as saying that the requirements of an sdist (API compatibility) are the same as those of a built wheel (ABI compatibility). However, the requirements are not truly dynamic, and at least as far as PyPI wheels go there is no reason that they could not be expressed statically in some metadata like:

# pyproject.toml (hypothetical fields)
dont-build-by-default = true
pypiwheels-requirements = ["libgmp==6.3.0"]

If you could express this information then it would be possible to require that the corresponding wheels be consistent with it.


I don’t know if this is fully true. Wouldn’t it be more fair to say that it’s possible to have a compromising UX such that most things are locked but there is an opt-in allowance for source distributions with dynamic dependencies?

If you want a simple counterexample,

# setup.py
import random
from setuptools import setup

setup(
    name="foo",
    version="1.0",
    install_requires=random.sample(["requests", "rich", "numpy"], 2),
)

This is 100% valid according to the standards. The dependency data will be dynamic in the sdist, and every time you build a wheel you may get something different. Of course it’s stupid, but unfortunately no-one has yet managed to get a standard approved that says “people aren’t allowed to do stupid things” :slightly_smiling_face:

Lockfiles do work, in practical situations. It’s only when we try to pin down standards that say it’s OK to ignore the scenarios where they won’t work that we find there are people relying on the flexibility.

If we’d started with a very limited packaging ecosystem, and added functionality incrementally, things would have been a lot easier. But we didn’t. We started with a system (distutils) that allowed users to do anything, and we’ve been slowly trying to remove capabilities when they turn out to cause issues. That’s far harder.

Relevant XKCD: xkcd: Workflow