Thanks everyone for the feedback!
So it’s probably obvious but just to be clear, the actual implementation in pip is up to the pip maintainers – the code in the PEP is only to illustrate which wheels are supposed to be installable on which system, and any code that ends up doing that is fine. (I do wonder if pip might want to stop generating all the tags at some point, since pep425tags.py is getting pretty convoluted and has accumulated a number of dubious edge cases, as @brettcannon has noted. But that’s a separate issue :-).)
Anyway, it should be possible to generate all the supported manylinux tags using this algorithm:
- fetch the current glibc version (pip already has code for this)
- enumerate all the versions between some lower bound (let’s say 2.5 = manylinux1) and the current version. So e.g. if the current glibc is 2.29, we’d enumerate: 2.5, 2.6, 2.7, …, 2.28, 2.29
- fetch the current platform tag (pip already has code for this), e.g. linux_x86_64
- use these two pieces of information to generate all the candidate tags, e.g. manylinux_2_5_x86_64 through manylinux_2_29_x86_64
- for each candidate tag, run the “manual override” logic
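Putting those steps together, here's a rough sketch of the enumeration (the function name and exact shapes here are made up for illustration, not pip's actual API; the override filtering from the last step is left out):

```python
def manylinux_tags(glibc_version, platform_tag, lowest_minor=5):
    """Yield candidate manylinux tags, newest first.

    glibc_version: (major, minor) for the running glibc, e.g. (2, 29).
    platform_tag: e.g. "linux_x86_64"; the arch is whatever follows
      the "linux_" prefix.
    lowest_minor: the lower bound, 5 because 2.5 == manylinux1.

    Assumes a glibc 2.x system; the hypothetical glibc 3.x case
    would need extra handling.
    """
    major, minor = glibc_version
    arch = platform_tag[len("linux_"):]
    for m in range(minor, lowest_minor - 1, -1):
        yield f"manylinux_{major}_{m}_{arch}"

# On a glibc 2.29 x86_64 system:
tags = list(manylinux_tags((2, 29), "linux_x86_64"))
# tags[0] == "manylinux_2_29_x86_64", tags[-1] == "manylinux_2_5_x86_64"
```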
Comparing this to the text in the PEP, I can see two places where this would break down currently:
If we’re running on a hypothetical future system with glibc 3.x installed, then we can’t enumerate all the supported tags without somehow knowing what the maximal glibc 2.x version is. This is kind of an inherent limitation of the “generate all tags” approach. As a hack I’d suggest that if we’re on a glibc 3.x system, then generate all tags up to 2.99, and then 3.0 through 3.x. Since this is just for speculative future-proofing, it’s probably not worth worrying about too much; worst case we’ll just fix things later after the glibc devs actually start making 3.x plans.
In the PEP, we currently allow “manual overrides” to declare that systems are compatible with arbitrary manylinux wheels, e.g. a macos-on-ARM system could declare that no really it’s totally compatible with linux-glibc-on-x86-64 wheels. This is kinda silly, and causes problems for the enumeration approach. I edited the PEP to move the manual override checks down below the normal compatibility checks, so that now the manual overrides can only rule out compatibility, not rule it in. That fixes this issue.
Technically my edit introduces a tiny backwards-compatibility break from how pip works currently. Right now pip only checks the manylinux overrides if the platform is linux_i686, so you can’t declare that a macOS system supports manylinux, or that an ARM system supports manylinux. But previously you could declare that a system with an ancient glibc or musl could install recent manylinux wheels, and my updated text prevents this. That never did anything useful anyway, so I don’t think it matters. In fact, it’s not clear that anyone uses the override system at all, and if they do I’m pretty sure it’s only to disable manylinux wheels entirely (e.g. NixOS used to do this).
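For concreteness, here's one way the “override can only rule out, not rule in” ordering could look. The `_manylinux.manylinux_compatible` hook follows the shape described in the PEP, but the surrounding function is just an illustrative sketch, not pip's actual code:

```python
def tag_is_compatible(tag_glibc, tag_arch, sys_glibc, sys_arch):
    """Sketch: is a manylinux tag installable on this system?

    tag_glibc/sys_glibc are (major, minor) tuples; tag_arch/sys_arch
    are strings like "x86_64".
    """
    # The normal compatibility check runs first: the arch must match
    # and the system glibc must be at least as new as the tag's glibc.
    if tag_arch != sys_arch or tag_glibc > sys_glibc:
        return False
    # Only then do we consult the manual override, so a distro can
    # veto compatibility but never grant it.
    try:
        import _manylinux
    except ImportError:
        return True
    hook = getattr(_manylinux, "manylinux_compatible", None)
    if callable(hook):
        verdict = hook(tag_glibc[0], tag_glibc[1], tag_arch)
        if verdict is not None:
            return bool(verdict)
    return True
```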
The edits I mentioned are here: https://github.com/python/peps/pull/1191
This was exactly the problem we had when we were writing the first manylinux spec. Binary compatibility on Linux is a vast unknown! Nobody knows what dragons lurk there! etc. Fortunately that turned out OK.
What makes me confident now is that we’ve shipped more than 3.2 billion manylinux wheels over the last ~3 years. In that time we’ve found tons of edge cases in wheel building that needed fixes in auditwheel or the build image. We’ve found a few edge cases in system detection that needed fixes in pip (two that come to mind: handling 32-bit python running on a 64-bit kernel, and glibc redistributors who append weird text at the end of the glibc version string). We haven’t found a single issue that called into question the basic approach, and PEP 600 only codifies the basic approach, nothing else.
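To illustrate that second edge case: some glibc redistributors ship version strings like “2.17-ce”, so a strict parse of “major.minor” chokes on the suffix. A lenient parse along these lines (a sketch, not pip’s exact code) tolerates the trailing junk:

```python
import re

def parse_glibc_version(version_str):
    """Return (major, minor) from a glibc version string, or None.

    Matches only the leading "major.minor" digits, so redistributor
    suffixes like "-ce" are ignored rather than causing a failure.
    """
    m = re.match(r"(?P<major>\d+)\.(?P<minor>\d+)", version_str)
    if m is None:
        return None
    return int(m.group("major")), int(m.group("minor"))

parse_glibc_version("2.17")     # -> (2, 17)
parse_glibc_version("2.17-ce")  # -> (2, 17), suffix ignored
parse_glibc_version("oops")     # -> None
```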
Also, while I get that it’s impossible to prove a negative, we can make probabilistic estimates about negatives, and I’m confused about why you would think this is a particularly risky transition, even if you aren’t familiar with all that detailed history. Fundamentally the only difference from manylinux1 → manylinux2010 → manylinux2014 is dropping support for old platforms. From the perspective of wheel builders, everything that worked in the old specs is still possible – every manylinux1 wheel is also a manylinux2010 wheel. So it’s hard to imagine how the transition could uncover fundamental problems that invalidate what came before, even in principle.
Oh man I wish that were true; I’d get like a year of my life back. The whole scientific Python stack on Windows is totally dependent on convincing GCC and MSVC to play nicely together via obscure black magic. The first Windows wheels for numpy/scipy took substantially more effort than the first Linux wheels, and that’s including “inventing manylinux wheels” as part of the Linux efforts.
And FWIW, Python 3.8 had to break the “stable ABI” on Windows in order to keep up with a Microsoft-driven deprecation, and this broke PyQt’s wheels. If there were a “manywindowsX” PEP we would have had to update it. This stuff happens sometimes. The best thing is to accept that and make it as painless as possible to adapt. Which is the goal of PEP 600 :-).