The next manylinux specification

dustin · March 22, 2019, 7:59pm

TL;DR: I’m currently attempting to bridge the gap between the TensorFlow SIG Build group and the PyPA to try and determine the future for the manylinux spec. We’ve got a couple options and need to determine what makes sense for us and for interested stakeholders.

The current status

You might say:

“Why are we talking about the next manylinux, when we don’t even have manylinux2010 yet?”

The rollout for manylinux2010 has been long and painful, and in the meantime, the projects which are most interested in using it are ready to move on to newer tooling than it originally specified.

The rollout for manylinux2010 has been tracked in pypa/manylinux#179, here’s a rough timeline (partially courtesy @njs):

March 2017: CentOS 5 went EOL
April 2018: PEP 517 accepted
May 2018: support for manylinux2010 lands in warehouse
Nov 2018: support lands in auditwheel, and pip master
Jan 2019: Pip 19.0 released with manylinux2010 platform tag support
Jan 2019: Auditwheel 2.0.0 released with manylinux2010 support
March 2019: Two years after CentOS 5 EOL, we still don’t have an official build environment
Nov 2020: CentOS 6 will go EOL

There’s currently three open PRs in pypa/manylinux attempting to add the build environment, but they are moving slowly and it’s not clear who should review them, or if anyone is able to review them.

In short: Users are ready to use the next manylinux, but we haven’t even fully implemented the last one yet.

Spec violations

It’s a common claim that certain projects are violating the manylinux specification(s) by uploading wheels that say they are manylinux compliant, but aren’t. This is true, and it’s true for a number of projects. However since TensorFlow is held as an example here most often, let’s pick on them.

According to the TensorFlow team, these wheels are attempting to be manylinux2010 compliant. They build TensorFlow packages on Ubuntu 14.04 to support 14.04 users (as it is the most common distro) and to be as close as possible to being manylinux2010-compliant. However, since an official build environment doesn’t exist yet, they are currently claiming that they are manylinux1 instead in order to just get something up on PyPI. This can cause issues for users who actually expect manylinux1 wheels.

(There are other issues causing non-compliance as well, but the team has plans to fix these problems.)

There was a recent proposal to “lint” for invalid wheels, due to this tweet from the creator of PyTorch (another project that publishes non-compliant wheels)

I’m in support of doing this… eventually. Doing it “right now” would essentially prevent all of these non-conforming projects from making any release at all until we either a) finish the manylinux2010 rollout, or b) have a better specification (and corresponding tooling).

So why not both?!

Finishing manylinux2010

We’re very close here. The big thing left to do is to finish the build environment. If you think you can help, see my comment and subsequent comments on the rollout issue.

The next manylinux spec

We have essentially two options here:

Option 1: maintain the status quo
Option 2: try to do some future-proofing

Option 1: manylinux2014

The TensorFlow SIG Build team has put together a proposal for a manylinux2014 specification. Their proposal is:

Change build environment to CentOS 7, devtoolset-7
Retain backwards compatibility with manylinux2010
Officially add support for a subset of architectures that CentOS 7 supports:
- PowerPC: ppc64, ppc64le
- ARM: armhfp, aarch64

Aside from the additional architectures, this is essentially a continuation of the manylinux1 to manylinux2010 transition, and the smallest possible change to support a newer toolchain.

The TensorFlow SIG Build group has expressed interest in helping ship this, and given our experience with manylinux2010, I’d be optimistic that it would go faster than last time. However, this still might take several years from now to fully implement – CentOS 6 will go fully EOL on November 30th, 2020, so it might be EOL by the time the rollout is finished.

Option 2: A ‘perennial’ manylinux

In “Idea: perennial manylinux tag”, @njs reflected on the fact that the manylinux2010 rollout has been slow and painful, and that we might be able to avoid this pain by instead adding platform/glibc tags for manylinux distributions, allowing them to just naturally occur as new versions of glibc come into existence. The tag would be something like:

manylinux_${libc provider}_${libc version}_${arch}

which essentially means:

“I promise this wheel will work on any Linux system with glibc >={glibc version} and an {arch} processor”.

So equivalents would be:

manylinux1_x86_64 → manylinux_glibc_2_5_x86_64
manylinux2010_x86_64 → manylinux_glibc_2_12_x86_64
manylinux2014_x86_64 → manylinux_glibc_2_17_x86_64

Installers like pip only need to be able to determine the glibc version and architecture, and won’t need updated every time a new manylinux spec is published – in fact, we won’t need to have this discussion and publish a spec everytime a project like Tensorflow would like to use a newer toolchain. To quote from @njs:

Instead, they’ll be [smaller] routine engineering problems for the docker image+auditwheel maintainers to solve.

This is about the same amount of work upfront, but with much less work down the line. I think this could fully support the needs of the manylinux2014 proposal.

Side note: Architectures

Both of these specifications would allow us to add some additional architectures that CentOS 7 supports. Being able to provide wheels for these architectures on PyPI would be highly desirable for a subset of our community that is not currently served by the current manylinux standards.

Conclusion

Against my better judgement, I’ve volunteered to author the PEP that defines this next specification.

However, I’d like to avoid writing two PEPs and leaving the Steering Council to decide. So I’m looking for buy-in from folks here about whether it’s worth continuing with the status quo, or whether it’s worth re-architecting this a bit to be more flexible and require less work in the future.

I’ve made the TensorFlow SIG Build group aware of this discussion topic, so they should be able to observe this thread and comment as necessary.

References:

PEP 513 – A Platform Tag for Portable Linux Built Distributions
PEP 571 – The manylinux2010 Platform Tag
Tracking issue for manylinux2010 rollout
‘Perennial’ manylinux discussion: [distutils-sig] Idea: perennial manylinux tag
TensorFlow SIG Build manylinx2014 proposal: manylinux2014 PEP Planning
PyPI feature: Run auditwheel on new manylinux uploads, reject if it fails
CentOS Product Specifications

brettcannon · March 22, 2019, 9:36pm

What about the .so files manylinux supports already existing? How is that going to be managed going forward since that list won’t be versioned anymore?

You implicitly make this clearer further down, but I think this really is:

manylinux_%{libc provider}_${libc version}_${arch}

I.e. you’re specifying more than just the glibc version but glibc itself (which opens the door for musl (which is great ).

I appreciate that.

dustin · March 22, 2019, 10:40pm

@takluyver Asked a similar question in the “perennial manylinux” thread:

I’m still a bit unsure how this works with the other libraries specified in PEP 571
(glib, libXrender, etc.). Would they be entirely dropped from a hypothetical
manylinux_2_20, so wheels need to bundle everything apart from glibc itself? Or is it
reasonable to assume that any system built with glibc has certain other libraries
available? And is there any need to specify versions of these libraries, or is e.g.
libX11.so.6 sticking around forever?

It might be worth mentioning that the only difference between this list for manylinux1 to manylinux2010 was the removal of libpanelw.so.5 and the addition of libncursesw.so.5, so perhaps it is reasonable to make this assumption?

brettcannon:

You implicitly make this clearer further down, but I think this really is:
manylinux_%{libc provider}_${libc version}_${arch}
I.e. you’re specifying more than just the glibc version but glibc itself (which opens the door for musl (which is great ).

Yep! Thank you for catching this. I want to focus just on glibc for now, but the nice thing about the perennial manylinux proposal is that it does give us an easier path towards supporting musl.

gpshead · March 22, 2019, 11:47pm

Could we please include something explicitly guaranteed to be compatible with a modern Raspbian?

I don’t trust CentOS 7 armhfp to be compatible until someone proves otherwise. If it is, great! But we should call that out in documentation as RaspberryPi seems a lot more important to the Python world than 32-bit arm running CentOS.

Raspbian might be a bit of an odd duck due to its rpi v1 arm6hf lowest common denominator.

brettcannon · March 23, 2019, 12:01am

It’s reasonable for now, but if we’re trying to future-proof then this won’t hold the next time someone decides that a library should be added/removed from the blessed list of assumptions. I mean the only way this could potentially work would be to make it date-based and say that all wheels built in this timeframe assumed these libraries, but I don’t know if people really want to go down that road.

steve.dower · March 23, 2019, 12:36am

Overall I don’t personally have a lot of thoughts on this, but I’d like to at least warn that having flexible support for libc just pushes the same problem to the next dependency, whatever that happens to be for a particular wheel.

I too dislike when someone says “this doesn’t solve all the hypothetical problems so don’t use it for the one real one”, but in this case I think it would be creating work to support implied tag “ranges” that really ought to be a much deeper system dependency evaluation mechanism (that can also do, eg, GPU support). Making the tools do it once for a narrow case will cause more constraints on a future design than if we stick with tagged platforms (e.g. manylinux2014) for now.

njs · March 23, 2019, 5:10am

Thanks @dustin for pushing this forward!

I think the “perennial manylinux” idea is definitely the right way forward, and it seemed to have broad support on disutils-sig. It’s just as good as manylinux2014 in the short term, and vastly better in the long term.

There’s a bunch more discussion of this in that thread, but I’ll summarize. There are a lot of pieces that work together to make the original manylinux successful.

First, the PEP uses English prose to describe (a) what manylinux1 tags looked like, (b) what the requirements were to create a manylinux1 wheel, (c) how installers could decide whether to install a given manylinux1 wheel. Then, we implement the prose using code: PyPI has an implementation of (a), to allow manylinux wheels to be uploaded. The pypa-maintained docker images and auditwheel have an implementation of (b). And pip (and any other wheel installers that might come along) have an implementation of (c).

In practice, the PyPI and pip parts of the equation are really simple. The heuristic pip uses to decide if a manylinux wheel is compatible with given system is just “do we have a new enough glibc”. OTOH, the docker images/auditwheel part is extremely complex and full of hard-won knowledge. That’s what lets pip get away with using such a simple heuristic.

The problem with this is that it means that the complicated part – how to generate compatible wheels – has two different specifications: the one in the PEP, and the one in code. And the one in code is the one that actually gets maintained, because it’s the one that people use. And besides, no-one cares about whether a wheel matches the exact details in the PEP, they care about whether it works on real systems. You can tell that this is the real rule, because whenever someone reports that auditwheel let through a wheel that didn’t work, then we fix auditwheel, we don’t say “well, the PEP is normative so the wheel is fine”.

So the perennial manylinux idea isn’t a huge change in practice. It’s just simplifying the PEP to say what we actually mean (= “the wheel has to work on real systems”), and removing the duplication between the PEP prose and auditwheel’s code, since auditwheel’s code is more reliable anyway. And then as a bonus, we make the process of rolling out new manylinux versions wayyyy simpler.

brettcannon:

You implicitly make this clearer further down, but I think this really is:
manylinux_%{libc provider}_${libc version}_${arch}
I.e. you’re specifying more than just the glibc version but glibc itself (which opens the door for musl (which is great ).

Eh… realistically it really is just glibc for now. We could allow arbitrary values for ${libc_provider}, but it’d be purely aspirational, so I’d rather not.

The obvious issue: if pip sees a wheel labeled manylinux_musl or manylinux_bionic or manylinux_mynewlibcijustinvented, then what does it do with that? Unfortunately, there is no generic mechanism that pip can use to query the currently running executable and find out which libc vendor and version it’s running against. The glibc detection code isn’t super complicated, but it is totally glibc-specific: pip/src/pip/_internal/utils/glibc.py at main · pypa/pip · GitHub. So even if we let people upload arbitrary manylinux_${libc_vendor}_${libc_version} wheels to PyPI, pip won’t be able to install them, which makes it kinda useless.

The more nebulous issue is: we know that in the current ecosystem of glibc-based linux distros, there’s a core set of libraries whose ABIs have a certain degree of compatibility across time and across all popular vendors, and that evolve roughly together, and that this makes the glibc version a useful heuristic for telling whether a binary will work on one of these systems. But none of these things had to be true, as a matter of mathematics or even policy; it’s an empirical observation, based on the experience of other folks distribution software on linux and our own experience of shipping hundreds of millions of manylinux wheels. It’s really not clear whether any of this generalizes to other libcs.

As a matter of policy, musl tries to make it hard to tell which version of musl your system is using, or even that it’s using musl. I guess essentially 100% of the people who want musl support really want “wheels that work on alpine-based docker images” support. Looking at the current official docker alpine:latest tag, the only libraries provided are libc, libz, and openssl. It seems plausible that if your real goal is to target alpine, then a wheel tag that says alpine in it and uses the openssl version as the “clock” might be more useful than anything involving the musl version…

Also, just as a practical matter, we don’t have any experience or auditwheel support for building portable binaries targeting non-glibc platforms.

I don’t know enough about ARM ABIs to figure out what would be required to do this. No-one else who knows enough about ARM ABIs to make this happen has stepped forward to do the work. I agree that it would be a great feature to have, but there’s no reason to hold up the rest of the manylinux work while looking for a volunteer to figure out this part.

Small note: the Steering Council doesn’t have much to do with this. Packaging has operated pretty much independently from the normal BDFL/python-dev governance for a long time, technically via a standing BDFL delegation (currently to @pf_moore), but in practice it’s basically its own thing. I think everyone is assuming the SC will continue this, and there have been discussions of spitting the PyPA off formally to be a “sibling” of CPython under the PSF, just to make the de jure setup better match the existing de facto status. I tried to summarize the history and status here:

pf_moore · March 23, 2019, 11:32am

I’m working on that assumption also. In this case, however, I have essentially no knowledge or understanding of the complexities around Linux binary compatibility¹ so I’d be deferring to the subject experts for the technical details, and restricting myself to “managing” the discussion. If anyone feels that’s not sufficient, I’m happy to pass the decision making to someone else (or to the SC, if they are interested - but I suspect they aren’t )

¹ Can we replace the manylinux spec with a note saying “please switch from Linux to Windows”?

takluyver · March 23, 2019, 4:02pm

I’m sure I’m betraying my ignorance of Linux systems, but I still don’t understand how these core libraries work in the ‘perennial manylinux’ idea. Could a hypothetical manylinux_glibc_2_28_x86_64 wheel contain libraries linked against e.g. libX11? And if so, what version?

Would the permitted libraries be frozen forever at those specified for manylinux2010? Or is there some way to derive the versions of these libraries from that of glibc? Or would manylinux wheels only be allowed to link against glibc itself, and absolutely nothing else?

ncoghlan · March 23, 2019, 7:53pm

I think we’re seeing a lot of folks in this thread get confused by the fact that the proposal calls what is actually the “heuristic name” field in the wheel filename specifically “libc provider”.

Even though writing it up that way is entirely understandable given our past discussions, doing so is causing problems in two directions:

It’s too specific, since it rules out other heuristics (e.g. a “distro” heuristic that checks /etc/os-release)
It’s too vague, in that it doesn’t explain that there’s still a whole host of expectations behind that heuristic, and hence you can’t go sticking any old libc implementation name in there without a document to back it up (and even if you link to glibc, you’re still bound by all the other pragmatic requirements of maintaining manylinux compatibility)

If folks already heavily involved in the development of packaging tools are getting confused on that front, then we can be confident it’s a poor way of framing the proposal, even though it’s the one that occurred to us first.

Accordingly, while I still favour the “Figure out a way to let installers auto-adapt to version updates on the publishing side” idea over just doing manylinux2014, I now think it will be clearer if it’s explicitly framed as a heuristic name, and imposes the following requirements on new heuristics:

the heuristic must specify the remaining fields in the wheel filename (e.g. version and CPU architecture for a glibc heuristic; distro name, version, and CPU architecture for a distro heuristic)
the heuristic must specify a way for an installer to check whether or not the running system meets the constraints documented in a candidate wheel filename
the heuristic must specify a responsible party that defines the compatible build environment for the target platform
adding an entirely new heuristic still requires a PEP to update the manylinux specification, since installers need to know how to implement the install time check for the heuristic, and publishers need to know who gets to define what the heuristic version numbers mean
the Python Package Index maintainers may choose to disallow heuristics that they consider insufficiently general, and instead leave those wheels to other organisations to host using a caching proxy like devpi

So under that framing, you end up with a top level format that looks like:

manylinux_${heuristic name}_${heuristic fields}_${arch}

The glibc heuristic would then be defined as:

Heuristic name: glibc
Heuristic fields: ${glibc_major}_${glibc_minor}
Install time heuristic check: as per the original manylinux1 spec
Publication time heuristic check: delegated to the auditwheel and manylinux projects

While I wouldn’t include them in the initial PEP switching to this format, two other potential heuristics that come to mind would be for Distros and VFX Platform:

Heuristic name: distro
Heuristic fields: ${os_release_id}_${os_release_version_id} (with some extra rules to cope with ., _, and -, such as normalising them all to _)
Install time heuristic check: exact checks based on https://www.freedesktop.org/software/systemd/man/os-release.html, rely on https://github.com/pypa/pip/issues/5453 (or something similar) to use a platform tag other than the one in os-release
Publication time heuristic check: build on the named distro, specify the OS level dependencies that need to be installed in a distro-specific format
Heuristic name: vfx
Heuristic fields: ${vfx_year_id} (CY2019, etc)
Install time heuristic check: none, rely on an explicit platform tag as per https://github.com/pypa/pip/issues/5453
Publication time heuristic check: as defined by https://www.vfxplatform.com/

njs · March 24, 2019, 1:22am

There are a few different ways to think about this.

You might be asking: suppose I made a wheel with the tag manylinux_glibc_2_28_x86_64, that links against some version of libX11. Is that a PEP-compliant use of the tag?

In the perennial manylinux approach, the answer would be “well, if it works for your users, then it’s compliant, and if not, then it’s not.” This is admittedly less satisfying than having a detailed answer directly in the PEP. But, it is still an objective criterion, e.g. it would still be clear that if you labeled the current tensorflow wheels as manylinux_glibc_2_5_x86_64, then that wouldn’t be PEP-compliant, because we know that there are plenty of popular distros with glibc 2.5 where those wheels don’t work. And, this is exactly the same way the current Windows and macOS tags work. There’s no PEP listing all the symbols and DLLs you’re allowed to use in a win_amd64 wheel. Heck, we don’t even bother specifying which Windows versions are supported by a given Windows wheel, so perennial manylinux would actually be substantially better-specified than win_amd64.

And… the question “is this PEP-compliant?” is almost never an interesting question anyway. It has almost no effect on the real world. Maybe instead you’re asking: suppose I’m trying to ship a package that uses libX11. How do I do that and get it to work?

The main reason manylinux1 succeeded is that we had an answer to this question, that worked for ordinary package maintainers: you use the docker image, you use auditwheel, it knows all the arcane details so you don’t need to. This part would all work the same – there’d be a small menu of available docker images for more-or-less recent linux distros, each one would have some associated auditwheel profile, and when auditwheel sees that your package uses libX11, the profile tells it whether it needs to vendor that or not. If it turns out to make a bad decision, someone files a bug and we update the profile.

There’s also the question: suppose people are somehow abusing the system, and publishing wheels that don’t work, and they don’t care, and the blowback is hitting us as packaging infrastructure maintainers. What can we do?

If we decide that this is a problem we need to deal with, then IMO the best solution is a technical one: we should adjust PyPI to only accept manylinux wheels where auditwheel has a profile defined, and the wheel passes that profile.

The alternative is to try to shame the offenders by pointing at the PEP and saying they’re non-compliant, but as we’ve seen, that doesn’t accomplish much. The people who are trying to follow the rules need tools to do that, not shaming. And the people who have some overriding concern that makes it impossible to follow the rules might feel ashamed but will keep going anyway.

pitrou · March 24, 2019, 10:36am

Is that really the objective criterion for whether a wheel is “perennial manylinux”-compliant?

If it is, I find that it’s an extremely bad idea. After all, Tensorflow and PyTorch have been pushing rogue wheels because “it works for their users”. But it also wastes the life of users of other libraries such as PyArrow (and, consequently, the life of developers of those other libraries). Should those wheels be considered “compliant”?

ncoghlan · March 24, 2019, 10:53am

No, it isn’t, and I’d recommend to Paul that he reject any PEP that was that vague.

I think only spelling out the install time heuristic at the PEP level and delegating the publication time check to the auditwheel project would be fine, though (i.e. “If auditwheel says it’s fine for the tag you’re claiming to meet, then you’re fine, if auditwheel objects, then figure out how to make it happy”).

pf_moore · March 24, 2019, 12:14pm

Yep, I agree - the PEP would have to be clearer than that.

My absolute minimum criterion for a PEP would be that it makes it possible for any interested 3rd party to confirm that a given wheel is compliant. That means either the documentation (= the PEP) clearly states the criteria that need to be met, in a manually checkable way,or that the PEP explicitly defers to a tool like auditwheel.

Ideally the PEP should also impose some constraints on any checking tool it relies on - it would be unacceptable, for example, for the tool to one day say that a given wheel passes, and the next day that it doesn’t. So tags that imply different constraints over time would be required to only ever get more lenient.

The point about Windows wheels is a fair one, though - I’d say that an cp37-win_amd64 wheel should work on any 64-bit Windows system that CPython 3.7 supports, but that’s not explicitly stated anywhere, and there’s no checking tool. But I’d take that as the starting point - and similarly, cp37-<whatever manylinux tag> should run on any Linux system that CPython 3.7 supports, with the manylinux tag imposing further defined restrictions according to the relevant manylinux PEP.

If anyone’s interested in writing a clarification to PEP 425 to capture that interpretation, I’d support it (but I think it’s explanatory, clarifying the implied and commonly assumed behaviour, rather than changing the spec at all).

njs · March 25, 2019, 2:08am

Literally the next few words after you cut off your quote were my answer to this question…

In my post I did paraphrase the perennial manylinux proposal in a somewhat sloppy way though. What I’d actually suggest we put in the PEP is: if someone tags a wheel manylinux_glibc_2_${N}_${platform}, then that means this wheel should work on any mainstream distro that ships with glibc 2.{N} on {platform}.

There’s still a bit of subjective wiggle-room here in the word “mainstream”, to avoid being rules-lawyered by some weirdo who builds their own personal distro with like, glibc from 2019 and libstdc++ from 2010, just to mess with us. But I think this won’t matter much – e.g. the controversial tensorflow wheels don’t work on any distro with glibc 2.5, so they’re obviously non-compliant. And as long as everyone can agree on the big picture, it’ll be OK if we let individual projects make judgement calls about how they want to handle these kinds of obscure edge cases.

That’s why I think the point about Windows tags is important: sure, we could specify it in more detail, but the point is that we’ve never felt the need to. There’s some grey area, but it’s never caused any problems.

We could explicitly incorporate auditwheel’s semantics into the PEP by reference, but usually we try to avoid naming specific tools like this in PEPs. If someone invents auditwheel2, that uses a slightly different mechanism to produce wheels that work just as well, then that seems fine to me. And the exact technical measures that PyPI uses to validate uploads are something we usually leave up to the PyPI maintainers, rather than specifying in a PEP.

Note that according to this definition, it’s not allowed to upload a cp27-win_amd64 wheel to PyPI unless you’ve made sure that your wheel works on not just Windows XP, but also Windows 2000. (Rationale: PEP 11 says that if a Windows version is still in “extended support” on the date when CPython X.Y.0 is released, then all CPython X.Y releases will officially support that version of Windows, and then your rule extends that to all cpXY-win wheels. CPython 2.7 was released on July 3, 2010, and extended support for Win2k ended on July 13, 2010, a whole ten days later, so CPython 2.7 officially supports Win2k.)

Of course in reality no-one tests on Win2k, and no-one uses it either, so probably a large proportion of Windows wheels on PyPI are violating your rule, but it doesn’t matter. The point of specs is to solve problems, and there’s no problem here. Going around and filing bugs on those projects saying that their wheels are technically non-compliant would be a waste of everyone’s time.

(Your rule is also incomplete, because it doesn’t tell us how to interpret py2.py3-none-win_amd64. But again, it’s not a problem in practice.)

takluyver · March 25, 2019, 10:13am

I see your point that the important thing is that the wheels work for the intended users. But I’m still uncomfortable with the idea of a specification that doesn’t actually specify what a compliant wheel can use.

I think the PEP should define this somehow, even if that’s by pointing to auditwheel. If a hypothetical auditwheel2 is later written and gains popularity, we could update the PEP to refer to that instead. It’s not a perfect solution, but I prefer it to leaving the external libraries undefined.

If we’d ever consider integrating auditwheel in PyPI, that would cement its status as the standard definition of the rules, so to my mind it would then be totally OK to refer to it in the PEP.

pitrou · March 25, 2019, 11:23am

Thanks for the clarification.

steve.dower · March 25, 2019, 2:00pm

Maybe there’s a space here for platforms to be represented by validation libraries instead (as well)? I could see something similar being useful for “choose this if Cuda will work” etc. To generate the tags for those platforms, you can either list them manually or install a hook (into where? Pep425tags? TBD) that will check your system to see if it supports it.

Then the definition of the tag is indeed in the libraries, and it can stay that way. Of course, now installation may get more complicated if you need to be figuring out whether your platform supports a package or not, but, err, that’s kinda where we are already? It just moves it to install time rather than runtime.

(Additionally, we can endorse certain libraries by allowing their tags to go onto PyPI, and use that as a way to prevent them changing their definitions.)

pf_moore · March 25, 2019, 2:34pm

I think we need to remember how tags work in practice. A lot of the confusion here seems to come from a misinterpretation of tags as providing some sort of absolute definition of binary compatibility - that’s an essentially insoluble problem, and not what we should be tackling here.

Taking a step back, the process of selecting a wheel for installation goes as follows (in the abstract):

The index provides a set of candidate wheels.
The client discards any that have tags that are not in the client’s list of supported tag combinations.

That’s basically it. The key aspect of the tag spec is defining how a client should determine whether to include a tag in its supported list. So, for example, clients support py2 or py3 depending on whether they are Python 2 or 3. Nothing more. Any spec that doesn’t clearly define how a client chooses to add a tag to its supported list is in practice unusable.

On the publisher side, we need to provide a means for the publisher to know whether it’s acceptable to apply a given set of tags to their wheel. Again, the rules for that need to be clear, otherwise there’s no (standard compliant) way to publish wheels using those tags. (It is of course always possible for publishers to create wheels that are incorrectly tagged - but that is a violation of the specs, and should
be treated as such). At the most fundamental level, the rule for publishers is “you can only claim compatibility if you will work on all clients that can claim support for a given tag”.

The practical problem here is twofold:

Tags have to be reasonably broad, or there’s no realistic possibility of publishers being able to provide wheels that cover a sensible population of users (there’s no value in a tag that says the code will work on exactly one machine).
The difficulty of the binary compatibility problem means that broad tags will impose impractically broad constraints on publishers.

The factor that resolves this dilemma is the index. The space of “all possible clients we have to consider” is reduced, from “absolutely everything conceivable” to “all of the users of the index that this wheel is published on”.

That’s why a wheel on PyPI tagged as py2.py3-none-win32 is OK, even though it probably won’t work on Python 2.1 running under Windows 95. It’s because the number of such users accessing PyPI is effectively zero, and so we can ignore that possibility. I think that is what @njs had in mind when he said “works for our users” - but it’s not a property of the wheel tags in isolation, rather it’s dependent on where you publish the wheel.

This suggests a way that tensorflow could have solved their issues without (quite as badly) violating the existing standards - publish their wheels on a custom index, and document the (additional) requirements that clients should satisfy in order to use that index.

It also clarifies the role of auditwheel - it’s responsible for taking the overly-broad requirement “this wheel must work on any system that can legitimately accept the given set of tags” and refining it down to something publishers can actually satisfy. It does this in effect by making a judgement about what the reasonable client population might be - making assumptions about “typical users of PyPI” (in the same way that the Windows platform informally assumes that Python 2.1 on Windows 95 isn’t an important case).

I’m not sure how much the above helps. For me, it clarifies two main points:

Standardising the definition of tags is fundamentally about what the client will accept.
Conversely, what publishers can claim is much more dependent on the question of what index they will publish on. That’s all about judging how much variation in clients exists in practical terms for a given index (for private indexes, very little, but for PyPI a significant amount). Experience with the original “linux” tag showed us that the formal standards have to look at “all PyPI users” if they are to be usable, but that doesn’t prohibit alternate indexes having different rules. Auditwheel is focused on that question, validating that wheels match the spec sufficiently to be published on PyPI, but maybe there’s a reasonable case to leave the spec open to allowing sets of checks that are narrower, but can only be used on a custom index.

(Please take this as speculative might-be-useful thoughts only. As I say, I’m not an expert by any means in Linux compatibility issues).

pf_moore · March 25, 2019, 2:54pm

Given that there’s no inclusion relationship for tags, would a client have to claim support for

('manylinux_glibc_2_5_x86_64', 'manylinux_glibc_2_12_x86_64', 'manylinux_glibc_2_17_x86_64')

if they wanted to say “manylinux2014 or earlier”? Or are different glibc versions completely incompatible, and publishers would need to build and publish separate wheels for each case?

If you look at the client (pip) code for tags, you’ll see that we currently enumerate all of the Python tags that a client can support (see the code here). I don’t think we want to start doing that for libc versions as well. But the alternative would be for wheels to have ludicrous names like

foo-1.0.0-cp37-cp37m-manylinux_glibc_2_5_x86_64.manylinux_glibc_2_12_x86_64.manylinux_glibc_2_17_x86_64.whl

The principle seems straightforward here, but the practicalities of managing tags like this are a lot less simple