The next manylinux specification

LGTM, thanks

I don’t have a horse in this race, but when @pf_moore said that level of detail wasn’t necessary, I thought he was referring to the yank / removal distinction. But the updated PR is now silent on whether non-compliant wheels can / should be removed (using any method like yanking or flat-out removal).

Correct, but the PEP isn’t about PyPI enforcement, so as long as it doesn’t preclude yanking/removal, we can work that out later.

2 Likes

It isn’t suddenly mandatory - it was a major topic of discussion when naming manylinux2010 (with that name winning over manylinux2 by virtue of conveying more information that’s useful to publishers), and it was one of the first concerns raised by folks hearing about the perennial manylinux proposal for the first time.

My understanding is that folks generally answer this question by inheriting their build flags from the corresponding CPython binary release, thus outsourcing the question to the CPython binary build maintainers (currently Ned Deily for Mac OS X, and Steve Dower for Windows). Apple and Microsoft take care of naming the available target ABIs, and defining what they each mean.

That answer doesn’t work for Linux, since there aren’t any distro-independent CPython binaries published via the PSF to establish a standard baseline, and each different distro defines their own independent ABI, so the PyPA ends up having to take on all three tasks of defining the target ABI, naming the target ABI, and communicating the relationship between the two.

With the CalVer naming, folks that are willing to trust our naming don’t need to look any further than “What year was the oldest major distro version I want to support first published?”, on the assumption that if there was a major distro release in 2014 that was incompatible, we wouldn’t have called the baseline manylinux2014 in the first place - we’d have called it manylinux2015 or manylinux2016 as appropriate.

The release year is trivial to figure out for Ubuntu (since the year is right there in the version number), and well documented for both Debian (https://wiki.debian.org/DebianReleases#Production_Releases) and for RHEL/CentOS (https://access.redhat.com/support/policy/updates/errata/#Life_Cycle_Dates).

It’s only if folks are thinking “Ah, but what if they made a mistake in naming this version?” that they’ll need to go check our work by comparing the major library versions used by the distros at the time. (In the case of manylinux2014, that would be RHEL/CentOS 7, Ubuntu 14.04, and Debian 8. For a hypothetical manylinux2019 spec, the major distros of interest would be RHEL/CentOS 8, Debian 10, and Ubuntu 20.04, skipping over Debian 9 and Ubuntu 18.04 due to the slower RHEL/CentOS update cycle, just as we’ve skipped versions in going from manylinux1 to manylinux2010 and now to manylinux2014.)

I do hope that accepting manylinux2014 won’t kill the motivation towards finding a better way to handle defining manylinux2019 - I just don’t want to put the design work for that evergreen solution on the critical path for handling the immediate requirement to allow publishers to target manylinux2014.

3 Likes

On that front, I split out a separate thread to discuss the idea of having packaging.tags and pip look for a dedicated PyPI package that just calculated the appropriate manylinux tags for a target system: Publishing manylinux install tag heuristics as a dedicated PyPI package?
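
As a very rough sketch of what such a package might compute (the function names are hypothetical, and the real heuristics would need to check more than just the glibc version):

    # Hypothetical sketch only: report which legacy manylinux tags the
    # running system supports, based solely on the glibc version (real
    # heuristics would need to check more than this).
    import ctypes
    import platform

    def glibc_version():
        """Return the running glibc version as (major, minor), or None."""
        try:
            libc = ctypes.CDLL("libc.so.6")
        except OSError:
            return None  # not a glibc-based system
        gnu_get_libc_version = libc.gnu_get_libc_version
        gnu_get_libc_version.restype = ctypes.c_char_p
        major, minor = gnu_get_libc_version().decode("ascii").split(".")[:2]
        return int(major), int(minor)

    def compatible_manylinux_tags():
        """Yield the legacy manylinux platform tags this system supports."""
        version = glibc_version()
        if version is None:
            return
        arch = platform.machine()
        # Each legacy tag has a documented minimum glibc version:
        # CentOS 7 ships 2.17, CentOS 6 ships 2.12, CentOS 5 ships 2.5.
        for tag, minimum in [("manylinux2014", (2, 17)),
                             ("manylinux2010", (2, 12)),
                             ("manylinux1", (2, 5))]:
            if version >= minimum:
                yield f"{tag}_{arch}"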

Approving manylinux2014

OK, it’s the end of July, and there has been no real further discussion here in the last few days, so I think it’s time to bring the debate to a conclusion.

I’m going to approve manylinux2014 as the next version of manylinux. Congratulations @dustin and thanks to everyone who participated in the debate.

Next tasks

There are some caveats, however. In spite of a number of questions being asked about “how do we know this will be delivered more quickly than manylinux2010?”, I’ve seen no real response. Having people ready to work on the proposal is not enough - if it’s anything like other situations I’ve seen, it’s easy to get people to work on the technical stuff, and nearly impossible to get them to work on documentation, planning, looking at the bigger picture, etc. So, to the extent that I can demand anything, I want to see the manylinux2014 “team” publish a review of what the sticking points were with manylinux2010 deployment, and how they intend to address them. I’m looking very specifically here at @dustin and the people he’s got waiting to work on manylinux2014 - it’s not fair or reasonable to expect the people who worked on 2010 to do this themselves. If no one among the people willing to commit to working on manylinux2014 is able to do this, then I fear that manylinux2014 will have the same problems as 2010.

I’m also assuming that there will still be technical work going on with the spec. I’ve seen the vsyscall discussions, and I know the investigation into the TensorFlow crashes is still ongoing. The discussion here around perennial manylinux has made it clear that this is a normal part of maintaining the specs. But I would insist that the spec is kept up to date and any non-trivial changes are at a minimum publicised here. After all, the main point of the perennial debate was that “maintaining the PEPs is an unnecessary overhead” - if the manylinux2014 supporters dismiss that objection and then fail to do that spec maintenance, that’s cheating :frowning:

The future

I don’t want to see another manylinux20xx proposal after this one. In my view, the perennial manylinux proposal raised some important sustainability questions which we must answer. To that end, I’d expect the next manylinux specification after this one to be some form of perennial approach. Whether it’s the existing perennial proposal, updated to address the concerns and issues raised during this discussion, or an independently developed proposal, I don’t mind, but we need something that gives us an ongoing solution.

In particular, I’d like discussion to start relatively soon. I know that people are burned out by now, and the work to actually implement manylinux2014 will be a distraction, so we all need a break, but it’s not like distribution EOL dates come as a surprise. I wasn’t particularly comfortable with us being under pressure to find a solution this time because “we need something quickly”, and I won’t accept that argument in future.

Acknowledgements

Thanks to everyone who participated in the discussion. It’s often hard to get people involved in this type of debate, so thanks everyone for the work you put in.

In particular, thanks to @njs for arguing for the perennial manylinux proposal. In spite of the fact that I ultimately approved manylinux2014, your comments were important and resulted in a much more useful discussion. I don’t see my decision as in any way rejecting the “perennial” idea, but more as a way to give it the time it needs to be fully developed, without unwanted pressure that “we need something right now”.

7 Likes

Yeah, it’s a really difficult topic: it has an incredibly obscure and intricate set of domain-specific technical details, there’s a ton of potential for scope creep, and most of our general packaging experts aren’t experts in this specific domain – yet are stuck trying to steer the proposal despite that.

When I first came up with the idea of manylinux, and when we were writing the original PEP, there was intense skepticism and push-back, and we had to fight hard to get it through with a reasonable scope. It’s sort of ironic that now it’s the original PEP that everyone takes for granted, and the idea of doing something different that makes people nervous, but I guess that’s how these things go.

I’ve actually never maintained auditwheel. My main role in manylinux has been as a kind of technical lead, dealing with the overall vision, PEP process, project management, and acting as a problem-solver-of-last-resort. The perennial proposal is scoped the way it is because AFAICT it’s the best available architecture for the ecosystem as a whole, not because I’m trying to selfishly save myself some work.

Not sure what you mean here. Creating and maintaining the build environments is certainly a pain point, in the sense that it would be nicer if they just magically existed without anyone having to do any work. But unfortunately a PEP cannot wish a build environment into existence :-). And the actual build environment maintenance and availability is identical across every possible proposal I can think of.

Yes, it’d be nice to have better support for more manylinux versions in more tools. But again, that’s beyond the powers of a PEP; either someone will do the work or they won’t.

There are lots of ways to solve this – we can squint at download statistics, we can survey maintainers, we can collect together relevant information and put it in our documentation – but I don’t see how any of them involve the PEP process.

This is pretty similar for all the proposals we’ve seen: manylinux20XX makes it a bit easier to get a rough guess about whether it will work, and perennial manylinux makes it a bit easier to get a definitive answer. Either way, our main focus is on providing wheels that just work on as broad a range of platforms as possible.

I don’t see how it’s possible to do better, because there simply isn’t any widely-understood versioning scheme shared across different Linux distros.

1 Like

In that spirit, I went ahead and posted the rewrite of the perennial manylinux PEP that I’ve been working on for the last few weeks:

PR: https://github.com/python/peps/pull/1137
Rendered: https://github.com/python/peps/blob/36b7d644f56cfa764b6d0a61fd03084bffe47fda/pep-0600.rst

There are a lot of changes to the text, but the core proposal hasn’t changed at all (indeed, it hasn’t changed for a year+ now). There are some minor technical tweaks, like incorporating the advice from the glibc maintainers about how to interpret glibc version numbers, and more details on exactly how backcompat with manylinux1/manylinux20XX should work. But my main goals were to flesh out the rationale text, and explicitly incorporate all the relevant points from the discussion here.
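
To give a flavour of the backcompat scheme: each legacy tag is essentially defined as an alias for the corresponding glibc-based tag. A minimal illustration (the helper function below is just for exposition, it isn’t part of the PEP):

    # Illustrative only: under the perennial scheme, the legacy manylinux
    # tags become aliases for glibc-versioned tags.
    LEGACY_ALIASES = {
        "manylinux1": "manylinux_2_5",      # CentOS 5 era, glibc 2.5
        "manylinux2010": "manylinux_2_12",  # CentOS 6 era, glibc 2.12
        "manylinux2014": "manylinux_2_17",  # CentOS 7 era, glibc 2.17
    }

    def normalize_platform_tag(tag: str) -> str:
        """Map a legacy manylinux platform tag to its perennial equivalent."""
        for legacy, modern in LEGACY_ALIASES.items():
            if tag.startswith(legacy + "_"):
                return modern + tag[len(legacy):]
        return tag

    assert normalize_platform_tag("manylinux2014_x86_64") == "manylinux_2_17_x86_64"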

As far as I know, it is fully developed. I believe that draft has a stable core, fully worked out details, and addresses all the concerns that have been raised.

If anyone reads that new draft and still thinks there’s something that needs “further development”, I would greatly appreciate specific comments on what you think is missing.

[Edit: well, OK, one obvious thing that’s missing is that I didn’t update the legacy manylinux section to account for the acceptance of manylinux2014 :-). That should be pretty mechanical and I’ll let Paul say whether he thinks that’s an important thing to do right now.]

2 Likes

Once the PEP is formally marked as accepted, @sumanah and I will work together to do this. I meet with the Tensorflow SIG-Build team again on Aug 6th - @pf_moore, do you think we can do this before then? Any other blockers here?

On this front, the Tensorflow team has announced that they currently have a manylinux2010-compatible build and will be publishing it within the next two weeks. This build also does not experience the previously reported crashes when loading together with other C++-based packages.

1 Like

I’m not sure what you want here - if you’re just after me updating the PEP status, I’ve now done that (sorry! should have done that sooner). Let me know if there’s anything else blocking you.

1 Like

It’s been ~2 weeks since we posted the new version of PEP 600 that tries to address all the issues raised here. Does anyone have any feedback? Latest draft is visible here:

1 Like

@dustin has been ill this week. He, @zwol, and I spoke earlier this week, and based on that I am speaking in his/our stead on this, and have made a draft roadmap for manylinux2014 rollout.

Reasons the manylinux2010 rollout snagged and went too slowly:

  • it wasn’t clear who should be doing what (often partly because of scarce maintainer time)
  • there were a few competing PRs implementing the build environment and it wasn’t clear who should review things (ditto)
  • the order of operations wasn’t clear until Nick wrote up the tracking issue
  • we haven’t ensured ALL helper utilities support manylinux2010, like cibuildwheel, multibuild, and dockcross, and that’s slowed down adoption
  • we haven’t concisely explained stuff like “now do x, it’s safe and it’s now the recommended approach, this is orthogonal to distutils/setuptools/flit/poetry and similar kinds of packaging tooling choices, here are the consequences of changing your build environment, these are the compiler gotchas, these are the ways we’re deprecating support for things your users may still want” and spread that message in the right places

Thus, in order to make the manylinux2014 rollout go more smoothly, we will:

  • identify interested parties early and get tentative commitments for work (seeking volunteers as well as institutional support for development, technical advisory, code review, test case inventory, testing, infrastructure-building, personnel gathering and management, maintainer outreach, documentation, bug tracker caretaking, and outward-facing online community liaison work)
  • have a roadmap upfront
  • in that roadmap and in the rollout, ensure that adding documentation to the Python Packaging Guide is a first-class priority
  • point people to manylinux2010 issues that will help them build manylinux2014 equivalents

@pf_moore I hope this begins to address your concerns – thank you for making this request.

2 Likes

BTW, in my “hey, let’s get companies and grants to fund stuff” description of work needed to improve wheel-building, I include:

This is probably not the right thread for corrections/additions to that list of requests or that summary, but I am open to corrections and additions.

Thanks, yes this looks like a good plan.

It’s now been ~1 month since the new version of PEP 600 was posted, and there still haven’t been any issues raised. Maybe this means it addressed everyone’s concerns? Either way, it’s impossible to address concerns that haven’t been raised, so I guess that if it stays quiet for a few more days I’ll ask Paul to formally accept it.

I’d like explicit confirmation from @ncoghlan and @dstufft at least that their objections have been addressed, as they had the most significant concerns. I’ve still to find some time to review the new version myself, but as I’ll have to do so before accepting it, I’m reasonably happy that my views won’t get accidentally missed :slightly_smiling_face: (I think my reservations were echoed by others, though, so if we get the acceptances I’ve asked for, I doubt I’ll disagree).

1 Like

@pf_moore @njs Regardless of whether there are any remaining concerns, I believe it would be more appropriate to defer PEP 600 while we see how the manylinux2014 rollout goes. We could very well discover new problems in the process.

Also, here is a concrete concern with PEP 600: I still suspect that tying manylinux version numbers to glibc version numbers, and only to glibc version numbers, may be unworkable in practice. In particular, I think it would be a mistake to accept PEP 600 until the actual root cause of the infamous "import pyarrow, tensorflow = interpreter crashes" problem is found and fully understood, because a permanent solution to this problem may very well require the manylinux versioning scheme to encode the g++ version number as well.

The tensorflow maintainers seem to have lost interest in investigating this bug after finding that they could reproduce it with their manylinux1 packages but not their manylinux2010 packages. The root cause is still not known. From the limited investigation I did myself, I am inclined to suspect auditwheel of doing something wrong in the process of copying shared libraries into wheels, but that’s as far as I got.
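
For context, the repair step auditwheel performs is conceptually something like the sketch below (heavily simplified, not the actual implementation; the real tool also renames the vendored libraries with a hash suffix and rewrites their SONAMEs and the extensions’ DT_NEEDED entries, which is exactly the kind of step where I’d expect a subtle bug to hide):

    # Heavily simplified sketch of "auditwheel repair"; not the real code.
    import shutil
    import subprocess
    from pathlib import Path

    def repair_unpacked_wheel(wheel_dir, external_libs, libs_subdir="example.libs"):
        """Vendor external shared libraries into an unpacked wheel and point
        the extension modules at them via an $ORIGIN-relative RPATH."""
        libs_dir = Path(wheel_dir) / libs_subdir  # "example.libs" is hypothetical
        libs_dir.mkdir(exist_ok=True)
        for lib in external_libs:
            shutil.copy2(lib, libs_dir)
        for ext in Path(wheel_dir).rglob("*.so"):
            if libs_dir in ext.parents:
                continue  # don't rewrite the vendored libraries themselves
            # Assumes extension modules sit one level below the wheel root.
            subprocess.run(
                ["patchelf", "--set-rpath", f"$ORIGIN/../{libs_subdir}",
                 str(ext)],
                check=True,
            )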

I have a much longer post addressing the points you made here, but before that, I have a specific technical question regarding manylinux.

The “tensorflow crash” issue is about two binary wheels, both declared manylinux1 compatible, causing a crash when used together in the same Python process. That’s obviously an issue for the end user, who has a reasonable expectation that when we say “compatible” we imply “with each other” (even if technically all the tag guarantees is “with the platform”).

But what about a manylinux wheel being used alongside a locally compiled wheel? If some user did pip install --no-binary tensorflow tensorflow arrow, to install a manylinux binary of arrow and a locally-compiled tensorflow, do we guarantee that the two packages are compatible in that case?

I’m asking because I’m struggling with the question of precisely what we think manylinux compatible wheels are compatible with - my experience on Windows, where compiled extensions are essentially required to all be built with exactly the same compiler, with exactly the same compile options, is of no use here :slightly_frowning_face:

The manylinux2014 PEP is literally a copy/paste of the manylinux2010 PEP with minor updates, which in turn was literally a copy/paste of the manylinux1 PEP with minor updates, and PEP 600 is simply replacing the copy/paste with a generic template. Of course you’re right at some level – the reality of software development is that you can always discover new problems at any point – and I know that you weren’t here for the last two rollouts, so that probably makes the unknowns seem larger. But we can’t stop all progress out of the vague fear that some unknown problem might be lurking; we have to balance the risks and the unknowns. In this case, PEP 600 is very conservative and low risk to start with, and I think it’s very unlikely that going through a third roll-out is going to reveal some huge new issue that we missed in the first two roll-outs.

So there are a few issues here:

  • This is completely orthogonal to PEP 600. If you’re right and it turns out that tying manylinux versions to glibc version numbers really is unworkable, then we would need to tear up all the existing manylinux PEPs and start over. That’s true regardless of whether PEP 600 is accepted, and PEP 600’s acceptance or rejection doesn’t materially affect the remediation we’d have to do.

  • But, that’s not going to happen :-). Even if the g++ maintainers decided that they didn’t care about ABI backwards compatibility anymore, and it became impossible to distribute C++ code that worked across Linux distros, then according to PEP 600 that would simply mean that putting C++ into a manylinux wheel was non-compliant – basically PEP 600 already handles this case. Of course we’d still have to figure out some solution for distributing C++ wheels, but PEP 600 would continue to work fine for all the C wheels out there; we’d just need to extend it with some extra C++-specific tags, not throw it out and start over.

  • But, that’s not going to happen either :-). The g++ maintainers aren’t going to throw away ABI backwards compatibility; they’ve explicitly committed not to do that, and they spend a huge amount of effort on making sure that C++ code built using an older g++/libstdc++ will continue to work with a newer libstdc++, which is what we need. The tensorflow issue isn’t a generic problem with shipping C++ code on Linux; it’s some specific bug in libstdc++, or in the old Redhat devtoolset compilers that the manylinux1 image uses (← this is my guess), or in auditwheel, or in some combination of the three, that’s being triggered by some specific thing tensorflow is doing. PEP 600 is already clear about how to handle this situation: it says that it’s the tensorflow maintainers’ job to figure out how to make their wheels stop crashing. If this were some generic problem with all of C++, then we’d need a more generic solution, but since it’s actually just a specific bug in one specific C++ toolchain/package combination, PEP 600’s solution is the right one.

tl;dr: the tensorflow bug is frustrating, but it doesn’t affect how we tag manylinux wheels, and even if it did, then PEP 600 would still be the right next step.

This should work in general, yes. PEP 600 makes it explicit through the “play well with others” rule: compliant manylinux wheels are required to work in whatever environment they find themselves, and that includes environments that contain arbitrary locally-compiled copies of tensorflow. But, this isn’t really new in PEP 600 – it’s always been an explicit goal of the manylinux work. It’s just that before, we thought that it was so obvious that it didn’t occur to us to write it down :slight_smile:

In the specific situation you describe: the tensorflow bug is somehow related to the janky old compilers used in the manylinux1 image, which use a non-standard, unmaintained fork (!!) of g++ that uses custom hacks to try to support new C++ on old systems. We know this because the bug went away when tensorflow switched to the manylinux2010 image, and the only significant difference between the manylinux1 and manylinux2010 image is that manylinux2010 has a newer, closer-to-standard toolchain.

In your scenario, the user is compiling tensorflow locally using their regular toolchain, not the weird manylinux1 toolchain. So, I’m pretty sure it would Just Work. But, if it didn’t work, then PEP 600 is unambiguous: that would be a bug, and the tensorflow and/or pyarrow maintainers would be responsible for finding a fix or workaround.

2 Likes

Actually on further thought, I might be wrong about some detail here: maybe the bad tensorflow wheels were actually not using the manylinux1 toolchain, but rather some manylinux2010-ish toolchain that google cooked up internally? Sorry, the details on this issue are super confusing and poorly documented, and I don’t want to spread false information.

But: we do know that tensorflow and pyarrow were using some combination of toolchains with weird hacks to try to make new C++ work on old systems, that the segfault backtraces showed code that we know is involved in those hacks, and that the crashes have apparently gone away as everyone moved towards newer more-standard toolchains with fewer hacks.

And, maybe most importantly for the manylinux PEP discussion: we know that g++/libstdc++ upstream explicitly guarantee that everything we want to do is supported and should work, except for these special unsupported toolchain hacks. And if the hacks continue to be a problem then it doesn’t affect manylinux in general, it just means we’ll have to switch back to the regular toolchains, and projects that want to use the latest C++ features will have to target a newer manylinux profile.