The next manylinux specification

In that spirit, I went ahead and posted the rewrite of the perennial manylinux PEP that I’ve been working on for the last few weeks:

PR: PEP 600: Update text to align with discussion by njsmith · Pull Request #1137 · python/peps · GitHub
Rendered: peps/pep-0600.rst at 36b7d644f56cfa764b6d0a61fd03084bffe47fda · python/peps · GitHub

There are a lot of changes to the text, but the core proposal hasn't changed at all (indeed, it hasn't changed for over a year now). There are some minor technical tweaks, like incorporating the advice from the glibc maintainers about how to interpret glibc version numbers, and more detail on exactly how backcompat with manylinux1/manylinux20XX should work. But my main goals were to flesh out the rationale text and explicitly incorporate all the relevant points from the discussion here.
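
To make the glibc point concrete, here is a rough sketch (my own illustration, not text from the PEP) of how an installer might query the glibc version that the manylinux tags key on, following the maintainers' advice to look only at the leading major.minor part of the version string:

```python
# Illustrative only: query the runtime glibc version that manylinux tags
# are keyed on. Anything after "major.minor" in the version string is
# deliberately ignored.
import ctypes
import re

def glibc_version():
    """Return the running glibc version as (major, minor), or None if the
    C library is not glibc (e.g. musl)."""
    try:
        libc = ctypes.CDLL("libc.so.6")
        gnu_get_libc_version = libc.gnu_get_libc_version
    except (OSError, AttributeError):
        return None
    gnu_get_libc_version.restype = ctypes.c_char_p
    version_str = gnu_get_libc_version().decode("ascii")
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        return None  # unexpected format; treat as "not glibc"
    return int(match.group(1)), int(match.group(2))

print(glibc_version())  # e.g. (2, 12) on a CentOS 6 era build system
```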

As far as I know, it is now fully developed: the draft has a stable core, fully worked-out details, and addresses all the concerns that have been raised.

If anyone reads that new draft and still thinks there’s something that needs “further development”, I would greatly appreciate specific comments on what you think is missing.

[Edit: well, OK, one obvious thing that’s missing is that I didn’t update the legacy manylinux section to account for the acceptance of manylinux2014 :-). That should be pretty mechanical and I’ll let Paul say whether he thinks that’s an important thing to do right now.]

2 Likes

Once the PEP is formally marked as accepted, @sumanah and I will work together to do this. I meet with the Tensorflow SIG-Build team again on Aug 6th; @pf_moore, do you think we can do this before then? Any other blockers here?

On this front, the Tensorflow team has announced that they currently have a manylinux2010-compatible build and will be publishing it within the next two weeks. This build also does not experience the previously reported crashes when loading together with other C++ based packages.

1 Like

I’m not sure what you want here - if you’re just after me updating the PEP status, I’ve now done that (sorry! should have done that sooner). Let me know if there’s anything else blocking you.

1 Like

It’s been ~2 weeks since we posted the new version of PEP 600 that tries to address all the issues raised here. Does anyone have any feedback? Latest draft is visible here:

1 Like

@dustin has been ill this week. He, @zwol, and I spoke earlier this week, and based on that I am speaking in his/our stead on this, and have made a draft roadmap for manylinux2014 rollout.

Reasons the manylinux2010 rollout snagged and went too slowly:

  • it wasn’t clear who should be doing what (often partly because of scarce maintainer time)
  • there were a few competing PRs implementing the build environment and it wasn’t clear who should review things (ditto)
  • the order of operations wasn’t clear until Nick wrote up the tracking issue
  • we haven’t ensured ALL helper utilities support manylinux2010, like cibuildwheel, multibuild, and dockcross, and that’s slowed down adoption
  • we haven’t concisely explained stuff like “now do x, it’s safe and it’s now the recommended approach, this is orthogonal to distutils/setuptools/flit/poetry/ and similar kinds of packaging tooling choices, here are the consequences of changing your build environment, these are the compiler gotchas, these are the ways we’re deprecating support for things your users may still want” and spread that message in the right places

Thus, in order to make the manylinux2014 rollout go more smoothly, we will:

  • identify interested parties early and get tentative commitments for work (seeking volunteers as well as institutional support for development, technical advisory, code review, test case inventory, testing, infrastructure-building, personnel gathering and management, maintainer outreach, documentation, bug tracker caretaking, and outward-facing online community liaison work)
  • have a roadmap upfront
  • in that roadmap and in the rollout, ensure that adding documentation to the Python Packaging Guide is a first-class priority
  • point people to manylinux2010 issues that will help them build manylinux2014 equivalents

@pf_moore I hope this begins to address your concerns – thank you for making this request.

2 Likes

BTW, in my “hey, let’s get companies and grants to fund stuff” description of work needed to improve wheel-building, I include:

This is probably not the right thread for corrections/additions to that list of requests or that summary, but I am open to corrections and additions.

Thanks, yes this looks like a good plan.

It’s now been ~1 month since the new version of PEP 600 was posted, and there still haven’t been any issues raised. Maybe this means it addressed everyone’s concerns? Either way, it’s impossible to address concerns that haven’t been raised, so I guess that if it stays quiet for a few more days I’ll ask Paul to formally accept it.

I’d like explicit confirmation from @ncoghlan and @dstufft at least that their objections have been addressed, as they had the most significant concerns. I’ve still to find some time to review the new version myself, but as I’ll have to do so before accepting it, I’m reasonably happy that my views won’t get accidentally missed :slightly_smiling_face: (I think my reservations were echoed by others, though, so if we get the acceptances I’ve asked for, I doubt I’ll disagree).

1 Like

@pf_moore @njs Regardless of whether there are any remaining concerns, I believe it would be more appropriate to defer PEP 600 while we see how the manylinux2014 rollout goes. We could very well discover new problems in the process.

Also, here is a concrete concern with PEP 600: I still suspect that tying manylinux version numbers to glibc version numbers, and only to glibc version numbers, may be unworkable in practice. In particular, I think it would be a mistake to accept PEP 600 until the actual root cause of the infamous "import pyarrow, tensorflow = interpreter crashes" problem is found and fully understood, because a permanent solution to this problem may very well require the manylinux versioning scheme to encode the g++ version number as well.

The tensorflow maintainers seem to have lost interest in investigating this bug after finding that they could reproduce it with their manylinux1 packages but not their manylinux2010 packages. The root cause is still not known. From the limited investigation I did myself, I am inclined to suspect auditwheel of doing something wrong in the process of copying shared libraries into wheels, but that’s as far as I got.

I have a much longer post addressing the points you made here, but before that, I have a specific technical question regarding manylinux.

The “tensorflow crash” issue is about two binary wheels, both declared manylinux1 compatible, causing a crash when used together in the same Python process. That’s obviously an issue for the end user, who has a reasonable expectation that when we say “compatible” we imply “with each other” (even if technically all the tag guarantees is “with the platform”).

But what about a manylinux wheel being used alongside a locally compiled wheel? If some user did pip install --no-binary tensorflow tensorflow pyarrow, to install a manylinux binary of pyarrow and a locally-compiled tensorflow, do we guarantee that the two packages are compatible in that case?

I’m asking because I’m struggling with the question of precisely what we think manylinux compatible wheels are compatible with - my experience on Windows, where compiled extensions are essentially required to all be built with exactly the same compiler, with exactly the same compile options, is of no use here :slightly_frowning_face:
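
For what it's worth, my understanding is that the tag machinery itself only answers the "compatible with the platform" half of that question. Here is a rough, illustrative sketch (assuming the third-party packaging library; this is not pip's actual code path) of listing the manylinux platform tags a given environment will accept:

```python
# Illustrative only: enumerate the manylinux platform tags this particular
# interpreter/environment accepts. Note this says nothing about two wheels
# being compatible with *each other*, only with the platform.
from packaging import tags

manylinux_platforms = sorted(
    {tag.platform for tag in tags.sys_tags() if "manylinux" in tag.platform}
)
for platform in manylinux_platforms:
    print(platform)
# e.g. manylinux1_x86_64, manylinux2010_x86_64 on a typical glibc-based system
```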

The manylinux2014 PEP is literally a copy/paste of the manylinux2010 PEP with minor updates, which in turn was literally a copy/paste of the manylinux1 PEP with minor updates, and PEP 600 is simply replacing the copy/paste with a generic template. Of course you’re right at some level – the reality of software development is that you can always discover new problems at any point – and I know that you weren’t here for the last two rollouts, so that probably makes the unknowns seem larger. But we can’t stop all progress out of the vague fear that some unknown problem might be lurking; we have to balance the risks and the unknowns. In this case, PEP 600 is very conservative and low risk to start with, and I think it’s very unlikely that going through a third roll-out is going to reveal some huge new issue that we missed in the first two roll-outs.

So there’s a few issues here:

  • This is completely orthogonal to PEP 600. If you’re right and it turns out that tying manylinux versions to glibc version numbers really is unworkable, then we would need to tear up all the existing manylinux PEPs and start over. That’s true regardless of whether PEP 600 is accepted, and PEP 600’s acceptance or rejection doesn’t materially affect the remediation we’d have to do.

  • But, that’s not going to happen :-). Even if the g++ maintainers decided that they didn’t care about ABI backwards compatibility anymore, and it became impossible to distribute C++ code that worked across Linux distros, then according to PEP 600 that would simply mean that putting C++ into a manylinux wheel was non-compliant – basically PEP 600 already handles this case. Of course we’d still have to figure out some solution for distributing C++ wheels, but PEP 600 would continue to work fine for all the C wheels out there; we’d just need to extend it with some extra C++-specific tags, not throw it out and start over.

  • But, that’s not going to happen either :-). The g++ maintainers aren’t going to throw away ABI backwards compatibility; they’ve explicitly committed not to do that, and they spend a huge amount of effort on making sure that C++ code built using an older g++/libstdc++ will continue to work with a newer libstdc++, which is what we need. The tensorflow issue isn’t a generic problem with shipping C++ code on Linux; it’s some specific bug in libstdc++, or in the old Redhat devtoolset compilers that the manylinux1 image uses (← this is my guess), or in auditwheel, or in some combination of the three, that’s being triggered by some specific thing tensorflow is doing. PEP 600 is already clear about how to handle this situation: it says that it’s the tensorflow maintainers’ job to figure out how to make their wheels stop crashing. If this were some generic problem with all of C++, then we’d need a more generic solution, but since it’s actually just a specific bug in one specific C++ toolchain/package combination, PEP 600’s solution is the right one.

tl;dr: the tensorflow bug is frustrating, but it doesn’t affect how we tag manylinux wheels, and even if it did, then PEP 600 would still be the right next step.

This should work in general, yes. PEP 600 makes it explicit through the “play well with others” rule: compliant manylinux wheels are required to work in whatever environment they find themselves, and that includes environments that contain arbitrary locally-compiled copies of tensorflow. But, this isn’t really new in PEP 600 – it’s always been an explicit goal of the manylinux work. It’s just that before, we thought that it was so obvious that it didn’t occur to us to write it down :slight_smile:

In the specific situation you describe: the tensorflow bug is somehow related to the janky old compilers used in the manylinux1 image, which use a non-standard, unmaintained fork (!!) of g++ that uses custom hacks to try to support new C++ on old systems. We know this because the bug went away when tensorflow switched to the manylinux2010 image, and the only significant difference between the manylinux1 and manylinux2010 image is that manylinux2010 has a newer, closer-to-standard toolchain.

In your scenario, the user is compiling tensorflow locally using their regular toolchain, not the weird manylinux1 toolchain. So, I’m pretty sure it would Just Work. But, if it didn’t work, then PEP 600 is unambiguous: that would be a bug, and the tensorflow and/or pyarrow maintainers would be responsible for finding a fix or workaround.

2 Likes

Actually on further thought, I might be wrong about some detail here: maybe the bad tensorflow wheels were actually not using the manylinux1 toolchain, but rather some manylinux2010-ish toolchain that google cooked up internally? Sorry, the details on this issue are super confusing and poorly documented, and I don’t want to spread false information.

But: we do know that tensorflow and pyarrow were using some combination of toolchains with weird hacks to try to make new C++ work on old systems, that the segfault backtraces showed code that we know is involved in those hacks, and that the crashes have apparently gone away as everyone moved towards newer more-standard toolchains with fewer hacks.

And, maybe most importantly for the manylinux PEP discussion: we know that g++/libstdc++ upstream explicitly guarantee that everything we want to do is supported and should work, except for these special unsupported toolchain hacks. And if the hacks continue to be a problem then it doesn’t affect manylinux in general, it just means we’ll have to switch back to the regular toolchains, and projects that want to use the latest C++ features will have to target a newer manylinux profile.

This is definitely something that many users of Python on Linux will expect to work, and in at least a majority of cases it should indeed work. I think we might even be in a place where we could declare it to be a bug in CPython and/or pip if it didn’t work, as long as both of the wheels contained only “ordinary C” code, for some value of “ordinary C” to be determined. (The line probably falls somewhere among the features added to C in the 2011 revision of the standard, i.e. C11.)

On the other hand, we know this doesn’t always work when C++, threads, static initializers and deinitializers, and esoteric features of the dynamic linker get involved. How hard would it be to guarantee it in all cases? At least as hard as a proper investigation of the original tensorflow-vs-pyarrow bug – see the reply to Nathaniel that I will be posting shortly. I think it’s doable, but I don’t think we’re there yet.

This is basically the same reaction I got from the tensorflow build group. They saw crashes with their allegedly-manylinux1 wheels, and no crashes with their allegedly-manylinux2010 wheels (loaded next to the same allegedly-manylinux1 wheel of PyArrow, IIRC) and they said, ok, problem solved, we’re done here.

We are not done here.

Until someone finds out the true root cause of the original crash, and determines why the problem went away with the allegedly-manylinux2010 wheel of TensorFlow, we do not know that the hacked toolchain really was at fault, we do not know whether similar crashes might recur in the future, and we do not know what we actually need to do to make C++ code work reliably in manylinux wheels.

I no longer know enough about how C++ is implemented to root-cause it myself. But I can say with confidence that a comprehensive analysis of this bug will require at least a full week of investigation by an expert, and that expert’s report will be several thousand words long. Anyone who comes in with less than that, I’m not going to believe they actually understand the problem.

I honestly don’t think we know that yet, either. I have not yet seen an actual list of “everything we want to do” with vendored shared libraries and/or C++ – it’s not in any of the manylinux PEPs as far as I can tell – and, in the absence of the aforementioned expert report, I don’t believe we know whether “everything we want to do” is supported.

@njs, you just caused me to throw away half an evening’s work composing a reply. I hope you’re pleased with yourself :grinning:

I mostly agree with what you say, so I’ll just pick on some high spots:

  1. I agree with you regarding deferral. After all, the whole point of manylinux2014 was to not be substantially different from manylinux2010, so no, I see no value in waiting even longer.

  2. I don’t think we can entirely ignore the possibility that the glibc version isn’t a sufficient heuristic. After all, unlike previous PEPs, the whole point of perennial manylinux is to avoid needing changes in the future, and while we can’t predict everything, we should at least do due diligence. Having said that, if no-one is prepared to do the work to back up the claim that the crashes demonstrate a need for a better heuristic, then tough.

  3. How much input has there been from package maintainers into the PEP? Are they OK with incompatibility crashes being the responsibility of the projects? What about end users? The PEP is basically saying that what happened with tensorflow/pyarrow is fine. I didn’t get the impression that users thought it was fine at the time…

OK, so precisely what do you want to happen next? How do you propose to deliver that week’s worth of expert time, that several thousand word report, and the resources needed to read that report, understand it, and formulate a proposal based on it?

I’m sympathetic to the view that rushing into a solution with known problems is not ideal. But nor is endlessly doing nothing in pursuit of a perfect answer.

At this point, I’m starting to feel that the point of the PEP process is to ensure that all views have been considered and responded to; stalling forever in the hope of a unanimous agreement that will never come is counter-productive. So unless you have a plan of action, I’m inclined to consider your comments on the C++ compatibility question heard and responded to, and move on.

I still want to hear from @ncoghlan and @dstufft, and I’d like some indication of package maintainer and end user response, as noted above (although we may not be able to get the latter). So maybe we can put the tensorflow crash issue on the back burner for a day or two? No decision will happen in that time, so there’s no pressure to get everything responded to tonight.

1 Like

I think I finally understand why you think perennial is done and I don’t.

Perennial aims to be the final PEP on the subject of what makes a “manylinux” wheel. But it only covers the same subject matter that the earlier PEPs (for manylinux1, -2010, and -2014) covered. You think that’s all that it needs to cover. But I think it also needs to cover the process of updating everyone involved from version X to version X+1. And it is exactly that process that has never yet successfully been completed. The ml2010 and ml2014 PEPs have been published, but most everyone is still using ml1!

My position is that until we carry out one of those migrations successfully, we won’t even know what the missing pieces of the perennial design are, and therefore accepting the PEP would be premature.

I think I may have misstated my concern here. Whether the version number of the tag is based on glibc version numbers, years, or whatever doesn’t prevent someone (the PEP writers, under the old process; the auditwheel and build-image maintainers, under perennial) from specifying that official “manylinuxWHATEVER” wheels are to be built using glibc vX, g++ vY, etc. And that certainly ought to be enough to deal with the actual C++ binary compatibility problems we have observed to date. (I reserve the right to retract this statement based on the conclusions of the root-cause analysis for the pyarrow+tensorflow crash.)

The problem is what happens if someone builds unofficial wheels in their custom lash-up environment that has, I dunno, glibc 2.12 but the very newest C++ compiler, and calls that a manylinux_2_12 wheel and ships it to their fanbase - probably outside PyPI - and then PyPA takes the blame when it doesn’t work properly. You know, exactly like what happened with TensorFlow and manylinux1. :wink:

With year-based versioning we have a nontechnical defense against this kind of behavior: we can say “that’s not really a manylinux2010 wheel.” I fear it will be harder to make that argument with version numbers that are explicitly tied to a glibc version number but not to anything else.

1 Like

First off, I want to ask you to read my second reply to @njs, which makes what I believe to be a much stronger case for deferring PEP 600 (based on lack of experience with the actual migration process).

Regarding TensorFlow, and specifically your question about how to deliver that week’s worth of expert time and the several-thousand-word report:

@sumanah and I have tentative plans to scare up funding to hire someone with current experience with the guts of the GNU Compiler Collection, to put in that time and write that report. Once we have it, I think I can promise to produce a proposal.

I can’t commit to a timeframe for any of that at the moment, but since I think PEP 600 should be deferred until after the manylinux2014 rollout has happened anyway, and that’s going to be several months at least, I don’t see that as a problem.

I mean, I basically agree with this – I would very much like to understand wtf happened here, because it definitely indicates some kind of gap in our understanding.

But all our target platforms are super complicated, no-one understands them fully, and they keep changing, so we’re always going to have gaps in our understanding. It’s not reasonable to say “we have some evidence that there’s a bug somewhere and we’re not sure if it’s gone or not. SHUT EVERYTHING DOWN UNTIL IT’S FIXED”. And it’s particularly unreasonable to say we need to shut down PEP 600 but not manylinux1/manylinux2010/manylinux2014, since they’re the same at the technical level; the difference is in how we manage the human parts, policy and coordination.

Uh… sorry?

Yes and no… the point of PEP 600 is to avoid the overhead of copy/pasting the same policy from version to version, for the core case of Linux wheels that run on a broad set of mainstream systems. I expect we’ll still have PEPs to define new wheel tags in the future; it’s just that they’ll be focused on solving genuinely new problems, like “wheels that work on Alpine”. I think Zack is being over-pessimistic when he says we’ll need a way to tag C++ versions in wheels, but if it turns out I’m wrong then we’ll add a new tag scheme for that (e.g. manylinux_${glibc version}_with_c++_${libstdc++ version}), and the PEP 600 tags will remain useful for all the non-C++ projects.

I’m not sure where you’re getting this impression… the text says:

Example: we’ve observed certain wheels using C++ in ways that interfere with other packages via an unclear mechanism. This is also a violation of the “play well with others” rule, so those wheels aren’t compliant with this specification.

“Not compliant” != “fine” :slight_smile:

I’m not sure what kind of input you’re looking for… at the PEP level, basically the only two things we can say are “incompatibility crashes are great” or “incompatibility crashes are bad”; if we say they’re bad then it means we think someone should fix them, but a PEP can’t force any specific person to do that work. And I’m pretty sure we’re going to stick with “incompatibility crashes are bad” no matter what input we get from package maintainers :-).

The PEP is explicit that the PyPI maintainers have the right to block packages that they think will cause problems, but it leaves the exact checks and mechanism up to their discretion.

I won’t claim it’s impossible for new issues to be discovered here… it’s always possible to discover new issues with software systems, any time you do anything with them. But I don’t understand why you think this specifically is a high-risk situation where we need more data. We’ve seen tons of projects migrate from no-linux-wheels to ml1, we’ve finished all the ecosystem-level work for the ml1→ml2010 transition, and fundamentally ml2010 is exactly like ml1 except that you’re allowed to use some newer features. There are >1000 different ml2010 wheels on PyPI now, and I haven’t heard of any issues. It’s true that there are lots of projects that haven’t switched yet, but I’m really struggling to think of any plausible mechanism for how the next project switching ml1→ml2010 will discover some deep problem that forces us to completely rethink how we handle linux wheels. Can you give an example of the kind of issue you’re worried about?

Have you had a chance to read PEP 600 yet? I tried to make it incredibly, painstakingly explicit that your hypothetical wheel using the very newest C++ compiler is not a valid manylinux_2_12 wheel, just like it’s not a valid manylinux2010 wheel. So we have exactly the same non-technical and technical defenses either way. And if that isn’t explicit in the text, then I want to know :slight_smile:
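
To illustrate what I mean by a technical defense, here is a rough sketch of the kind of check auditwheel performs (not its actual implementation; it assumes binutils' readelf is available): the versioned symbols an extension module references pin down the oldest glibc/libstdc++ it can honestly claim to support.

```python
# Illustrative only: find the highest versioned-symbol requirements
# (GLIBC_x.y, GLIBCXX_x.y.z, CXXABI_x.y.z) that an ELF file references.
# A module requiring, say, GLIBC_2.14 cannot honestly be tagged
# manylinux_2_12, no matter what toolchain produced it.
import re
import subprocess

def max_symbol_versions(elf_path):
    """Map each symbol-version namespace to the highest version referenced."""
    output = subprocess.run(
        ["readelf", "--version-info", elf_path],
        capture_output=True, text=True, check=True,
    ).stdout
    maxima = {}
    for m in re.finditer(r"\b(GLIBCXX|GLIBC|CXXABI)_(\d+(?:\.\d+)+)\b", output):
        name = m.group(1)
        version = tuple(int(part) for part in m.group(2).split("."))
        maxima[name] = max(maxima.get(name, version), version)
    return maxima

# e.g. max_symbol_versions("_ext.cpython-37m-x86_64-linux-gnu.so")
# -> {'GLIBC': (2, 14), 'GLIBCXX': (3, 4, 22)}
```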

PEP 600 is just codifying the rules that we already use to maintain all the previous manylinux PEPs. So if the manylinux2010 PEP and PEP 600 disagree about whether a wheel is valid, then that’s a bug in the manylinux2010 PEP.

IMO what @zwol is saying about investigating what caused these crashes makes sense. We should figure out a strategy to avoid them. The funding and grant work to get some expert to understand those issues is more than welcome. No one is opposed to that AFAICT.

However, blocking a process improvement isn’t going to do much (anything?) toward addressing that issue. PEP 600 is essentially an independent process improvement, that’s not going to affect if/when these kinds of issues occur with our current manylinux scheme – it’s only reducing how much of a process overhead there is.


Based on some second-hand experience of seeing folks struggle with these crashes in these packages, they usually assign “blame” to:

  • the installer, because it installed what are incompatible packages. (I’ve had not-so-happy folks come to complain about this to me. :upside_down_face:)
  • the projects, if they are aware that projects are responsible for publishing binary wheels. (pyarrow doesn’t work with Tensorflow so something’s wrong with it or vice-versa)

So, based on this experience, I don’t think that the PEP should really take a stand on this. I’d rather leave it to the discretion of PyPI admins to figure out the exact details of how to determine “problematic” packages.

That said, I feel it’s pretty reasonable to expect the responsibility for this to be on package maintainers/publishers.

2 Likes