The next manylinux specification

This remains my most fundamental reason for preferring manylinux2014 over perennial manylinux.

Notwithstanding @njs’s comments that “we actually have an implementation defined standard” right now, what we do have right now, and in manylinux2014 we will continue to have, is a documented set of rules stating what it means to conform to the relevant spec. I don’t buy the argument that “if auditwheel says one thing and the PEP says another, the PEP is wrong”. The reality is that it’s OK for the PEP to need updating, and it’s OK for the updates to not get done in a timely manner - people are human - but you don’t need to read the code of auditwheel to know what the standard is. If you write something that satisfies the standard, and auditwheel rejects it, you can raise the issue and have a debate about whether the standard needs updating, but you can reasonably start from the presumption that the standard is what you work to.

The perennial manylinux spec abandons that and says that you must use auditwheel. And that auditwheel’s word is final - even if you don’t understand why auditwheel says what it does. That is, to me, fundamentally the definition of “implementation defined behaviour”.

I see the practical benefits of not needing a new PEP and an installer change each time we want an updated baseline. And I can see that for people doing the actual grunt work of building wheels, all of this may be irrelevant in practice. But I think there is value in having standards that an outsider (like myself!) can point to and say, “I don’t understand the details here, but I can see that this is a cleanly defined and implementable definition of what’s required”.

So right now, I feel the onus is strongly on the supporters of perennial manylinux to address the concern about “implementation defined behaviour”, not by saying that it doesn’t matter, but by providing a definition of the behaviour that’s external to the implementation. If the PEP process is too onerous, then that definition can be in a spec that is managed and change controlled by a different process, but it should be available somewhere. (At the end of the day, I’m surprised if anyone thinks that a piece of code as complicated as auditwheel should be managed without a written spec - all I’m asking is that if the spec will no longer be managed via PEPs, what is the proposal for managing it in future?)

I don’t actually see any other discussion happening here. The two proposals have been put out there, and there’s mostly silence. So I’m inclined to say that there’s no consensus to be reached here, we’re just looking at a straight choice between two proposals, and it’s ultimately going to be down to me to make that choice. So let’s hear the arguments, and when they die down I’ll take that as a signal that everyone’s ready for a final decision to be made.

(However, if there’s any significant work going on “behind the scenes” that means there is need for further discussion and a realistic chance of a merged compromise proposal, then someone speak up. I’m not trying to shut down discussion here, merely avoid an extended “so what do we do next?” period).

And as background for me, if someone could point me at some instructions on how I, as a package developer working mostly on Windows, would take a package of mine and build manylinux binaries for it, that would be really helpful. I’m aware that I’m coming from this with knowledge that’s mostly focused on the “packaging tool” perspective, and I’d like to balance that with some “project maintainer” understanding. Thanks.

I don’t think it’s irrelevant. It’s important to have a realistic idea of what you can expect to build against. If manylinux1 had been standardized without a PEP, then building wheels would end up a game of reverse-engineering the auditwheel implementation.

I think people agree that the main benefit of ‘perennial manylinux’ is avoiding the need for pip updates to propagate, not avoiding PEP-writing. So would it be feasible to define the glibc-based version scheme, so that pip can be adapted to accept future manylinux tags without requiring updates each time, while still requiring a PEP to define each manylinux profile?

This could either be an end state if we’re happy enough with the compromise, or an intermediate step if those arguing for new profiles without accompanying PEPs convince others of their case.

And as background for me, if someone could point me at some instructions on how I, as a package developer working mostly on Windows, would take a package of mine and build manylinux binaries for it, that would be really helpful.

I’m not aware of good docs around this - the binary extensions for Linux section of the packaging user guide is very brief - but the manylinux demo repository might be the best starting point. You’ll need docker, either in a local VM or on a CI service.

That sounds like a reasonable compromise proposal to me. But ultimately the question is whether people like @dustin and @njs could get behind such a proposal.

Thanks for that, but wow, that looks like much more than I’d want to work out for myself. (Not that I’m under any illusions that producing Windows binaries is any easier for Linux users). I think I’ll pass until I have an actual need for it :slight_smile:

I am not specifically opposed to it, though it depends on the specific details. Although part of me wonders if the reason this is as painful as it is, is that we’re being reactive rather than proactive in when we define new manylinux specs. For instance, when we produced manylinux1, we used CentOS 5 because it was the right answer for people actually producing wheels at the time, but we knew even then that we were going to want a newer one eventually. Likewise we produced manylinux2010, and it seems fairly obvious that manylinux2010 isn’t going to be the last one we produce, and I assume the hypothetical manylinux2014 isn’t the end state for manylinux either.

Maybe the right answer is that right now we should be looking at defining manylinux2014 to handle the immediate need, and manylinux2019 (I think?) for future use. We wouldn’t actually be recommending that anyone use manylinux2019 now except for very cutting edge projects, but if we defined it ASAP and got it into pip and such, the propagation could start happening sooner.

IOW, maybe this pain primarily comes from waiting until we need it rather than looking into the future and being proactive.

:confused: It doesn’t seem that bad. There’s some Travis config to run docker images, and a ~20 line script that runs in those docker images to build and fix the wheels. The rest of that repository is files that would be there anyway.

Of course, it may be more complex for real-world projects, e.g. if you have to compile some binary dependencies first. But many projects with a compiled ‘speedups’ module can probably work a lot like the example.

I feel like I’ve explained this like 5 times already, but… the idea of perennial manylinux is that there is a spec: to be manylinux_X your wheel has to work on the systems it claims to work on. We can quibble over the wording, but the idea is really not complicated.

Luckily, there’s no requirement that PEPs have to be complicated :-). And this spec is precise enough to give unambiguous answers about whether wheels are compliant or not, and to answer the question of whether something is a bug in auditwheel or not. It’s actually more useful than our old specs were. And it’s much more formal and precise than anything we have for windows or macos wheels.

So that’s the spec. Then there’s also the question of how maintainers learn what they need to learn to actually produce wheels that follow the spec. That’s always going to involve a lot of fiddly detailed technical information and special cases. In extreme cases you might need to learn arbitrary amounts about your toolchain, your system’s history, etc. etc. On windows and macos, the answer is “go spend a few weeks reading whatever random tutorials, wiki pages, etc. you can find”. On linux, we have a whole pre-setup toolchain and end-to-end tutorials for those who want them.

But let’s say you’re not interested in using pre-made tools like auditwheel, you’re someone who wants to DIY. In that case – well, it’s up to you, after you’ve read the PEP you can pore over wiki pages and figure things out from scratch. Totally viable if you’re into that. But we also have an exhaustive, detailed, completely unambiguous reference describing every detail of what we know, which is the auditwheel source code. This is, again, much better reference documentation than anything available on windows or macos. It’s also much better reference documentation than what you want to replace it with, because it gets tested, a computer has verified that all the statements are unambiguous, etc. And then if you need some exegesis, there’s also a bunch of English descriptions of motivating cases, bug fixes, etc., all cross-linked in the auditwheel issue tracker. Pure gold for someone trying to figure this stuff out.

The things I’m proposing are the things I wish we’d had available back when we were figuring out how to make manylinux1 and auditwheel and all that in the first place. I find it weird that everyone’s saying “but think of the people trying to make independent implementations!”, but no-one’s willing to listen to the only person in the conversation who’s actually done that (with a lot of help from others, of course).

It’s not a spec, it’s a circular definition. It’s unhelpful for packagers and users. For example it doesn’t say what the allowed base libraries are (is linking to the system libcrypto allowed? what about libreadline? etc.).

It’s not circular at all. It’s legal for a manylinux_X wheel to link to system libcrypto if it works on all the systems that meet the manylinux_X definition. That’s an empirical question that could be either true or false independently of the spec. In the case of libcrypto, we checked, and it doesn’t work (openssl has lots of ABI variation in the wild). Therefore, the spec forbids it. The role of auditwheel here is to serve as a repository of these empirical facts, not a normative spec.

You might say well, the spec shouldn’t depend on empirical reality, it should be self-contained. It would be nice if that were possible! But it’s not. This is already the actual spec we use, and whenever the PEPs disagree with this spec we have to fix the PEPs to match the spec. It’s just that right now the actual spec isn’t written down.

If you want to understand my confusion, imagine you were working on a project and everyone was very insistent that you should only commit .pyc files to git, not .py files.

I think you have, but I don’t think you’ve realised that I still have no idea what you mean. And from what I can see, there are a lot of other people struggling to understand you, too. I’m genuinely trying to get what you’re saying, but all I keep hearing is circular definitions. You clearly don’t think that they are circular, but whatever insight you have as to why they aren’t is somehow getting lost.

Let me try again.

There are two parts to the spec:

  1. What does it mean for a wheel to be “manylinux_X” compliant?
  2. What does it mean for a system to be “manylinux_X” compliant?

I think you’re focusing on the idea that a wheel is manylinux_X compliant if it works on all manylinux_X compliant systems. Let’s assume I’m OK with that (I have some reservations, but on the whole I’m willing to go with it). But that still leaves open the question of how I know if a system is manylinux_X compliant. And I don’t know how you are specifying that - there’s no document I can refer to which lets me check if my system is manylinux_X compliant, and no set of checks I can run to do so.

And suppose I take it the other way around and assume that a system is manylinux_X compatible if it runs manylinux_X compatible wheels. I’m much less happy with that as a definition, but once again, let’s go with it. Now, how do I check if my wheel is manylinux_X compatible? If the answer is “run auditwheel on it”, do you then see that this means we have a definition that’s implementation defined (specifically, defined by the implementation of auditwheel)? And that’s not sufficient, for all of the reasons why we have standards in the first place, to avoid implementation-defined behaviour. And if the answer is something other than “run auditwheel on it”, I don’t know what that answer is, so a (possibly repeated) explanation would be helpful.

Whichever way you choose to frame the argument, one of the parts needs to be clearly defined, and you don’t seem to be doing that. Which is why people feel your arguments are circular.

If you’re still struggling to understand my confusion, here’s another suggestion - can someone else who’s arguing for the perennial manylinux proposal try to clarify? Maybe a different perspective will help break the impasse here. Maybe @takluyver can help to clarify? Once we can get past this issue of not understanding each other, we’ll hopefully be able to get the discussion moving forward again.

Correct.

The draft PEP answers this in the section Platform compatibility. The draft wording might not be perfect, but basically the idea is that a system is manylinux_X compatible if:

  • It uses glibc X or newer
  • It hasn’t been explicitly configured (via the magic _manylinux module) to declare that it should not be considered manylinux_X compatible

And these are the systems that the wheels are expected to work on.

This is the same definition of system compatibility used in the previous manylinux PEPs. It’s sufficiently well-defined that pip can compute whether it’s running on a manylinux_X system via totally automated means. I guess it’s true that we don’t provide a standalone program that does this, but there’s sample code in the previous PEPs that we could copy in here too if you want. This is also how we all sort of assume the macosx tags work – a macosx_10_10 wheel “should work” on all macOS 10.10+ systems. And none of us understand 100% of what that means in terms of which symbols you can use from which libraries, etc., but we all know how to identify a macOS 10.10+ system, and it turns out that’s enough.
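The system-compatibility check described above is concrete enough to sketch in a few lines. The following follows the shape of the sample code in the earlier manylinux PEPs (a glibc version check plus the magic `_manylinux` override module); the function names and version-string inputs here are my own for illustration, not pip’s actual implementation:

```python
# Sketch of the system-compatibility check from the manylinux PEPs.
# Hypothetical helper names; the `_manylinux` override convention is the
# one described in PEP 513/571.

def glibc_version_ok(sys_version: str, required: str) -> bool:
    """True if the system glibc (e.g. "2.17") is >= the required version."""
    sys_major, sys_minor = (int(x) for x in sys_version.split(".")[:2])
    req_major, req_minor = (int(x) for x in required.split(".")[:2])
    return (sys_major, sys_minor) >= (req_major, req_minor)

def is_manylinux_compatible(tag: str, required_glibc: str, sys_glibc: str) -> bool:
    """Decide whether this system should be considered `tag`-compatible."""
    # First honor an explicit opt-in/opt-out via the magic _manylinux module,
    # if the distributor has installed one.
    try:
        import _manylinux
        override = getattr(_manylinux, tag + "_compatible", None)
        if override is not None:
            return bool(override)
    except ImportError:
        pass
    # Otherwise fall back to the glibc version check.
    return glibc_version_ok(sys_glibc, required_glibc)
```

So, for example, a stock CentOS 7 system (glibc 2.17) would come out manylinux2014-compatible under this check, with no human judgement involved.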

In practice I guess there’s also a bit of “reasonable person principle” here – if someone tries to game the system by like, first installing Debian and then deleting a bunch of system libraries (but not glibc), and then insists that this means everyone building manylinux wheels has to somehow make them work on their broken system, then we’re obviously going to ignore them, just like we would if someone installs macOS and then deletes a bunch of system libraries, or is using some weird hackintosh setup. In practice this has never been an issue so far. Some projects like tensorflow have decided that they don’t want to support all the manylinux1 systems out there, but there’s never been any question about which systems are manylinux1 systems.

No, I’m doing the other argument, not this one :slight_smile:

I guess one way to put it is: the manylinux1/manylinux2010/draft-manylinux2014 PEPs are all overspecified. They have a definition of a manylinux_X system and a definition of a manylinux_X wheel, and the two definitions are totally independent. We hope they match, we tried hard to make them match, but there’s nothing to guarantee they match, and over time they tend to get out of sync and then we have to fix the PEP.

The perennial manylinux idea is to pick one of these and make it the “source of truth”. And between the two, the system definition is more fundamentally connected to what we’re trying to accomplish, we have less control over it (people are going to use whatever systems they use; it’s encoded into pip so changes take a long time to propagate), and it’s empirically more stable over time (it hasn’t changed at all since manylinux1). So we should standardize the system definition, and then use that to derive the rules for wheels.

I think it helps that macOS comes from a closed, centrally controlled ecosystem, because we all inherently understand when I say “go buy a Mac with macOS 10.14 installed on it” versus “get me a PC with manylinux2010 support” (I couldn’t tell you which .so files or what glibc version would be considered compatible, just that it’s based on the oldest supported CentOS version).

I know for me the tricky bit with the perennial manylinux proposal is what beyond glibc matters? The PEP says:

As with the previous manylinux tags, wheels will be allowed to link against a limited set of external libraries and symbols. These will be defined by profiles in auditwheel.

That does run into the worry that @pf_moore has about the spec being tool-defined. Now maybe I’m worrying too much about the potential compatibility of those modules or their existence, but that would seem to potentially be a point of fluctuation between manylinux versions (I don’t know if it has been yet, though).

It’s definitely a cause of fluctuation – for example, since the original manylinux1 PEP, Fedora has changed their soname for libncurses and for libcrypt (the one that’s part of glibc, not the libcrypto that’s part of openssl), so we can no longer rely on those being available everywhere. Other fluctuations have come from changes between Python versions (there are some Python C API symbols we’ve had to disallow because they aren’t ABI stable), and general increases in knowledge (we didn’t discover that linking to libpython.so should be disallowed until some time after we thought we’d finalized the manylinux1 spec). But none of these are fluctuations between manylinux versions. When the latest Fedora comes out and does something weird, we have to go back and fix all the manylinux specs, because they’re all supposed to work on Fedora.

OK. So basically all of the requirements on additional libraries are being dropped? That’s fine, I’m not going to make a judgement on what the spec should be, that’s for the experts to decide. But I think it needs to be clear that this is a point of difference between the 2 proposals.

As with the previous manylinux tags, wheels will be allowed to link against a limited set of external libraries and symbols. These will be defined by profiles in auditwheel.

This, as @brettcannon says, is where I am not comfortable, as this is explicitly an implementation-defined limitation. Adding this to your definition above, you’re saying that the requirements on additional libraries actually aren’t being dropped, they are just being defined by auditwheel and (unless this is mentioned somewhere in the PEP that you didn’t quote above) that additional restriction isn’t explicitly called out as part of the spec.

It’s actually a bit worse than that: as you say, the PEP defines what it means for a system to be compatible, and defines compatibility for wheels by inference, as “must work on compatible systems” - and yet auditwheel tests wheels, not systems - so we’re left with a situation where there’s no complete spec for testing the compatibility of either.

OK, so as things stand, I now better understand your position. I disagree with it, but I’m not the expert in this area, so I’m not going to make a unilateral decision here. The people you have to persuade are the people arguing for manylinux2014, who have mostly stayed silent so far in this latest discussion. So I’m going to back off at this point and let them have their say.

I’ll pick things up again once the latest round of discussion has died down, and we can assess whether we’re at a point where a decision can be made.

Are you saying the current PEPs are lying?

No, the additional library requirements aren’t changing in any substantive way.

The detailed list of libraries was always part of the “wheel” definition, not the “platform” definition. Basically the spec said “you can use any libraries that are present with compatible ABIs on all the systems that meet the manylinuxX definition, and our best guess as to that list is: …”. And then whenever we learned more, we updated that list.

With perennial manylinux, the spec just says “you can use whichever libraries are present with compatible ABIs on all the systems that meet the manylinuxX definition”, and then it’s the implementer’s job to figure out which ones those are. And as one particular implementer, auditwheel will continue to keep a list, which we’ll update as we learn more.
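To make the division of labour concrete: the list lives in the implementation, while the mechanical part of the check is trivial. Here is a toy sketch of the kind of check an auditwheel profile encodes; the `MANYLINUX_ILLUSTRATIVE_WHITELIST` contents and the function name are made up for illustration and are not auditwheel’s real profile data:

```python
# Toy version of a profile check: compare the external shared libraries a
# wheel links against with a list of libraries believed to have stable
# ABIs on all conforming systems. The whitelist below is illustrative
# only, not auditwheel's real profile.

MANYLINUX_ILLUSTRATIVE_WHITELIST = {
    "libc.so.6", "libm.so.6", "libdl.so.2", "libpthread.so.0", "librt.so.1",
}

def check_external_libs(needed_sonames, whitelist=MANYLINUX_ILLUSTRATIVE_WHITELIST):
    """Return the sonames that are NOT on the allowed list.

    An empty result means the wheel only links against libraries we
    currently believe are safe to assume on conforming systems.
    """
    return sorted(set(needed_sonames) - whitelist)
```

Under this framing, a wheel linking only against glibc pieces passes cleanly, while one linking against the system libcrypto gets flagged - and when empirical knowledge changes (say, a distro bumps a soname), only the list changes, not the check.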

Yeah, if this is the sticky point then that wording will need to be changed. In particular the word “defined” shouldn’t be there. Are you happier with how I put it up above?

We have a complete testable spec for systems. For wheels, our options are to either have a complete spec, or else have a useful spec. Pick 1. Because ultimately, “useful” means “it works for users”, who are using whatever systems they’re using. To the extent we understand what they’re using, we can encode that in auditwheel and test it, but it’s never going to be 100% complete. Fortunately, it works well enough in practice.

No-one knows how to look at a wheel and tell for sure if it will work on Windows 7 or macOS 10.10, either. We just trust individual maintainers to figure it out, and they generally do.

The current PEPs are correct that if you meet the requirements that they specify, then you have, tautologically, met the PEPs’ requirements. But that doesn’t guarantee that the wheels will actually work for users, and yeah, right now when auditwheel and the PEPs disagree, the PEPs are probably wrong. And if following the PEPs’ requirements produces a wheel that doesn’t work for users, then the PEPs are definitely wrong.

Going back to one of my first posts on this whole idea:

in practice the definition of a manylinux1 wheel is “I promise this wheel will work on any system with glibc 2.5 or greater and an Intel processor”.

But most maintainers have no idea how to actually fulfill that promise, which is where the docker image and auditwheel come in

But do they actually disagree and, if so, why wasn’t the PEP updated to reconcile them? That was my question, not some kind of abstract philosophical puzzle.

I haven’t done a detailed audit to see exactly which constraints on compatibility are (1) known, (2) encoded in auditwheel/the build images, (3) encoded in the PEPs. When the manylinux1 spec was first written then all three lists were the same, but since then there have been a lot of changes, and we’ve tried to keep them close, but as you know it’s easy for things to get dropped on the floor and I doubt we’ve done a perfect job.

And in practice, updating the PEP is the lowest priority, because if auditwheel is wrong then people yell at us and their wheels break, but if the PEP is wrong then nothing bad happens and no-one notices.

As I understand it, the non-compliant wheels produced by TensorFlow work fine… in isolation; they only crash when you mix correctly produced libraries and the TensorFlow wheels in the same process. It seems like under this definition, the TensorFlow binaries would be compliant - because they work! Until you install another library. Who’s to blame then? Without a defined spec for what is or isn’t manylinux compatible, it seems like both libraries would be pointing the finger at each other, and both would be correct?

Just a heads up: I’m AFK for the next 2 weeks, but I’ve specifically asked the rest of the folks I’ve been working with to chime in here.

Also, I’m sensing that this discussion is starting to get a little bit heated, so I’d like to ask y’all to please be mindful and welcoming. I’ve asked them to disclose their affiliations as well, so it’s clear what everyone’s motivations are.
