The next manylinux specification

Yes - we could accept manylinux2014 (the part about compatibility requirements that impacts auditwheel and build tools), and also accept perennial manylinux when it’s done (which covers the other parts of the mechanism - tag names and installers). This allows the people waiting to start work to get going now, at the cost that people involved in both pieces of work will have to juggle their time carefully. (Plus some rewriting of the manylinux PEP to reflect its changed scope, but that can happen after the fact).

There’s actually very little overlap between manylinux2014 and perennial manylinux. It’s mostly just “Platform Detection for Installers” (in manylinux2014) vs “Platform compatibility” (for perennial manylinux) and the actual tag name. The rest is just a question of whether we define the platform rules in the PEP or not (and that’s basically irrelevant to the work of updating auditwheel and the docker images).

But honestly, the only thing I am waiting on is some feedback from the manylinux2014 people (@dustin, @gunan, and @angerson) that they are happy with your comment that “the best use of our energy is on polishing off the last few details on perennial manylinux and moving onto implementation ASAP”. If they can get behind perennial manylinux, I’m happy to say that we go with that option.

I’m not suggesting we “encourage” anyone to work on manylinux2014. All I’m suggesting is that we ask the people who have already committed themselves to working on it whether they would be willing to extend that commitment to working on perennial manylinux instead, and then decide based on the answer to that question.

But if they would rather put their efforts into implementing manylinux2014 as it stands, with perennial manylinux as the “next step”, then I don’t really see much downside in letting them - no-one has yet provided any objection to manylinux2014 other than “it diverts effort from a better long-term solution”, and if the effort being diverted isn’t willing to work on perennial manylinux as it now stands, where’s the loss?

I really want to make use of the already committed resources we have - and if that means approving manylinux2014 in the knowledge that it will be quickly followed by perennial manylinux, I’m fine with that.

I have no problem with finding an alternative to the PEP process, if people believe it’s too heavyweight. But I still believe that clearly documenting the requirements for future compatibility is important, and while I want to avoid emotive terms like “implementation defined standards”, I’m not the only one to have expressed this concern. Defining the process for ensuring that people can access the definition of a given compatibility level remains an outstanding question for perennial manylinux. But I’m not going to hold up the decision about where we focus our efforts just to address the procedural question of how we manage the compatibility level definitions.

100% agreed. Let’s put it this way, then. I’m away for 2 weeks from this Friday. Let’s aim for a single proposal by the time I get back, supported by (at a minimum) @njs and @dustin. I’ll approve that proposal on my return. The proposal can have “to be decided” sections (such as bikeshedding on exact tag names, and the exact process for defining future compatibility levels).

If we can’t agree a unified proposal [1] by then, I’d be looking to provisionally approve manylinux2014 to allow implementation work to proceed in parallel with any remaining discussion that’s needed - on the understanding that priority goes to the auditwheel/build environment work, as that’s the work that will be least impacted by the ongoing discussions.

[1] Which I’d expect to be based on the (perennial manylinux) principle that we can define a rule for installers to generate supported tags that will work indefinitely, and a rule to allow detailed compatibility specs to be attached to those tags.

However, I really hope that you’re right, that (a) everyone is comfortable with where the perennial manylinux proposal has ended up, and (b) the amount of work remaining on details for that proposal is minimal.


I think this is a good point. I haven’t heard any argument for manylinux2014 besides the likelihood of it being ready sooner than perennial manylinux. If we can agree about how & where future manylinux library profiles should be defined, I think we should use the enthusiasm for a new manylinux flavour to try to create a more long-term solution to some of the problems.

You’re convinced, but (as I see it), this was your argument all along. :wink: I’m open to the idea but not 100% convinced, and I get the impression other people in the discussion are less convinced than me.

Would it be acceptable to you to hammer out the tag naming so that platform compatibility can be defined based on a glibc version in the tag, but still have a PEP defining wheel compatibility for manylinux_2_17 or whatever we call it (aka manylinux2014)? Then we could have a further discussion about going ‘fully perennial’ and defining new compatibility levels without PEPs.

Hi all,
I work at IBM and contribute to TensorFlow (I also hopped over from the TF SIG Build group mentioned by @gunan and @angerson). Our group was indeed leaning toward backing manylinux2014 first, as mentioned a few times already. The impending EOL of the older CentOS versions is the original reason for haste, but the careful step over the C++11 standard hazards is another important reason why it sounded good to turn the crank one more time in a familiar way, solve those immediate issues, and buy some time to do a perennial spec properly.

We at IBM also have a keen interest in the multi-platform aspects that are in the manylinux2014 drafts. Specifically, ppc64le as an architecture is relatively new and wasn’t introduced until the CentOS 7 timeframe. A manylinux standard based on CentOS 7 would allow us to participate for the first time. Multiple-architecture support in general is new to manylinux specifications, and is perhaps more evidence that a two-step update to get to perennial is more prudent.

My final reason for pause on perennial is what it would look like, big picture, and how long it would take to decide. There may be more things to argue about on it, which could delay a final spec.

Now, some of the above is based on the perhaps incorrect notion that manylinux spec authoring and ratification is a long and slow process that could push out past when CentOS 6 goes EOL. Some in the community believe manylinux2010 is still not complete, and that is the only data point behind this conclusion.

The concerns I outline above are vague, I admit. I’m heartened to see optimism here on a plan for a perennial spec. If there are folks here that know better, and believe in a timely design and ratification of a perennial spec that would include the multi-architecture elements present in the 2014 draft, I’d happily change my opinion.

FWIW I don’t specifically care if the spec for what compatibility looks like lives in a PEP or not, but I do think that it is important that it lives in a standard document somewhere, and I don’t think that a JSON blob inside one project counts as a standard document. For one, it has no prose to describe why something is the way it is; for another, it’s been suggested already that the JSON document alone isn’t enough. If, for instance, the decision is that this all gets documented as part of packaging.python.org like other specs are, and we no longer need a PEP each time to update it, then that is ~OK by me.

But I do think it is important that there is a somewhat neutral/standard place where this all gets documented.

I’m not personally swayed by the argument that the PEP is already out of date because changes have been made to auditwheel that haven’t been reflected in the PEP. To me that sounds more like a failure of process in auditwheel, not a model to strive for.


If you’re used to standards bodies like IETF or ISO, then we’re much less formal than that. Basically the formal part of the procedure is just “Paul makes a decision, or appoints someone to make a decision”. This can be faster or slower – it can be faster because if something is uncontroversial we can just do it; it can be slower if the discussion gets confused or people wander away and lose interest.

I believe the perennial approach is just as friendly to alternative architectures as the 2014 draft.

I get the impulse to kick the can down the road, but I’d really rather get this finished now instead of spending more time on the wrong thing now and then having to rehash it all again next time.

It’s not changes in auditwheel that make the PEP out of date. It’s changes in reality: Fedora makes a new release, now the PEP is wrong. It’s true that auditwheel has been better at keeping up with reality than our PEPs, but the more fundamental problem is that this isn’t something we can control with a PEP any more than King Canute could control the tide.

So I’ve explained above why calling this a “standard document” doesn’t make sense – the word for a document that tries to describe reality but could be wrong is just “documentation”.

And even if we ignore the past PEPs, the documentation on what makes a portable Linux wheel is already way better than the documentation for what makes a portable Windows or macOS wheel. (For example: Windows and macOS don’t have anything like the JSON document at all!) So I know this isn’t what you intend, but I think the effect is you’re saying “it’s not good enough if Linux’s documentation is merely 10x better than the other platforms we support; if you want us to support new Linux wheels then you have to make the documentation 100x better.” This feels unfair and arbitrary.

(And yes, Windows and macOS have single corporations behind them. That doesn’t change the fact that making portable wheels for Windows/macOS is still a completely unspecified black art.)

My point, though, is that it shouldn’t be true that auditwheel is better at keeping up with reality. Something we’ve been pretty good about in pip recently is that when the standards aren’t giving us the behavior we want anymore, we insist on updating those standards first, and then we bring that new behavior into pip. If that’s not what has been happening with auditwheel, that is undesirable IMO and isn’t the model we should be looking to going forward.

Quite frankly, the hand-waviness around it makes me even more nervous about being OK with the perennial manylinux spec, because according to you the auditwheel project has effectively, IMO, been taking shortcuts with specified behavior.

The difference isn’t that macOS and Windows have a single corporation behind them, it’s that the standard for what you can expect is external to the packaging toolchain. The Windows platform defines what you can expect and what you can reasonably link to, not us. Hypothetically the same is true for a generic Linux wheel, except the only thing you can rely on is the kernel and nothing else, which isn’t very useful, so we’ve defined our own platform on top of that which attempts to be the subset of a number of different platforms all at once.

The reason why there needs to be a standards document is because we’re defining that platform; nobody else is, we are. We can try to hand-wave and say that we’re just documenting reality, but the truth is we have to make judgement calls about what can be reliably linked against and what can’t be. Just because the platform we’re defining is a subset of a number of other platforms doesn’t make it any less a platform in its own right, and we can’t expect the only definition of what that platform is to exist in one particular project’s code.

So no, I’m not saying that it needs to be 100x better than Windows and macOS. I’m saying that if you’re going to define a platform, then you need to actually document that platform; we don’t define Windows or macOS, so that responsibility doesn’t fall on us.


For sure, if we’re going to have a standard then we ought to keep it updated. But there shouldn’t be a standard describing the details of auditwheel’s checks, just like there’s no standard defining, I don’t know, the details of warehouse’s internal SQL schema. Of course documentation is good to maintain if and when we can, but there’s no reason to shame people for not keeping the warehouse SQL schema PEP up to date, because there’s no reason for that PEP to exist in the first place. Plus, during the period we’re talking about, we simply didn’t have the volunteer labor to maintain auditwheel well, so trying to make the PEP work better was in direct conflict with doing things that actually help users…

My argument isn’t “the PEPs aren’t always well-maintained, therefore we should get rid of them”. It’s “There’s no principled reason for having PEPs that go into the details of which exact checks auditwheel does, AND to further support that conclusion, we’ve run the accidental experiment of having inaccurate PEPs and nothing bad happened, AND we’ve found empirically that keeping the PEPs up to date is a significant drain on volunteer resources, therefore we should get rid of them”.

Also just, think about it as an engineer: when you have two sources-of-truth and are trying to manually keep them in sync and they still get out of sync, then the solution isn’t to blame people for not trying hard enough. The solution is to stop relying on humans to manually sync the two sources of truth.

Everyone agrees that we need to define the platform for platform tags. The platforms are:

  • win32: promises that this wheel works on recent-ish 32-bit Windows x86, where “recent-ish” is vague but maybe something like what it says in PEP 11
  • macosx_10_9_intel: promises that this wheel works on macOS systems, version 10.9 or greater, using either 32-bit x86 or 64-bit x86-64.
  • manylinux_2_14_x86_64: promises that this wheel works on Linux distros using glibc 2.14 or greater running on x86-64

So perennial manylinux actually has a more detailed platform specification than we have for Windows.

That’s not the disagreement. The disagreement is whether we also have a responsibility to go beyond that, and exhaustively document or standardize the behavior of “Linux distros using glibc 2.14 or greater running on x86-64”. Given that we have no control over what these distros do, trying to standardize their behavior would be a meaningless exercise in bureaucracy. Trying to document it is more useful, to help package maintainers figure out how to actually generate wheels – but that’s equally true for all three platforms; there’s nothing special about Linux there.
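
To make that last point concrete, here is a minimal sketch of the perennial compatibility check. The exact tag syntax (e.g. manylinux_2_14_x86_64) is still being bikeshedded, so treat the format as illustrative rather than final:

```python
# Minimal sketch of the perennial manylinux compatibility rule, assuming tags
# of the form manylinux_<glibc major>_<glibc minor>_<arch> (naming still TBD).
import re

def wheel_tag_matches(tag: str, system_glibc: tuple, system_arch: str) -> bool:
    """A wheel is installable if its architecture matches and it was built
    for a glibc no newer than the one on the installing system."""
    m = re.fullmatch(r"manylinux_(\d+)_(\d+)_(.+)", tag)
    if m is None:
        return False
    wheel_glibc = (int(m.group(1)), int(m.group(2)))
    return m.group(3) == system_arch and wheel_glibc <= system_glibc

# A manylinux_2_14_x86_64 wheel installs on a glibc 2.28 x86-64 distro:
print(wheel_tag_matches("manylinux_2_14_x86_64", (2, 28), "x86_64"))  # True
```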

I feel like positions are becoming entrenched, so let’s see if I can move things forwards.

I think we all agree that something needs to provide more specific guidance for wheel creators about compatibility than “Linux distros with glibc >= 2.x”. At a minimum, we want auditwheel to do this in the form of “here’s a wheel, is it compatible?”

Is it useful on top of this to have the rules auditwheel uses documented somewhere, whether we call this a specification or something else? It seems that the answer should be a clear yes, so long as it’s not an undue drain on resources and we’re confident we can keep it up to date.

So, can we keep it up to date without expending lots of effort on it? For the library and symbol lists in auditwheel’s JSON profiles, I imagine it wouldn’t be hard to autogenerate some human-readable documentation, so there’s a single source of truth (this is an idea @zwol proposed on the perennial PEP PR). Other specific rules encoded in auditwheel’s code are hopefully changing slowly enough that we can easily maintain written details by hand.
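
As a very rough sketch of what that generation might look like - assuming a policy file shaped roughly like auditwheel’s JSON profiles, with name, symbol_versions and lib_whitelist fields; the real schema, file location and output format would be up to the auditwheel maintainers:

```python
# Hypothetical generator: render an auditwheel-style JSON profile as
# human-readable reStructuredText for publication on packaging.python.org.
import json

def render_profile(profile: dict) -> str:
    title = profile["name"]
    lines = [title, "=" * len(title), ""]
    lines.append("Maximum symbol versions a wheel may require from external libraries:")
    for prefix, version in sorted(profile.get("symbol_versions", {}).items()):
        lines.append(f"* ``{prefix}`` <= {version}")
    lines += ["", "Libraries a wheel may link against without bundling them:"]
    for lib in sorted(profile.get("lib_whitelist", [])):
        lines.append(f"* ``{lib}``")
    return "\n".join(lines)

with open("policy.json") as f:   # illustrative path, not auditwheel's real one
    print("\n\n".join(render_profile(p) for p in json.load(f)))
```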

Would people be happy with partially-generated documentation living at e.g. https://packaging.python.org/specifications/ ? This would, I think, involve less effort than writing a PEP for each profile.


For the record, autogenerated human-readable documentation was not my idea, it was @dstufft’s originally, in this very thread. I just happen to agree that it would help.

I’m going to copy and paste here a bunch more of what I said in the perennial-manylinux PEP PR last Wednesday:

it seems to me that the remaining gap between proponents of [perennial] and proponents of manylinux2014 is mostly about documentation.

When people who are not comfortable with perennial say things like “…you can never definitively know that a wheel is manylinux X compatible, only that it satisfies the current recommendations. I’m not entirely comfortable with this” or “there’s no document I can refer to which lets me check if my system is manylinux X compliant, and no set of checks I can run to do so”, these appear to me to be requests for documentation.

Most of the pushback from the perennial side seems to be on the grounds that the current process for documenting what “manylinux X” means is burdensome, both because the PEP process is too heavyweight, and because it involves a bunch of work repeated in three places—the PEP, the auditwheel profile, and the blessed build environment. Their proposed alternative seems to be to abandon the idea altogether of writing a human-readable spec for each manylinux rev. And I can see where they’re coming from, but I can also see how this makes people uncomfortable.

[I like the idea of autogenerated human-readable versions of the auditwheel profiles because] The existence of these generated docs would reassure people that they don’t have to dig through the entire auditwheel codebase just to find the profile. It would also be a visible place for the rationale for each aspect of the profile to be documented, and thus to encourage auditwheel devs to write down those rationales. The “10% quirky stuff that involves code changes” (from the same post) could be dealt with by presenting the actual code as part of the generated docs: “A manylinux2020 system must have properties A, B, and C. To determine whether it has these properties, call these Python functions: [source code]”.

[Also,] I think it would be valuable if [the perennial] PEP went into some detail about the process of developing a new manylinux rev. It doesn’t have to be perfect, but it should be comprehensive enough that someone who hasn’t done it before could imagine themselves going through the process, guided by the PEP + existing auditwheel documentation (which the PEP would reference). For instance, there should be a list of decisions that need to be made and criteria for each, guidelines for deciding which Linux distributions should be examined and from how long ago, guidelines for deciding what C and C++ compilers to put into the blessed build image, that sort of thing.


Just read the perennial manylinux PEP. Before I go into my concerns on that, after the requests of @brettcannon and @sumanah, I will try to create a more specific timeline for us.
It looks like CentOS 6 is EOL on November 30th 2020: https://wiki.centos.org/About/Product
Looking at our work on manylinux2010 and the many roadblocks we have been facing, preparing toolchains has proved to be almost a quarter’s worth of work. Adding the overhead of our big organization to get projects funded and people working on it, I would say we need the standard implemented by August 2020.

While I like the general outline of the perennial manylinux proposal, I still feel there are a few gray areas in it. I have commented on one, regarding “any real-world linux environment”; another is how to avoid bloat. Another one already being discussed is about C++ ABIs.

Such concerns are why I put my support behind manylinux2014 for now. I think perennial manylinux is the way to go for the long term, but there are quite a few things to consider when trying to support the vast landscape of Linux distributions, and I am nervous that we may not get things right in haste.
But again, if we think we can get the PEP draft ready this month, and also address the documentation issues that @zwol pointed out, I would be OK with going with the perennial manylinux proposal.


Two questions:

  1. Who are we waiting on for what at this point?
  2. Does the libcrypt.so.1 removal in Fedora 30, which impacts manylinux* builds, change anything?

My last post aimed for a compromise that other people have suggested in the past: don’t require a PEP to define each new manylinux profile, but still document the details somewhere (probably on https://packaging.python.org).

No-one has objected to this, so perhaps that means people are broadly OK with it? But I’d like to see if both camps - perennial manylinux supporters and those with reservations about the vagueness - are actually on board before moving forwards with that.

I broadly agree with @njs that it shouldn’t be significantly more work to define the first profile in a perennial scheme than to define manylinux2014 with a new PEP. And we could be just a couple of compromises away from an agreed way to do perennial manylinux. But on the other hand, we’ve been in this holding pattern for weeks, so I have sympathy if someone makes a decision to press ahead with manylinux2014.

This looks like a concrete example of the sort of change @njs has described. A library which you could once assume was in all mainstream Linux distros no longer is. How do we adapt to this?

I think in practice what we’ll do is fairly uncontroversial: change auditwheel and the build images so that new manylinux1 & manylinux2010 wheels do not link against an external libcrypt.so.1. But presumably we won’t try to invalidate existing wheels on PyPI, because that would cause chaos. So affected packages will have to make a new release to pick up the fix.
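
To make that concrete: in profile terms the fix amounts to dropping libcrypt.so.1 from the set of libraries a wheel is allowed to rely on externally, after which auditwheel repair would bundle it like any other dependency. A rough sketch (the set shown is abbreviated, not the full whitelist, and the mechanics are up to auditwheel):

```python
# Conceptual sketch only: drop libcrypt.so.1 from the external-library
# whitelist, so "auditwheel repair" grafts it into new wheels instead.
whitelist = {"libc.so.6", "libm.so.6", "libdl.so.2", "librt.so.1", "libcrypt.so.1"}
whitelist -= {"libcrypt.so.1"}

def must_bundle(shared_library: str) -> bool:
    """True if auditwheel repair should copy this dependency into the wheel."""
    return shared_library not in whitelist

print(must_bundle("libcrypt.so.1"))  # True: no longer assumed to be present
print(must_bundle("libc.so.6"))      # False: still provided by the platform
```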

Does this mean we’re changing the definition of those manylinux flavours? Or is the definition ‘all mainstream Linux distros since X’, and we’re changing guidelines for how to achieve that? I don’t think the terminology matters much, so long as the definition/guidelines are documented somewhere.

Ultimately, me. I’ve been away for a couple of weeks, and was hoping for more discussion while I was away than actually happened. So I’d like to push things to a conclusion now, if at all possible.

I’m broadly OK with it. But I’m concerned that no-one is responding to the suggestion, and without more details it would be very easy for the suggested “partially-generated documentation” to be unhelpful in practice (I have strong reservations about machine generated docs). I’d like to see examples of the sort of docs you envisage. I’d also prefer it if someone other than me were to comment. I’m assuming that @dstufft is probably also “broadly OK” with your proposal, but he’s not commented yet, so maybe he’s waiting for a bit more detail?

Right now, my biggest problem is that we appear to no longer have a champion for manylinux2014. @dustin seems to have gone silent - I know he was away for a while, but I believe he should be around now, and he’s not commenting on this discussion at all. That leaves me with a dilemma - I can’t sign off on manylinux2014 without an assurance that someone is going to move it forward.

… and I think that both manylinux2014 and perennial manylinux would in practice do exactly the same thing (change the build images, live with the breakage to already-published wheels). The key point here is that both proposals have the same logic for how installers actually detect the platform - it’s glibc with the _manylinux module for finer control. The difference is essentially only in how the impact of this change to Fedora affects the documentation (and in consequence, how we think through the implications when we update the specs).
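
For reference, that shared detection logic looks roughly like this - a sketch based on the existing manylinux PEPs and the perennial draft; the exact _manylinux hook for perennial tags is one of the details still being finalised:

```python
# Sketch of installer-side manylinux detection: check the runtime glibc
# version, but let a distro-provided _manylinux module override the heuristic.
import ctypes

def glibc_version():
    """Return the running glibc version as (major, minor), or None if the
    C library is not glibc (e.g. musl)."""
    try:
        process = ctypes.CDLL(None)
        gnu_get_libc_version = process.gnu_get_libc_version
    except (OSError, AttributeError):
        return None
    gnu_get_libc_version.restype = ctypes.c_char_p
    major, minor = gnu_get_libc_version().decode("ascii").split(".")[:2]
    return int(major), int(minor)

def manylinux_compatible(tag_major: int, tag_minor: int, arch: str) -> bool:
    try:
        import _manylinux
        hook = getattr(_manylinux, "manylinux_compatible", None)
        if hook is not None:   # override hook sketched in the perennial draft
            return bool(hook(tag_major, tag_minor, arch))
    except ImportError:
        pass
    # Fallback heuristic: any glibc at least as new as the tag's version.
    # (The architecture half of the tag is matched separately by the installer.)
    version = glibc_version()
    return version is not None and version >= (tag_major, tag_minor)
```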

The real questions here are:

  1. Do we want to have to make installer changes every time a new definition is released? (Ideally no, but in practice the installer changes are trivial and far from being the key stumbling block). Score a point for perennial manylinux, but it’s a minor one.
  2. Do we want to have our assumptions of what’s part of “the manylinux platform” documented? I’m really struggling to understand why the answer here would ever be “no”. Surely having libcrypt mentioned in the manylinux2010 docs makes it easier to explain why Fedora’s change is an issue? I don’t see how anyone could take the position that reading the auditwheel code should be the only way for a non-expert like me to understand the relevance of libcrypt.

It’s worth noting that in this debate, nobody has argued that the exact specs given in manylinux2014 are wrong. That’s either because they aren’t - and in that case why not just say “perennial manylinux accepts those specs as the basis for glibc_2.17” - or because they are, but we’re spending all of our time debating how we record the specs rather than confirming what they actually are!

(Interestingly, manylinux2014 allows libcrypt - that probably needs fixing, as I assume there’s no way we’d want to accept a PEP that explicitly documents that Fedora 30 won’t be manylinux2014 compatible…)

So to summarise (and yes, this is a revision of my previous thoughts):

  1. I’m not willing to sign off on manylinux2014 without an active champion for the PEP (and that champion should also update the PEP to reflect the libcrypt situation).
  2. I am willing to sign off on perennial manylinux, but only if we have a clear commitment on how the “platform” will be documented - and that should come with an actual example of the docs that would be published for at least one version of the tags (ideally the one that corresponds to manylinux2014, to make the comparison easier, but I’d be OK with the equivalent of manylinux2010, as the auditwheel code for that one is already written).

Please note - I don’t want to have another cycle of the “Linux is being held to higher standards than Windows or Mac” discussion. I’ve thought a lot about this, and I don’t believe the situations are comparable (if we were defining “ubuntu_18_10”, “fedora30” and “centos7” tags, I’d agree - but because we made the decision to work with a common denominator “manylinux” abstract platform, we took the responsibility for at least minimally defining what we mean by that - this is the point that @dstufft was trying to make, and which I agree with).

Also, in case it isn’t clear, I am not insisting on any sort of onerous process for signing off on documentation changes. I don’t personally think that documenting (uncontroversial) things in a PEP is a difficult process (and I’ll personally commit to quickly turning around approvals on any uncontroversial “this is a change to the spec for manylinux_XXXX” proposals, if that helps to reduce people’s fear of PEP bureaucracy) but if people prefer an alternative way to store and control the specs, I’m fine with that too.

On timescales:

I’m waiting on an updated perennial manylinux spec, that addresses the question of how the definition of the platform will be provided in human readable form. If we can’t produce that by the end of July, I’m going to conclude that the whole issue is too controversial for a quick resolution, and I’ll go with manylinux2014. (Note for @takluyver, or whoever produces such a proposal - can it be posted in the form of an actual document that people can refer to and link to as a whole, not as a link to a github PR or similar?)

However, if no-one updates manylinux2014 to remove libcrypt, or posts a discussion here explaining why that isn’t needed, I’ll have to consider manylinux2014 as “unmaintained”, at which point we will have no viable proposal and we’ll have to accept a delay. (Again, can the revised manylinux2014 proposal be posted as a link to a proper document, please?)

Thanks to everyone for continuing to engage on this. IMO there are very few differences between the two proposals in how they’d actually be implemented, so hopefully the administrative side can be sorted out quickly and people can get on with the real work of writing the code.


As far as I know, @dustin was going on vacation for a short while. He may be back. But he mentioned that @sumanah agreed to help push it forward? And I am still willing to help from the TF side.
@dustin, @sumanah, please correct me if I am wrong.

I am not a champion for either perennial or manylinux2014; I am impartial between them.

Per that approval, I have been having conversations and asking questions and nudging folks to help make progress on both PEPs, both privately and on the TensorFlow Build SIG calls and mailing list. After @pf_moore’s decision, I plan to aid with project management to help move the implementation forward (basically making and leading a checklist like this one) whether we choose perennial or manylinux2014, but I’m pretty sure @dustin would be the technical lead if we choose manylinux2014.


Hello, I’m here, just very busy.

I am still willing to champion manylinux2014. I’ve removed libcrypt.so.1 as a valid external library to link against in the current proposal, and added a note about it.

I’ve started this process by turning my branch into a PR: https://github.com/python/peps/pull/1121

I’m still a little hung up on this. From the point of view of a package maintainer:

  • manylinux2014 means that I have exactly one more build environment to target for each architecture: manylinux2014
  • perennial manylinux means that I have the possibility of M different build environments to target now, for each version of glibc (or whatever maps to a version of glibc)

This seems like a lot to expect from package maintainers. Or is the idea to only select one “valid” heuristic/tag out of all “possible” heuristic/tags? (And if so, who determines what the “valid” heuristic/tag is, and where does this logic live?)

Thanks for responding - glad to get your perspective here.

Cool, that’s excellent. Sorry if I seemed to be pushing (the usual “we’re all volunteers” applies here, and you should do what you have time for, not feel pressured).

I thought the idea was that package maintainers could just target the oldest version that suits them (so, for example, manylinux1 wheels will work on a manylinux2010 system, etc). So there’s no need to upload one wheel per environment. But if I’m wrong about that, then I agree, perennial manylinux seems to be much more of a “moving target” for package maintainers - at the very least, I think the PEP should clarify how new target environments would get announced and publicised, so that maintainers know they have work to do.

I think the way it would work in practice is:

With manylinux2014: package maintainers have a whole set of build environments to choose between – manylinux1, manylinux2010, manylinux2014, and more to come in the future. But average package maintainers don’t start at the PEP index and then derive how to build wheels from first principles :-). They read a tutorial, or ask a friend, and get directed to whatever pre-packaged build environment makes sense. You can see this happening right now with the manylinux1/manylinux2010 transition – maintainers are making their own judgements about when to switch their packages over, and talking about it with each other. We can and should summarize the trade-offs and make recommendations (probably on packaging.python.org?), but there’s no need for a central commission to issue edicts about which one to use.

With perennial manylinux: it’s very similar. In principle they could target any version at all – whoa, that’s a lot! But in practice average package maintainers aren’t creating new build environments from first principles; there will be a small set of pre-packaged build environments maintained by the community (in fact: exactly the same environments that we’d maintain in the manylinux2014 approach!), and maintainers will pick one by reading a tutorial or asking a friend or looking at their favorite project. And we’ll still want to put up something on packaging.python.org laying out the options and the trade-offs and making recommendations. I think for the average maintainer, the experience will be very similar regardless of which approach we pick.

The perennial version does allow for more flexibility: if someone is really eager to jump to a newer version ahead of the rest of the community, and willing and able to create their own build environment, they can do that. (Like Google did for manylinux2010, just they didn’t have any valid way to tag the resulting builds!) If, I dunno, the s390x community decides that because of some peculiarities in their distro landscape, it makes sense to collectively recommend a different build environment version than what we use on x86-64, then that’s possible. But those aren’t average package maintainers :-).

This discussion made me realise that there’s a definite gap in my understanding here. I’ve been working from the perspective of a Windows user, and on Windows, building binaries for a (simple!) C extension is easy:

  • Get the compiler toolchain installed
  • Run pip wheel .

That’s it. You get a wheel tagged for your CPython version, and platform win32 or win_amd64 depending on whether you used 32-bit or 64-bit Python. You obviously need a 64-bit OS to build 64-bit binaries, but you can build 32-bit binaries on any platform (just use 32-bit Python).

But I honestly have no idea how you’d build such a binary for Linux (or, for that matter, for macOS). It looks like the process might be the same for macOS, with the wheel being tagged with a load of versions (presumably your build target version and later ones, so you’d want to build against an older macOS version if you want your wheel to be widely installable)…?

While I appreciate that a lot of the issues with manylinux come from more “advanced” uses, I think it would help to consider this really basic use case. How does a non-expert user package a simple C extension (with no library dependencies at all - basically “Hello world”) for distribution on Linux? And critically, how do the two proposals differ?

At the moment, I would look at manylinux, and realise that I needed to decide between manylinux1 and manylinux2010. With no other knowledge to work on, I’d pick manylinux2010 (as it’s “newer”) and then go looking for a “How to build a manylinux2010 wheel” document. Googling for that phrase was basically useless, though :frowning:

While I don’t want to divert the discussion by getting into the difficulties of building on Linux, I do think that the unsophisticated user who “just wants to distribute some binaries for Linux” is an important use case to consider. And for such a user, the question “which manylinux version should I build for” is critical.

Perennial manylinux seems to be addressing that issue by saying “it’s all just manylinux, so there’s no question to answer”, but that’s only a reasonable approach if it isn’t just delaying the question until the user finds out that they need to ask “so which manylinux profile should I build for?”… Conversely, it seems to me that manylinux2014 isn’t addressing the question at all; it’s just “not making the issue any worse”, in the sense that choosing between three versions is no harder in practice than choosing between two (oldest, newest or “you know what you are doing” are the only 3 real options :slight_smile:)

I don’t think perennial manylinux makes a big difference here. There will be different flavours of manylinux in a chronological sequence either way. The differences are:

  1. There isn’t a PEP for each flavour. This doesn’t affect the average person trying to build a package, who doesn’t want to go and read PEPs anyway. There will still be a list somewhere of which flavours are easy to build for using the provided docker images.
  2. Flavours are identified by a glibc version rather than a year. Perhaps this is a bit less obvious, but there’s still a clear way to tell which is more recent, and I think the year is already somewhat confusing for people not familiar with the topic - e.g. manylinux2010 was defined in 2018.