The next manylinux specification

Has anyone proposed a topic for the packaging summit on policing (and preventing) this sort of invalid uploading?

I don’t think so but I’ll put it up.

manylinux2014_2_12_x86_64 (= installs on systems with glibc 2.12+, backed by a manylinux2014 spec)

This is really the worst of both worlds IMO. The goal isn’t to let people use arbitrary glibc versions; the goal is to let us migrate to new broadly-useful manylinux profiles without having to write new PEPs or add more hard-coded logic to pip, like how it works on Windows/macOS. We don’t care about the glibc version per se. We only care about it because (1) it’s a convenient shorthand to refer to a given “era” of Linux distros, and (2) it’s one that pip can mechanically compute on any system.
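As a rough illustration of the “mechanically compute” point, here is a minimal sketch of one way an installer could ask the running system for its glibc version. It is not necessarily how pip does it; the function names `parse_glibc_version` and `detect_glibc_version` are made up for this example, and it assumes `os.confstr("CS_GNU_LIBC_VERSION")` returns a string like `glibc 2.17` on glibc-based systems.

```python
import os
import re


def parse_glibc_version(version_str):
    """Parse a string such as 'glibc 2.17' into (2, 17).

    Returns None if the string does not look like a glibc version.
    """
    m = re.match(r"(?:glibc\s+)?(\d+)\.(\d+)", version_str)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def detect_glibc_version():
    """Return the running system's glibc version as (major, minor).

    Returns None when the system does not report a glibc version
    (for example on Windows, macOS, or a musl-based distro).
    """
    try:
        raw = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, OSError, ValueError):
        # os.confstr is missing, or the name is unknown on this platform.
        return None
    if not raw:
        return None
    return parse_glibc_version(raw)
```

The parsing is kept separate from the detection so the comparison logic can be exercised on any platform, including ones without glibc.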


I actually see a lot of merit in this idea, since there are multiple different things we want to version:

  • our “Linux ABI identification heuristics” scheme
  • within each heuristic, the tuning parameters that an installer needs to adjust based on the target system

Sticking with CalVer for the first purpose then provides guidance to heuristic designers on which distros they need to consider in their designs within the overall framework, and potentially allows installers to skip checking some heuristics because the current distro is too old for the entire framework.

We get to skip over that distinction for Windows and Mac OS X because the heuristic framework is linked to CPython versions rather than needing to be defined independently.

So under that interpretation, the next manylinux spec would still be manylinux2014 (so we can ignore everything older than Ubuntu 14.04 and CentOS 7 when defining build profiles), but it would also introduce the notion of tunable heuristics such that manylinux2014 build profiles targeting newer distros can be added to packaging.python.org without requiring installer updates to make them usable by publishers.

And then in relation to the “named installer heuristics not full ABI constraints” aspect, it might be worth spending an extra few characters per filename to add a qualifier like “for_” before the heuristic name.

That would give tags like: manylinux2014_for_glibc_2_12_x86_64.

We’d then hope that all future updates would just be to add either new build profiles or new install heuristics to the manylinux2014 framework, but if it was determined that an entirely new structure was needed, then we’d pick a new baseline year and use that as the name for the next iteration.

I think you’re overengineering this Nick :-). The string “manylinux” has always meant “glibc-based linux distros using a glibc-version based heuristic”. In the unlikely event that we need to switch to a different heuristic, we can just invent a new string. Call it mostlinux or something. There’s no need to reserve version space for this a priori. And your proposed distinctions are way too subtle for average users to understand. (I’ve already seen folks confused about how something called manylinux2010 could possibly be relevant today.)

I’m not worried about folks that haven’t read the specs at all being able to guess what tags mean - I’m worried about folks that have at least glanced at them being able to remember what they mean.

The folks saying that “manylinux_glibc_2_12” looks like it is expressing a direct dependency on a specific version of glibc are raising a valid concern, so the question is what to do about it.

Spelling it manylinux<spec version>_for_<build profile> instead will align the tagging scheme with the way the target compatibility definition process actually works:

  • there’s a manylinux PEP that defines the overall framework (called 2014 to avoid switching to yet another numbering scheme)
  • there’s a build profile that defines which ABIs should be available (manylinux1 and manylinux2010 each only have one build profile, whereas manylinux2014 would allow for more than one)
  • build profiles are described by an installer heuristic name and a set of heuristic parameters that installers can use to decide if a particular system satisfies the tag
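To make the three-part structure above concrete, here is a minimal sketch of how a tool might split such a tag into spec version, heuristic name, heuristic parameters, and architecture. The tag format is only a proposal in this thread, so everything here is an assumption: `ProfileTag`, `parse_profile_tag`, and the regex are hypothetical, and the regex only handles heuristics that take exactly two numeric parameters, as glibc does.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ProfileTag(NamedTuple):
    spec_year: int            # framework revision, e.g. 2014
    heuristic: str            # installer heuristic name, e.g. "glibc"
    params: Tuple[int, ...]   # heuristic parameters, e.g. (2, 12)
    arch: str                 # CPU architecture, e.g. "x86_64"


# Hypothetical pattern for manylinux<spec version>_for_<build profile>_<arch>.
_TAG_RE = re.compile(r"manylinux(\d{4})_for_([a-z]+)_(\d+)_(\d+)_(.+)")


def parse_profile_tag(tag: str) -> Optional[ProfileTag]:
    """Split a proposed-style tag into its parts, or return None."""
    m = _TAG_RE.fullmatch(tag)
    if m is None:
        return None
    year, heuristic, major, minor, arch = m.groups()
    return ProfileTag(int(year), heuristic, (int(major), int(minor)), arch)
```

An installer that does not recognise the heuristic name could then skip the tag without needing to understand its parameters.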

Sure, we could say “There will never be another framework revision”, drop the “2014” qualifier, and instead rely solely on the manylinux prefix, but I think the distinction between “manylinux” and “manylinux1” is too blurry for that to be a good idea when keeping the CalVer numbering scheme is cheap.

Yeah, I agree:

I don’t think it’s very blurry… it’s very easy to explain:

  • manylinux_$X_$PLATFORM will run on any distro using glibc 2.$X or greater.
  • manylinux1_$PLATFORM is a deprecated alias for manylinux_5_$PLATFORM
  • manylinux2010_$PLATFORM is a deprecated alias for manylinux_12_$PLATFORM

I agree the manylinux1 bit is a bit awkward looking, but a year from now we’ll have forgotten it ever existed.

BTW, why 2014? If it’s naming the heuristic, shouldn’t it be 2019, or maybe 2016 (= the date on PEP 513)?

If it’s called manylinux2014, then CalVer would still be used the same way as in manylinux2010: specifying the rough age of the distros that can be targeted.

Reviewing your alias examples though, I’m wondering if it might be enough to replace my “YEAR_for_” idea with the string “bp” for “build profile”:

  • manylinux_bp_glibc_2_5 (previously manylinux1)
  • manylinux_bp_glibc_2_12 (previously manylinux2010)

The idea there would be to provide an explicit reminder that the “glibc” here is identifying a full build profile, not just the individual library.

While this might sound nice to some, the package owners doing this are our users, and the users of those “invalid” binary packages are our users too. Preventing them from doing what they’ve already decided allows them to get work done won’t make friends unless a right way to do it is also widely available at the same time.

That’s why I suggested it for discussion - it needs to be talked through, so that people get a good feel for the issues. Glib solutions like “we should block such uploads” (or equally “we should do nothing”) aren’t necessarily correct. It’s not something we can or should decide on lightly, but the current situation is causing harm to other users of ours (people who try to install such a package and get a broken system), and we need to consider them as well.

This should probably go over in pypi/warehouse issue #5420, “Run auditwheel on new manylinux uploads, reject if it fails”. But notice that one of the reasons that issue was opened was literally the creator of PyTorch asking us to enforce harsher restrictions on PyTorch so that he could use that as leverage with his vendors.

Hi there. That was me.

If we want fast movement on the community playing by the rules, then yes I believe we should enforce the rules. Things will definitely be chaotic for a bit, but it will be for the better.

If you see the manylinux2010 spec being finally executed, and for example https://github.com/google/or-tools/issues/1218#issuecomment-487083206 , it’s not a surprise. As package publishers, we are trying to do what will work with the least amount of friction for the maximum number of users – within the constraints that were given to us. The key here is “the constraints that were given to us”.

W.r.t. the issue above, it is true in practice, and a source of great friction, that a lot of distros are on an older pip version and a decent percentage of users won’t be able to upgrade to a pip that can recognize manylinux2010. But if it’s a hard constraint, the best we can tell users is to upgrade their pip (somehow), switch to anaconda, or simply do a pip install -f direct_url – and ask internally for more engineering share towards helping with moving the packaging conversations faster. If it’s a soft constraint, the internal requests for help remain an “aspirational priority” and not “house is on fire”.

That being said, @dustin seems to have great momentum, moving things faster than I honestly expected, and the TensorFlow SIG seems to have reasonably significant engineering behind it, so maybe we will get something out of this that will move quickly.

There already is. If I understand the disruption correctly, only Warehouse would block the upload. You can still upload whatever you want with custom indexes (which are easy to host), and pip would happily download and install distributions.

However, you still shouldn’t lie, really. Pip does not prioritise any index over another, so if you distribute a “fake” manylinux1 wheel on your local index, but there’s a genuine manylinux1 wheel on PyPI, you have no way of knowing which one pip will choose (because it shouldn’t matter).

It seems to me that there may be a case for a specialised “best” tag, that front ends should pick in preference over any generic tag. This tag would only be intended for use in private indexes where the user deliberately chooses to use it via --extra-index-url. Would something like this be a helpful addition to the standard? IIRC, this is more or less how the general “linux” tag works, so maybe it’s possible already.

Is manylinux2010 sufficient for the projects currently faking manylinux1 wheels once newer versions of pip are sufficiently widespread? Or do they need something further?

In light of the concerns about how quickly people update pip, I’ll have a go at writing the perennial manylinux PEP to improve this for the future, unless anyone else is already doing it. I’ll probably need help from other people who understand the details better, but if I can start a skeleton, I hope that will move things forward.

I have put a first draft of some text for the PEP in a pull request on the manylinux repository:

The branch is in that repository, so those who have push access are free to change it directly without waiting for me.

I just wanted to chime in and say “manythanks” for getting the x86_64 Docker image out! I just made a release with it https://pypi.org/project/coincurve/#files

It went quite smoothly, however I don’t think I can require my downstream users to upgrade pip just yet (plus there is no i686 builder currently) so I’m shipping all 3 now. Maybe next year!

Also, whatever happens with the next spec, pretty please have a goal to support musl/Alpine. Containerization is becoming more and more prevalent in my work and also side projects, and it’s quite excruciating to use Python on Alpine with the extra setup & build time of dependencies.

I agree 100%. I don’t know whether it needs to be in a “standard”, but explicitly saying somewhere that this is okay/expected/desirable will help with discussions (for example, I’ve spent far too much time convincing colleagues that a private index implementation should not enforce the same restrictions as PyPI just because PyPI does, but since the only documented point of view is in support of those restrictions, they have the docs on their side).

It would also require some ability for tools to pick up an explicit preferred tag. I’ve been advocating for such a tag for a while now - something that a distro can easily embed in their Python site so that the tools will automatically get the best build for that platform.

The most “standard” I think this needs is specifying that the platform/distro defines the tag, and that packagers should not define tags and expect users to move to match them (packages should probably use package names, custom indexes, or extras for handling platform differences).

This came up in a thread on distutils-sig recently. From what I understood, it’s not entirely straightforward to design and implement something similar for musl. And we don’t have many people with the skills and the time to do this kind of work - that’s why the manylinux2010 image took so long, and why there’s only a 64-bit image for it.

So for the time being, the ‘perennial manylinux’ proposal is only about glibc-based systems. We have a reasonable idea how to proceed there, and trying to make it broader might well mean the whole effort goes nowhere at all.

Other kinds of Linux wheels - based around other libc implementations, or distro specific wheels tagged for e.g. Alpine - would need someone to drive a push for them, probably starting with a separate PEP.

Things seem to have gone quiet here. Since @takluyver posted his draft perennial manylinux proposal, there’s not been much further discussion.

I’m happy to let the discussion take its own pace, but is there anything needed to keep it moving? If someone can just confirm for me - is the expectation now that the next standard will be (some form of) perennial manylinux proposal, or is there still potential for a manylinux2014 proposal to compete with it?

More particularly, is anything waiting on me?

I have a draft manylinux2014 proposal that I’ve been working on with the TensorFlow/IBM/Anaconda/Red Hat folks. We are wrapping it up now and plan to share it here on Monday.
