PEP 600: Future 'manylinux' Platform Tags for Portable Linux Built Distributions

Here’s a thread for PEP 600, the “perennial manylinux” proposal. This has already had extensive discussion in the other thread, but that thread became super unwieldy and ended up with multiple topics mixed together, so @pf_moore suggested we should make a new one focused just on this PEP.

The current draft has been extensively reworked since @takluyver’s original draft, and I think addresses all the questions and concerns that were raised. As far as I know this is ready to be accepted. So if you have any new questions or concerns please say so!

The source text is available in the PEPs repository [here](https://github.com/python/peps/blob/master/pep-0600.rst), the official rendered version is here, and I’ve also pasted the current text below to save you a click and for easy quoting.


Abstract

This PEP proposes a scheme for new ‘manylinux’ wheel tags to be defined without requiring a PEP for every specific tag, similar to how Windows and macOS tags already work. This will allow package maintainers to take advantage of new tags more quickly, while making better use of limited volunteer time.

Non-goals include: handling non-glibc-based platforms; integrating with external package managers or handling external dependencies such as CUDA; making manylinux tags more sophisticated than their Windows/macOS equivalents; doing anything besides taking our existing tried-and-tested approach and streamlining it. These are important issues and other PEPs may address them in the future, but for this PEP they’re out of scope.

Rationale

Python users appreciate it when PyPI has pre-compiled packages for their platform, because it makes installation fast and simple. But distributing pre-compiled binaries on Linux is challenging because of the diversity of Linux-based platforms. For example, Debian, Android, and Alpine all use the Linux kernel, but with radically different userspace libraries, which makes it difficult or impossible to create a single wheel that works on all three. This complexity has caused many previous discussions of Linux wheels to stall out.

The “manylinux” project succeeded by adopting a strategy of ruthless pragmatism. We chose a large but tractable set of Linux platforms – specifically, mainstream glibc-based distributions like Debian, OpenSuSE, Ubuntu, RHEL, etc. – and then we did whatever it took to make wheels that work across all these platforms.

This approach requires many compromises. Manylinux wheels can only rely on external libraries that maintain a consistent ABI and are universally available across all these distributions, which in practice restricts them to a small set of core libraries like glibc and a few others. Wheels have to be built on carefully-chosen platforms of the oldest possible vintage, using a Python that is itself built in a carefully-chosen configuration. Other shared library dependencies have to be bundled into the wheel, which requires a complex process to avoid collisions between unrelated wheels. And finally, the details of these requirements change over time, as new distro versions are released, and old ones fall out of use.

It turns out that these requirements are not too onerous: they’re essentially equivalent to what you have to do to ship Windows or macOS wheels, and the manylinux approach has achieved substantial uptake among both package maintainers and end-users. But any manylinux PEP needs some way to address these complexities.

In previous manylinux PEPs (PEP 513, PEP 571), we’ve done this by attempting to write down in the PEP the exact set of libraries, symbol versions, Python configuration, etc. that we believed would lead to wheels that work on all mainstream glibc-based Linux systems. But this created several problems:

First, PEPs are generally supposed to be normative references: if software doesn’t match the PEP, then we fix the software. But in this case, the PEPs are attempting to describe Linux distributions, which are a moving target, and do not consider our PEPs to constrain their behavior. This means that we’ve been taking on an unbounded commitment to keep updating every manylinux PEP whenever the Linux distro landscape changes. This is a substantial commitment for unfunded volunteers to take on, and it’s not clear that this work produces value for our users.

And second, every time we move manylinux forward to a newer range of supported platforms, or add support for a new architecture, we have to go through a fairly elaborate process: writing a new PEP, updating the PyPI and pip codebases to recognize the new tag, waiting for the new pip to percolate to users, etc. None of this happens on Windows/macOS; it’s only a tax on Linux maintainers. This slows deployment of new manylinux versions, and consumes part of our community’s limited PEP review bandwidth, thus slowing progress of the Python packaging ecosystem as a whole. This is especially problematic for less-popular architectures, which have fewer volunteer resources to overcome these barriers.

How can we fix it?

A manylinux PEP has to address three main audiences:

  • Package installers, like pip, need to be able to determine which wheel tags are compatible with the system they find themselves running on. This requires some automated process to introspect the system and match it up with wheel tags.
  • Package indexes, like PyPI, need to be able to determine which wheel tags are valid. Generally, this just requires something like a list of valid tags, or a regex they match, with no need to know anything about the actual semantics of individual tags. (But see the discussion of upload verification below.)
  • Package maintainers need to be able to build wheels that meet the requirements of a given wheel tag.

Here’s the key insight behind this new PEP: it’s crucial that different package installers and package indexes all agree on which manylinux tags are valid and which systems they install on, so we need a PEP to specify these – but, these are straightforward, and don’t really change between manylinux versions. The complicated part that keeps changing is the process of actually building the wheels – but, if there are multiple competing build environments, it doesn’t matter whether they use exactly the same rules as each other, as long as they all produce wheels that work on end-user systems. Therefore, we don’t need an interoperability standard for building wheels, so we don’t need to write the details into a PEP.

To further convince ourselves that this approach will work, let’s look again at how we handle wheels on Windows and macOS: the PEPs describe which tags are valid, and which systems they’re supposed to work on, but not how to actually build wheels for those platforms. And in practice, if you want to distribute Windows or macOS wheels, you might have to jump through some complicated and poorly documented hoops in order to bundle dependencies, target the right range of OS versions, etc. But the system works, and the way to improve it is to write better docs and build better tooling; no-one thinks that the way to make Windows wheels work better is to publish a PEP describing which symbols we think Microsoft should be including in their libraries and how their linker ought to work. This PEP extends that philosophy to manylinux as well.

Specification

Core definition

Tags using the new scheme will look like:

manylinux_2_17_x86_64

Or more generally:

manylinux_${GLIBCMAJOR}_${GLIBCMINOR}_${ARCH}

This tag is a promise: the wheel’s creator promises that the wheel will work on any mainstream Linux distro that uses glibc version {GLIBCMAJOR}.{GLIBCMINOR} or later, and where the ${ARCH} matches the return value from distutils.util.get_platform(). (For more detail about architecture tags, see PEP 425.)
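As a rough illustration (the helper name and regex here are my own, not part of the PEP), a new-style tag can be split into its components like this:

```python
import re

# Hypothetical helper: split a new-style manylinux tag into its parts.
# The pattern mirrors the manylinux_${GLIBCMAJOR}_${GLIBCMINOR}_${ARCH} scheme.
TAG_RE = re.compile(r"^manylinux_([0-9]+)_([0-9]+)_(.+)$")

def parse_manylinux_tag(tag):
    """Return (glibc_major, glibc_minor, arch), or None for other tags."""
    m = TAG_RE.match(tag)
    if not m:
        return None
    major, minor, arch = m.groups()
    return int(major), int(minor), arch

print(parse_manylinux_tag("manylinux_2_17_x86_64"))  # (2, 17, 'x86_64')
print(parse_manylinux_tag("win_amd64"))              # None
```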

If a user installs this wheel into an environment that matches these requirements and it doesn’t work, then that wheel does not comply with this specification. This should be considered a bug in the wheel, and it’s the wheel creator’s responsibility to look for a fix (possibly with the help of the broader community).

The word “mainstream” is intentionally somewhat vague, and should be interpreted expansively. The goal is to rule out weird homebrew Linux systems; generally any distro you’ve actually heard of should be considered “mainstream”. We also provide a way for maintainers of “weird” distros to manually override this check, though based on experience with previous manylinux PEPs, we don’t expect this feature to see much use.

And finally, compliant wheels are required to “play well with others”, i.e., installing a manylinux wheel must not cause other unrelated packages to break.

Any method of producing wheels which meets these criteria is acceptable. However, in practice we expect that the auditwheel project will maintain an up-to-date set of tools and build images for producing manylinux wheels, and that most maintainers will want to use those. For the latest information on building manylinux wheels, including recommendations about which build images to use, see https://packaging.python.org.

Since these requirements are fairly high-level, here are some examples of how they play out in specific situations:

Example: if a wheel is tagged as manylinux_2_17_x86_64, but it uses symbols that were only added in glibc 2.18, then that wheel won’t work on systems with glibc 2.17. Therefore, we can conclude that this wheel is in violation of this specification.

Example: Until ~2017, all major Linux distros included libncursesw.so.5 as part of their default install. Until that date, a wheel that linked to libncursesw.so.5 was compliant with this specification. Then, distros started switching to ncurses 6, which has a different name and incompatible ABI, and stopped installing libncursesw.so.5 by default. So after that date, a wheel that links to libncursesw.so.5 was no longer compliant with this specification.

Example: The Linux ELF linker places all shared library SONAMEs into a single process-global namespace. If independent wheels used the same SONAME for their bundled libraries, they might end up colliding and using the wrong library version, which would violate the “play well with others” rule. Therefore, this specification requires that wheels use globally-unique names for all bundled libraries. (Auditwheel currently accomplishes this by renaming all bundled libraries to include a globally-unique hash.)
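The renaming requirement can be sketched as follows – the function and hash scheme here are illustrative only, not auditwheel’s actual algorithm:

```python
import hashlib

def unique_bundled_name(soname, library_bytes):
    """Illustrative sketch (not auditwheel's real scheme): derive a
    globally-unique SONAME by tagging the library with a content hash,
    so two wheels bundling different builds of 'libfoo' cannot collide
    in the process-global ELF namespace."""
    digest = hashlib.sha256(library_bytes).hexdigest()[:8]
    base, _, rest = soname.partition(".so")
    return f"{base}-{digest}.so{rest}"

# Two different builds of the "same" library get distinct names:
print(unique_bundled_name("libfoo.so.5", b"build A"))
print(unique_bundled_name("libfoo.so.5", b"build B"))
```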

Example: we’ve observed certain wheels using C++ in ways that interfere with other packages via an unclear mechanism. This is also a violation of the “play well with others” rule, so those wheels aren’t compliant with this specification.

Example: The imaginary architecture LEG v7 has both big-endian and little-endian variants. Big-endian binaries require a big-endian system, and little-endian binaries require a little-endian system. But unfortunately, it’s discovered that due to a bug in PEP 425, both variants use the same architecture tag, legv7. This makes it impossible to create a compliant manylinux_2_17_legv7 wheel: no matter what we do, it will crash on some users’ systems. So, we write a new PEP defining architecture tags legv7le and legv7be; now we can ship manylinux LEG v7 wheels.

Example: There’s also a LEG v8. It also has big-endian and little-endian variants. But fortunately, it turns out that PEP 425 already does the right thing for LEG v8, so LEG v8 enthusiasts can start shipping manylinux_2_17_legv8le and manylinux_2_17_legv8be wheels immediately once this PEP is implemented, even though the authors of this PEP don’t know anything at all about LEG v8.

Legacy manylinux tags

The existing manylinux tags are redefined as aliases for new-style tags:

  • manylinux1_x86_64 is now an alias for manylinux_2_5_x86_64
  • manylinux1_i686 is now an alias for manylinux_2_5_i686
  • manylinux2010_x86_64 is now an alias for manylinux_2_12_x86_64
  • manylinux2010_i686 is now an alias for manylinux_2_12_i686

This redefinition is largely a no-op, but does affect a few things:

  • Previously, we had an open-ended and growing commitment to keep updating every manylinux PEP whenever a new Linux distro was released, for the rest of time. By making this PEP normative for the older tags, that obligation goes away.
  • The “play well with others” rule was always intended, but previous PEPs didn’t state it explicitly; now it’s explicit.
  • Previous PEPs assumed that glibc 3.x might be incompatible with glibc 2.x, so we checked for compatibility between a system and a tag using logic like:

sys_major == tag_major and sys_minor >= tag_minor

Recently the glibc maintainers advised us that we should assume that glibc will maintain backwards-compatibility indefinitely, even if they bump the major version number. So the new check for compatibility is:

(sys_major, sys_minor) >= (tag_major, tag_minor)
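To make the difference concrete, here’s a minimal sketch (the function names are mine) comparing the two rules against a hypothetical future glibc 3.0 system:

```python
def old_check(sys_ver, tag_ver):
    # Pre-PEP-600 logic: major versions must match exactly.
    return sys_ver[0] == tag_ver[0] and sys_ver[1] >= tag_ver[1]

def new_check(sys_ver, tag_ver):
    # PEP 600 logic: plain tuple comparison, assuming glibc stays
    # backwards-compatible even across major version bumps.
    return sys_ver >= tag_ver

# A hypothetical glibc 3.0 system installing a manylinux_2_17 wheel:
print(old_check((3, 0), (2, 17)))  # False: rejected under the old rule
print(new_check((3, 0), (2, 17)))  # True: accepted under the new rule
```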

Package installers

Generally, package installers should install manylinux wheels on systems that have an appropriate glibc and architecture, and not otherwise. If there are multiple compatible manylinux wheels available, then the wheel with the highest glibc version should be preferred, in order to take advantage of newer compilers and glibc features.
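The "prefer the highest glibc version" rule might be ranked like this sketch (the helper is hypothetical; real installers such as pip rank manylinux tags alongside all other platform tags):

```python
def best_manylinux_tag(compatible_tags):
    """Pick the tag with the highest glibc version from a list of
    already-compatible new-style manylinux tags (illustrative only)."""
    def glibc_version(tag):
        _prefix, major, minor, _arch = tag.split("_", 3)
        return int(major), int(minor)
    return max(compatible_tags, key=glibc_version)

print(best_manylinux_tag(
    ["manylinux_2_5_x86_64", "manylinux_2_17_x86_64", "manylinux_2_12_x86_64"]
))  # manylinux_2_17_x86_64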
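The "prefer the highest glibc version" rule might be ranked like this sketch (the helper is hypothetical; real installers such as pip rank manylinux tags alongside all other platform tags):

```python
def best_manylinux_tag(compatible_tags):
    """Pick the tag with the highest glibc version from a list of
    already-compatible new-style manylinux tags (illustrative only)."""
    def glibc_version(tag):
        _prefix, major, minor, _arch = tag.split("_", 3)
        return int(major), int(minor)
    return max(compatible_tags, key=glibc_version)

print(best_manylinux_tag(
    ["manylinux_2_5_x86_64", "manylinux_2_17_x86_64", "manylinux_2_12_x86_64"]
))  # manylinux_2_17_x86_64
```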

In addition, we follow previous specifications, and allow for Python distributors to manually override this check by adding a _manylinux module to their standard library. If this module is importable, and if it defines a function called manylinux_compatible, then package installers should call this function, passing in the major version, minor version, and architecture from the manylinux tag, and it will either return a boolean saying whether wheels with the given tag should be considered compatible with the current system, or else None to indicate that the default logic should be used.

For compatibility with previous specifications, if the tag is manylinux1 or manylinux_2_5 exactly, then we also check the module for a boolean attribute manylinux1_compatible, and if the tag is manylinux2010 or manylinux_2_12 exactly, then we also check the module for a boolean attribute manylinux2010_compatible. If both the new and old attributes are defined, then manylinux_compatible takes precedence.
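For concreteness, here is what a distributor’s _manylinux override module might look like. The function name, signature, return convention, and legacy attribute names come from the text above; the policy inside the function is a made-up example:

```python
# Sketch of a _manylinux module that a "weird" distro might ship.

def manylinux_compatible(tag_major, tag_minor, tag_arch):
    # Hypothetical policy: suppose this distro uses glibc but strips some
    # core libraries, so wheels newer than glibc 2.17 can't be trusted.
    if (tag_major, tag_minor) > (2, 17):
        return False
    # Defer to the installer's default glibc/arch detection otherwise.
    return None

# Legacy attributes, consulted only for manylinux1/manylinux_2_5 and
# manylinux2010/manylinux_2_12 tags exactly:
manylinux1_compatible = True
manylinux2010_compatible = True
```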

Here’s some example code. You don’t have to actually use this code, but you can use it for reference if you have questions about the exact semantics:

```python
import re

LEGACY_ALIASES = {
    "manylinux1_x86_64": "manylinux_2_5_x86_64",
    "manylinux1_i686": "manylinux_2_5_i686",
    "manylinux2010_x86_64": "manylinux_2_12_x86_64",
    "manylinux2010_i686": "manylinux_2_12_i686",
}

def manylinux_tag_is_compatible_with_this_system(tag):
    # Normalize and parse the tag
    tag = LEGACY_ALIASES.get(tag, tag)
    m = re.match("manylinux_([0-9]+)_([0-9]+)_(.*)", tag)
    if not m:
        return False
    tag_major_str, tag_minor_str, tag_arch = m.groups()
    tag_major = int(tag_major_str)
    tag_minor = int(tag_minor_str)

    # Check for manual override
    try:
        import _manylinux
    except ImportError:
        pass
    else:
        if hasattr(_manylinux, "manylinux_compatible"):
            result = _manylinux.manylinux_compatible(
                tag_major, tag_minor, tag_arch,
            )
            if result is not None:
                return bool(result)
        else:
            if (tag_major, tag_minor) == (2, 5):
                if hasattr(_manylinux, "manylinux1_compatible"):
                    return bool(_manylinux.manylinux1_compatible)
            if (tag_major, tag_minor) == (2, 12):
                if hasattr(_manylinux, "manylinux2010_compatible"):
                    return bool(_manylinux.manylinux2010_compatible)

    # Fall back on autodetection. See the pip source code for
    # ideas on how to implement the helper functions.
    if not system_uses_glibc():
        return False
    sys_major, sys_minor = get_system_glibc_version()
    sys_arch = get_system_arch()
    return (sys_major, sys_minor) >= (tag_major, tag_minor) and sys_arch == tag_arch
```

Package indexes

The exact set of wheel tags accepted by PyPI, or any package index, is a policy question, and up to the maintainers of that index. But, we recommend that package indexes accept any wheels whose platform tag matches the following regexes:

  • manylinux1_(x86_64|i686)
  • manylinux2010_(x86_64|i686)
  • manylinux_[0-9]+_[0-9]+_(.*)

Package indexes may impose additional requirements; for example, they might audit uploaded wheels and reject those that contain known problems, such as a manylinux_2_17 wheel that references symbols from later glibc versions, or dependencies on external libraries that are known not to exist on all systems. Or a package index might decide to be conservative and reject wheels tagged manylinux_2_999, on the grounds that no-one knows what the Linux distro landscape will look like when glibc 2.999 is released. We leave the details of any such checks to the discretion of the package index maintainers.
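An index’s basic acceptance check could be as simple as this sketch (the patterns come from the list above, anchored; the function name is mine):

```python
import re

# The three recommended acceptance patterns, anchored to the whole tag.
ACCEPTED = [
    re.compile(r"^manylinux1_(x86_64|i686)$"),
    re.compile(r"^manylinux2010_(x86_64|i686)$"),
    re.compile(r"^manylinux_[0-9]+_[0-9]+_(.*)$"),
]

def platform_tag_acceptable(tag):
    """Return True if the platform tag matches any recommended pattern."""
    return any(p.match(tag) for p in ACCEPTED)

print(platform_tag_acceptable("manylinux_2_17_aarch64"))  # True
print(platform_tag_acceptable("manylinux3_x86_64"))       # False
```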

Rejected alternatives

Continuing the manylinux20XX series: As discussed above, this leads to much more effort-intensive, slower, and more complex rollouts of new versions. And while there are two places where it seems at first to have some compensating benefits, if you look more closely this turns out not to be the case.

First, this forces us to produce human-readable descriptions of how Linux distros work, in the text of the PEP. But this is less valuable than it might seem at first, and can actually be handled better by the new “perennial” approach anyway.

If you’re trying to build wheels, the main thing you need is a tutorial on how to use the build images and tooling around them. If you’re trying to add support for a new build profile or create a competitor to auditwheel, then your best resources will be the auditwheel source code and issue tracker, which are always going to be more detailed, precise, and reliable than a summary spec written in English and without tests. Documentation like the old manylinux20XX PEPs does add value! But in both cases, it’s primarily as a secondary reference to provide overview and context.

And furthermore, the PEP process is poorly suited to maintaining this kind of reference documentation – there’s a reason we don’t keep the pip user manual in the PEPs repository! The auditwheel maintainers are the best situated to understand what kinds of documentation are useful to their users, and to maintain that documentation over time. For example, there’s substantial overlap between the different manylinux versions, and the PEP process currently forces us to handle this by copy-pasting everything between a growing list of documents; instead, the auditwheel maintainers might choose to factor out the common parts into a single piece of shared documentation.

A related concern was that with the perennial approach, it may become harder for package maintainers to decide which build profile to target: instead of having to pick between manylinux1, manylinux2010, manylinux2014, …, they now have a wider array of options like manylinux_2_5, manylinux_2_6, …, manylinux_2_20, … But again, we don’t believe this will be a problem in practice. In either system, most package maintainers won’t be starting by reading PEPs and trying to implement them from scratch. If you’re a particularly expert and ambitious package maintainer who needs to target a new version or new architecture, the perennial approach gives you additional flexibility. But for regular everyday maintainers, we expect they’ll start from a tutorial like packaging.python.org, and by choosing from existing build images. A tutorial can just as easily recommend manylinux_2_17 as it can recommend manylinux2014, and we expect the actual set of pre-provided build images to be identical in both cases. And again, by maintaining this documentation in the right place, instead of trying to do it in the PEPs repository, we expect that we’ll end up with documentation that’s higher-quality and more fitted to purpose.

Finally, some participants have pointed out that it’s very nice to be able to look at a wheel and tell definitively whether it meets the requirements of the spec. With the new “perennial” approach, we can never say with 100% certainty that a wheel does meet the spec, because that depends on the Linux distros. As engineers we have a well-justified dislike for that kind of uncertainty.

However: as demonstrated by the examples above, we can still tell definitively when a wheel doesn’t meet the spec, which turns out to be what’s important in practice. And, in practice, with the manylinux20XX approach, whenever distros change, we do eventually change the spec too; it just takes a while. So even if a wheel is compliant today, it might become non-compliant tomorrow. This is frustrating, but unfortunately this uncertainty is unavoidable if what you care about is distributing working wheels to users.

So even on these points where the old approach initially seems to have advantages, we expect the new approach to actually do as well or better.

Switching to perennial tags, but continuing to write a PEP for each version: This was proposed as a kind of hybrid, to try to get some of the advantages of the perennial tagging system – like easier rollouts of new versions – while keeping the advantages of the manylinux20XX scheme, like forcing us to write documentation about Linux distros, simplifying options for package maintainers, and being able to definitively tell when a wheel meets the spec. But as discussed above, on a closer look, it turns out that these advantages are largely illusory. And this also inherits significant disadvantages from the manylinux20XX scheme, like creating indefinite obligations to update a growing list of copy-pasted PEPs.

Making auditwheel normative: Another possibility that was considered was to make auditwheel the normative reference on the definition of manylinux, i.e., a wheel would be compliant if and only if auditwheel check completed without errors. This was rejected because the point of packaging PEPs is to define interoperability between tools, not to bless specific tools.

Adding extra words to the tag string: Another proposal we considered was to add extra words to the wheel tag, e.g. manylinux_glibc_2_17 instead of manylinux_2_17. The motivation would be to leave the door open to other kinds of versioning heuristics in the future – for example, we could have manylinux_glibc_$VERSION and manylinux_alpine_$VERSION.

But “manylinux” has always been a synonym for “broad compatibility with mainstream glibc-based distros”; reusing it for unrelated build profiles like alpine is more confusing than helpful. Also, some early reviewers who aren’t steeped in the details of packaging found the word glibc actively misleading, jumping to the conclusion that it meant they needed a system with exactly that glibc version. And tags like manylinux_$VERSION and alpine_$VERSION also have the advantages of parsimony and directness. So we’ll go with that.

Source: https://github.com/python/peps/blob/master/pep-0600.rst


I think the bit that concerns people the most is that there wouldn’t be a defined spec to judge compatible wheels against. I’ve struggled with this as well - moving away from rigid specifications to something vaguer seems like a step backwards. So I’ll try to describe why I’ve come to think it’s reasonable. I don’t claim I can explain this better than @njs, but maybe different explanations work for different people.

It seems like, at the moment, we can clearly say that a wheel is or isn’t manylinux2010 compatible. We have a spec (PEP 571), and auditwheel aims to implement that spec. But the spec had to change once Fedora 30 dropped libcrypt.so.1, making previously valid manylinux2010 wheels now invalid. This wasn’t just a mistake in the spec, or a one-off special case: the spec is trying to describe what’s compatible with all Linux distros since 2010, and that necessarily changes as new releases occur. So the spec is not consistent through time, which seems like a problem.

There are two ways we could go with this:

  1. Get stricter: version each spec, so you can say the wheel complies with manylinux2010 spec A, but not manylinux 2010 spec B. This might be reasonable if it was just one spec, but we’ve already got three, and there are details that differ on different CPU architectures (example), so having multiple versions of each spec would get confusing, and there’s no obvious benefit except that you can say unambiguously what a wheel complies with.
  2. Accept that the ‘spec’ is not really a spec. It’s more like guidelines: a wheel which does this is, to the best of our knowledge, compatible with all mainstream glibc-based Linux distros with glibc 2.x or newer.

PEP 600 takes option 2, of course. Once you start thinking about the lists of libraries and symbol versions as evolving guidelines, rather than a specification, then it becomes pretty clear that PEPs are a terrible place to maintain them. They’re still important, and they’ll still need to be described somewhere, but there are better ways to do that than in PEPs.

Part of this concern has been that in the absence of formal specifications, we’d have an implementation-defined specification in auditwheel. I’d certainly have limited patience with anyone who was uploading manylinux wheels which auditwheel rejected. But this too seems less of a problem with a subtle shift in thinking: a good result from auditwheel means “no compatibility issues were found with current knowledge”, not “this wheel is compatible”.


Another example that came up recently is an amendment to the manylinux2010 spec to cover the fact that the symbols exported under GCC_4_3_0 on x86_64 systems are exported under GCC_4_4_0 (and in one case, GCC_4_5_0) in i686 CentOS 6 builds.

I haven’t re-read the entire original thread, but my recollection is that I had two main concerns:

  • the cryptic nature of the numbers in the wheel filenames. On that front, I think the comparison to Windows and macOS wheels is a fair point. In the last thread, adding the extra words also seemed to cause more confusion than it solved, as the words prompted people to guess what the following numbers meant, and those guesses were often wrong. With only the numbers, folks that want to know what they mean are going to have to look at the documentation (or other online information), and that can convey more information than a single word in a filename. So I’m now happy to endorse the proposal to just use the bare numbers with no new text around them.
  • the potential usability issues of omitting the user friendly aliases for auditwheel build environments. I’ve since realised that even if the PEP doesn’t specify that auditwheel should continue offering calendar-based aliases for future target environments, there’s nothing preventing the auditwheel developers from continuing to create them as a developer experience enhancement. So I’d like to see a slight shift in the tone of that section of the PEP to make it clear that it would be entirely acceptable for tool developers to define aliases for future manylinux iterations in addition to the legacy ones, it’s just that for the legacy profiles, the alias should be used in the emitted wheel archive names, whereas for future profiles, any such aliases would be purely local to the tools defining them.

Beyond that, I think the only actual required change is to extend the legacy alias list and the subsequent section to cover manylinux2014 as a third legacy alias (probably without listing all the architectures though, since it supported several more than manylinux1 and manylinux2010 did).

(Thanks for working through this Nathaniel, and for seeking to move the process change forward well in advance of the definition of an Ubuntu 18.04/RHEL 8/Debian 10 era baseline)


Thanks Thomas, that did help clarify for me.

I’ve realised that this is the bit that I struggle with. Coming from Windows, that feels to me like a terrible user experience - I upgrade my OS in place, and wheels that worked no longer do? Or I build a new PC with a later version of my OS, and wheels that claim to work on that OS fail to do so? That to me seems like such an unacceptable situation that I can’t understand why Linux users accept it. But I have to accept that it’s the norm for Linux, and not let my perceptions based on different experiences mean that I hold this PEP to standards that the intended users view as too strict. (That’s not intended as any sort of “my OS is better than yours” comment - if I failed to avoid it sounding like it was, my apologies). I’ll keep that in mind from now on.

OK, with that aside, and having re-read the previous thread and the spec, I have the following comments. I hope most of these are minor, but I’ve learned not to try to guess what will be controversial in this topic :slight_smile:

Documentation

I still feel that the PEP should say that the rules that define the various manylinux profiles should be documented in human-readable form somewhere. Call them auditwheel specs, or guidelines, or whatever you want, but a number of people have expressed the point that they want to be able to read the definitions. The PEP explicitly says it doesn’t intend to make auditwheel normative, but without any rules, the PEP is essentially meaningless, so I think it’s reasonable to expect that some rules should be documented, if only as a baseline for others. And realistically, the auditwheel rules are the best we have in terms of “our current understanding”.

I’m perfectly OK with not being any more formal than “the definitions should be documented”. No need for processes around updating, claims that specs and implementations will never get out of sync, anything like that. Simply an acknowledgement that if people want to know, they should be able to find out without deciphering code. I’m assuming that if the PEP makes this statement, the developers will make a good faith attempt to conform to it - I’m not going to get hung up over the possibility of someone rules-lawyering their way out of providing something acceptable. I find it hard to believe that something at that minimal a level would be controversial.

(This is notwithstanding @takluyver’s explanation - I still believe that even though he successfully argues that the spec is “just our current knowledge”, that doesn’t mean that we shouldn’t document what “our current knowledge” is).

Previous manylinux PEPs

It seems that PEP 600 supersedes and obsoletes the previous manylinux PEPs, as noted here. I think it would be worth explicitly stating that the existing PEPs will be marked as obsolete, and no longer maintained. It’s a key point of PEP 600 that the work involved in maintaining those old PEPs will stop, so making that explicit would be helpful.

This ties into the documentation point, as a natural question will be “so where do we go now to get the information that used to be in those PEPs?”

Compatibility checks

The example manylinux_tag_is_compatible_with_this_system function doesn’t seem to match up with the reality of how pip (and the explanation in PEP 425) works, which is to generate a list of tags supported by the current platform. As PEP 600 defines an unbounded set of tags, this is problematic, and I’d like to see more details on how this would be implemented in practice. A link to a reference implementation for pip would be a reasonable option here (and very much in line with how other PEPs handle implementation questions).
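(One way to reconcile the unbounded tag set with the “enumerate supported tags” model is to expand only the tags this particular system can support, which is a finite list. The sketch below is my own; I believe the packaging library’s tags module takes a similar approach, but treat that as an assumption.)

```python
def compatible_manylinux_tags(sys_major, sys_minor, arch):
    """Sketch: the set of new-style manylinux tags a given system supports
    is bounded by its own glibc version, so an installer can still
    enumerate supported tags, newest first."""
    for minor in range(sys_minor, -1, -1):
        yield f"manylinux_{sys_major}_{minor}_{arch}"

print(list(compatible_manylinux_tags(2, 7, "x86_64"))[:3])
# ['manylinux_2_7_x86_64', 'manylinux_2_6_x86_64', 'manylinux_2_5_x86_64']
```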

My main concern here isn’t with how the implementation would work in detail, it’s simply about whether the proposal is practical to implement - as there’s no point in accepting a PEP that can’t actually be implemented. Note: This is probably my biggest technical concern with the PEP as it stands.

Tensorflow crashes

I’m going to leave this to others to debate, specifically for the reasons I gave above - personally, I consider having a spec that lets this sort of crash happen to be unacceptable, but I’m not in a position to judge (much less dictate) what is acceptable to the Linux community.

Unless someone comes up with specific objections, and a plan for resolving them, I’m going to assume that the PEP’s statement, which is effectively that wheels causing such crashes are non-compliant in principle, even though we don’t have a check that allows us to detect them or prevent them being installed, is sufficient.

I would like this to not end up being a decision based on “who shouts the loudest”, so I’ll be looking for how many people express an opinion, and what concrete suggestions are made, much more than simply repeated assertions that it’s a significant issue.

Other user comments

Finally, a couple of comments that were made by others in the previous thread.

I’m not convinced that PEP 600 needs to cover this. None of the previous manylinux PEPs discussed migration strategies, and while PEP 600 does have a wider scope in terms of covering “all future manylinux tags”, that doesn’t (to me) immediately imply that migration processes come into scope.

Migration is an important process to consider, but I don’t actually think it’s something that belongs in a PEP. But even if it is, a separate process PEP “migration between manylinux versions” seems more appropriate here.

There were a number of other comments about going beyond glibc-based Linux (musl, for example) and about tying in additional constraints like the C++ toolchain. I’m going to declare these out of scope. Yes, they do mean that PEP 600 won’t be the last ever PEP on Linux compatibility tags, and it is certainly possible that future PEPs may even make PEP 600 completely obsolete (if we discover that we cannot ignore the C++ toolchain, for example) but I don’t think we should block progress now by trying to anticipate every possibility that the future can bring.

Summary

Overall, I think we’re nearly there. There are a few things to address, but not much that seems controversial (I hope!). I’d like to see a few more responses here, just to ensure we get as wide a consensus as we can (given the specialist nature of the PEP), but at some point we do have to accept that anyone who wants a say has had their chance.

Thanks, @njs, for persisting with this proposal - I know it’s been frustrating at times, but I think it’s coming together.

I totally agree that the things auditwheel checks should be documented, but does this belong in a PEP? If it doesn’t make someone responsible for making and updating documentation, what’s the point in putting it there? We all agree that documentation is good! :wink:

I’m not against putting such a sentence in, if @njs is happy with it. It just seems like it doesn’t matter.

Obviously no-one thinks a crash is acceptable. But, as far as I know, no-one has figured out the cause. Perennial or not, we can’t write any kind of spec that forbids mysterious crashes - except for the vague “play nicely with other packages” wording that Nathaniel already wrote. We could hold up the spec until someone works it out, but that seems counterproductive.

This sort of sentence does come across as a bit “My OS is better than yours”, to me. It’s not like Linux is the only platform where software has ever mysteriously crashed. :slightly_smiling_face:

Yeah, sorry - that part was written in haste, without thinking it through as much as I should have. All I was trying to get at there was that I don’t feel that it’s up to me to judge whether the PEP’s stance on the crashes was acceptable.

Oh absolutely! But Windows is a much more constrained environment for wheels - basically you have to use a specific version of MSVC to build with, so all this toolchain interoperability complexity gets avoided. Hence my lack of relevant experience with the sorts of crashes we’re talking about.

Agreed. And actually, the more I think about it, the more I’m of the opinion that trying to pin down anything more than “play nicely” is unlikely to be much help - unless someone finds the precise issue here and can define a constraint that avoids it, or there’s a strong consensus for something much stricter (for example, requiring all manylinux wheels to be built with the official docker images - basically the equivalent of what we have implicitly on Windows).

Do note that there aren’t a ton of projects that try to pull off what we do with manylinux wheels (at least that I’m aware of). Most Linux distros do their own packaging and recompile everything for each release (hence the request to allow for distro-specific wheels on PyPI, or other distro-specific support). Or they ship just the source and build at install time. For me, the easiest way to understand this is to think of each Linux distro as its own OS, one that happens to share enough common base code that we can squint and treat them all as the same OS.

Easiest thing would probably be to modify https://github.com/pypa/packaging/blob/master/packaging/tags.py.


Let me jump in here with a different hat on than usual: I happen to be one of the people responsible for libxcrypt, the library that Fedora replaced libcrypt.so.1 with, so I know in detail what changed in Fedora 30 and why. It is not the norm for shared library maintenance on Linux, and I don’t think we need to spend a lot of time worrying about it.

libcrypt has a misleading name. It doesn’t do general purpose cryptography, it only supplies a set of functions for one-way hashing of passwords. It used to be a component of GNU libc, but GNU libc’s infamously slow and conservative development process meant that it wasn’t keeping up with advances in the cryptographic state of the art for password hashing. So, a couple years ago, Björn Esser and I put together a drop-in replacement library to be maintained as a separate project with a more agile process. We went out of our way to ensure it really was a drop-in; the default configuration builds a libcrypt.so.1 that is binary backward compatible with the glibc component it replaces (but provides more modern hashing algorithms). We expected Linux distributions to stop installing libcrypt.so.1 from glibc, start installing libcrypt.so.1 from libxcrypt, and leave it at that.

However. Due to a historical quirk (dating at least as far back as System V Release 3), glibc’s libcrypt.so.1 also supplies a set of functions that can be used for encryption and decryption of single blocks of data with the DES block cipher. Because it’s DES, which has been breakable by brute force since the 1990s, these functions should never be used in modern code. So I put in a configure knob that lets you exclude those functions from the library at build time – which makes the library not be binary backward compatible with glibc anymore, so its installed name becomes libcrypt.so.2 instead of libcrypt.so.1. I expected this knob to be used only by niche distributions with a strong security focus and no binary backward compatibility guarantees to worry about, not mainstream distributions like Fedora. I don’t know the details of their decision making; I imagine it was something along the lines of “well, any program that actually uses those functions must be insecure, so let’s make them not start and then we can find them all more easily and fix them.” This is not a normal circumstance and I don’t know of any reason why something similar would happen with any of the other libraries that the manylinux specs say not to bundle.

Fedora 30 does have an RPM package you can install that puts libcrypt.so.1 back on the system – the whole point of changing the number at the end of the name is that both versions can coexist. I think there was a plan to document somewhere that you can make old wheels work again by installing that package.


I agree, we don’t. I’m sorry for making the comment I did - it was ill-considered and didn’t explain the point I intended it to explain.

Can we please not let this thread get side-tracked as a result of my comment? I appreciate the explanation of the background (which is very interesting in terms of explaining how some of the things that happen with Linux come about) but I’d rather we kept the focus on PEP 600.

Apologies again for the distraction.

In an effort to bring this thread back on track, let me copy and paste the important part of what I just posted as a reply to @steve.dower over on the other thread:

I claim we won’t have enough information to judge whether the perennial proposal should be accepted until both of the following are true:

  • A supermajority of the daily connections to production PyPI, by pip running on Linux, are a version of pip that understands the manylinux2014 tag
  • A supermajority of the wheels on production PyPI that contain compiled code for Linux have had an upload, with a “final release” version number, that was compiled in the manylinux2014 build environment

Once those are both true, we will need to canvass the community of people who have built wheels in the manylinux2014 build environment, and the community of people who have downloaded wheels for use on Linux, to find out whether there were any unexpected problems arising from the transition that need addressing by changes to the perennial process. (Probably we’ll hear about some problems in the form of bug reports on pip, auditwheel, the build environment, and specific compiled-code packages, but I don’t think we’ll discover all of the problems if we don’t do some outreach.)

What part of the proposal doesn’t make sense until those conditions apply?

We don’t know! That’s my entire concern right there. We do not know what problems we might not even have thought of, and the only way to find out is to wait and watch the manylinux2014 transition happen. It’s a case of “unknown unknowns”.

That is kind of an epistemological problem. You can’t know what you don’t know that you don’t know, so of course you can’t plan for it. Paul also isn’t [a deity], so does not have perfect future knowledge. This is kind of part of the human condition, and we probably shouldn’t try to correct for it in a PEP. :wink:

If all reasonably knowable problems are accounted for and addressed - be it by making a decision to mitigate the problem, or by declaring it out of scope for this PEP - I don’t see an issue with making a decision on it with the current knowledge.

That’s the threshold you should use when making a decision - reasonable knowledge of the problem.

But what’s the hurry here? Why shouldn’t we postpone the perennial PEP until a point when we have better information?

These are not rhetorical questions. The driver for updates to newer manylinuxes is, as I said in the other thread, that the build environments are based on old versions of CentOS that eventually go out of support. For manylinux2014, that happens in 2024, which is still five years away – so we have at least three years before we need perennial to be done. So waiting to learn more about how the transition to manylinux2014 goes, before we finalize the plan to transition away from manylinux2014, seems like the obvious right choice to me.

I didn’t say there was a hurry, I just said there is no issue with making a decision with the current level of knowledge. Delaying just for more information is unreasonable - the problems for Linux package distribution are well known, it is unlikely we will be surprised no matter how long we wait.

I think exactly the opposite is true; I expect the 1->2014 transition to uncover at least two completely unsuspected new problems with packaging binary wheels for Linux. They may be problems that we need to revise the perennial PEP to deal with, or they may not.

What makes you confident that there aren’t any unsuspected problems to discover?

What makes you confident that there are unknown problems? This is our third-and-a-half bite at the apple (I’m counting 2014 as a half bite, since it’s not ready yet), and literally hundreds if not thousands of other projects have been working at the problem for their language/application/os for decades.

Are we going to run into a glitch? Of course. No matter what choice we make regarding packaging, there are going to be bugs. Are they going to be intractable? Almost certainly not. Waiting to figure out what the bugs are before even starting is an unreasonable request.

Am I saying we should have a decision on this right now? Of course not. The unknown unknowns are so unknowable that we won’t even know them when we see them. Delaying for information on unknowable unknown unknowns is inherently unreasonable and irrational.

That said, if you have a sense of where problems might happen that Paul hasn’t addressed, that isn’t an unknowable unknown unknown, that’s a known unknown, and it would help Paul if you told them.

… And now after typing unknowable and unknown so many times, I must go read some Lovecraft.

Thanks everyone for the feedback!

OK.

OK.

Good point.

So it’s probably obvious but just to be clear, the actual implementation in pip is up to the pip maintainers – the code in the PEP is only to illustrate which wheels are supposed to be installable on which system, and any code that ends up doing that is fine. (I do wonder if pip might want to stop generating all the tags at some point, since pep425tags.py is getting pretty convoluted and has accumulated a number of dubious edge cases, as @brettcannon has noted. But that’s a separate issue :-).)

Anyway, it should be possible to generate all the supported manylinux tags using this algorithm:

  • fetch the current glibc version (pip already has code for this)
  • enumerate all the versions between some lower bound (let’s say 2.5 = manylinux1) and the current version. So e.g. if the current glibc is 2.29, we’d enumerate: 2.5, 2.6, 2.7, …, 2.28, 2.29
  • fetch the current platform tag (pip already has code for this), e.g. x86_64
  • use these two pieces of information to generate all the candidate tags: manylinux_2_5_x86_64, manylinux_2_6_x86_64, …, manylinux_2_29_x86_64
  • for each candidate tag, run the “manual override” logic
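The steps above can be sketched as follows (names are illustrative; the glibc-version and platform-tag lookups, and the manual-override filtering of each candidate, are assumed to exist elsewhere):

```python
def enumerate_manylinux_tags(glibc_version, platform_tag, lower_bound=(2, 5)):
    """Enumerate candidate manylinux tags, newest first.

    glibc_version: (major, minor) of the running glibc, e.g. (2, 29)
    platform_tag:  the current platform tag, e.g. "x86_64"
    lower_bound:   the oldest tag to emit; (2, 5) corresponds to manylinux1
    """
    major, minor = glibc_version  # assumes a 2.x glibc, per the steps above
    # Count down from the current minor version so that newer (more
    # specific) tags take priority over older ones.
    return [
        f"manylinux_{major}_{m}_{platform_tag}"
        for m in range(minor, lower_bound[1] - 1, -1)
    ]
```

Each candidate would then be passed through the manual-override check before being added to pip’s list of supported tags.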

Comparing this to the text in the PEP, I can see two places where this would break down currently:

  • If we’re running on a hypothetical future system with glibc 3.x installed, then we can’t enumerate all the supported tags without somehow knowing what the maximal glibc 2.x version is. This is kind of an inherent limitation of the “generate all tags” approach. As a hack I’d suggest that if we’re on a glibc 3.x system, then generate all tags up to 2.99, and then 3.0 through 3.x. Since this is just for speculative future-proofing, it’s probably not worth worrying about too much; worst case we’ll just fix things later after the glibc devs actually start making 3.x plans.

  • In the PEP, we currently allow “manual overrides” to declare that systems are compatible with arbitrary manylinux wheels, e.g. a macos-on-ARM system could declare that no really it’s totally compatible with linux-glibc-on-x86-64 wheels. This is kinda silly, and causes problems for the enumeration approach. I edited the PEP to move the manual override checks down below the normal compatibility checks, so that now the manual overrides can only rule out compatibility, not rule it in. That fixes this issue.

    Technically my edit introduces a tiny backwards-compatibility break from how pip works currently. Right now pip only checks the manylinux overrides if the platform is linux_x86_64 or linux_i686, so you can’t declare that a macOS system supports manylinux, or that an ARM system supports manylinux. But previously you could declare that a system with an ancient glibc or musl can install recent manylinux wheels, and my updated text prevents this. But this never did anything useful anyway, so I don’t think it matters. In fact, it’s not clear that anyone uses the override system at all, and if they do I’m pretty sure it’s only to disable manylinux wheels entirely (e.g. NixOS used to do this).

The edits I mentioned are here: PEP 600: Small updates in response to Paul's feedback by njsmith · Pull Request #1191 · python/peps · GitHub

This was exactly the problem we had when we were writing the first manylinux spec. Binary compatibility on Linux is a vast unknown! Nobody knows what dragons lurk there! etc. Fortunately that turned out OK.

What makes me confident now is that we’ve shipped more than 3.2 billion manylinux wheels over the last ~3 years. In that time we’ve found tons of edge cases in wheel building that needed fixes in auditwheel or the build image. We’ve found a few edge cases in system detection that needed fixes in pip (two that come to mind: handling 32-bit python running on a 64-bit kernel, and glibc redistributors who append weird text at the end of the glibc version string). We haven’t found a single issue that called into question the basic approach, and PEP 600 only codifies the basic approach, nothing else.
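As an illustration of how small these system-detection fixes are, the “weird text” case amounts to parsing only the leading major.minor out of whatever the glibc version string contains, rather than assuming the whole string is a clean version number. A hedged sketch (the suffixed version string in the example is hypothetical, not a quote of any particular redistributor):

```python
import re

def parse_glibc_version(version_str):
    """Extract (major, minor) from a glibc version string.

    Some glibc redistributors append extra text after the version
    number, so match only the leading "major.minor" prefix instead of
    parsing the whole string.
    """
    m = re.match(r"(\d+)\.(\d+)", version_str)
    if m is None:
        return None  # unparseable: treat as not manylinux-compatible
    return int(m.group(1)), int(m.group(2))
```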

Also, while I get that it’s impossible to prove a negative, we can make probabilistic estimates about negatives, and I’m confused about why you would think this is a particularly risky transition, even if you aren’t familiar with all that detailed history. Fundamentally the only difference from manylinux1 → manylinux2010 → manylinux2014 is dropping support for old platforms. From the perspective of wheel builders, everything that worked in the old specs is still possible – every manylinux1 wheel is also a manylinux2010 wheel. So it’s hard to imagine how the transition could uncover fundamental problems that invalidate what came before, even in principle.

Oh man I wish that were true; I’d get like a year of my life back. The whole scientific Python stack on Windows is totally dependent on convincing GCC and MSVC to play nicely together via obscure black magic. The first Windows wheels for numpy/scipy took substantially more effort than the first Linux wheels, and that’s including “inventing manylinux wheels” as part of the Linux efforts.

And FWIW, Python 3.8 had to break the “stable ABI” on Windows in order to keep up with a Microsoft-driven deprecation, and this broke PyQt’s wheels. If there were a “manywindowsX” PEP, we would have had to update it. This stuff happens sometimes. The best thing is to accept that and make it as painless as possible to adapt. Which is the goal of PEP 600 :-).


To be clear here, what I’m saying is that as a pip maintainer, I wouldn’t find the definition in the PEP sufficient. I agree with you that generating a list of all supported tags feels like a bad way to check compatibility, but I had that debate with Daniel when he first developed the wheel specs, and he was clear that there were edge cases where generating the tag list was the only way to get the correct order of priority on the candidates. I don’t recall the details now, but the result is that generating the list is the current way of doing things (from my reading of the compatibility tags PEP it may even be required).

Anyway, you sketched out a possible approach, and I’ve flagged my concern. I’m not going to block the PEP on this, but ultimately someone is going to have to develop a PR for pip to implement this, and as long as we’re clear that doing so may be trickier than it first seems, that’s fine.

Ouch, good point. I should have expressed things differently - on Windows, “compatibility” is implementation-defined by the version of MSVC that Python is built with. Which I guess undermines the argument that defining manylinux standards via an implementation-defined standard in auditwheel is unreasonable :slightly_frowning_face:

If nothing else comes out of this discussion, it’s that all of this stuff is really hard and there’s a lot of knowledge scattered around in people’s heads that could really do with being captured somewhere, or people will keep reinventing wheels…

Thanks for your patience, I’m happy with your edits to the PEP. I’m now switching back to a “watching the discussion” mode - I don’t have any more points of my own to add.

Is that still the case? My understanding is that the CRT is now binary-compatible across all recent MSVC versions.