Why is manylinux2014 able to use GCC 10 with a much older libstdc++?

I would like to understand how the manylinux2014 image works.

  1. PEP 599, which describes manylinux2014, expects the official docker image to use gcc 4.8.5, and allows a shared library inside a wheel to depend on versioned symbols with at most GLIBCXX_3.4.19 and GCC_4.8.0.
  2. In reality, however, the official manylinux2014 Docker image uses gcc 10.2.1. Features like concepts, if constexpr, or std::optional work without any issues. Yet it links C++ code against /lib64/libstdc++.so.6, a symlink to /lib64/libstdc++.so.6.0.19, which corresponds to gcc 4.8.3. I’ve also looked at strings /lib64/libstdc++.so.6 | grep GLIBCXX and, indeed, the highest version I see is 3.4.19, as the PEP requires. For gcc 10.2.1 I would expect the versions to go up to around 3.4.28.

My understanding is that libstdc++ and gcc versions are tightly coupled and one can’t simply link against an old libstdc++ with a compiler that is six years newer. So why does it work, and what had to be done to achieve that?

They install the Red Hat Developer Toolset, which provides a recent gcc (and accompanying toolchain) built for old glibcs. (I’m sure there was a reference to it in the manylinux repo’s README somewhere, but I can’t find it.)


Yup, the vast majority of the hard work on this is done by Red Hat. It’s necessary for them because over the lifetime of a RHEL release, the original compilers become more and more unusable towards the end of the lifecycle, so the devtoolset backports keep it feasible to build up-to-date software on quite old OSes.

This is also the reason why essentially all manylinux versions have been based on RHEL versions (and their derivatives: originally CentOS, now Alma, Rocky, etc.): the devtoolset fills the exact same need for people who have to compile & bundle up-to-date Python packages. The one time manylinux provided a Debian-based version, people soon complained about the very outdated compilers.
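As far as I know, the devtoolset trick is a hybrid linking model: symbols that already exist in the old system libstdc++.so are linked dynamically, while anything newer is linked statically from a libstdc++_nonshared.a archive shipped with the toolset. You can approximate the fully-static half of that idea with a stock g++ (sketch; assumes g++ and ldd are available):

```shell
# Sketch: compare dynamic vs. static linking of the C++ runtime.
# devtoolset sits in between: only the *new* parts of libstdc++ are
# linked statically (from libstdc++_nonshared.a), the rest comes
# from the old system libstdc++.so.6.
cat > t.cpp <<'EOF'
#include <iostream>
int main() { std::cout << "hi\n"; }
EOF

g++ t.cpp -o t_dynamic                                  # depends on libstdc++.so.6
g++ -static-libstdc++ -static-libgcc t.cpp -o t_static  # no libstdc++.so.6 dependency

ldd t_dynamic | grep libstdc++ || echo "no libstdc++ dependency"
ldd t_static  | grep libstdc++ || echo "no libstdc++ dependency"
```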

More context can be found in the discussion for the tracking issue of manylinux_2_28.


The relevant source code for the backporting work involved in the Red Hat Developer Toolset can be seen here: Explore - AlmaLinux OS Foundation Git Server

For gcc, there’s quite a few patches, although most of the patches are pretty short: rpms/gcc-toolset-14-gcc - gcc-toolset-14-gcc - AlmaLinux OS Foundation Git Server

I was looking into this recently because I’ve become interested in the possibility of creating full-source-bootstrapped versions of the manylinux images by replacing almalinux/alpine with a bootstrapped distro, maybe Guix or StageX. (The manylinux project uses AlmaLinux for the glibc-based manylinux images and Alpine for the musl-based musllinux images.)


Thank you for the answers. The only approach to providing a new C++ compiler on an old system that I was aware of was to install a newer libstdc++ as well (or a newer libc++), like pkgsrc and macports do. I didn’t realize that patching gcc like that was actually feasible.

@tabbyrobin what was your conclusion? Can such an image be created?

(Note: I’m new to this forum, should I split this into a separate topic?)

@Zabolekar My very initial research suggests it’s worth investigating further. No idea how difficult it might end up being.

Guix apparently provides a make-gcc-toolchain function, which can be used to create a gcc toolchain compiled against a theoretically-arbitrary glibc: https://stackoverflow.com/questions/66063337/build-against-an-old-glibc-with-guix

However, I imagine that if you go back too far, you will get compiler errors – of course, that’s where the RH patches come in. I haven’t tested any of this yet.

I did talk with the StageX devs on their Matrix channel. The StageX project is in its early stages, so a lot of things would need to be done to make this work, but they said they would potentially be interested in accommodating this use case.

Note: while both distros technically offer both libcs, Guix is glibc-focused and StageX is musl-focused.

I guess one thing to be decided when pursuing this is whether to structure it around forwardporting or backporting. The RHEL devtoolset patches are structured around backporting gcc etc to old distro versions.

Conceivably one could also forwardport old glibc to new distro revisions. But I’m not sure if that would provide the right guarantees/properties for manylinux.

Side thought: The success of the manylinux base-distro swap-out could be initially assessed by rebuilding wheels for a set of projects that already support bit-for-bit reproducible builds…

No, it wouldn’t. You need to rely on nothing more than the set of symbols provided by glibc 2.x (in manylinux_2_x), and their exact signatures. Compiling against a newer glibc with newer symbols will generally end up using those newer symbols (from the POV of the compiler: “hey, this symbol is available in the library, I can refer to it”), but that will fail as soon as you try to run the resulting artefact on a Linux with an older glibc.

That’s why manylinux is structured the way it is. To be (maximally) compatible with a broad range of Linux versions and distributions, the only reliable way is to compile against some oldest glibc version (which is encoded in the manylinux version)[1], in practice discretized by RHEL releases.


  1. plus some more details (support libraries etc.). ↩︎


Maybe I misunderstand what you mean, but wouldn’t old glibc (and software linked against it) simply work with a newer kernel without anyone having to port or even recompile it, and isn’t that more or less what happens when we run the manylinux image?

Right, that would just work. Maybe I didn’t explain myself clearly, or maybe I’m misunderstanding something. My point was the distinction between:

  1. everything on the system is old, except gcc is new (backporting gcc)
  2. everything on the system is new, except glibc is old (“forwardporting glibc”)

So in both cases: It’s always new gcc, and old glibc. And the whole system is always compiled against old glibc. But, say, curl – is it old curl or new curl?

Another way to put it is: do we backport “only gcc et al.” to old glibc, or do we backport “everything else” to old glibc?

The upside of “forwardporting glibc” is… you potentially get security and feature updates.

The downside is that you’d have to patch a lot more than just gcc, and of course you have to recompile the entire system. You couldn’t just use an old frozen image.

So based on what @h-vetinari said, after reflecting, I think “backporting everything to old glibc” is probably just a bad idea and completely unmaintainable. So, please disregard my whole “forwardporting” idea.

And @h-vetinari, I think what you are also saying is that it’s not just glibc that needs to be old – other relevant libraries need to be old as well?

I think I’m making this more complicated than it is. The maintainable way is to backport just the minimum necessary. For good reason, RHEL backports just select things, including critical security updates, and gcc etc.

For reference for anyone interested in this swap-out idea, the almalinux:9 image that manylinux uses is defined here: container-images/Containerfiles/9/Containerfile.default at main · AlmaLinux/container-images · GitHub

And manylinux makes use of it here: manylinux/docker/Dockerfile at main · pypa/manylinux · GitHub

See PEP 513 which defines manylinux.


Actually, perhaps I misunderstood what you meant at first, because in a way, this can work. In conda-forge we’re actually doing something along those lines – our images are based on Alma 9 (glibc 2.34), but we’re compiling against a CentOS 7-era glibc 2.17. Obviously, we cannot touch the glibc of the image itself, but we can provide a separate “sysroot” with glibc 2.17 (and a few other key pieces).

The one reason that this is not a generally applicable solution is that it needs very careful control of the compiler setup, such that it will actually use your custom sysroot, rather than the default glibc on the system. This would very likely be infeasible for broad (uncoordinated) usage through manylinux, unless manylinux were willing to build and maintain somewhat fragile compiler wrappers (e.g. pointing gcc to a script that adds the right flags).

Here’s a write-up about how this roughly works in conda-forge.


Perhaps, but your entire system (including critical binaries and libraries without which nothing will work) is linked against the newer glibc, so you must be very careful as @h-vetinari already pointed out.

It’s certainly possible to do in a maintainable way. With pkgsrc, for example, you can install gcc 14 and curl 8.12.1 on Debian Jessie with glibc 2.19 in a way that doesn’t interfere with its native gcc 4.9.2 and curl 7.38.0. See below for a Dockerfile so you can try and reproduce it; be warned that it may need a few hours of time and a few gigabytes of space.

This is useful if you want a recent version of some program and upgrading the whole system is impossible or risky. What it isn’t useful for is creating portable Python wheels: C++ code compiled with gcc from pkgsrc will be linked against the original libc and libm, which is what we need, but also against libstdc++ and libgcc_s from pkgsrc, which the user likely won’t have.

The promised Dockerfile:

FROM debian/eol:jessie

# Install GCC 4.9
RUN apt-get update && apt-get upgrade -y && apt-get install g++ -y

# Install old curl, install pkgsrc 2025Q1.
RUN apt-get install curl xz-utils -y && \
	curl -LO https://cdn.netbsd.org/pub/pkgsrc/pkgsrc-2025Q1/pkgsrc.tar.gz && \
	tar xzf pkgsrc.tar.gz && rm pkgsrc.tar.gz
WORKDIR /pkgsrc/bootstrap
RUN ./bootstrap --prefix /opt/pkg-2025Q1 --prefer-pkgsrc yes --make-jobs $(nproc) && \
	ln -s /opt/pkg-2025Q1/bin/bmake /usr/bin/bmake

# Install new Curl with pkgsrc.
WORKDIR /pkgsrc/www/curl
RUN bmake install clean clean-depends

# Install GCC 14.2 with pkgsrc.
WORKDIR /pkgsrc/lang/gcc14
RUN bmake install clean clean-depends && \
	ln -s /opt/pkg-2025Q1/gcc14/bin/gcc /usr/bin/gcc-14 && \
	ln -s /opt/pkg-2025Q1/gcc14/bin/g++ /usr/bin/g++-14

WORKDIR /

In this case you might as well use conda-forge and its gcc or clang package, which will spare you the hours of manual compilation. It will also give you access to many modern third-party C/C++ libraries compiled with the same toolchain.
