Wheels for musl (Alpine)

Maybe for an initial release we should try and stick with 1.1.x (not sure which one to pick, would appreciate input from @ncopa), then? Having manymusl_major_minor_arch like it’s done for glibc doesn’t sound too terrible.

Since musl has already gone through the pain of implementing time64, it would be nice to take advantage of this work, but if using an older musl lets everything just work on older alpine versions that are likely still in use, it can be a worthy tradeoff. That said, armv7l and i686 (which I forgot to mention in previous comments) are still valuable test platforms, due to all the ARM SBCs out there using 32-bit userland (which is unfortunately already left out of CI testsuites for many projects).

Thanks for the list @ericonr, that’s already a first big step!

At the time of the manylinux2010 saga, the 32bit images were also an issue and were dropped for the initial roll-out. It then took roughly half a year for them to be added.

What I’m trying to say is that it’s probably a much more feasible goal to aim for 64bit only initially (without closing the door to 32bit), and to not block the whole effort on some ancient hardware.

In a similar vein (and in particular with the groundwork of the “perennial” manylinux PEP 600), it’s completely possible to build these images out of order, so a presumptive manymusl_1_2 could come before manymusl_1_1.

I’m thinking though that a putative manymusl proposal might not be usable in the same forward-compatible manner as manylinux. Here’s a comment from the creator/maintainer of musl about ABI compatibility, and how musl is taking a slightly different path than glibc (ABI stability is guaranteed, but that’s not the same as “guaranteed it will work”):

This is different from the glibc approach, which is to use symbol
versioning to attempt to retain “bug-compatibility” with the version
of glibc the application was linked with. Such a system forces new
application binaries that want to be able to run on systems with old
glibc to link against the old glibc, and thereby get the buggy
behaviors even if they’re running on a system without the bugs. Myself
and most of the musl community I’m aware of consider this entirely
unreasonable, and that’s why musl doesn’t do it.

I should note that this approach is in part what makes musl attractive. Since they are able to avoid the binary bloat that comes with keeping old versions of interfaces around, they remain a very tiny library while still having lots of implemented functionality.

That said, I see what you mean. For an example of what glibc does, one need only look at memcpy(3):

In glibc 2.14, a versioned symbol was added so that old binaries (i.e., those linked against glibc versions earlier than 2.14) employed a memcpy() implementation that safely handles the overlapping buffers case (by providing an “older” memcpy() implementation that was aliased to memmove(3)).

While in this case it is a cheap alias, there are many cases which result in a lot of code duplication. musl has ended up carrying some “bloat” on 32bit systems now, due to the time64 changes, but it is strictly for ABI compat, not for bug compatibility.

Overall, given that musl’s stance on this is part of what brings users to it, I would argue that manymusl should adopt it. Binary wheels are guaranteed to not throw “symbol lookup error” during dynamic linking or have weird ABI incompatibility, but bug compatibility is guaranteed only if you’re running the same musl version as specified in manymusl_major_minor_... (musl versions with 3 fields, actually, so we’d need some other naming).

@ericonr do you happen to have any source or reference for musl’s versioning scheme? I cannot find it in their wiki nor the “Official manual” section.

My reading of Rich’s post is that generally, you can be pretty confident that it works to build against musl version X and run on version X+1, with the only exception being if the library was somehow depending on some bug in version X that got fixed in X+1. I see why Rich emphasizes that exception, because Rich is a very precise guy and musl does put less effort into bug-for-bug compatibility than glibc, but really this is the same caveat that applies to every upgrade process ever, and I don’t think we need to worry about it being a major issue for musl.

The other thing he’s pointing out is that in many cases, you can build against musl X+1 and actually get a binary that runs on version X (!). Unfortunately, this isn’t super useful to us at the spec level: “many cases” is not the same as “all cases”, so we can’t just assume that building on any version of musl will produce a binary that works on all versions of musl. We’ll still need some kind of monotonic “version” counter to use in the wheel tag. (Though auditwheel could potentially be smart enough to detect cases where a binary built against musl X+1 will work on musl X, and tag the wheel appropriately.)

AFAIK, the biggest challenge to doing this is that we don’t yet have a concrete proposal for how to reliably figure out what version of musl is running. The best option I’ve seen is to first manually parse the python binary’s ELF header to see if the string musl appears and what the path to libc.so is, and then invoke libc.so as a child process and parse the text it prints. This is like… incredibly janky, but probably doable? Also, it will fail if the python binary itself was statically linked – but maybe that’s OK? If python was statically linked against musl, then I think it might not be able to load extension modules that are dynamically linked against musl anyway?
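For concreteness, here is a rough sketch of what that detection could look like. It is only an illustration, not a tested proposal: the function names are made up for this post, and it assumes the musl loader prints a banner containing a "Version X.Y.Z" line on stderr when run with no arguments.

```python
import re
import struct
import subprocess
import sys

PT_INTERP = 3  # program header type holding the dynamic loader path


def read_interpreter(path):
    """Return the PT_INTERP path of an ELF file, or None if there is none."""
    try:
        with open(path, "rb") as f:
            ident = f.read(16)
            if len(ident) < 16 or ident[:4] != b"\x7fELF":
                return None
            is_64 = ident[4] == 2                    # EI_CLASS: 1=32-bit, 2=64-bit
            endian = "<" if ident[5] == 1 else ">"   # EI_DATA: 1=little-endian
            # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
            # e_flags, e_ehsize, e_phentsize, e_phnum
            fmt = endian + ("HHIQQQIHHH" if is_64 else "HHIIIIIHHH")
            fields = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
            e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
            for i in range(e_phnum):
                f.seek(e_phoff + i * e_phentsize)
                if is_64:
                    p_type, _, p_offset, _, _, p_filesz = struct.unpack(
                        endian + "IIQQQQ", f.read(40))
                else:
                    p_type, p_offset, _, _, p_filesz = struct.unpack(
                        endian + "IIIII", f.read(20))
                if p_type == PT_INTERP:
                    f.seek(p_offset)
                    return f.read(p_filesz).rstrip(b"\0").decode()
    except (OSError, struct.error):
        return None
    return None  # no PT_INTERP: e.g. a statically linked binary


def parse_musl_version(loader_output):
    """Extract (major, minor) from the loader banner, or None."""
    lines = [ln.strip() for ln in loader_output.splitlines() if ln.strip()]
    if len(lines) < 2 or not lines[0].startswith("musl"):
        return None
    m = re.match(r"Version (\d+)\.(\d+)", lines[1])
    return (int(m.group(1)), int(m.group(2))) if m else None


def detect_musl_version(executable=sys.executable):
    """Return the running musl (major, minor), or None if not on musl."""
    interp = read_interpreter(executable)
    if interp is None or "musl" not in interp:
        return None
    proc = subprocess.run([interp], stderr=subprocess.PIPE, text=True)
    return parse_musl_version(proc.stderr)
```

On a glibc system, or with a statically linked interpreter, detect_musl_version() should simply return None, which matches the failure mode described above.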

The other option would be to give up on using musl itself as the clock, and instead use the distro. It’s very easy to figure out whether you’re running on alpine and what version you have: just check the standardized os-release file. The downside is that then wheels would be specific to alpine, and wouldn’t automatically install on other musl-based linuxes. The upside would be that wheels could potentially depend on features that alpine provides beyond just musl… though I’m not sure whether there are any, since alpine is so minimalist.
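To show how little work the distro route needs, here is a minimal sketch that reads the os-release format. The helper names are hypothetical, and the parser is deliberately simplistic: it strips surrounding quotes but does not handle shell-style escapes, and it rejects version strings whose first two fields are not plain numbers.

```python
def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        info[key] = value
    return info


def alpine_version(text):
    """Return (major, minor) if the content describes Alpine, else None."""
    info = parse_os_release(text)
    if info.get("ID") != "alpine":
        return None
    parts = info.get("VERSION_ID", "").split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        return None
    return int(parts[0]), int(parts[1])
```

In practice one would pass in the contents of /etc/os-release, for example alpine_version(open("/etc/os-release").read()).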

As a general piece of advice: I’d recommend whoever picks this up to pick one of these options and run with it. manylinux happened because we were ruthless about getting something workable into users’ hands, and made whatever compromises we had to to accomplish that. This is the sort of problem where you can spend forever debating and going off on tangents, and it doesn’t help. If you can get something that allows people using the python:alpine docker image to install wheels, then that’s worth shipping, even if it doesn’t solve every other problem.

All versions, from 0.5.0 to 1.2.2, have followed x.y.z as a version scheme. A quick talk on IRC seems to imply that this will be the scheme for all planned releases as well.

From what I understood, Rich has suggested kind of the same thing; and it makes sense, after all, compiled modules are unlikely to use newly introduced functions, which means that if auditwheel keeps a database of symbols, figuring out the minimum musl version would be doable. I am not 100% comfortable with this approach, however. Still, such a workaround would be a good idea because otherwise “you’d mostly end up imposing gratuitous later version dependency than needed”.
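As a sketch of the symbol-database idea: given the set of libc symbols a compiled module imports and a table of when each symbol first appeared, the minimum musl version is just the newest "introduced in" among them. The function name, the table, and the symbol names below are all hypothetical placeholders; auditwheel has no such database as far as I know, and extracting the imported symbols from the .so is left out.

```python
def minimum_musl_version(imported_symbols, introduced_in, baseline=(1, 1)):
    """Return the lowest musl (major, minor) providing every imported symbol.

    Symbols missing from the table are assumed to exist in the baseline.
    """
    required = baseline
    for symbol in imported_symbols:
        version = introduced_in.get(symbol)
        if version is not None and version > required:
            required = version
    return required


# Placeholder data for illustration only, not real musl history.
HYPOTHETICAL_TABLE = {"hypothetical_new_call": (1, 2)}

old_only = minimum_musl_version({"memcpy", "fopen"}, HYPOTHETICAL_TABLE)
uses_new = minimum_musl_version({"memcpy", "hypothetical_new_call"},
                                HYPOTHETICAL_TABLE)
```

With this placeholder table, a module built on 1.2 that only imports long-standing symbols would still be tagged for the baseline, which is the "avoid gratuitous later version dependency" effect.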

An alternative approach could be allowing the user to manually override the manymusl version in the wheel to allow installation in their system, but I guess this is unlikely to help in container usage.

Yes, statically linked binaries with musl just get an error when they attempt to call dlopen. That said, I’d rather see Issue 43112 (“SOABI on Linux does not distinguish between GNU libc and musl libc”, on the Python tracker) fixed than having to manually parse any ELF at all - the downside being that this would take a while to propagate across python versions, from what I understand.
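For reference, this is roughly what relying on SOABI would look like once that issue is fixed. The helper name is made up, and the sketch assumes the last component of the platform triplet starts with "gnu" or "musl". As the issue title says, interpreters built against musl currently still report a "-gnu" SOABI, so the result can't be trusted yet.

```python
import sysconfig


def libc_from_soabi(soabi):
    """Guess the libc from a SOABI like 'cpython-39-x86_64-linux-gnu'."""
    if not soabi or "linux" not in soabi:
        return None
    last = soabi.rsplit("-", 1)[-1]
    if last.startswith("musl"):
        return "musl"
    if last.startswith("gnu"):
        return "glibc"
    return None


# SOABI is None on some platforms, hence the guard in the helper.
print(libc_from_soabi(sysconfig.get_config_var("SOABI")))
```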

While unfortunate for users of other musl based distributions, this would work for the most pressing case here, which is python stuff on alpine containers.

All that said, I think sticking with 64-bit platforms for now is the best thing to do. Since Rust hard codes information taken from platform headers instead of being able to read them, they can’t immediately adapt to the time64 changes. See https://github.com/rust-lang/libc/issues/1848

Would it be worth having a stop-gap alpine platform tag which is not supposed to be future-proof? Are the future maintenance requirements too great?

As long as auditwheel can automatically switch from generating alpine wheels to manymusl wheels, there would be no friction anywhere in the packaging workflow, right?

Maybe for an initial release we should try and stick with 1.1.x (not sure which one to pick, would appreciate input from @ncopa), then? Having manymusl_major_minor_arch like it’s done for glibc doesn’t sound too terrible.

I’d rather start with musl 1.2.x and add 1.1 later if needed.

As a general piece of advice: I’d recommend whoever picks this up to pick one of these options and run with it. manylinux happened because we were ruthless about getting something workable into users’ hands, and made whatever compromises we had to to accomplish that. This is the sort of problem where you can spend forever debating and going off on tangents, and it doesn’t help. If you can get something that allows people using the python:alpine docker image to install wheels, then that’s worth shipping, even if it doesn’t solve every other problem.

Then what I would prefer to see is that we start with musllinux (not alpine), and don’t bother with the version number for now. Assume musl 1.2.x with time64 and only support that. But even before that I’d like to be able at compile time tell that this is musl.

Yes, statically linked binaries with musl just get an error when they attempt to call dlopen. That said, I’d rather see Issue 43112: SOABI on Linux does not distinguish between GNU libc and musl libc - Python tracker fixed than having to manually parse any ELF at all - the downside being that this would take a while to propagate across python versions, from what I understand.

I agree that Issue 43112 should be fixed first.

Based on our manylinux experience it will simply be too costly time-wise. We ended up with perennial manylinux specifically because updating the manylinux versions took so long every time.

Now that I’m on a computer with docker, I took a quick peek at the popular alpine image, and it appears that it ships with literally nothing except musl, openssl, libz, and libtls-standalone (which appears to be some funky thing forked from part of libressl):

❯ docker run --rm -it alpine ls /lib /usr/lib
/lib:
apk                    libc.musl-x86_64.so.1  libz.so.1.2.11
firmware               libcrypto.so.1.1       mdev
ld-musl-x86_64.so.1    libssl.so.1.1          modules-load.d
libapk.so.3.12.0       libz.so.1              sysctl.d

/usr/lib:
engines-1.1                 libtls-standalone.so.1
libcrypto.so.1.1            libtls-standalone.so.1.0.0
libssl.so.1.1               modules-load.d

OpenSSL has traditionally been really bad at ABI compatibility, and I doubt many packages are using this libtls-standalone thing since it appears to be unique to alpine. And musl and libz both have extremely stable ABIs and it’s safe to assume they’re available on every musl-based distribution.

So my updated suggestion is:

  • Write a quick PEP for musllinux_X_Y tags, where the tag means that the wheel should work on any real-world distro that uses musl X.Y or later. You can cite the perennial manylinux PEP for most of the details. This PEP could honestly be like 3 paragraphs long. Maybe less.
  • Update pip to understand when it’s running under musl X.Y or later. This will require grossness but it’s doable. Maybe later something like bpo-43112 will make it less gross, but worry about that some other time.
  • Update auditwheel to understand musllinux_X_Y tags
  • Maybe provide some standard musllinux docker images, though tbh these would probably just be the standard python:alpine images + gcc + auditwheel, so the need is a lot less urgent than for manylinux
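To make the tag semantics in the first bullet concrete, here is a minimal sketch of the compatibility check an installer would need. The function names are hypothetical, and it takes "X.Y or later" literally as a tuple comparison; a real PEP might want to restrict this further (for example to the same major version).

```python
import re

_TAG = re.compile(r"musllinux_(\d+)_(\d+)_(.+)")


def parse_musllinux_tag(tag):
    """Split 'musllinux_X_Y_arch' into (X, Y, arch), or return None."""
    m = _TAG.fullmatch(tag)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)


def is_compatible(tag, running_version, running_arch):
    """True if a wheel with this tag should work on the running musl."""
    parsed = parse_musllinux_tag(tag)
    if parsed is None:
        return False
    major, minor, arch = parsed
    return arch == running_arch and (major, minor) <= running_version
```

For example, is_compatible("musllinux_1_1_x86_64", (1, 2), "x86_64") is True, while a musllinux_1_2 wheel is rejected on musl 1.1.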

This sounds exceptionally sensible to me! For pyca/cryptography we’d be able to start producing wheels as soon as auditwheel added support.

I am still a bit worried about time64 mixing with Rust, which is quite relevant, because, as I understand it, the main motivation for this post was using Rust in Python wheels without forcing containers to also include the Rust toolchain. On 32 bit devices, any ABI boundary between Rust and C code which uses types which depend on time_t (so struct timespec, struct stat, …) should by all rights be completely broken - note that using these types from within Rust will still be completely fine, since the libc ABI is completely functional.
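A small simulation of the mismatch, using Python's struct module to stand in for both sides. It assumes a little-endian 32-bit target where the time64 struct timespec is a 64-bit tv_sec, a 32-bit tv_nsec and 4 bytes of padding, while the other side still believes in the old layout of two 32-bit fields. The names are mine, not anything from musl or Rust.

```python
import struct

TIMESPEC_TIME64 = "<qi4x"  # assumed musl 1.2 layout on 32-bit little-endian
TIMESPEC_TIME32 = "<ii"    # layout with a 32-bit time_t


def read_as_time32(buf):
    """Interpret the start of a buffer using the old 8-byte layout."""
    return struct.unpack_from(TIMESPEC_TIME32, buf)


# The "C side" fills in a time64 timespec: 1_700_000_000 s and 500 ns.
c_side = struct.pack(TIMESPEC_TIME64, 1_700_000_000, 500)

# The "Rust side" reads it with the hard-coded 32-bit definition.
sec, nsec = read_as_time32(c_side)

size_time64 = struct.calcsize(TIMESPEC_TIME64)
size_time32 = struct.calcsize(TIMESPEC_TIME32)
```

Here the seconds happen to survive because the value still fits in 32 bits, but nsec is read from the upper half of tv_sec and comes out as 0 instead of 500, and the two sides disagree on the struct size (16 versus 8 bytes). Nothing fails at link time, which is what makes this nasty.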

For that reason, if we are starting with musl 1.2, it shouldn’t cover 32-bit devices, in my opinion. If someone from the Rust side (maybe someone has a contact they could ping?) could leave their thoughts about this here, that would be of great help. If the risk is considered low enough, we could go forward with 32-bit devices as well, I guess.

Possibly a rust toolchain as well? Luckily setuptools_rust already works around rustc’s behavior of defaulting to static linkage when using musl, so building dynamic wheels should work out of the box.

Isn’t it the other way around? A musllinux_1_2_i686 tag promises to only run on 32-bit with musl 1.2+, which is correct from what I can tell: the runtime it’s incompatible with is musl 1.1. The problematic tag would be musllinux_1_1_i686, which (by the definition of PEP 600) should be compatible with all 32-bit musl 1.1+, which may not be correct.

One way to work around the problem would be to add an additional constraint. The rules laid out by PEP 600 all still hold, except that if a project publishes both musllinux_1_1 and musllinux_1_2 wheels on the same index, it does not need to promise the 1_1 one will work on musl 1.2+. In practice, installers should always prefer the 1_2 one if both are present (pip already does).
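The "prefer the newest compatible tag" behaviour can be sketched like this. It is a simplified stand-in for what an installer does, not pip's actual code, and the function name is hypothetical.

```python
def pick_wheel(available_tags, running_version, running_arch):
    """Return the compatible musllinux tag with the highest version, or None."""
    best = None
    for tag in available_tags:
        parts = tag.split("_")
        # Expected shape: musllinux_X_Y_arch, where arch may contain "_".
        if len(parts) < 4 or parts[0] != "musllinux":
            continue
        if not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        version = (int(parts[1]), int(parts[2]))
        arch = "_".join(parts[3:])
        if arch != running_arch or version > running_version:
            continue
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None
```

With both musllinux_1_1_i686 and musllinux_1_2_i686 published, a musl 1.2 system gets the 1_2 wheel and a musl 1.1 system gets the 1_1 wheel, so the 1_1 wheel never has to run on 1.2.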

I’ve made a draft here: Pre-PEP: Platform Tag for Linux Distributions Using Musl

I think the time_t thing might be a red herring? It only affects parts of the Python C API that use time_t, which appears to be nothing except a few underscore-prefixed functions. I’m not sure any C extensions use them at all, and auditwheel should be able to detect the rare problematic cases in any case.

I don’t mind only supporting 64 bit initially, even if python itself can support 32 bit devices with musl 1.2+, but yeah, anything using rust won’t work until rust is fixed. Using rust is a cost/benefit decision and I think they are fine with not having 32 bit as they only focus on the bigger architectures.

The problem, as I understand, is that Rust is broken with musl 1.2+ and probably hard to fix. It is not a problem for python to worry about IMHO.

No. musllinux_1_1_i686 will be compatible with all future musl tags. The issue with 1.2 I’m referring to is showcased in the ericonr/rust-time64 repository on GitHub, where the resulting binary can simply be entirely miscompiled.

I’m not sure how you would detect this at all? The mismatch is in the type definition, not in linkage. And the issue can appear for any module that uses Rust and C code, for example from an external library that’s linked statically into the bundle, not only when talking directly to the Python C API.

The project I linked above shows the issue I’m worried about, but I have no idea how to actually fix it.