Special suffixes for compiled files for Mac arm64 (M1) and Intel x86_64

Are there any special suffixes available for compiled files on Mac: something like cpython-xxx-darwin-arm64.so or cpython-xxx-darwin-x86_64.so? Or is it always just .cpython-xxx-darwin.so?

Any ideas why Python doesn’t seem to have this kind of suffix? It would be useful, as those architectures are incompatible (unless you build a universal file, but then it will be twice the size).
Is it because Python itself is built for Mac as a universal executable?

Maybe I am saying something stupid, but isn’t this implicit in the wheels that pull in that file? In which case, why do you need the .so name to differentiate, if you have a packaged wheel with the corresponding built .so for Mac?

@nad or @ronaldoussoren are your best bet, but it’s likely because we started with fat binaries.

Looking at platform_triplet.c, it seems we only detect macOS generally (and iOS specifically) but don’t put the architecture in there.

Wheels are an optional addition to the Python runtime, so it’s very possible to distribute extension modules in other ways. Being able to support multiple versions or platforms with separate extension modules in the same directory is often very useful, and the SOABI tag allows that.
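
As a quick illustration of that tag (values here are from a hypothetical CPython 3.12 on macOS; the exact output depends on how your interpreter was built):

$ python3.12 -c "import sysconfig; print(sysconfig.get_config_var('SOABI'), sysconfig.get_config_var('EXT_SUFFIX'))"
cpython-312-darwin .cpython-312-darwin.so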

2 Likes

As noted, if you are distributing extension modules using a wheel, the wheel name can and should include tags that indicate what architecture(s) are supported by the extension modules included in them. Tools like pip use that information when selecting which wheel to install.

Once an extension module file (.so) is “installed”, i.e. in the site-packages directory or somewhere else on sys.path, and a Python program performs an import on a name, the Python import machinery, importlib, will use its rules to search for it and, if it resolves to a .so file, importlib will call the macOS dynamic loader, dyld, to do the loading of the Mach-O format binary code. dyld will determine whether the Mach-O binary code can be loaded into the running Python process: if the .so is a single-architecture binary (x86_64 or arm64, say), dyld will return an error if the code is of an incompatible architecture; if the .so is a fat (“universal”) multi-architecture binary, it will choose the best architecture from those present in the file, or return an error if none are compatible. In other words, Python doesn’t need to concern itself with the contents of the .so file; it lets the operating system handle it.

If the .so file is installed by means other than from a wheel, it’s up to the builder and installer to provide an .so with a compatible architecture (either single-architecture or one of the architectures included in the fat binary). At least, that’s how I recall that it works!
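
As an illustration of the wheel tags mentioned above, hypothetical file names for one package built for macOS might look like this (the package name and versions are made up; only the trailing platform tag matters here):

somepkg-1.0-cp312-cp312-macosx_11_0_arm64.whl       (Apple Silicon only)
somepkg-1.0-cp312-cp312-macosx_10_9_x86_64.whl      (Intel only)
somepkg-1.0-cp312-cp312-macosx_10_9_universal2.whl  (fat binary, both architectures)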

2 Likes

Also, in case it wasn’t clear, it doesn’t matter whether the Python interpreter and an extension module are both built as single-architecture or as multi-architecture or a mixture of each, as long as the extension module has at least one architecture that is compatible with the architecture that Python itself is currently running in.

And, to add to the fun, that architecture may not be the CPU architecture of the Mac. For example, with Rosetta2 installed on an Apple Silicon (arm64) Mac, it is possible to force a universal build of Python to run in x86_64 mode and thus would need to load an extension module that includes an x86_64 binary in its .so. The macOS dynamic loader handles all of this transparently.

$ python3.13 -c 'import platform;print(platform.machine())'
arm64
$ arch -x86_64 python3.13 -c 'import platform;print(platform.machine())'
x86_64
2 Likes

I mean, if we can just rely on the wheels, then there is no need for special suffixes on the compiled files at all, but sometimes modules are indeed distributed just as .pyd/.so files, and it’s nice that it’s possible to recognize the architecture they’re built for just from the name.

It seems there is just some discrepancy in approach between Mac and other platforms, e.g. on Windows there are 4 possible tags depending on the architecture used: win_amd64, win_arm64, win32, win_arm32, while on Mac there is none.

Didn’t know that, that’s nice.

I’ve seen errors when importing .so files compiled for arm64 on x86_64 and vice versa. Does that mean those users were using a Python that was prebuilt only for arm64 or x86_64, and that if they used a universal build (e.g. the “macOS 64-bit universal2 installer” that’s available on Download Python | Python.org) they wouldn’t have that issue, since Python would be able to load modules compiled for either architecture? Is it possible to distinguish a universal Python from an architecture-specific one (import platform;print(platform.machine()) seems to return just arm64 for a universal Python)?

And is it possible to have both x86_64 and arm64 modules imported during one session of a universal Python, or does e.g. importing an x86_64 module dedicate the current session to x86_64 so that it can no longer load arm64 modules?

I guess the other possibility is that they were using something like arch -x86_64 python3.13 to explicitly make a universal Python work only with x86_64 modules, and without that command it would be able to handle both x86_64 and arm64.

But isn’t that the same as how it works on other platforms? The architecture check is performed by the platform itself; the suffixes are just needed to detect a possible error early on and maybe use a different file.

PS Sorry for asking so many questions :sweat_smile:

1 Like

sometimes modules are indeed distributed just as .pyd/.so files, and it’s nice that it’s possible to recognize the architecture they’re built for just from the name.

It sounds nice but, unfortunately, you can’t really depend on the name in the general case. For one thing, the extension module (.so) file name is generated by whatever build process was used to build the extension, and there are many different ones in use today in the Python world. Each one of them would need to be modified to incorporate some agreed-upon rules about what metadata is included in the file name, and even then it still wouldn’t guarantee that the data is accurate or, even if accurate, useful, as you would still need to know what architecture the Python interpreter doing the import will be running under. As we saw, on macOS, a single Python executable might be able to run as more than one architecture on the same machine. Or there may be more than one Python executable installed on a machine that might try to import the same copy of an extension module.

Note that the execution CPU architecture is determined by the OS when a process is launched, usually automatically by examining the requested executable but, as we saw, it might be influenced by the use of the arch utility or by lower-level APIs. Once a process is launched, all code executed in it must be of a compatible architecture: you can’t mix and match, say, arm64 code and x86_64 code in the same process. Again, the dynamic loader normally enforces that transparently when dealing with multi-architecture binaries.

Also, while recent versions of macOS and of python.org macOS installers support two CPU architectures in fat binaries (what we’ve called universal2), there are at least three additional CPU architectures that are supported by older versions of macOS and older Mac hardware that are still in limited use in the world (i386, ppc, ppc64) and it is still possible to build Python and extension modules to support various combinations of all these architectures. And it is not at all trivial to determine ahead of time which combination of Python interpreter archs and extension module archs will work in a particular macOS version / Mac hardware environment. The rules could even change: during the transition period from PPC Macs to Intel Macs, a few releases of Mac OS X provided Rosetta which allowed PPC executables to run on Intel Macs but, with Mac OS X 10.7, Apple removed support for Rosetta and suddenly some Python executables and/or extension modules no longer worked. No doubt, Apple will eventually drop support for Rosetta2 and running Intel-64 binaries on Apple Silicon Macs, possibly as soon as the next macOS feature release later this year.

The point of all this is that there is no foolproof method to guess ahead of time which extension modules are going to be able to execute in a particular Python environment on macOS. The best and surest way is to simply try importing the modules to see if they are loadable and, if necessary, catch the exception around the import statement.
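
For example, a minimal guard might look like this (the module name here is hypothetical; on macOS an architecture mismatch surfaces as an ImportError raised from the dynamic loader):

try:
    import _fastmath  # hypothetical compiled extension module
except ImportError as exc:
    # dyld failures (e.g. "incompatible architecture") come through here.
    print(f"extension not loadable in this interpreter: {exc}")
    _fastmath = None  # fall back to a pure-Python code path, if one exists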

I’ve seen errors when importing .so files compiled for arm64 on x86_64 and vice versa. Does that mean those users were using a Python that was prebuilt only for arm64 or x86_64 […]

It might. If you build Python from source on macOS, the default is to build only for a single architecture, the “native” architecture of the machine. And most third-party distributors of Python for macOS, like Homebrew or MacPorts, typically only provide single-architecture pre-built binaries, although they may support building multi-architecture binaries from source.
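
For completeness, CPython’s own configure script can produce a universal build from source; a sketch of the relevant flags (these options exist in CPython’s macOS build machinery, but you also need an appropriate SDK and universal builds of any dependencies):

$ ./configure --enable-universalsdk --with-universal-archs=universal2
$ make
$ make install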

But that brings out another potential pitfall: any external shared libraries that Python extension modules call also have to provide binaries with compatible architectures. For example, a standard installation of Python itself depends on external libraries like those for OpenSSL, Tk, etc., and for a universal build of Python, universal versions of all those external libraries are needed (which the python.org macOS installers take care of). More importantly, extension modules included with third-party packages, however distributed (PyPI wheel, built from source, etc.), can also call third-party shared libraries, which may or may not be included in the package. Even if the extension module binary itself is compatible with the running Python interpreter environment, an import could fail if the dynamic load of an extension module’s dependent shared library fails due to incompatible architectures.
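
One way to spot that kind of problem is to inspect an extension module’s dependencies with the standard macOS tools (the file names below are just examples):

$ otool -L _ssl.cpython-312-darwin.so    # list the shared libraries the extension links against
$ file /usr/local/lib/libssl.3.dylib     # show which architecture(s) a dependency actually contains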

And is it possible to have both x86_64 and arm64 modules imported during one session of a universal Python

As noted above, no. All the code executed in a macOS process has to be one architecture.

But isn’t that the same as how it works on other platforms? The architecture check is performed by the platform itself; the suffixes are just needed to detect a possible error early on and maybe use a different file.

It may be but, as I’ve tried to outline, macOS multi-architecture files and support for different architectures make it very difficult to determine by inspection what combinations are going to work. And, even if it were easier, that would be putting the burden on the suppliers of packages containing extension modules to get the rules right somehow. Presumably there could be similar issues on other (non-macOS) platforms as well.

I’m not sure exactly what problem you are trying to solve here but, if you are trying to ensure that some installation is going to be able to load and start execution successfully, the best approach is to just test it. Perhaps a package provides some post-installation tests or you could write a simple Python program to find and try importing all .so files.
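
A minimal sketch of such a checker (note that this really executes each module’s initialization code, so only run it against code you trust; the directory argument and module naming are simplified):

import importlib.util
import pathlib
import sys

def check_extensions(root):
    """Try to load every .so file under *root* and report which ones fail."""
    for path in pathlib.Path(root).rglob("*.so"):
        name = path.name.split(".")[0]  # module name without the ABI tag and suffix
        spec = importlib.util.spec_from_file_location(name, str(path))
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)  # this is where dyld checks the architecture
        except ImportError as exc:
            print(f"FAILED  {path}\n        {exc}")
        else:
            print(f"ok      {path}")

if __name__ == "__main__":
    check_extensions(sys.argv[1])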

But this situation hasn’t really changed for a very long time: Python has supported multi-architecture (fat) builds on macOS for going on 20 years now. There are certainly ways things could be improved and we are open to suggestions. Perhaps with more information, we all could identify some specific use cases where people are running into problems and try to document or otherwise mitigate them.

Hope this helps!

1 Like

We certainly have rich ABI suffixes for extension modules that allow multiple Python versions or OS platforms to coexist and be selected based on filename:

D:\> python3.12 -c "import importlib.machinery; print(importlib.machinery.EXTENSION_SUFFIXES)"
['.cp312-win_amd64.pyd', '.pyd']
D:\> python3.10 -c "import importlib.machinery; print(importlib.machinery.EXTENSION_SUFFIXES)"
['.cp310-win_amd64.pyd', '.pyd']

I think Andrej’s point is that on Windows (and Linux), these tags include the OS architecture, but on macOS it always says darwin (in place of win_amd64 above).
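
For comparison, the same command on a macOS build gives something like this (taken from a hypothetical CPython 3.12 framework build; the exact contents can vary):

$ python3.12 -c "import importlib.machinery; print(importlib.machinery.EXTENSION_SUFFIXES)"
['.cpython-312-darwin.so', '.abi3.so', '.so']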

It ought to be possible to add an additional suffix to that list with an architecture matching the one that Python was compiled with (or is running with - no real reason this can’t be dynamic, though today it’s static), and it does enable a very useful feature for embedders or extenders (who are not relying on per-project venvs).

For example, at one point (possibly still true), our debugger for VS Code included native modules for all supported platforms alongside each other in a single directory with the Python code, so that a single import ... would just choose the right one.

I think I’m missing something here. As I tried to outline, the convention on macOS for Python has (always?) been to support multiple architecture .so’s via a single fat .so file rather than having multiple single-architecture .so files. That puts the onus directly on the operating system dynamic loader to determine the best and/or only architecture to dynamically load rather than trying to (imperfectly) guess by adding code in importlib to examine fat .so files etc, code that we’ve never attempted (AFAIK) and that would be a big maintenance headache. If a project finds it easier to build single arch .so’s, they can use the macOS lipo utility to combine the single arch .so’s into a single fat .so. I’m not sure why we would consider changing this after all these years. I fear it would have a big impact across the Python ecosphere.
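
For example, combining two single-architecture builds into one fat file is a one-liner (the paths here are made up):

$ lipo -create build-arm64/foo.so build-x86_64/foo.so -output foo.cpython-312-darwin.so
$ lipo -archs foo.cpython-312-darwin.so    # verify: should report both x86_64 and arm64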

1 Like

Everything you say is true, but it applies after the dylib has been located by the importer. The ABI tag applies for the importer to help it choose one that is going to be supported by the OS (and if it happens to not be, then the user will get an error).

There’s no need for Python to inspect anything other than the filename, and provided the existing entries are still in EXTENSION_SUFFIXES, there is no impact on any existing scenario - they’ll just check for foo.cp312-darwin-m1.dylib first before foo.cp312-darwin.dylib. Whichever is selected is loaded as normal.

But why do we need to add something to the importer and to all extension module builders when what we have already works? The importer just needs to find that one fat file.

1 Like

I guess some people don’t want to build one fat file but would rather have separate arch-specific ones?

Well, then they could contribute and maintain the code to importlib, setuptools, uv, multiple setup.py files, et al :wink: Seriously though, I don’t see what has changed that made this an issue after all these years. Any insights?

1 Like

The original question seems to be about why macOS is different from the other platforms, which I suspect has been answered somewhere above. Whether it should change isn’t necessarily being asked.

It wouldn’t surprise me if something related to AI/GPU is generating large binaries - CUDA is renowned for it - and doubling the size of an extension module may actually mean tens or hundreds of MB. But chances are this is just a “why the inconsistency” question (you could ask the same for why Linux has the ABI in the extension but Windows does not - there’s no real consistency in the SOABI tag).

The original question seems to be about why macOS is different from the other platforms, which I suspect has been answered somewhere above.

That may be it. I’m not sure that we’ve totally answered why since I wasn’t around when the original design decisions were made back in the day. But not trying to duplicate functions of the macOS dynamic loader seems like one good reason. In any case, we could certainly improve documentation in this area and perhaps someone will be motivated to add something to packaging documentation somewhere. I’d be happy to review it.

1 Like

Thanks Ned for bringing much more context into this; there’s a lot more detail here.

Personally I don’t have a specific issue at hand that would be resolved by this (yes, I guess it would be more convenient to use bare .so files, but it’s not crucial); I was just interested to learn about the roots of the discrepancy and what the caveats would be if there were an extended suffix.

I’m not saying that it should be changed or that there would be a huge benefit for the community - maybe everyone indeed is just using fat binaries, and if there is no demand for a change from the community then that might be the best way to handle it instead of storing binaries separately, given that you never know what the next twist in Mac architecture is that Apple will decide to make (though it seems a bit odd to me to store multiple compiled binaries in one file, especially for large libs).

But let’s say there were such a suffix, matching the Python that was used to build the extension - would it technically work the way I describe below, or am I missing something?

Then we would get the following extensions:

  • .cpython-311-darwin-arm64.so
  • .cpython-311-darwin-x86_64.so
  • .cpython-311-darwin.so

Based on two assumptions (they seem pretty solid):

  1. The OS determines the architecture when the Python process is started
  2. The OS doesn’t allow mixing different architectures in one process

I guess it would be safe for importlib to check platform.machine() and, if it returns “arm64”, look for .cpython-311-darwin-arm64.so first. If that doesn’t work out, then it would make sense to look for .cpython-311-darwin.so next, as that may indeed be a fat binary (or just a Mac binary that doesn’t specify the architecture in the name). Then it would look for plain .so files.

That way, if there are no architecture suffixes present in the current folder, it would keep the current behaviour. If there are, it would autodetect the matching .so file.

If the Mac is natively arm64 but Rosetta2 or some other binary translator is involved, or Python was started with arch -x86_64, then platform.machine() would return x86_64 and .cpython-311-darwin-x86_64.so would have priority.

If Apple removes the translator tomorrow and we’re on arm64, then Python would either not start at all (if it had only x86_64 binaries in it) or would start only as arm64 and would only be able to import arm64 extensions.
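
A rough sketch of the lookup order I have in mind, purely to illustrate the idea (the architecture-specific suffixes are hypothetical, not something importlib supports today):

import platform

def candidate_suffixes(version="311"):
    """Hypothetical, architecture-aware ordering of extension suffixes on macOS."""
    arch = platform.machine()  # "arm64", or "x86_64" under Rosetta2 / arch -x86_64
    return [
        f".cpython-{version}-darwin-{arch}.so",  # hypothetical arch-specific suffix, tried first
        f".cpython-{version}-darwin.so",         # current suffix: fat binary or untagged single-arch
        ".so",                                   # bare suffix, tried last
    ]

print(candidate_suffixes())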

1 Like

As one of the people involved with getting us to the current setup: we use fat binaries (or “Universal Binaries” in Apple speak) because that’s the standard on macOS (and has been for a very long time; this convention was inherited from NextStep).

This gives us a setup where users don’t have to care about the system architecture. That’s less important with extension modules and their current distribution mechanism (wheels, …), but is still important for the main binary (e.g. “python3.12” just works, you don’t have to know what CPU your system has to start Python or to download the right installer for your system).

But what would this bring us compared to the current setup? Adding this would complicate things, with little to no gains (IMHO).

The only reason I can come up with is that this would make it slightly easier to create installations supporting both architectures even when package maintainers, like most of the scientific python stack, don’t ship universal binaries. But that would still require additional tooling or changes to tooling.

In any case the need for fat binaries is a temporary situation, in a couple of years all relevant Mac hardware will be Apple Silicon anyway (at least until there’s a migration to yet another CPU architecture :wink: )

3 Likes

(Catching up after PyCon travel) Thanks for chiming in, @ronaldoussoren. I have two additional comments. One can always use the file utility from the command line to determine the compiled architecture(s) of any binary file including a Python .so. And if the file size of a fat binary .so is a concern and if you can be certain of the execution environment, the lipo utility can also be used to split out the required architecture from a fat binary into a single arch file and have that replace the fat file.
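
For instance (the extension file name is made up, and the exact output of file varies by macOS version):

$ file "$(python3.13 -c 'import sys; print(sys.executable)')"   # is the interpreter itself universal?
$ file foo.cpython-312-darwin.so                                # which architecture(s) does this .so contain?
$ lipo foo.cpython-312-darwin.so -thin arm64 -output foo-arm64-only.so   # extract just the arm64 slice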

2 Likes

I believe it’s actually older than that - fat binaries were used during the 68k to PPC transition back in the classic Mac OS days. Back then, applications were conventionally a single file which contained everything needed for that program, split between the data fork and the resource fork of the file. Contrast that with, say, Windows, where you commonly have a folder with the main executable, maybe some DLLs, and the miscellaneous resources all as separate files. In that context, having two binaries would have been pretty weird, and it would have put the onus on the end user to know which one of the two they needed.

NextStep retained some ideas from classic Mac OS, even if the technical underpinnings were totally new - the single-file programs became the .app folders that are technically many files but behave like a single file in most user-facing contexts. That could have been an opportunity to switch to separate binaries hidden away in the .app folder. I can’t speak to why they decided to keep using fat binaries in that context; it must have made something easier.

Fat binaries also work for code that isn’t in an app bundle, such as /usr/bin/python3.

2 Likes