Add `sys.abi_features` to make information about the interpreter ABI more accessible

This issue comment seems relevant, as the info required (which architecture DLLs should be added to the OS search path) really ought to come from sys (runtime info, rather than build time or platform).

I would argue that it’s about matching the ABI, but it’s hard to define the full range. Possibly we also need an unspecified platform-specific string that’s probably a compiler triple, but not necessarily? Or maybe that belongs under sys.implementation?

Not yet, but we can add the rest in a new PR :)

We have too many unspecified strings that are frozen forever for backwards compatibility. Rather than adding another one of those, I’d rather not add anything now.

Can we do better than sys.winver and sysconfig.get_config_var('MULTIARCH')?
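For reference, both of those existing sources can be queried like this — a minimal sketch, noting that each is platform-dependent (the example values in the comments are typical, not guaranteed):

```python
import sys
import sysconfig

# sys.winver only exists on Windows builds; MULTIARCH is typically only
# defined on POSIX builds, so both need a fallback when absent.
winver = getattr(sys, "winver", None)              # e.g. "3.13" or "3.13-arm64"
multiarch = sysconfig.get_config_var("MULTIARCH")  # e.g. "x86_64-linux-gnu", or None

print(winver, multiarch)
```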


Provided we keep sys.winver as x.y-plat (where plat is omitted for AMD64), then it doesn’t need changing.

I don’t know MULTIARCH… is that a Debian thing? It certainly doesn’t imply “the current runtime arch” to me at least by name, it sounds like it’d only be set if the runtime was originally cross-compiled (and potentially may include multiple architectures), and so I still need a fallback to figure out the current system and then assume that Python is not being emulated.

The only existing sysconfig variable on Windows that would be useful is SOABI, but that’s also deliberately customisable. But maybe it’s the right path here, since if someone is customising it then presumably they know which platform binaries should be used, even if it doesn’t actually match what the build system inferred? (It would also need its format locked if we were to tell people to parse it, since I don’t know that it formally is locked.)
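For completeness, SOABI is also just a sysconfig query away — though, as noted, it's an opaque string whose exact contents vary by build and may be missing on some platforms:

```python
import sysconfig

# SOABI is the tag CPython embeds in extension-module filenames,
# e.g. "cpython-313-x86_64-linux-gnu" on Linux. It can be customised
# at configure time and may be None on some builds, so treat it as an
# opaque, optional string.
soabi = sysconfig.get_config_var("SOABI")
print(soabi)
```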

MULTIARCH is where the “platform triplet” ends up; the triplet is a CPython-specific thing that comes from the guesswork in Misc/platform_triplet.c.
“Multiarch” is generally the mechanism for machines that allow multiple architectures at runtime, on Linux mostly used for 32-/64-bit x86.

I disagree with telling people to parse strings. If we’re able to document the format, we should also be able to expose the source information directly.
And we shouldn’t lock SOABI – we might need to put some new information in it.


The problem is that the source information is usually an arbitrary string, though at least it’s going to be used as a key of some sort (so it’s an opaque identifier that’s only meaningful if you already recognise it). A platform triplet is probably a bit over-specified (compiler choice shouldn’t usually affect the ABI, whereas architecture does, and I think the compiler is part of the triplet?), but at least it has some history and is generally better understood than anything we’d invent from scratch.

I’m suggesting that a sys.abi_info field is the best place to expose this. It’s relevant information at runtime that shouldn’t require parsing the Makefile (i.e. without import sysconfig) - more specific than sys.platform but still closer to that than platform.*().
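Purely to make the suggestion concrete: one possible shape for such an attribute, modelled here with `SimpleNamespace`. None of these field names exist in CPython today — this is a hypothetical sketch, not a proposal for the exact spelling:

```python
import sys
import types

# Hypothetical: what a sys.abi_info might carry. The "x86_64" value is
# an illustrative placeholder for whatever the build statically embeds.
abi_info = types.SimpleNamespace(
    machine="x86_64",        # processor architecture the interpreter targets
    platform=sys.platform,   # coarse OS family, as today
)
print(abi_info.machine)
```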

(Agreed with everything else you said. Well, agreed with everything in this post, but clearly we think “expose the source information” leads to different outcomes.)


Any part of a platform triplet can be “unknown”, in case you don’t care. Sometimes you’ll see the libc as a fourth item of the “triplet”. The triplet that CPython generates is CPython-specific. The architecture/processor names differ across distros (e.g. Debian & CPython use powerpc64le where Fedora uses ppc64le). You can add a compiler if you want, who’d stop you?

SOABI looks like something to expose in importlib – a useful ingredient in filenames or path components. But parsing it to identify a processor architecture is a hack.

Which means to read and interpret one, you need to parse it, which we want to avoid. So I don’t think MULTIARCH is a suitable option here.

Right, but presumably this means that if your set of .so’s had two subdirectories, one powerpc64le and one ppc64le, they’d contain different builds (or possibly the same, so throw in i386 as an option too)? And so you want to retrieve that value to choose which directory to load from - where do you get that value from?

Why are these directories named only after the processor type?
Ideally, the directories would have SOABI-style names – cpython-313-powerpc64le-linux-gnu and cpython-313-i386-linux-musl. Python 3.17 could change the scheme without breaking anything, or you could override it.
By “SOABI-style names” I mean an opaque mechanism to serialize everything in abi_features (except what’s irrelevant, like byte order on x86) plus the CPython version.
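The layout being described might look like this in practice — a hypothetical helper (`binaries_dir` is my name, not an API) that joins the interpreter's current SOABI tag onto a base path:

```python
import os
import sysconfig

def binaries_dir(base):
    # Hypothetical: pick the subdirectory whose name matches this
    # interpreter's SOABI tag, e.g. "cpython-313-powerpc64le-linux-gnu".
    tag = sysconfig.get_config_var("SOABI")
    if tag is None:
        raise RuntimeError("no SOABI tag available on this build")
    return os.path.join(base, tag)
```

A newer Python (or a custom SOABI) would then transparently select a different directory, which is the "change the scheme without breaking anything" property.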

On *nix the source info for processor type is in defines like __x86_64__, __i386__, __aarch64__, __m68k__. Should we take that and fill in abi_info.machine (or abi_info.base_isa) with a string? And explicitly document a set of spellings to use, to eliminate platform differences?
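Illustrating that "documented set of spellings" idea: a hypothetical normalisation table mapping the names compilers and distros emit onto one canonical value (the specific choices here are mine, not CPython's):

```python
# Hypothetical canonicalisation of architecture spellings; the keys are
# what compilers/distros emit, the values are the single documented
# spelling a future abi_info.machine might use.
_CANONICAL = {
    "amd64": "x86_64",
    "x86_64": "x86_64",
    "ppc64le": "powerpc64le",
    "powerpc64le": "powerpc64le",
    "arm64": "aarch64",
    "aarch64": "aarch64",
}

def canonical_machine(name):
    # Anything unrecognised falls back to "unknown", as triplets do.
    return _CANONICAL.get(name.lower(), "unknown")
```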

Because they’re not Python binaries, they’re just arbitrary system binaries. Might be loaded with ctypes for example.

If you look at the issue I linked originally, the context is os.add_dll_directory, which is (roughly) the equivalent of setting RPATH, not PYTHONPATH.
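For context, os.add_dll_directory is Windows-only (3.8+), so any portable code has to guard for it — a sketch (the helper name is mine):

```python
import os

def add_native_dir(path):
    # os.add_dll_directory exists only on Windows; elsewhere the dynamic
    # loader consults RPATH / LD_LIBRARY_PATH instead, so there is
    # nothing to do at runtime.
    if hasattr(os, "add_dll_directory"):
        return os.add_dll_directory(path)  # returns a handle; .close() undoes it
    return None
```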


Also, we don’t really want too many options here. We just need the granularity of “will ctypes be able to load it”, no more or less. (And yes, I’m aware that a great way to do this is to just try loading each until one succeeds, but that doesn’t stop people trying to invent all sorts of ways to guess.)
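The "just try loading each until one succeeds" approach mentioned above is trivially written with ctypes — shown here only to make the comparison concrete, not as a recommendation:

```python
import ctypes
import os

def load_first(candidate_dirs, name):
    # Brute force: attempt each candidate directory until one loads,
    # instead of guessing the right architecture up front.
    for d in candidate_dirs:
        try:
            return ctypes.CDLL(os.path.join(d, name))
        except OSError:
            continue
    return None
```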

Who named the directories? If they’re system binaries, is there a system-specific convention we need to be compatible with?

To me, that seems like a lot of granularity. ctypes might fail on some unknown symbol. Or an incompatible libc, or libc++.

I’m looking for something that will stop us from trying to invent new ways to guess.

These two go together quite well, because I expect each system will have some name stored somewhere that’s accessible at compile time to identify it for the purposes of cross-compilation. And that’s essentially the name we need (not necessarily so that it can be used for cross-compilation, but it is essential to have the information to handle “the current system is X, but this binary will run on Y”, and we want to statically embed “Y” in case the runtime answer to “what is the current system” is not “Y” but the binary is somehow still running).

Okay, don’t take the analogy too literally. The binary could also include a bug in its entry point that prevents it from loading, but we’re clearly in the context of “is it built for an architecture that can be executed on the current system”, and so if it gets past that check, it suffices, even if a later issue causes it to actually fail to do anything useful.

I don’t think you can expect a single name though. GCC/clang will define one[1] of __x86_64__, __i386__, __aarch64__, __s390x__, etc., and CPython takes it from there.

Do you want to distinguish between different libcs here? If not, I don’t think this is about “will ctypes be able to load it”, but about CPU type specifically.


  1. and perhaps even only one ↩︎

I’m not enough of an expert on all the platforms to know how granular we need to be.

Apparently we do need to be that granular in manylinux, so I assume that would apply here as well. On macOS there might be a need to distinguish between framework/non-framework (and similarly in some Windows contexts, though none we currently support). WASM may need to distinguish between Emscripten and WASI.

Of course, it may be possible to build fully native libraries that can work across these boundaries (e.g. by not using a libc at all).

Windows is my area of expertise, and there I’d say that the granularity of process architecture is enough, because the ABIs are reliable enough and DLL dependency interactions are predictable.[1]

Then perhaps this is the right level? It should certainly be an option that’s available at compile time, so that would seem to fit the bill. Do we store these names (if defined) as a string anywhere in our builds? Is sys.abi_info a good place for it?


  1. From the POV of the process, not necessarily the author of a library whose code is going to be thrown into all sorts of contexts. ↩︎


We don’t – CPython essentially picks the ones it cares about, so we:

  • pick one of the alternate spellings, for example __x86_64__ rather than __amd64 or __x86_64
  • don’t recognize some architectures that we don’t need to recognize yet

So, the result is necessarily CPython-specific. Sometimes you get “unknown”. Sometimes you get details a user won’t care about. If someone ports to a new architecture, they get to invent a new name. And we shouldn’t guarantee stability between versions, because CPython’s needs might change.

Which gets us back to SOABI, e.g. cpython-313-powerpc64le-linux-gnu. That has exactly the right granularity for CPython, by definition.

Until someone brings clang-cl or MinGW, right? :)


The most important APIs aren’t in the Windows C runtime, though.[1] So while switching libc on POSIX makes a huge difference to “can this even load”, most of the important APIs are coming from kernel32.dll or other system DLLs, which are going to be the same between compilers.

But this is the kind of difference that, until proven otherwise, I’d expect this value to ignore. LoadLibrary(some_dll) doesn’t rely on the other DLL having used the same CRT.

And so we get back to “if you care about this, parse SOABI and strip off the CPython-specific part”. And since we want to avoid parsing (yes, we’re going in circles, but only because I feel like you keep ignoring your own criteria :wink: ), then we should store the rest of it somewhere as a static string. Is sys.abi_info a good place for that?


  1. We’re in a very different space here from “if we change the CRT that CPython links to, can an embedding app safely load it and have the new CRT understand state from the host app’s CRT”. ↩︎
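The parsing hack under discussion would look roughly like this — fragile precisely because the SOABI format isn't guaranteed (I'm just assuming a "cpython-&lt;version&gt;-&lt;platform&gt;" layout, which nothing documents):

```python
def platform_part(soabi):
    # Assumes the undocumented "cpython-<version>-<platform...>" layout;
    # this is exactly the kind of string parsing the thread argues against.
    parts = soabi.split("-", 2)
    if len(parts) == 3 and parts[0] == "cpython":
        return parts[2]
    return None

platform_part("cpython-313-powerpc64le-linux-gnu")  # → "powerpc64le-linux-gnu"
```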

“CRT state” refers to things like a functions taking FILE* breaking if the file is opened using the wrong CRT, right?

I’m saying all of SOABI is CPython-specific.
If it reliably contained some non-CPython-specific part that you could parse out, we should expose that part individually. But, it doesn’t. AFAIK, the format itself depends on the Python version.

We can store all of SOABI. Or we can figure out what exactly is the useful part, and standardize that – but it seems that what’s “useful” depends on the application (and the OS).

That, plus locale, std stream settings, exception/error handling settings, fileno collisions, and probably a few other things (those just came immediately to mind). But usually few of these things are actually going to cross module boundaries - they matter vastly more for embedding scenarios - and if you’re writing CPython-independent native code then it’s your own job to be aware of it.

We’re talking about an arbitrary string anyway, at all parts of this. The end user who wants it just wants an arbitrary string as well. “SOABI without the Python version” is not unreasonable - it’s the granularity that CPython needs.

Sounds fine, we need a name.

Also, I assume this should omit the d tag if it doesn’t affect ABI compatibility. (That is, use ALT_SOABI if available.)
Or leave ABIFLAGS out of it entirely?

“Platform ABI”/platform_abi?

I could see it as a sys attribute (alongside sys.platform, which is wholly insufficient but often presumed to be right for this), or sys.abi_info.platform_abi.

Some of the ABIFLAGS might be relevant, actually, but not all of them. Free-threading isn’t relevant, but our (now removed) flags for wchar_t size and possibly memory allocator might matter more? Possibly long size might be relevant too? I really just don’t know how predictable these are given powerpc64le-linux-gnu - I guess that may be detailed enough to imply it all?

That’s my main worry here – can we be sure sys.abi_info.platform_abi won’t join the long list of information sources that turned out to be wholly insufficient…

Both wchar_t and long size should be part of the platform ABI.
The CPython u flag described the size of Py_UNICODE; wchar_t was only sometimes an alias for that.
I don’t think of the Python memory allocator as part of the platform ABI. Nowadays it’s a runtime setting anyway.