Add `sys.abi_features` to make information about the interpreter ABI more accessible

This issue comment seems relevant, as the info required (which architecture DLLs should be added to the OS search path) really ought to come from sys (runtime info, rather than build time or platform).

I would argue that it’s about matching the ABI, but it’s hard to define the full range. Possibly we also need an unspecified platform-specific string that’s probably a compiler triple, but not necessarily? Or maybe that belongs under sys.implementation?

Not yet, but we can add the rest in a new PR :)

We have too many unspecified strings that are frozen forever for backwards compatibility. Rather than adding another one of those, I’d rather not add anything now.

Can we do better than sys.winver and sysconfig.get_config_var('MULTIARCH')?
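For reference, both of those existing sources can be queried like this — a minimal sketch, noting that each is platform-dependent (the example values in the comments are typical, not guaranteed):

```python
import sys
import sysconfig

# sys.winver only exists on Windows builds; MULTIARCH is typically only
# defined on POSIX builds, so both need a fallback when absent.
winver = getattr(sys, "winver", None)              # e.g. "3.13" or "3.13-arm64"
multiarch = sysconfig.get_config_var("MULTIARCH")  # e.g. "x86_64-linux-gnu", or None

print(winver, multiarch)
```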


Provided we keep sys.winver as x.y-plat (where plat is omitted for AMD64), then it doesn’t need changing.

I don’t know MULTIARCH… is that a Debian thing? It certainly doesn’t imply “the current runtime arch” to me at least by name, it sounds like it’d only be set if the runtime was originally cross-compiled (and potentially may include multiple architectures), and so I still need a fallback to figure out the current system and then assume that Python is not being emulated.

The only existing sysconfig variable on Windows that would be useful is SOABI, but that’s also deliberately customisable. But maybe it’s the right path here, since if someone is customising it then presumably they know which platform binaries should be used, even if it doesn’t actually match what the build system inferred? (It would also need its format locked if we were to tell people to parse it, since I don’t know that it formally is locked.)
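For completeness, SOABI is also just a sysconfig query away — though, as noted, it's an opaque string whose exact contents vary by build and may be missing on some platforms:

```python
import sysconfig

# SOABI is the tag CPython embeds in extension-module filenames,
# e.g. "cpython-313-x86_64-linux-gnu" on Linux. It can be customised
# at configure time and may be None on some builds, so treat it as an
# opaque, optional string.
soabi = sysconfig.get_config_var("SOABI")
print(soabi)
```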

MULTIARCH is where the “platform triplet” ends up; the triplet is a CPython-specific thing that comes from the guesswork in Misc/platform_triplet.c.
“Multiarch” is generally the mechanism for machines that allow multiple architectures at runtime, on Linux mostly used for 32-/64-bit x86.

I disagree with telling people to parse strings. If we’re able to document the format, we should also be able to expose the source information directly.
And we shouldn’t lock SOABI – we might need to put some new information in it.


The problem is that the source information is usually an arbitrary string, though at least it’s going to be used as a key of some sort (so it’s an opaque identifier that’s only meaningful if you already recognise it). A platform triplet is probably a bit over-specified (compiler choice shouldn’t usually affect the ABI, whereas architecture does, and I think the compiler is part of the triplet?), but at least it has some history and is generally better understood than anything we’d invent from scratch.

I’m suggesting that a sys.abi_info field is the best place to expose this. It’s relevant information at runtime that shouldn’t require parsing the Makefile (i.e. without import sysconfig) - more specific than sys.platform but still closer to that than platform.*().
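Purely to make the suggestion concrete: one possible shape for such an attribute, modelled here with `SimpleNamespace`. None of these field names exist in CPython today — this is a hypothetical sketch, not a proposal for the exact spelling:

```python
import sys
import types

# Hypothetical: what a sys.abi_info might carry. The "x86_64" value is
# an illustrative placeholder for whatever the build statically embeds.
abi_info = types.SimpleNamespace(
    machine="x86_64",        # processor architecture the interpreter targets
    platform=sys.platform,   # coarse OS family, as today
)
print(abi_info.machine)
```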

(Agreed with everything else you said. Well, agreed with everything in this post, but clearly we think “expose the source information” leads to different outcomes.)


Any part of a platform triplet can be “unknown”, in case you don’t care. Sometimes you’ll see the libc as a fourth item of the “triplet”. The triplet that CPython generates is CPython-specific. The architecture/processor names differ across distros (e.g. Debian & CPython use powerpc64le where Fedora uses ppc64le). You can add a compiler if you want, who’d stop you?

SOABI looks like something to expose in importlib – a useful ingredient in filenames or path components. But parsing it to identify a processor architecture is a hack.

Which means to read and interpret one, you need to parse it, which we want to avoid. So I don’t think MULTIARCH is a suitable option here.

Right, but presumably this means that if your set of .so’s had two subdirectories, one powerpc64le and one ppc64le, they’d contain different builds (or possibly the same, so throw in i386 as an option too)? And so you want to retrieve that value to choose which directory to load from - where do you get that value from?

Why are these directories named only after the processor type?
Ideally, the directories would have SOABI-style names – cpython-313-powerpc64le-linux-gnu and cpython-313-i386-linux-musl. Python 3.17 could change the scheme without breaking anything, or you could override it.
By “SOABI-style names” I mean an opaque mechanism to serialize everything in abi_features (except what’s irrelevant, like byte order on x86) plus the CPython version.
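The layout being described might look like this in practice — a hypothetical helper (`binaries_dir` is my name, not an API) that joins the interpreter's current SOABI tag onto a base path:

```python
import os
import sysconfig

def binaries_dir(base):
    # Hypothetical: pick the subdirectory whose name matches this
    # interpreter's SOABI tag, e.g. "cpython-313-powerpc64le-linux-gnu".
    tag = sysconfig.get_config_var("SOABI")
    if tag is None:
        raise RuntimeError("no SOABI tag available on this build")
    return os.path.join(base, tag)
```

A newer Python (or a custom SOABI) would then transparently select a different directory, which is the "change the scheme without breaking anything" property.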

On *nix the source info for processor type is in defines like __x86_64__, __i386__, __aarch64__, __m68k__. Should we take that and fill in abi_info.machine (or abi_info.base_isa) with a string? And explicitly document a set of spellings to use, to eliminate platform differences?
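Illustrating that "documented set of spellings" idea: a hypothetical normalisation table mapping the names compilers and distros emit onto one canonical value (the specific choices here are mine, not CPython's):

```python
# Hypothetical canonicalisation of architecture spellings; the keys are
# what compilers/distros emit, the values are the single documented
# spelling a future abi_info.machine might use.
_CANONICAL = {
    "amd64": "x86_64",
    "x86_64": "x86_64",
    "ppc64le": "powerpc64le",
    "powerpc64le": "powerpc64le",
    "arm64": "aarch64",
    "aarch64": "aarch64",
}

def canonical_machine(name):
    # Anything unrecognised falls back to "unknown", as triplets do.
    return _CANONICAL.get(name.lower(), "unknown")
```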

Because they’re not Python binaries, they’re just arbitrary system binaries. Might be loaded with ctypes for example.

If you look at the issue I linked originally, the context is os.add_dll_directory, which is (roughly) the equivalent of setting RPATH, not PYTHONPATH.
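For context, os.add_dll_directory is Windows-only (3.8+), so any portable code has to guard for it — a sketch (the helper name is mine):

```python
import os

def add_native_dir(path):
    # os.add_dll_directory exists only on Windows; elsewhere the dynamic
    # loader consults RPATH / LD_LIBRARY_PATH instead, so there is
    # nothing to do at runtime.
    if hasattr(os, "add_dll_directory"):
        return os.add_dll_directory(path)  # returns a handle; .close() undoes it
    return None
```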


Also, we don’t really want too many options here. We just need the granularity of “will ctypes be able to load it”, no more or less. (And yes, I’m aware that a great way to do this is to just try loading each until one succeeds, but that doesn’t stop people trying to invent all sorts of ways to guess.)
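The "just try loading each until one succeeds" approach mentioned above is trivially written with ctypes — shown here only to make the comparison concrete, not as a recommendation:

```python
import ctypes
import os

def load_first(candidate_dirs, name):
    # Brute force: attempt each candidate directory until one loads,
    # instead of guessing the right architecture up front.
    for d in candidate_dirs:
        try:
            return ctypes.CDLL(os.path.join(d, name))
        except OSError:
            continue
    return None
```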

Who named the directories? If they’re system binaries, is there a system-specific convention we need to be compatible with?

To me, that seems like a lot of granularity. ctypes might fail on some unknown symbol. Or an incompatible libc, or libc++.

I’m looking for something that will stop us from trying to invent new ways to guess.

These two go together quite well, because I expect each system will have some name stored somewhere that’s accessible at compile time to identify it for the purposes of cross-compilation. And that’s essentially the name we need (not necessarily so that it can be used for cross-compilation, but it is essential to have the information to handle “the current system is X, but this binary will run on Y”, and we want to statically embed “Y” in case the runtime answer to “what is the current system” is not “Y” but the binary is somehow still running).

Okay, don’t take the analogy too literally. The binary could also include a bug in its entry point that prevents it from loading, but we’re clearly in the context of “is it built for an architecture that can be executed on the current system”, and so if it gets past that check, it suffices, even if a later issue causes it to actually fail to do anything useful.

I don’t think you can expect a single name though. GCC/clang will define one[1] of __x86_64__, __i386__, __aarch64__, __s390x__, etc., and CPython takes it from there.

Do you want to distinguish between different libcs here? If not, I don’t think this is about “will ctypes be able to load it”, but about CPU type specifically.


  1. and perhaps even only one ↩︎

I’m not enough of an expert on all the platforms to know how granular we need to be.

Apparently we do need to be that granular in manylinux, so I assume that would apply here as well. On macOS there might be a need to distinguish between framework/non-framework (and similarly in some Windows contexts, though none we currently support). WASM may need to distinguish between Emscripten and WASI.

Of course, it may be possible to build fully native libraries that can work across these boundaries (e.g. by not using a libc at all).

Windows is my area of expertise, and there I’d say that the granularity of process architecture is enough, because the ABIs are reliable enough and DLL dependency interactions are predictable.[1]

Then perhaps this is the right level? It should certainly be an option that’s available at compile time, so that would seem to fit the bill. Do we store these names (if defined) as a string anywhere in our builds? Is sys.abi_info a good place for it?


  1. From the POV of the process, not necessarily the author of a library whose code is going to be thrown into all sorts of contexts. ↩︎


We don’t – CPython essentially picks the ones it cares about, so we:

  • pick one of the alternate spellings, for example __x86_64__ rather than __amd64 or __x86_64
  • don’t recognize some architectures that we don’t need to recognize yet

So, the result is necessarily CPython-specific. Sometimes you get “unknown”. Sometimes you get details a user won’t care about. If someone ports to a new architecture, they get to invent a new name. And we shouldn’t guarantee stability between versions, because CPython’s needs might change.

Which gets us back to SOABI, e.g. cpython-313-powerpc64le-linux-gnu. That has exactly the right granularity for CPython, by definition.

Until someone brings clang-cl or MinGW, right? :)


The most important APIs aren’t in the Windows C runtime, though.[1] So while switching libc on POSIX makes a huge difference to “can this even load”, most of the important APIs are coming from kernel32.dll or other system DLLs, which are going to be the same between compilers.

But this is the kind of difference that, until proven otherwise, I’d expect this value to ignore. LoadLibrary(some_dll) doesn’t rely on the other DLL having used the same CRT.

And so we get back to “if you care about this, parse SOABI and strip off the CPython-specific part”. And since we want to avoid parsing (yes, we’re going in circles, but only because I feel like you keep ignoring your own criteria :wink: ), then we should store the rest of it somewhere as a static string. Is sys.abi_info a good place for that?


  1. We’re in a very different space here from “if we change the CRT that CPython links to, can an embedding app safely load it and have the new CRT understand state from the host app’s CRT”. ↩︎
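The parsing hack under discussion would look roughly like this — fragile precisely because the SOABI format isn't guaranteed (I'm just assuming a "cpython-&lt;version&gt;-&lt;platform&gt;" layout, which nothing documents):

```python
def platform_part(soabi):
    # Assumes the undocumented "cpython-<version>-<platform...>" layout;
    # this is exactly the kind of string parsing the thread argues against.
    parts = soabi.split("-", 2)
    if len(parts) == 3 and parts[0] == "cpython":
        return parts[2]
    return None

platform_part("cpython-313-powerpc64le-linux-gnu")  # → "powerpc64le-linux-gnu"
```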

“CRT state” refers to things like a functions taking FILE* breaking if the file is opened using the wrong CRT, right?

I’m saying all of SOABI is CPython-specific.
If it reliably contained some non-CPython-specific part that you could parse out, we should expose that part individually. But, it doesn’t. AFAIK, the format itself depends on the Python version.

We can store all of SOABI. Or we can figure out what exactly is the useful part, and standardize that – but it seems that what’s “useful” depends on the application (and the OS).

That, plus locale, std stream settings, exception/error handling settings, fileno collisions, and probably a few other things (those just came immediately to mind). But usually few of these things are actually going to cross module boundaries - they matter vastly more for embedding scenarios - and if you’re writing CPython-independent native code then it’s your own job to be aware of it.

We’re talking about an arbitrary string anyway, at all parts of this. The end user who wants it just wants an arbitrary string as well. “SOABI without the Python version” is not unreasonable - it’s the granularity that CPython needs.

Sounds fine, we need a name.

Also, I assume this should omit the d tag if it doesn’t affect ABI compatibility. (That is, use ALT_SOABI if available.)
Or leave ABIFLAGS out of it entirely?

“Platform ABI”/platform_abi?

I could see it as a sys attribute (alongside sys.platform, which is wholly insufficient but often presumed to be right for this), or sys.abi_info.platform_abi.

Some of the ABIFLAGS might be relevant, actually, but not all of them. Free-threading isn’t relevant, but our (now removed) flags for wchar_t size and possibly memory allocator might matter more? Possibly long size might be relevant too? I really just don’t know how predictable these are given powerpc64le-linux-gnu - I guess that may be detailed enough to imply it all?

That’s my main worry here – can we be sure sys.abi_info.platform_abi won’t join the long list of information sources that turned out to be wholly insufficient…

Both wchar_t and long size should be part of the platform ABI.
The CPython u flag described the size of Py_UNICODE; wchar_t was only sometimes an alias for that.
I don’t think of the Python memory allocator as part of the platform ABI. Nowadays it’s a runtime setting anyway.