With PEP 793 (PyModExport), it’s possible to build Stable ABI extensions compatible with both free-threaded and “GIL-ful” builds.
We’re still not there:
- We need a way to select, name, and describe the ABI and the associated API limitations. Two draft PEPs do that: PEP 803 and PEP 809. Without those, you need to patch `Python.h` (or use internal API that won't stay until the 3.15.0 final release), and without tool support you need to build and install manually. Even given all that, there's enough demand for a free-threaded stable ABI that people are experimenting with the opaque PyObject ABI.
- To take advantage of free-threading (as opposed to extensions just being loadable by free-threaded CPython), we need to expose some locking primitives. Both 803 and 809 say:

  > Limited API to allow thread-safety without a GIL – presumably `PyMutex`, `PyCriticalSection`, and similar – will be added via the C API working group, or in a follow-up PEP.
So, what should a stable ABI for `PyCriticalSection` look like, if/when we have a stable ABI for free-threaded Python?
I’m starting with PyCriticalSection for a few reasons:
- Mutexes are provided by the `PyThread_*_lock` API, which is already part of the Stable ABI. (They're documented as obsolete, possibly for reasons that actually make them good Stable ABI, but that's getting off-topic here.)
- Rust and C++ have mutexes in their standard libraries. PyO3 and Cython both expose "safe" wrappers for platform mutexes that can't deadlock with the interpreter. `PyMutex` is really only a big convenience for C on platforms that don't have `threads.h`; elsewhere, people have developed workarounds.
- It's been identified as a pain point by early-adopter testing in PyO3. `PyCriticalSection` allows extension authors to control the per-object `PyMutex` locks in the free-threaded build, and it fundamentally cannot be emulated without some sort of hook into the interpreter runtime that achieves the same deadlock-protection characteristics and locking semantics.
- The most natural way to use some C APIs is while holding a critical section: for example, iterating over a dict with `PyDict_Next`.
The current `PyCriticalSection` API uses a stack-allocated structure. Allocating it dynamically on each use makes it slower (7.283 vs. 3.93 ns in microbenchmarks). That isn't all that terrible, but it's still best to keep heap allocation only as a contingency option, for a possible future where the mechanism changes but the ABI needs to stay.
Stack allocation by the caller means that the size of the struct needs to be part of the ABI; it can’t change in future versions.
There's another constraint that I'd like to keep: the size of the struct should be the same in the Stable ABI as in the "full", version-specific ABI. That is, the Stable ABI should continue being a subset of any "full" ABI. This doesn't matter in practice if the struct is only ever stack-allocated, but it starts mattering as soon as someone puts it in their object. Forbidding that would be hard to explain, and essentially impossible to enforce.
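To make the size argument concrete, here is a minimal, self-contained sketch. It uses a hypothetical `demo_cs` struct that only mimics the shape of `PyCriticalSection` (two pointer-sized private fields); it is not CPython's definition. Both a caller's stack frame and any object that embeds the struct bake `sizeof(demo_cs)` into the compiled extension, so the runtime can't change it later.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same shape as PyCriticalSection (two
   pointer-sized private fields). This is NOT CPython's definition. */
typedef struct {
    uintptr_t prev;   /* e.g. a link to an enclosing critical section */
    void *mutex;      /* e.g. the locked object's mutex */
} demo_cs;

/* Hypothetical extension object that embeds the struct. */
typedef struct {
    void *header;     /* stands in for PyObject_HEAD */
    demo_cs cs;       /* embedded: its size fixes the offset of 'counter' */
    long counter;
} demo_object;

/* A caller that stack-allocates the struct: the number of bytes reserved
   in this frame is decided when the extension is compiled. */
size_t demo_stack_bytes_for_cs(void)
{
    demo_cs cs = {0, NULL};
    return sizeof cs;
}

/* Offset of the field after the embedded struct, also fixed at build time.
   If a newer runtime used a bigger demo_cs, this compiled-in offset would
   no longer match the runtime's idea of the layout. */
size_t demo_counter_offset(void)
{
    return offsetof(demo_object, counter);
}
```

The embedded case is the one that makes "same size in the Stable ABI and the full ABI" matter: a stack-only rule would keep the layout private, but nothing in C stops someone from putting the struct in their object.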
The current API is:
- structs with private fields: `PyCriticalSection` (2 pointers), `PyCriticalSection2` (3 pointers)
- functions: `PyCriticalSection_Begin`, `PyCriticalSection_End`, `PyCriticalSection2_Begin`, `PyCriticalSection2_End`
- C convenience macros, each of which contains an unpaired `{` or `}`: `Py_BEGIN_CRITICAL_SECTION`, `Py_BEGIN_CRITICAL_SECTION2`, `Py_END_CRITICAL_SECTION`, `Py_END_CRITICAL_SECTION2`
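The unpaired braces are what make the macros awkward to reuse outside C. The sketch below mirrors the pattern with hypothetical `demo_*` names (not CPython API) so that it compiles on its own: BEGIN opens a new block and declares the struct on the stack inside it, and END calls the end function and closes that block.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not CPython APIs. */
typedef struct {
    void *prev;
    void *mutex;
} demo_cs;

static int demo_depth = 0;   /* number of currently active sections */

void demo_cs_begin(demo_cs *cs, void *op)
{
    cs->prev = NULL;
    cs->mutex = op;          /* a real implementation would lock op here */
    demo_depth++;
}

void demo_cs_end(demo_cs *cs)
{
    cs->mutex = NULL;        /* ...and unlock it here */
    demo_depth--;
}

/* Same shape as Py_BEGIN_CRITICAL_SECTION / Py_END_CRITICAL_SECTION:
   BEGIN opens a block with '{' that only END closes. */
#define DEMO_BEGIN_CRITICAL_SECTION(op) \
    {                                   \
        demo_cs _demo_cs;               \
        demo_cs_begin(&_demo_cs, (op))

#define DEMO_END_CRITICAL_SECTION() \
        demo_cs_end(&_demo_cs);     \
    }

/* Increments *value inside the section; returns the nesting depth seen. */
int demo_increment(int *value)
{
    int depth_inside;
    DEMO_BEGIN_CRITICAL_SECTION(value);
    (*value)++;
    depth_inside = demo_depth;
    DEMO_END_CRITICAL_SECTION();
    return depth_inside;
}
```

Because END is just a function call followed by `}`, a `return`, `break`, or `goto` that leaves the block early skips it; languages with destructors or context managers can rule that out structurally.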
The macros are the preferred API. Non-C wrappers like PyO3 need to reimplement the macros, ideally as their language’s flavour of “context manager”.
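A rough C approximation of what such a wrapper provides is a helper that takes the body as a callback, so that End can't be forgotten. This is only an illustration of the shape, with hypothetical `demo_*` names; it is not how PyO3 or any other binding is actually implemented.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not CPython APIs. */
typedef struct {
    void *prev;
    void *mutex;
} demo_cs;

static int demo_active = 0;

void demo_cs_begin(demo_cs *cs, void *op)
{
    cs->prev = NULL;
    cs->mutex = op;
    demo_active++;
}

void demo_cs_end(demo_cs *cs)
{
    cs->mutex = NULL;
    demo_active--;
}

/* The "body" of a with-block: gets the locked object and user data. */
typedef int (*demo_body_fn)(void *op, void *arg);

/* Runs body while op's critical section is held, then always ends the
   section before returning: the guarantee a context manager gives. */
int demo_with_critical_section(void *op, demo_body_fn body, void *arg)
{
    demo_cs cs;              /* still stack-allocated, by the wrapper */
    int result;

    demo_cs_begin(&cs, op);
    result = body(op, arg);
    demo_cs_end(&cs);
    return result;
}

/* Example body: adds *arg to the int pointed to by op. */
int demo_add_body(void *op, void *arg)
{
    *(int *)op += *(const int *)arg;
    return demo_active;      /* how many sections are active right now */
}
```

The struct still lives on the wrapper's stack, so the same size constraints apply to a binding as to C code using the macros.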
I think we can put this in the Stable ABI as-is (and add it to non-threaded builds as no-ops).
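One way the no-op variant could look is sketched below, using a hypothetical `DEMO_FREE_THREADED` switch to stand for how the runtime was built (it is not CPython's real configuration macro). The exported functions keep the same signature either way, so a single extension binary can call them unconditionally; only the runtime's implementation differs.

```c
#include <stddef.h>

/* Hypothetical switch describing how the *runtime* was built; it is not
   CPython's real configuration macro. Defaults to a GIL-ful build. */
#ifndef DEMO_FREE_THREADED
#define DEMO_FREE_THREADED 0
#endif

typedef struct {
    void *prev;
    void *mutex;
} demo_cs;

static int demo_locks_taken = 0;   /* counts real lock acquisitions */

/* Same exported signature on both kinds of build; only the body differs. */
void demo_cs_begin(demo_cs *cs, void *op)
{
#if DEMO_FREE_THREADED
    cs->prev = NULL;
    cs->mutex = op;
    demo_locks_taken++;            /* a real build would lock op's mutex */
#else
    (void)cs;                      /* GIL-ful runtime: nothing to do */
    (void)op;
#endif
}

void demo_cs_end(demo_cs *cs)
{
#if DEMO_FREE_THREADED
    cs->mutex = NULL;              /* a real build would unlock it */
#else
    (void)cs;
#endif
}
```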
If we later need to make an incompatible change, we:
- add new structs and functions with new names (possibly just a `_v2` at the end);
- make the convenience macros call the new functions;
- keep the old functions working, even if they need a malloc/free to get room for a larger struct.
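A sketch of the third step, with hypothetical names throughout: suppose a later version needs more state than the frozen two-pointer struct can hold. The old entry point keeps its signature, allocates the larger `_v2` state on the heap, and stores the pointer in one of the old fields; the old End frees it. Code built against newer headers uses the `_v2` functions with a caller-allocated `_v2` struct and never touches the heap. The sketch assumes Begin returns an `int` so the allocation failure can be reported.

```c
#include <stdint.h>
#include <stdlib.h>

/* Frozen v1 layout: two pointer-sized fields that stay in the ABI. */
typedef struct {
    uintptr_t prev;
    void *mutex;
} demo_cs;

/* Hypothetical larger layout that some future version might need. */
typedef struct {
    uintptr_t prev;
    void *mutex;
    void *extra;
    int flags;
} demo_cs_v2;

static int demo_active = 0;

/* New entry points: callers built against the new headers allocate
   demo_cs_v2 on their own stack, so no heap allocation is needed. */
int demo_cs_begin_v2(demo_cs_v2 *cs, void *op)
{
    cs->prev = 0;
    cs->mutex = op;
    cs->extra = NULL;
    cs->flags = 1;
    demo_active++;
    return 0;
}

void demo_cs_end_v2(demo_cs_v2 *cs)
{
    cs->flags = 0;
    demo_active--;
}

/* Old entry points, kept for extensions built against the v1 ABI. The
   caller's struct can't grow, so the v2 state lives on the heap and the
   old 'prev' field holds a pointer to it. */
int demo_cs_begin(demo_cs *cs, void *op)
{
    demo_cs_v2 *state = malloc(sizeof *state);
    if (state == NULL) {
        return -1;                   /* the fallback adds a failure mode */
    }
    if (demo_cs_begin_v2(state, op) < 0) {
        free(state);
        return -1;
    }
    cs->prev = (uintptr_t)state;
    cs->mutex = op;
    return 0;
}

void demo_cs_end(demo_cs *cs)
{
    demo_cs_v2 *state = (demo_cs_v2 *)cs->prev;
    demo_cs_end_v2(state);
    free(state);
    cs->prev = 0;
}
```

Old binaries pay for the malloc/free pair on every use, but keep working unmodified. Note that the heap fallback itself can fail.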
We will not add build-time aliases like #define PyCriticalSection_Begin PyCriticalSection_Begin_v2. Since the old PyCriticalSection_Begin function needs to stay (for old Stable ABI extensions), the C PyCriticalSection_Begin would refer to a different function than ctypes.pythonapi.PyCriticalSection_Begin. This is confusing and error-prone.
That leaves one thing: fallibility. I don’t think we can guarantee that PyCriticalSection_Begin can never fail (which includes never emitting a warning, since -Werror is a thing).
What comes to mind is changing Py_BEGIN_CRITICAL_SECTION(op) to:
```c
#define Py_BEGIN_CRITICAL_SECTION(op)                                  \
    {                                                                  \
        PyCriticalSection _py_cs;                                      \
        int PyCriticalSection_result =                                 \
            PyCriticalSection_Begin(&_py_cs, (PyObject*)(op))
```
to be used as:
```c
Py_BEGIN_CRITICAL_SECTION(o);
if (PyCriticalSection_result < 0) {
    return -1;
}
/* ... work with o while the critical section is held ... */
Py_END_CRITICAL_SECTION();
```
In the non-limited API, the macro can additionally contain `if (PyCriticalSection_result < 0) Py_UNREACHABLE();`, telling compilers they may elide the check when not targeting the Stable ABI.
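To see the proposed error handling end to end, here is a self-contained sketch using hypothetical `demo_*` names in place of the CPython ones. `demo_cs_begin` fails on a NULL object purely so the error path can be exercised. The `DEMO_ASSUME_INFALLIBLE` variant stands in for the non-limited API: it uses GCC/Clang's `__builtin_unreachable()`, which is what `Py_UNREACHABLE()` expands to in non-debug GCC/Clang builds, so the compiler can drop the caller's check.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not CPython APIs. */
typedef struct {
    void *prev;
    void *mutex;
} demo_cs;

static int demo_active = 0;

/* Fallible Begin: returns 0 on success, -1 on failure.
   Failing on NULL is an arbitrary choice so the error path can be tested. */
int demo_cs_begin(demo_cs *cs, void *op)
{
    if (op == NULL) {
        return -1;
    }
    cs->prev = NULL;
    cs->mutex = op;
    demo_active++;
    return 0;
}

void demo_cs_end(demo_cs *cs)
{
    cs->mutex = NULL;
    demo_active--;
}

#if defined(DEMO_ASSUME_INFALLIBLE) && (defined(__GNUC__) || defined(__clang__))
/* Version-specific builds: promise the compiler that Begin never fails,
   so the caller's "result < 0" branch can be optimized away. */
#  define DEMO_ASSUME_OK(r) do { if ((r) < 0) __builtin_unreachable(); } while (0)
#else
#  define DEMO_ASSUME_OK(r) ((void)0)
#endif

#define DEMO_BEGIN_CRITICAL_SECTION(op)                              \
    {                                                                \
        demo_cs _demo_cs;                                            \
        int demo_cs_result = demo_cs_begin(&_demo_cs, (void *)(op)); \
        DEMO_ASSUME_OK(demo_cs_result)

#define DEMO_END_CRITICAL_SECTION() \
        demo_cs_end(&_demo_cs);     \
    }

/* Adds 1 to *value under the critical section; returns -1 if Begin failed. */
int demo_locked_increment(int *value)
{
    DEMO_BEGIN_CRITICAL_SECTION(value);
    if (demo_cs_result < 0) {
        return -1;           /* Begin failed: there is nothing to End */
    }
    (*value)++;
    DEMO_END_CRITICAL_SECTION();
    return 0;
}
```

With `DEMO_ASSUME_INFALLIBLE` defined, passing NULL would be undefined behaviour; that is the trade-off of the hint, and why it belongs only in builds where Begin really can't fail.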
How does that sound?
Thanks to @ngoldbaum for help drafting this!