With the advent of free threading, we will need to provide a public API for dealing with atomic operations (ones which cannot be interrupted by other threads).
This API is new in Python 3.13. I would prefer to wait for at least one Python release to see how it goes first in Python itself. In the past, we got compiler issues (especially with C++) with the previous “atomic” C API. This API was moved to the internal C API to fix compilation issues.
I think we can do better than that. The tooling is being developed for exactly the same reason C extension writers would want to use it: portable code.
E.g. look at the ticket I linked to - it’s dealing with making importing C API Capsules thread-safe. C extensions will have to do the same if they want to e.g. import the capsule for the Unicode name database or, probably more common, the datetime module C API.
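To make the concern concrete, here is a minimal sketch of what caching an imported C API pointer in a thread-safe way looks like if an extension has to write it by hand with plain C11 atomics. Everything named `demo_*` is hypothetical, and the slow lookup is only a stand-in for a real capsule import; this is not how CPython implements it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a C API struct normally obtained from a capsule. */
typedef struct {
    int version;
} demo_capi;

static demo_capi demo_capi_instance = { 1 };

/* Hypothetical slow lookup; a real extension would import the capsule here. */
static demo_capi *demo_slow_lookup(void)
{
    return &demo_capi_instance;
}

/* Cache shared by all threads; NULL means "not looked up yet". */
static _Atomic(demo_capi *) demo_cached = NULL;

demo_capi *demo_get_capi(void)
{
    /* Fast path: the acquire load pairs with the release in the CAS below. */
    demo_capi *api = atomic_load_explicit(&demo_cached, memory_order_acquire);
    if (api != NULL) {
        return api;
    }

    api = demo_slow_lookup();
    if (api == NULL) {
        return NULL;  /* lookup failed; leave the cache empty */
    }

    /* Publish the pointer only if no other thread got there first. */
    demo_capi *expected = NULL;
    if (!atomic_compare_exchange_strong_explicit(
            &demo_cached, &expected, api,
            memory_order_acq_rel, memory_order_acquire)) {
        api = expected;  /* another thread won the race; use its pointer */
    }
    assert(api != NULL);
    return api;
}
```

Calling `demo_get_capi()` repeatedly returns the same pointer, and only the first caller to win the compare-exchange publishes it. The point is that each extension would need its own portable version of this without a shared API.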
That’s fair.
I just think we need to bring up these questions early to not leave C extension writers who want to prepare for free threaded CPython having to build their own atomic operation APIs.
Hmm… First, sorry for not looking at this issue before. But I think that begs the question: what API do you want to see exactly?
My take:
PyCapsule_Import should be made thread-safe (if not already so), so that “normal” extension code does not have anything to change.
For some reason, the interpreter state keeps a direct pointer to the unicodedata C API through _Py_unicode_state. I suppose that’s for micro-optimization, but I don’t think this is a common (and useful) pattern in third-party code. I also think that we should strive for the fast path in PyCapsule_Import to be as fast as possible, so that third-party code isn’t motivated to develop similar micro-optimizations.
Interacting with the CPython C API, even in free-threaded mode, should not require using atomic operations explicitly. Everything should be hidden behind higher-level APIs (potentially inlineable).
I stand by my point that I don’t think it’s CPython’s business to expose a portable API for low-level atomic operations on standard C types. If you want a portable C API for such things, there are easily-vendorable, dedicated libraries such as portable-snippets.
I agree with this, but I’d like to add another reason. Keeping to higher-level APIs that belong to the interpreter for interacting with the interpreter avoids the problem of macros that don’t necessarily work well for native extensions not written in C that could otherwise use the C API. (Yes, everything could be rewritten as a function, but there are a lot of macros to consider, versus only doing it for the things that actually need to be exposed.)
The API closely mirrors C11 atomics. And those are also the main implementation, though there are GCC and MSVC fallbacks/specializations.
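For readers who have not used them, this is roughly what the underlying C11 operations look like. It is a sketch in standard `<stdatomic.h>` only, with hypothetical `demo_*` names; it does not use or reproduce CPython's private helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int demo_hits = 0;        /* statistics counter */
static atomic_bool demo_ready = false;  /* "payload is valid" flag */
static int demo_payload = 0;            /* plain data published via demo_ready */

/* Atomically increment the counter; returns the value before the increment. */
int demo_record_hit(void)
{
    return atomic_fetch_add_explicit(&demo_hits, 1, memory_order_relaxed);
}

int demo_hit_count(void)
{
    return atomic_load_explicit(&demo_hits, memory_order_relaxed);
}

/* Writer: store the data first, then set the flag with release ordering. */
void demo_publish(int value)
{
    demo_payload = value;
    atomic_store_explicit(&demo_ready, true, memory_order_release);
}

/* Reader: the acquire load guarantees the payload write is visible. */
bool demo_try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&demo_ready, memory_order_acquire)) {
        return false;
    }
    *out = demo_payload;
    return true;
}
```

A relaxed counter is enough when only the count matters; the release/acquire pair is what makes the non-atomic `demo_payload` safe to read once the flag is seen.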
As far as I can see, MSVC added C11 atomics last year as an experimental feature. I assume that it’s likely to stop being experimental on a similar time scale as nogil.
IMO, users that need this should use what their compiler provides, rather than learn a Python-specific API (which we’d need to document and maintain forever).
This doesn’t help with writing portable code, though, and that’s the main reason we have those (and other similar abstractions) in the CPython code base.
Why should we require C extension writers to duplicate all this work in each of their extensions, when the code is there already, just not exposed?
Would it be workable to provide a small package that would allow packagers to build against these headers without CPython committing to maintaining the API?
I’m thinking here of numpy, which provides a get_include(). Since these are purely headers, it need only be a build-time dependency.
I agree with @malemburg that we should make the atomics API public. I also think we should make some synchronization primitive, like PyMutex, public as well. That would also require the atomics API to be public. I don’t think the ucnhash_capi changes are relevant here.
These APIs are useful for making C extensions thread-safe. Extensions written in Rust and C++ already have good APIs for locking and atomics, but C does not. This is important for extensions like NumPy and Cython. The alternative is that extensions either use this code anyways or write their own broken code.
I don’t think it’s worth waiting for another release. I’m not too worried about compiler issues, and not making these public doesn’t actually save us there – they’re already included in Python.h, they’re just labeled as “private” by the leading underscore.
Just want to chime in as a NumPy developer that it would be very nice to have better portable synchronization primitives provided by the C API.
I recently had to add a mutex to an extension type and doing so in a portable way in C was not straightforward. I ended up using a PyThread_type_lock mutex, which is not nearly as performant as PyMutex but allowed me to avoid attempting to implement or find a portable mutex in C. It’s also worth pointing out in this discussion that PyThread_type_lock is already a publicly exposed synchronization primitive in the C API.
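As an illustration of why hand-rolling this is unattractive, below is about the simplest lock one can build from C11 alone: a spinlock on `atomic_flag`. All `demo_*` names are hypothetical. It is a sketch, not a substitute for PyMutex or PyThread_type_lock: it busy-waits instead of sleeping, and it knows nothing about the interpreter, so a thread spinning on it never yields to the thread that holds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_flag locked;
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the lock was free and is now held by the caller. */
static inline bool demo_spin_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static inline void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* busy-wait: burns CPU until the holder releases the lock */
    }
}

static inline void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical extension-type state guarded by the lock. */
typedef struct {
    demo_spinlock lock;
    long total;
} demo_accumulator;

/* Adds amount under the lock and returns the new total. */
long demo_accumulate(demo_accumulator *acc, long amount)
{
    assert(acc != NULL);
    demo_spin_lock(&acc->lock);
    acc->total += amount;
    long result = acc->total;
    demo_spin_unlock(&acc->lock);
    return result;
}
```

An accumulator would be initialized as `demo_accumulator acc = { DEMO_SPINLOCK_INIT, 0 };`. Anything better than this (parking waiters, fairness, cooperating with the interpreter) is exactly the work a shared primitive would save extension authors.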
There are probably lots of reasons to write greenfield extensions in C++ or Rust rather than C. However, there’s a ton of existing C (or Cython code compiled to C) that will need to be retrofitted to avoid thread safety bugs. If CPython provides something in the C API to aid that process, that will make the effort of porting much easier IMO.