Stable ABI for PyCriticalSection

It’s more like a successful realloc with an aliased pointer (that is, a pointer that other functions are also trying to use). Acquisition is a state change that has to propagate, and if it can’t be propagated, you can’t safely rely on the state of any code that shares it. (For another analogy, it’s also like when a buffer overrun aborts after writing the memory - by that point, you can’t trust your other state, and the safest thing to do is completely bail out and hope that you catch it during development.)

Failure to initialize a critical section can be handled and lead into different logic (e.g. disabling threading completely for your application), but after initialization succeeds, all you can do is tear the tower down as safely as possible - you can’t patch it up.

1 Like

That is not what the current macros do. It sounds like a good idea though.
But, why would the thread state not be part of the PyCriticalSection struct? Like, if we’re using a struct to bundle related values, we don’t also need a separate variable.

That only works for languages that can use the C macros. PyO3 and similar would need to reimplement the macros, calling the private functions directly.
In other words, we need to treat the ABI as our API.

It is that, currently. But if in the future we (or another implementation) need one more pointer and switch PyCriticalSection_Begin to allocate PyCriticalSection dynamically, it can fail.

Right – and PyCriticalSection_Begin initializes the given critical section.
There is a feasible implementation of it that can fail when it hits a memory limit, at which point you can e.g. safely unwind to a REPL.

If we at some point want to remove the ABI, we should make it emit deprecation warnings first.
If we switch to allocating data dynamically, that can fail with a memory error.

If we decide to replace stack allocation with heap allocation, we need a new ABI. If we do that, we don’t have to deprecate/remove the old stack based ABI.

The current limited API already has 96 functions which return void. In the stable ABI, there are already multiple ABI-only functions which return void: PyEval_AcquireLock(), PySys_SetPath(), PyThreadState_DeleteCurrent(), _PyThreadState_Init(), etc. For example, _PyThreadState_Init() now always call Py_FatalError("_PyThreadState_Init() is for internal use only").

So we already have the problem of void functions in the limited API/stable ABI, and we already modified a stable ABI function to always fail with Py_FatalError(). It’s not great, but we have to deal with it anyway.

IMO the critical section API is special enough to have an API which cannot fail.

6 Likes

Context, from a much earlier post:


Thank you! Took me a while to process this.
Yes, initializing, or a critical section should be fallible. But what’s PyCriticalSection’s initialization API?
PyThread_acquire_lock is always called after the fallible PyThread_allocate_lock, which means allocate could fail if it can’t guarantee error-free acquires. For critical sections: if they ever start needing more than 3 pointers of space, then Begin needs to malloc, and it becomes the (fallible) initialization function.
But yes, current PyCriticalSection uses per-object locks, so every object allocation is the fallible initialization that must guarantee error-free acquires. Of course, you don’t want to handle critical-section-specific failures (e.g. disable threading) on every object allocation. OTOH, in the future where it can fail this way, disabling (free-)threading completely (re-enabling the GIL, and turning critical section APIs to no-ops) is a way out.
Let’s do that?

Proposal:

  • The Py_mod_gil slot becomes a bitfield. Practically: instead of re-enabling the GIL if it’s all zero, we only look at one bit (current Py_MOD_GIL_NOT_USED). Users see no change.

    If a future version of CPython needs to change the critical section ABI, it can define Py_MOD_GIL_NOT_USED as either 0b11 or 0b10, depending on whether the 3.15 PyCriticalSection ABI is still kickin’ & infallible, or has been turned to no-ops in this CPython version and thus the extension needs the GIL.

  • We add a reserved pointer to the struct, so if we (or another implementation) needs to add one pointer every few years for whatever unforeseen reasons, users have that many years to recompile without losing free-threading compatibility.


As a general comment: Yes, it’s not great; that’s why I don’t want to add new ones.

If I’m not mistaken, your proposal requires zero source-level changes for people using the critical section macros. If in the future CPython decides to make entering a critical section fail by re-enabling the GIL everything is fine because entering a critical section is already a possibly blocking call. But also I’m a little nervous about designing an escape hatch for a use-case that we don’t actually need yet. It’ll also be untested, I think?

I also don’t think your Py_MOD_GIL_NOT_USED bitfield idea needs anything to happen now besides adding a note about possibly using this in the future for this purpose.

The big change that would need to happen now is adding a new reserved pointer to the critical section struct. I’d want @colesbury to weigh in on that, since I believe he’s the original author of the current design. Is there any difference between adding a new critical section struct and using the reserved field you want to add? Because if not, why not just add a new struct to the ABI if/when we need different functionality?

I’m not sure I have much to add beyond what I’ve written previously and what Nathan and Victor wrote. I think the recent proposals: modifying Py_mod_gil slot and adding extra pointers, makes the API worse for unlikely scenarios that we could deal with in other ways. Like Victor said, the way to evolve the API, if necessary, is to add new structs and new functions.

These critical section structs are essentially just function arguments. We don’t add extra unused arguments to functions “just in case”.

1 Like

The difference is whether existing source code uses the new functionality or if they get stuck on the old functionality until they update their code. “Add a new struct to the ABI” is also not stable, it’s merely backwards compatible - we would need to add new APIs and users would need to opt into using them in order to get the (presumably, fixed/correct) functionality.

Adding a reserved field now means that neither the API nor ABI have to change, and all existing users can be brought into the new functionality without making any changes or even recompiling.

That makes sense. I don’t know whether adding an extra pointer field on a stack-allocated struct will have any performance impact, but I kind of doubt it. So from that perspective it’s probably low-risk to add an extra unused field?

Maybe there should be a general polict for new structs in the stable ABI to have a padding field “for insurance” like this?

I’m not opposed to that as a policy. The vast majority of the stable ABI doesn’t use structs - these locking primitives are an exception in that respect, typically we use runtime-side memory allocation - but if we do intend to add other stack-allocated structs (or struct members, since that has the same requirement) then we’d probably want to be sure that there’s at least an opaque pointer field in them - reserved or not, that can be safely repurposed later if needed.

There’s nothing magical about passing a struct here. It’s not really any different from PyDict_Next(op, ppos, pkey, pvalue) taking a Py_ssize_t *ppos for its state.

1 Like

Definitely, that’s the (contingency) plan. The question is what happens to the old functions – they should keep working so older extensions don’t break.

The general policy is that structs in stable ABI are opaque & heap-allocated. We have an exception here.

Yeah, if PyDict_Next was added now, I suspect we’d do it slightly differently.
But, a big difference here is that PyDict_Next is fallible. It can malloc (which opens possibilities to keep it working even if dict internals change), and it can be deprecated with a runtime DeprecationWarning.

1 Like

I’d like to share the results of some tests I’ve been doing on a version of CPython that exposes critical sections in the stable ABI.

This is in the vein of my post in the packaging thread on PEP 803 where I did some end-to-end tests on real-world packages. I based my tests on the CPython branch I used there, with additional patches to expose the critical section structs, functions, and macros (but not the mutex variants) in Include/critical_section.h. I also exposed PyMutex as an opaque pointer in pytypedefs.h because there are PyMutex * pointers in the critical section structs. You can look over the changes here:

The main reason I had to make these changes is because PyO3 uses critical sections in its implementation, as Petr described in the post opening this thread.

I also created branches of PyO3 and maturin that make the following changes:

  • Add new abi3t and abi3t-py315 PyO3 features to enable abi3t builds.
  • Update PyO3’s FFI wrappers to model the _Py_OPAQUE_PYOBJECT macro and set it at build time along with Py_LIMITED_API and Py_GIL_DISABLED when doing builds with the abi3t feature.
  • Add support in maturin for generating abi3.abi3t wheels.

You can look over the PyO3 changes and maturin changes to get an idea of what I did. These diffs are pretty substantial but I’m hoping I’ll be able upstream part of my changes now, before PEP 803 is decided. I think the refactoring I did to replace boolean flags with enums makes the code clearer than it was before.

I also had to make a small change for cryptography’s CFFI wrappers to account for the adjusted limited API build and enable the abi3t feature crate-wide. Crytography’s CI does it at build time by passing --config-settings, I made the source-level change for convenience.

I also used branches of setuptools and CFFI, as described in my other post.

In the end, I’m able to successfully build cryptography abi3.abi3t wheels using this setup on my Mac. The same wheel installs and passes all tests on both builds. I do need to use a patched version of pip and packaging to actually install the wheels I get. I’d also need to use a patched version of uv to install the wheel on the free-threaded build, because uv doesn’t yet know about the abit3t tag. The current releases of uv and pip will both install abi3.abi3t wheels on the GIL-enabled build, so patching of installers is only needed to support these wheels on the free-threaded build.

All in all, I’m pretty happy with this! So far I haven’t found any issues with testing builds for real-world packages that simultaneously support both builds. I’d also like to re-try this on Linux and Windows to see if there are issues on other platforms.

For what it’s worth, in the context of whatever remaining disagreements there still are in this thread: I don’t think changing the critical section struct or API would make any major difference here. I’d just adapt to it in PyO3 and add conditional compilation to support older free-threaded builds. That said, I personally would still prefer to simply expose the critical section API as-is as I’ve done in my CPython branch, since that requires zero internal churn in CPython.

2 Likes

As I don’t see ways to handle this part of evolution, I opened another vote at the C API WG.