A new API for ensuring/releasing thread states

ZeroIntensity · March 10, 2025, 9:51pm

History

There’s been a desire for an API that can safely acquire the GIL (or, more precisely, an attached thread state) when Python may be finalizing or about to finalize, which isn’t possible with PyGILState_Ensure or any other public API right now. The problem is that even with a check like Py_IsFinalizing, the runtime could still start finalizing during the call to PyGILState_Ensure, which will most likely end up crashing. There’s an open issue about this too: “Fail PyGILState_Ensure() If Finalizing?” (python/cpython issue #124622).

Originally, the plan was to call this new function PyGILState_EnsureOrFail, but I had some concerns about the name. Mainly, the term “GIL” isn’t something we want in the public API anymore; there can be more than one GIL (so “the GIL” isn’t something that really exists), and the GIL can be disabled entirely on free-threaded builds. I’ve outlined this more clearly in an unrelated issue.

Additionally, a few months ago, @eric.snow, @encukou, and I discussed some of the implications that PyGILState_Ensure has on subinterpreters. If a thread spawned by a subinterpreter were to call PyGILState_Ensure, the thread would end up with the GIL of the main interpreter rather than that of the subinterpreter. That means the thread can’t safely interact with the calling interpreter at all.

So, it’s pretty clear that we need a new API that covers a few bases:

Returns errors that can be handled (PyGILState_Ensure either segfaults, hangs the thread, or emits a fatal error).
Can safely run at finalization without crashing.
Takes an explicit interpreter to prevent problems with interpreter mismatch.
Semantically clear that we’re dealing with thread states, not necessarily the GIL.

Proposal

I’m proposing two new functions:

int PyThreadState_Ensure(PyInterpreterState *interp, const char **errmsg);
void PyThreadState_Release(void);

The usage would be like this:

#include <Python.h>
#include <stdio.h>
#include <stdlib.h>

/* Dummy structure to hold a Python object and interpreter */
typedef struct {
    PyObject *callable;
    PyInterpreterState *interp;
} pyrun_t;

static int
c_thread_func(pyrun_t *to_run)
{
    /* Acquire a thread state for to_run->interp so we can call Python */
    const char *err;
    if (PyThreadState_Ensure(to_run->interp, &err) < 0) {
        fprintf(stderr, "Failed to talk to Python: %s\n", err);
        abort();
    }

    /* Run the function */
    /* ...  */
    PyThreadState_Release();
    return 0;
}

@vstinner has been very helpful in implementing this so far:

github.com/python/cpython · gh-124622: Add PyThreadState_Ensure() function (main ← vstinner:tstate_ensure, opened by vstinner on 11 Feb 2025, +362 -19)

Add PyThreadState_Ensure() and PyThreadState_Release() functions. Add new “Or…Fail” internal functions: _PyEval_AcquireLockOrFail(), _PyEval_RestoreThreadOrFail(), _PyThreadState_AttachOrFail(), take_gil_or_fail(). Issue: gh-124622.

But there are still some parts of the API that need to be ironed out, which is why I’m here.

Currently, it returns -1 on failure and takes a pointer to a C string, which it sets to the error message. I’m fine with this, but users cannot safely rely on the error message to differentiate between errors. If that’s something people want to do, it might be worth supplying an error code alongside the message, with something like this instead of a const char **:

typedef enum {
    OUT_OF_MEMORY,
    INTERPRETER_FINALIZING,
    /* ... */
} PyThreadState_ErrorCode;

typedef struct {
    const char *message;
    PyThreadState_ErrorCode code;
} PyThreadState_Error;
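As a rough illustration of what the error code buys over the message alone, here is a self-contained toy in plain C. It copies the enum and struct above; fake_ensure() and can_drop_work() are hypothetical helpers (not the proposed CPython function), and fake_ensure() always fails with a caller-chosen code. The point is only that callers can branch on code and treat message as display text.

```c
/* Toy model only: the enum and struct mirror the sketch above, and
   fake_ensure() is a hypothetical stand-in, not a CPython function. */
typedef enum {
    OUT_OF_MEMORY,
    INTERPRETER_FINALIZING,
} PyThreadState_ErrorCode;

typedef struct {
    const char *message;
    PyThreadState_ErrorCode code;
} PyThreadState_Error;

/* Always fails with the requested code, filling in both fields the way
   the proposed API might. */
static int
fake_ensure(PyThreadState_ErrorCode simulated, PyThreadState_Error *err)
{
    err->code = simulated;
    err->message = (simulated == OUT_OF_MEMORY)
                   ? "out of memory"
                   : "interpreter is finalizing";
    return -1;
}

/* Caller-side policy that branches on the code, never on the text.
   Returns 1 if the pending work can be dropped quietly, 0 otherwise. */
static int
can_drop_work(const PyThreadState_Error *err)
{
    switch (err->code) {
    case INTERPRETER_FINALIZING:
        return 1;   /* Python is shutting down; skipping the callback is fine */
    case OUT_OF_MEMORY:
    default:
        return 0;   /* a real failure: report err->message and bail out */
    }
}
```

With only a message string, the same policy would need a strcmp() against text that could change between releases.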

Is there any added benefit to passing an int handle around, instead of internally storing that on the thread state itself? I’m not a fan of this myself, but there’s no consensus around it, and it’s slightly easier to implement. The API would look like this, then:

static int
c_thread_func(pyrun_t *to_run)
{
    const char *err;
    int state = PyThreadState_Ensure(to_run->interp, &err);
    if (state < 0) {
        fprintf(stderr, "Failed to talk to Python: %s\n", err);
        abort();
    }
    /* ... call into Python ... */
    PyThreadState_Release(state);
    return 0;
}
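To make the handle question concrete, here is a small single-threaded model of what such an int could carry. Everything in it (toy_interp, toy_tstate, toy_ensure(), toy_release(), the fixed nesting limit) is hypothetical and stands in for real interpreter and thread states; in CPython the attached pointer would be thread-local. The handle indexes a small stack recording which state was attached before each call, so toy_release() can restore it without the caller managing any allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: an interpreter and a thread state bound to it. */
typedef struct { int id; } toy_interp;
typedef struct { toy_interp *interp; } toy_tstate;

#define TOY_MAX_NESTING 8

static toy_tstate *attached = NULL;           /* thread-local in reality */
static toy_tstate *saved[TOY_MAX_NESTING];    /* what each call displaced */
static toy_tstate owned[TOY_MAX_NESTING];     /* states created by a call */
static int depth = 0;

/* Returns a handle >= 0 on success, or -1 if nesting is too deep. */
static int
toy_ensure(toy_interp *interp)
{
    if (depth == TOY_MAX_NESTING) {
        return -1;
    }
    int handle = depth++;
    saved[handle] = attached;                 /* remember what to restore */
    if (attached != NULL && attached->interp == interp) {
        owned[handle].interp = NULL;          /* reused: nothing to free */
        return handle;
    }
    owned[handle].interp = interp;            /* "create" a fresh state */
    attached = &owned[handle];
    return handle;
}

static void
toy_release(int handle)
{
    assert(handle == depth - 1);              /* releases must nest */
    attached = saved[handle];                 /* restore the prior state */
    owned[handle].interp = NULL;              /* "delete" anything we made */
    depth--;
}
```

Ensuring a subinterpreter while the main interpreter's state is attached, then releasing, puts the main interpreter's state back; storing the same bookkeeping on the thread state itself would move the saved[] stack into a field on toy_tstate.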

This is a more technical point, but it might be worth taking an interpreter ID rather than an interpreter state, because a PyInterpreterState pointer might be invalidated if the interpreter exits. It’s clunkier for users, though, because they need an extra PyInterpreterState_GetID call.

Feel free to add any other ideas/concerns here too.

Deprecate `PyGILState`?

I think it would be good to fully deprecate all PyGILState APIs: they’re confusing, buggy, and will have a much better replacement. Even in the public API today, you can replicate everything in PyGILState with a less ambiguous PyThreadState API:

PyGILState_Ensure → PyThreadState_Swap/PyThreadState_New (would instead be PyThreadState_Ensure)
PyGILState_Release → PyThreadState_Clear/PyThreadState_Delete (would instead be PyThreadState_Release)
PyGILState_GetThisThreadState → PyThreadState_Get
PyGILState_Check → PyThreadState_GetUnchecked() != NULL (in fact, we recently removed all usages of assert(PyGILState_Check()) internally).

steve.dower · March 10, 2025, 10:07pm

I’d prefer to see an activate/deactivate API rather than ensure/clear:

PyThreadState *ts;
int err = PyThreadState_New(interp, &ts);
if (err) {
    const char *msg;
    PyThreadState_GetErrorMessage(err, &msg);
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

err = PyThreadState_Activate(ts);
if (!err) {
    // do Python stuff
    PyThreadState_Deactivate(ts);
} else { ... }

// when done
PyThreadState_Delete(ts);

(Details open to change - it’s an illustration, not a “yes/no” spec.)

For CPython in its current state, we’d need to fail if you tried to activate a thread state on a different OS thread, because they have strong affinity. But there’s no reason that couldn’t change in the future.

More importantly though, I think this makes it clearer who owns the thread state - a manually created one is controlled by the code that created it, and once it’s deleted it can’t be activated again.

Internally, we can set them up to be deleted when our bootstrap function returns, and can warn/fail if the interpreter state is being closed with threads still active.

I also much prefer just using error codes (they can all be negative) and having a function to get the string over returning structs. It’s basically just making an internal API into a public one (I can’t imagine we wouldn’t have a “get error message” helper function), but I see no harm in letting users choose when they get the static string. We don’t have to force it into their hands.
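The activate/deactivate shape described above can be modelled without a real interpreter. The toy below is hypothetical throughout: the OS thread is passed in as a plain int so the thread-affinity check can be exercised, every failure code is negative with 0 meaning success, and the message lookup is a separate function that callers only invoke if they want the text. Whoever calls toy_new() owns the state and is the one who deletes it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical error codes: 0 is success, every failure is negative. */
enum {
    TOY_OK = 0,
    TOY_ENOMEM = -1,
    TOY_EWRONG_THREAD = -2,
    TOY_EALREADY_ACTIVE = -3,
};

/* A toy thread state bound to the OS thread that created it. */
typedef struct {
    int owner_thread;
    int active;
} toy_tstate;

/* Static strings, handed out only when the caller asks. */
static const char *
toy_error_message(int err)
{
    switch (err) {
    case TOY_OK:              return "no error";
    case TOY_ENOMEM:          return "out of memory";
    case TOY_EWRONG_THREAD:   return "thread state belongs to another OS thread";
    case TOY_EALREADY_ACTIVE: return "thread state is already active";
    default:                  return "unknown error";
    }
}

/* On success the caller owns *out and must pass it to toy_delete(). */
static int
toy_new(int os_thread, toy_tstate **out)
{
    toy_tstate *ts = malloc(sizeof *ts);
    if (ts == NULL) {
        return TOY_ENOMEM;
    }
    ts->owner_thread = os_thread;
    ts->active = 0;
    *out = ts;
    return TOY_OK;
}

static int
toy_activate(toy_tstate *ts, int os_thread)
{
    if (ts->owner_thread != os_thread) {
        return TOY_EWRONG_THREAD;   /* strong thread affinity, as today */
    }
    if (ts->active) {
        return TOY_EALREADY_ACTIVE;
    }
    ts->active = 1;
    return TOY_OK;
}

static void
toy_deactivate(toy_tstate *ts)
{
    assert(ts->active);
    ts->active = 0;
}

static void
toy_delete(toy_tstate *ts)
{
    assert(!ts->active);            /* must be deactivated before freeing */
    free(ts);
}
```

Because the codes are plain negative ints, a caller can test `err < 0` and decide separately whether the string is worth fetching.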

ZeroIntensity · March 10, 2025, 10:48pm

We pretty much already have a public activate/deactivate API: PyThreadState_Swap. Apart from the error messages, you could pretty much implement it as:

#define PyThreadState_Activate(tstate) PyThreadState_Swap(tstate)
#define PyThreadState_Deactivate(tstate) PyThreadState_Swap(NULL)

I think these could be interesting convenience functions to add for those looking to do low-level things with thread states, but the added abstraction of an ensure/release API has a few benefits:

We can handle nested calls more easily; with something like Activate/Deactivate, an existing thread state will seemingly get detached, but Ensure/Release could (ideally) restore a prior thread state on the Release call.
Especially if we deprecate PyGILState, it would be good to keep it as similar as possible.
Manually dealing with thread states is, unfortunately, not great (right now). The minimum boilerplate to create and delete a thread state is:

PyThreadState *tstate = PyThreadState_New(/* ... */); // allocate it
PyThreadState_Swap(tstate); // switch to it
/* tstate is now usable */
PyThreadState_Clear(tstate); // clear it, but it has to be attached
/* tstate is now unusable */
PyThreadState_DeleteCurrent(); // destroy it and set attached tstate to NULL

Four lines of boilerplate will be a very hard sell for a migration from PyGILState_Ensure.

steve.dower · March 10, 2025, 11:00pm

Then these should figure into the description, especially since there are two scenarios here: we’re going in/out of an existing thread state, or we’re starting a new thread.

The “restore a prior thread state” is the bit that I particularly dislike. That means that we have additional thread state outside of the thread state in order to track the thread state when there’s no thread state!

Now, having additional state within our own call stack is fine, but once we fully escape from Python back into the host app’s code (yes, I’m thinking about embedders, as usual), and none of our code is on the stack, we can’t know what the thread state should be. So the caller has to preserve it if they want to get back in (or we have to destroy it, but I’m not suggesting that).

And if the caller has to preserve it, then they should be the one to create and destroy it as well, so that it’s obvious they are responsible for it. The “Ensure” operation seems to take that responsibility away from them, compared to doing a “New”.

Hopefully it’s “four lines and correctness”. Otherwise, why are we migrating them at all?

ZeroIntensity · March 10, 2025, 11:17pm

Note that we’re not always starting a new thread. We’re just ensuring that we have a thread state that matches the interpreter we want.

I think you’re overthinking the restoration case a little. We need it to switch interpreters, or in APIs where it’s possible that the caller doesn’t have a thread state (e.g., _PyObject_Dump), not necessarily when we’re trying to add an extra layer of state for the thread.

Thanks for bringing this up; I hadn’t considered it. I don’t think it should be an issue, though: we definitely don’t encourage (or support?) arbitrarily switching the thread state and leaving it. I cannot think of a case where it would be safe (or useful) to change the caller’s thread state.

We could go with the int handle that I mentioned if it’s that bad, as that’s not nearly as annoying as dealing with the allocation. PyGILState_Ensure gives the same kind of responsibility, right?

Fair point, but I would rather try to be nice to end users instead of telling them “be wrong or use a bad API”.

da-woods · March 11, 2025, 7:51am

It’d be nice if it were documented that PyGILState_Check is equivalent to PyThreadState_GetUnchecked() != NULL. I recently spent some time looking at the code trying to work out whether it was (and came to the conclusion that it probably was, but I wasn’t certain).

In general I’m in favour of something close to PyGILState_Ensure that works with subinterpreters, and specifically the idea that you can nest:

PyGILState_STATE state1 = PyGILState_Ensure();
{
    Py_BEGIN_ALLOW_THREADS

    {
        PyGILState_STATE state2 = PyGILState_Ensure();
        PyGILState_Release(state2);
    }
    Py_END_ALLOW_THREADS
}
PyGILState_Release(state1);

and the inner “ensure” will restore the state from the outer one.
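The nesting behaviour described above can be modelled in a few lines. This is a simplified single-thread, single-interpreter sketch that roughly follows how CPython's PyGILState functions work (one per-thread “auto” thread state plus a gilstate_counter); the toy_* names are hypothetical, attaching is reduced to setting a pointer, and all real locking is omitted. Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS are modelled as detaching and later re-attaching the saved state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int gilstate_counter;   /* how many Ensure calls are outstanding */
} toy_tstate;

typedef enum { TOY_LOCKED, TOY_UNLOCKED } toy_gilstate;

static toy_tstate storage;              /* backing memory for the one state */
static toy_tstate *auto_tstate = NULL;  /* this thread's PyGILState-style state */
static toy_tstate *attached = NULL;     /* state currently attached, if any */

static toy_gilstate
toy_gil_ensure(void)
{
    if (auto_tstate == NULL) {          /* first call on this thread: create */
        storage.gilstate_counter = 0;
        auto_tstate = &storage;
    }
    int was_attached = (attached == auto_tstate);
    if (!was_attached) {
        attached = auto_tstate;         /* re-attach the same state */
    }
    auto_tstate->gilstate_counter++;
    return was_attached ? TOY_LOCKED : TOY_UNLOCKED;
}

static void
toy_gil_release(toy_gilstate old)
{
    assert(attached == auto_tstate);
    if (--auto_tstate->gilstate_counter == 0) {
        attached = NULL;                /* outermost release: destroy */
        auto_tstate = NULL;
    }
    else if (old == TOY_UNLOCKED) {
        attached = NULL;                /* detach again, as before Ensure */
    }
}

/* Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS, reduced to save/restore. */
static toy_tstate *
toy_begin_allow_threads(void)
{
    toy_tstate *saved = attached;
    attached = NULL;
    return saved;
}

static void
toy_end_allow_threads(toy_tstate *saved)
{
    attached = saved;
}
```

Because both Ensure calls find the same per-thread state, the inner one re-attaches exactly what Py_BEGIN_ALLOW_THREADS detached, and only the outermost Release destroys it.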

pitrou · March 11, 2025, 11:34am

I don’t think an error message is useful. Failures should be truly exceptional. A return code is enough IMHO.
I would expect void PyThreadState_Release(PyInterpreterState*) for consistency, but perhaps that is not necessary.

pitrou · March 11, 2025, 11:38am

What if the current thread already has a thread state? Your proposal of calling PyThreadState_New may violate all kinds of expectations (would the threading module still associate the current thread correctly? I suppose that’s implementation-dependent, and may change from version to version).

The PyGILState APIs exist precisely for cases where it is not known if we’re running in an existing Python thread or not. The new APIs should retain this feature, otherwise they’re indeed pointless.

ZeroIntensity · March 11, 2025, 12:13pm

It’s actually not the same, and PyThreadState_GetUnchecked() != NULL is significantly better. PyGILState_Check will always return 1 if a subinterpreter was ever created; that’s why I went through the effort of removing it throughout the codebase. But yeah, it’s probably worth documenting.

steve.dower · March 11, 2025, 1:46pm

So we’re going to allow different interpreters to have Python threads on the same OS thread? That sounds potentially problematic (even if it’s only possible when you’re embedding). What happens when the thread is blocked on an OS primitive that Python can’t bypass? Does the other interpreter also get locked out? I think this is a can of worms, and as long as we have a strict 1-1 mapping of Python thread to OS thread, we ought to infer that it means a strict 1-1 mapping of Python interpreter to OS thread as well.

At this point, what we’re doing by “ensuring” is starting a temporary thread, much like start_new_thread. It just happens to be on an existing OS thread. As I say, for restoring a thread (because Python was running, and we called into C and so released our thread, but we’re still on the stack) this is fine.

But “ensure” is mostly valuable when you are just on a brand new OS thread (e.g. in a callback on a native thread pool) and want to run some Python code - and in that case, I think the caller ought to be more thoughtful about their state.

So a save/restore API for within a Python stack, which can save an opaque token to allow restoring the same thread state (possibly in TLS), and also a create/set/destroy API for outside of the Python stack, seem like a good set of APIs that are convenient when convenience is needed, and explicit when explicit is better.

If we’re running in an existing Python thread, then a Get API will tell you (or New can fail, if we are going to bind the threadstate to the current thread at that point, which seems reasonable here). An API that does “get or create and we’ll do automatic cleanup” is too magical for my tastes. At the very least it should return ownership of the new thread state to the caller, but that’s going to be just as complicated.

And I’d argue that if Python code has been run previously on that thread, but we’ve left it, then we shouldn’t presume to bring back the same thread state. For example, if you queue work to a native threadpool, and it gets queued on a particular thread, then you don’t want the previous task’s thread state there - you want a “new” thread. Or if you do want the old thread state, you can bring it back because you created it and so you “own” it.

A lot of these issues came up recently on Discord - worth scrolling up the free-threading chat to the discussion between Trent, Thomas, Sam and myself for more context if you haven’t seen it. (Or if I get motivated enough I might find and summarise it myself, but the key point was that the ownership of the thread was not clear, which is why I’m arguing that a new API should make it clear.)

pitrou · March 11, 2025, 2:03pm

Ok, so everyone is going to rewrite the same tedious boilerplate because we don’t want to provide a nice convenience API?

Or perhaps the same everyone will continue using the broken (but convenient) PyGILState APIs because they can’t be bothered to write boilerplate instead.

Well, nobody forces you to use those APIs if they are not to your taste.
However, their behavior is well-defined and easily documented.

I don’t use Discord, and I’m not really excited at the idea of joining a proprietary discussion service.

steve.dower · March 11, 2025, 5:22pm

The “nice convenience API” has undefined release semantics - if I jump onto a new native thread, “ensure” it has a Python state, and then exit the thread, when does the thread state get freed?

pitrou · March 11, 2025, 5:33pm

PyThreadState_New has the same problem, right?

More generally, if you exit a thread without cleaning up (and without letting a more helpful language such as C++ do it for you), then yes, you’ll be leaking resources.

steve.dower · March 11, 2025, 5:50pm

Except it clearly gives you ownership of the thread state. You’re less likely to treat it as a fire-and-forget API to “just make Python work” because the thread state pointer is yours.

A “get or create” API that basically does “ensure” but also returns an additional flag saying whether you own it or not would also work, but even from that description I think it’s obvious that it’s more complex.
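The “get or create” variant is easy to sketch, which also shows where its extra complexity lives: the caller has to carry the ownership flag alongside the pointer. The toy below is hypothetical (toy_current stands in for a thread-local slot private to the runtime, and interpreters are reduced to integer IDs); only the owned flag decides whether the caller frees the state.

```c
#include <stdlib.h>

typedef struct {
    int interp_id;
} toy_tstate;

/* Stand-in for the runtime's thread-local "current thread state" slot. */
static toy_tstate *toy_current = NULL;

/* Returns a thread state for interp_id. *owned is set to 1 only when a new
   state was created; callers pass both back to toy_release_if_owned(). */
static toy_tstate *
toy_get_or_create(int interp_id, int *owned)
{
    *owned = 0;
    if (toy_current != NULL && toy_current->interp_id == interp_id) {
        return toy_current;             /* reuse: not ours to free */
    }
    toy_tstate *ts = malloc(sizeof *ts);
    if (ts == NULL) {
        return NULL;
    }
    ts->interp_id = interp_id;
    *owned = 1;
    return ts;
}

static void
toy_release_if_owned(toy_tstate *ts, int owned)
{
    if (owned) {
        free(ts);
    }
}
```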

But probably what’s going on here is that I can’t think of any scenarios where I’m writing a native function without knowing whether to expect Python to already be running on the thread. In every case I’ve come across, the only time it’s been ambiguous has involved thread pools and the fact that PyGILState_Ensure leaves threads “active” when they ought to have been cleaned up.[1]

So if you’ve got a scenario in mind where your native code starts running on an existing thread and you either want (a) a new Python thread state or (b) to interrupt an existing one without coordination, and can’t determine ahead of time which it’ll be, I’d love to learn more about it.

[1] I’ve also encountered this while debugging faults in the hosting code, but I’m not counting those.

da-woods · March 11, 2025, 10:04pm

Cython uses PyGILState_Ensure all the time to implement with gil: blocks. Because they’re user-written functions we don’t know where they’ll be called from - it’s possible that a “nogil” function may be called:

1. on a thread where no thread state has ever been set up,
2. on a thread that currently holds the GIL (because a “nogil” function only says that it doesn’t need the GIL - it doesn’t require you not to have it),
3. on a thread where the GIL has been held, but has been released by a with nogil: block.

2 and 3 are most common, but all can happen. That’s just a consequence of the fact that we’re generating code in isolation without seeing the full context that it’s used in.

While it’d be possible for us to track some of this internally, Cython functions are generic Python and/or C callables so it’s not possible for Cython alone to track this. It might also have been possible to require the user to pass an existing thread-state down through their call-stack to give to with gil blocks, but adding that now would be a big change.

Where this currently falls down is with sub-interpreters, which haven’t been a major concern but which we’re just starting to support. We’ll probably have to be slightly less flexible there and require the user to at least pass the interpreter state for us, though.

We do also treat PyGILState_Ensure and PyGILState_Release as a pair that are always generated together, so I don’t think there are any specific clean-up issues compared to any other API.

ZeroIntensity · March 11, 2025, 10:33pm

Well, there’s not that much we can do at this point. The interpreter has been determined by the thread state since 3.12 (maybe even earlier?), and all the current subinterpreter code relies on this idea. I hope that if it really is an issue, somebody would’ve noticed years ago.

Most of these ideas already exist and can be used right now, so there’s not much to discuss there. I definitely do see use for low-level manipulation of thread states, but I don’t see the need for it to be the only option. (Well, it’s the only option right now, and nobody uses it.)

I think what you want is an API like PyThreadState_Swap that can properly fail like PyThreadState_Ensure would. That’s totally worth adding as part of the new API, just not as the only thing we expose.

h-vetinari · March 11, 2025, 10:46pm

C now has a draft technical specification (basically a preview feature for the standard) for defer (rationale), which tries to address that topic. Of course that’ll only be generally available in C2y or later, but at least it’s on the horizon – and positive feedback from the community might accelerate it.
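Until something like defer is available, the usual C idiom for keeping an acquire/release pair balanced across early exits is a single cleanup label. A minimal sketch, where toy_ensure()/toy_release() are hypothetical stand-ins that only count outstanding acquisitions (they are not the proposed CPython functions) and fail_at simulates failures at different steps:

```c
/* Hypothetical stand-ins that only track how many acquisitions are open. */
static int toy_outstanding = 0;

static int
toy_ensure(void)
{
    toy_outstanding++;
    return 0;
}

static void
toy_release(void)
{
    toy_outstanding--;
}

/* fail_at selects a simulated failure (0 means none). Every path after a
   successful toy_ensure() goes through the single "done" label, which is
   the pairing that defer would express right next to the acquisition. */
static int
toy_work(int fail_at)
{
    int rc = -1;
    if (toy_ensure() < 0) {
        return -1;              /* nothing acquired, nothing to undo */
    }
    if (fail_at == 1) {
        goto done;              /* e.g. a Python call raised */
    }
    if (fail_at == 2) {
        goto done;              /* e.g. a conversion failed */
    }
    rc = 0;
done:
    toy_release();
    return rc;
}
```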

steve.dower · March 11, 2025, 11:46pm

Okay, generated code is a good one (which I should’ve thought of, since I do use this quite often, though my own Cython code could certainly specify whether it’s expecting to be run as part of an existing thread or in a new one, if that were required).

The main concern was about requiring more “boilerplate” code, which is far less of an issue with a code generator than when expecting humans to write it by hand. Having to attempt a “get” (essentially, a thread-local lookup in a slot private to CPython) before doing a “new” (and knowing that you’re responsible for freeing this new thread state) is hardly a big step from calling a function that does those same things apart from telling you that you now own a thread state.

One of the needs in the Discord discussion I mentioned earlier was to actually preserve the thread state beyond the length of the call, so the caller wouldn’t always release it if they wanted to use it again. The “ensure/release” API doesn’t allow for that today - I’d hope that a new API would allow for that kind of thing.

ZeroIntensity · March 12, 2025, 1:13am

Yeah, but that would just be done via the raw PyThreadState_New function and friends. Why would PyThreadState_Ensure have to cover that base too?

pitrou · March 12, 2025, 7:30am

It actually comes up quite easily when you have some code that can either be run synchronously (in an existing Python thread) or in the background / in parallel (in a potentially non-existent Python thread), depending on various execution specifics that the function itself doesn’t control.

For example in Arrow C++ we have IO abstractions that can be implemented for different backends:

github.com/apache/arrow · cpp/src/arrow/io/interfaces.h (commit 2316ce88d)

class ARROW_EXPORT InputStream : virtual public FileInterface, virtual public Readable {
 public:
  /// \brief Advance or skip stream indicated number of bytes
  /// \param[in] nbytes the number to move forward
  /// \return Status
  Status Advance(int64_t nbytes);

  /// \brief Return zero-copy string_view to upcoming bytes.
  ///
  /// Do not modify the stream position.  The view becomes invalid after
  /// any operation on the stream.  May trigger buffering if the requested
  /// size is larger than the number of buffered bytes.
  ///
  /// May return NotImplemented on streams that don't support it.
  ///
  /// \param[in] nbytes the maximum number of bytes to see
  virtual Result<std::string_view> Peek(int64_t nbytes);

  /// \brief Return true if InputStream is capable of zero copy Buffer reads
  ///

This file has been truncated.

And in PyArrow (the Python bindings for Arrow C++) we have an implementation of these IO abstractions that delegate to a Python file-like object. This is so that PyArrow users can use Arrow C++ functionality with arbitrary file-like objects, including their own (for example you could probably call the Arrow CSV or JSONL reader on a ZipFile entry).

Since the C++ IO abstractions can be called in any context for the purpose of reading/writing data, whether they are called from a Python thread or, say, a C++ thread pool thread, entirely depends on what functionality is being called and how. This is not under control of the IO routines themselves.

So, for example, the Tell() implementation for Python file-like objects:

github.com/apache/arrow · python/pyarrow/src/arrow/python/io.cc (commit 2316ce88d)

Result<int64_t> PyReadableFile::Tell() const {
  return SafeCallIntoPython([=]() -> Result<int64_t> { return file_->Tell(); });
}

wraps its underlying functionality in the SafeCallIntoPython wrapper that ensures that Python APIs can be safely called from that point:

github.com/apache/arrow · python/pyarrow/src/arrow/python/common.h (commit 2316ce88d)

// A helper to call safely into the Python interpreter from arbitrary C++ code.
// The GIL is acquired, and the current thread's error status is preserved.
template <typename Function>
auto SafeCallIntoPython(Function&& func) -> decltype(func()) {
  PyAcquireGIL lock;
  PyObject* exc_type;
  PyObject* exc_value;
  PyObject* exc_traceback;
  PyErr_Fetch(&exc_type, &exc_value, &exc_traceback);
  auto maybe_status = std::forward<Function>(func)();
  // If the return Status is a "Python error", the current Python error status
  // describes the error and shouldn't be clobbered.
  if (!IsPyError(::arrow::internal::GenericToStatus(maybe_status)) &&
      exc_type != NULLPTR) {
    PyErr_Restore(exc_type, exc_value, exc_traceback);
  }
  return maybe_status;
}

… and PyAcquireGIL there is just a RAII wrapper around PyGILState_Ensure/PyGILState_Release.

(yes, at some point we noticed that taking the GIL is not sufficient and you also need to ensure you don’t have an error status set)
