A new API for ensuring/releasing thread states

colesbury · March 12, 2025, 3:52pm

I’m not entirely convinced by the use cases and I’m concerned about a number of behavioral details. I think it’s important that any new API interacts correctly with existing APIs like PyGILState_Ensure()/Release(). Real use cases often involve multiple C API extensions so any given thread might use a mix of existing APIs like PyGILState_Ensure and new APIs like PyThreadState_Ensure.

I don’t think the linked PR is in a good enough state to evaluate, and I think having an actual implementation is important here to understand the behavior.

This API and PyGILState_Ensure() center around thread-local variables. For PyGILState_Ensure, it’s autoTSSkey, which is the potentially inactive PyThreadState per-OS thread. Here I think you need something that’s per (OS thread x interpreter). How is that going to interact with PyGILState_Ensure()? The PR currently uses current_fast_get() (_Py_tss_tstate) – that’s definitely not correct.
What happens if PyThreadState_Ensure(interp2, ...) is called while a thread state from a different interpreter is active? What does PyThreadState_Release() do in that case? If it reactivates the original thread state, then it might fail. Can PyThreadState_Release() fail?
This API isn’t enough to reliably avoid hanging C++ initialized threads ^[1] during Python shutdown outside of trivial cases. There’s a good deal of code that calls Py_BEGIN/END_ALLOW_THREADS internally and Py_END_ALLOW_THREADS ^[2] will hang during Python shutdown in 3.14+.

Thread states created by PyGILState_Ensure() or PyThreadState_New() outside of the threading module. ↩︎
or PyEval_RestoreThread() or PyEval_AcquireThread() ↩︎

pitrou · March 12, 2025, 3:59pm

It’s a bit unfortunate to split the discussion like this and then have bidirectional links

Agreed. This is certainly a bit tedious implementation-wise, but should be doable anyway.

It should certainly ignore the “thread state from the different interpreter”. That’s the whole point of passing the interpreter explicitly.

Hmm, why would it fail and in which circumstances? I’m a bit surprised that anything should be “reactivated” actually, since interpreters really have distinct states. The only remaining global thing should be the legacy PyGILState.

Why?

What does this have to do with the new API? “The new API is not good enough because old APIs suck” is not a terrific argument.

colesbury · March 12, 2025, 4:53pm

A given OS thread can deal with multiple interpreters, but only one at a time. There can only be one active thread state per-OS thread. You have to do something if there’s an active thread state from a different interpreter – you can’t just ignore it! You could make it an error or you can try to deactivate it. The current PR tries to deactivate it.

From what I understand, there are a few motivations of this new API:

To avoid crashes in PyGILState_Ensure(). This is not an issue in 3.14.
To handle multiple interpreters. This is not going to work correctly because of this check, but it’s fixable. But you can also work with multiple interpreters with the current APIs.
To avoid the thread hanging in PyGILState_Ensure() in 3.14+. While the API achieves that in a very narrow sense, it isn’t actually useful for doing that in real world code. You will still encounter cases that hang the thread during Python shutdown if you have non-trivial code between PyThreadState_Ensure() and PyThreadState_Release().

To expand on (3), consider something like:

int err = PyThreadState_Ensure(interp);
if (err == 0) {
  PyObject_CallFunctionObjArgs(obj, ...);
  PyThreadState_Release();
}

Given Python’s implementation, I think you should assume that:

PyObject_CallFunctionObjArgs() will sometimes run Python’s GC.
The GC will call arbitrary code via finalizers, some of which is out of your library’s control.
Some of that code may call Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS internally. This includes code from Python’s standard library.
Py_END_ALLOW_THREADS will hang the thread in 3.14 during Python interpreter shutdown.

So even if PyThreadState_Ensure() doesn’t hang the thread during shutdown, you will have a really hard time reliably avoiding that for the Python code you run between PyThreadState_Ensure() and PyThreadState_Release().

My argument is basically:

Being able to reliably avoid hanging the thread requires a much more massive API change than is currently being proposed.
I don’t think being able to unreliably clean-up without hanging the thread is useful enough to justify a new API.
Hanging the thread during interpreter shutdown, which we do in 3.14, avoids crashes and is probably good enough!

pitrou · March 12, 2025, 7:18pm

We need a convenience API that replaces the PyGILState API and works with multiple interpreters. Do you have an alternative proposal?

colesbury · March 12, 2025, 10:14pm

No, I definitely don’t have well thought out alternate proposal. This might be a good starting point for that, but I think it’s going to be pretty difficult to implement correctly.

In particular, while the main interpreter’s PyInterpreterState is a global variable with the same lifetime as the process, subinterpreters can have much shorter lifetime. It’s not clear to me how one can ensure that the interp passed to PyThreadState_Ensure() is still valid. Subinterpreters have mostly avoided this problem so far by not supporting daemon threads or PyGILState_Ensure. I think you will additional need new APIs to support calling PyThreadState_Ensure() safely on subinterpreters.

ZeroIntensity · March 12, 2025, 10:57pm

Oh, right. We should probably make it fail-able if we want it to restore the previous thread state.

Oh! When did that get fixed, and how? Looking at the main branch, it still looks like PyGILState_Ensure will fail if called after finalization due to autoTSSkey being deallocated–ideally, PyThreadState_Ensure would not have such an issue.

And I have an open PR to fix that . I don’t think that should be a blocker here.

Probably, but I’d rather we do this one API at a time, starting with the mess of PyGILState. There’s some lingering bits and pieces in the codebase of Eric’s experimentation with reference counting for interpreter states, so if we could get that finished, then PyThreadState_Ensure should work reliably with daemon threads. Something like this:

int
PyThreadState_Ensure(PyInterpreterState *interp, /* ... */)
{
    _PyRuntimeState *runtime = interp->runtime;
    HEAD_LOCK(runtime);
    if (!make_sure_interp_exists(interp)) {
       return the_failure_mechanism(/* ... */)
    }
    _PyInterpreterState_IDIncref(interp);
    HEAD_UNLOCK(runtime);
    // ...
}

PyThreadState_Release would also decref the interpreter accordingly.

I’m primarily thinking out loud here, but I guess we could also add a variation of Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS that can fail? (I think we do need some sort of semantically clearer alternative to those macros for free-threading in the long-term; I’ve heard from several people that it’s only going to get more and more difficult to remember those in a GIL-deficient world.)

Or, instead of treating all native threads as daemon, perhaps we could add an API that tells Python to wait on a thread (state) before finalizing. Something like that would play pretty nicely with PyThreadState_Ensure.

eric.snow · March 13, 2025, 1:11am

I’m pretty sure this isn’t quite right. If there is already an active thread state associated with the current thread then PyGILState_Ensure() will make sure the GIL is held and move on. Otherwise the last thread state created or used in the current thread is used. If the current thread has never had a thread state then we create a new one using the main interpreter. Threads spawned by a subinterpreter (meaning through the threading module) should not encounter any unexpected behavior.

The unsolved tricky part involves that case where PyGILState_Ensure() creates a new thread state using the main interpreter, especially relative to callbacks in external C libraries (e.g. readline). Here’s the scenario for an extension module (or embedded CPython):

the extension defines a callback function that does some operation relative to the module’s state
- the callback uses PyGILState_Ensure()/PyGILState_Release() to ensure the runtime’s in the correct state in the current thread
- (Bonus wrinkle for extra credit: the callback takes a PyObject * argument)
the extension registers that callback with some external C library
later, the library calls the callback due to some event
the thread in which the library calls it has never seen Python before, so PyGILState_Ensure() creates a new thread state under the main interpreter and switches to it
the callback does something relative to the extension module
PyGILState_Release() destroys the thread state created by PyGILState_Ensure()

As long as the main interpreter is the only one, everything’s fine; that’s the designed behavior. The following cases are not fine using PyGILState_Ensure():

crash: the callback is triggered in an interpreter that doesn’t have the module imported yet
crash/race: in a subinterpreter, the callback was registered along with an object owned by the subinterpreter (must not be used by main interpreter)
mostly not an option: the extension wants the callback to target a specific subinterpreter
unexpected behavior: (related to the previous one) a subinterpreter’s copy of the extension registers the callback but PyGILState_Ensure() uses the main interpreter anyway
others?

That looks very similar to the motivation behind PEP 311, where PyGILState_Ensure() comes from. Just like then, each of the problematic cases can be worked around manually (without PyGILState_Ensure()). I expect those are the cases we would want to address, for the same reason they were solved originally for the main interpreter in PEP 311.

These are separate issues (though related to one another), but certainly worth addressing. It would definitely make sense to accommodate these in any API that replaces PyGILState_Ensure().

This makes sense because already we pay for a lot of complexity due to having an implicit “current” thread state stored in a thread-local.

The question of naming is somewhat orthogonal to what any new API needs to solve, though I’m not opposed to some light bikeshedding.

eric.snow · March 13, 2025, 1:56am

Why not return PyStatus?

ZeroIntensity · March 13, 2025, 2:03am

Sorry, I should have been clearer! I’m talking about native threads, not the threading module. threading has worked perfectly in subinterpreters from when I’ve played with it

Think of a case like this:

static int
c_thread(MySpecialThreadData *whatever)
{
    PyGILState_STATE gil = PyGILState_Ensure();
    /* Main interpreter, not subinterpreter! */
    store_my_object(whatever);
    PyGILState_Release(gil);
}

static PyObject *
something(PyObject *self, PyObject *arg)
{
    /* In subinterpreter. */
    start_c_thread(c_thread, /* ... */);
    PyObject *res;
    Py_BEGIN_ALLOW_THREADS;
    res = get_object_from_thread();
    Py_END_ALLOW_THREADS;   
    // Uh oh, *res* is from the main interpreter, not the
    // subinterpreter!
    return res;
}

Ideally, PyThreadState_Ensure would solve this issue by forcing the user to actually specify what interpreter they want.

That works too, it’s just less fun to use because you have to store an extra variable rather than a simple < 0 check. I originally suggested PyStatus in an earlier design separate from this thread.

eric.snow · March 13, 2025, 2:09am

Perhaps I’m misunderstanding you, but what you’ve described is exactly how the status quo works, and has for decades, and is fundamental to using multiple interpreters. The alternative would be to create a new OS thread any time you want to run something in a subinterpreter.

colesbury · March 13, 2025, 1:44pm

You’re right. That still needs to be fixed.

You don’t need to tackle multiple use cases at once, but you should propose all the APIs needed to get a single use case working correctly end to end. Otherwise, you risk glossing over fundamental problems with the API that you only discover later.

The make_sure_interp_exists() function doesn’t seem like something that one can write in C. You can’t generally check if a variable is a pointer to freed memory.

I suspect that PyThreadState_Ensure() should probably take a handle to an interpreter that can be safely invalidated, rather than a PyInterpreterState *. That would require new APIs.

eric.snow · March 13, 2025, 3:22pm

Yeah, that’s why I added the id to interpreter states.

ZeroIntensity · March 13, 2025, 3:25pm

I probably should have been clearer on that. All the valid interpreters are stored in a linked list on the runtime state, so you could iterate through and check if the passed pointer is in there; as long as you hold the lock, the interpreter state can’t get freed while validating it. (Granted, it’s not a particularly efficient search, but there are ways to optimize it.)

IDs are another option, and we do already have a public API for getting them from a PyInterpreterState *. Do we need to expose the interpreter incref/decref APIs as well?

colesbury · March 13, 2025, 3:51pm

No, you can’t safely compare a live pointer to a dead pointer. That’s undefined behavior and more practically the memory storing the dead PyInterpreterState may have been reused for a new PyInterpreterState.

If I understand correctly, this is referring to operations like _PyInterpreterState_IDIncref(interp) and _interpreters.incref(interp_id). I don’t think that’s what you want here. Objects like PyReadableFile from the Apache Arrow example shouldn’t keep the interpreter alive even though they will need a reference to the interpreter. Keeping the interpreter alive could lead to deadlock. (Objects in the main interpreter do not prevent the main interpreter from shutting down.)

I think you want something that behaves like a weak reference to the interpreter. It shouldn’t keep the interpreter alive, but it should be safe to use with PyThreadState_Ensure even after the interpreter ends.

Here is a possible sketch:

PyInterpreterHandle PyInterpreterHandle_Open(PyInterpreterState *state);
PyInterpreterHandle_Close(PyInterpreterHandle handle);
PyStatus PyThreadState_Ensure(PyInterpreterHandle handle);
PyStatus PyThreadState_Release(PyInterpreterHandle handle);

// Possible use in Apache Arrow
PyReadableFile::PyReadableFile(PyObject* file) {
  handle = PyInterpreterHandle_Open(PyInterpreterState_Get());
}

PyReadableFile::~PyReadableFile() {
  PyInterpreterHandle_Close(handle);
}

Result<int64_t> PyReadableFile::Tell() const {
  return SafeCallIntoPython(handle, [=]() -> Result<int64_t> { return file_->Tell(); });
}

auto SafeCallIntoPython(PyInterpreterHandle handle, Function&& func) -> decltype(func()) {
  PyStatus status = PyThreadState_Ensure(handle);
  if (<error>) { throw ... }
  ...
  func();
  ...
  status = PyThreadState_Release(handle); // error check
  if (<error>) { throw ... }
}

Open interpreter handles are stored in a linked list on PyInterpreterState
Handles can live longer than the PyInterpreterState – if so, they are marked as invalid
PyThreadState_Ensure() on an invalid handle safely returns an error.
PyInterpreter_End(interp) marks every open handle as invalid
PyInterpreterHandle_Open(), PyThreadState_Ensure(), and PyInterpreter_End() all operate under the same lock.

ZeroIntensity · March 13, 2025, 8:33pm

I forgot pointers could do that. IDs it is, then.

I think we could reuse IDs for that purpose. An ID doesn’t necessarily have to keep the interpreter alive, it can just be used to safely check if the interpreter still exists.