Understanding the GIL and creating new ThreadState instances

I’m trying to understand the various APIs around managing the Python GIL and ThreadState. I maintain a library that does something similar to greenlet, in that we manipulate the Python ThreadState object to achieve async processing.

Prior to 3.11, we could directly manipulate the PyThreadState->frame member, as seen here, but that member was removed as part of an overhaul of the C API. There no longer appears to be a public way to modify it; it can only be retrieved via PyThreadState_GetFrame.

So I’ve been trying to get some Python 3.11 code working. My best attempt so far is:

// Create a new Python ThreadState, release the GIL, swap to the new ThreadState, grab the GIL
PyThreadState *thread_state = PyThreadState_GET();
PyThreadState *new_threadstate = PyThreadState_New(thread_state->interp);
PyThreadState *eval_threadstate = PyEval_SaveThread();
thread_state = PyThreadState_Swap(new_threadstate);
PyGILState_STATE gstate = PyGILState_Ensure();

// Call arbitrary Python code
...
PyObject_CallFunctionObjArgs(...)
...

// Put the original ThreadState back, delete the new ThreadState, restore the original GIL state
new_threadstate = PyThreadState_Swap(thread_state);
PyThreadState_Clear(new_threadstate);
PyThreadState_Delete(new_threadstate);
PyEval_RestoreThread(eval_threadstate);
PyGILState_Release(gstate);

We end up hitting an assert that was newly added in Python 3.11, found here:

python: Python/ceval.c:6402: _PyEvalFrameClearAndPop: Assertion `(PyObject **)frame + frame->f_code->co_nlocalsplus + frame->f_code->co_stacksize + FRAME_SPECIALS_SIZE == tstate->datastack_top' failed.

I don’t really understand this assert and was hoping that someone could shed some light on it. I found a comment on CPython issue 93252, #issuecomment-1138682093 (apologies, I’m limited in how many links I can add to this post…), which implies that the frame and the ThreadState may have become detached, but I can’t see how. Would that mean PyThreadState_Swap isn’t working?
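To make the question more concrete, here is how the assertion reads to me, written as a toy model. This is only a sketch of my interpretation, not CPython's code: ToyThreadState, ToyFrame, toy_push, toy_pop and TOY_FRAME_SPECIALS are all hypothetical names, and the header size of 4 is an arbitrary stand-in for FRAME_SPECIALS_SIZE. The assumption is that each thread state owns a data stack on which frames are bump-allocated, and that a frame may only be popped when its end coincides with that stack's top.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for FRAME_SPECIALS_SIZE (value chosen arbitrarily). */
#define TOY_FRAME_SPECIALS 4

/* Toy thread state: only tracks how many words of its data stack are in use.
 * 'top' plays the role of tstate->datastack_top. */
typedef struct {
    size_t top;
} ToyThreadState;

/* Toy frame: remembers where it starts and how much room it reserved. */
typedef struct {
    size_t base;        /* offset of the frame on its owner's data stack */
    size_t nlocalsplus; /* plays the role of co_nlocalsplus */
    size_t stacksize;   /* plays the role of co_stacksize */
} ToyFrame;

/* Total words a frame occupies: locals + evaluation stack + fixed header. */
static size_t toy_frame_size(const ToyFrame *f)
{
    return f->nlocalsplus + f->stacksize + TOY_FRAME_SPECIALS;
}

/* Bump-allocate a frame on the given thread state's data stack. */
static ToyFrame toy_push(ToyThreadState *ts, size_t nlocalsplus, size_t stacksize)
{
    ToyFrame f = { ts->top, nlocalsplus, stacksize };
    ts->top += toy_frame_size(&f);
    return f;
}

/* Pop a frame. Returns 1 on success and 0 if the frame's end does not match
 * the stack top, which is the condition the real assertion appears to check. */
static int toy_pop(ToyThreadState *ts, const ToyFrame *f)
{
    if (f->base + toy_frame_size(f) != ts->top) {
        return 0; /* frame is not the topmost frame of THIS thread state */
    }
    ts->top = f->base;
    return 1;
}
```

In this model, pushing a frame onto one ToyThreadState and then calling toy_pop with a different ToyThreadState fails the check, as does popping frames out of order. If the real check works the same way, the assert would fire whenever the thread state that is current at frame exit is not the one the frame was pushed on.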

For reference, I was able to get a simplified version of this working in most cases:

// Create a new Python ThreadState, swap to the new ThreadState
PyThreadState *thread_state = PyThreadState_GET();
PyThreadState *new_threadstate = PyThreadState_New(thread_state->interp);
thread_state = PyThreadState_Swap(new_threadstate);

// Call arbitrary Python code
...
PyObject_CallFunctionObjArgs(...)
...

// Put the original ThreadState back, delete the new ThreadState
new_threadstate = PyThreadState_Swap(thread_state);
PyThreadState_Clear(new_threadstate);
PyThreadState_Delete(new_threadstate);

But when I try to run this alongside complex projects, like PyQt5, I start seeing mysterious segfaults that appear to happen in the translation layer between C and Python. I switched to debug builds of Python and got warnings about not holding the GIL during memory allocation, which is what led to the first code snippet as an attempt to solve that.

Thank you for any pointers on this topic!