Subinterpreters with multiple threads

We have a setup with a Python interpreter embedded into a server binary using the C API. From this, we create multiple subinterpreters; for each piece of work to process, we find the correct subinterpreter, restore its thread (via PyEval_RestoreThread), have it do some work, then save the thread again (PyEval_SaveThread).

We’re finding that this doesn’t work any more in Python 3.12: it crashes at various points when restoring the thread. This appears to happen because we’ve initialised the subinterpreter on one OS thread, but are now restoring its Python thread state on another OS thread. The docs mention this case and suggest PyGILState_Ensure(), but they also note that it assumes a single global interpreter and shouldn’t be mixed with subinterpreters. From experimenting, using it doesn’t seem to make any notable difference.

Does anyone have guidance on the best way to manage multiple subinterpreters here? Is there some other API that can be used to guarantee that the current OS thread is ready to run a given subinterpreter?
In theory we could ensure that the OS thread that initialises a subinterpreter is always the one that executes it, but that would be very difficult to manage in a multi-threaded server; I’m hoping to find an alternative…
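To make the option I’d like to avoid concrete, here is a minimal sketch of pinning each subinterpreter to one OS thread, assuming POSIX threads. PinnedWorker, Job, worker_start, worker_call, worker_stop and record_thread are hypothetical names, and the Python calls are only indicated in comments; every caller would have to hand its work to the owning thread and block until it finishes.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*job_fn)(void *arg);

/* One unit of work; lives on the submitting caller's stack. */
typedef struct { job_fn fn; void *arg; bool done; } Job;

/* One dedicated OS thread with a single-slot job queue. */
typedef struct {
    pthread_t thread;
    pthread_mutex_t mu;
    pthread_cond_t cv;
    Job *pending;   /* job waiting to be picked up, or NULL */
    bool stop;
} PinnedWorker;

static void *worker_main(void *p)
{
    PinnedWorker *w = p;
    /* Placeholder: the subinterpreter would be created here, so that it
     * is only ever used from this OS thread. */
    pthread_mutex_lock(&w->mu);
    for (;;) {
        while (w->pending == NULL && !w->stop)
            pthread_cond_wait(&w->cv, &w->mu);
        if (w->pending == NULL)
            break;                        /* stop requested, nothing queued */
        Job *j = w->pending;
        w->pending = NULL;                /* free the slot for the next caller */
        pthread_cond_broadcast(&w->cv);
        pthread_mutex_unlock(&w->mu);
        j->fn(j->arg);                    /* the Python work would run here */
        pthread_mutex_lock(&w->mu);
        j->done = true;
        pthread_cond_broadcast(&w->cv);   /* wake the caller waiting on j */
    }
    pthread_mutex_unlock(&w->mu);
    return NULL;
}

static int worker_start(PinnedWorker *w)
{
    w->pending = NULL;
    w->stop = false;
    pthread_mutex_init(&w->mu, NULL);
    pthread_cond_init(&w->cv, NULL);
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Run fn(arg) on the worker's thread and block until it has finished. */
static void worker_call(PinnedWorker *w, job_fn fn, void *arg)
{
    Job job = { fn, arg, false };
    pthread_mutex_lock(&w->mu);
    while (w->pending != NULL)            /* wait for the slot to be free */
        pthread_cond_wait(&w->cv, &w->mu);
    w->pending = &job;
    pthread_cond_broadcast(&w->cv);
    while (!job.done)
        pthread_cond_wait(&w->cv, &w->mu);
    pthread_mutex_unlock(&w->mu);
}

static void worker_stop(PinnedWorker *w)
{
    pthread_mutex_lock(&w->mu);
    w->stop = true;
    pthread_cond_broadcast(&w->cv);
    pthread_mutex_unlock(&w->mu);
    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->cv);
    pthread_mutex_destroy(&w->mu);
}

/* Example job: records which OS thread actually ran it. */
static void record_thread(void *arg)
{
    *(pthread_t *)arg = pthread_self();
}
```

With one PinnedWorker per subinterpreter, every request serialises through that worker’s thread regardless of which server thread received it, which is exactly the extra hop and bookkeeping we’d rather not add.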

Thanks in advance!