Dear all,
tricky corner cases sometimes arise in huge and complicated (think: TensorFlow, PyTorch, etc.) mixed C/C++/Python codebases.
There are too many situations to cover them all, but for simplicity let’s say that:
- Some code runs in C/C++, and it needs to use the Python C API to update some state.
- To do so safely, it first calls PyGILState_Ensure().
Often, the triggers are asynchronous and diverse:
- A kernel has finished running on the GPU.
- A network packet was received.
- A thread has quit, and the C++ library is executing static finalizers of thread local storage.
- etc…
Now consider that the Python interpreter has shut down by the time this happens, which means that it can no longer service something as basic as PyGILState_Ensure() or Py_DECREF(). What happens then? The docs say that PyGILState_Ensure() will then terminate the thread. Sadly, there isn’t a reliable way to do this in C++, and it usually segfaults the application :-(. If such events can occur, there is a long tail of spurious crashes that are difficult to reproduce and fix.
Python has an API that is supposedly an answer to this problem. One can call Py_IsFinalizing(). If that returns true, the interpreter is in the process of being shut down. Unfortunately, this API doesn’t solve the problem, for two reasons.
Consider a pattern like this:
if (!Py_IsFinalizing()) { /* #1 */
    PyGILState_STATE state = PyGILState_Ensure(); /* #2 */
    /* ... */
    PyGILState_Release(state);
}
Just because we succeeded at #1 doesn’t mean that #2 is still safe to execute. The main thread might have made further progress in the meantime, causing the interpreter to fully shut down. It’s a race condition.
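To make the window between #1 and #2 concrete, here is a small toy model in C++. It does not use CPython at all: ToyRuntime, Outcome, check_then_act and the `between` hook are names I made up for illustration, and the hook simply stands in for whatever the main thread does between the check and the use.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Toy stand-in for an interpreter. NOT CPython; every name here is hypothetical.
struct ToyRuntime {
    std::atomic<bool> finalizing{false};  // what a "Py_IsFinalizing"-style check would read
    std::atomic<bool> alive{true};        // whether the runtime can still be used at all
};

enum class Outcome { Skipped, Delivered, UsedDeadRuntime };

// Check-then-act, mirroring the pattern above. `between` simulates the work
// another thread performs in the gap between the check (#1) and the use (#2).
Outcome check_then_act(ToyRuntime& rt, const std::function<void()>& between) {
    if (rt.finalizing.load()) {
        return Outcome::Skipped;          // #1 said "shutting down", so we back off
    }
    between();                            // the race window
    if (!rt.alive.load()) {
        return Outcome::UsedDeadRuntime;  // #2 would touch a dead runtime here
    }
    return Outcome::Delivered;
}

// Usage sketch: the same call is fine, broken, or skipped depending on timing.
void usage_example() {
    ToyRuntime rt;
    // Nothing happens in the window: the update is delivered.
    assert(check_then_act(rt, [] {}) == Outcome::Delivered);
    // Shutdown completes inside the window: the check passed, the use is unsafe.
    assert(check_then_act(rt, [&] { rt.finalizing = true; rt.alive = false; })
           == Outcome::UsedDeadRuntime);
    // Once the flag is visible before the check, the code backs off.
    assert(check_then_act(rt, [] {}) == Outcome::Skipped);
}
```

The second case is the problematic one: nothing the caller can do at #1 rules it out, because the check and the use are two separate steps.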
The second reason is that we often still want to use the Python C API even when Py_IsFinalizing() is true. That’s because shutdown logic can trigger asynchronous events that cause some resource to be finally deleted. As long as it is still possible, we should deliver Py_DECREF() calls etc., so that the garbage collector can clean things up.
What I am really missing is an API that looks like the following:
PyGILState_STATE state = PyGILState_EnsureOrSafelyFail(); /* will never crash */
if (state != PyGILState_FAILURE) {
    /* Python API safe to use until the release statement below */
    /* ... do stuff ... */
    PyGILState_Release(state);
} else {
    /* Oh well. Do the best that we can do here without talking to Python */
}
To my knowledge, Python doesn’t have something like this at the moment. Is it possible to provide such an API?
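To illustrate the semantics I have in mind, here is a toy model in C++ that again does not touch CPython. ToyInterpreter, ensure_or_fail, release and deliver_update are hypothetical names, and a plain std::mutex stands in for the GIL. The point is only that the liveness check and the lock acquisition happen as one step, that the call still succeeds while the interpreter is merely finalizing, and that failure is reported to the caller instead of terminating the thread.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-in for an interpreter. NOT CPython; every name here is hypothetical.
enum class ToyState { Running, Finalizing, Dead };

class ToyInterpreter {
public:
    // On success, returns true with the lock held; the caller must call release().
    // On failure (interpreter is dead), returns false with the lock NOT held.
    // It never terminates or blocks the calling thread forever.
    bool ensure_or_fail() {
        lock_.lock();
        if (state_ == ToyState::Dead) {
            lock_.unlock();
            return false;
        }
        return true;  // Running or Finalizing: still usable
    }

    void release() { lock_.unlock(); }

    // Shutdown transitions take the same lock, so they cannot happen while a
    // caller is between a successful ensure_or_fail() and its release().
    void begin_finalizing() {
        std::lock_guard<std::mutex> guard(lock_);
        state_ = ToyState::Finalizing;
    }
    void finish_shutdown() {
        std::lock_guard<std::mutex> guard(lock_);
        state_ = ToyState::Dead;
    }

private:
    std::mutex lock_;
    ToyState state_ = ToyState::Running;
};

// An asynchronous callback that wants to drop one reference.
// Returns true if the update was delivered, false if it had to be skipped.
bool deliver_update(ToyInterpreter& interp, int& refcount) {
    if (!interp.ensure_or_fail()) {
        return false;  // do the best we can without talking to the interpreter
    }
    --refcount;        // safe until release()
    interp.release();
    return true;
}

// Usage sketch: updates are delivered while running and while finalizing,
// and are skipped cleanly once shutdown has completed.
void usage_example() {
    ToyInterpreter interp;
    int refcount = 2;
    assert(deliver_update(interp, refcount));
    interp.begin_finalizing();
    assert(deliver_update(interp, refcount));
    interp.finish_shutdown();
    assert(!deliver_update(interp, refcount));
    assert(refcount == 0);
}
```

How this would map onto the real interpreter and its shutdown sequence is exactly the part I cannot answer myself; the sketch is only meant to pin down the contract.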
Thanks!