I’m working on an addition to the C-API that would make it possible to check for pending Unix signals (most importantly, the SIGINT delivered when the user presses control-C to interrupt a running calculation) from inside a block of compiled extension code that has detached its thread state (or, in the older terminology, released the global interpreter lock). For detailed background see this PyCon talk and this pending design request to the C-API working group.
While writing up the unresolved design issues for the C-API working group, it occurred to me that feedback from extension developers would be really helpful with some of them, so I’m asking for that here.
Background
The essence of the proposed new feature is this: suppose you have this code
void random_fill(bitgen_t *rng, npy_intp cnt, double *out)
{
    Py_BEGIN_ALLOW_THREADS
    for (npy_intp i = 0; i < cnt; i++) {
        out[i] = next_double(rng);
    }
    Py_END_ALLOW_THREADS
}
and you want to make it interruptible. Right now you need to write it like this:
int random_fill(bitgen_t *rng, npy_intp cnt, double *out)
{
    int interrupted = 0;
    Py_BEGIN_ALLOW_THREADS
    for (npy_intp i = 0; i < cnt; i++) {
        out[i] = next_double(rng);
        Py_BLOCK_THREADS
        interrupted = PyErr_CheckSignals();
        Py_UNBLOCK_THREADS
        if (interrupted) break;
    }
    Py_END_ALLOW_THREADS
    return interrupted;
}
which has unacceptably high overhead, to the point where I told people in my talk to check the system clock and do the block/check/unblock dance only once a millisecond or so. But PyErr_CheckSignals only needs an attached thread state if there is a signal pending. The part of its work that determines whether there’s a signal pending can be done without the thread state being attached. So we could add an API that lets you write something like this instead:
int random_fill(bitgen_t *rng, npy_intp cnt, double *out)
{
    int interrupted = 0;
    Py_BEGIN_ALLOW_THREADS
    for (npy_intp i = 0; i < cnt; i++) {
        out[i] = next_double(rng);
        interrupted = PyErr_CheckSignalsDetached();
        if (interrupted) break;
    }
    Py_END_ALLOW_THREADS
    return interrupted;
}
with negligible cost.
Input needed from extension developers
There are two pending design decisions that will affect the ergonomics of the new API, and I’ve never written a complicated C extension myself, so I have no sense of what’s likely to be a problem. These decisions are described in terms of what the API will be in the C-API workgroup thread linked up top, but here I want to reframe them in terms of ergonomics. Suppose you are adding calls to PyErr_CheckSignals and/or the hypothetical new PyErr_CheckSignalsDetached to your extension. You need to ensure that every loop that can run for more than about 10 ms (1 ms is better) contains a signal check.
- How often would you need to put signal checks in places that are dynamically but not lexically inside a Py_BEGIN_ALLOW_THREADS … Py_END_ALLOW_THREADS block? This could, for example, come up if the Py_BEGIN_ALLOW_THREADS … Py_END_ALLOW_THREADS block appears in a function that’s directly part of your module interface, with calls to other functions inside that block. How complicated do the nested function calls get? Is there ever recursion involved?

  This matters because one of the unresolved design decisions is whether the new function should take a PyThreadState* argument, and the big way I see for that to be a problem is if people frequently need to call the new function from places that aren’t lexically inside a Py_BEGIN_ALLOW_THREADS … Py_END_ALLOW_THREADS block. (It’s semi-undocumented but there is a PyThreadState* value available to all code that is lexically inside such a block.)

- How often would you need to put signal checks in places where it’s not obvious to a human or maybe even unknown at compile time whether the code is running with or without an attached thread state?

  The other big unresolved design decision is whether the new function should only be callable with the thread state detached, and the big way I see for that to be a problem is if maybe it’s not always easy, or even possible, to tell which of PyErr_CheckSignals and PyErr_CheckSignalsDetached should be used.
Other feedback on the proposal is also welcome.