Ergonomics of signal checks with detached thread state

zwol · June 3, 2025, 7:52pm

I’m working on an addition to the C-API that would make it possible to check for pending Unix signals (most importantly, the SIGINT delivered when the user presses control-C to interrupt a running calculation) from inside a block of compiled extension code that has detached its thread state (or, in the older terminology, released the global interpreter lock). For detailed background see this PyCon talk and this pending design request to the C-API working group.

While writing up the unresolved design issues for the C-API working group, it occurred to me that feedback from extension developers would be really helpful with some of them, so I’m asking for that here.

Background

The essence of the proposed new feature is this. Suppose you have this code:

void random_fill(bitgen_t *rng, npy_intp cnt, double *out)
{
    Py_BEGIN_ALLOW_THREADS
    for (npy_intp i = 0; i < cnt; i++) {
        out[i] = next_double(rng);
    }
    Py_END_ALLOW_THREADS
}

and you want to make it interruptible. Right now you need to write it like this:

int random_fill(bitgen_t *rng, npy_intp cnt, double *out)
{
    int interrupted = 0;
    Py_BEGIN_ALLOW_THREADS
    for (npy_intp i = 0; i < cnt; i++) {
        out[i] = next_double(rng);
        Py_BLOCK_THREADS
        interrupted = PyErr_CheckSignals();
        Py_UNBLOCK_THREADS
        if (interrupted) break;        
    }
    Py_END_ALLOW_THREADS
    return interrupted;
}

which has unacceptably high overhead, to the point where I told people in my talk to check the system clock and do the block/check/unblock dance only once a millisecond or so. But PyErr_CheckSignals only needs an attached thread state if there is a signal pending. The part of its work that determines whether there’s a signal pending can be done without the thread state being attached. So we could add an API that lets you write something like this instead:

int random_fill(bitgen_t *rng, npy_intp cnt, double *out)
{
    int interrupted = 0;
    Py_BEGIN_ALLOW_THREADS
    for (npy_intp i = 0; i < cnt; i++) {
        out[i] = next_double(rng);
        interrupted = PyErr_CheckSignalsDetached();
        if (interrupted) break;        
    }
    Py_END_ALLOW_THREADS
    return interrupted;
}

with negligible cost.
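
For comparison, the clock-throttled workaround mentioned above could look roughly like the sketch below. The names check_throttle, throttle_init and throttle_due are made up for illustration and are not part of any existing or proposed API, and the sketch assumes a POSIX system where clock_gettime supports CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper state: rate-limits an expensive check. */
typedef struct {
    int64_t next_check_ns;  /* monotonic time at which the next check is due */
    int64_t interval_ns;    /* minimum spacing between checks */
} check_throttle;

/* Current monotonic clock reading in nanoseconds. */
static int64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Start the throttle; the first check becomes due interval_ns from now. */
static void throttle_init(check_throttle *t, int64_t interval_ns)
{
    t->interval_ns = interval_ns;
    t->next_check_ns = monotonic_ns() + interval_ns;
}

/* Return 1 if at least interval_ns has passed since the throttle was
 * initialized or last returned 1, and re-arm it; otherwise return 0. */
static int throttle_due(check_throttle *t)
{
    int64_t now = monotonic_ns();
    if (now < t->next_check_ns) {
        return 0;
    }
    t->next_check_ns = now + t->interval_ns;
    return 1;
}

/* Intended use inside the loop of the second random_fill above
 * (not compiled here, because it needs Python.h):
 *
 *     check_throttle t;
 *     throttle_init(&t, 1000000);    // roughly 1 ms
 *     ...
 *     if (throttle_due(&t)) {
 *         Py_BLOCK_THREADS
 *         interrupted = PyErr_CheckSignals();
 *         Py_UNBLOCK_THREADS
 *         if (interrupted) break;
 *     }
 */
```

Note that this still reads the clock on every iteration, which is usually much cheaper than reattaching the thread state but is not free, and it is more bookkeeping than a single call to a detached check would be.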

Input needed from extension developers

There are two pending design decisions that will affect the ergonomics of the new API, and I’ve never written a complicated C extension myself so I have no sense for what’s likely to be a problem. These decisions are described in terms of what the API will be in the C-API workgroup thread linked up top, but here I want to reframe them in terms of ergonomics. Suppose you are adding calls to PyErr_CheckSignals and/or the hypothetical new PyErr_CheckSignalsDetached to your extension. You need to ensure that every loop that can run for more than like 10ms (1ms is better) contains a signal check.

1. How often would you need to put signal checks in places that are dynamically but not lexically inside a Py_BEGIN_ALLOW_THREADS … Py_END_ALLOW_THREADS block? This could, for example, come up if the Py_BEGIN_ALLOW_THREADS … Py_END_ALLOW_THREADS block appears in a function that’s directly part of your module interface, with calls to other functions inside that block. How complicated do the nested function calls get? Is there ever recursion involved?

This matters because one of the unresolved design decisions is whether the new function should take a PyThreadState* argument, and the big way I see for that to be a problem is if people frequently need to call the new function from places that aren’t lexically inside a Py_BEGIN_ALLOW_THREADS … Py_END_ALLOW_THREADS block. (It’s semi-undocumented but there is a PyThreadState* value available to all code that is lexically inside such a block.)

2. How often would you need to put signal checks in places where it’s not obvious to a human or maybe even unknown at compile time whether the code is running with or without an attached thread state?

The other big unresolved design decision is whether the new function should only be callable with the thread state detached, and the big way I see for that to be a problem is if maybe it’s not always easy, or even possible, to tell which of PyErr_CheckSignals and PyErr_CheckSignalsDetached should be used.

Other feedback on the proposal is also welcome.

encukou · June 9, 2025, 3:53pm

Hi,
It’s hard to reply on behalf of all developers, and (as suggested by this issue existing) it’s possible that not that many people think about this.

When calling non-Python libraries, there’s always the possibility that the library will simply not support this use case.
Libraries with long-running tasks might allow setting callbacks (e.g. for drawing progress bars) and possibly even callbacks that can cancel the task.

For question 1, I guess people should be calling PyErr_CheckSignals in pretty much any callback their library exposes, but today, they aren’t very likely to do it.

(It’s also notable that PyErr_CheckSignals will raise an exception, which must bubble back to the caller, and must be saved/restored around most calls to Python API, like other callbacks.)

Perhaps we could expose the inexpensive checks, as a lower-level building block, which for now I’ll call PyErr_CheckPossibleSignalsDetached. It wouldn’t need an attached thread state, would never reattach it, but could be called with one, and it would return 1 if its caller should reattach and call PyErr_CheckSignals. It would be allowed to give false positives.
With it, random_fill would look like this:

int random_fill(bitgen_t *rng, npy_intp cnt, double *out)
{
    int interrupted = 0;  /* declared here so it is still in scope at the return */
    Py_BEGIN_ALLOW_THREADS
    for (npy_intp i = 0; i < cnt; i++) {
        out[i] = next_double(rng);
        int possibly_interrupted = PyErr_CheckPossibleSignalsDetached();
        if (possibly_interrupted) {
            Py_BLOCK_THREADS
            interrupted = PyErr_CheckSignals();
            Py_UNBLOCK_THREADS
            if (interrupted) break;
        }
    }
    Py_END_ALLOW_THREADS
    return interrupted;
}

This could give some more options to low-level code - it could skip the PyErr_CheckSignals call, and return (without an active Python exception) to:

request that the entire operation should be repeated (similar to Unix EINTR), or
signal partial success, requesting another call to finish the operation (similar to lower-level write).

That could solve the “hard” cases you are asking about.
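
To make the partial-success option a bit more concrete, here is a rough sketch in plain C with no Python API involved. Everything in it is made up for illustration: fill_partial is a stand-in for a low-level routine like random_fill (it writes a constant instead of calling next_double), maybe_interrupted_fn stands in for wherever the hypothetical PyErr_CheckPossibleSignalsDetached would be called, and fire_after_n is just a demo callback.

```c
#include <stddef.h>

/* Hypothetical callback type: returns nonzero if the caller should stop
 * early so that someone higher up can reattach and check signals. */
typedef int (*maybe_interrupted_fn)(void *ctx);

/* Write `value` into out[0..cnt) one element at a time.
 * Returns the number of elements written. A count smaller than cnt
 * means the callback fired and the caller should call again to finish,
 * much like a short write(). No Python exception is involved. */
static size_t fill_partial(double *out, size_t cnt, double value,
                           maybe_interrupted_fn maybe_interrupted, void *ctx)
{
    size_t done = 0;
    while (done < cnt) {
        out[done] = value;
        done++;
        if (maybe_interrupted(ctx)) {
            break;
        }
    }
    return done;
}

/* Demo callback: ctx points to an int countdown; fires when it reaches 0. */
static int fire_after_n(void *ctx)
{
    int *left = ctx;
    *left -= 1;
    return *left == 0;
}

/* Caller-side loop, resuming after each short count:
 *
 *     size_t done = 0;
 *     while (done < cnt) {
 *         done += fill_partial(out + done, cnt - done, value, cb, ctx);
 *         if (done < cnt) {
 *             // reattach, call PyErr_CheckSignals, bail out on error
 *         }
 *     }
 */
```

The point of this shape is that the low-level function never needs a thread state at all; only the outermost caller, which is lexically inside the Py_BEGIN_ALLOW_THREADS block, has to reattach.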
Then, a PyErr_CheckSignalsDetached that takes a thread state as argument would be a convenience wrapper meant for general use.

Or add a PyErr_CheckSignalsDetached for general use now, and then worry about the “hard cases” later.