PyThread cannot capture signal spawned by CPython

Hi, I’m implementing some logic in a thread spawned by PyThread_start_new_thread(), inside which I need to use sigsetjmp to recover from a SIGSEGV through a registered handler (I know it’s not best practice).
This is how you normally do it in C:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf jump_buffer;

void sigsegv_handler(int sig)
{
    (void)sig; // unused
    fprintf(stderr, "Caught segmentation fault, jumping to recovery point...\n");
    siglongjmp(jump_buffer, 1);
}

void enable_sigsegv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = 0; // SA_SIGINFO would require sa_sigaction instead of sa_handler
    sa.sa_handler = sigsegv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
}

int main(void)
{
    enable_sigsegv_handler(); // install the handler before faulting
    if (sigsetjmp(jump_buffer, 1) == 0)
    {
        volatile int *ptr = NULL;
        *ptr = 123; // generate a segfault
    }
    else
    {
        printf("Recovered from a segmentation fault.\n");
    }
    return 0;
}

However, when I run the exact same logic inside the spawned PyThread, the SIGSEGV is not caught and the process still crashes. It doesn’t matter whether I call enable_sigsegv_handler() in my thread or in the CPython main thread.

I noticed CPython has some dedicated logic in Modules/faulthandler.c. Is it overwriting the registered signal handler? Or is there some compile switch that prevents this from working? Thanks.

The common wisdom is, you can’t recover from a segfault. Even if it’s in a thread, the entire process environment is corrupted. Signal handlers are there to help you gracefully respond to a signal, but when that signal is SIGSEGV, the “graceful response” is to tidy up whatever needs tidying up, because you are on the deck of the Titanic and it is already at this angle: /

And as the longjmp() docs say:

Although longjmp() is an async-signal-safe function, if it is invoked from a signal handler which interrupted a non-async-signal-safe function or equivalent (such as the processing equivalent to exit() performed after a return from the initial call to main()), the behavior of any subsequent call to a non-async-signal-safe function or equivalent is undefined.

So, calling siglongjmp() from a signal handler for SIGSEGV to then have it resume your main() and do that printf() is undefined behavior. The signal interrupted non-async-signal-safe code (main()), so that code can’t safely be resumed.

glibc used to include a tool, catchsegv, that would wrap around a binary and dump some debugging information when it crashed, but it was removed in glibc 2.35 (2022) because even that had problems (and used non-async-signal-safe functions to generate backtraces).

…All that being said, Python installs its own signal handlers, yes, which are still going to fire when your SIGSEGV occurs. (Though I don’t believe faulthandler is used unless explicitly enabled. It’s the code in Python/pylifecycle.c that you’d be dealing with, I think.)

When you’re running module code, you’re running in the Python process environment (even on another thread), so you’re going to be subjected to its signal handling. There’s no “blank slate” environment like you’d have when launching an executable as a new process.
