Is there some way to handle the SIGABRT signal? Python’s signal module doesn’t seem to work.
I am running TensorFlow from Python, and one particular call causes the script to exit with $? = 134 and the message:
Aborted (core dumped)
I know that the abort is triggered in C code, though I don’t know where or how. It seems likely that it comes from a std::abort() call.
I tried registering a handler in my script with:
import signal
def handler(signum, frame):
    print(f'Signal handler called with signal {signum}', flush=True)
    raise BaseException
signal.signal(signal.SIGABRT, handler)
but, unfortunately, it is never called, and there doesn’t seem to be any warning of this in the docs for the ‘signal’ module.
The abort() function in glibc first calls raise(SIGABRT). This call returns if the signal was handled. In this case, abort() restores the default handler that should terminate the process, and then it raises SIGABRT again. If raising the signal still returns, abort() tries a system specific abort instruction. If even that doesn’t work, it simply calls _exit(127). No matter what, if abort() is called, the process is meant to terminate.
A SIGABRT handler is still meant to be useful, e.g. for logging state or a controlled shutdown. However, Python’s signal module can’t handle abort(). Its C-level signal handler only sets a flag for the interpreter to check later and then returns immediately, so abort() goes on to terminate the process before the Python-level handler ever runs. You can use ctypes to handle an abort() call, but bear in mind that the process is terminated as soon as the handler returns. For example:
import ctypes
from signal import SIGABRT
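c_globals = ctypes.CDLL(None) # POSIX

# Keep a module-level reference so the callback is not garbage collected.
@ctypes.CFUNCTYPE(None, ctypes.c_int)
def sigabrt_handler(sig):
    print('SIGABRT handled!', flush=True)

# Install the callback with the C library's signal(), bypassing Python's signal module.
c_globals.signal(SIGABRT, sigabrt_handler)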
>>> import os
>>> os.abort()
SIGABRT handled!
Aborted (core dumped)
That cleared things up perfectly. As it turns out, I do need the program to continue even when the third-party code calls abort. (It shouldn’t be calling abort, but I’m stuck with it.) I am currently experimenting with calling the third-party code in a separate process using Python’s multiprocessing module. It appears to be working so far.
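For anyone who wants to try the same thing, here is a minimal sketch of the separate-process approach. The names risky_call and run_isolated are hypothetical, and os.abort() stands in for the third-party call. It assumes a POSIX system and uses the "fork" start method to keep the example short. On POSIX, a child killed by a signal reports an exitcode equal to the negative signal number.

```python
import multiprocessing as mp
import os
import signal


def risky_call():
    # Hypothetical stand-in for the third-party call that ends up in abort().
    os.abort()


def run_isolated(target, *args):
    """Run target(*args) in a child process and return the child's exit code."""
    # Assumption: POSIX with the "fork" start method. With "spawn", target must
    # be importable and the caller needs an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=target, args=args)
    proc.start()
    proc.join()
    return proc.exitcode


exitcode = run_isolated(risky_call)
if exitcode == -signal.SIGABRT:
    # Only the child died; the parent is free to log, retry, or move on.
    print("child was killed by SIGABRT; parent continues")
```

To get results back from the child, something like a multiprocessing Queue or Pipe would be needed. Forking a process that has already initialized a large native library may also cause trouble, so the "spawn" start method might be the safer choice in a real script.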
How does it call abort()? At the C level, or at the Python level? If it is at the Python level, one solution - normally a bad idea, but in this case, the alternatives are worse - would be to monkeypatch the os.abort function to raise an exception instead.