Convert SIGTERM to an exception by default?

takluyver · March 28, 2025, 11:37am

By default, Python converts SIGINT (triggered by Ctrl-C in a terminal) into a KeyboardInterrupt exception. In most cases this causes the program to exit quickly, while giving it a chance to do simple cleanup such as deleting temporary files or flushing write buffers.

Tools that supervise long-running processes have a convention to stop a process nicely by sending SIGTERM, waiting a bit, and then SIGKILL if it hasn’t exited. The first signal is often configurable, but SIGTERM is the default for systemd, supervisor, daemontools, docker and Slurm. I haven’t found an example that uses another signal by default in a similar context. By default, SIGTERM terminates a Python process abruptly, with no cleanup, as if you had called os._exit().

As a result, if I’m writing a service in Python, I often set a handler for SIGTERM to raise KeyboardInterrupt, like SIGINT, so that it gets a chance to exit ‘normally’, clean things up and dump buffers.

Would it make any sense to make a new exception class (Terminated?) similar to KeyboardInterrupt and set a default handler for SIGTERM to raise this? I think this would make it simpler to write well behaved (Linux? *nix?) services in Python.

Possible counterarguments:

Exceptions from nowhere can do weird things, there are better ways to do cleanup, as described in the signal module docs.
- The better ways need to be integrated with the application code, and an exception is much better than immediately killing the process.
It’s not possible/desirable to do something equivalent for Windows.
- I don’t see it as a problem if this is specific to posix-y platforms.
This doesn’t guarantee normal termination, systems still need to be prepared for abrupt crashes (SIGKILL, segfaults, power loss).
- There’s still value in making that rare. The convention of sending a catchable signal first exists precisely to allow for in-process cleanup.

barry-scott · March 28, 2025, 11:55am

In my experience I will add the SIGTERM signal hander and start the service doing a graceful shutdown. I would have configured the systemd service to have appropiate timeout values setup if the defaults are not too short of too long.

I would expect that having SIGTERM raise an expection is not going to be a good way to start a graceful shutdown. You will be aborting possible a very important task the service is currently performing.

If this was added the first thing my service would have to do is stop this exception being raised ever.

-1 from me.

takluyver · March 28, 2025, 12:10pm

If you’re setting your own handler for SIGTERM already, that would override the default handler, so my proposal wouldn’t change anything for you.

If you’re not adding your own handler, then the current default behaviour is to abort immediately, which is just as likely to be in the middle of an important task. Currently the program will just die wherever it is. An exception gives you a chance to handle it - which might mean finishing or rolling back the important task if that’s possible within a few seconds.

storchaka · March 28, 2025, 12:40pm

Looks reasonable to me.

I think that we can use SystemExit for this. It is not a subclass of Exception, so is not caught by except Exception. Unlike to other exceptions, unhandled SystemExit does not cause printing a traceback. It finishes the REPL.

barry-scott · March 28, 2025, 12:40pm

The solution that anyone running a python service should use is to add the signal handler and do a graceful shutdown.

The exception does not seem like a helpful change.

Rosuav · March 28, 2025, 12:41pm

I think so. It may also be worth handling SIGQUIT the same way - that one can be sent from the console using Ctrl-Backslash, at least on some systems.

Is there value in having a subtree for termination exceptions, which would then include KeyboardInterrupt as well? They are definitely not subclasses of Exception, but there are a growing number of things that subclass BaseException and it may be worth having an easy way to catch “any exception that is an intended abort signal”.

Rosuav · March 28, 2025, 12:42pm

How would you do a graceful shutdown? The most obvious way is to raise an exception. It makes good sense to me for Python to do this naturally and by default.

barry-scott · March 28, 2025, 12:46pm

I the cases where I have implemented this:
By asking threads to finish up and exit.
By asking child processes to finish up and exit.
Stop accepting new network connections.
Shutting down existing connects.

But all at points in the business logic where its safe to do so.

takluyver · March 28, 2025, 1:01pm

Thanks all.

I would lean slightly towards an exception that does produce a traceback by default, like KeyboardInterrupt does, because if you stop a process that has got stuck, it’s handy to see where it was stuck. But I’d be fine with using SystemExit if that’s the consensus.

Maybe also SIGHUP, used in scenarios like closing an SSH connection, if we’re making a list.

That would make sense, especially if we’re adding more than one. If we don’t want to proliferate too many exception classes, we could have a TerminatingSignalException which holds the signal number, and have KeyboardInterrupt as a child of this, like FileNotFoundError represents OSError with an ENOENT errno.

Rosuav · March 28, 2025, 1:46pm

That’d work too, yeah! I’m not sure how many exceptions should be handled in this way though; for example, even though SIGUSR* will terminate the process, they’re not normal termination signals. And if you’re getting SIGSEGV in a Python program, there’s possibly something much more serious going wrong. But I would support SIGHUP, SIGINT (special-cased for compatibility), SIGQUIT, SIGTERM, and possibly others, all being turned into exceptions.

It may also be of value to have an easy way to request an exception for any other signal too, eg signal.signal(SIGALRM, SIG_RAISE) or signal.enable_exception(SIGALRM), as this kind of behaviour could be convenient for other signals if appropriate to the application.

gerardw · March 28, 2025, 4:04pm

This is a bad idea. Anyone who wants to catch SIGTERM already and raise an Exception can.

Installing a signal handler by default would cause finally and atexit to execute in a future version of Python when they don’t now, so switching from Python 3.13 to Python 3.N would cause a change in behavior for little benefit.

zware · March 28, 2025, 4:28pm

Well, the benefit is that finally and atexit will run

I would expect that in most cases of SIGTERM, a naive user () would actually prefer an exception that by convention should never be caught^[1] to give the service a chance to clean itself up a bit before being unceremoniously killed. If that’s what you really want, you can always set the signal handler to os._exit.

it’s probably not worth it to add EvenBaserException that can’t be caught ↩︎

takluyver · March 28, 2025, 4:41pm

I agree it should be limited to things that are likely intended to end the process, so not SIGUSR*. And I think after SIGSEGV we can’t continue running general Python code (hence the faulthandler module printing a very limited traceback).

Good point, especially if there’s a generic exception class for terminating signals.

gerardw · March 28, 2025, 5:29pm

If we were discussing Python 3.1 I would totally agree. This user justs wants his code to work the same way in new releases as it did in the old. (Faster or less memory is okay ) but program function should not change unless we change the code to take advantage of new features.

Of course, if necessary to fix a security hole changes are understood.

The signal.signal(SIGTERM, SIG_RAISE) idea is a good one. Let’s just not make it the default.

takluyver · March 28, 2025, 6:18pm

Behaviour changes affecting existing code deserve some caution, but Python developers routinely weigh this up against the benefits, and do make changes that can affect existing code. There’s a ‘Porting to Python 3.x’ section in each ‘what’s new’ document precisely for this reason.

It seems unlikely that many systems rely on finally blocks and similar mechanisms not executing, since their purpose is to execute code in both normal & exceptional conditions. Someone might inadvertantly rely on, say, a temp file not being deleted after SIGTERM, but I imagine this is pretty rare, compared to the new and existing code that can benefit from by-default cleanup.

In particular, I think it’s worth changing the default because it’s easy to overlook that the default stop/down/cancel action in a lot of these supervisor systems will terminate your process abruptly, or to assume that that will be fine. If your code uses a temp file for 100 ms every 10 seconds, stopping it with an unhandled signal will only leave that temp file behind 1 time in 100. Or you start out with a super simple service that doesn’t need any cleanup, so you don’t set a signal handler, and over time people add features to it assuming that the basic setup is fine.

I said in my first post that I often set a SIGTERM handler to raise KeyboardInterrupt. I just went and looked at a few relevant projects, and… I don’t actually remember to do this nearly as often as I imagine.

colesbury · March 28, 2025, 6:23pm

The whole ctrl-c / SIGINT handling as KeyboardInterrupt is not particularly robust, so I’m not thrilled with the idea of expanding it further. Practically, this means that kill <python pid> or killall python will likely stop working for a lot of people.

Some of the problems include:

Exceptions (including KeyboardInterrupt) during finalizers and garbage collection just get swallowed with at most an unraisable warning.
Lots of code, including the standard library, is not robust to exceptions being raised at arbitrary points. Exceptions raised within finally blocks are particularly likely to cause problems.

For example, if you press ctrl-c while a threading.Condition.wait() is running its finally block, you will corrupt the state of the condition variable:

github.com/python/cpython

Lib/threading.py

d260631be


      
          finally:
              self._acquire_restore(saved_state)
              if not gotit:
                  try:
                      self._waiters.remove(waiter)
                  except ValueError:
                      pass

csm10495 · March 28, 2025, 10:29pm

I don’t see a benefit in doing this vs allowing a user to opt into the behavior. Would the exception percolate to all multiprocesses/interpreters/threads?

If someone does something that holds the GIL during a finally block or exception handling for this, the whole process death could hang. The same can happen with signal handling but not as the default behavior.

It seems kind of weird to assume usage in terms of TERM then KILL. I tended to agree with the if this is a early Python 3, then it could make sense, but otherwise it just seems like a behavior change.

-1 from me at least as a default behavior change.

If it must be further decided: I think there should be a PEP and let the SC decide.

barry-scott · March 29, 2025, 9:20am

An alternative to this change would be to design and publish a python module that implements the needed APIs to support services written in python.

Add a API to allow for shutdown that has callbacks that deal with the signals for example. This would provide (1) documentation about service shutdown and (2) hide the details about how this is achieved.

Such an API could also handle the reporting of startup status to systemd.

dimpase · March 30, 2025, 5:18pm

One way or another, this ought to play nicely with cysignals, which was specifically designed to do a nice recovery from extensions using Cython. It can also be used with more general C/C++ and Fortran extensions (not sure how wide-spread its use is beyond Cython, though).

shocklateboy92 · June 14, 2025, 4:48am

So, the primary benefit I can think of this proposal is if I have something like:

with someImportantAllocation() as allocatedValue:
    doSomeLongRunningComputation()

and it gets a SIGTERM in the middle of that, the __exit__() code gets executed and I don’t have to worry about leaking resources or leaving hardware in bad states when shutting down services.

Are there better ways of achieving that?