Cancelling threads

I’ve been writing an application that makes extensive use of threads (I’d love to use virtual threads someday), and I’ve been really wishing that I had a reliable way to gracefully cancel threads. In prior discussions, it has been rightly noted that, ultimately, threads cannot be forcibly aborted and keep the process stable.

I want the process to be stable, but I don’t strictly need my threads to be forcibly aborted; I can live with forcible abortion only being possible with isolated processes. What I’m really going for is allowing the code that is executing in the thread to avoid deep knowledge of the environment that is running it, while the environment stays unaware of the specifics of the implementation of the thread’s code. Cancellation exceptions as an API seem to fit very nicely to me, with minimal cognitive overhead and leakage.

As far as I could follow, the most significant conceptual hurdles to cancelling threads were on two fronts:

  1. C calls cannot be interrupted without cooperation from the C library.
  2. Cancellation exceptions might raise in resource cleanup handlers, and break cleanup.

For the first challenge of interrupting C libraries, I think that limitation is at least as acceptable for cancelling threads as it is for raising KeyboardInterrupt. We might want to expose it differently to C extensions, but the general principle of the limitation seems entirely reasonable to me.

The second challenge of cancellation exceptions potentially raising during resource cleanup, however, seemed more troubling, and I didn’t see any suggestions that seemed to adequately solve that challenge. But I’ve been thinking it over today, and reflecting on the main motivation I have for cancellation – allowing the thread creator to handle the thread lifecycle without explicit cooperation with the thread’s code – and I think there may be a reasonable approach.

As commenters have noted, existing solutions that try to solve the problem are inadequate. As a particular example, one suggestion was to wrap uninterruptible code in a construct that disables cancellation, but no matter how you slice the Python code, there’s no way to avoid unintentionally having cancellation disabled either too broadly or insufficiently. It seems to me that this would require cooperation from the interpreter itself.

This got me thinking about where this type of unsafe-to-interrupt code generally is, and what patterns might be available to cooperate with the interpreter to ensure that cleanup code won’t be abandoned. I see three ways that resources are cleaned up in Python code, and that we might wish to avoid being interrupted:

  1. Explicitly called cleanup functions (e.g. File.close(), Lock.release())
  2. finally blocks (and perhaps except blocks as well)
  3. __exit__ methods of context managers (with blocks)

For my own code, I have moved to avoiding explicitly calling cleanup functions except in finally blocks, and I get the sense that it may be common practice in the Python community as well. It seems reasonable to think that we should encourage using with and finally more. So then, the two ways that I would bless for doing this type of resource cleanup already cooperate with the interpreter by the language syntax of with blocks and finally blocks.

What if the interpreter tracked when it was entering cleanup code from a with or finally block, and waited to raise any cancellation until they completed? Because it would be done by the interpreter, it wouldn’t have the opportunity to raise, for example, immediately inside the finally block, where necessary cleanup code might be interrupted.

To my eyes, I think the C-code limitation, and this proposed limitation to avoid interrupting cleanup code, would both be reasonable for my uses.


Prior and related discussions:


The same problem exists with KeyboardInterrupt in non-threaded code, so even if nothing were done about this, it wouldn’t be any worse than what we have now.


I don’t think so. Context managers were added in PEP 343 explicitly to:

This PEP adds a new statement “with” to the Python language to make it possible to factor out standard uses of try/finally statements.

I don’t recall ever using a finally in about nine years of Python now. But many, many exit(…) functions.

In the async world, one of the main use cases for cancellation is timeouts (or I guess deadlines would be the more precise term here). If the interpreter has to delay the cancellation for an arbitrary length of time (maybe forever) that doesn’t quite work :wink:

How does that work currently with KeyboardInterrupt? I expect that it handles it, and expect cancellation could do the same thing.


Sorry, how does what work with KeyboardInterrupt?

KeyboardInterrupt eventually gets raised by the interpreter loop as it checks for asynchronous events to process. It is entirely reasonable and expected for code to catch this and do its own thing. It sits to the side in the exception hierarchy as a subclass of BaseException rather than a subclass of Exception so that it isn’t caught unintentionally by except Exception:.

A new ThreadInterrupt exception, if introduced (this would be a PEP), would be best off behaving in exactly the same manner. It might even be reasonable to suggest it be a subclass of KeyboardInterrupt so that existing logic to cope with interruptions could already work with it.
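To illustrate the suggestion (ThreadInterrupt is purely hypothetical here; nothing like it exists in CPython today), subclassing KeyboardInterrupt would make existing interrupt-aware handlers pick it up, while except Exception: would still let it through:

```python
# Hypothetical exception; sketched only to show where it would sit
# in the hierarchy, per the suggestion above.
class ThreadInterrupt(KeyboardInterrupt):
    """Raised in a thread to request cancellation (sketch only)."""

def classify(exc: BaseException) -> str:
    # Existing handlers written as `except KeyboardInterrupt:` would
    # already catch the new exception; `except Exception:` would not.
    try:
        raise exc
    except Exception:
        return "caught by except Exception"
    except KeyboardInterrupt:
        return "caught by except KeyboardInterrupt"

print(classify(ThreadInterrupt()))  # → caught by except KeyboardInterrupt
```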

Bare except: statements catch any of these regardless, but there is a reason that antipattern is frowned upon even more harshly than except Exception: is by people and their linters.

It would introduce “problems” similar to what @gcewing alludes to, in that code that isn’t prepared for these could wind up stopping in a bad state where before it would never have stopped without a SIGKILL. But as noted, that isn’t a new problem. It is already somewhat routine for code to get explicit KeyboardInterrupt cleanup special case handling wrong and need bug fixes around that after putting a program in front of a bunch of real world interactive users.


@Tinche I was asking how things that you might want to have timeouts for currently work with KeyboardInterrupt. For example, what happens if you’re waiting to acquire a lock with a timeout, and you get a KeyboardInterrupt? If we can handle that correctly, then I figure that cancellation can work the same way.

On my local machine, I can run this experiment (pressing Ctrl+C while waiting to re-acquire the lock).

% uvx --python 3.14.2 python
Python 3.14.2 (main, Dec  9 2025, 19:29:30) [Clang 21.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> l = threading.Lock()
>>> l.acquire()
True
>>> l.acquire()
^CTraceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    l.acquire()
    ~~~~~~~~~^^
KeyboardInterrupt

If we can interrupt with KeyboardInterrupt, it makes sense to me that we could interrupt this with cancellation in basically the same way.


@gpshead Subclassing from KeyboardInterrupt sounds both terribly impure and perhaps immensely practical. That said, I rather expect that most places catching KeyboardInterrupt aren’t expecting anything useful to happen if they’re not on the main thread, so I think we might be okay to be pure and create an Interrupt parent class for both of them instead of having ThreadInterrupt subclass KeyboardInterrupt.


Well, in an asyncio context a Ctrl+C would generally translate into the cancellation of the main asyncio task. This manifests as an asyncio.CancelledError getting raised at the site currently being awaited.

from asyncio import Lock, run, timeout


async def main() -> None:
    lock = Lock()
    await lock.acquire()
    async with timeout(5):
        await lock.acquire()


run(main())

So as soon as you do a ctrl+c, the waiting stops. If you do not do anything for 5 seconds, the timeout context manager arranges for the current task to, again, be cancelled (but it catches this and translates it into a different exception).

Not sure how this is relevant, or why the timeout here even matters. There’s no detection of any finally blocks though, or delaying the cancellation.

Doing async things in finally blocks is recognized as risky, though, and usually requires special care.
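One way that special care often looks in practice is wrapping the cleanup await in asyncio.shield, so that a further cancellation aimed at the task doesn’t also abort the cleanup mid-flight. A minimal sketch (worker/cleanup names and timings are illustrative):

```python
import asyncio

async def cleanup():
    # Stand-in for real async cleanup (flushing, closing, unlocking...)
    await asyncio.sleep(0)
    return "cleaned"

async def worker(results):
    try:
        await asyncio.sleep(10)  # cancelled long before this finishes
    finally:
        # shield() keeps a *second* cancellation aimed at this task
        # from also aborting the cleanup await.
        results.append(await asyncio.shield(cleanup()))

async def main():
    results = []
    task = asyncio.create_task(worker(results))
    await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return results

print(asyncio.run(main()))  # the finally block's cleanup still ran
```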

David Beazley did this fun talk a while back titled “Die threads” https://www.youtube.com/watch?v=U66KuyD3T0M


If you want to mess around with this in cursed eldritch ways, you can use ctypes.pythonapi.PyThreadState_SetAsyncExc.
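For the curious, the classic recipe looks roughly like this (with the usual caveats: the pending exception is only raised when the target thread next executes Python bytecode, so blocking C calls are not interrupted, and this is an unsupported trick, not an API to build on):

```python
import ctypes
import threading
import time

class Cancelled(BaseException):
    """Illustrative exception type for the async-raise trick."""

def worker() -> None:
    try:
        # The pending exception is only raised between bytecodes, so the
        # thread must keep returning to the interpreter loop.
        for _ in range(10_000):
            time.sleep(0.005)
    except Cancelled:
        print("worker cancelled")

t = threading.Thread(target=worker)
t.start()
time.sleep(0.05)

# Schedule Cancelled to be raised in thread t; the call returns the
# number of thread states modified (1 on success).
modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
    ctypes.c_ulong(t.ident), ctypes.py_object(Cancelled)
)
t.join(timeout=5)
```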


If your cleanup code can take an arbitrarily long time to run, I’d say you’re doing something wrong.


Hang on a moment. I suspect it’s the SIGINT from the Ctrl-C that’s interrupting the system call at the OS level, not anything that Python is doing. Replicating this would require sending a signal to a specific thread. On Unices this seems to be possible with pthread_kill().

I don’t know how you would go about this in Windows.

That’s interesting. I’ll have to dig into that more. I know that signal handlers wake a thread, but in that context there’s very little that the signal handler can do. From the Python side, I expected that the interpreter sets an internal flag to tell it to raise the exception. I’m not sure what the OS would do. I might suspect that OS calls would be forbidden in interrupts, but it’s also not intuitive to me that the lock would be forcibly unlocked just because there was an interrupt.

Early signs suggest that there are some syscalls that you can make in signals, and that waits for OS-level mutexes are not automatically cancelled.

KeyboardInterrupt is great at breaking resources. Long-lived services that need to handle shutdown gracefully already install signal handlers and begin shutdown at a point that is safe to do so, rather than at an arbitrary one where an exception would be raised with the default handler.

asyncio cancellation is also pretty good at breaking resources, and you can’t use .cancel blindly on an asyncio task without knowing if it is safe to cancel.

A new exception getting thrown into threads for cancellation purposes would be pretty awful to deal with if there wasn’t a way to opt out of that, which to me points to keeping the current status quo:

if you have work in a thread you need to be cancelable, but which may not always be in a safe state to cancel, you need to communicate it to the thread, and the thread needs to check for it periodically when in a safe-to-cancel state.
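The status-quo cooperative pattern described above can be sketched with threading.Event (the worker body and timings are illustrative, not from the thread):

```python
import threading
import time

def worker(cancel: threading.Event) -> None:
    # The thread checks for cancellation only at points it knows
    # are safe to stop at.
    while not cancel.is_set():
        # ... do one safe-to-interrupt unit of work ...
        # wait() doubles as a sleep that wakes promptly on cancellation
        cancel.wait(0.01)

cancel = threading.Event()
t = threading.Thread(target=worker, args=(cancel,))
t.start()
time.sleep(0.05)
cancel.set()        # request cancellation; honored at the next check
t.join(timeout=5)
```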


We don’t need to make system calls in signal handlers, and we don’t need anything to be forcibly unlocked. We just need a thread that’s waiting for a lock to wake up and be able to do something if it receives a signal while it’s waiting.

It looks like this should be possible if the locks are implemented using semaphores. One of the documented error returns of sem_wait(2) is

 [EINTR]            The call was interrupted by a signal.

So the signal handler sets a flag, the thread making the sem_wait call wakes up, notices the flag is set, and raises an exception.

Releasing any locks held by a thread when it receives an exception would be the responsibility of the thread itself, using suitable cleanup code.
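A pure-Python analogue of that set-a-flag-in-the-handler pattern, using signal.raise_signal to simulate the delivery (this only works in the main thread, which is where signal.signal may be called; CPython’s own C-level handler does essentially the same flag-setting):

```python
import signal

interrupted = False

def handler(signum, frame):
    # A signal handler should do minimal work: just record that the
    # signal arrived and let the interrupted code notice the flag.
    global interrupted
    interrupted = True

previous = signal.signal(signal.SIGINT, handler)
try:
    # Simulate a Ctrl+C arriving; raise_signal delivers the signal
    # to the current (main) thread.
    signal.raise_signal(signal.SIGINT)
    # Back in ordinary Python code: the flag is set, so we can react
    # at a point of our own choosing.
    if interrupted:
        print("interrupt observed via flag")
finally:
    signal.signal(signal.SIGINT, previous)
```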


Honestly, I don’t think this is strictly related to threads. Though surely interrupting threads could be a nice feature. I think the problem in general has to do with the fact that the language doesn’t allow for safe interruptions. In fact, this is a problem for non-threaded code, too.

One important detail is that KeyboardInterrupt can only be observed in the main thread, so introducing a ThreadInterrupt would definitely break existing code in sneaky ways, because existing multithreaded code has never needed to account for interruptions in its design.


What if we only treat the problem of safer KeyboardInterrupt, though? I think we could make some interesting generalizations. I was reading the official docs on signal handlers. There is an example of a concurrency issue that I want to copy over here (comments in [...] are mine):

import threading

class SpamContext:
    def __init__(self):
        self.lock = threading.Lock()
        # [the example was using locks and I stuck with it, but]
        # [the same reasoning can be applied to any resource]

    def __enter__(self):
        # If KeyboardInterrupt occurs here, everything is fine
        self.lock.acquire()
        # If KeyboardInterrupt occurs here, __exit__ will not be called
        ...
        # KeyboardInterrupt could occur just before the function returns

    def __exit__(self, exc_type, exc_val, exc_tb):
        ...
        # [KeyboardInterrupt could occur just before the lock is released]
        self.lock.release()

Remember that only the main thread can see KeyboardInterrupt, so let’s suppose it’s the main thread that’s using this SpamContext. Let’s also assume that the lock isn’t just pointless and that there is another thread waiting for it, so that not releasing the lock is bad.

So yes, we are in a pretty bad situation over here. A SIGINT at just the right time might lead the process to never stop, because the interpreter will wait for the non-main thread to exit (unless it’s a daemon thread), but it never will, because the main thread is no longer capable of releasing the lock. (There are more interesting and intricate examples in the article by @njs that I linked at the beginning.)

I dug up PEP 343 which introduced the with statement and it mentions this problem exactly.

You may ask, what if a bug in the __exit__() method causes an exception? Then all is lost – but this is no worse than with other exceptions; the nature of exceptions is that they can happen anywhere, and you just have to live with that. Even if you write bug-free code, a KeyboardInterrupt exception can still cause it to exit between any two virtual machine opcodes.

True. But KeyboardInterrupt is not a bug, it’s an explicit intention to stop a process. If a bug in __exit__() causes a resource to not be cleaned up, fair enough. If it’s an explicit interrupt (that wanted those cleanups to be executed!), then it’s just bad.

Python has never specified when a KeyboardInterrupt gets raised. What if it specified when it is not raised? In my mind that would be inside finally blocks and __enter__ and __exit__ methods used by with. I think it would be great to have a guarantee that interruptions (which are not bugs!) cannot happen in those scopes. Using with and finally to handle resource lifecycles is so ubiquitous that I think they should deserve special treatment. I think this might be PEP-worthy, though I wouldn’t know how to start writing one.

Note that finally blocks have a language disadvantage compared to with blocks, in that they are equivalent to __exit__, but lack a counterpart to __enter__. That is, where is the boundary of the __enter__-equivalent code that needs protection here?

lock.acquire()
... # do some other unrelated things
try:
    spam()
finally:
    lock.release()

It’s impossible for the VM to tell, because there are no distinct language boundaries. So it’s fair to suggest that with statements should also be considered safer, under this proposal, in addition to being more succinct. I think giving with even more advantages compared to try/finally than what it already has isn’t such a bad thing, though.

So, a desugared with statement that does the same thing as above is:

# start of uninterruptible scope 1
lock.__enter__()  # (*)
try:
    # end of uninterruptible scope 1
    spam()
finally:
    # start of uninterruptible scope 2
    lock.__exit__()
    # end of uninterruptible scope 2

If a SIGINT is received in line (*), it gets delayed until the end of uninterruptible scope 1. This also implies that the try block starts, therefore the lock will get released. Furthermore, if another SIGINT is received during the call to __exit__, the lock will also get released anyway, and the second KeyboardInterrupt delivery gets delayed, too.

I think that delaying is totally acceptable: yes, I’m sending an interrupt because I want the program to exit. But I never want the program to reach unreachable states where resources never get cleaned up. And this desire, in my view, prevails over the promptness of program exit.

Implementation

Implementing this behavior is entirely possible. The tricky part is that those scopes that need protection can in fact be nested. (Nothing prevents a finally block from calling a function that contains a with block, and so on.) So, effectively, as soon as we enter one of these protected scopes (__enter__, __exit__, or finally) we need to make sure that we have left the outermost protected scope before raising KeyboardInterrupt. This can be done by adding a stack of protected scopes to PyThreadState, and pushing and popping as the scopes are traversed.

Then, KeyboardInterrupt is raised with the usual mechanism, that additionally has to check that the new stack is empty. Otherwise, the interruption is delayed.

This is fine because Python doesn’t specify when an interrupt is raised. With this change, Python would specify when it’s not going to be raised. Still, the exact point in which it is eventually raised is left unspecified, as it is now.
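For experimentation today, a rough userland approximation of the delayed delivery can be built by temporarily swapping the SIGINT handler. This is main-thread only and nothing like the proposed interpreter-level scope stack; delay_sigint is a hypothetical helper name:

```python
import signal
from contextlib import contextmanager

@contextmanager
def delay_sigint():
    """Queue SIGINT while the block runs; re-deliver it on exit.

    A userland sketch of the proposed behavior, main thread only
    (signal.signal may only be called from the main thread).
    """
    pending = []
    previous = signal.signal(signal.SIGINT, lambda s, f: pending.append((s, f)))
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, previous)
        if pending and callable(previous):
            previous(*pending[0])  # deliver the deferred interrupt now

cleaned_up = False
try:
    with delay_sigint():
        signal.raise_signal(signal.SIGINT)  # simulate Ctrl+C mid-cleanup
        cleaned_up = True                   # still runs: delivery deferred
except KeyboardInterrupt:
    print("interrupt delivered after the block finished")
```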

Generic interrupts

On top of this, having a generic interrupt class BaseInterrupt that inherits from BaseException and that behaves like KeyboardInterrupt would be very useful. We wouldn’t want to force people who already use multithreading today to cope with a new ThreadInterrupt that’s sneakily breaking their existing code. OTOH, it would be very useful for libraries/frameworks to have the possibility of saying “your code may be interrupted by my GreenInterrupt but that’s totally fine because your context managers will exit gracefully anyway.”

I think the problem posed in the OP would also be solved with this. In a single codebase, a custom MyInterrupt refactor to gracefully cancel threads is less frightening than an ecosystem-wide change.


Unless this idea gets shut down immediately, I’d be happy to write a PEP for it. But I seriously don’t know where to start.


This suggests that thread interruptions should be disabled by default, and a thread should have to explicitly enable being interrupted. I’d be happy to help write a draft PEP. This is something I’ve been thinking Python ought to have for a while now.


@dpdani I really appreciated your making the point that __enter__ needs to be protected just as much as __exit__ does. I totally agree. You noted the incongruity of opportunity between with statements and finally, and that made me worried that my preferred @contextmanager decorator approach to making context managers wouldn’t work, because they use try/finally internally. But that’s not true, because it’ll already have been in a with block when it’s being used!

Like you, I don’t see any issue with the with block getting a little better than alternatives. If anything, it shows that the with block is really well designed that we can use it to handle this subtle case.

I do worry how pervasively people might be depending on KeyboardInterrupt working the way that it does (with all its problems). I don’t know that there’s a reasonable path to making this protection opt-in, though, even temporarily like with a __future__ import or something. Still, this interruption challenge seems like one that is worth resolving. If we really can’t apply it by default, perhaps a special context manager can be used to enable this better-protected mode.


@gcewing I can see your point about interruptions being disabled by default. But so far I don’t agree. First, I think that disabling KeyboardInterrupt on the main thread by default is off the table for backward compatibility and general usability concerns.

So then if it’s just about threads and thread interrupts, it seems like we’re talking about either a keyword argument to the Thread() constructor, or perhaps a special context manager that enables interrupts. That decision almost certainly needs to be made at an early stage, so ISTM that we still end up needing the interruption safety that we’re talking about here, just in the context that is interruptible.

In addition to still needing that safety anyway, if it’s the creator/owner of the Thread that decides whether it is interruptible, that’s also who will be actually interrupting the thread. In other words, calling Thread.interrupt() might be sufficient signal that the creator/owner expects that a thread is indeed interruptible.


IMHO, interrupting threads doesn’t need new syntax or new stdlib exceptions or basically anything new.

Most signals are delivered to the main thread under POSIX, except realtime signals. Those can be sent to specific threads.

All an “interruptible” thread needs to do is record its own tid, register a realtime signal handler, and raise an exception in said handler. If the exception extends BaseException, stack unwinding follows well-defined rules, and I’m pretty sure that the stdlib will handle the exception correctly, because stdlib code generally doesn’t care whether it’s run on the main thread and can handle KeyboardInterrupt.

OP, go and roll a PyPI package along these lines… unless there’s one there already.
