Cancelling threads

ryanhiebert · December 19, 2025, 9:23pm

Just because I like doing these exercises, I’m curious how that might be spelled; it might be informative. So far for myself, I feel like I’d have the syntax the same, but have cancellation be part of the semantics of these data structures from the get-go because I find the current semantics to be a confusing rake to step on that I wouldn’t expect developers to immediately understand, and because the weight of additional syntax feels a little heavy to my taste.

gcewing · December 20, 2025, 1:11am

Maybe not all context managers should have their __enter__ and __exit__
be critical sections, but only those that are marked in some way?

In any case, if you really want to do something interruptably in what
would otherwise be a critical section, there should be a way to do that,
e.g.

    with InterruptsEnabled():
        f = open("my_file")

gcewing · December 20, 2025, 1:33am

Thinking about this has made me realise something else. It’s not always
going to be possible to just take existing code that uses with for
cleanup and rely on the atomicity of __enter__ and __exit__ to make it
interrupt-safe:

    with open("my_file.txt") as f:
        ...

Here, opening the file gets done before the with statement gets started,
so it’s possible the file will get opened but not closed.

Maybe the initial critical section of a with statement shouldn’t just be
the __enter__ method, but should include evaluating the expression that
produces the context manager and the assignment to the as variable.
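
The ordering described above can be checked with a small tracing sketch. The `Traced` class and the `events` list are illustrative names only, not part of any proposal. It shows that the with expression is fully evaluated before `__enter__` runs, which is the window in which a file could be opened and then never closed.

```python
events = []


class Traced:
    """Context manager that records when each stage runs."""

    def __init__(self):
        # Runs while the with expression is being evaluated,
        # before the with statement has called __enter__.
        events.append("expression evaluated")

    def __enter__(self):
        events.append("__enter__")
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append("__exit__")
        return False  # do not suppress exceptions


with Traced() as t:
    events.append("body")

print(events)
```

For `open(...)`, the file is opened in the first step; the file object's `__enter__` only returns the already-open file.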

gcewing · December 20, 2025, 1:38am

Maybe I/O should always be interruptible, even if it’s in a critical section, and handled the same way as if the I/O operation failed. After all, if you do something like trying to open a non-existent file in a critical section, you’re going to get an exception, and you need to be able to cope with that.

ryanhiebert · December 20, 2025, 1:42am

Your example is a good one. I think in a “magic wand” world you would avoid the file opening until the __enter__ method. Still we need to be practical. We might not be able to solve every problem, but we should give tools that make solving the problem possible.

In this case, I’m skeptical about trying to include the with expression in the protected block. I can’t immediately think of a reason that it wouldn’t work, though, so I’m not dismissing it as a practical alternative, and if too much gets put into the expression, it should always be possible to calculate the context manager outside the with expression, and just reference the value.

We have a similar problem with finally blocks because there is no __enter__ counterpart, and I don’t think there’s a good solution to that except making it into a context manager.
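
As a rough sketch of what "making it into a context manager" could look like with today's tools, the same acquire/use/release sequence is written below both ways. The `acquire`, `release`, and `managed` names are hypothetical stand-ins. Whether a generator-based context manager like this would receive the protection being discussed is an open question that this thread does not settle.

```python
from contextlib import contextmanager

log = []


def acquire():
    log.append("acquire")
    return "resource"


def release(resource):
    log.append("release")


# try/finally form: the acquisition sits outside the try block,
# so there is no counterpart of __enter__ that could be protected.
resource = acquire()
try:
    log.append("use")
finally:
    release(resource)


# Context manager form: acquisition and release live in one object,
# so the with statement has a defined entry and exit step.
@contextmanager
def managed():
    resource = acquire()
    try:
        yield resource
    finally:
        release(resource)


with managed() as resource:
    log.append("use")

print(log)
```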

gcewing · December 20, 2025, 1:55am

Some things that bother me about having two different thread classes:

1. It could make it awkward to write code that deals with threads
   generically. E.g. suppose you have a library for dealing with work
   queues, and you pass it functions to be run as worker threads. They
   may or may not be written to be interrupt-safe, so the library
   doesn’t know what kind of thread class to create for them.
2. To my way of thinking, non-interruptible threads would be kind of a
   legacy feature to support old code, and writing all new thread code
   to be interrupt-safe should be encouraged. Having to use a special
   subclass of Thread to get interruptibility doesn’t really fit well
   with that philosophy.
3. It assumes you want to use thread classes at all. You don’t have to
   do that, you can use the functions in the _thread module directly.
   So there would have to be ways of dealing with interruptible threads
   that don’t rely on having a wrapper class.
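
For readers who have not used the low-level module, here is a minimal sketch of starting a thread with `_thread` directly, with no `Thread` object involved. The `worker` function and the `done` lock are illustrative names. The only handle the caller gets back is an integer identifier, which is why a class-based cancellation API would not cover this case.

```python
import _thread

results = []

# A lock used as a one-shot "finished" signal: held now, released by the worker.
done = _thread.allocate_lock()
done.acquire()


def worker(n):
    try:
        results.append(n * 2)
    finally:
        done.release()


# start_new_thread returns the new thread's identifier, not an object.
ident = _thread.start_new_thread(worker, (21,))

done.acquire()  # blocks until the worker has released the lock
print(ident, results)
```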

gcewing · December 20, 2025, 4:55am

What kind of thing do you have in mind here that couldn’t be built on top of the proposed and protection?

gcewing · December 20, 2025, 5:29am

That would require using a special context manager that opens files. The
nice thing about the current idiom is that you can open the file however
you want, you don’t have to use the builtin “open” function. That’s
useful, because there are multiple ways that you can obtain a file
object. With a file-opening context manager, you’d be restricted to
whatever it provided.

It would also be a bit disappointing if, having just educated people to
use with open as a safer way of dealing with files, we have to tell them
to use yet another idiom for extra safety in threads.
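
To illustrate the point about there being several ways to obtain a file object, the sketch below uses four of them with the same with idiom. The file name and contents are made up for the example. A dedicated file-opening context manager would have to wrap each of these separately.

```python
import io
import os
import tempfile
from pathlib import Path

contents = []

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_file.txt"
    path.write_text("hello\n")

    # Builtin open()
    with open(path) as f:
        contents.append(f.read())

    # pathlib's Path.open()
    with path.open() as f:
        contents.append(f.read())

    # Wrapping a raw file descriptor
    with os.fdopen(os.open(path, os.O_RDONLY)) as f:
        contents.append(f.read())

    # An in-memory file-like object
    with io.StringIO("hello\n") as f:
        contents.append(f.read())

print(contents)
```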

ryanhiebert · December 20, 2025, 1:39pm

A queue library is what I’m building, so what I have in mind for this case may be instructive, and we can think of other cases and how they might be different. I intend to allow for timeouts as an optional new feature, where the timeout would be implemented as thread cancellation. Because of some unique differences in my library, I can’t just use processes so that I can have cancellation.

It may take some time for things to actually be cancel-safe, because of caveats outside my control. But I feel that’s a fair trade-off for this feature. Work function authors may have to deal with some odd challenges in their own dependencies because they aren’t cancellation-aware, and I may have to help them. But there will at least be ways to do it safely and I can guide them to better outcomes, and that feels like it would be enough for me.
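
For context, this is a sketch of the closest thing CPython offers today, not the mechanism my library uses or the one proposed here. It relies on the CPython-specific C API function `PyThreadState_SetAsyncExc`, called through `ctypes`. The `Cancelled` exception, the `cancel` helper, and the `work` function are hypothetical names. The exception is only raised once the target thread next executes Python bytecode, so it cannot interrupt a blocking call, and it has no notion of protected sections.

```python
import ctypes
import threading
import time


class Cancelled(Exception):
    """Hypothetical exception type used here to signal cancellation."""


def cancel(thread):
    """Ask CPython to raise Cancelled in `thread`; True if the request was accepted."""
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(Cancelled)
    )
    return modified == 1


outcome = []
accepted = False
started = threading.Event()


def work():
    try:
        started.set()
        while True:  # stands in for a job that never finishes
            time.sleep(0.01)
    except Cancelled:
        outcome.append("cancelled")


t = threading.Thread(target=work)
t.start()
started.wait()

t.join(timeout=0.1)  # the "timeout": give the job 0.1 seconds to finish
if t.is_alive():
    accepted = cancel(t)
    t.join(timeout=5)

print(accepted, outcome)
```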

ryanhiebert · December 20, 2025, 1:51pm

It would indeed be disappointing. I’m sold on making the with expression part of the automatically protected context. We can’t solve the similar problem for try/finally without new syntax (and at that point, may as well just use with), but there’s at least a spot in the with syntax to solve this problem with this and similar already-existing builtin functions that we want to work without breaking compatibility.

pf_moore · December 20, 2025, 2:14pm

Can I turn the question round? What precisely are the semantics being proposed for thread cancellation? If we’re proposing to block cancellation around arbitrary sections of Python code, that presumably means that cancellation could (in effect) be a no-op (if, for example, a thread was in an infinite loop in protected code). So what can a caller of the cancel function rely on?
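
To make the question concrete, here is a toy, single-threaded model of deferred cancellation. Everything in it (`CancelState`, `protected`, `checkpoint`) is a hypothetical stand-in and not the proposed API. It shows a cancel request that has no effect until the protected section ends; if the loop inside the protected section never finished, the request would never be delivered.

```python
from contextlib import contextmanager


class Cancelled(Exception):
    pass


class CancelState:
    """Toy model: cancellation is deferred while inside a protected section."""

    def __init__(self):
        self.requested = False
        self.depth = 0  # nesting level of protected sections

    def cancel(self):
        self.requested = True

    @contextmanager
    def protected(self):
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1

    def checkpoint(self):
        """Stand-in for a point where the interpreter could deliver cancellation."""
        if self.requested and self.depth == 0:
            self.requested = False
            raise Cancelled


state = CancelState()
trace = []

try:
    with state.protected():
        state.cancel()  # the request arrives while protected
        for i in range(3):
            state.checkpoint()  # no effect inside the protected section
            trace.append(i)
    state.checkpoint()  # delivered only after leaving protection
    trace.append("not reached")
except Cancelled:
    trace.append("cancelled")

print(trace)
```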

gcewing · December 20, 2025, 2:42pm

If there’s an infinite loop inside a critical section then there’s a bug in the code. Critical sections should be as small as possible and execute in a (preferably short) bounded time.

pf_moore · December 20, 2025, 3:19pm

Well, yes, but what about a network connection that hangs? That’s precisely the sort of thing I’d want a timeout/cancel to interrupt. And for that matter, interrupting buggy code is also something I’d expect to use a cancellation operation for (in development, not production, of course).

As I say, the key question is what the documentation for a proposed “cancel” function would say the caller can rely on.

ryanhiebert · December 20, 2025, 3:21pm

I think we’re actually trying to solve two use cases, both without explicit cooperation from the running code:

1. Signaling a graceful and safe end to a running body of work.
2. Signaling an ungraceful and quick end to a running body of work.

In the first motion, it’s more important that the program is safe than that it ends. We’re using an exception because it’s exceptional, but it’s not actually an error condition at all. This is what thread cancellation needs, because if there is anything ungraceful about a thread shutdown, the whole process can be corrupted (unreleased locks, for example).

In the second motion, we’re trying to gather everything together, and it’s more important that the program end than that it’s fully safe. This is KeyboardInterrupt right now, and for good reasons it isn’t easily available to threads. But there’s also good reason that one could want this motion to get pushed to threads as well, but only during final shutdown of the process.

gcewing · December 21, 2025, 1:46am

As I mentioned earlier, I’m wondering whether all I/O operations should be interruptible even if they’re in a critical section. Meaning that if you perform I/O in a critical section, you need to be prepared to catch any exceptions arising from it and clean up appropriately. You’re going to have to do that anyway if the I/O fails for some reason, so I don’t think it will be a problem. I think the only guarantee that can be made is that non-buggy well-written thread code will terminate itself cleanly.
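
Here is a sketch of the kind of cleanup I mean, using an ordinary I/O failure, since that is what can be run today. The `TwoFiles` class is a made-up example. It assumes that an interrupt delivered during I/O would travel the same path as the failure shown here; what exception type such an interrupt would use has not been decided, which is why the handler catches BaseException and re-raises.

```python
import os
import tempfile


class TwoFiles:
    """Hypothetical context manager that needs two files open at once."""

    def __init__(self, first_path, second_path):
        self.first_path = first_path
        self.second_path = second_path
        self.first = None
        self.second = None

    def __enter__(self):
        self.first = open(self.first_path)
        try:
            self.second = open(self.second_path)
        except BaseException:
            # The second open failed (or, by assumption, was interrupted):
            # release what was already acquired, then re-raise.
            self.first.close()
            raise
        return self

    def __exit__(self, exc_type, exc, tb):
        self.second.close()
        self.first.close()
        return False


failed = False

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "a.txt")
    with open(existing, "w") as f:
        f.write("a")
    missing = os.path.join(tmp, "does_not_exist.txt")

    pair = TwoFiles(existing, missing)
    try:
        with pair:
            pass
    except FileNotFoundError:
        failed = True

    first_closed = pair.first.closed

print(failed, first_closed)
```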

ryanhiebert · December 21, 2025, 12:45pm

My concern with allowing cancellation at any point where IO might happen and might raise an exception is that it breaks the encapsulation assumptions that a with statement might have. For example, if I’m starting a new connection on a connection pool, my __enter__ might call a method that obtains a connection, ensures that it’s still operating correctly by pinging the other side of the connection, and if not, establishes a new connection, all without returning any error to my with block.
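
A rough sketch of the pool I’m describing, with `FakeConnection` and `Pool` as made-up stand-ins for real networking code. The point is that the ping failure is handled entirely inside `__enter__`, so the caller never sees it.

```python
class FakeConnection:
    """Stand-in for a network connection; `healthy` simulates the remote end."""

    def __init__(self, healthy=True):
        self.healthy = healthy

    def ping(self):
        if not self.healthy:
            raise ConnectionError("peer went away")


class Pool:
    """Hypothetical pool whose __enter__ hides a stale connection from the caller."""

    def __init__(self, idle):
        self.idle = list(idle)
        self.reconnects = 0
        self.current = None

    def __enter__(self):
        conn = self.idle.pop() if self.idle else FakeConnection()
        try:
            conn.ping()
        except ConnectionError:
            # Handled internally: replace the stale connection and carry on.
            self.reconnects += 1
            conn = FakeConnection()
        self.current = conn
        return conn

    def __exit__(self, exc_type, exc, tb):
        self.idle.append(self.current)
        return False


pool = Pool([FakeConnection(healthy=False)])
with pool as conn:
    usable = conn.healthy

print(usable, pool.reconnects)
```

If the ping could instead raise a cancellation exception, the `except ConnectionError` clause would not catch it, and the error would escape `__enter__` even though the pool was able to recover.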

If this analysis holds up, it suggests to me that raising would only be appropriate for IO errors that are unhandled. But I don’t think the interpreter can (or should) know before deciding to throw an exception whether or not the exception will be handled. And by the time it’s fully unhandled, we break out and the thread is done.

So far I’ve envisioned this only protecting cleanup (__exit__/finally) and preparation (__enter__), but this is making me think that exception handling (except) also needs to be included.

ryanhiebert · December 21, 2025, 12:58pm

All that to say, it still feels like we’re talking about two different purposes, that have different requirements. To avoid “interrupt” as ambiguous in our current context, I’ll say that these two purposes are “cancellation” and “termination”, and they have different needs.

Both signal a desire for running code to end and to allow for some safety, but each prioritizes one over the other. When these two purposes are in conflict, cancellation prefers safety to actually ending the running code, and termination prefers actually ending the running code over safety.

There are uses for both, but cancellation can be used safely at any time, where termination, because it deprioritizes safety, is really only going to be appropriate in the context of finally shutting down a process, since it’s more likely to hit scenarios where the process will not be safe to continue.

gcewing · December 21, 2025, 3:59pm

What if something happens that prevents it from successfully doing either of those? The server is down, someone tripped over the network cable, etc. I don’t see how it can guarantee full encapsulation, there will always be situations in which it has to report an error somehow.

I’m not sure what you mean by that. My suggestion is that it always throws an exception if a thread is interrupted while waiting for I/O. If the code making the call is in a critical section, part of the requirements for writing interrupt-safe code would be to anticipate this possibility and clean up appropriately.

I’m not convinced of that yet. Bear in mind that if a statement is inside a critical section, the entirety of it is already protected from interrupts the same as the rest of the code.

ryanhiebert · December 21, 2025, 5:34pm

I’m seeing your point in __enter__. I suspect it doesn’t equally apply in __exit__. What do you think?

I’m also not sure how to decide whether something qualifies as I/O for this purpose. Do you have a sense of that? It feels like maybe “things you should expect to wait for an unknown amount of time”, but I don’t know that we have a good way to specify that.

mikeshardmind · December 21, 2025, 6:07pm

There’s a fundamental difference between it being safe to interrupt that IO, and the errors that an IO call raises on completion being handled.

I’m not gonna be engaging with this further at the ideas stage, but I would strongly discourage any sort of cancellation that either doesn’t acknowledge that such critical sections need protection, or that does, but then tries to subvert a declared critical section for a specific class of operation. As things stand, I’ll be firmly in opposition to this should it progress beyond the ideas stage with most of the ideas that have been floated.