Daemon threads and background task termination

ncoghlan · January 19, 2025, 4:42am

(This is a follow-up to “Improving support for non-daemon background threads” and “Getting Rid of Daemon Threads”, after recently running into a case where I had to set daemon=True in my own code.)

Working with httpx-ws recently, I wanted to provide both synchronous and asynchronous versions of a websocket client API. I also wanted to allow easy interactive use of the synchronous API by handling the case where callers didn’t explicitly close their client sockets. Trying to do this without daemon threads runs directly into the “threads are joined before atexit handlers run” problem. However, having the concrete use case to work with gave me a better idea of what features I wish the standard library could have provided to let me solve the resource management problem without daemon threads.

Thread termination callbacks

I still wanted the functionality of atexit.register_early (suggested in the previous thread), but with a narrower framing: thread termination. For anything other than terminating threads, the existing atexit machinery is fine, since it won’t block the “join all non-daemon threads” shutdown step.

Framing the registration problem this way leads to the following design proposal:

Add terminate=None to the threading.Thread constructor
Add a default threading.Thread.__terminate__ method that calls the terminate callback (if set) and does nothing otherwise
Add an additional step to threading._shutdown (after marking the main thread as shutting down, but before joining the non-daemon threads) that calls __terminate__ on all the active threads reported by threading.enumerate()

A dunder-name is suggested to avoid backwards compatibility issues with thread subclasses that already define terminate() methods that may not be suitable for calling in this shutdown use case. Compatible subclasses can opt-in to the new feature by setting __terminate__ = terminate.
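A minimal sketch of how the proposed protocol could be emulated in user code today. `TerminableThread` and `request_termination` are hypothetical names used only for illustration; neither `terminate=` nor `__terminate__` exists in the current `threading` module, and the reverse iteration order is an assumption about how the shutdown step might visit threads.

```python
import threading


class TerminableThread(threading.Thread):
    """Hypothetical stand-in for the proposed `terminate=` constructor option."""

    def __init__(self, *args, terminate=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._terminate_callback = terminate

    def __terminate__(self):
        # Default behaviour from the proposal: call the callback if one was set.
        if self._terminate_callback is not None:
            self._terminate_callback()


def request_termination():
    """Emulate the proposed extra step in threading._shutdown.

    Threads without a __terminate__ method (including the main thread)
    are skipped rather than treated as an error.
    """
    for thread in reversed(threading.enumerate()):
        hook = getattr(thread, "__terminate__", None)
        if hook is not None:
            hook()
```

With this in place, a thread started as `TerminableThread(target=stop.wait, terminate=stop.set)` (where `stop` is a `threading.Event`) exits once `request_termination()` is called, without needing to be a daemon thread.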

In the absence of this feature, I emulated it with atexit by marking the affected threads as daemon threads (so they didn’t block the implicit join step) and then triggering their termination from the atexit handler.
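A rough sketch of that emulation, assuming a simple polling worker. `StoppableWorker` and `start_worker` are illustrative names, not code from the actual project.

```python
import atexit
import threading


class StoppableWorker(threading.Thread):
    """Daemon thread that exits when asked to, so atexit can shut it down."""

    def __init__(self):
        # daemon=True keeps the thread out of the implicit join at shutdown.
        super().__init__(daemon=True)
        self._stop_requested = threading.Event()
        self.ticks = 0

    def run(self):
        # Wake up periodically and check whether termination was requested.
        while not self._stop_requested.wait(timeout=0.01):
            self.ticks += 1

    def terminate(self, timeout=1.0):
        self._stop_requested.set()
        self.join(timeout)


def start_worker():
    worker = StoppableWorker()
    worker.start()
    # atexit handlers run after non-daemon threads are joined, which is
    # why the worker has to be a daemon thread for this to be reached.
    atexit.register(worker.terminate)
    return worker
```

Calling `terminate()` explicitly remains possible; the atexit handler then only sets an already-set event and joins a finished thread.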

Asynchronous background threads

This is technically a separate idea, but it is what makes the thread termination callback approach more generally useful. The problem with synchronous background threads that perform blocking operations is that it isn’t always easy (or reliable) to interrupt those blocking calls from the main thread. Instead, the idea is most effective when the background thread is running an asynchronous event loop, so “please terminate now” is just another event to be processed (and the termination handling can be entirely abstracted away via task groups).

This part of the suggestion would be to provide a BackgroundThread class in a new asyncio.threading submodule:

import asyncio
import threading
from typing import NoReturn

class BackgroundThread(threading.Thread):

    def __init__(self, group=None, task_target=None, name=None, args=(), kwargs=None):
        # Accepts the same args as `threading.Thread`, *except*:
        #   * a  `task_target` coroutine replaces the `target` function
        #   * No `daemon` option
        #   * No `terminate` option (always sets the termination event)
        # Variant: accept `debug` and `loop_factory` options to forward to `asyncio.run`
        # Alternative: accept a `task_runner` callback, defaulting to `asyncio.run`
        self._task_target = task_target
        self._terminate = asyncio.Event()
        self._event_loop = None
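        # `terminate` is the constructor option proposed above, not an existing `threading.Thread` parameter
        super().__init__(group, None, name, args, kwargs, terminate=self.terminate)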

    def run(self):
        """Run an async event loop in the background thread"""
        asyncio.run(self._run_until_terminated())

    async def run_task(self):
        try:
            if self._task_target is not None:
                await self._task_target(*self._args, **self._kwargs)
        finally:
            del self._target, self._args, self._kwargs

    def terminate(self):
        loop = self._event_loop
        if loop is None:
            return
        loop.call_soon_threadsafe(self._terminate.set)

    async def _run_until_terminated(self):
        """Run task in the background thread until termination is requested."""
        self._event_loop = asyncio.get_running_loop()
        try:
            raise_on_termination, terminated_exc = self._raise_on_termination()
            async with asyncio.TaskGroup() as tg:
                tg.create_task(raise_on_termination)
                tg.create_task(self.run_task())
        except* terminated_exc:
            pass # Graceful shutdown request
        finally:
            self._event_loop = None

    def _raise_on_termination(self):
        class TerminateTask(Exception):
            pass

        async def raise_on_termination() -> NoReturn:
            await self._terminate.wait()
            raise TerminateTask

        return raise_on_termination(), TerminateTask

This part of the idea is taken directly from what I actually implemented to solve my synchronous API design problem (although my current code doesn’t cleanly separate concerns the way this code does - the termination support is implemented directly in the thread class that implements the rest of the background thread behaviour).

Edit: fixed the thread termination request implementation (I had oversimplified it when extracting the generalised proposal from my actual code)

mikeshardmind · January 19, 2025, 5:36am

Regarding the asynchronous background threads, there are a few things I’ve done differently here that might be useful for comparison: async-utils/src/async_utils/bg_loop.py at main · mikeshardmind/async-utils · GitHub.

Not sure if it’s better or worse in your specific use case, but this approach essentially has any number of persistent background threads with an event loop as a context manager (typically one, though in a nogil world, there’s a reasonable chance this changes) and the ability to schedule coroutines to it as needed.

As for the daemon threads in the websockets case, I think this is already solvable with context managers, specifically by having __exit__ signal the underlying thread to clean itself up (possibly cleaning up a pool of resources at the last context exit). I’d be interested in better understanding why that approach isn’t usable here before commenting further, especially since I generally think that daemon threads are something to be avoided.

ncoghlan · January 19, 2025, 6:30am

Context managers solve the application use case, but they don’t solve the interactive use case.

All the potential cleanup triggers in interactive use don’t actually work:

context managers: with statements can only apply to a single interactive command, they can’t span multiple commands
contextlib.ExitStack: still needs a with statement or some other callback to trigger cleanup
__del__: the __main__ module globals are only cleared after threads are joined at shutdown
atexit: these hooks also run after threads are joined at shutdown

The last two can be made to work, but only if you mark the background thread as a daemon thread so it gets ignored by the “wait for all non-daemon threads to terminate” step at shutdown, and then set up an appropriate atexit hook to trigger the lazy cleanup.

This means the current two simplest ways to implement background threads for synchronous applications are to:

Just make them regular threads, with only deterministic cleanup supported. These APIs hang on shutdown if you attempt to use them interactively.
Make the background threads daemon threads, without arranging to clean them up before shutdown (in the absence of deterministic cleanup). These APIs are likely to throw exceptions on shutdown if you attempt to use them interactively.

I do think public thread-safe APIs to schedule tasks and run arbitrary callables in the background thread’s event loop would be worthwhile additions (my actual implementation has them), but the sample code in the post was already complicated enough without them (one subtle point with such injections is that it’s OK for them to be outside the termination task group, since the loop shutdown will terminate everything else after the main task gets terminated).
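As a point of reference, asyncio already offers the building blocks for this kind of injection: `asyncio.run_coroutine_threadsafe` and `loop.call_soon_threadsafe`. The sketch below is not the implementation mentioned above; `start_background_loop`, `stop_background_loop`, and `add` are hypothetical helpers, and the thread is marked as a daemon only so that a forgotten `stop_background_loop` call cannot hang shutdown.

```python
import asyncio
import threading


def start_background_loop():
    """Start an event loop in a background thread and return (loop, thread)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def stop_background_loop(loop, thread):
    """Ask the loop to stop from another thread, then release its resources."""
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


async def add(a, b):
    await asyncio.sleep(0)  # yield to the loop once, like real async work would
    return a + b


if __name__ == "__main__":
    loop, thread = start_background_loop()
    # Schedule a coroutine from the main thread and wait for its result.
    future = asyncio.run_coroutine_threadsafe(add(1, 2), loop)
    print(future.result(timeout=5))
    stop_background_loop(loop, thread)
```

`run_coroutine_threadsafe` returns a `concurrent.futures.Future`, so the calling thread can block on the result or attach callbacks without touching the loop directly.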

(Given the impact on the threading API and the shutdown process, I think this idea would need a PEP to be actually implemented, but I wanted to get feedback on it before investing that kind of time into it)

mikeshardmind · January 19, 2025, 8:19am

Okay, I don’t really have a reason to get into any bikeshedding of the API presented. The general idea seems good to me, and whatever specific form it takes shouldn’t be an issue. I also can’t think of a strong reason to prefer the full atexit.register_early solution over this, even if I’m personally interested in a better multi-phase shutdown overall.

I have two follow-up questions, but the answers won’t change my opinion on whether the feature should exist. This seems like a good way to reduce reliance on daemon threads irrespective of the answers.

Will threads have their termination callbacks invoked in a deterministic, reliable order? If not, is it worth a simple shuffle so that people don’t end up relying on the implementation ordering, and instead add their own synchronization to shutdown if necessary?
Would it also be worth adding an atexit API that registers shutdown callbacks which only run in an interactive session and are a no-op otherwise, to help cover any non-threading cases? Or do you think the non-threading cases are suitably covered by atexit without this?

ncoghlan · January 19, 2025, 12:58pm

Thread termination would be in the reverse order of what threading.enumerate produces (which is pretty much thread creation order since dicts became order preserving).

A dedicated interactive loop shutdown hook would be tricky to define, since there are so many REPL implementations out there.

ncoghlan · January 24, 2025, 1:51am

Just noting “urllib3 in 2024” and its mention of socket.shutdown as relevant to terminating synchronous background threads. Writing sentinel values to synchronous queues and periodically polling shutdown events are also worth mentioning.
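A small sketch of the sentinel-value technique for a synchronous worker, assuming a plain `queue.Queue` feed. `consume`, `request_stop`, and `_STOP` are illustrative names.

```python
import queue
import threading

_STOP = object()  # unique sentinel that can never collide with real work items


def consume(work_queue, results):
    """Process items until the sentinel arrives, then return."""
    while True:
        item = work_queue.get()  # blocks, but a sentinel always unblocks it
        if item is _STOP:
            break
        results.append(item * 2)


def request_stop(work_queue):
    """Ask the consumer to exit once it has drained the earlier items."""
    work_queue.put(_STOP)
```

Since the sentinel travels through the same queue as the work, everything queued before the stop request is still processed before the thread exits.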

The PEP’s background section should also cover using weakref.finalize to gracefully terminate daemon threads without having to write a custom atexit hook (which is the best we can do for shutdown in the status quo).
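A minimal sketch of the weakref.finalize approach, assuming a polling worker. `Client`, `_worker`, and `_shutdown` are hypothetical names. The finalizer runs when the owning object is garbage collected, when it is called explicitly, or at interpreter exit (finalizers default to `atexit=True`).

```python
import threading
import weakref


def _worker(stop_requested):
    # Poll the shutdown event; real code would do useful work between checks.
    while not stop_requested.wait(timeout=0.01):
        pass


def _shutdown(stop_requested, thread):
    stop_requested.set()
    thread.join(timeout=1.0)


class Client:
    """Owns a daemon thread that is stopped when the client goes away."""

    def __init__(self):
        stop_requested = threading.Event()
        self._thread = threading.Thread(
            target=_worker, args=(stop_requested,), daemon=True
        )
        self._thread.start()
        # The callback and its arguments must not reference `self`,
        # otherwise the client could never be garbage collected.
        self._finalizer = weakref.finalize(
            self, _shutdown, stop_requested, self._thread
        )

    def close(self):
        # Calling the finalizer runs it at most once and marks it dead.
        self._finalizer()
```

Explicit `close()` and implicit cleanup share one code path, so deterministic and lazy shutdown cannot drift apart.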

Mixing foreground and background methods in the one thread interface also needs to be justified (based on the long-standing precedents in the threading and futures APIs).

methane · July 4, 2025, 3:01am

In WSGI apps, there is no common API for an early atexit.

For example, OpenTelemetry uses background threads for sending telemetry data.
They use atexit to flush buffered data, and they need to use a daemon thread for that to work.

opentelemetry-python/opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py at 43341d793be49b8071f8223c7c2a9123307aa3de · open-telemetry/opentelemetry-python · GitHub

On the other hand, concurrent.futures.ThreadPoolExecutor doesn’t use daemon threads.
It means we cannot use ThreadPoolExecutor in a WSGI application!

uwsgi has uwsgi.atexit, which is called before Py_Finalize(), but it is not defined in the WSGI standard.
We clearly need a standard way to stop background threads from atexit.