The standard library queue log handler can be backed by a multiprocessing.Queue; however, that class (and the multiprocessing module in general) has built-in logging of its own. If that logging is enabled, log messages get logged in the course of handling log messages, and we deadlock (see gh-91555 and gh-90321).
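For illustration, a minimal setup that shows the shape of the cycle (not an exact reproducer for the linked issues; here the queue handler is attached straight to the multiprocessing logger so the loop is easy to see, and exactly when it deadlocks depends on version and configuration):

```python
# Hedged illustration of the re-entrant cycle, not an exact reproducer.
import logging
import logging.handlers
import multiprocessing

mp_logger = multiprocessing.get_logger()   # the 'multiprocessing' logger
mp_logger.setLevel(logging.DEBUG)          # enable multiprocessing's own logging

queue = multiprocessing.Queue()
mp_logger.addHandler(logging.handlers.QueueHandler(queue))

# Handling this record calls queue.put(); the queue's internals then emit
# their own DEBUG records to the same logger, re-entering the handler that
# is already busy with the first record.
mp_logger.debug("hello")
```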
Options under consideration for dealing with this are:
1. Treat it as user error and close the issues as such.
2. Remove the logging from multiprocessing.Queue.
3. Suppress logging (on a per-thread-and-logger basis) while handling log messages.
The first option seems sensible at first glance. However, this is a nasty and non-obvious latent bug to leave lying around. Sentry’s logging integration, for example, monkey-patches logging internals in order to avoid it.
The second option was suggested in discussion of gh-91555: it would fix those two bugs, but would remove useful (and publicly documented) functionality from the module, and change logging behaviour for any existing code that uses it.
The third option is a small and simple fix (see the PR linked from gh-91555), and prevents this entire class of bug. However, there may be subtle problems with using TLS, or other concerns I’m missing.
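To make option 3 concrete, here is a rough sketch of the idea (not the actual PR, which changes logging internals; the wrapper class and variable names here are made up): a per-thread flag that silently drops records emitted while a handler is already running on that thread.

```python
# Sketch of option 3 as a wrapper handler; names are illustrative only.
import logging
import threading

_handling = threading.local()

class ReentrancyGuardHandler(logging.Handler):
    """Wraps another handler and drops re-entrant records on this thread."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def emit(self, record):
        if getattr(_handling, "active", False):
            return  # already handling a record on this thread: drop it
        _handling.active = True
        try:
            self.inner.handle(record)
        finally:
            _handling.active = False
```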
I have not checked the code to see how feasible this is, but: wouldn’t it be possible to add a parameter to multiprocessing.Queue that suppresses all of its internal logging, for queues that logging is going to use? So essentially option 2, but behind a flag, in case people still find the logging useful?
The Queue is passed to the handler, not created by it, so this couldn’t be done with a constructor parameter. It wouldn’t be hard to add a flag member variable to disable logging, but the handler would have to set it when it is constructed and the queue is passed in. It all feels a bit hacky to me though, and it would not help other handlers that use third-party libraries with the same issue.
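Purely for illustration, the flag idea might look roughly like this; the _suppress_internal_logging attribute is hypothetical and nothing like it exists in multiprocessing today:

```python
# Hypothetical sketch only: the attribute below does not exist.
import logging.handlers

class SilencingQueueHandler(logging.handlers.QueueHandler):
    def __init__(self, queue):
        super().__init__(queue)
        # The handler, not the caller, would have to reach into the queue it
        # was given and switch off its internal logging after the fact.
        queue._suppress_internal_logging = True  # hypothetical flag
```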
I haven’t used contextvars before, but having a quick look I agree with Antoine: TLS is appropriate here. The logging module itself is synchronous by design, and an async context switch during message handling would not be safe since it might be holding arbitrary threading locks.
I don’t know if it’s a good pattern to follow, but Lib/multiprocessing/resource_tracker.py uses an RLock and its _recursion_count() method to detect recursive calls:
```python
def _reentrant_call_error(self):
    # gh-109629: this happens if an explicit call to the ResourceTracker
    # gets interrupted by a garbage collection, invoking a finalizer (*)
    # that itself calls back into ResourceTracker.
    #   (*) for example the SemLock finalizer
    raise ReentrantCallError(
        "Reentrant call into the multiprocessing resource tracker")

def _stop(self, ...):
    with self._lock:
        self._stop_locked()
    ...

def _stop_locked(...):
    # This shouldn't happen (it might when called by a finalizer)
    # so we check for it anyway.
    if self._lock._recursion_count() > 1:
        return self._reentrant_call_error()
    ...
```
Yes, something like this would be an option: detect the recursion and raise an exception. That would be much better than a deadlock and marginally better than a stack overflow.
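Adapted to a logging handler, that pattern might look roughly like this (a sketch only; _recursion_count() is a private RLock method and only present in recent CPython versions):

```python
# Sketch: detect re-entrance with an RLock and raise instead of deadlocking.
import logging
import threading

class ReentrancyErrorHandler(logging.Handler):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner
        self._rlock = threading.RLock()

    def emit(self, record):
        with self._rlock:
            # _recursion_count() is private and version-dependent.
            if self._rlock._recursion_count() > 1:
                raise RuntimeError("Reentrant call into logging handler")
            self.inner.handle(record)
```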
IMO though the best option is still to just ignore and silently drop re-entrant log messages: in practice that is what affected code will need to do to fix the problem anyway (e.g. see Sentry’s logging integration, referenced above).
I haven’t looked at exactly what state is being managed here, but we ought to be able to use contextvars anywhere we might use a thread local. They provide the same functionality, except contextvars give you task local, which is more “local” than thread local.
If the state being stored never persists outside the call (e.g. once the logging call returns, the info is never used again) and the code never yields within the call, then the choice doesn’t matter. But even when both of those hold, it should still be perfectly safe to use contextvars instead of thread locals, and we should generally prefer them unless there’s a really specific reason (such as needing to share state across tasks). I don’t see any such reason here.
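For comparison, the thread-local guard sketched earlier would look almost identical with a ContextVar (again just a sketch with made-up names):

```python
# Same re-entrance guard as before, but context-local instead of thread-local.
import contextvars
import logging

_handling = contextvars.ContextVar("handling_log_record", default=False)

class ContextGuardHandler(logging.Handler):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def emit(self, record):
        if _handling.get():
            return  # re-entrant within this context: drop the record
        token = _handling.set(True)
        try:
            self.inner.handle(record)
        finally:
            _handling.reset(token)
```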
We can’t use contextvars because a handler could copy the context into a thread and drag the logger’s disabled state along into that new thread.
Hmm, yeah, I see your point. That’s unfortunate (did this come up during the logging issue that was being worked on recently?)
Arguably the handler should not copy the context and should instead start with a default one, but there’s no way for it to know that. It might think it is the core dispatcher, especially given how mixed up sync and async code tends to get. (In any case where contextvars (or TLS) matter at all, there must be some central controller spinning up new tasks, such as a connection handler, and anything that isn’t that controller ought to start tasks fresh rather than copy task-local state.)
A fresh task, e.g. asyncio.run, asyncio.create_task, asyncio.to_thread (not a task, but an async function that starts a thread), etc., copies the current context by default. contextvars want to be copied into new threads, and only make sense when you have a tree of call stacks where some context needs to wrap all of them.
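A small example of that default copying (asyncio.to_thread runs the function in a worker thread with a copy of the caller’s context):

```python
# Demonstrates that asyncio.to_thread propagates contextvars into the thread.
import asyncio
import contextvars

flag = contextvars.ContextVar("flag", default=False)

def worker():
    return flag.get()  # sees True: the caller's context was copied

async def main():
    flag.set(True)
    print(await asyncio.to_thread(worker))  # prints True

asyncio.run(main())
```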
Another possible solution is to log events relating to multiprocessing queue internals to a different “meta” logger such as one named multiprocessing.meta. Then those who need to monitor/debug those internals can add handlers to that logger (I imagine it wouldn’t be a common case).
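A rough sketch of what that might look like (the multiprocessing.meta name is just the suggestion above; nothing like this exists today):

```python
# Hypothetical: queue internals would log to a separate "meta" logger that
# ordinary application handler chains never see unless someone opts in.
import logging

meta_logger = logging.getLogger("multiprocessing.meta")
meta_logger.propagate = False  # keep internals out of normal handler chains

# Someone who actually needs to debug queue internals opts in explicitly:
meta_logger.addHandler(logging.StreamHandler())
meta_logger.setLevel(logging.DEBUG)
```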