Asyncio tasks: A cancels and awaits B, B cancels and awaits A -> recursion error

xitop · June 19, 2025, 8:36pm

The program creates a kind of a loop (A awaits B, B awaits A).

import asyncio

async def aux(main_task):
    main_task.cancel()
    await main_task
    print("not reached")

async def main():
    aux_task = asyncio.create_task(aux(asyncio.current_task()))
    try:
        await asyncio.sleep(5)
    except asyncio.CancelledError:
        print("main task cancelled, starting cleanup")
        aux_task.cancel()
        await aux_task
        print("not reached")

asyncio.run(main(), debug=True)

I would expect a deadlock, but it produces the following error output.
(Tested on all major versions between 3.9 and 3.14b)

main task cancelled, starting cleanup

Exception in callback Task.task_wakeup(<Future cance...events.py:459>)
handle: <Handle Task.task_wakeup(<Future cance...events.py:459>) created at /tmp/python/reproducer.py:4>
source_traceback: Object created at (most recent call last):
  File "/tmp/python/reproducer.py", line 18, in <module>
    asyncio.run(main(), debug=True)
  File "/usr/lib64/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
  File "/usr/lib64/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/usr/lib64/python3.13/asyncio/base_events.py", line 706, in run_until_complete
    self.run_forever()
  File "/usr/lib64/python3.13/asyncio/base_events.py", line 677, in run_forever
    self._run_once()
  File "/usr/lib64/python3.13/asyncio/base_events.py", line 2026, in _run_once
    handle._run()
  File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
    self._context.run(self._callback, *self._args)
  File "/tmp/python/reproducer.py", line 4, in aux
    main_task.cancel()
Traceback (most recent call last):
  File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
    self._context.run(self._callback, *self._args)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded while calling a Python object

The program contains a logical flaw, but is the RecursionError an expected result or is it a bug?

jamesdow21 · June 20, 2025, 12:19am

I don’t think this would count as a bug, but is more of a natural conclusion of how the asyncio model works, particularly this section from the Task documentation:

To cancel a running Task use the cancel() method. Calling it will cause the Task to throw a CancelledError exception into the wrapped coroutine. If a coroutine is awaiting on a Future object during cancellation, the Future object will be cancelled.

Essentially what is happening is:

The asyncio event loop is started and asked to run until main() is finished
main schedules a task to run aux with its own task as an argument
main is set to wait for a 5 second sleep to complete and yields control of the event loop
aux starts running and requests that main’s task be cancelled
main’s task was asked to be cancelled, so it cancels the sleep, and raises CancelledError inside main
main catches the CancelledError and requests that aux’s task be cancelled
main waits for aux’s task to finish and yields control of the event loop
aux’s task was asked to be cancelled, so it cancels the task it was waiting on (main’s task)
main’s task was asked to be cancelled, so it cancels the task it was waiting on (aux’s task)
Go to Step 8

Those last few steps keep repeating in a recursive loop until you eventually hit the recursion limit

xitop · June 20, 2025, 6:30am

Thanks for your step by step analysis of what is happening.

I’m still not sure if this should not be prevented in asyncio. A task cancelling other task (steps 8 and 9) could set an internal flag “I have cancelled this one” and then don’t repeat this step, but log a warning instead, at least in the DEBUG mode.

I don’t know the internals of Python’s recursion limit, but anything related to overflowing stack looks like a potential danger to me.

JamesParrott · June 20, 2025, 11:08am

It’s an interesting “what if”, but it’s basically like a fight to the death between the two tasks. Weird code produces weird behaviour - quel-surpise?

It’s not even clear (to me at least) what the dev should expect to happen. Is it a case of which one should win, like resolving a race condition? Or should both tasks be cancelled, in case some kind of timing/ synchronisation mechanism should be used to ensure that?

But less is more. Why not simply have something else entirely kill either the one that should lose, or both of them? The host process could even do nothing more than call sys.exit() if it should be ended too.

If the expectations for how to resolve a case can’t be defined clearly an unambiguously, then it’s futile developing asyncio to support the case.

I still haven’t the faintest idea of the intention behind two tasks that mutually cancel each other. But are other async models better suited for whatever the problem was in the first place? Could multiprocessing processes send signals to each other? Is there a synchronization primitive in threading?

xitop · June 20, 2025, 12:40pm

I’m quite sure asyncio should not try to fix the deadlock-like situation.

I would prefer some helpful warning or error message that would point the developer in the right direction. I encountered the problem in a more complex program and it was not obvious at all, what was triggering the RecursionError.

The main reason I posted the question was that maybe it is also triggering something in asyncio that should not happen - a trap the asyncio library should protect itself from falling into. That’s the point I am not sure about and would like to get an answer from experienced members.

jamesdow21 · June 22, 2025, 5:32am

This has stuck in my head quite a bit, so I’ve been looking at the asyncio implementation more closely and have rewritten the steps from my previous reply to add more details and be more explicit about the order and timing of the event loop

(I’m definitely not an expert though, so there might be some minor errors, but this is my current best understanding of how the base asyncio event loop works)

The asyncio event loop is started and asked to run until main() is finished
The event loop creates a task for main
The event loop starts running main
main creates a task aux_task to run aux with its own task main_task as an argument, which the event loop schedules to run on the next cycle.
main calls asyncio.sleep and awaits the result
main_task processes the result from main, which is a Future that asyncio.sleep created (I’ll refer to as sleep_future), by setting that it is waiting on sleep_future and adds its own wakeup method as a callback when sleep_future is done
The event loop finishes its current cycle and starts the next cycle, seeing that only aux_task is ready
aux starts running and calls main_task.cancel()
main_task.cancel marks that main_task was requested to be cancelled by incrementing a counter but doesn’t actually cancel itself directly, instead calling the cancel method on the Future that main_task is awaiting (i.e. sleep_future)
sleep_future.cancel marks sleep_future as cancelled and asks the event loop to unschedule its timer and schedule all of its callbacks (just main_task._wakeup) to run on the next event loop cycle
aux awaits main_task
aux_task processes the result from aux by setting that it is waiting on main_task and adds its own wakeup method as a callback for when main_task is done
The event loop finishes its current cycle and starts the next cycle, seeing that only main_task._wakeup is ready
main_task starts running, which gets a CancelledError from sleep_future and throws it inside main
main catches the CancelledError and then calls aux_task.cancel()
aux_task.cancel calls the cancel method of the task it is awaiting (main_task)
Because main_task is currently running and not awaiting anything, main_task.cancel sets a flag that main_task should be cancelled at the next opportunity
main awaits aux_task
main_task processes the result from main by setting that it is waiting on aux_task, then sees the flag from step 17, so it calls the cancel method of what is waiting on (which was just set to aux_task)
aux_task.cancel method calls the cancel method of what aux_task is waiting on (main_task)
main_task.cancel calls the cancel method of what main_task is waiting on (aux_task)
Go to Step 20, repeat until the recursion limit is hit

Essentially, this is all due to the fact that calling Task.cancel does not directly cancel a task, but instead either:

If the task is awaiting something, cancels whatever the task is awaiting
If the task is scheduled to run, leaves a flag to throw a CancelledError into its coroutine when it starts running
If the task is currently running, leaves a flag to either cancel whatever its coroutine ends up awaiting or discard the result and directly mark itself as cancelled if its coroutine returns

I would still lean that getting a RecursionError is “working as intended”, since calling Task.cancel can be viewed as “If this task is waiting on something, recursively iterate through a list of tasks, each waiting on the next one, until a base case of something that isn’t waiting on anything”

You’ve just got a setup where that recursive iteration is going around in a circle instead of reaching an end, essentially the same as

@dataclasses.dataclass
class ListNode:
    next: ListNode | None = None

    def get_end(self) -> ListNode:
        if self.next is not None:
            return self.next.get_end()
        return self

main = ListNode()
aux = ListNode()

aux.next = main
main.next = aux

aux.get_end()
# aux is waiting on main which is waiting
# on aux which is waiting on main... and so on to infinity

asyncio.Task and await are the nodes and directed edges that you use to build a directed graph, but it ends up either deadlocking or getting caught in an infinite loop if you don’t build a directed acyclic graph.