Asyncio hangs forever after all coroutine threads are done

Hi all, I often get an asyncio program hanging indefinitely after all subprocesses spawned by all of its 10 co-routines are long gone and done:

# py-spy dump -p 274323
Python v3.12.6 (/usr/bin/python3.12)

Thread 274323 (idle): "MainThread"
    select (selectors.py:468)
    _run_once (asyncio/base_events.py:1948)
    run_forever (asyncio/base_events.py:641)
    run_until_complete (asyncio/base_events.py:674)
    run_workers (avocado_i2n/plugins/runner.py:345)
    run_suite (avocado_i2n/plugins/runner.py:381)
    run (avocado/core/suite.py:316)
    run_tests (avocado/core/job.py:672)
    run (avocado/core/job.py:619)
    run (avocado_i2n/intertest_setup.py:437)
    run (avocado_i2n/plugins/manu.py:101)
    run (avocado/core/app.py:111)
    run_suite (cli/run_api/suite.py:188)
    run_manu (cli/run_api/suite.py:256)
    run (cli/run_api/suite.py:1283)
    run_module (qa:214)
    run (qa:229)
    run (qa:913)
    main (qa:1641)
    <module> (qa:1653)
Thread 275198 (idle): "asyncio_0"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 276863 (idle): "asyncio_1"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 278758 (idle): "asyncio_2"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 280633 (idle): "asyncio_3"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 282964 (idle): "asyncio_4"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 286743 (idle): "asyncio_5"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 287271 (idle): "asyncio_6"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 287491 (idle): "asyncio_7"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 288269 (idle): "asyncio_8"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)
Thread 288645 (idle): "asyncio_9"
    _worker (concurrent/futures/thread.py:89)
    run (threading.py:1012)
    _bootstrap_inner (threading.py:1075)
    _bootstrap (threading.py:1032)

I can see that the process is epolling forever and no input ever arrives at the file descriptors below:

lsof -p 274323 | grep 9
# python3 274323 root    9u  a_inode               0,16           0      3107 [eventpoll:11,14,75]
readlink /proc/274323/fd/11
readlink /proc/274323/fd/14
readlink /proc/274323/fd/75
# socket:[1055883]
# socket:[1055885]
# socket:[14923548]
stat /proc/274323/fd/11
#  File: /proc/274323/fd/11 -> socket:[1055883]
#  Size: 64              Blocks: 0          IO Block: 1024   symbolic link
#Device: 0,61    Inode: 20669420    Links: 1
#Access: (0700/lrwx------)  Uid: (    0/    root)   Gid: (    0/    root)
#Access: 2025-04-08 23:06:02.512448730 +0800
#Modify: 2025-04-08 23:06:02.511448731 +0800
#Change: 2025-04-08 23:06:02.511448731 +0800
# Birth: -
fuser -v /proc/274323/fd/11
#                     USER        PID ACCESS COMMAND
#/proc/274323/fd/11:  root      274323 F.... python3

Any help on how to debug this would be greatly appreciated. I am not yet sure it is worth reporting as a bug; it looks like some race condition with the co-routines.

Can you show your code? Does it reproduce on 3.12.10?

Could it be this: asyncio.create_subprocess_exec does not respond properly to asyncio.CancelledError · Issue #103847 · python/cpython · GitHub ? Is it that you are leaving your main asyncio.run task with other tasks still pending?

So far I have reproduced this on Python 3.12.9:

Python 3.12.9 (main, Mar 31 2025, 00:00:00) [GCC 14.2.1 20240912 (Red Hat 14.2.1-3)] on linux

I hope I understand this right, but in my case no cancellations take place. The code that waits for the threads looks like this:

        slot_workers = sorted([*graph.workers.values()], key=lambda x: x.params["name"])
        to_traverse = [graph.traverse_object_trees(s, params) for s in slot_workers]
        asyncio.get_event_loop().run_until_complete(
            asyncio.wait_for(asyncio.gather(*to_traverse), self.job.timeout or None)
        )

and the hang happens at the very end where all these workers are done with the subprocesses they have spawned (from what I can see in the related logs).

Please show your full code, including imports. A runnable example would be best.

Note that asyncio.get_event_loop() is deprecated; you should be using asyncio.run.
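For illustration, here is a minimal sketch of how the snippet above might look with asyncio.run. The traverse() coroutine and the worker names are hypothetical stand-ins for graph.traverse_object_trees() and the real slot workers, so this is not the actual avocado-i2n code:

```python
import asyncio


async def traverse(name: str, delay: float = 0.01) -> str:
    # Hypothetical stand-in for graph.traverse_object_trees(worker, params).
    await asyncio.sleep(delay)
    return name


async def run_workers(names, timeout=None):
    # Same shape as the original: gather all workers under one overall timeout.
    to_traverse = [traverse(n) for n in names]
    return await asyncio.wait_for(asyncio.gather(*to_traverse), timeout)


if __name__ == "__main__":
    # asyncio.run creates a fresh loop, runs the coroutine, and closes the loop.
    print(asyncio.run(run_workers(["net1", "net2", "net3"], timeout=5)))
```

On exit, asyncio.run also cancels any tasks that are still pending and shuts down the default executor, which may or may not matter for the hang described here.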

Sharing the entire code base is not possible here, but if you really insist, feel free to look around avocado-i2n/avocado_i2n/plugins/runner.py at 31126641e18ce369ba1695771c599bc7a1506239 · intra2net/avocado-i2n · GitHub.

I could try to get a smaller reproducer but this will likely take time. Note that I didn’t have such troubles for 1-2 years with the same code and what I am reporting above is recent.

Indeed, I am aware of this but have not been able to migrate yet. I still don't expect it to account for the difference for the time being, but thanks for mentioning it.

Can you print the stacks of all the currently running coroutines when it hangs?

E.g. using stackscope and all_coros = [t.get_coro() for t in asyncio.all_tasks()].
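If installing stackscope is not an option, a rough stdlib-only sketch could look like the following. It swaps in Task.print_stack() for stackscope, and as far as I know that only reports the outermost frame of each suspended task, so it is less detailed. dump_task_stacks(), stuck_worker() and the task name are made up for the demo:

```python
import asyncio
import io


def dump_task_stacks() -> str:
    """Return a text report of every task on the running loop (call from inside the loop)."""
    out = io.StringIO()
    for task in asyncio.all_tasks():
        out.write(f"--- {task.get_name()}: {task.get_coro()!r}\n")
        # Writes where the task's coroutine is currently suspended.
        task.print_stack(file=out)
    return out.getvalue()


async def stuck_worker():
    # Hypothetical worker that waits on an event nobody ever sets.
    await asyncio.Event().wait()


async def main() -> str:
    task = asyncio.create_task(stuck_worker(), name="stuck-worker")
    await asyncio.sleep(0)  # let the worker start and block
    report = dump_task_stacks()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return report


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real hang you would need some way to trigger this from outside, for example a callback registered with loop.add_signal_handler() on Unix, since the loop itself is idle in select.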