Along with the task cancellation/creation tracking thread, I’d like to propose some enhancement ideas for aiomonitor, which has been a great help to me in debugging production issues.
UX
Currently, it opens a local TCP port that `aiomonitor.console` can attach to in order to run inspection commands, with a direct REPL over the socket.
When there are thousands of tasks, simply dumping every task’s `repr()` does not help much.
(This is what happened at my customer’s site!)
- Key features
- A scrollable / orderable / filterable view of active tasks
- A statistics view of active tasks (such as the number of tasks) grouped by the location where they were created
- A scrollable / orderable / filterable view of (long-running) task groups
- All its child tasks
- A scrollable / orderable / filterable view of async generators
- An active task inspection view with:
- Its stack trace
- The “creation source” traceback (where `create_task()` is called, possibly chained)
- The location and line of the “await” statement or the event loop’s handle that it is currently blocked on
- A scrollable / orderable / filterable view of cancelled tasks
- A cancelled task inspection view with:
- Its stack trace (at which point it is cancelled)
- The “creation source” traceback (same as above)
- The “cancellation source” traceback (where `.cancel()` is called, possibly chained)
- To prevent memory leaks, we could copy this information to aiomonitor upon actual cancellation and expire it after a configured timeout or once the number of historical tasks exceeds a limit
- Filter / order / group conditions: the creation/cancellation source location, the task names, the timestamp when they are created, terminated, or cancelled
- Reference
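To make the "expire after a timeout or a limit" idea concrete, here is a minimal sketch of a bounded history for cancelled tasks. The class and field names (`CancelledTaskRecord`, `CancelledTaskHistory`) are hypothetical and not part of aiomonitor; the sketch assumes records are added in chronological order.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CancelledTaskRecord:
    # Hypothetical record; field names are illustrative, not aiomonitor's.
    name: str
    cancelled_at: float
    cancellation_traceback: List[str] = field(default_factory=list)


class CancelledTaskHistory:
    """Keeps at most `max_records` entries, each for at most `ttl` seconds."""

    def __init__(self, max_records: int = 1000, ttl: float = 3600.0) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._records = deque(maxlen=max_records)
        self._ttl = ttl

    def add(self, record: CancelledTaskRecord) -> None:
        self._records.append(record)

    def snapshot(self, now: Optional[float] = None) -> List[CancelledTaskRecord]:
        now = time.monotonic() if now is None else now
        # Records arrive in time order, so expired ones sit at the left end.
        while self._records and now - self._records[0].cancelled_at > self._ttl:
            self._records.popleft()
        return list(self._records)
```

Since only plain strings are copied into the record, the history would not keep the cancelled task objects or their frames alive.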
Extensions
Let’s reuse and extend the above views with specialization to various asyncio-based libraries and frameworks such as:
- FastAPI & aiohttp web request handlers
- aiohttp ClientSession
- asyncio TaskGroup
- Synchronization primitives such as asyncio Queue, Lock, Semaphore / janus Queue
- The list of tasks that are blocked on each object or that have acquired it
- Transaction blocks of databases (SQLAlchemy / databases / asyncpg / aiosqlite / aiomysql / etc.)
- Connection pools of databases (redis-py / SQLAlchemy / databases / etc.)
I think we could adopt an entry-point-based plugin system, like the one pytest uses, to auto-detect and enable such additional integrations.
About the stdlib support
To keep track of intrinsic objects such as `asyncio.Queue` and `asyncio.Lock` used by arbitrary asyncio-based code and libraries, we need a registry that holds (weak) references to them. I’m not sure how to achieve this solely in aiomonitor without modifying the stdlib asyncio.
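For illustration, a weak-reference registry can be sketched with `weakref.WeakSet` and an opt-in subclass. `TrackedLock`, `_tracked_locks`, and `live_locks()` are hypothetical names, not an existing API.

```python
import asyncio
import weakref

# Hypothetical registry; weak references let tracked locks be collected normally.
_tracked_locks = weakref.WeakSet()


class TrackedLock(asyncio.Lock):
    """A Lock that registers itself so a monitor can enumerate live instances."""

    def __init__(self) -> None:
        super().__init__()
        _tracked_locks.add(self)


def live_locks() -> list:
    # Only locks that are still referenced somewhere else show up here.
    return list(_tracked_locks)
```

This only sees objects created through the subclass. Libraries that instantiate `asyncio.Lock` directly are invisible to it, which is exactly the gap that seems to need support from asyncio itself.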
Also, as I mentioned in other threads, we need some hooks in asyncio to capture task creation, cancellation, and completion events. AFAIK, asyncio already allows using a custom `Task` factory via an alternative event loop implementation, but I’d like to keep using existing event loops (the vanilla one or `uvloop`) while extending `Task` to provide hooks for `aiomonitor`.
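For reference, here is a minimal sketch of how far one can get today with the documented `loop.set_task_factory()` method and a `Task` subclass. `TracedTask` and `traced_task_factory` are illustrative names in this sketch, and the recorded attributes are assumptions about what a monitor would want to keep.

```python
import asyncio
import traceback


class TracedTask(asyncio.Task):
    """Hypothetical Task subclass that remembers where it was created and cancelled."""

    def __init__(self, coro, **kwargs):
        super().__init__(coro, **kwargs)
        # The captured stack ends with this __init__ frame and the factory frame.
        self.creation_stack = traceback.extract_stack()
        self.cancellation_stack = None

    def cancel(self, *args, **kwargs):
        # Record who asked for the cancellation before delegating.
        self.cancellation_stack = traceback.extract_stack()
        return super().cancel(*args, **kwargs)


def traced_task_factory(loop, coro, **kwargs):
    # Newer Python versions pass extra keyword arguments (e.g. context, name).
    return TracedTask(coro, loop=loop, **kwargs)


async def main():
    asyncio.get_running_loop().set_task_factory(traced_task_factory)
    task = asyncio.create_task(asyncio.sleep(10))
    await asyncio.sleep(0)  # let the task start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task
```

The limits of this approach are part of the motivation for asking about hooks: a loop has only one task factory slot, so another library installing its own factory replaces this one, and tasks created before the factory is set are not traced.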
My questions
- What do you think about my proposed enhancements to aiomonitor?
- What are your experiences with debugging complex asyncio-based applications in production?
- Do you expect that the above proposals could improve your experience?
- What modifications to the asyncio stdlib are required to achieve this goal? How could we minimize the changes to the stdlib asyncio?