Enhancement proposals to aiomonitor

achimnol · August 25, 2022, 7:00am

Along with the task cancellation/creation tracking thread, I’d like to propose some enhancement ideas for aiomonitor, which has been a great help to me in debugging production issues.

UX

Currently, aiomonitor opens a local TCP port that aiomonitor.console can attach to in order to run inspection commands, with a direct REPL over the socket.

When there are thousands of tasks, simply dumping every task’s repr() does not help much.
(This is what happened at my customer’s site!)

Key features
- A scrollable / orderable / filterable view of active tasks
- A statistics view of active tasks (such as the number of tasks) by the location where they are created
- A scrollable / orderable / filterable view of (long-running) task groups
  - All its child tasks
- A scrollable / orderable / filterable view of async generators
- An active task inspection view with:
  - Its stack trace
  - The “creation source” traceback (where create_task() is called, possibly chained)
  - The location and line of the “await” statement or the event loop’s handle where it’s currently blocked on
- A scrollable / orderable / filterable view of cancelled tasks
- A cancelled task inspection view with:
  - Its stack trace (at which point it is cancelled)
  - The “creation source” traceback (same as above)
  - The “cancellation source” traceback (where .cancel() is called, possibly chained)
  - To prevent memory leaks, we could just copy this information to aiomonitor upon actual cancellation and expire it after a configured timeout or once the number of historical tasks exceeds a limit
- Filter / order / group conditions: the creation/cancellation source location, the task names, and the timestamps when they are created, terminated, or cancelled

Reference
- tokio_console - Rust
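
To make the statistics view above concrete, here is a minimal sketch that counts live tasks by creation location using only the stdlib. The helper names (creation_location, count_tasks_by_location) are made up for illustration. It relies on the private Task._source_traceback attribute, which is only filled in asyncio debug mode, and falls back to the coroutine's definition site otherwise.

```python
import asyncio
import os
from collections import Counter

ASYNCIO_DIR = os.path.dirname(asyncio.__file__)


def creation_location(task):
    """Best-effort 'file:line' of the code that created the task."""
    # _source_traceback is private and only populated in asyncio debug mode.
    stack = getattr(task, "_source_traceback", None) or []
    for frame in reversed(stack):
        # Skip frames inside asyncio itself to reach the caller's code.
        if not frame.filename.startswith(ASYNCIO_DIR):
            return f"{frame.filename}:{frame.lineno}"
    # Fallback (no debug mode): where the coroutine function is defined.
    code = getattr(task.get_coro(), "cr_code", None)
    if code is not None:
        return f"{code.co_filename}:{code.co_firstlineno}"
    return "<unknown>"


def count_tasks_by_location():
    """Group the running loop's unfinished tasks by creation location."""
    return Counter(creation_location(t) for t in asyncio.all_tasks())


async def worker():
    await asyncio.sleep(60)


async def demo():
    tasks = [asyncio.create_task(worker()) for _ in range(3)]
    counts = count_tasks_by_location()
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return counts
```

Running asyncio.run(demo(), debug=True) groups the three workers under the line that spawned them, which is the kind of summary that stays readable with thousands of tasks.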

Extensions

Let’s reuse and extend the above views with specialization to various asyncio-based libraries and frameworks such as:

FastAPI & aiohttp web request handlers
aiohttp ClientSession
asyncio TaskGroup
Synchronization primitives such as asyncio Queue, Lock, Semaphore / janus Queue
- The list of tasks that are blocked on each object or that have acquired it
Transaction blocks of databases (SQLAlchemy / databases / asyncpg / aiosqlite / aiomysql / etc.)
Connection pools of databases (redis-py / SQLAlchemy / databases / etc.)

I think we could adopt an entry-point-based plugin system like pytest’s to auto-detect and enable such additional integrations.
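
As a rough sketch of that idea: pytest discovers installed plugins through the pytest11 entry-point group, and the same mechanism is available in the stdlib via importlib.metadata. The group name "aiomonitor.plugins" and the function discover_plugins below are hypothetical; aiomonitor does not define them.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group that integration packages would register under.
PLUGIN_GROUP = "aiomonitor.plugins"


def discover_plugins(group=PLUGIN_GROUP):
    """Load every object registered under the given entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict mapping group name to entry points.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

An integration package would then declare its entry point in its packaging metadata, and the monitor would pick it up at startup without any explicit import in user code.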

About the stdlib support

To keep track of intrinsic objects such as asyncio.Queue and asyncio.Lock used by arbitrary asyncio-based code and libraries, we need a registry that holds (weak) references to them. I’m not sure how to achieve this solely in aiomonitor without modifying the stdlib asyncio.
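
A minimal sketch of such a weak registry, assuming the application opts in by using a subclass (TrackedLock and live_locks are hypothetical names). It also shows the limitation described above: locks created by third-party code through plain asyncio.Lock would never be registered.

```python
import asyncio
import gc
import weakref

# Weak references only: the registry never keeps a lock alive by itself.
_registry = weakref.WeakSet()


class TrackedLock(asyncio.Lock):
    """An asyncio.Lock that registers itself on creation (opt-in)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _registry.add(self)


def live_locks():
    """Return the tracked locks that are still referenced somewhere."""
    return list(_registry)


async def demo():
    lock = TrackedLock()
    before = len(live_locks())
    del lock          # drop the only strong reference
    gc.collect()
    return before, len(live_locks())
```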

Also, as I mentioned in other threads, we need some hooks in asyncio to receive task creation, cancellation, and completion events. AFAIK, asyncio already allows an alternative event loop implementation to use a custom Task factory, but I’d like to keep using existing event loops (the vanilla one or uvloop) while extending Task to provide hooks for aiomonitor.
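
To illustrate what “extending Task to provide hooks” might look like, here is a sketch of a Task subclass that records where .cancel() was first called. TracedTask and cancel_source are hypothetical names, and this only sees cancellations that go through the overridden method on tasks created as this subclass.

```python
import asyncio
import sys
import traceback


class TracedTask(asyncio.Task):
    """A Task that remembers the call stack of its first cancel() call."""

    def __init__(self, coro, **kwargs):
        super().__init__(coro, **kwargs)
        self.cancel_source = None

    def cancel(self, *args, **kwargs):
        if self.cancel_source is None:
            # Start from the caller's frame so this method is not included.
            self.cancel_source = traceback.extract_stack(sys._getframe(1))
        return super().cancel(*args, **kwargs)


async def demo():
    async def sleeper():
        await asyncio.sleep(60)

    task = TracedTask(sleeper())
    await asyncio.sleep(0)  # let the task start running
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task
```

After asyncio.run(demo()), the last frame of task.cancel_source points at the line in demo() that called cancel(), which is the “cancellation source” a monitor could display.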

My questions

What do you think about my proposed enhancements to aiomonitor?
What are your experiences with debugging complex asyncio-based applications in production?
Do you expect that the above proposals could improve your experience?
What modifications to the asyncio stdlib are required to achieve this goal? How could we minimize the changes to the stdlib asyncio?

achimnol · August 25, 2022, 4:15pm

I’m now experimenting by forking aiomonitor.
In asyncio’s debug mode, Task._source_traceback holds the stack frames where the task was created, and repr(Task) also includes this information, though this difference is not explicitly documented. _source_traceback is provided by asyncio.coroutines.CoroWrapper and asyncio.events.Handle.
It would be much better if we have a chain of _source_traceback, of course.

With asyncio debug mode enabled, repr() of a task additionally prints “created at …”. Spawning additional tasks by calling this example server with curl confirms that the repr shows where create_task() was called.
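
The effect can be reproduced without a server. The following is a small stand-alone sketch (not the example server mentioned above) that compares a task’s repr() with and without debug mode; note that _source_traceback is a private attribute and may change between Python versions.

```python
import asyncio


async def worker():
    await asyncio.sleep(0)


async def main():
    task = asyncio.create_task(worker())
    text = repr(task)
    # Private attribute: a stack summary in debug mode, None otherwise.
    has_source = task._source_traceback is not None
    await task
    return text, has_source


def demo(debug):
    return asyncio.run(main(), debug=debug)
```

Calling demo(True) returns a repr that contains “created at” followed by a file and line number, while demo(False) returns one without it.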

Here is my humble “tc” / “trace_creation” command result:

In my case, I’d like to inspect production systems that are currently running without the debug flags, and without stopping/restarting the application to apply the debug flag. Regarding this, Q1. How large is the impact of enabling the asyncio debug mode in terms of performance and memory usage?
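
On the “without restarting” part: loop.set_debug() can be called on a running loop, so one possibility is to flip it from the monitor. The sketch below only demonstrates that tasks created after the switch carry a creation traceback while earlier ones do not; it does not measure the performance or memory cost asked about in Q1.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    before = asyncio.create_task(asyncio.sleep(0))
    loop.set_debug(True)   # switch debug mode on while the loop is running
    after = asyncio.create_task(asyncio.sleep(0))
    # _source_traceback is private; it is only set for tasks created in debug mode.
    result = (before._source_traceback is None,
              after._source_traceback is not None)
    await asyncio.gather(before, after)
    loop.set_debug(False)
    return result


def demo():
    return asyncio.run(main(), debug=False)
```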

I’m still looking at how I could implement chaining of task creation tracebacks. asyncio.BaseEventLoop.set_task_factory() seems to allow setting a custom task factory without replacing the underlying event loop. I’ll give it a try and create custom tasks that keep chained traceback information about task creation.
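
A minimal sketch of how such chaining could work with a task factory, under my own assumptions rather than as the actual aiomonitor change: the factory captures the current stack and prepends it to the chain already recorded for the task that is doing the spawning. The names creation_chains and chained_task_factory are hypothetical.

```python
import asyncio
import sys
import traceback
import weakref

# task -> list of creation stacks, newest first; weak keys avoid leaking tasks.
creation_chains = weakref.WeakKeyDictionary()


def chained_task_factory(loop, coro, **kwargs):
    """Create a normal Task and record its chained creation tracebacks."""
    # Extra keyword arguments (name, context, ...) depend on the Python version.
    task = asyncio.Task(coro, loop=loop, **kwargs)
    stack = traceback.extract_stack(sys._getframe(1))
    parent = asyncio.current_task(loop)
    parent_chain = creation_chains.get(parent, []) if parent is not None else []
    creation_chains[task] = [stack] + parent_chain
    return task


async def leaf():
    await asyncio.sleep(0)


async def middle():
    task = asyncio.create_task(leaf())
    await task
    return task


async def main():
    asyncio.get_running_loop().set_task_factory(chained_task_factory)
    mid_task = asyncio.create_task(middle())
    leaf_task = await mid_task
    # The main task predates the factory, so mid_task's chain has one entry.
    return len(creation_chains[mid_task]), len(creation_chains[leaf_task])
```

Because the factory is installed on the existing loop, this works with the default event loop and does not depend on debug mode, at the cost of extracting a stack for every task created.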

achimnol · August 26, 2022, 4:28am

And… I could do it.

Note that the current codebase of aiomonitor seems to be a bit old (the last release was in 2019) and requires updates for recent Python versions (e.g., the deprecation of loop arguments).