As an APM vendor (Datadog), I’d love to be able to provide deeper insights/metrics about the asyncio event loop for our customers.
Some of the metrics I have been thinking about so far are:
- Over the lifetime of the process
- total number of tasks created/finished
- total number of context switches
- min/avg/p95/p99/max/etc blocking time per-task run
- Per-task
- total number of context switches
- total blocking vs idle time
I’d also like to be able to associate the per-task metrics on any active APM tracing spans (“span” = duration + metadata measuring a function or arbitrary code block). This way I can show when a traced function is responsible for blocking the event loop for a certain period of time (similar concept to the existing debug logging in asyncio).
I wasn’t able to figure out a great way to get these metrics from the existing implementation/API (if I missed something, please let me know!).
I made an attempt at a reference implementation to validate one potential approach.
This approach adds sys.set_async_monitoring_hooks
/sys.get_async_monitoring_hooks
which allows users to add hooks for the following events:
-
task_enter(task)
- Called whenModules/_asynciomodule.c:enter_task
is called, a task is being context switched to. -
task_leave(task)
- Called whenModules/_asynciomodule.c:leave_task
is called, a task is being context switched away from. -
task_register(task)
- Called whenModules/_asynciomodule.c:register_task
is called. -
task_unregister(task)
- Called whenModules/_asynciomodule.c:unregister_task
is called.
task_register
and task_unregister
don’t appear to be called as part of “normal” standard library only asyncio
usage, so I am not convinced of their value here, but I added them to play with them.
Example usage
This is a very rudimentary example to show the high level API usage.
import asyncio
import sys
def task_enter(task: asyncio.Task):
print("Entering task:", task.get_name())
def task_leave(task: asyncio.Task):
if task.done():
print("Task completed:", task.get_name())
else:
print("Leaving task:", task.get_name())
sys.set_async_monitoring_hooks(
task_register=None,
task_enter=task_enter,
task_leave=task_leave,
task_unregister=None,
)
async def task():
pass
async def main():
await asyncio.gather(*[task() for _ in range(100)])
if __name__ == "__main__":
loop = asyncio.new_event_loop()
loop.run_until_complete(main())
Why event hooks vs built-in metrics?
An obvious alternative here is to have asyncio
maintain a set of metrics and expose them via an API that can be polled when the data is needed.
I think this is a good idea in the long run, but I thought adding the hooks could be a good intermediate step which allows the community to try generating their own metrics which we can then validate exactly which are valuable to have built-in.
As well, for some of my own specific use cases built-in, metrics would not be enough. For an APM tracing product, I’d like to be able to compute/access these metrics on a per-task basis while the task is still active to associate with any current active spans (I basically need to access contextvars from the currently active task).
Prior Art
Greenlet offers a settrace hook that allows you to register a function that gets called every time a context switch occurs.
Example:
import greenlet
def callback(event, args):
if event in ('switch', 'throw'):
origin, target = args
print("Transfer from %s to %s with %s"
% (origin, target, event))
greenlet.settrace(callback)
I may have missed something, but I had trouble consistently getting the current and next tasks to be able to offer a similar API with a single hook (e.g. callback(from_task, to_task)
).