Can I get a second opinion on an asyncio issue?

guido · August 24, 2022, 8:43pm

In Tracking the source of cancellation in tasks - #6 by achimnol there’s a discussion about debuggability of task cancellations.

The OP is asking to “un-deprecate” the cancel message and, moreover, to add a default behavior that puts the caller’s file/line information in the cancel message, so that whenever a task is cancelled we have at least some idea of the reason for the cancellation. The OP claims this helps with debugging “mystery cancellations” and other situations. Note that the caller doesn’t have an exception object that can be chained, and the CancelledError is instantiated much later, when it must be thrown into the task’s coroutine.

I personally simply don’t have enough experience writing and debugging asyncio applications to have a sense for how important it is to improve the debuggability of cancellations.

achimnol · August 25, 2022, 3:23am

I encountered another case at one of my customer sites this morning: too many asyncio tasks were created and they caused the server to stop working.
There are many potential causes of this, such as network/disk failures, 3rd-party component (e.g., database) failures, security rule changes in customer firewalls, bugs in our code, etc.
First, we need to determine the location of the problem: the task code or the task creation code.

The above screenshot is taken from aiomonitor, showing the following information:

The number of tasks created so far
The list of tasks currently attached to the event loop with their repr()
The stack trace and status (“PENDING”) of each task

aiomonitor also provides extra functionality to cancel a task or send a signal to the entire process.
Our support staff struggled with one missing piece of information: where were these tasks created?

Here are my ideas to improve the visibility and debuggability of asyncio to help them.
Currently, we have the following facilities:

Tracking task cancellations (the original post I’ve written)
- Task.cancel(msg)
Tracking task creations (the situation described above with aiomonitor)
- asyncio.create_task(..., name)
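As a minimal sketch of how these two existing facilities look in practice (the names `worker`, `observed`, `"db-poller"`, and `"shutdown"` are made up for illustration, and this assumes Python 3.9 or later, where `Task.cancel()` accepts `msg`):

```python
import asyncio

# Hypothetical log of (task name, CancelledError args) pairs seen by the worker.
observed: list[tuple[str, tuple]] = []


async def worker() -> None:
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError as exc:
        # The task name comes from create_task(name=...); the message passed
        # to Task.cancel(msg=...) is carried in the exception's args.
        task = asyncio.current_task()
        observed.append((task.get_name(), exc.args))
        raise  # keep propagating the cancellation


async def main() -> None:
    task = asyncio.create_task(worker(), name="db-poller")
    await asyncio.sleep(0)  # give the worker a chance to start waiting
    task.cancel(msg="shutdown")
    try:
        await task
    except asyncio.CancelledError:
        pass


asyncio.run(main())
```

Both pieces of information are only present because the caller chose to supply them by hand.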

Here are the issues with them:

It will be helpful for debugging this kind of situation if we deliberately set msg and name parameters everywhere.
- In reality, this cannot be extended to 3rd-party libraries that I cannot control.
- It is the programmer’s burden to specify them everywhere.
The msg and name arguments are written by humans and we cannot expect consistent formatting and fidelity of the information they have for programmatic use.
- Example of programmatic use
  - Run different cleanup procedures or leave different logs in the task by the source of cancellation, such as “timeout” or “shutdown”
  - Collect statistics of cancellation triggers and task creations by their locations in aiomonitor
There is no chained information when tasks are created inside other tasks or when a cancellation is triggered by another cancellation.

Here is my initial suggestion, as described by Guido:

Let’s attach the caller stack/location information to asyncio.CancelledError when calling Task.cancel(), either reusing the msg argument, adding a new attribute, or adding a hook like sys.audit()
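To make the idea concrete, here is a rough user-level sketch of the “reuse the msg argument” variant. It is not how asyncio behaves today and not a proposed implementation; `cancel_with_origin`, `messages`, and the message format are hypothetical names chosen for this example.

```python
import asyncio
import traceback

# Hypothetical record of the cancel messages the worker received.
messages: list[str] = []


def cancel_with_origin(task: asyncio.Task, reason: str = "unspecified") -> bool:
    """Cancel `task`, embedding the caller's file/line in the cancel message."""
    # extract_stack(limit=2) returns [caller, this function]; take the caller.
    caller = traceback.extract_stack(limit=2)[0]
    origin = f"{caller.filename}:{caller.lineno} in {caller.name}"
    return task.cancel(msg=f"{reason} [cancelled at {origin}]")


async def worker() -> None:
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError as exc:
        messages.append(exc.args[0] if exc.args else "")
        raise


async def shutdown(task: asyncio.Task) -> None:
    cancel_with_origin(task, reason="shutdown")


async def main() -> None:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let the worker start waiting
    await shutdown(task)
    try:
        await task
    except asyncio.CancelledError:
        pass


asyncio.run(main())
```

A wrapper like this only helps for call sites that use it, which is why I am asking for the behavior to live in asyncio itself: 3rd-party code calling `Task.cancel()` directly would still leave no trace.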

Now I observe that the same measure is required for task creation.
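For task creation, something similar can be approximated today with `loop.set_task_factory()`. The following is only a sketch under my own assumptions: `recording_task_factory` and `creation_sites` are hypothetical names, and skipping frames by the asyncio package directory is a heuristic, not an official mechanism.

```python
import asyncio
import os
import traceback
import weakref

_ASYNCIO_DIR = os.path.dirname(asyncio.__file__)

# Hypothetical side table: task -> "file:line in function" of its creator.
# Weak keys so that finished tasks can still be garbage collected.
creation_sites: "weakref.WeakKeyDictionary[asyncio.Task, str]" = (
    weakref.WeakKeyDictionary()
)


def recording_task_factory(loop, coro, **kwargs):
    # Newer Python versions pass extra keyword arguments (e.g. context);
    # forward them unchanged to the Task constructor.
    task = asyncio.Task(coro, loop=loop, **kwargs)
    # Drop this factory's own frame, then walk outward past asyncio internals
    # to the first frame that belongs to user code.
    for frame in reversed(traceback.extract_stack()[:-1]):
        if not frame.filename.startswith(_ASYNCIO_DIR):
            creation_sites[task] = f"{frame.filename}:{frame.lineno} in {frame.name}"
            break
    return task


async def job() -> None:
    await asyncio.sleep(0)


async def spawn_jobs() -> list:
    tasks = []
    for _ in range(3):
        tasks.append(asyncio.create_task(job()))
    return tasks


async def main() -> list:
    asyncio.get_running_loop().set_task_factory(recording_task_factory)
    tasks = await spawn_jobs()
    sites = [creation_sites[t] for t in tasks]
    await asyncio.gather(*tasks)
    return sites


sites = asyncio.run(main())
```

Capturing a stack on every task creation has a cost, so in a real deployment this would probably need to be opt-in (for example, only in debug mode). A tool like aiomonitor could then group pending tasks by their recorded creation site.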

Additionally, I’m now going to start a side project with my friend to enhance aiomonitor in various ways. I’ll create a new thread about that topic.