About the task cancel messages

I’ve heard that we are going to deprecate the msg parameter of the Task.cancel() method, which was introduced in Python 3.9.

I believe it is still useful, though the API may need a redesign, even now that we no longer have to use it as a hack to keep track of cancellation scopes. For example, it could play the role that a signal number plays in a signal handler. Cancellation is a kind of “big bomb” thrown at the entire control flow of an asyncio task, and I think it is still useful for the API user’s code to know why it happened. Depending on the reason for cancellation (e.g., timeout vs. server shutdown), we could run different cleanup steps or leave different log messages.
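To make this concrete, here is a minimal sketch of the kind of usage I have in mind. It relies only on the existing msg parameter of Task.cancel(); the worker and cancel_with functions and the reason strings "timeout" and "shutdown" are made up for illustration and are not part of asyncio.

```python
import asyncio


async def worker(log):
    """Hypothetical long-running task that reacts to the cancel message."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError as exc:
        # The message passed to Task.cancel(msg=...) shows up in exc.args.
        reason = exc.args[0] if exc.args else None
        if reason == "timeout":
            log.append("timed out; skipping flush")
        elif reason == "shutdown":
            log.append("shutting down; flushing state")
        else:
            log.append("cancelled for unknown reason")
        raise  # always re-raise so the task ends up cancelled


async def cancel_with(reason):
    """Start the worker, cancel it with the given reason, return its log."""
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)  # let the worker start and reach its await
    task.cancel(msg=reason)
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log


if __name__ == "__main__":
    for reason in ("timeout", "shutdown", None):
        print(reason, asyncio.run(cancel_with(reason)))
```

The worker has to compare strings here, which is part of why I think a richer “tag” object might be a better fit than a free-form message.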

The current API just takes a single string object, but I think it could be updated to hold an arbitrary object, like a “tag”. Also, the new timeout context manager and TaskGroup should be able to specify this parameter when they cancel the target task.

What are your thoughts?

Recently an interesting paper about task cancellation has been published:
https://www.usenix.org/conference/osdi22/presentation/sethi
Although the paper did not mention Python at all, I think that Python 3.11’s asyncio already covers most issues.

In addition to the paper’s summary about cancellation patterns, I think we need to address:

  • Proper task (or task group) abstractions/extensions for long-running (open-ended) tasks, similar to my suggestion for PersistentTaskGroup (though the name may need an update)
  • Tracking the source of cancellation (probably via the path name of the module that called .cancel(), as an intrinsic argument to asyncio.CancelledError, replacing cancel messages)
  • Prevention of cancelling shielded tasks or an improved abstraction of asyncio.shield() to guarantee completion of such tasks upon event loop shutdown
  • Tracking and potentially force-terminating hanging shielded tasks with explicit signals or timeouts
  • Prevention of mid-cleanup or double cancellation of tasks that are performing cleanup in reaction to a prior cancellation, or clearer examples and descriptions in the docs demonstrating how to avoid such situations; alternatively, we may need a better abstraction for cancellation
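As an illustration of the last point, here is a small sketch of how a second cancel() can interrupt cleanup that is still running. The worker and demo functions are hypothetical names of mine, and the short sleep stands in for real cleanup work such as closing a connection.

```python
import asyncio


async def worker(events):
    """Hypothetical task whose cleanup needs to await something."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        events.append("cleanup started")
        # Stand-in for real async cleanup; a second cancel() lands here.
        await asyncio.sleep(0.05)
        events.append("cleanup finished")
        raise


async def demo(double_cancel):
    """Cancel the worker once or twice and report which steps ran."""
    events = []
    task = asyncio.create_task(worker(events))
    await asyncio.sleep(0)  # let the worker reach sleep(3600)
    task.cancel()
    await asyncio.sleep(0)  # let the worker enter its cleanup
    if double_cancel:
        task.cancel()  # arrives while cleanup is still awaiting
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events


if __name__ == "__main__":
    print(asyncio.run(demo(double_cancel=False)))
    print(asyncio.run(demo(double_cancel=True)))
```

With a single cancel both cleanup steps are recorded; with the second cancel the cleanup is cut short before "cleanup finished" is appended.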

https://docs.sqlalchemy.org/en/14/changelog/changelog_14.html#change-721169f1bc0710a5a8c4f6f2d64a73b1

This highlights the issue of:

Prevention of cancelling shielded tasks or an improved abstraction of asyncio.shield() to guarantee completion of such tasks upon event loop shutdown

I skimmed the sqlalchemy issue you linked to and it appears to be due to a behavior in anyio that tries to emulate Trio. Their fix is to add an asyncio.shield() call and they seem to be happy with it. I don’t see how this points to a need for an improved shielding abstraction.

I have a request for you: rather than pointing to an academic paper that doesn’t even mention Python and extracting a list of high-level abstract recommendations, if you want this community to take some specific action I recommend that you split the topic up into separate threads (e.g. one for each bullet point you posted) and present a specific piece of demo code that exemplifies the problem (so people can test that a proposed fix actually resolves the issue) as well as a discussion (again for each thread) explaining why the demo should work the way you say it should work rather than the way it currently works.

If you are not capable of doing that I’m afraid that you’re not going to get any traction on your suggestions (already you have posted three messages and not received any responses).


Thanks for the comment.
I will open separate threads here for each issue, with more details, when I have some time!