Make asyncio eager task factory default

asvetlov · December 24, 2024, 1:00pm

Currently, asyncio uses a lazy task factory by default: a task’s coroutine starts executing on the next loop iteration.
The optional asyncio.eager_task_factory() was added in Python 3.12. It generally makes code a little faster, but eager factories are not 100% compatible with lazy ones, especially if a test relies on deferred execution.

I propose making eager factory default:

loop.create_task() calls asyncio.Task(..., eager_start=True).
Add create_lazy_task_factory() and lazy_task_factory() functions to optionally switch back to the previous behavior if needed.
Python test suite works with eager tasks, but a test or test case could install lazy factory if making the test green requires a lot of changes.
Third parties could be affected if their test suite relies on lazy behavior; they could also use lazy_task_factory if needed.
The change is safe; the current code keeps working as is in 99.99% of cases. Only very specific scenarios could show the difference between lazy and eager behavior.

As a side effect, the new default is easier for users to understand. I recall discussions about code like the following:

task = asyncio.create_task(f())
task.cancel()

With lazy tasks, f() is never executed before it is cancelled, so a try/except block inside this async function doesn’t catch CancelledError. To work around this confusing behavior, we suggest either installing the eager factory or putting await asyncio.sleep(0) just after the create_task() call to give f() a chance to start executing.
I believe that changing the default behavior could avoid this misunderstanding for asyncio users.
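The confusion described above can be reproduced with a short script. This is a minimal sketch that assumes the default (lazy) task factory; the `events` list and the `cancel()` helper are illustrative names, not part of the proposal.

```python
import asyncio


async def f(events):
    try:
        events.append("started")
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Only reachable if f() actually started running before cancellation.
        events.append("caught CancelledError")
        raise


async def cancel(yield_first):
    events = []
    task = asyncio.create_task(f(events))
    if yield_first:
        # Give f() one loop iteration so it reaches its first await.
        await asyncio.sleep(0)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events


immediate = asyncio.run(cancel(yield_first=False))
after_yield = asyncio.run(cancel(yield_first=True))
print(immediate)    # [] : the body of f() never ran
print(after_yield)  # ['started', 'caught CancelledError']
```

With the lazy default, the first call never enters the body of f(), so its except block has nothing to catch; the second call shows the sleep(0) workaround.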

@yselivanov @kumaraditya303 @guido do you have objections?

Tinche · December 24, 2024, 6:59pm

I’m in favor of this since it removes a foot gun - the performance gain itself probably wouldn’t make it worth it. So a +1 from me.

graingert · December 25, 2024, 9:14am

If this does happen, can we get an asyncio.create_task(coro, eager_start=False) to opt out in code that’s run under code that’s opted in?

asvetlov · December 25, 2024, 10:24am

I’m not sure; it requires a separate discussion.
task = asyncio.lazy_task_factory(loop, coro) should work though.

asvetlov · December 25, 2024, 12:32pm

Also, an eager task can easily be converted into a lazy one by adding await asyncio.sleep(0) at the very beginning of the called async function.
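A rough illustration of that conversion, assuming Python 3.12+ (where asyncio.eager_task_factory exists). The function names and the `log` list are made up for the example.

```python
import asyncio


async def starts_immediately(log):
    log.append("body")


async def starts_later(log):
    # Yield to the loop once, so the rest of the body is deferred
    # even when the task is started eagerly.
    await asyncio.sleep(0)
    log.append("body")


async def spawn(fn):
    # Opt in to eager tasks on the running loop (Python 3.12+).
    asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory)
    log = []
    task = asyncio.create_task(fn(log))
    log.append("after create_task")
    await task
    return log
```

Under the eager factory, `asyncio.run(spawn(starts_immediately))` records "body" before "after create_task", while `asyncio.run(spawn(starts_later))` records them in the opposite order, which matches the lazy behavior.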

graingert · December 25, 2024, 1:26pm

The issue is not knowing if it will be eager or lazy in advance, so not knowing if you should delay or not. We also want to keep the passed in coro as the coro of the task to keep current_task.get_coro() working

graingert · December 25, 2024, 1:30pm

We don’t want to call the task factory directly, we still want to go via the set task factory on the loop because there could be other side effects there. It’s used to add extra features to tasks or add extra debug information and we want to keep supporting that. We just want the eagerness toggleable on create
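To make the concern concrete, here is a small sketch of a task factory with a side effect, and of what is lost when the factory is bypassed. `make_tracking_factory` and `worker` are hypothetical names used only for illustration.

```python
import asyncio


def make_tracking_factory(created):
    """Return a task factory that records every coroutine it wraps."""

    def factory(loop, coro, **kwargs):
        created.append(coro.__qualname__)  # the factory's side effect
        return asyncio.Task(coro, loop=loop, **kwargs)

    return factory


async def worker():
    return "done"


async def main():
    created = []
    loop = asyncio.get_running_loop()
    loop.set_task_factory(make_tracking_factory(created))

    via_loop = asyncio.create_task(worker())    # goes through the installed factory
    direct = asyncio.Task(worker(), loop=loop)  # bypasses it: nothing is recorded
    await via_loop
    await direct
    return list(created)


created_names = asyncio.run(main())
print(created_names)  # ['worker'] : only the task created via the loop
```

Only the task created through the loop's factory is recorded, which is why a per-call eagerness toggle would need to pass through the installed factory instead of replacing it.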

asvetlov · December 25, 2024, 1:43pm

Why? What is your use case? Could it be solved by await asyncio.sleep(0)?
Asyncio supports pluggable tasks: a third-party event loop can provide its own tasks that do not inherit from asyncio.Task. IIRC tornado used to support custom tasks; maybe some other library does as well. Such custom tasks might not accept an eager_start argument. I see this as the main barrier to adding this feature to asyncio.create_task(). Maybe we can do it without breaking backward compatibility, I don’t know. Let’s eat the elephant piece by piece.

guido · December 25, 2024, 2:31pm

I definitely agree that eager tasks are the future and eventually we should just deprecate and then remove the option to use lazy tasks.

But I think it’s too soon to turn it on by default in 3.14. I would want a super simple way for users to decide whether to use lazy or eager, probably at event loop creation (maybe a new flag to asyncio.run()?).

We can then advertise this flag, and do a careful deprecation cycle.

Rosuav · December 25, 2024, 2:42pm

The docs currently recommend setting the task factory. How would such a super-simple way interact with other task factories?

(I’ve never used custom task types so I have no idea what the consequences would be. All I know is, that one-liner from the docs does indeed make Python behave the same way other languages do.)

asvetlov · December 25, 2024, 4:03pm

A flag sounds good.
Do I understand you correctly?

1. Start with def run(..., eager_tasks=False).
2. Eventually (py3.16?) deprecate the False flag value and push people to use asyncio.run(..., eager_tasks=True) everywhere.
3. Later (py3.18?) make eager_tasks=True the default value.
4. Deprecate passing eager_tasks at all in py3.20.
5. Drop the parameter entirely.

Could we omit some steps?

asvetlov · December 25, 2024, 4:06pm

asyncio.run() could install the proper task factory. If a user overrides the factory in their own code, that’s totally OK.
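As a rough sketch of how such a flag might behave, the wrapper below emulates it with APIs that already exist in Python 3.12+ (asyncio.Runner and asyncio.eager_task_factory). The `run()` function and its `eager_tasks` parameter are hypothetical stand-ins for the proposed flag, not an existing asyncio API.

```python
import asyncio


def run(main, *, eager_tasks=False):
    """Hypothetical stand-in for the proposed asyncio.run(..., eager_tasks=...)."""

    def loop_factory():
        loop = asyncio.new_event_loop()
        if eager_tasks:
            # Install the stock eager factory; user code may still override it.
            loop.set_task_factory(asyncio.eager_task_factory)
        return loop

    with asyncio.Runner(loop_factory=loop_factory) as runner:
        return runner.run(main)


async def demo():
    order = []

    async def child():
        order.append("child")

    task = asyncio.create_task(child())
    order.append("parent")
    await task
    return order
```

`run(demo())` returns `["parent", "child"]`, while `run(demo(), eager_tasks=True)` returns `["child", "parent"]`, because the child runs inside create_task() before control returns to the parent.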

guido · December 25, 2024, 4:54pm

We may have to allow eager_tasks=True forever. It doesn’t hurt anyone. eager_tasks=False will first be deprecated and eventually will be an error.

asvetlov · December 25, 2024, 8:17pm

BTW, I’ve switched all of CPython to eager tasks locally.
Tests failed, as expected.
To make the test suite green again I added a dozen await asyncio.sleep(0) lines in the ./Lib/test folder.

The only thing that was really broken: the asyncio repl (./python -m asyncio).
It depends on a very exact contextvars setup procedure; a one-line fix makes the repl work again. The fix is backward compatible and works safely with both lazy and eager tasks.

So, I think that the flip to eager tasks is safe, more or less.
But I agree with the migration plan proposed by @guido, it is much safer for the community.

graingert · December 25, 2024, 9:11pm

the usecase is anyio - we want the start_soon behaviour to match trio’s. We don’t want to introduce an await asyncio.sleep(0) to the user’s coroutine because we want asyncio.current_task().get_coro() to return that coroutine. We also don’t want to always introduce a sleep(0) because on lazy tasks we get two sleep 0s when we only want exactly 1
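The get_coro() concern can be shown with a short sketch. `deferred()` is a hypothetical wrapper of the kind a library might be tempted to add; it is not anyio's implementation.

```python
import asyncio


async def user_coro():
    # Report which coroutine object the current task believes it is running.
    return asyncio.current_task().get_coro()


async def deferred(coro):
    # Hypothetical wrapper: force one loop iteration, then run the user's coroutine.
    await asyncio.sleep(0)
    return await coro


async def main():
    direct = user_coro()
    seen_direct = await asyncio.create_task(direct)

    inner = user_coro()
    wrapper = deferred(inner)
    seen_wrapped = await asyncio.create_task(wrapper)

    return {
        "direct task reports user coroutine": seen_direct is direct,
        "wrapped task reports wrapper": seen_wrapped is wrapper,
        "wrapped task reports user coroutine": seen_wrapped is inner,
    }


result = asyncio.run(main())
print(result)
```

The last entry is False: once the coroutine is wrapped, get_coro() returns the wrapper instead of the user's coroutine, which is the behavior the post wants to avoid.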

graingert · December 25, 2024, 9:17pm

I think it would be good to start by switching on eager tasks with the uvicorn, fastapi, starlette, litestar, aiohttp and httpx, jupyter, tornado etc test suites (we’re working on it for anyio but it looks like a big job) and once they’re all working start the deprecation

kumaraditya303 · December 26, 2024, 9:10am

I agree that in future eager tasks should become the default and like the idea of adding a new eager_tasks flag to asyncio.run.

From a performance standpoint, in very large applications eager tasks may cause performance issues: if all tasks use them by default, it will put a lot of pressure on the GC, because the eager_tasks set is currently a strong set

github.com

python/cpython/blob/5c814c83cdd3dc42bd9682106ffb7ade7ce6b5b3/Modules/_asynciomodule.c#L109


      
          
          /* Dictionary containing tasks that are currently active in
             all running event loops.  {EventLoop: Task} */
          PyObject *current_tasks;
          
          /* WeakSet containing scheduled 3rd party tasks which don't
             inherit from native asyncio.Task */
          PyObject *non_asyncio_tasks;
          
          /* Set containing all eagerly executing tasks. */
          PyObject *eager_tasks;
          
          /* An isinstance type cache for the 'is_coroutine()' function. */
          PyObject *iscoroutine_typecache;
          
          /* Imports from asyncio.events. */
          PyObject *asyncio_get_event_loop_policy;
          
          /* Imports from asyncio.base_futures. */
          PyObject *asyncio_future_repr_func;

and in the future, with free-threading, there would be lock contention around this global set. I am thinking of replacing this global set with a linked-list implementation, as I did for regular tasks in GH-107803: double linked list implementation for asyncio tasks (GH-10… · python/cpython@4717aaa).

I can look more into this after finishing the policy deprecation which I am currently working on.

guido · December 26, 2024, 5:55pm

This makes me realize that global collections anywhere in asyncio are probably a bad idea under free-threading. This is off-topic here, but I think this is a strong argument for replacing global collections with per-loop collections (IIRC @yselivanov was resisting that in a PR somewhere).

asvetlov · December 26, 2024, 7:38pm

Looks like we have a consensus that global collections should be moved into the loop instance.
But we cannot just move them; for backward compatibility we should keep the globals as a last resort if a loop implementation doesn’t support it, right?

guido · December 26, 2024, 8:37pm

You mean to support older versions of e.g. uvloop?