I think we should update the low-level loop.create_task function so that it supports eager_start=False and eager_start=True, defaulting to a soon-to-be-deprecated eager_start=None that uses the task factory default. This is needed so that anyio’s TaskGroup can pass eager_start=False and asyncio’s create_and_start can pass eager_start=True.
There’s no need to add new arguments to the high-level TaskGroup.create_task and asyncio.create_task.
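For concreteness, a hypothetical sketch of how the proposed flag might be used; the eager_start parameter on loop.create_task is the proposal itself, not an existing API:

# anyio's TaskGroup would always opt out of eager starts:
task = loop.create_task(coro, eager_start=False)

# asyncio's create_and_start would always opt in:
task = loop.create_task(coro, eager_start=True)

# eager_start=None (the default, to be deprecated) defers to the task factory:
task = loop.create_task(coro)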
@graingert forcing users to always specify eager_start=False in each create_task() doesn’t sound friendly.
Also, if you propose enforcing the eager_start argument, why do you think this parameter is not required for TaskGroup.create_task?
From my perspective, await create_and_start(...) and create(..., eager_start=True) both introduce task coloring. If we need colors, I prefer the await create... version proposed by @Tinche.
But, if I understand @guido correctly, he wants to keep only the coarse-grained task creator; I like that for its simplicity.
I’m not proposing to enforce eager_start=False on all tasks, just that you can either set the task factory or pass the low-level flag to loop.create_task if you’re building a structured concurrency object.
Yeah, @guido wants only the coarse-grained option, but we need the opt-out to be fine-grained in anyio: we need to associate metadata with a task, such as its name and CancelScope heritage, before it calls coro.send(). I have a workaround, but it’s fiendishly complicated.
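A minimal sketch of that ordering problem; the _metadata attribute is purely illustrative, not anyio’s actual mechanism:

import asyncio

async def child():
    # Under an eager factory this runs *inside* create_task(),
    # before the caller has attached its metadata.
    task = asyncio.current_task()
    print(getattr(task, "_metadata", "missing"))

async def main():
    loop = asyncio.get_running_loop()
    loop.set_task_factory(asyncio.eager_task_factory)
    t = loop.create_task(child())  # prints "missing" right here
    t._metadata = "too late"       # a lazy task would see this in time
    await t

asyncio.run(main())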
Having await asyncio.TaskGroup.create_and_start(...) be the only way to create eager tasks, with eager_task_factory deprecated, also works, and I’m +1 on that.
@graingert the task name problem is fixable, since eager_task_factory already accepts a name argument.
Could you please describe the CancelScope problem? Do you need the task instance before the task starts its execution?
I’d like a clarification about the function call stack when eager tasks are in use.
Does the new coro run on top of the caller’s stack, or on a separate stack?
In other words, does the 1000 stack frames limit apply in the case of async recursion?
import asyncio

async def fib(n):
    if n < 2:
        return n
    a = asyncio.create_task(fib(n - 1))
    b = asyncio.create_task(fib(n - 2))
    return await a + await b
Regarding stack usage, lazy and eager tasks behave the same. The only difference is that a lazy task schedules its first step with loop.call_soon(self.coro.send, None), while an eager one calls self.coro.send(None) immediately.
Both operate on self.coro, which is a coroutine instance (a kind of Python generator) with a Python frame object embedded in the generator object.
Good point. There is indeed a difference! I tried it with factorial (which exercises recursion without doubling the load at each level) and without eager tasks, any depth is calculated correctly, no matter how low the recursion limit is set. (And, presumably, a bit slower, but I didn’t try to measure that.)
I tried the same with eager tasks, and it takes around 4 stack levels per recursion level (plus a small fixed offset). This means that with a recursion limit of 1000 (the default), you could compute factorial up to 248.
I imagine for a typical web server this isn’t a problem (and on most large machines you can safely make the recursion limit much higher), but for esoteric tests like fib() or fac() it does indeed matter.
Here’s my code:
import asyncio
import sys

async def fact(n):
    if n <= 1:
        return n
    f = await asyncio.create_task(fact(n - 1))
    return f * n

async def main():
    sys.setrecursionlimit(1000)
    loop = asyncio.get_running_loop()
    # loop.set_task_factory(asyncio.eager_task_factory)  # uncomment to test eager tasks
    print(await fact(249))

asyncio.run(main())
Sorry for being misleading. An eager task stores its frame in a coroutine, but in the example it calls create_task, which uses a normal stack frame; that in turn creates another coroutine that needs a regular stack frame, and so on.
There is recursion here, but I agree that typical asyncio code is usually written without recursion in any sense.
If the code has a real suspension point, the problem goes away.
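For example, inserting a real suspension point lets each eager child park itself on the event loop before recursing, so the stack unwinds at every level. A sketch, reusing the factorial example above:

import asyncio

async def fact(n):
    if n <= 1:
        return n
    await asyncio.sleep(0)  # real suspension point: the stack unwinds here
    f = await asyncio.create_task(fact(n - 1))
    return f * n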
I think this change proposal needs a “How to teach this” section, like PEPs do.
I’m not sure about the general wording (though not self.coro.send(None): that’s too low-level and not comprehensible to regular users), plus a list of caveats, for example:
Am I correct that this also solves the stack overflow problem, since the coro that is creating the eager task can be temporarily suspended and then immediately resumed once the eager task has run up to its first suspension point?
I think the eager task factory is too large a performance gain in many typical applications to be worth dropping support for, and asking users to change how every library they use creates tasks instead isn’t appropriate.
Library code should work with any asyncio-compatible task factory; application code, or frameworks that manage event loops for users, should be what chooses the appropriate default task factory.
I don’t particularly like adding fine-grained switches here, as it probably means that task factories need to accept a kwarg to override their own default behavior.
There may be a way to avoid that by combining a task factory, an eager task factory, and a loop default, but I haven’t explored the consequences of that in depth, because I think the end goal here should be that library code works with either, while application code chooses the option that best matches the application at hand.
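As a sketch of that division of responsibility, the application entry point is where the default gets picked; asyncio.Runner is just one way to set the loop-wide default, and run_app is a hypothetical name:

import asyncio

async def main():
    ...  # application logic, calling into factory-agnostic library code

def run_app(eager: bool = True) -> None:
    # The application, not the library, decides the loop-wide default.
    with asyncio.Runner() as runner:
        if eager:
            runner.get_loop().set_task_factory(asyncio.eager_task_factory)
        runner.run(main())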
Things like early-return cache hits see a significant performance improvement with eager task factories. Things that are unlikely to ever return early, or applications whose authors adopt the philosophy that all tasks should yield (even trio, where this idea gained more popularity, has relaxed this stance), likely shouldn’t use that factory.
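A sketch of that early-return case (fetch and the module-level cache are illustrative): on a cache hit the coroutine never suspends, so an eager task finishes inside create_task() with no event-loop round trip.

import asyncio

_cache: dict[str, str] = {}

async def fetch(key: str) -> str:
    if key in _cache:          # early return: no suspension at all
        return _cache[key]
    await asyncio.sleep(0.1)   # stand-in for real I/O
    _cache[key] = f"value for {key}"
    return _cache[key]

async def main():
    asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory)
    await fetch("k")                     # warm the cache
    t = asyncio.create_task(fetch("k"))  # cache hit: completes eagerly
    print(t.done())                      # True: the task never hit the loop
    print(await t)

asyncio.run(main())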
I think that libraries like anyio shouldn’t care what the underlying behavior actually is, because it’s inherently the wrong layer of design to be making that decision at.
Honestly, depending on the library, its users may not care one bit about its performance.
IIUC there are two subtly different ways that a library can be buggy when eager tasks are turned on:
1. It might rely on the implicit atomicity of tasks, where every stretch of code between two awaits, or between an await and the beginning or end of the task’s coroutine, is an implicit critical section. Here the spawning coroutine and the spawned coroutine suddenly share a critical section, whereas with lazy tasks they would be in separate critical sections. This is a change in semantics between lazy and eager tasks that weighs on me (and has convinced me that we need to keep supporting lazy tasks for a very long time indeed). A sketch of this failure mode follows the list.
2. It might depend on the subtle semantics of calling or not calling await asyncio.sleep(0) and how it affects the order in which tasks run, etc. While I do this too to force things in tests (it’s much better than sleep(dt) for dt > 0), I think that if a library depends on this, that’s actually a bug. If you have an ordering requirement between tasks, it should be explicit in the code (e.g. using asyncio.Event), not implied by the implementation of the event loop. If testing with eager tasks finds such bugs, we’re doing the library maintainers a favor.
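Here is that sketch of the first failure mode: the assumption that the statement after create_task() runs before the child’s first step holds with lazy tasks but flips with eager ones.

import asyncio

order: list[str] = []

async def child():
    order.append("child")

async def parent():
    t = asyncio.get_running_loop().create_task(child())
    order.append("parent")  # lazy: runs before "child"; eager: runs after
    await t

async def main():
    # Set asyncio.eager_task_factory on the loop to see the order flip.
    await parent()
    print(order)  # lazy: ['parent', 'child']; eager: ['child', 'parent']

asyncio.run(main())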
In general I completely support Mike’s most recent post.
Let’s refer to this as the asyncio invariant going forward, to save us some typing.
Yes, it’ll be a large wart. Application operators face a Sophie’s choice between better performance and keeping the asyncio invariant, and library authors don’t get a choice: they can’t count on the invariant.
Alright, in the interest of moving the discussion forward, I withdraw my proposal.
I’ve been exploring this more, and I’ve found a reasonable change that fixes this properly without relying on eager/lazy semantics. That this is something to unravel rather than simply change is a bit of a self-inflicted problem when you look at the change history, and there are decisions involved that we definitely wouldn’t repeat.
I’m not sure what the right amount of time is here, but I have several comments in both work and personal projects that look like this:
async def _finalize(self) -> None:
    # WARNING: Do not allow an async context switch before the gather below
One could argue this is a case for a lock, but relying on structured concurrency to control ordering is generally something people have been able to count on, and by not using a lock here, the thing in question is more performant.
Being able to rely on that isn’t actually going away here; it’s changing to include create_task as a potential context switch. But I do need to track down all such comments and find the ones where the semantics are now possibly different, especially in library code. I also think it would be to everyone’s benefit to avoid adding further new possible async context switches unless the benefit is as large as eager tasks’ has been; it should not be a common occurrence.
This is the current state (main branch as of this post) of what’s changed in just the standard library in terms of semantics, in case it helps anyone evaluate how much has actually changed, weigh a timeline for if or when we can make eager tasks the default, figure out how to explain this to others, or just check their own code.
In terms of new places where the current task can change where it couldn’t before:

- all of the below in the “as of eager tasks” section
- any non-coroutine function that does one of the new things below while an event loop is running

Before eager tasks:

- await
- async with (both enter and exit)
- async for (per loop iteration)
- return (from a coroutine)
- yield (from a coroutine)
- raising an exception (that leaves a coroutine)

As of eager tasks, the below are included:

- constructing an asyncio.Task with eager_start=True while the associated event loop is running (see the sketch after this list)
- asyncio.create_task
- asyncio.ensure_future (when wrapping a non-future awaitable)
- asyncio.wait_for [1] (when any non-future awaitables are waited on) (ensure_future)
- asyncio.as_completed [1] (when any non-future awaitables are waited on) (ensure_future)
- asyncio.gather [1] (when any non-future awaitables are gathered) (ensure_future)
- asyncio.IocpProactor.accept (unlikely to impact most) (ensure_future on a coro)

Note: extremely unlikely to impact anyone; asyncio.create_subprocess_* won’t introduce a new yield point, as the task in the transport isn’t created prior to where there already was task switching.

[1] Triggers at construction; not a coroutine function; may not be meaningfully different for code that immediately awaits the result.
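To make the first entry in that section concrete, a small sketch showing that Task construction itself is now a point where the current task can change:

import asyncio

async def child():
    print("child runs during Task construction")

async def main():
    loop = asyncio.get_running_loop()
    t = asyncio.Task(child(), loop=loop, eager_start=True)
    print("after construction")  # printed *after* the child's first step
    await t

asyncio.run(main())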