I think we should update the low-level loop.create_task function so that it supports eager_start=False and eager_start=True, defaulting to a soon-to-be-deprecated eager_start=None that uses the task factory default. This is needed so that anyio’s TaskGroup can pass eager_start=False and asyncio’s create_and_start can pass eager_start=True.
There’s no need to add new arguments to the high-level TaskGroup.create_task and asyncio.create_task.
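For concreteness, a hypothetical sketch of how the proposed flag might be used; the eager_start parameter on loop.create_task is the proposal itself, not an existing API:

# anyio's TaskGroup would always opt out of eager starts:
task = loop.create_task(coro, eager_start=False)

# asyncio's create_and_start would always opt in:
task = loop.create_task(coro, eager_start=True)

# eager_start=None (the default, to be deprecated) defers to the task factory:
task = loop.create_task(coro)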
@graingert forcing users to always specify eager_start=False in each create_task() doesn’t sound friendly.
Also, if you propose enforcing the eager_start argument, why do you think this parameter is not required for TaskGroup.create_task?
From my perspective, await create_and_start(...) and create(..., eager_start=True) both introduce task coloring. If we need colors, I prefer the await create... version proposed by @Tinche.
But, if I understand @guido correctly, he wants to keep only the coarse-grained task creator; I like that for its simplicity.
I’m not proposing to enforce eager_start=False on all tasks, just that you can either set the task factory or pass the low-level flag to loop.create_task if you’re building a structured concurrency object.
Yeah, @guido wants only the coarse-grained option, but we need the opt-out to be fine-grained in anyio: we need to associate metadata with a task, such as its name and CancelScope heritage, before it calls coro.send(). I have a workaround, but it’s fiendishly complicated.
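A minimal sketch of that ordering problem; the _metadata attribute is purely illustrative, not anyio’s actual mechanism:

import asyncio

async def child():
    # Under an eager factory this runs *inside* create_task(),
    # before the caller has attached its metadata.
    task = asyncio.current_task()
    print(getattr(task, "_metadata", "missing"))

async def main():
    loop = asyncio.get_running_loop()
    loop.set_task_factory(asyncio.eager_task_factory)
    t = loop.create_task(child())  # prints "missing" right here
    t._metadata = "too late"       # a lazy task would see this in time
    await t

asyncio.run(main())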
Having await asyncio.TaskGroup.create_and_start(...) be the only way to create eager tasks, with eager_task_factory deprecated, also works, and I’m +1 on that.
@graingert the task name problem is fixable, since eager_task_factory already accepts a name argument.
Could you please describe the CancelScope problem? Do you need the task instance before the task starts its execution?
I’d like a clarification about the function call stack when eager tasks are in use.
Does the new coro run on top of the caller’s stack, or on a separate stack?
In other words, does the 1000 stack frames limit apply in the case of async recursion?
import asyncio

async def fib(n):
    if n < 2:
        return n
    a = asyncio.create_task(fib(n - 1))
    b = asyncio.create_task(fib(n - 2))
    return await a + await b
Regarding stack usage, lazy and eager tasks behave the same. The only difference is that a lazy task schedules its first step with loop.call_soon(self.coro.send, None), while an eager one calls self.coro.send(None) immediately.
Both operate on self.coro, which is a coroutine instance (a kind of Python generator) with a Python frame object embedded in the generator object.
Good point. There is indeed a difference! I tried it with factorial (which exercises recursion without doubling the load at each level) and without eager tasks, any depth is calculated correctly, no matter how low the recursion limit is set. (And, presumably, a bit slower, but I didn’t try to measure that.)
I tried the same with eager tasks, and it takes around 4 stack levels per recursion level (plus a small fixed offset). This means that with a recursion limit of 1000 (the default), you could compute factorial up to 248.
I imagine for a typical web server this isn’t a problem (and on most large machines you can safely make the recursion limit much higher), but for esoteric tests like fib() or fac() it does indeed matter.
Here’s my code:
import asyncio
import sys

async def fact(n):
    if n <= 1:
        return n
    f = await asyncio.create_task(fact(n - 1))
    return f * n

async def main():
    sys.setrecursionlimit(1000)
    loop = asyncio.get_running_loop()
    # loop.set_task_factory(asyncio.eager_task_factory)  # uncomment to test eager tasks
    print(await fact(249))

asyncio.run(main())
Sorry for being misleading. An eager task stores its frame in a coroutine, but in the example it calls create_task, which uses a normal stack frame; that in turn creates another coroutine that needs a regular stack frame, and so on.
There is recursion here, but I agree that typical asyncio code is usually written without recursion in any sense.
If the code has a real suspension point, the problem goes away.
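For example, inserting a real suspension point lets each eager child park itself on the event loop before recursing, so the stack unwinds at every level. A sketch, reusing the factorial example above:

import asyncio

async def fact(n):
    if n <= 1:
        return n
    await asyncio.sleep(0)  # real suspension point: the stack unwinds here
    f = await asyncio.create_task(fact(n - 1))
    return f * n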
I think this change proposal needs a “How to teach this” section, like PEPs do.
I’m not sure about the general wording (though not self.coro.send(None): that’s too low-level and not comprehensible to regular users), plus a list of caveats, for example:
Am I correct that this also solves the stack overflow problem, since the coro that is creating the eager task can be temporarily suspended and then immediately resumed once the eager task has run up to its first suspension point?
I think the eager task factory is too large a performance gain in many typical applications to be worth dropping support for, and asking users to change how every library they use creates tasks instead isn’t appropriate.
Library code should work with any asyncio-compatible task factory; application code, or frameworks that manage event loops for users, should be what chooses the appropriate default task factory.
I don’t particularly like adding fine-grained switches here, as it probably means that task factories need to accept a kwarg to override their own default behavior.
There may be a way to avoid that by combining a task factory, an eager task factory, and a loop default, but I haven’t explored the consequences of that in depth, because I think the end goal here should be that library code works with either, while application code chooses the option that best matches the application at hand.
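As a sketch of that division of responsibility, the application entry point is where the default gets picked; asyncio.Runner is just one way to set the loop-wide default, and run_app is a hypothetical name:

import asyncio

async def main():
    ...  # application logic, calling into factory-agnostic library code

def run_app(eager: bool = True) -> None:
    # The application, not the library, decides the loop-wide default.
    with asyncio.Runner() as runner:
        if eager:
            runner.get_loop().set_task_factory(asyncio.eager_task_factory)
        runner.run(main())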
Things like early-return cache hits see a significant performance improvement with eager task factories. Things that are unlikely to ever return early, or applications whose authors adopt the philosophy that all tasks should yield (even trio, where this idea gained more popularity, has relaxed this stance), likely shouldn’t use that factory.
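A sketch of that early-return case (fetch and the module-level cache are illustrative): on a cache hit the coroutine never suspends, so an eager task finishes inside create_task() with no event-loop round trip.

import asyncio

_cache: dict[str, str] = {}

async def fetch(key: str) -> str:
    if key in _cache:          # early return: no suspension at all
        return _cache[key]
    await asyncio.sleep(0.1)   # stand-in for real I/O
    _cache[key] = f"value for {key}"
    return _cache[key]

async def main():
    asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory)
    await fetch("k")                     # warm the cache
    t = asyncio.create_task(fetch("k"))  # cache hit: completes eagerly
    print(t.done())                      # True: the task never hit the loop
    print(await t)

asyncio.run(main())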
I think that libraries like anyio shouldn’t care what the underlying behavior actually is, because it’s inherently the wrong layer of design to be making that decision at.
Honestly, depending on the library, its users may not care one bit about its performance.
IIUC there are two subtly different ways that a library can be buggy when eager tasks are turned on:
1. It might rely on the implicit atomicity of tasks, where every stretch of code between two awaits, or between an await and the beginning or end of the task’s coroutine, is an implicit critical section. Here the spawning coroutine and the spawned coroutine suddenly share a critical section, whereas with lazy tasks they would be in separate critical sections. This is a change in semantics between lazy and eager tasks that weighs on me (and has convinced me that we need to keep supporting lazy tasks for a very long time indeed). A sketch of this failure mode follows the list.
2. It might depend on the subtle semantics of calling or not calling await asyncio.sleep(0) and how it affects the order in which tasks run, etc. While I do this too to force things in tests (it’s much better than sleep(dt) for dt > 0), I think that if a library depends on this, that’s actually a bug. If you have an ordering requirement between tasks, it should be explicit in the code (e.g. using asyncio.Event), not implied by the implementation of the event loop. If testing with eager tasks finds such bugs, we’re doing the library maintainers a favor.
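Here is that sketch of the first failure mode: the assumption that the statement after create_task() runs before the child’s first step holds with lazy tasks but flips with eager ones.

import asyncio

order: list[str] = []

async def child():
    order.append("child")

async def parent():
    t = asyncio.get_running_loop().create_task(child())
    order.append("parent")  # lazy: runs before "child"; eager: runs after
    await t

async def main():
    # Set asyncio.eager_task_factory on the loop to see the order flip.
    await parent()
    print(order)  # lazy: ['parent', 'child']; eager: ['child', 'parent']

asyncio.run(main())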
In general I completely support Mike’s most recent post.
Let’s refer to this as the asyncio invariant going forward, to save us some typing.
Yes, it’ll be a large wart. Application operators face a Sophie’s choice between better performance and keeping the asyncio invariant, and library authors don’t get a choice: they can’t count on the invariant.
Alright, in the interest of moving the discussion forward, I withdraw my proposal.
I’ve been exploring this more, and I’ve found a reasonable change that fixes this properly without relying on eager/lazy semantics. That this is something to unravel rather than simply change is a bit of a self-inflicted problem when you look at the change history, and there are decisions involved that we definitely wouldn’t repeat.
I’m not sure what the right amount of time is here, but I have several comments in both work and personal projects that look like this:
async def _finalize(self) -> None:
    # WARNING: Do not allow an async context switch before the gather below
One could argue this is a case for a lock, but relying on structured concurrency to control ordering is generally something people have been able to count on, and by not using a lock here, the thing in question is more performant.
Being able to rely on that isn’t actually going away here; it’s changing to include create_task as a potential context switch. But I do need to track down all such comments and find the ones where the semantics are now possibly different, especially in library code. I also think it would be to everyone’s benefit to avoid adding further new possible async context switches unless the benefit is as large as eager tasks’ has been; it should not be a common occurrence.
This is the current state (main branch as of this post) of what’s changed in just the standard library in terms of semantics, in case it helps anyone evaluate how much has actually changed, weigh a timeline for if or when we can make eager tasks the default, figure out how to explain this to others, or just check their own code.
In terms of new places where the current task can change where it couldn’t before:

- all of the below in the “as of eager tasks” section
- any non-coroutine function that does one of the new things below while an event loop is running

Before eager tasks:

- await
- async with (both enter and exit)
- async for (per loop iteration)
- return (from a coroutine)
- yield (from a coroutine)
- raising an exception (that leaves a coroutine)

As of eager tasks, the below are included:

- constructing an asyncio.Task with eager_start=True while the associated event loop is running (see the sketch after this list)
- asyncio.create_task
- asyncio.ensure_future (when wrapping a non-future awaitable)
- asyncio.wait_for [1] (when any non-future awaitables are waited on) (ensure_future)
- asyncio.as_completed [1] (when any non-future awaitables are waited on) (ensure_future)
- asyncio.gather [1] (when any non-future awaitables are gathered) (ensure_future)
- asyncio.IocpProactor.accept (unlikely to impact most) (ensure_future on a coro)

Note: extremely unlikely to impact anyone; asyncio.create_subprocess_* won’t introduce a new yield point, as the task in the transport isn’t created prior to where there already was task switching.

[1] Triggers at construction; not a coroutine function; may not be meaningfully different for code that immediately awaits the result.
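To make the first entry in that section concrete, a small sketch showing that Task construction itself is now a point where the current task can change:

import asyncio

async def child():
    print("child runs during Task construction")

async def main():
    loop = asyncio.get_running_loop()
    t = asyncio.Task(child(), loop=loop, eager_start=True)
    print("after construction")  # printed *after* the child's first step
    await t

asyncio.run(main())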