Yes, it can be any other loop implementation that doesn't inherit from `BaseEventLoop`, although the most prominent one is uvloop.
From the user’s perspective, I think we’d be trading one footgun for another.
I'm not an expert, so let me contrast Python and JavaScript. JavaScript doesn't make a distinction between functions (which may or may not return a Promise) and coroutines, which leads to a known footgun of a coroutine "unexpectedly" beginning to execute upon instantiation, which the business logic is not prepared for, most notably if the code raises an exception before the first await.
At the same time, `create_task(…).cancel()` is a footgun too. Instead of thinking about the lesser of two evils, could these be considered orthogonal?
Wrt. code running faster: I'd take that with a grain of salt, because with a subtle semantics change like this, it's the same code but no longer the same outcome being compared.
I have certainly written asynchronous code that subtly relies on the current `create_task` semantics, broadly speaking that "I don't need to lock anything if the resource use doesn't span an await point". If I understand the proposal correctly, that assumption will be void.
I’m happy to fix my code and likewise libraries can be fixed, but we’d also potentially be breaking a lot of existing application code.
Can you elaborate on why it is unexpected, or is this merely a distinction without either side being particularly “better” than the other? I agree that there’s a difference here, resulting in a difference of execution order; I don’t know that either one is necessarily better, but I do know that the distinction (whenever it’s actually significant) is a subtle one that’s not easy to debug.
Thus I will be switching any and all Python asyncio code of mine to use eager tasks sooner rather than later.
Yes, exactly. Third-party loop providers.
They can upgrade eventually, but I prefer not to break them.
No, Python has different behavior. Even with eager tasks, the thrown exception doesn't bubble up to the code that called `create_task(...)`. Exception handling requires `await task` or `task.result()` in both lazy and eager modes.
Unexpected in terms of business logic looks something like this:

```python
state.task = asyncio.create_task(coro())

async def coro():
    myself = state.task
    ...
```
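To make that footgun concrete, here is a minimal runnable sketch under the default (lazy) semantics; the `State` class and names are invented for illustration:

```python
import asyncio

class State:
    task = None

state = State()

async def coro():
    # Under lazy semantics, create_task() returns before this body runs,
    # so state.task is already assigned by the time we get here. Under
    # eager semantics the body would start inside create_task(), before
    # the assignment in main() completes, and state.task would still be
    # None at this point.
    return state.task is not None

async def main():
    state.task = asyncio.create_task(coro())
    return await state.task

print(asyncio.run(main()))  # → True with lazy (default) task scheduling
```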
In terms of locking, it's something like this:

```python
await foo()
# no need to lock here because no other async code may run
bank.acct1 += 42
email1 = create_task(send_email())
bank.acct2 -= 42
email2 = create_task(send_email())
await gather(email1, email2)
```
The bank total is ensured with lazy task start, and IIUC it is not with eager tasks.
It's like the definition of "async code" is being changed from "body of an async function" to "trace through an async function starting from the first await point and ending in return/yield/raise". The latter is harder to reason about.
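A self-contained version of the bank example above may help; `Bank` and `audit` (standing in for `send_email`) are my own names for illustration:

```python
import asyncio

class Bank:
    acct1 = 0
    acct2 = 0

bank = Bank()

async def audit():
    # Stands in for send_email(): reads shared state. With lazy tasks
    # this body cannot run until transfer() suspends at gather(), so it
    # never observes the transfer half-applied.
    return bank.acct1 + bank.acct2

async def transfer():
    bank.acct1 += 42
    t1 = asyncio.create_task(audit())
    bank.acct2 -= 42                  # total is back to 0 here
    t2 = asyncio.create_task(audit())
    return await asyncio.gather(t1, t2)

print(asyncio.run(transfer()))  # → [0, 0] with lazy tasks; an eager t1
                                # would instead observe a total of 42
```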
For the JavaScript footgun I was referring to, see Automatic batching for fewer renders in React 18 · reactwg/react-18 · Discussion #21 · GitHub, which required opt-in when the behaviour was changed.
I'm not sure this translates directly to the Python world; YMMV.
Sorry, I'm coming to this late and don't really understand the distinction between "eager" and "lazy" tasks, but my understanding is that there's still no arbitrary context switching going on here; it's just that you need to be sure that (the part before the initial await in) `send_email` doesn't mess with your bank total. That's exactly the same as you'd need in sequential code, so it doesn't seem that surprising to me.
I do agree that it's a disconcerting change. I've been used to the idea that `create_task` puts a task into the event loop, to run "when the current function yields control", and that's now changed to the more nuanced "to start immediately but yield back to the current code", which requires some thinking to assure yourself it's OK, and frankly isn't as intuitive to me.
Of course, it’s possible I’ve completely misunderstood the distinction between “eager” and “lazy” here. But if so, I hope the announcement of the change when (if) it happens covers the new semantics clearly, with illustrative examples.
Hm, that's a bit of a showstopper. The lazy semantics imply that each coroutine is its own critical section (multiple critical sections when using await), but the eager semantics merge the initial part of a coroutine with the coroutine that creates it.
That looks like a major reason to expect problems, since this is something that asyncio has always promised.
With eager tasks the first chunk (from the beginning of the function to the first await) is also non-interruptible. You can still think of it as covered by a critical section as well. The only difference is that the first chunk is executed early, when the task is created.
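That early execution of the first chunk can be observed directly with the per-loop opt-in that exists in CPython 3.12+ (`asyncio.eager_task_factory`); the sketch below is guarded so it also runs, lazily, on older versions:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    eager_available = hasattr(asyncio, "eager_task_factory")  # 3.12+
    if eager_available:
        loop.set_task_factory(asyncio.eager_task_factory)

    seen = []

    async def child():
        seen.append("first chunk")   # before the first await
        await asyncio.sleep(0)
        seen.append("second chunk")

    task = asyncio.create_task(child())
    # With eager tasks, child's first chunk has already run by now;
    # with lazy tasks, nothing in child has run yet.
    ran_eagerly = seen == ["first chunk"]
    await task
    return ran_eagerly == eager_available

print(asyncio.run(main()))  # → True on any version
```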
Sometimes code may want lazy behavior; it can be simulated very easily with `await sleep(0)`.
But in my experience the lazy request is very rare; most code works well in both modes.
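The `await sleep(0)` escape hatch looks like this: a deliberate first-line sleep defers the rest of the body to the event loop even under an eager task factory (the version guard is mine, so the sketch behaves the same everywhere):

```python
import asyncio

order = []

async def worker():
    await asyncio.sleep(0)       # opt back into lazy behavior on demand
    order.append("worker body")

async def main():
    loop = asyncio.get_running_loop()
    if hasattr(asyncio, "eager_task_factory"):   # CPython 3.12+
        loop.set_task_factory(asyncio.eager_task_factory)
    task = asyncio.create_task(worker())
    order.append("creator")      # still runs before worker's body
    await task

asyncio.run(main())
print(order)  # → ['creator', 'worker body'] in both lazy and eager modes
```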
I think the ability to avoid eager behavior on-demand may need to remain forever even if the default becomes eager.
When examining this at my day job, specifically during application shutdown, we set the task factory back to the default lazy one to get a reliable ordering that is independent of anything done by functions we might not control. Being sure other tasks aren't launched during several parts of shutdown is essential here, and we don't want to replace the task factory with something that will error; we want everything needed to still run, just after shutdown is structured and each component is told to stop accepting new work.
Outside of that part of the application lifecycle, I can’t imagine a strong reason why it would be needed, but I wouldn’t be surprised if there are applications that rely on it.
As for making it the default, I think this should happen slowly. One of the big "selling points" of async/await was the mental model people were taught: other coroutines/tasks won't be switched to unless the current coroutine yields or ends. This doesn't entirely break that model, but it does bend it, and it's going to take time for people to adapt their mental model to include "creating a task may run up until the first yield point within it".
Ok, guys.
I see a strong request to keep lazy tasks.
Thus, I can propose the following:
- `run()`/`Runner` accepts a new `eager_tasks` argument, which is `False` by default. The current behavior remains the default; if we want to switch it, we need to have a separate discussion in the future.
- `asyncio.create_task()` and `loop.create_task()` accept an `eager_start` argument to provide fine control over the created task color.
Opinions?
Folks are saying eager tasks might be breaking the contract of not switching tasks outside of suspension points. What about adding an async `create_eager_task` (or whatever) method, making it an explicit potential suspension point? The implementation wouldn't need to actually change.
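A hypothetical sketch of that shape; the wrapper below is invented purely to show the calling convention, and as noted, its body does not actually need to suspend:

```python
import asyncio

async def create_eager_task(coro):
    # Hypothetical API: because this function is async, callers must
    # write `await create_eager_task(...)`, making the call site look
    # like a potential suspension point, even though this body happens
    # not to suspend today.
    return asyncio.create_task(coro)

async def main():
    task = await create_eager_task(asyncio.sleep(0))
    await task
    return isinstance(task, asyncio.Task)

print(asyncio.run(main()))  # → True
```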
Do you propose declaring a `create_eager_task()` call a yield point, along with the `await v` expression?
I was thinking we make `create_eager_task` async, so you would have to await it when using it, hence introducing a suspension point, visually.
Sounds good, except for the longer name.
If we decide to go this way we probably need a new method in TaskGroup also.
Agreed. In my opinion TaskGroups are more important; I think a code base that uses task groups exclusively over `asyncio.create_task` is going to end up looking better in the long run, and doing structured concurrency by default. Maybe we can use this change to nudge folks towards task groups by making the API a little nicer there, and a little lower level in asyncio proper? Just brainstorming here.
`async def spawn(...)` looks short and natural, doesn't it?
I find "eager" to be a bit jargony and not so self-explanatory; a new user would have to look up what it means. `create_and_start()` or `create_and_start_now()` would, for me, express what's going on more clearly.
`spawn` sounds good. Some other options:

```python
async def start(...)             # short; describes that the task will start right away
async def create_and_start(...)  # sorts before `create_task` in autocomplete
```