Make asyncio done callbacks consistent between Futures and Tasks

With asyncio tasks, the done callbacks have already been called by the time the task has been awaited. A future’s callbacks, however, are scheduled separately on the event loop with call_soon, so by the time an already-completed future has been awaited, its done callback has not yet been called.

Having used tasks more frequently, I found this really unintuitive. The problem I ran into is an event loop that was starved by many already-complete futures. Even though I’m calling await future, it’s not actually awaiting anything and going back to the event loop, so all those done callbacks kept piling up and eating up memory.

My proposal would be to make futures resolve their callbacks whenever the future is first awaited. That gives the same behavior as tasks, it’s intuitively how we think of await (i.e. all processing, including callbacks, has completed), and it doesn’t have the problem with callbacks piling up.

Simple example of the inconsistency:

import asyncio

async def main():
    # using future
    future = asyncio.get_running_loop().create_future()
    future.add_done_callback(lambda r: print("future done callback"))
    # resolve the future up front; without this, `await future` would never return
    future.set_result(None)
    print("future processed")
    await future
    print("future awaited (1)")
    await future
    print("future awaited (2)")
    # using task
    async def coroutine():
        print("task processed")
    task = asyncio.create_task(coroutine())
    task.add_done_callback(lambda r: print("task done callback"))
    await task
    print("task awaited (1)")
    await task
    print("task awaited (2)")

asyncio.run(main())

Output:

future processed
future awaited (1)
future awaited (2)
future done callback
task processed
task done callback
task awaited (1)
task awaited (2)

…the callback is called as soon as the task is done, regardless of whether you have awaited the task or not.

So, the task need not be awaited; that’s why it is executed soon after it is created. The future, on the other hand, needs to be awaited. As for the scheduling of the callbacks, I believe it’s the right choice to schedule them to run in the event loop’s next iteration. That helps keep the program non-blocking.

I’m not sure “we” think of await this way necessarily. I think of callbacks as being cleanup work that should happen soon after the task/future is done but with no guarantee as to absolute ordering. If I wanted to guarantee ordering, I would expect to wrap the task/future in another task/future that guaranteed the “callback” work was done.

Would wrapping solve your issues? Without knowing what you are placing in your callbacks it’s hard to know in the abstract why you are expecting them to work this way.
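
For illustration, the wrapping idea could look something like the sketch below. The helper name await_then_cleanup and the cleanup callable are made up for this example; it simply runs the cleanup inline right after the await instead of relying on a done callback, so the ordering is guaranteed regardless of whether the future was already complete.

```python
import asyncio

async def await_then_cleanup(awaitable, cleanup):
    """Hypothetical helper: await `awaitable`, then run `cleanup` before returning."""
    try:
        return await awaitable
    finally:
        # Runs synchronously here, not via loop.call_soon, so it has
        # finished by the time the caller gets the result back.
        cleanup()

async def demo():
    events = []
    fut = asyncio.get_running_loop().create_future()
    fut.set_result("data")  # already complete: awaiting it will not yield
    result = await await_then_cleanup(fut, lambda: events.append("cleanup"))
    events.append(f"got {result}")
    return events

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off is that the cleanup now runs in the awaiting coroutine rather than on the event loop, so it should be cheap and non-blocking.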

Alright, it looks like I was mistaken… futures and tasks are consistent after all.

I had assumed a task’s done_callbacks were called in a blocking manner immediately when the task completes. I had observed behavior consistent with that assumption and made a generalization that that was always the case. Then when using futures (which are optimized to not yield to the event loop on await), I observed done_callback being called in the non-blocking call_soon manner. But I did some more precise tests now and I’m seeing tasks are calling their callbacks using call_soon as well (probably using futures internally I guess).
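
A small test along these lines shows the call_soon behavior for tasks. It is only a sketch, assuming the default event loop and the default (non-eager) task factory; the names first, second and events are made up for the example. If a task’s done callback ran synchronously at completion, it would be recorded before the second task’s body; because it is scheduled with call_soon, it lands after.

```python
import asyncio

async def demo():
    events = []

    async def first():
        events.append("first body")

    async def second():
        events.append("second body")

    # Both task steps are queued on the loop in creation order.
    a = asyncio.create_task(first())
    a.add_done_callback(lambda t: events.append("first callback"))
    b = asyncio.create_task(second())

    # Yield to the loop until both tasks have finished.
    await asyncio.gather(a, b)
    return events

if __name__ == "__main__":
    # The callback of `first` is queued behind the already-scheduled
    # step of `second`, so it runs after "second body".
    print(asyncio.run(demo()))
```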

@alicederyn Here was the issue I ran into, essentially something like this:

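# data listener
def next_data():
    data = loop.create_future()
    # immediately available? (branch often taken)
    if data_already_queued:
        # resolve right away; None is a placeholder for the queued item
        data.set_result(None)
    return data

# data consumer (runs inside a coroutine)
while True:
    await next_data()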

The await next_data() doesn’t yield to the event loop, since data is already available. So the done callback (freeing resources) is starved, never run, and memory usage keeps increasing. The code was written on the assumption that done_callbacks get called by the time the future is awaited, which is why we get the problem. The code’s already been altered to fix the issue, so no need to troubleshoot. I was mainly posting to mention this behavior was unintuitive and seemed (falsely) to be inconsistent with Task behavior…
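
For anyone who hits the same thing, here is a rough reproduction of the starvation, plus one possible workaround (not necessarily the fix we ended up with): an explicit await asyncio.sleep(0), which does yield to the event loop once. The function consume and its yield_each_time flag are invented for this sketch; it counts how many done callbacks are registered but not yet run.

```python
import asyncio

async def consume(n, yield_each_time):
    """Await n already-completed futures; return the peak number of pending callbacks."""
    loop = asyncio.get_running_loop()
    pending = 0  # done callbacks scheduled but not yet executed
    peak = 0

    def on_done(_fut):
        nonlocal pending
        pending -= 1  # stands in for "free resources"

    for _ in range(n):
        fut = loop.create_future()
        fut.add_done_callback(on_done)
        fut.set_result(None)      # callback is queued with call_soon
        pending += 1
        await fut                 # already done: does not yield to the loop
        peak = max(peak, pending)
        if yield_each_time:
            await asyncio.sleep(0)  # gives the loop one turn to run callbacks
    return peak

if __name__ == "__main__":
    print(asyncio.run(consume(1000, yield_each_time=False)))
    print(asyncio.run(consume(1000, yield_each_time=True)))
```

Without the extra yield the pending callbacks grow with every iteration; with it, at most one is outstanding at a time.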

Thanks for the update!