What's up with garbage-collected asyncio.Task objects?

Whenever I've wanted to fire-and-forget a coroutine, I have always used asyncio.create_task and ignored the returned Task object. This has always worked fine.

I have just seen the warning in the docs for asyncio.create_task that if the Task object is gc'd, it could “disappear mid-execution”. What exactly does this mean? Just the Task object itself would disappear, but the code would keep running? Or could the code itself actually just stop suddenly?

This function was added in Python 3.7, but the warning only exists in the docs from 3.9 onwards. Also, loop.create_task doesn't mention this shortcoming of Task management, and neither do the docs for the Task class itself. Is this a documentation bug? Is this shortcoming actually real? Do I really have to manually manage a global set of running Task objects, as suggested in the docs (versions 3.10 onwards)?
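(For reference, the pattern suggested in the 3.10+ docs boils down to keeping a strong reference until the task finishes - something like the sketch below, where background_tasks and fire_and_forget are just illustrative names.)

import asyncio

# Keep strong references so pending tasks can't be garbage-collected mid-execution.
background_tasks = set()

def fire_and_forget(coro):
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done, so the set doesn't grow forever.
    task.add_done_callback(background_tasks.discard)
    return task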


I used to do that too, and it seemed to work fine, but I had a bunch of weird problems - most notably, exceptions raised by those fire-and-forget coroutines wouldn’t be logged anywhere. So this is what I ended up going with:

import asyncio
import traceback

def handle_errors(task):
	try:
		exc = task.exception() # Also marks that the exception has been handled
		if exc: traceback.print_exception(type(exc), exc, exc.__traceback__)
	except asyncio.exceptions.CancelledError:
		pass

all_tasks = [] # strong references to running tasks; kinda like threading.enumerate()

def task_done(task):
	all_tasks.remove(task)
	handle_errors(task)

@export # defined further down in the thread
def spawn(awaitable):
	"""Spawn an awaitable as a stand-alone task"""
	task = asyncio.create_task(awaitable)
	all_tasks.append(task)
	task.add_done_callback(task_done)
	return task

How much of this is necessary because of the warning you’re seeing, and how much is to make it easier to track down bugs, I’m not sure, but it’s worked out well so far.
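For what it's worth, a typical call site looks something like this (handle_click and do_network_request are made-up names for this sketch):

async def handle_click():
	await do_network_request()  # hypothetical coroutine doing the real work

def on_button_click(event):
	# Fire-and-forget: all_tasks keeps a strong reference to the Task,
	# and task_done() prints any traceback as soon as it finishes.
	spawn(handle_click())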

Strange – I thought the default asyncio exception handler already printed tracebacks. This prints a traceback for me:

import asyncio

async def some_task():
    raise ValueError()

loop = asyncio.get_event_loop()
loop.create_task(some_task())
loop.run_forever()

Anyway, thanks for your example. I like your spawn solution, which at least hides the all_tasks hack. (I can't help seeing it as a hack. If I don't care about error handling, and I'm fine with just printing tracebacks to the console, then I shouldn't have to manually track running tasks just to prevent collection of the Task object from stopping my code mid-execution.)

Finally, I still don't get why there is no similar warning in the docs for the Task class or for loop.create_task.

Not sure; it probably has to do with some other complexities, but the main problem was that tracebacks - if they were printed at all - showed up during interpreter shutdown instead of immediately. Given that the app in question was a GUI app that responded to button clicks, the debuggability of problems was materially affected by this.
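A minimal sketch of what I mean (the timing isn't guaranteed - it depends on when the Task object happens to be finalized):

import asyncio

async def fails():
    raise ValueError("boom")

async def main():
    asyncio.create_task(fails())   # reference dropped immediately
    await asyncio.sleep(5)
    # The error is only reported when the finished Task object is finalized
    # (via the default handler's "Task exception was never retrieved" message),
    # and when that happens is up to the garbage collector - possibly not
    # until interpreter shutdown.

asyncio.run(main())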

Could it be that the blocking I/O (printing) should be run in a thread pool executor? :face_with_hand_over_mouth:

Nope :slight_smile: The example there is of independent I/O requests (a more realistic example might be “read all these files into memory”), but printing to the console is going to need to be serialized in order to keep output looking tidy - there’s only one stdout. So the best way to do this sort of output is to do it all on the same thread - and in fact, the project I’m talking about only has one primary thread for both GUI and regular operations (it spins off dedicated threads for other purposes but there’s one main thread for the event loop).
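If console output ever did become a problem, one option would be to funnel everything through a single writer task fed by a queue - just a sketch, not what this app actually does:

import asyncio

async def log_writer(queue):
    # The only place that touches stdout, so output stays tidy.
    while True:
        line = await queue.get()
        print(line)
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    writer = asyncio.create_task(log_writer(queue))
    queue.put_nowait("hello from one task")
    queue.put_nowait("hello from another")
    await queue.join()   # wait until everything has been printed
    writer.cancel()

asyncio.run(main())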

If the GUI loop and the asyncio loop reside in the same thread, how can the GUI still be responsive during the execution of an awaitable? :thinking:

I believe the awaitable can return control to the event loop if what it's waiting for is OS-level stuff like sockets, mutexes, file descriptors, etc.?

That’s the entire point of awaitables! They don’t block. Obviously any sort of heavy computation would be a problem, but this particular app doesn’t do a lot of that; its primary purpose is an information broker, receiving information from any of several sources and sending it to a bunch of others, under human control. So the GUI is entirely responsive at all times. Every time the coroutine hits an ‘await’ point, it goes back to the event loop.
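A tiny illustration of that hand-off - two coroutines sharing one thread and yielding to the event loop at each await point:

import asyncio

async def ticker(name, delay):
    for i in range(3):
        print(name, i)              # runs on the one main thread
        await asyncio.sleep(delay)  # hands control back to the event loop

async def main():
    # Both run "concurrently" by interleaving at their await points.
    await asyncio.gather(ticker("gui", 0.1), ticker("network", 0.15))

asyncio.run(main())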


I just noticed this. What’s export?

Oh, not significant to the current discussion, just a hack to make modularization easier. Rather than get everything nicely namespaced all at once, it was simpler to inject key exports into the builtins, and go back and tidy up later.

But of course, you’re all programmers, so you know full well that “later” never arrived…

import builtins

def export(f):
	setattr(builtins, f.__name__, f)
	return f
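So, for example (module names made up), a function decorated in one file becomes callable from anywhere without an import:

# util.py
@export
def spawn(awaitable):
	...

# gui.py - no "from util import spawn" needed, since export() put it in builtins
task = spawn(handle_click())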

If, inside the awaitable, the time between two ‘natural’ interrupts is (much) higher than getswitchinterval, then I think separate threads for each loop would be more responsive (that seems to be the point of using an executor).

In this particular case, I’m not sure there’d be any benefit. The app is exclusively event-driven (with the exception of one polling thread, and that IS separated from the main asyncio loop), so whether those events come from button clicks, socket connections, or the SSH pipe, they’re all just events to react to.
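For completeness, the executor approach would look roughly like this - assuming some blocking crunch() function, which is a hypothetical name here:

import asyncio

def crunch(data):
    # Hypothetical blocking / CPU-heavy function we don't control.
    return sum(x * x for x in data)

async def handle_request(data):
    loop = asyncio.get_running_loop()
    # Run the blocking call on the default thread pool executor so the
    # event loop (and therefore the GUI) stays responsive meanwhile.
    return await loop.run_in_executor(None, crunch, data)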