Why does `asyncio.wait(<empty>)` raise ValueError?

udibar · July 7, 2024, 7:22am

Can someone please help me understand the rationale behind this?

My intuition suggests that, just as `sum([])` is `0`, `math.prod([])` is `1`, and `for _ in []:` runs zero iterations, awaiting an empty set of tasks should be equivalent to `asyncio.sleep(0)` and peacefully return two empty sets.

I have quite a bit of asyncio code that predates `asyncio.TaskGroup` and manages a set of running tasks for a “worker” process in a manner similar to `TaskGroup`.
Specifically, upon shutdown, all tasks are cancelled or signalled to exit and then awaited (this usually involves some `asyncio.to_thread` tasks).
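As an aside, a cancel-then-await shutdown like the one described can also be written with `asyncio.gather`, which, unlike `asyncio.wait`, accepts zero awaitables and simply resolves to an empty list. This is only a sketch; `worker`, `shutdown_workers` and `demo_shutdown` are hypothetical names, not taken from the original code.

```python
import asyncio


async def worker(delay: float) -> None:
    """Hypothetical long-running worker that just sleeps."""
    await asyncio.sleep(delay)


async def shutdown_workers(tasks: set) -> list:
    """Cancel every task, then wait for all of them to finish.

    gather() with no arguments returns an already-completed future whose
    result is [], so an empty task set needs no special-casing here.
    """
    for task in tasks:
        task.cancel()
    # return_exceptions=True turns each CancelledError into an entry in the
    # result list instead of propagating it out of gather().
    return await asyncio.gather(*tasks, return_exceptions=True)


async def demo_shutdown() -> tuple:
    empty_results = await shutdown_workers(set())
    tasks = {asyncio.create_task(worker(10)) for _ in range(3)}
    results = await shutdown_workers(tasks)
    return empty_results, results


empty_results, results = asyncio.run(demo_shutdown())
print(empty_results)                                                 # []
print(all(isinstance(r, asyncio.CancelledError) for r in results))  # True
```

The trade-off is that `gather` returns a list of results in input order rather than `(done, pending)` sets, and it has no `return_when` or `timeout` parameters.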

As the task set is built dynamically, in some scenarios (admittedly, mostly diagnostic ones) it happens to contain no tasks.
In such cases, a piece of mission-critical software can crash simply because that trivial possibility was not considered and the (annoying and verbose, IMHO) idiom

```python
if tasks_running:
    await asyncio.wait(tasks_running)
```

was not used.

I can see why this might pose a logical problem with `return_when=FIRST_COMPLETED`. But with `ALL_COMPLETED` (the default), I don’t see how it differs from `all([])` being `True`.
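To reproduce the asymmetry, here is a minimal sketch (assuming Python 3.8+ for `asyncio.run`) that compares `all([])` with awaiting `asyncio.wait` on an empty set. It checks only the exception type, since the message text is an implementation detail; `wait_on_empty` is a hypothetical helper.

```python
import asyncio


async def wait_on_empty() -> str:
    """Return the name of the exception raised by asyncio.wait on an empty set."""
    try:
        await asyncio.wait(set())
    except ValueError as exc:
        return type(exc).__name__
    return "no exception"


print(all([]))                        # True (vacuous truth)
print(asyncio.run(wait_on_empty()))   # ValueError
```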

JamesParrott · July 7, 2024, 11:33am

One’s intuition is often an unreliable guide.

If it’s mission critical, and the mission is important, then those checks must be done.

Is there really no other entry point in the entire asyncio library that does this for futures, e.g. `asyncio.as_completed`? Otherwise, isn’t it possible to write your own wrapper function?
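A wrapper along those lines is short. The sketch below assumes the semantics proposed in this thread (yield once to the event loop, then return two empty sets); `wait_allow_empty` and `demo_wrapper` are hypothetical names, not asyncio APIs.

```python
import asyncio


async def wait_allow_empty(fs, **kwargs):
    """Like asyncio.wait, but return (set(), set()) for an empty input.

    Hypothetical helper: instead of raising ValueError, it yields to the
    scheduler once (like asyncio.sleep(0)) and returns two empty sets.
    Any timeout/return_when keyword arguments are forwarded unchanged.
    """
    fs = set(fs)
    if not fs:
        # Assumption: return_when is ignored for an empty input, which is
        # debatable for FIRST_COMPLETED.
        await asyncio.sleep(0)
        return set(), set()
    return await asyncio.wait(fs, **kwargs)


async def demo_wrapper():
    empty = await wait_allow_empty([])
    task = asyncio.create_task(asyncio.sleep(0.01))
    done, pending = await wait_allow_empty([task])
    return empty, len(done), len(pending)


print(asyncio.run(demo_wrapper()))  # ((set(), set()), 1, 0)
```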

There may not be a rationale; there may simply have been different authors who made different decisions. There are a huge number of considerations in creating an async library. By comparison, arithmetic helper functions are trivial; they could be one-liners.

But to engage with your point, `sum` and `math.prod` can be thought of as `functools.reduce` applied with the binary operations of addition and multiplication. Both operations have identity elements, 0 and 1 respectively (values that leave the other argument unchanged whatever its value). You happen to have picked commutative examples, which are trivially parallelisable, but reduction over non-commutative operations is not analogous to `asyncio.wait`. Function composition and matrix multiplication, for example, must be done in order, yet the latter still has an identity matrix, and for the former a no-op (the identity function) can be constructed (for a set of composable functions with compatible signatures, arguments and return values).
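To make the identity-element point concrete: `functools.reduce` takes an optional initial value, and supplying the operation's identity is exactly what lets an empty reduction return a sensible result. The `compose` and `identity` helpers below are illustrative, not stdlib functions.

```python
import functools
import operator


def compose(f, g):
    """Return the composition x -> f(g(x))."""
    return lambda x: f(g(x))


def identity(x):
    """Identity element for function composition."""
    return x


print(functools.reduce(operator.add, [], 0))  # 0, like sum([])
print(functools.reduce(operator.mul, [], 1))  # 1, like math.prod([])

# Composition is not commutative, but it still has an identity element,
# so composing an empty list of functions yields the identity function.
composed = functools.reduce(compose, [], identity)
print(composed(42))                           # 42

# Without an initial value, reducing an empty sequence raises TypeError.
try:
    functools.reduce(operator.add, [])
except TypeError:
    print("TypeError")
```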

asyncio.wait is not limited to sets of awaitables that come from reducing a binary operation, so it is difficult to select a natural identity element to associate with it, as tempting as it is to pick a no-op.

However, implementing `asyncio.wait` using `functools.reduce` would defeat the entire purpose of the exercise (concurrency).

For others: this behaviour was unusually hard to locate in the docs. It was easier to search the source code:

udibar · July 7, 2024, 10:04pm

Thanks for the explanation.

Intuition is indeed an unreliable guide to nature. But asyncio is man-made, and I think users’ intuition (and hence their expectations) should be a consideration in the design of a library.

Moreover, this intuition is not arbitrary. The examples given show how Python and its stdlib are consistently designed to save users the trouble and verbosity of testing for emptiness before operating on collections. There are many more examples.

I fail to follow the logic in the part about identity elements and reduction operations. AFAIK `wait` is a partitioning operation, not a reduction: it partitions the input set into a subset of completed tasks and a subset of running tasks. For an empty input set, this partition always yields two empty sets.
The other part of `wait` is deciding when to return. I agree that if `return_when` is `FIRST_COMPLETED`, there might be some doubt, as users may expect the done set to be non-empty.
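Viewed purely as a partition, the `(done, pending)` split can be written directly. This sketch ignores the waiting part of `asyncio.wait` entirely and only illustrates that an empty input partitions into two empty sets; `partition_by_done` and `demo_partition` are hypothetical helpers.

```python
import asyncio


def partition_by_done(tasks):
    """Split tasks into (done, pending) sets without waiting."""
    tasks = set(tasks)
    done = {t for t in tasks if t.done()}
    return done, tasks - done


async def demo_partition():
    finished = asyncio.create_task(asyncio.sleep(0))
    running = asyncio.create_task(asyncio.sleep(10))
    await asyncio.sleep(0.01)  # give `finished` time to complete
    result = partition_by_done({finished, running})
    running.cancel()           # clean up the long-running task
    return partition_by_done([]), result, finished, running


empty, (done, pending), finished, running = asyncio.run(demo_partition())
print(empty)                                      # (set(), set())
print(done == {finished}, pending == {running})   # True True
```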

The purpose of my question is, of course, to suggest changing this behaviour: `asyncio.wait([])` would return empty sets after yielding to the scheduler, rather than raising an exception.

I simply can’t see how this is incorrect, given that in Python `all(task.done() for task in [])` is `True`.

I think this is also appropriate for `return_when=FIRST_EXCEPTION`, if we agree that `any(task.exception() for task in [])` is `False` (i.e. no task ended with an exception and all have completed).
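The argument can be checked mechanically. The sketch below encodes one possible reading of each `return_when` condition as a predicate over the task set and evaluates it on an empty set. The encoding is an assumption for illustration, not how asyncio implements `wait`, and `should_return` is a hypothetical name.

```python
import asyncio


def should_return(tasks, return_when):
    """Hypothetical predicate: has the return_when condition been met?"""
    all_done = all(t.done() for t in tasks)
    if return_when == asyncio.ALL_COMPLETED:
        return all_done
    if return_when == asyncio.FIRST_COMPLETED:
        return any(t.done() for t in tasks)
    if return_when == asyncio.FIRST_EXCEPTION:
        # exception() may only be called on finished, non-cancelled tasks.
        any_failed = any(
            t.done() and not t.cancelled() and t.exception() is not None
            for t in tasks
        )
        return any_failed or all_done
    raise ValueError(f"unknown return_when: {return_when!r}")


for mode in (asyncio.ALL_COMPLETED, asyncio.FIRST_EXCEPTION, asyncio.FIRST_COMPLETED):
    print(mode, should_return(set(), mode))
# ALL_COMPLETED True
# FIRST_EXCEPTION True
# FIRST_COMPLETED False
```

Under this reading, only `FIRST_COMPLETED` is never satisfied by an empty set, which matches the one case where the thread agrees the semantics are in doubt.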