Enable dictionary constructor to compute values concurrently

The following code takes 4 seconds to run. If the dictionary constructor computed its values concurrently, it would only take 2 seconds.

import asyncio
import time

async def main():
    start = time.time()
    d = {
        'a': await asyncio.sleep(2),
        'b': await asyncio.sleep(2)
    }
    end = time.time()
    print(f'elapsed time: {end - start} seconds')

asyncio.run(main())

Actual output:

elapsed time: 4 seconds

Expected output:

elapsed time: 2 seconds

Currently this can only be achieved using asyncio task groups:

import asyncio
import time

async def main():
    start = time.time()
    d = {}
    async with asyncio.TaskGroup() as tg:
        d['a'] = tg.create_task(asyncio.sleep(2))
        d['b'] = tg.create_task(asyncio.sleep(2))
    end = time.time()
    print(f'elapsed time: {end - start} seconds')

asyncio.run(main())

I’m proposing to somehow have the dictionary constructor itself automagically behave the same way.

Async functions are designed to be more general than just asyncio:

The dictionary syntax has no way of knowing what sort of task group to run the operations in concurrently. E.g. should it pick an anyio TaskGroup, a trio Nursery, an asyncio.TaskGroup, or even a Twisted DeferredList?

The async function might not even be used with a concurrency framework at all; it could just be running synchronously and never yielding.
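To illustrate that second point, here is a minimal sketch (not from the original posts) of a coroutine being driven entirely by hand, with no event loop or task group involved:

```python
async def add(a, b):
    # No awaits that yield to an event loop: this coroutine
    # runs to completion in a single step.
    return a + b

coro = add(1, 2)
result = None
try:
    # Drive the coroutine manually instead of handing it to asyncio.
    coro.send(None)
except StopIteration as exc:
    # The return value is carried on the StopIteration exception.
    result = exc.value

print(result)  # 3
```

A dict constructor that implicitly scheduled its values on an asyncio task group would break callers like this one, which never touch asyncio.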

There’s an issue in your code: tg.create_task returns a Task, so d['a'] and d['b'] end up holding Task objects rather than the results. It should be more like this:

import asyncio
import time

async def async_dict(**kwargs):
    v = {}
    async def run(k, fn):
        v[k] = await fn()

    async with asyncio.TaskGroup() as tg:
        for k, fn in kwargs.items():
            tg.create_task(run(k, fn))

    return v

async def main():
    start = time.time()
    d = await async_dict(
        a=lambda: asyncio.sleep(2),
        b=lambda: asyncio.sleep(2),
    )
    end = time.time()
    print(f'elapsed time: {end - start} seconds')

asyncio.run(main())

Thomas is right. I wonder if the original idea is confusing concurrency, i.e. suspendability, with parallel computing (the first is necessary for, but not equivalent to, the second). Automagically grabbing multiple cores behind the scenes to form a dictionary is a resource hog, and seems like it could lead to subtle bugs.

The number of cores, and the parallel execution engine the code is being run on, should be first-class concerns of the developer, not guessed at by a language implementation.

Anyway, it occurred to me that dictionary comprehensions do actually support async for (see below). However, when run, each key-value pair effectively blocks the next, so it takes ~4 seconds. That just illustrates the point: in Python, the parent coroutine is the thing that is supposed to be suspended or resumed while something else runs, not the dict literal (though the literal is suspendable as part of the coroutine). Automagically grabbing multiple cores from within a coroutine, while something higher up is allocating cores to it, would inevitably be problematic to debug.


import asyncio
import time
import string

class SleepyAsyncIterator:
    def __init__(self, iterable):
        self.iterator = iter(iterable)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            item = next(self.iterator)
        except StopIteration:
            raise StopAsyncIteration
        else:
            await asyncio.sleep(2)
            return item


N = 2
KEYS = string.ascii_lowercase[:N]


async def main():
    start = time.time()
    d = {k: v
         async for k, v in SleepyAsyncIterator(zip(KEYS, range(N)))}
    end = time.time()
    print(f'elapsed time: {end - start} seconds')
    print(f'{d=}')

asyncio.run(main())

You can use asyncio.as_completed to get results from multiple asyncio futures in the order they finish, within a dict comprehension. But if the intended use is just to create a dict and then move on, gather followed by building the dict is simpler: results come back in the same order they went in (no need for a lookup, or for each coroutine to pass back both its key and its result), and awaiting the gather ensures the dict isn't used anywhere before it is populated.
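A minimal sketch of the gather-then-build approach described above (the keys and sleep results here are made up for illustration):

```python
import asyncio
import time

async def main():
    start = time.time()
    keys = ['a', 'b']
    # gather returns results in the same order as its arguments,
    # so zipping with the keys afterwards is safe.
    values = await asyncio.gather(
        asyncio.sleep(2, result=1),
        asyncio.sleep(2, result=2),
    )
    d = dict(zip(keys, values))
    end = time.time()
    print(f'elapsed time: {end - start:.0f} seconds')  # ~2, not 4
    print(d)

asyncio.run(main())
```

Both sleeps run concurrently, so the whole dict is built in ~2 seconds rather than 4.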


It is not the dictionary constructor that computes values or awaits coroutines.
Your code is equivalent to:

import asyncio
import time

async def main():
    start = time.time()
    a = await asyncio.sleep(2)
    b = await asyncio.sleep(2)
    d = {
        'a': a,
        'b': b
    }
    print(d)
    end = time.time()
    print(f'elapsed time: {end - start} seconds')

asyncio.run(main())

Should a and b be computed concurrently?