Enhance builtin iterables like list, range with async methods like __aiter__, __anext__,...

Henkhogan · November 23, 2022, 8:53pm

I’d like the builtin iterables to support async for.

Why?

I’d like to be able to use e.g. async list comprehensions on normal lists like:

[_ async for _ in list(range(1,10))]

The only missing piece in this case is an implementation of the __aiter__ method on the list class. That can be added manually in a subclass (but should be there out-of-the-box IMO):

import asyncio

class alist(list):
    async def __aiter__(self):
        for _ in self:
            yield _

async def amain():
    x = alist(range(1,10))
    y = [_ async for _ in x]
    print(y)

def main():
    asyncio.run(amain())

if __name__ == '__main__':
    main()

I see no reason why this should not be implemented on the builtin types.

Thoufak · November 23, 2022, 9:27pm

I guess the use cases for this are very limited. And if so, there’s not much sense in adding this to the list class.

In fact, the only benefit of asynchronously iterating over a list I can think of is for compatibility with the code that expects asynchronous iterables:

import asyncio

async def foo(things):
    async for thing in things:
        ...

# With list.__aiter__, we could pass a list as an argument too.
asyncio.run(foo([1, 2, 3]))
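
For context, here is a small sketch of what happens today when a plain list reaches a function like this: `async for` looks for `__aiter__`, does not find it on list, and raises a TypeError. The names `collect`, `numbers` and `try_collect` are made up for this illustration.

```python
import asyncio

async def collect(things):
    # Hypothetical stand-in for foo(): gathers everything it iterates over.
    return [thing async for thing in things]

async def numbers():
    # A real async iterable, for comparison.
    for n in (1, 2, 3):
        yield n

def try_collect(things):
    # Returns the collected list, or the TypeError raised by 'async for'.
    try:
        return asyncio.run(collect(things))
    except TypeError as exc:
        return exc

print(try_collect(numbers()))   # the async generator is accepted
print(try_collect([1, 2, 3]))   # the list is rejected with a TypeError
```

So the compatibility benefit is real, but it only matters for code that is written with `async for` and is then handed a synchronous iterable.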

merwok · November 23, 2022, 9:28pm

Changes need positive motivation to get implemented! What are the reasons why this should be implemented? What problem does it solve, what new things does it enable?

Henkhogan · November 23, 2022, 10:03pm

That’s exactly the point here. In my project I widely use async generators to fetch db record sets.
In some places I need to pass the same record set to multiple functions. To minimize db roundtrips, I use an async list comprehension to fetch the record set into a list and then pass the list to the functions. Therefore the functions are defined to take a synchronous iterable as a parameter.

However, in other places I need only one of these functions, and there I would like to pass the async generator directly to the function, but cannot.

I hope this example illustrates it:

async def db_fetch(r: range):
    for _ in r:
        yield _

async def afunc_1_async(x):
    [print(_) async for _ in x]

async def afunc_2_async(x):
    [print(_**2) async for _ in x]


async def afunc_1_sync(x):
    [print(_) for _ in x]

async def afunc_2_sync(x):
    [print(_**2) for _ in x]

async def option1():
    # Good readability but 2 db roundtrips
    # Scope A | 2 db roundtrips: an async generator can be consumed only once,
    # so each function needs its own fetch
    await afunc_1_async(db_fetch(range(1,10)))
    await afunc_2_async(db_fetch(range(1,10)))

    # Scope B | 1 db roundtrip
    await afunc_1_async(db_fetch(range(1,20)))

async def option2():
    # Bad readability
    # Scope A | 1 db roundtrip
    _list = [_ async for _ in (db_fetch(range(1,10)))]
    await afunc_1_sync(_list)
    await afunc_2_sync(_list)

    # Scope B | 1 db roundtrip
    await afunc_1_sync([_ async for _ in (db_fetch(range(1,20)))])

async def option3():
    # With the proposed list.__aiter__ (does not run today): good readability and only 1 db roundtrip
    # Scope A | 1 db roundtrip
    _alist = [_ async for _ in (db_fetch(range(1,10)))]
    await afunc_1_async(_alist)
    await afunc_2_async(_alist)

    # Scope B | 1 db roundtrip
    await afunc_1_async(db_fetch(range(1,20)))
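
Until something like this exists in the language, one possible workaround is a small adapter that the consuming functions call on their argument. The sketch below is an assumption about how that could look, not part of the proposal: `ensure_async` and `total` are hypothetical names, and `db_fetch` is the toy generator from the example above.

```python
import asyncio

async def db_fetch(r: range):
    # Toy stand-in for a database query, as in the example above.
    for item in r:
        yield item

def ensure_async(iterable):
    # Hypothetical helper: pass async iterables through, wrap sync ones.
    if hasattr(iterable, "__aiter__"):
        return iterable

    async def wrapper():
        for item in iterable:
            yield item

    return wrapper()

async def total(records):
    # Accepts a list or an async generator alike.
    return sum([r async for r in ensure_async(records)])

async def main():
    # Scope A: fetch once into a list, reuse the list for several calls.
    records = [r async for r in db_fetch(range(1, 10))]
    first = await total(records)
    second = await total(records)
    # Scope B: hand the async generator over directly.
    third = await total(db_fetch(range(1, 10)))
    return first, second, third

print(asyncio.run(main()))
```

The cost of this approach is that every consuming function has to remember to call the adapter, which is the boilerplate the proposal wants to avoid.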

zware · November 23, 2022, 10:10pm

I don’t think every iterator in the standard library should grow an __aiter__ method. However, I could get behind making the aiter builtin wrap synchronous iterators to make them asynchronous, something along the lines of this terrible implementation:

import builtins

def aiter(iterable, /, *, wrap_sync=False):
    try:
        return builtins.aiter(iterable)
    except TypeError:
        if not wrap_sync:
            raise
    it = builtins.iter(iterable)
    class _ait:
        def __init__(self, it):
            self._it = it
        async def __aiter__(self):
            for i in self._it:
                yield i
    return builtins.aiter(_ait(it))

The wrap_sync argument might or might not be necessary; one could argue that if you’re passing a synchronous iterable to aiter, you probably want to iterate it asynchronously.

brettcannon · November 23, 2022, 11:51pm

There’s some precedent for this as iter() will wrap an object that defines __getitem__() in its own iterator that defines __iter__().
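
That fallback is easy to see in action. In the sketch below, `Squares` is a made-up class that defines only `__getitem__`; `iter()` still returns a working iterator, which calls `__getitem__` with 0, 1, 2, ... until IndexError is raised.

```python
class Squares:
    # Hypothetical sequence-like class with __getitem__ but no __iter__.
    def __init__(self, n):
        self._n = n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index

it = iter(Squares(4))                 # works although Squares has no __iter__
print(hasattr(Squares, "__iter__"))   # the class itself has no __iter__
print(hasattr(it, "__iter__"))        # the wrapper returned by iter() does
print(list(it))
```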

Henkhogan · November 27, 2022, 9:07pm

What’s the best way to move this further?

guido · November 27, 2022, 11:15pm

Open an issue on GitHub that refers to this thread. If it isn’t shot down there, submit a PR. Cross fingers.

I think it’s a sensible idea but I haven’t tried to seriously kick the tires yet.

ofek · February 1, 2023, 4:56am

I just ran into this use case and actually assumed aiter would do that. Was a PR ever opened?

guido · February 1, 2023, 5:44am

I don’t know, but I searched the issues list for ‘aiter’ and found nothing, so I’m guessing it wasn’t.

storchaka · February 1, 2023, 8:11am

I used the following helper function for tests:

async def aiter(iterable):
    for i in iterable:
        yield i

That was before builtins.aiter() was introduced, which has different semantics.

Now we could merge the two semantics into one function, but wouldn’t that make code more error-prone? It can hide errors where a synchronous iterator is accidentally passed instead of an asynchronous one. While not always fatal, such an error can point to a flaw in the asynchronous design of the program.
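
One possible reading of this concern, offered as an illustration rather than the exact scenario meant above: an async wrapper around a synchronous iterable never awaits anything, so the loop over it does not hand control back to the event loop. In the sketch below, with made-up names `wrap_sync` and `background`, a task scheduled before the loop only gets to run once the loop has finished.

```python
import asyncio

async def wrap_sync(iterable):
    # Same shape as the helper above: no await anywhere inside.
    for item in iterable:
        yield item

async def main():
    events = []

    async def background():
        events.append("background task ran")

    task = asyncio.create_task(background())  # scheduled, not yet started
    async for item in wrap_sync([1, 2, 3]):
        events.append(item)                   # never yields to the event loop
    await task                                # first real suspension point
    return events

print(asyncio.run(main()))
```

With a genuinely asynchronous source that awaits I/O between items, other tasks would normally get a chance to run during the loop. A silent wrapper makes both cases look the same at the call site.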

Henkhogan · February 1, 2023, 2:38pm

I just created the issue: Enhance builtin iterables like list, range with async methods like __aiter__, __anext__,... · Issue #101495 · python/cpython · GitHub

Henkhogan · February 1, 2023, 2:45pm

I cannot imagine a concrete example where this causes a problem. If such problems really exist, we might go for a new keyword like xsync:

[_ xsync for _ in sync_or_async_iterable]

h-vetinari · November 23, 2023, 4:34am

It’s a pity that the issue got closed without action. I think it would be quite natural to allow:

async for item in <an_iterable>:
    ...

The iterable might be produced by a third-party API you don’t control – e.g. a list of messages from a broker – and then you just want to process them asynchronously, regardless of order. Currently this raises an error, but I don’t see why it has to.

Assuming it’s unpalatable to add __aiter__ to all sync iterators, IMO there should at least be an easy convertor – builtins.aiter would be a likely candidate, but calling that on a list also fails currently:

>>> builtins.aiter(["the", "order", "here", "is", "irrelevant"])
TypeError: 'list' object is not an async iterable

PS. There’s a lot of content being written about this in blog posts and books – it’s clearly something that many people run into.