Async iterator for APIs

I have an API which I use to get a bunch of records. Simply calling getRecords returns the first n records in a list. Calling getRecords(prevOutput) returns the next set of n or fewer records, depending on how many are left, or None when there are none.

So an easy way to get all records is:

def getAllRecords():
    records = list()
    result = getRecords()

    while result:
        records.extend(result)
        result = getRecords(result)

    return records

This function works great, except that getting each set of records takes time. The function runs for about 5 minutes for about 1k records without returning anything, and during this time the program just waits for it to finish.

Implementing this asynchronously would be great, and it is possible with an async requests library: the entire program wouldn’t halt, and other tasks could be executed in the meantime.

However, I don’t have other tasks to execute and would rather just optimize this function. I would like to create my own async generator or async iterator, so that I could use it as:

async for record in getAllRecords():
    print(record)

When the for loop starts, there would obviously be some delay, but once the first set of n records is received, I can process it while the next set is fetched using the first set.

So I would need something like this:

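async def getAllRecords():
    # Wished-for version: assumes getRecords could be awaited.
    result = await getRecords()

    while result:
        # Fetch the next set, using the current set as the cursor.
        next_set = await getRecords(result)

        for record in result:
            yield record

        result = next_set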

The problem, however, is that getRecords, which comes from a library, is not itself an async function, even though it uses async requests in the background. Thus I cannot await it.

When I did some hacking with asyncio's create_task and loops, I could get it to work for the first list. But the async requests library then raises a NetworkException with an error that reads: New thread was created.

So it seems that initializing the API library and then calling its requests in a different thread is the problem.

Is it possible to call the request in the main thread and do the async return in a different one? Should I implement a generator or iterator class rather than a generator function, since I would have more control over the state and how the next function works?

Or is the solution more advanced, and should I rather use threads with some shared state between them?
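
To make the thread idea concrete, here is a minimal sketch rather than a tested solution. The names iter_all_records and fake_get_records are hypothetical; fake_get_records only stands in for the library's getRecords. The sketch assumes the library accepts all of its calls coming from one dedicated worker thread, which the "New thread was created" error suggests may not hold.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_all_records(get_records):
    """Yield records one by one while the next page is fetched in a worker thread.

    get_records is a stand-in for the library's getRecords: called with no
    argument it returns the first page, called with the previous page it
    returns the next page, or None when nothing is left.
    """
    # A single worker means every API call runs in the same thread.
    # Assumption: the library tolerates being called from that thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        page = pool.submit(get_records).result()
        while page:
            # Start fetching the next page before handing out this one.
            next_page = pool.submit(get_records, page)
            yield from page
            page = next_page.result()


def fake_get_records(prev=None):
    # Hypothetical stand-in: three pages of two records each.
    pages = [[1, 2], [3, 4], [5, 6]]
    if prev is None:
        return pages[0]
    idx = pages.index(prev) + 1
    return pages[idx] if idx < len(pages) else None


for record in iter_all_records(fake_get_records):
    print(record)
```

The fetches themselves still happen one after another, since each call needs the previous page. The only overlap is between processing the current page and fetching the next one.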

Hey, Erik!
It seems that you can’t speed this up using either threads or async, because you need the previous list to get the next one.
So in this case, when you call result = await getRecords(), you still wait for getRecords to finish. So it’s equivalent to just result = getRecords().
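
A small sketch of that point, using a hypothetical async stand-in (fake_get_records is not the real library function, and the sleep only simulates network latency). Each call needs the previous result as its argument, so the awaits run strictly one after another, just like the blocking loop:

```python
import asyncio


async def fake_get_records(prev=None):
    # Hypothetical stand-in for an awaitable getRecords.
    await asyncio.sleep(0.01)  # simulated network latency
    start = 0 if prev is None else prev[-1] + 1
    return list(range(start, start + 2)) if start < 6 else None


async def get_all_records():
    records = []
    result = await fake_get_records()
    while result:
        records.extend(result)
        # The next call takes `result` as input, so it cannot start earlier.
        result = await fake_get_records(result)
    return records


records = asyncio.run(get_all_records())
print(records)
```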
