Allow Return Statements with Values in Asynchronous Generators

MadcowD · October 6, 2024, 8:05pm

EDIT: We are currently looking for a sponsor for a potential PEP.

Introduction

Hello everyone,

I’ve been looking into Python’s asynchronous generators and noticed a notable difference compared to synchronous generators: asynchronous generators currently do not allow return statements with values. If you try to use return value within an asynchronous generator, it results in a SyntaxError. I believe that permitting return statements with values in asynchronous generators could enhance their functionality and bring them in line with synchronous generators. Before considering drafting a PEP, I wanted to open up a discussion here to gather your thoughts, insights, and gauge the community’s interest in this idea.

Motivation

In synchronous generators, it’s perfectly acceptable to use a return statement with a value. When the generator is exhausted, this value can be accessed through the StopIteration exception’s value attribute. This feature allows generators to convey a final result upon completion, which can be incredibly useful in various scenarios.

For example:

def gen():
    total = 0
    for i in range(5):
        total += i
        yield i
    return total  # Returns the sum of yielded numbers

g = gen()
try:
    while True:
        print(next(g))
except StopIteration as e:
    # Only the first StopIteration carries the return value
    print(f"Total sum: {e.value}")  # Outputs: Total sum: 10

However, when working with asynchronous generators, attempting to use return value raises a SyntaxError:

async def agen():
    total = 0
    for i in range(5):
        total += i
        yield i
    return total  # SyntaxError: 'async generator' can't have non-empty return value
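The restriction is enforced when the function is compiled, so it can be observed without breaking a whole script. The following is a minimal sketch, assuming a current CPython release in which this proposal has not been adopted; the helper name `compiles` is hypothetical, and the exact error message varies between versions, so it is not checked here:

```python
# Source text of an async generator that tries to return a value.
SOURCE = """
async def agen():
    yield 1
    return 42
"""


def compiles(source: str) -> bool:
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# The async version is rejected at compile time on current releases.
print(compiles(SOURCE))  # False
# The same body as a synchronous generator is accepted.
print(compiles(SOURCE.replace("async def", "def")))  # True
```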

I’d like to propose that we allow return statements with values in asynchronous generators. This change would enable them to return a final value upon completion, just like synchronous generators do.

Benefits

Allowing return statements with values in asynchronous generators would bring several advantages:

Consistency and Predictability: It would align the behavior of asynchronous generators with that of synchronous ones, making the language more consistent. This consistency simplifies the mental model for developers and eases the transition for those moving between synchronous and asynchronous code.
Expressive Power: With this change, asynchronous generators could convey a final result or status upon completion. This capability can be essential in many programming patterns, such as data processing pipelines where you might want to return a summary statistic after yielding a sequence of results.
Code Clarity and Maintainability: Developers would no longer need to rely on external variables, sentinels, or workarounds to pass back a final result. Using an explicit return value makes the code cleaner and more maintainable.

Use Cases

To illustrate the potential benefits, here are some practical examples where allowing return statements with values in asynchronous generators would be helpful:

1. Data Processing Pipelines

Consider an asynchronous generator that processes data chunks and needs to return a final aggregated result, such as a total count or checksum.

async def process_data(stream):
    total_bytes = 0
    iterator = stream.__aiter__()
    while True:
        try:
            chunk = await anext(iterator)
            total_bytes += len(chunk)
            yield chunk
        except StopAsyncIteration:
            break
    return total_bytes  # Proposed to be allowed

async def main():
    stream = async_data_stream()
    iterator = process_data(stream)
    while True:
        try:
            chunk = await anext(iterator)
            process(chunk)
        except StopAsyncIteration as e:
            print(f"Total bytes processed: {e.value}")
            break

In this example, the asynchronous generator yields data chunks for processing and then returns the total number of bytes processed. This approach avoids the need for external variables to keep track of the total.
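For comparison, here is one way the same result can be obtained today without the proposed syntax. This is only a sketch of a common workaround: the `Result` holder and the `fake_stream` stand-in for `async_data_stream()` are hypothetical names introduced for illustration.

```python
import asyncio


class Result:
    """Hypothetical holder that the generator fills in before it finishes."""
    value = None


async def fake_stream():
    # Stand-in for async_data_stream(); yields three byte chunks.
    for chunk in (b"ab", b"cde", b"f"):
        yield chunk


async def process_data(stream, result):
    total_bytes = 0
    async for chunk in stream:
        total_bytes += len(chunk)
        yield chunk
    # Side channel used instead of `return total_bytes`.
    result.value = total_bytes


async def main():
    result = Result()
    chunks = [chunk async for chunk in process_data(fake_stream(), result)]
    return chunks, result.value


chunks, total = asyncio.run(main())
print(chunks, total)  # [b'ab', b'cde', b'f'] 6
```

The holder is exactly the kind of external variable the proposal aims to make unnecessary.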

2. Asynchronous File Reading with Summary

Imagine reading lines from a file asynchronously and wanting to know the total number of lines read after processing.

async def read_lines(file_path):
    line_count = 0
    async with aiofiles.open(file_path, 'r') as f:
        iterator = f.__aiter__()
        while True:
            try:
                line = await anext(iterator)
                line_count += 1
                yield line
            except StopAsyncIteration:
                break
    return line_count  # Proposed to be allowed

async def main():
    iterator = read_lines('data.txt')
    while True:
        try:
            line = await anext(iterator)
            process_line(line)
        except StopAsyncIteration as e:
            print(f"Total lines read: {e.value}")
            break

Here, the generator yields each line for processing and returns the total line count upon completion, providing a clear and direct way to access this final result.

3. Database Query with Aggregated Result

Suppose you have an asynchronous generator fetching records from a database and you want to return a summary, like the total value of a certain field.

async def fetch_records(query):
    total_value = 0
    iterator = database.execute(query).__aiter__()
    while True:
        try:
            record = await anext(iterator)
            total_value += record.value
            yield record
        except StopAsyncIteration:
            break
    return total_value  # Proposed to be allowed

async def main():
    iterator = fetch_records('SELECT * FROM sales')
    while True:
        try:
            record = await anext(iterator)
            process_record(record)
        except StopAsyncIteration as e:
            print(f"Total sales value: {e.value}")
            break

This pattern allows you to process each record individually while also obtaining an aggregate result at the end without extra steps.

Technical Considerations

To implement this feature, we would need to modify the StopAsyncIteration exception to include a value attribute, similar to StopIteration. The asynchronous iteration protocol would also require updates to handle the return value when the generator is exhausted.
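The difference between the two exception types can be checked directly. A small sketch of the current behaviour, assuming a present-day CPython release:

```python
sync_stop = StopIteration(10)
async_stop = StopAsyncIteration(10)

# StopIteration exposes its first argument as `.value`.
print(sync_stop.value)  # 10

# StopAsyncIteration is a plain exception: no `.value`, only `.args`.
print(hasattr(async_stop, "value"))  # False
print(async_stop.args)  # (10,)
```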

One key aspect is ensuring backward compatibility. Existing asynchronous generators that don’t use return statements with values would continue to function as before. The change would be additive, and developers could opt-in to use the new feature as needed.

We’d also need to consider how asynchronous frameworks like asyncio handle these return values. Updating asyncio and other libraries to support the change would be part of the implementation process.

Potential Challenges

While this proposal offers benefits, there are some challenges to address:

Event Loop Adjustments: Event loops and asynchronous frameworks may need updates to support and propagate the return value from asynchronous generators. Ensuring these changes are seamless and don’t introduce regressions is important.
Error Handling: We need to be cautious with exception propagation to prevent unintended catching of StopAsyncIteration exceptions with a value in user code. Aligning with the principles established in PEP 479 regarding exception handling in generators would be essential.
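As background for the error-handling point, asynchronous generators already apply a PEP 479 style rule: a StopAsyncIteration raised explicitly inside the generator body is converted to RuntimeError instead of silently ending iteration. The sketch below illustrates the current behaviour; the wording of the message is not asserted because it may differ between versions.

```python
import asyncio


async def agen():
    yield 1
    raise StopAsyncIteration(42)  # explicit raise, not a `return`


async def main():
    seen = []
    try:
        async for value in agen():
            seen.append(value)
    except RuntimeError as exc:
        # The interpreter converts the leaked exception to RuntimeError.
        return seen, str(exc)
    return seen, None


seen, message = asyncio.run(main())
print(seen)     # [1]
print(message)  # text of the RuntimeError raised by the interpreter
```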

Seeking Feedback & A Sponsor

I’m interested in hearing your thoughts on this proposal:

Do you see value in allowing return statements with values in asynchronous generators? Are there potential pitfalls or unintended consequences we should consider? Would this change enhance your experience when working with asynchronous code in Python?

Your insights and feedback will be invaluable in refining this idea before deciding whether to draft a PEP.

elis.byberi · October 6, 2024, 8:34pm

Here is an implementation that mimics the behavior of regular generator functions:

import asyncio

class StopAsyncGenerator(Exception):
    def __init__(self, value):
        self.value = value

async def agen():
    total = 0
    for i in range(5):
        total += i
        yield i
        
    raise StopAsyncGenerator(total)  # Raising custom exception with value

async def main():
    try:
        async for value in agen():
            print(value)
    except StopAsyncGenerator as e:
        print(f"Total sum: {e.value}")

asyncio.run(main())

JamesParrott · October 6, 2024, 8:49pm

I’d use this. And following Eli’s implementation, perhaps it only requires Sugar (I do not know the cost of Sugar in Python).

But all the examples given follow a reducer pattern. In any of them, why does the accumulator’s current value need to live inside the async generator? For my tastes, code reads cleaner and the intent is easier to fathom, if the accumulator state is in the ‘caller’ / outside loop. Generators should just return information, and assume as little as possible about what the client iterating over them wants to do with their yielded values. They should not even assume they will be consumed entirely.

Example 2 can even be done by simply wrapping with enumerate. Examples 1 and 3 can be done with the caller manually aggregating len(chunk) and record.value in an async for loop, instead of the somewhat more painful to read try/except.
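A sketch of the caller-side aggregation described here, using a hypothetical in-memory `read_lines` as a stand-in for the aiofiles-based generator. Note that the built-in enumerate accepts only synchronous iterables, so in this sketch the count is kept by hand inside the async for loop:

```python
import asyncio


async def read_lines():
    # Stand-in for the file-backed generator; it only produces lines.
    for line in ("alpha\n", "beta\n", "gamma\n"):
        yield line


async def main():
    line_count = 0
    total_chars = 0
    async for line in read_lines():
        # The accumulator state lives in the caller, not in the generator.
        line_count += 1
        total_chars += len(line)
    return line_count, total_chars


print(asyncio.run(main()))  # (3, 17)
```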

I can see why tracking a line count and data consumption matters. But by the time the async generator is consumed entirely, it might be too late to do anything useful with that information, like avoid filling up the user’s disk entirely or avoid exceeding a cloud bucket limit. The proposal only assists with a posteriori reporting when all went well on the happy path.

MadcowD · October 6, 2024, 8:49pm

Hey @elis.byberi ! Thanks for the quick reply

While that does the trick technically, I think there are some downsides to this approach that are worth considering.

Using exceptions to control normal program flow isn’t really in line with Python’s philosophy with the exception of StopIteration. Exceptions are generally meant for error handling—not for returning values in regular operations. When you raise an exception to return a value, it can make the code harder to read and understand.

For example:

class StopAsyncGenerator(Exception):
    def __init__(self, value):
        self.value = value

async def agen():
    total = 0
    for i in range(5):
        total += i
        yield i
    raise StopAsyncGenerator(total)  # Raising custom exception with value

async def main():
    try:
        async for value in agen():
            print(value)
    except StopAsyncGenerator as e:
        print(f"Total sum: {e.value}")

In this code, anyone who uses agen needs to know about the StopAsyncGenerator exception and handle it appropriately. If they forget, it could lead to unexpected crashes or unhandled exceptions. This adds extra responsibility on the user of the generator and increases the chance of bugs.

On the flip side, if asynchronous generators allowed return statements with values—just like synchronous generators—it would make the code cleaner and more intuitive. It would look something like this:

async def agen():
    total = 0
    for i in range(5):
        total += i
        yield i
    return total  # Proposed to be allowed

async def main():
    try:
        it = agen()
        while True:
            value = await anext(it)
            print(value)
    except StopAsyncIteration as e:
        print(f"Total sum: {e.value}")

This way, the generator’s user doesn’t have to worry about catching a custom exception. The use of return clearly indicates that we’re returning a value upon completion, which is more readable and aligns with how functions generally work in Python.

So, while your solution works, I believe allowing return statements with values in asynchronous generators would provide a more elegant and Pythonic solution. It would enhance consistency between synchronous and asynchronous generators.

(for context this is how it works with normal sync generators right now:)

def gen():
    yield 1
    return "something else"

it = gen()
try:
    while True:
        print(next(it))
except StopIteration as e:
    print(e.value)

1
something else

I don’t see why we should have inconsistency between async generators and sync generators in Python.

What are your thoughts on this?

elis.byberi · October 6, 2024, 8:58pm

Async/await are newer concepts, and personally, I don’t like or know how to explain why both yield and return exist in the same function. It complicates things unnecessarily. You can simply use yield instead of return without relying on exception values, which is not ideal for control flow.

MadcowD · October 6, 2024, 9:07pm

I get that having both yield and return in the same function can seem a bit messy. But since PEP 380, synchronous generators have allowed return statements with values. It’s part of Python now, whether we like it or not.

The inconsistency arises because asynchronous generators don’t allow return with a value; they raise a SyntaxError if you try. We can’t remove return from synchronous generators without breaking existing code. So, wouldn’t it make sense to make asynchronous generators consistent with synchronous ones by allowing return statements with values?

Even if the idea isn’t perfect, having consistency across the language could reduce confusion and make it easier for everyone to understand and use generators effectively.

Don’t you agree that it’s bad that this currently works in Python:

def gen():
    yield 1
    return 42  # Allowed in synchronous generators

g = gen()
try:
    while True:
        print(next(g))
except StopIteration as e:
    print(f"Generator returned: {e.value}")  # Outputs: Generator returned: 42

but this doesn’t

async def agen():
    yield 1
    return 42  # SyntaxError: 'async generator' can't have non-empty return value

...

elis.byberi · October 6, 2024, 9:27pm

(In line with @JamesParrott’s previous post)

Generators don’t need to be both a producer and a consumer at the same time; a generator should only be a producer.

Additionally, you don’t need to use try/except with the current async generator syntax.

import asyncio

async def ticker(delay, to):
    for i in range(to):
        yield i
        await asyncio.sleep(delay)

async def run():
    async for i in ticker(1, 10):
        print(i)

# Entry point to run the asyncio event loop
if __name__ == "__main__":
    asyncio.run(run())

JamesParrott · October 6, 2024, 9:35pm

Sure. That’s good to know.

I’m saying the try/except is needed to extract the return value from a normal generator. In my opinion, this wasn’t the best idea in the first place, but it and the send machinery weren’t added for no reason.

Nonetheless, without better justification, I wonder if return values from normal generators should be discouraged, instead of developed further.

MadcowD · October 6, 2024, 9:40pm

@elis.byberi

I think there’s a bit of confusion about how generators work in Python.

First off, generators can be both producers and consumers, thanks to PEP 342. This introduced the send() method, allowing generators to receive values from the caller. So yes, generators can consume inputs.

Check out this example:

def accumulator():
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value

gen = accumulator()
print(next(gen))       # Initializes the generator, outputs: 0
print(gen.send(5))     # Adds 5, outputs: 5
print(gen.send(10))    # Adds 10, outputs: 15

Here, the generator both consumes values via send() and produces a running total. So it’s definitely doing double duty.
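Asynchronous generators already have a counterpart to send(), namely asend(), so the producer/consumer point carries over. A minimal sketch of the same accumulator written as an async generator:

```python
import asyncio


async def accumulator():
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value


async def main():
    gen = accumulator()
    first = await gen.asend(None)  # primes the generator, like next(gen)
    second = await gen.asend(5)    # adds 5
    third = await gen.asend(10)    # adds 10
    await gen.aclose()             # finish the infinite generator cleanly
    return first, second, third


print(asyncio.run(main()))  # (0, 5, 15)
```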

As for needing try/except blocks with generators, you don’t actually need them for normal iteration. The for loop (and async for loop) automatically handles StopIteration and StopAsyncIteration exceptions internally.

For example, with a synchronous generator:

def gen():
    yield 1
    yield 2
    return 42  # The return value is captured in StopIteration

for value in gen():
    print(value)

This prints:

1
2

No try/except needed there. The for loop takes care of the StopIteration exception when the generator finishes.

But if you want to get that return value from the generator’s return statement, you can catch the StopIteration exception:

g = gen()
try:
    while True:
        value = next(g)
        print(value)
except StopIteration as e:
    print(f"Generator returned: {e.value}")  # Outputs: Generator returned: 42

Similarly, with asynchronous generators (if they allowed return with values), the async for loop handles StopAsyncIteration exceptions internally:

async def agen():
    yield 1
    yield 2
    return 42  # Proposed to be allowed

async for value in agen():
    print(value)

Again, no try/except needed for normal iteration.

But if you wanted that return value:

async def main():
    g = agen()
    try:
        while True:
            value = await g.__anext__()
            print(value)
    except StopAsyncIteration as e:
        print(f"Asynchronous generator returned: {e.value}")

import asyncio
asyncio.run(main())

So, the point is, generators (both sync and async) can consume and produce values, and you don’t need try/except blocks unless you’re specifically trying to access the return value at the end.

@JamesParrott I don’t think “not liking the idea of return statements” in generators should discourage us from trying to at least make Python more consistent. I’m all for a separate discussion of removing StopIteration.value from Python entirely, but this would break backwards compatibility, so I don’t think that’s going to happen. If it’s a part of the language, we should at least try to make it a consistent part of the language.

What do you think?

JamesParrott · October 6, 2024, 9:42pm

What do you think?

I think consistency for an esoteric feature that leads to buggy code, is poor justification for the amount of work needed to change the language’s syntax.

Also I think .send was a catastrophic violation of “There should be one-- and preferably only one --obvious way to do it.” It’s useful to create a separate scope admittedly, but if .send is needed, it can be written better by putting the logic in the outer loop.

MadcowD · October 6, 2024, 9:43pm

This doesn’t change the syntax; it removes a pointless SyntaxError that was unjustified in the original asynchronous generators PEP. I’ve already drafted the changes to Objects/genobject.c (python/cpython on GitHub) to make this happen.

It’s at most a 40-60 line change, give or take a few lines.
You can find a draft here: patch · GitHub

@JamesParrott I understand why you’re upset about these features, but there are users, and I don’t see how making them consistent introduces more bugs than would otherwise be introduced by people using workarounds (for the exact counterargument I gave to @elis.byberi’s original post). This standardizes a pattern (that you may not like but already exists in Python) for async generators.

elis.byberi · October 6, 2024, 9:57pm

We should not remain trapped in the past, such as using return to raise an exception. Why is that? What’s wrong with using raise? Users have to read the documentation and understand that returning will raise an exception. What happened to the principle that explicit is better than implicit?

alex-dixon · October 6, 2024, 10:05pm

We appreciate your feedback. We would prefer discussion relating to whether “iteration done” is a value or an exception be moved to a separate discussion.

The proposal we would like discussed here is simply whether Async Generators should have the same type as synchronous Generators:

Generator

Generator[YieldType, SendType, ReturnType]

Async Generator

AsyncGenerator[YieldType, SendType]

Proposal

AsyncGenerator[YieldType, SendType, ReturnType]
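To make the typing difference concrete, here is a small sketch using the generic aliases as they exist today, assuming Python 3.9 or later for subscripting the `collections.abc` classes. The function names are hypothetical. The synchronous annotation has a third slot for the return type, while the asynchronous one has nowhere to put it.

```python
import asyncio
from collections.abc import AsyncGenerator, Generator


def count_up(n: int) -> Generator[int, None, str]:
    # YieldType=int, SendType=None, ReturnType=str
    for i in range(n):
        yield i
    return "done"


async def acount_up(n: int) -> AsyncGenerator[int, None]:
    # YieldType=int, SendType=None; no ReturnType slot exists today
    for i in range(n):
        yield i


def drain(gen):
    """Collect yielded items and the return value of a sync generator."""
    items = []
    try:
        while True:
            items.append(next(gen))
    except StopIteration as stop:
        return items, stop.value


async def adrain(agen):
    """Collect yielded items of an async generator (no return value)."""
    return [item async for item in agen]


print(drain(count_up(3)))                 # ([0, 1, 2], 'done')
print(asyncio.run(adrain(acount_up(3))))  # [0, 1, 2]
```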

MadcowD · October 6, 2024, 10:17pm

@elis.byberi I see your point about being explicit. In Python, return is the standard way to signal that a function has completed its work, and this applies to generators too. While return in a generator does raise an internal exception (StopIteration or StopAsyncIteration), this is an implementation detail that most users don’t need to worry about. Using return keeps the code intuitive and aligns with how functions naturally communicate completion, without overloading raise, which is typically used for errors.

By allowing return statements with values in asynchronous generators, we make them consistent with synchronous generators and the rest of the language. This consistency can reduce confusion and improve readability, as developers can rely on familiar patterns across different types of functions. It upholds the principle of “explicit is better than implicit” by using return to clearly indicate completion and provide a final value, rather than requiring the use of raise in a way that might feel unnatural for normal operation.

On another note, I also agree with @alex-dixon that this discussion is not relevant to the proposal. I’m looking for concrete reasons why the AsyncGenerator spec should not match the Generator spec, not whether or not we like the Generator spec itself, so let’s keep those discussions elsewhere.

To summarize so far:

Argument Against: It’s not worth the effort to make an “esoteric” feature consistent in the language.
- Counter: I’ve already implemented the changes; they are small and backwards compatible.
Argument Against: We shouldn’t invest in making a feature we don’t like in Generator consistent, because doing so implies we don’t regret the Generator feature.
- Counter: Even if we don’t fully like the Generator feature, making AsyncGenerators consistent with it reduces confusion and errors, enhancing the overall usability of the language without implying we endorse the original feature.

Am I missing anything?

elis.byberi · October 6, 2024, 10:39pm

You are not consistent with the OP’s post title. Please create another thread.

@MadcowD, calling our arguments ‘complaints’ is rude; you should refrain from doing so. You are not consistent with our Community Guidelines.

MadcowD · October 6, 2024, 10:42pm

@elis.byberi No offense was intended. I’ll edit it to say “Argument Against” instead of “Complaint” if that helps.

MadcowD · October 6, 2024, 11:02pm

What do you think so far, given my last response? To add to the discussion: people can use packages like async_generator (API documentation — async_generator 1.10+dev documentation) to make their sync generators async. Of course, this would lead to the spread of things like yield_:

from async_generator import async_generator, yield_, yield_from_

@async_generator
async def agen1():
    await yield_(1)
    await yield_(2)
    return "great!"

But I’m not sure that “patching” yield like this is really what we want to specify as the canonical implementation.

Let’s put ourselves in the shoes of someone who wants to support both async and sync generators in a library, and of a user of that library.

Scenario:

You’re a developer using a library that provides both synchronous and asynchronous generators. The library aims to offer a consistent API for both, but due to language limitations, the asynchronous generators cannot use return statements with values.

Library Implementation:

Sync Generator:

def sync_generator():
    yield 1
    yield 2
    return "Completed"

Async Generator (cannot use return with value):

Since the library can’t use return with a value in the async generator, they resort to a workaround using a custom exception.

class AsyncGeneratorReturn(Exception):
    def __init__(self, value):
        self.value = value

async def async_generator():
    yield 1
    yield 2
    raise AsyncGeneratorReturn("Completed")  # Workaround since `return` isn't allowed

Consumer’s Perspective:

As a consumer of the library, you want to use these generators in your code.

Using the synchronous generator is straightforward:

for value in sync_generator():
    print(value)

1
2

No issues here—the for loop handles StopIteration internally, and you get the expected output.

Using the asynchronous generator:

You might expect to use it like this:

async for value in async_generator():
    print(value)

But this results in an unhandled exception:

1
2
Traceback (most recent call last):
  File "script.py", line X, in <module>
    async for value in async_generator():
  File "script.py", line Y, in async_generator
    raise AsyncGeneratorReturn("Completed")
AsyncGeneratorReturn: Completed

Problem:

The async for loop doesn’t handle the custom AsyncGeneratorReturn exception.
As a consumer, you now have to be aware of this custom exception and handle it explicitly.
This deviates from the standard usage of async for loops and complicates your code.

Consumer’s Code with Exception Handling:

To handle the custom exception, you need to modify your code:

try:
    async for value in async_generator():
        print(value)
except AsyncGeneratorReturn as e:
    print(f"Async generator returned: {e.value}")

Output:

1
2
Async generator returned: Completed

Now, you have to:

Know that async_generator uses a custom exception.
Write additional try/except blocks in your code.
Handle this differently from how you handle synchronous generators.

You could possibly solve the above by saying “let’s remove returns entirely”, but there are legitimate uses for StopIteration values in generators. Now, instead of relying on the standard StopAsyncIteration object, the consumer has to read the library source code to learn which special exception is raised to return final values from a generator, and the library has to do extra work to stay consistent across its sync and async implementations.

@elis.byberi @JamesParrott What would you propose as a solution for this problem?

MadcowD · October 7, 2024, 12:29am

@pf_moore Do you know if Yuri is still an active maintainer? Maybe he could provide some input

JamesParrott · October 7, 2024, 7:44am

No problem. I’m not upset. You’ve provided a clear explanation, a prototype implementation and a draft PEP - all fantastic.

What would you propose as a solution for this problem?

The problem statement is misleading. It is not an apples-to-apples comparison. try/except blocks are still required to extract return values from synchronous generators, as you yourself pointed out in your original post:

def gen():
    total = 0
    for i in range(5):
        total += i
        yield i
    return total  # Returns the sum of yielded numbers

g = gen()
try:
    while True:
        print(next(g))
except StopIteration as e:
    # Only the first StopIteration carries the return value
    print(f"Total sum: {e.value}")  # Outputs: Total sum: 10

I’m not sure how well it works with the async machinery (doesn’t async use generators all the way down under the hood anyway?), but if more state is required from an iterable than some of its yielded values (i.e. if more is required than a generator with no return), then I would prefer to see a class implementing the iterator protocol. Then I can read the methods, test, tinker with, and debug instances, of that class to my heart’s content. No try/except blocks required.

[edit] the async equivalent is analogous, even if the docs haven’t named it a protocol: 3. Data model — Python 3.12.7 documentation
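A sketch of what such a class could look like for the line-counting use case. The class name `CountingLines` and the in-memory input are hypothetical; the final count is read from an ordinary attribute instead of an exception.

```python
import asyncio


class CountingLines:
    """Hypothetical async iterator that tracks how many lines it produced."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.line_count = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            line = next(self._lines)
        except StopIteration:
            # Signal exhaustion in the way `async for` expects.
            raise StopAsyncIteration from None
        self.line_count += 1
        return line


async def main():
    reader = CountingLines(["a", "b", "c"])
    async for _line in reader:
        pass
    # State stays available on the instance after iteration ends.
    return reader.line_count


print(asyncio.run(main()))  # 3
```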

MegaIng · October 7, 2024, 11:44am

This is not true:

def gen():
    total = 0
    for i in range(5):
        total += i
        yield i
    return total  # Returns the sum of yielded numbers

def main():
    res = yield from gen()
    print("Result =", res)

print(list(main()))

I am not sure if similar syntax/semantics can be applied for async generators, but the value provided via return is easily accessible without messing with exceptions for sync generators in at least some contexts.
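One context where this works for synchronous generators is a small wrapper built on `yield from`, which lets a plain for loop be used while still capturing the return value. The sketch below uses a hypothetical class name, `Capture`. As far as current Python versions go, `yield from` is not permitted inside an `async def`, so this recipe has no direct async equivalent.

```python
class Capture:
    """Hypothetical wrapper that stores a generator's return value."""

    def __init__(self, gen):
        self.gen = gen
        self.value = None

    def __iter__(self):
        # `yield from` evaluates to the inner generator's return value.
        self.value = yield from self.gen


def gen():
    total = 0
    for i in range(5):
        total += i
        yield i
    return total


captured = Capture(gen())
items = [i for i in captured]
print(items, captured.value)  # [0, 1, 2, 3, 4] 10
```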