`yield from` is not implemented:
Precisely because the PEP doesn’t plan to add `yield from` support to async generators. And a non-empty return only makes sense when you have `yield from`.
You might be interested in the history of generators as coroutines to understand why the answer to this is no, why coroutines look so similar to synchronous generators, and why StopIteration having a value was ever useful; but even within what it is useful for, it’s not something the average Python user is expected to interact with.
As for this proposal, frankly, I don’t think it adds anything worth designing in this way, and I’m concerned about how it interacts with known deficiencies of asynchronous iterators: the only place I can imagine wanting something like this involves a stateful resource and returning something else at some point after iteration, but before cleanup of that resource.
Many of the examples here are already implementable, without those concerns, along the lines of this pattern:
async with something_stateful() as ctx:
    async for thing in ctx.paginate(...):
        ...
    await ctx.get_stats()  # thing that would otherwise be returned
There are a few reasons this pattern is preferable.
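For concreteness, here is a minimal sketch of what such a stateful helper could look like; `something_stateful`, `paginate`, and `get_stats` are illustrative names borrowed from the snippet above, not a real API:

```python
import contextlib

class _Pager:
    """Illustrative stateful resource: pages through data and tracks stats."""

    def __init__(self):
        self.pages_seen = 0

    async def paginate(self, query=None):
        for page in range(3):  # stand-in for fetching real pages
            self.pages_seen += 1
            yield page

    async def get_stats(self):
        # The "return value" lives on the context object instead of the generator.
        return {"pages_seen": self.pages_seen}

@contextlib.asynccontextmanager
async def something_stateful():
    ctx = _Pager()
    try:
        yield ctx
    finally:
        pass  # resource cleanup would go here
```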
And from the next comment:
It is useful, no doubt, but it’s very hard to implement given the current generators architecture in CPython. I don’t think we have time to do it properly before 3.6. So I decided to do one thing at a time – do async generators for Python 3.6, and maybe do async yield from for 3.7.
So, the obvious question is: should `yield from` be implemented for async generators? If, and only if, the answer is yes, return values from async generators should also be implemented.
I don’t know the answer to this, and don’t really have a stake in this discussion - I just wanted to point out the missing discussion about why return values for sync generators are useful.
I would add that it would be very hard to use as well. Having both `async for` and `yield from` would be very confusing.
I believe generators are too low-level and shouldn’t be mixed with async/await statements.
I didn’t know that - that’s really cool. Thank you!
`main` is also a generator though. To extract the return value of a generator, outside of another generator, isn’t `try`/`except` still required (or setting a global/nonlocal to it, etc.)?
Yes, but that is not really an intended use case of this feature - the purpose was generator chaining, and I honestly am not convinced that manually catching `StopIteration` is a good idea even for sync generators - just use something like the pattern @mikeshardmind suggested.
Or we can come back to suggestions that would allow you to get the return value from a `for` loop, e.g.
for acc in accumulate() as total:
    ...
or something. But that is a different topic that has been discussed a few times already IIRC.
My use case for this would be a bit of an odd one. I want to implement a library of decorators that allows users to use this really slick syntax to interact with an LLM in a multi-step way.
@use_lm("gpt-4o")
def multi_step_prompt(problem: str = "1+1"):
    response_from_lm = yield "hey llm how is it going "  # "good and you"
    second_response_from_lm = yield f"could you calculate {problem} and format your answer as Answer: "
    # Here the messages are automatically accumulated behind the scenes (so "hey ...",
    # "good and you", "could you ..." are in the context of the llm on the second yield)
    return second_response_from_lm.split("Answer: ")[-1]

print(multi_step_prompt())  # 2
The reason I want it all to live in a function like this and use the yield syntax and so on is the version control of individual functions we’ve developed as part of https://docs.ell.so/, and especially Versioning & Tracing | ell documentation, which involves versioning a user’s calls to an LLM as they do prompt engineering, by computing the source code of the lexical closure of a user’s prompt function and storing that.
With the current implementation of generators I can implement @use_lm for the sync case, but I cannot implement it for the async use case.
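For illustration only, here is a rough sketch of how the sync case can be driven today: the decorator pumps the generator with `send()` and picks the return value off `StopIteration.value`. `call_llm` is a made-up placeholder, not part of ell’s API; the async counterpart of this loop ends in `StopAsyncIteration`, which cannot carry the user’s return value.

```python
import functools

def call_llm(model, messages):
    # Placeholder for a real model call; purely illustrative.
    return "Answer: 2"

def use_lm(model):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            gen = fn(*args, **kwargs)
            messages = []
            try:
                prompt = next(gen)            # run up to the first yield
                while True:
                    messages.append(prompt)
                    reply = call_llm(model, messages)
                    messages.append(reply)
                    prompt = gen.send(reply)  # resume the user's function
            except StopIteration as stop:
                return stop.value             # the generator's `return` value
        return wrapper
    return decorator
```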
This use of generators might look horrid to you, but it’s a type of sugar that drastically reduces the amount of pain in the AI/prompt engineering community while very much increasing the readability of multi-step interactions with LLMs. (We’ve seen a ton of interest, at least in our standard decorator library, with 4k stars in the past two weeks, and I’m considering introducing this generator-style prompting method, but this arbitrary difference between async and non-async generators would require me to implement a different solution for async users.) I’m happy to hack it out, but again, I just see a clear path here to give async and sync generators parity.
Here’s another example where you might actually want to know the reason for termination…
async def test_async():
    yield 1
    yield 2
    yield 3
    if (some_condition := random.random() < 0.5):
        return "terminated early"  # not allowed
    yield 4
    yield 5
    return "completed"
def test_sync():
    yield 1
    yield 2
    yield 3
    if (some_condition := random.random() < 0.5):
        return "terminated early"  # allowed
    yield 4
    yield 5
    return "completed"
Of course you can do this with iterators, but if I’m thinking of generators as bi-directional channels, then you could come up with a better example of this.
"terminated early" is exactly what exceptions are perfect at signaling. Just use `raise TerminatedEarly("some extra detail")`. Your suggestion is currently not to use distinct exception types/control flow for different situations, but to use the same tool. Why? If it already requires having a `try`/`except` block, why not use the power that syntax actually gives you?
So instead of using the `StopAsyncIteration` exception that already exists in Python (and which `return` in a generator is equivalent to), you want people to implement a `MyStopAsyncIteration` exception and raise it within the async generator.
What if I want the user of my generator to still be able to use a for loop and ignore the early-termination signal without having to know about this special exception I’m raising? Isn’t the whole point of `StopIteration` that standard consumers of iterators, like `for`, will handle `StopIteration` gracefully?
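To spell that out, here is a minimal sketch (with an assumed `TerminatedEarly` exception) of what the workaround forces on consumers: a caller who doesn’t care about the reason still has to know about the custom exception, whereas a plain return would let `async for` simply stop.

```python
import asyncio

class TerminatedEarly(Exception):
    pass

async def items():
    yield 1
    raise TerminatedEarly("some extra detail")  # the suggested workaround
    yield 2  # unreachable, kept only for shape

async def casual_consumer():
    try:
        async for x in items():   # this consumer doesn't care why we stopped...
            print(x)
    except TerminatedEarly:
        pass                      # ...but must still know about the exception

asyncio.run(casual_consumer())
```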
One motivation for generators was that:
class TestSyncIterator:
    def __init__(self):
        self.count = 0
        self.some_condition = None

    def __iter__(self):
        return self

    def __next__(self):
        self.count += 1
        if self.count <= 3:
            return self.count
        elif self.count == 4:
            self.some_condition = random.random() < 0.5
            if self.some_condition:
                raise StopIteration("terminated early")
            return self.count
        elif self.count == 5:
            return self.count
        else:
            raise StopIteration("completed")
is equivalent to
def test_sync():
    yield 1
    yield 2
    yield 3
    if (some_condition := random.random() < 0.5):
        return "terminated early"  # allowed
    yield 4
    yield 5
    return "completed"
and yet there is no async equivalent to
class TestAsyncIterator:
    def __init__(self):
        self.count = 0
        self.some_condition = None

    def __aiter__(self):
        return self

    async def __anext__(self):
        self.count += 1
        if self.count <= 3:
            return self.count
        elif self.count == 4:
            self.some_condition = random.random() < 0.5
            if self.some_condition:
                raise StopAsyncIteration("terminated early")
            return self.count
        elif self.count == 5:
            return self.count
        else:
            raise StopAsyncIteration("completed")
And chiefly that a downstream user doesn’t need to think about the termination condition if they don’t want to: they don’t have to wrap the for loop in try/except. If you (optionally) wanted to look at the StopIteration exception, you could, by converting your consumption to a try/while loop or `yield from`:
for value in test_sync():
    print(f"Value: {value}")
print("Iterator exhausted")

# You can do this if you're interested, but it's not required,
# which I'm guessing is the whole point of StopIteration & StopAsyncIteration.
x = test_sync()
try:
    while True:
        print(f"Value: {next(x)}")
except StopIteration as e:
    print(e.value)
I’m not articulating my point really clearly but do you sort of get what I mean? Now people have to check the source of the generator to see if a special MyCustomAsyncStopIteration exception is raised instead of being able to treat it like a normal iterator as with the sync counterpart.
What’s confusing to me is that, per typeshed/stdlib/builtins.pyi at dbd0d3521745288a3b2e345d8683bb9539d28d60 · python/typeshed · GitHub, we’ve already got a “value” property on StopAsyncIteration… in Python it just happens to be useless for generator-induced iterators vs their class-based counterparts.
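For what it’s worth, reading the termination reason already works today with the hand-written class, just not with a generator. A small sketch, reusing `TestAsyncIterator` from above (plus `import random`); it pulls the reason out of `args` rather than relying on that `value` attribute, since whether the runtime ever sets it is exactly what’s in question here:

```python
import asyncio
import random  # TestAsyncIterator above uses random.random()

async def drain(it):
    try:
        while True:
            print("Value:", await it.__anext__())
    except StopAsyncIteration as e:
        print("Stopped:", e.args[0] if e.args else "<no reason given>")

asyncio.run(drain(TestAsyncIterator()))
```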
Whether generators should be able to return a value has been asked from the start:
The BDFL proclaimed that they should.
We are only asking why the same does not hold for async generators.
In pluggy, we’ve been contemplating adding support for async (currently not supported). One feature of pluggy is “hook wrappers”, which are special hook implementations which allow a plugin to wrap and/or intercept a call to some hook. For sync, it looks like this:
@hookimpl(wrapper=True)
def my_hook():
    """Wrap calls to ``my_hook()`` hook implementations."""
    print("This runs before the hook implementations")
    # All corresponding hook implementations are invoked here.
    # result is the return value of the hook so far.
    result = yield
    print("This runs after the hook implementations")
    # The wrapper can modify the result by returning a different value.
    return result
For async, we would (probably) want it to work the same with the `return`, so the proposed feature would be nice to have.
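A sketch of what that async wrapper would presumably look like if the feature existed; pluggy does not support async hook implementations today, the project name is illustrative, and the `return` line is exactly what is a syntax error in an async generator right now, so it is commented out:

```python
from pluggy import HookimplMarker

hookimpl = HookimplMarker("example")  # illustrative project name

@hookimpl(wrapper=True)
async def my_hook():
    """Hypothetical async counterpart of the sync wrapper above."""
    print("This runs before the hook implementations")
    result = yield
    print("This runs after the hook implementations")
    # return result  # SyntaxError today: an async generator cannot return a value
```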
That doesn’t help! Note that you also need to be very precise and concise in your writing. You should be as brief as possible, ensuring no ‘lightbulb moment’ happens as the reader goes through it, allowing them to quickly return to their work.
Note that you don’t need to repeatedly enforce the same argument. Participants are free to make their points, but these points are not decisive. They help explore edge cases, and sometimes the points are made just for the sake of completeness.
I gave you a workaround in my first post because, in the best-case scenario, you could use the proposed feature after a year, in Python 3.14. Are you willing to wait a year? You also need to create a compelling (and well-articulated) use case, and not rely on consistency as the basis of your argument since these are two different technologies—one is synchronous generators, and the other is asynchronous generators.
The mere fact that there has been no interest in implementing all synchronous generator features into the asynchronous ones would indicate that these features are not highly desired or have been replaced by asyncio asynchronous frameworks.
However, if you present a compelling use case and explain why the current async framework does not fulfill it, this may certainly spark interest among other developers as well.
Note that you don’t need to repeatedly enforce the same argument. Participants are free to make their points, but these points are not decisive. They help explore edge cases, and sometimes the points are made just for the sake of completeness.
I’m not enforcing the same argument every single time? It’s been different each time. I showed you why an exception workaround would be inconvenient for implementers of a library, and I then made an argument about how async iterators support this but not async generators, which is a separate kind of inconsistency.
I gave you a workaround in my first post because, in the best-case scenario, you could use the proposed feature after a year, in Python 3.14. Are you willing to wait a year?
Of course I’m willing to wait a year for this feature, I didn’t come here to solve my problem immediately, I came here because I saw a deficiency in the language and thought I could help by being a champion for a new change.
The mere fact that there has been no interest in implementing all synchronous generator features into the asynchronous ones would indicate that these features are not highly desired or have been replaced by asyncio asynchronous frameworks.
This is just not true. I suspect most people run into this issue and then have to find a workaround.
I understand that you might not like the ideas here, but I would appreciate it if you would please kindly respond to the points in my counterarguments instead of trying to shut down the conversation based on “interest” or “my writing style”. I want to discuss the actual meat and bones here, and I think you have some really good ideas, so I’d love to hear your genuine thoughts on my responses.
With respect to consistency, synchronous does not equal asynchronous, but we find value in equivalent representations of synchronous and asynchronous code. Features like async/await are motivated by the desire to make async code look and behave like synchronous code.
The examples from pytest and others suggest a benefit to async generators looking and behaving like sync generators because it enables straightforward refactoring to async. We believe users should be able to add the word “async” to their function, with no other code changes or new constructs required relative to the synchronous version.
Arguments for simplicity in refactoring “are behind most of the semantics” of the `yield from` PEP (PEP 380):
It should be possible to take a section of code containing one or more `yield` expressions, move it into a separate function (using the usual techniques to deal with references to variables in the surrounding scope, etc.), and call the new function using a `yield from` expression.
The behaviour of the resulting compound generator should be, as far as reasonably practicable, the same as the original unfactored generator in all situations, including calls to `__next__()`, `send()`, `throw()` and `close()`.
If we consider a sync generator being refactored into an async generator, this is exactly our argument.
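As a tiny illustration of that refactoring argument (function names are hypothetical): adding `async` to an otherwise unchanged generator is currently enough to break it.

```python
def fetch_pages():
    yield 1
    yield 2
    return "2 pages"      # fine: delivered via StopIteration.value

async def fetch_pages_async():
    yield 1
    yield 2
    # return "2 pages"    # SyntaxError today: an async generator cannot return a value
```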
I am still unable to locate the historic rationale for async generators not having return values.
Additional arguments at this time that it would be good to have feedback on:
While it may not have been explicitly documented, once you understand that synchronous generators have been used as coroutines and should be viewed as analogs of async coroutines, not of async generators, and that `yield from` is a tool for delegating to synchronous generators, not simply a repeated “yield in a loop”, the reason not to have `yield from` or return values is obvious.
I tried to assist with that by linking the actual history since the parallel you have drawn between async generators and synchronous ones isn’t actually historically accurate.
I was thinking about this when working on PEP 525 but ultimately decided that allowing asynchronous generators to return values isn’t a worthy feature unless an asynchronous form of `yield from` is also implemented. And that’s just way too much complexity for a relatively niche feature.
Allowing `return value` in asynchronous generators without asynchronous `yield from` will cause user confusion, since writing `except StopAsyncIteration as e: e.value` is extremely unergonomic.
FWIW asynchronous generators as defined in PEP 525 100% follow the behavior of regular Python generators in the pre-`yield from` era.
Lastly, in an ideal world, I do think that for the sake of completeness `yield from` should be implemented for asynchronous generators. But it will require some non-trivial work in Python internals and will further complicate the already overcomplicated implementation of Python generators that few people actually understand and can reason about.
So if we were able to provide a reference or full implementation of async `yield from`, then this would be an acceptable PEP? Would you be willing to sponsor it if we got to an acceptable implementation of `yield from`?
Basically I’m personally torn between two things: on the one hand I’d love asynchronous generators to be as powerful as regular generators, on the other hand I’m afraid that the implementation/maintenance cost might be too much.
I’d start with the reference implementation and tests. Only after that will we be able to assess how big of a change this would be. It might turn out to be a diff a few lines long, or it could be another 1000 lines of very complicated C code. Keep in mind that this would be acceptable (IMO, obviously) only if we have a full, proper `yield from` implementation working with `agen.asend()` and `agen.athrow()`.
If, once having the reference implementation, you’re still personally convinced this is a good idea, you should draft a PEP and publish it. I might be the sponsor or maybe bdfl-delegate (if the SC considers me) of the PEP.