PEP 828: Supporting 'yield from' in asynchronous generators

Hello,

I’ve written a PEP proposing support for yield from in async generators, as well as an async yield from statement.

This is currently targeted towards 3.15, but with the beta freeze coming up soon, this might not make that cutoff if accepted. If discussion gets heavy, I’ll defer the PEP to 3.16.

The full text is available here:

9 Likes

Hmmmm… I haven’t read your PEP, but I’m already uneasy with async generators (due to the need to do a little dance that includes registering a callback), and I wonder if allowing yield from isn’t a bridge too far.

(Honestly, now that we have await, I feel that yield from isn’t needed very often and feels more like a historical dead end – a stepping stone on the way to async/await.)

3 Likes

I think yield from is still a pretty common pattern, even when you don’t need the subgenerator semantics, because it’s handy syntactic sugar that saves you a line of code when writing generators.

For example, when writing a code generator, I’ll often do something like this:

yield "if (whatever != NULL) {"
yield from _generate_body(...)
yield "}"

My primary goal with this PEP is to extend that syntactic sugar to async generators. The subgenerator delegation is definitely less useful nowadays, but I wouldn’t go as far as to say that it’s unused. (I’ve always found gen.send with a non-None value to be a particularly cool feature with a lot of potential outside of concurrency; the lack of yield from makes that less feasible for async generators.)
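To make the sugar concrete, here is a minimal sketch of the same code-generator pattern written both ways as the language stands today. The helper names (generate_body, generate_body_async, collect) are hypothetical stand-ins for _generate_body(...), not anything from the PEP.

```python
import asyncio


def generate_body():
    # Hypothetical stand-in for _generate_body(...)
    yield "    do_something(whatever);"


def generate_sync():
    yield "if (whatever != NULL) {"
    yield from generate_body()  # one line re-yields everything
    yield "}"


async def generate_body_async():
    # Async counterpart of the hypothetical helper above
    yield "    do_something(whatever);"


async def generate_async():
    yield "if (whatever != NULL) {"
    # Without 'yield from', an explicit loop is needed to re-yield each item
    async for line in generate_body_async():
        yield line
    yield "}"


async def collect(agen):
    # Drain an async generator into a list
    return [line async for line in agen]


sync_lines = list(generate_sync())
async_lines = asyncio.run(collect(generate_async()))
```

Both versions produce the same three lines; the async one just needs the extra loop.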

7 Likes

Ok fine. Just don’t ask me for a review or endorsement. :slight_smile:

2 Likes

The ability to use yield from and the subgenerator delegation both seem like relative positives, but I don’t particularly know if the return value of an async generator should be supported for use.

The main reason that particular part exists for synchronous generators was coroutines. There’s no clear parallel need here for async generators, and this appears to be something implemented because it can be implemented consistently with sync generators, not because there’s a compelling use for it.

In terms of parity and consistency, if implementing it, I do see the argument, but I have a feeling that the return value of an async generator is likely to be more of an API nuisance: people should be implementing things a different way, but will use this because it’s there.

If subgenerator delegation is added to async generators, I don’t see any reason not to support return values. Every function in Python has a return value, even if it’s never meaningful, so it should be fine to support this even if most async generators don’t make use of it.
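For reference, this is roughly how return values already behave in synchronous generators. The names subgen and outer are made up for illustration. As far as I know, an async generator currently cannot use return with a value at all (it is a SyntaxError), so there is no async equivalent to show yet.

```python
def subgen():
    yield 1
    yield 2
    return "done"  # becomes StopIteration.value


def outer(log):
    # 'yield from' evaluates to the sub-generator's return value
    result = yield from subgen()
    log.append(result)
    yield 3


log = []
items = list(outer(log))

# The same value is visible when driving the generator by hand
gen = subgen()
next(gen)
next(gen)
try:
    next(gen)
except StopIteration as stop:
    manual_value = stop.value
```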

1 Like

Prior to this proposal, there was an entire discussion thread about adding async generator return values solely by catching the StopAsyncIteration exception. That tells me there’s some demand for this.

But as you said, adding a return value to async generators improves consistency. I agree that it’s not super useful, among other things, but I don’t think we should arbitrarily disallow it. There are plenty of ways to design bad APIs.

1 Like

All the PEP 380 arguments about correct delegations apply to async generators as well, so if the implementation complexity challenges have been overcome, it makes sense to revisit this (the way coroutines and generators are implemented changed significantly in the past few years, so I find it entirely plausible that this is feasible now despite being impractical back when async generators were first added).

2 Likes

IIRC I discussed this with Peter at the previous core sprint. My own opinion is that implementing yield from is a separate concern from async generators having problems with with blocks and cancellation. I think we should implement this, happy to be BDFL-delegate for this one (since I added async/await and async generators in the first place).

6 Likes

I made the case against use in the way people asked for in that thread, so it shouldn’t be a surprise that I don’t find the request in the thread compelling and that it may be better to lead people to API decisions that produce better developer experiences.

It’s also why I wouldn’t view it as arbitrary to disallow: We have preexisting better ways for the problems the people in those prior threads are trying to solve by catching the exception and then using the return value.

Meanwhile, the subgenerator delegation and ability to yield from other iterators, async or not, is a clear improvement with or without allowing capturing the return value.

I personally value the consistency and teachability on about the same level as not enabling things that don’t have a strong use case, but do create nuisances, so as a concern, I don’t feel strongly enough to say it shouldn’t exist, but I do think we have evidence that people will reach for it when better options exist.

1 Like

My main feedback is that async yield from should become just yield from (drop async and the yield from <sync gen> idea entirely).

In the years of using async generators and observing them in others’ code, I’ve never seen the need to yield from a regular generator inside an async generator. Real-world code just doesn’t want to compose that way. And since we don’t have async yield, I expect people to intuitively try yield from <async gen> all the time and abruptly discover it’s not what they expect it to be. The very narrow (maybe non-existent?) use case of yield from <sync gen> inside async generators doesn’t justify having that feature.

Aside from that the PEP seems fairly straightforward. The only reason this isn’t part of my original async generator PEP was lack of time. Huge thanks for pushing this.

4 Likes

Hm, okay, this is a good discussion point.

Not disagreeing, but in terms of language design, it feels painful for yield from x() to have a different meaning depending on the function in which it was defined.

That, and it might be a bold decision to assume that all async generators won’t ever need to delegate to a synchronous generator, especially in the long term. If in 20 years this is a common pattern, it would be very difficult to change the defaults here. We already have async variations of existing language constructs, so I don’t think it’s too big a stretch to do that for yield from as well.

Yeah, I added a neat exception message for this instead. If you try to yield from an async generator, you’ll get a message like TypeError: async_generator object is not iterable. Did you mean 'async yield from'?
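As a point of comparison, current Python already raises a TypeError when an async generator is iterated synchronously, just without any hint. A small sketch (agen is a made-up name; the "Did you mean" suffix is the PEP's proposal and does not appear today):

```python
async def agen():
    yield 1


try:
    # Plain 'for' looks for __iter__, which async generators do not define
    for _ in agen():
        pass
except TypeError as exc:
    message = str(exc)

print(message)
```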

9 Likes

That would break the general pattern that everything inside an async function is done synchronously unless it’s marked with async or await. It would also preclude ever adding this feature in the future.

12 Likes

Lack of async yield from is one of the major flaws messing up async pluggy

2 Likes

-1 here. (Edit: withdrawn)

Firstly, from my personal experience, async generators in Python have (/had?) so many footguns that I just… stopped using them. I now just use asyncio.Queue and/or callback functions, and as a result, I haven’t had a use case for async generators for years. So, speaking as a Python user, I can’t find a reason to support further development of async generators, and in fact, I feel it might be better if they were deprecated and removed from the language entirely! (I am likely missing something, of course: perhaps there are frameworks that make good use of async generators and manage to do it properly… but in the lack of this knowledge, were I asked to advise someone on how to do this kind of thing I would always say “avoid async generators, use asyncio.Queue and/or callback functions”.)

Secondly, if yield from is going to be allowed in an async generator at all, I think (maybe similar to @yselivanov’s feedback) you should be able to use it to mean async for item in agenerator(): yield item, and async yield from should not exist. If, hypothetically, yield from x in an async generator could auto-detect whether x was an async generator or a non-async generator and behave as appropriate, then that would be the best meaning; but giving it either the meaning “it only works for sync generators” or “it only works for async generators” rules out ever switching it to the other.

Leaving things as they are for now is probably the best course of action; if one day the numerous asyncio footguns (exceptions, cancellation, with blocks, catching BaseException, forgetting to await, dare I add function coloring, …) can be resolved, then maybe there would be a case to implement this (and then maybe it’d be possible to implement the “auto-detect sync vs async generator” behavior too) but for now I think it’s better to keep “yield from in async generator” as a SyntaxError.
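For anyone who wants to check the status quo on their own interpreter, here is a small sketch that compiles the construct from a string. It assumes an interpreter without this PEP; on one that implements it, error would simply stay None.

```python
source = """
async def agen():
    yield from range(3)
"""

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    # Rejected at compile time on interpreters without the proposal
    error = exc.msg
else:
    error = None

print(error)
```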

1 Like

I don’t mean to be rude, but this is an incredibly naive assertion to make on a proposal specifically about async generators. Asynchronous generators are used everywhere, and I definitely don’t think they’re enough of a “footgun” to deprecate or to avoid further development.

If you’re not convinced, here are a number of major projects that use asynchronous generators:

  1. Langchain
  2. Strawberry
  3. Home Assistant
  4. Ray
  5. aiohttp
  6. Starlette
  7. Django
  8. HTTPX
  9. Apache Airflow
  10. Anything using @contextlib.asynccontextmanager (though these cases won’t really benefit from this proposal)

This is possible, but it comes with a few other issues. We would have to check if an object implements __aiter__ after failing(?) to look up __iter__ on a given object, but on its own, this is ambiguous:

  1. If an object implements both __iter__ and __aiter__, which should be chosen? Seemingly, __iter__ would be chosen, which would likely either silently perform blocking I/O or otherwise raise some spurious error.
  2. What happens if the lookup for __iter__ raises an exception? Do we still proceed to check for __aiter__? What if that raises an exception too?

It also has the downside of being slower due to an extra lookup, but I won’t get into that.
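The first ambiguity can be sketched with a class that implements both protocols. Both is a hypothetical class written only for illustration; today the choice between the two methods is made explicitly by the syntax, not by inspecting the object.

```python
import asyncio


class Both:
    """Hypothetical object that is both iterable and async-iterable."""

    def __iter__(self):
        return iter(["sync"])

    def __aiter__(self):
        return self._agen()

    async def _agen(self):
        yield "async"


async def main():
    obj = Both()
    via_for = [x for x in obj]              # uses __iter__
    via_async_for = [x async for x in obj]  # uses __aiter__
    return via_for, via_async_for


via_for, via_async_for = asyncio.run(main())
```

An auto-detecting yield from would have to pick one of these two results silently.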

More generally, we don’t have any existing language features that can operate on both a synchronous and asynchronous object, at least that I’m aware of. I’m inclined to quote the Zen here: “Explicit is better than implicit,” and “In the face of ambiguity, refuse the temptation to guess.”

For future reference, please don’t pursue the narrative that “asyncio is not useful” or “asyncio is too hard to use” further here. It’s irrelevant to the discussion and disrespectful, given that you’re discussing with the people who maintain and contribute to asyncio.

11 Likes

My points were:

  1. This random Python user over here has battled with asyncio generators for years, trying to make good use of them, and ended up with numerous nasty problems in production that were ultimately solved by avoiding using this language feature entirely. Yes, you can easily argue that that’s all my fault for just being bad at coding or something :person_shrugging:, but I do feel there’s also an argument that if a language feature is this hard to use correctly, maybe there’s some flaws with its design. I’m not posting this to have a random bash at Python or asyncio. I love Python and asyncio! I’m posting this because I felt maybe just maybe it would be valuable feedback, that you wouldn’t likely often receive. I didn’t really want to post it, tbh, because any way of putting it feels rude, but despite that, I felt raising it was still important to try and improve Python.
  2. Regardless of whether async generators are a good idea, having a yield from syntax with the specific behavior of “yield from a sync generator within an async generator” and a separate async yield from syntax with the specific behavior of “yield from an async generator within an async generator” feels like incredibly counterintuitive design, so even if I was super happy with async generators and used them all the time, and would want this syntax, I would still fully object to the proposal as written on this ground alone.
  3. There are, imo, much more important problems to solve within asyncio right now than this one. Adding yet more syntax that is specifically tailored to how asyncio works right now makes it harder to improve asyncio in more fundamental ways, if they would have to rip out that syntax again. Hence why I felt the status quo was best.

Regarding the problems with auto-detecting sync vs async - yes, I’m aware that actually implementing such a thing would be impractical if not impossible with how Python asyncio currently works. That’s why I said hypothetically. I was pointing out that from the user’s POV, the best, and intuitive, syntax is “yield from works regardless”, and anything else will inevitably fall short of that, which needs to at least be taken into account as a potential downside. Of course, it’s fine to say “we’re aware of this downside and we feel that the benefits outweigh the cost”.

For clarity: imo, yield from meaning “yield from async in async” and async yield from not existing (and there being no way to write “yield from sync in async”) would be much better (more intuitive) than the proposal as written. I still don’t think it’s needed - but when I saw the PEP, I thought that that was what it was going to propose, and had that been the case I would have let it be.

Do you know WHAT the hard-to-use parts were? If a language feature is in fact flawed, the solution is to improve on it.

That all depends on intent. If you want to improve the language, let’s talk about improvements. (Though I suspect that anything based on the issues you’re having with current features is off-topic for this thread, so it’d be better to start a new thread.) I have definitely had some issues with asyncio in Python, and that’s based on having used asynchronous I/O in multiple different languages. I won’t post about specifics until I’ve had a chance to check how those issues play under eager tasks, since that may make a big difference.

Hmm. I haven’t much thought about this as an issue, since “yield” and “async yield” feel like sync and async counterparts, perfectly intuitively. Can you elaborate?

1 Like

Well, do you consider async for on an async iterator to be a counterintuitive design as well? In my eyes, I’d think that yield from range(3) not working in async generators would be counterintuitive.

I think you misunderstand the difference between coroutines and what asyncio does.

In Python, a coroutine is just a blanket term for a generator that yields some kind of magic object that the given event loop knows how to handle. That’s all there is to it. The coroutine implementation (as well as the asynchronous generator implementation) is not tied to the implementation of asyncio; this is by design, because it allows users to write alternative implementations while still using the async/await syntax (such as trio).

This means that neither asyncio nor any other async implementation really cares (or knows) whether the thing it’s awaiting uses an async generator. The event loop only ever looks at the objects that have been returned by send(). If asyncio were to get a complete rewrite, we wouldn’t have to touch the coroutine implementation at all.
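A toy illustration of that separation, with no asyncio involved. Suspend and run are hypothetical names invented for this sketch; run plays the role of a very small "event loop" that only ever sees what send() hands back.

```python
class Suspend:
    """Hypothetical awaitable that hands its payload to whoever drives the coroutine."""

    def __init__(self, payload):
        self.payload = payload

    def __await__(self):
        # The yielded object surfaces as the result of coro.send(...)
        reply = yield self.payload
        return reply


async def worker():
    a = await Suspend("first request")
    b = await Suspend("second request")
    return a + b


def run(coro):
    """Toy driver: answers every request with the length of its text."""
    seen = []
    reply = None
    while True:
        try:
            request = coro.send(reply)
        except StopIteration as stop:
            return seen, stop.value
        seen.append(request)
        reply = len(request)


seen, result = run(worker())
```

The driver never learns how worker is written internally; it only reacts to the objects it receives.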

1 Like

yield from iterable is roughly equivalent to for x in iterable: yield x (identical to its expansion in synchronous generators)

async yield from iterable is roughly equivalent to async for x in iterable: yield x (which would be illegal in a synchronous generator, and hence the expression form wouldn’t be legal either)

The additional async keyword is there to modify the iteration loop rather than anything else.
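Spelled out with syntax that already works today, the two expansions look roughly like this. ticker and combined are hypothetical names; the comments mark which proposed form each loop would correspond to.

```python
import asyncio


async def ticker(n):
    # Hypothetical async source
    for i in range(n):
        await asyncio.sleep(0)
        yield f"async-{i}"


async def combined():
    # Roughly what 'yield from range(2)' would mean: a synchronous loop
    for x in range(2):
        yield f"sync-{x}"
    # Roughly what 'async yield from ticker(2)' would mean: an async loop
    async for x in ticker(2):
        yield x


async def main():
    return [item async for item in combined()]


items = asyncio.run(main())
```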

If all for loops in async generators had been defined as implicitly async, or as magically switching based on the supplied value, that would be a different story, but they’re not: they perform synchronous iteration unless modified with the async keyword.

It’s also worth considering that one of the reasons async generators are hard to get right is because they can’t actually delegate their exception handling to subiterators (whether synchronous or asynchronous) properly (the main reason yield from expressions were added in the first place).
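The difference is easiest to see with synchronous generators, where both spellings exist. In this sketch (all names are made up), an exception thrown into the outer generator reaches the inner handler only when yield from is used; the manual loop, which is the only option in async generators today, lets it escape.

```python
def inner():
    try:
        yield "inner-1"
        yield "inner-2"
    except ValueError:
        yield "inner handled ValueError"


def with_yield_from():
    # throw() on the outer generator is forwarded to inner()
    yield from inner()


def with_loop():
    # throw() is raised here, at this yield; inner() never sees it
    for item in inner():
        yield item


def probe(gen_func):
    gen = gen_func()
    next(gen)  # advance to the first yield
    try:
        return gen.throw(ValueError("boom"))
    except ValueError:
        return "escaped to caller"


delegated = probe(with_yield_from)
looped = probe(with_loop)
```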

10 Likes