Overloads of async generators: inconsistent Coroutine wrapping

sterliakov · June 25, 2024, 2:55pm

The MRE below is taken from this mypy issue. That issue and POC mypy implementation can provide more context into this question.

Consider the following:

from contextlib import asynccontextmanager, contextmanager
from typing import AsyncIterator, Iterator, overload

@overload
@asynccontextmanager  # Error here
async def test_async() -> AsyncIterator[str]: ...

@overload
@asynccontextmanager  # Error here
async def test_async(val: int) -> AsyncIterator[int]: ...

@asynccontextmanager
async def test_async(val: int = 1) -> AsyncIterator[int | str]:
    yield val if val != 1 else 'a'


@overload
@contextmanager
def test_sync() -> Iterator[str]: ...

@overload
@contextmanager
def test_sync(val: int) -> Iterator[int]: ...

@contextmanager
def test_sync(val: int = 1) -> Iterator[int | str]:
    yield val if val != 1 else 'a'

This happens because async functions return types are wrapped with Coroutine unless these functions are generators, and this condition is detected by yield presence in the body. overload definitions have usually only trivial bodies (ellipsis or pass) and do not contain yield. I’d argue that “generator-ness” should be propagated from the implementation to the overloads, marking overloads as async generators if the implementation is, because other ways to make this (correct) snippet pass are counterintuitive (either removing the decorator and constructing the type manually or adding yield to overloads bodies, I don’t see any other reasonable approach)

Type checkers survey:

mypy: wraps with Coroutine, error.

Argument 1 to "asynccontextmanager" has incompatible type "Callable[[], Coroutine[Any, Any, AsyncIterator[int]]]"; expected "Callable[[], AsyncIterator[Never]]"  [arg-type]

pyright: same as mypy

Argument of type "() -> Coroutine[Any, Any, AsyncIterator[str]]" cannot be assigned to parameter "func" of type "(**_P@asynccontextmanager) -> AsyncIterator[_T_co@asynccontextmanager]" in function "asynccontextmanager"

pyre: accepts, produces expected reveal_type

pytype: rejects, but also rejects trivial bodies of test_sync.

 Function contextlib.asynccontextmanager was called with the wrong arguments [wrong-arg-types]

This issue is not currently addressed in typing specification. @hauntsaninja advised to start a topic here to discuss this potential change and bring it to sync with other typecheckers if agreed upon.

erictraut · June 25, 2024, 3:20pm

I’d argue that “generator-ness” should be propagated from the implementation to the overloads

I don’t think that’s the right answer. There are legitimate cases where no implementation is given — notably, in stub files and in protocol definitions.

An alternative solution that appears to work already with all the major type checkers (and in stubs, etc.) is to omit the async on the overloads.

@overload
@asynccontextmanager
def test_async() -> AsyncIterator[str]: ...

@overload
@asynccontextmanager
def test_async(val: int) -> AsyncIterator[int]: ...

Alternatively, you can provide a minimal implementation consisting only of a yield statement.

@overload
@asynccontextmanager
async def test_async() -> AsyncIterator[str]:
    yield ""

@overload
@asynccontextmanager
async def test_async(val: int) -> AsyncIterator[int]:
    yield 0

sterliakov · June 25, 2024, 3:23pm

Yes, definitely. But the question is a bit different: are there any cases where an implementation is present, but such propagation is not desired? If there is no implementation, we don’t have any status to propagate, so the behaviour won’t change there.

Omitting async looks like the most appropriate idea among existing solutions, though, if such usage can be documented and explained somewhere (is typing spec the right place for that?). However, the discrepancy between overload and implementation signature will still be confusing for the user IMO.

Liz · June 25, 2024, 3:34pm

Doesn’t pyre have the most correct behavior here? Why not just trust the user provided annotation with a trivial body?

sterliakov · June 25, 2024, 3:39pm

Well, I agree that pyre is correct here.

@overload
async def foo() -> str: ...
@overload
async def foo(x: int) -> int: ...
async def foo(x: int | None = None) -> int | str:
    return x or ''

async def main() -> None:
    await foo()

Type checker shouldn’t interpret the snippet above as a function with non-awaitable returns. pyre handles this case correctly as well.

Liz · June 25, 2024, 3:42pm

I didn’t mean to suggest it would there.

The presence of async def can be a coroutine function or an asynchronous generator function. it can’t be a non-awaitable function. Rather than say “yield isn’t here, so it must be a coroutine function”, why not trust the return annotation to differentiate between the possible options for a trivial body?

sterliakov · June 25, 2024, 3:57pm

Hm, and that’s actually a good question, thanks!

async def fn() -> AsyncIterator[int]

I parse the line above as signature for async def fn(): yield 1. However, what should substitute X in signature for the following:

async def outer() -> X:
    return fn()

async def main() -> None:
    print([x async for x in await outer()])

So, outer really returns a Coroutine[Any, Any, AsyncIterator]. If X was AsyncIterator[int], then general rule says to wrap it with Coroutine, obtaining the desired result. However, with the change you suggest it won’t be wrapped, and await outer() will be falsely rejected. We can require Coroutine to be explicit in such case, but that would be incompatible with all other async typing explanations, causing more confusion.

The example above doesn’t have overload - to make it complete, treat these functions as declared in a stub file without bodies. It’s exactly what Eric Traut mentioned in the first comment: we cannot afford looking at the return type of overloads without implementation to make this decision.

The following are indistinguishable in a stub file/overload signature:

async def fn() -> AsyncIterator[int]: ...  # `yield 1` in implementation
async def outer() -> AsyncIterator[int]: ...  # `return fn()` in implementation, no `yield`

Liz · June 25, 2024, 4:12pm

Stanislav Terliakov:

I parse the line above as signature for async def fn(): yield 1. However, what should substitute X in signature for the following:
async def outer() -> X:
    return fn()

async def main() -> None:
    print([x async for x in await outer()])

Nothing, automatically. If you’re doing it yourself, the type of fn() This isn’t a trivial body though, so I fail to see the relevance of this to what I implied would be an appropriate and easy fix.

Why not? If you don’t have an implementation, why not trust the user provided annotation? If you do have an implementation, rule out possibilities.

sterliakov · June 25, 2024, 4:32pm

Sorry, perhaps it wasn’t clear enough - assume that type checker runs on a stub file containing two functions without bodies, shown in the last snippet, to type check main function. fn and outer are fundamentally different: fn() produces something that can be iterated with async for, outer() produces something awaitable. If we choose “always wrap” route, as it is now, the story ends here in a fatal fashion: only removing async would help. If we choose “wrap unless typed as AsyncIterator” route that you suggest, we can at least wrap with Coroutine manually, but this also isn’t intuitive IMO given that python typing never required that for async functions. So neither route is of least confusion, and thus sticking to the existing behaviour can be more reasonable.

And backwards compatibility, of course: your suggestion will break any existing stub file with a function like outer. I’m not sure if such usage is popular, though.

I still feel that restricting my proposal to overloads with an implementation provided is somewhere near the right balance. It doesn’t break any existing code, except for removing unused ignore comments. And, more importantly, I cannot imagine any legit use case where treating (post-processing) overloaded and implementation signature differently would be desired - and no other overload usage assumes that, to the best of my knowledge.

If the overload is provided without implementation (stub file), the consistency reason is less relevant: we already expect that something not following implementation code is going on in stubs.

Liz · June 25, 2024, 4:51pm

async def name([parameters]):
    body

can have two results: a coroutine function or an asynchronous generator function.

The current general rule is that if there is a yield, it is an asynchrounous generator function, this matches the behavior of runtime, https://peps.python.org/pep-0525/, but doesn’t account for trivial bodies used in stubs. We can’t make bare yield a trivial body, because there are valid reasons to yield None and do nothing else (This is used in a few systems for reasons related to ensuring the event loop is yielding to other tasks)

In the absence of a body to look at, determining which of those was intended is not possible except by looking at the return annotation, so trust the return annotation.

This requires a situation like:

async def foo() -> AsyncIterator[int]:
    yield 1

async def bar() -> AsyncIterator[int]:
    return foo()

I don’t see any breakage here.

async def foo() -> AsyncIterator[int]:
    ...

async def bar() -> AsyncIterator[int]:
    ...

The annotations remain ~~correct~~ wrong whether the body is there or not the presence of overloads doesn’t change this either. It just means that if an overload isn’t compatible with the implementation, it’s an incompatible overload.

mikeshardmind · June 25, 2024, 6:32pm

I think pyre has the most human behavior here, but that this is a specification issue here.

consider these three functions:

async def a() -> AsyncIterator[int]:
    for i in range(10):
        yield i

async def b() -> AsyncIterator[int]:
    async for i in a():
        yield i

async def c() -> AsyncIterator[int]:
    return a()

The automatic wrapping kicks in on c, and results in a very surprising type to most people. The obvious thing would be for c’s body to be an error.

I think the best fix for obvious results would be to change the automatic wrapping to only wrap types which cannot naturally be the correct type without wrapping.

This would prevent starting with an exported AsynchronousIterator, and an erroneous change changing the type without changing the annotations.

This would mean correctly typing c’s body would be:

async def c() -> Coroutine[Any, Any, AsyncIterator[int]]:
    return a()

which is probably a significantly better outcome to begin with as this is very non-standard code, and to have your functions change types without changing annotations is not something users would generally want (for reference, if anyone I knew wrote this, I’d tell them to change it, code like in the body of c needing to await to then asynchronously iterate should set up inside the first iteration, or within an async context manager instead)

hauntsaninja · June 25, 2024, 8:05pm

To anyone missing context, here’s the relevant mypy documentation: More types - mypy 1.10.1 documentation

This general confusion has come up a few times on the mypy issue tracker (a, b, c, d, e, f, g, h) and enough for mypy to have some special case diagnostics for this issue, so the question of how to reduce confusion here is a great one and I’m grateful that sterliakov is exploring this.

Like Eric, I’m wary of this special casing. It doesn’t work in the context of base classes with dummy impl, protocols or stubs — these account for almost all of the complaints in the mypy issues I linked.

So I kind of feel it doesn’t solve the general problem enough to justify its specialness, and maybe we should lean into the direction of making type checkers emit more helpful diagnostics here instead of reinterpreting the type.

MegaIng · June 25, 2024, 8:18pm

Maybe it would be a good idea to have a generator dummy implementation yield ... which would be treated as if the function had just ..., except it’s a generator, whatever that means for the type checker.

Liz · June 25, 2024, 9:18pm

I had to play around with this a bit to understand it a bit better, but I think this works and has consistent results? I couldn’t find any cases where it wouldn’t.

Can we change how type checkers treat this so that it is a consistently applicable rule that doesn’t depend on the function body? This would be not automatically wrapping a return type of Coroutine, AsyncIterator or AsyncGenerator in Coroutine for functions with async def, I think.

sterliakov · June 26, 2024, 2:23am

Hm, and only two of these issues (if I’m not mistaken) are related to overload (so have implementation body to check). Thanks for this context - it really means that something hould be solved on a deeper level.

Now, what will go wrong if we declare that {AsyncIterator, AsyncGenerator} and probably Coroutine and Awaitable as well are never wrapped? If the spec can be formulated as “any return type except those is assumed to be implicitly wrapped with Coroutine”, it obviously breaks backwards compat, but brings more freedom in definitions.

First of all, anyone who is now using async def fn() -> Coroutine[...] to indicate a “twice awaitable” something will be harmed. However, this is hardly a popular scenario, and probably breakage to this extent is permitted.

The main question is: can any user writing an async function with non-generator body (we can’t say “trivial” any longer if we want to apply that logic unconditionally) and AsyncIterator return type really mean implicit Coroutine? If we also include Coroutine itself into non-wrapped types, can any user reasonably want a function that returns a “twice awaitable” object? Are there any use cases for this pattern? If there are, how difficult it is to recover from this behaviour change?

Trying to self-answer this: I can’t imagine a production setting where such pathological cases arise. Any such use can be trivially recovered by wrapping with Coroutine by hand. IMO, this change can be justified, perhaps hidden under a typechecker flag for a couple of minor releases.

Extra benefit: new/inexperienced users coming from TypeScript background are used to explicit Promise for any async function return type. So telling them that it is mandatory in some cases won’t be a big surprise.

This introduces a point of confusion. Two definitions below will become equivalent under new rules:

async def fn() -> Coroutine[Any, Any, int]: ...
async def fn() > int: ...

However, this confusion is at least “safe”. Shall someone decide to wrap everything async with a Coroutine or Awaitable, it won’t do any harm. This may be deferred to linters like ruff or flake8, warning about unnecessary explicit Coroutine wrapper. Since this will be purely a style issue, not providing “one and preferably only one way” looks justified to me.

I just attempted an implementation, and mypy_primer quickly reminded me that things aren’t that trivial. When we decide to apply this rule conditionally, a lot of problems arise. What about unbound typevars, where we don’t know if something is awaitable? What about union types of mixed kinds (e.g. Iterator[Any] | AsyncIterator[Any], like ~~here sorry wrong link~~ here, where ContentStream is a typing.Iterable[Content] | typing.AsyncIterable[Content], and it could easily have been Iterator | AsyncIterator instead)? What about Any itself? I started to think that this issue mandates a whole PEP to agree on all corner cases…

sterliakov · November 12, 2024, 2:32am

@hauntsaninja sorry for necro, I must have missed this bit.

Actually my initial proposal (propagating generator-ness from implementation to overloads) solves two of those 8 issues (namely f and h). I proposed this implementation in Propagate is_async_generator down to overloads by sterliakov · Pull Request #17435 · python/mypy · GitHub and primer also revealed one open-source case with this problem. I understand that this adds a little bit of special-casing, but I think it’s justified here (we already have special treatment for overloads, and letting them inherit “generator-ness” from the implementation sounds rather useful).

To be clear, I don’t propose (yet?) any changes related to inheritance, base classes and other places where trivial bodies may arise. I’m only suggesting the following: when there’s an @overload async def fn with a trivial body followed by async def fn with non-trivial implementation body containing yield, we should propagate this status to overload.

To illustrate, consider the following (and that’s the only case I’m trying to change - overload + async + known implementation; here’s playground):

from collections.abc import AsyncIterator
from typing import overload

@overload
async def fn() -> AsyncIterator[str]: ...
@overload
async def fn(_: str) -> AsyncIterator[str]: ...
async def fn(_: str | None = None) -> AsyncIterator[str]:
    yield "foo"

Now (and also Overloaded function implementation cannot produce return type of signature 1 [misc] at fn definition):

async def main() -> None:
    async for x in fn():  # E: "Coroutine[Any, Any, AsyncIterator[str]]" has no attribute "__aiter__" (not async iterable)  [attr-defined]
        reveal_type(x)

Proposed:

async def main() -> None:
    async for x in fn():
        reveal_type(x)  # N: revealed type is "builtins.str"

Currently this can be emulated by using non-trivial bodies in overloads (playground), but it isn’t really necessary: we already have enough information.


@overload
async def fn() -> AsyncIterator[str]: 
    yield ""
@overload
async def fn(_: str) -> AsyncIterator[str]:
    yield ""
async def fn(_: str | None = None) -> AsyncIterator[str]:
    yield "foo"

With the changes I propose two fn definitions above will be equivalent.

Could you or @erictraut elaborate on the specific problems that arise from doing this? I don’t get the argument “but it won’t work if the body is absent” - isn’t it reasonable to support common variations even if the general case isn’t solved yet?

erictraut · November 12, 2024, 3:47am

I’m -1 on propagating information from an implementation to overloads. There is no precedent for this, and I don’t think we should create such a precedent.

Overloads need to work in the absence of an implementation (e.g. in a stub file or protocol definition), so adding special-case behavior when an implementation is present introduces multiple (inconsistent) ways to declare the overload — an unnecessary complexity that will create confusion. It also requires some pretty ugly special-case behavior in multiple places within pyright, and presumably all of the other type checkers, stub generators, etc.

As I mentioned above, there’s a straightforward solution: omit the async keyword in the overloads. This solution doesn’t involve any special casing and already works with all type checkers.

If you think this is a sufficiently common use case and is confusing to developers, then I suggest that documentation is the right answer. You’re welcome to submit an update to the Type System Guides.