Ambivalent async generator function signature

ddun · November 17, 2024, 9:05am

TL;DR - async def f() -> AsyncGenerator is incoherent compared to other generator/function type annotations in Python.

After getting back to Python after almost fifteen years (last time it was Python 2!), I have quite happily adopted its type hints. Despite being a bolt-on feature, I find the system quite coherent and well integrated with Python’s dynamic nature and strong reflective capabilities.

Even for async functions, I believe Python’s handling of the return type is better than that of JS/TS by making async a modifier:

async def f() -> int: ...
await f()

# alternatively
def f() -> Awaitable[int]: ...
await f()

vs.

async function f(): Promise<number>;
await f();

// alternatively
function f(): Promise<number>;
await f();

In JS/TS, while the async keyword gives the function’s body extra capability (to use await inside), it does not impact how the function should be called.
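The Python half of this comparison can be checked at runtime. Below is a minimal sketch, assuming the hypothetical names `f_async` and `f_plain` stand in for the two spellings of `f` above; it only illustrates that both are awaited the same way even though only one is a coroutine function.

```python
import asyncio
import inspect
from collections.abc import Awaitable


async def f_async() -> int:
    # Coroutine function: calling it returns a coroutine that resolves to an int.
    return 1


def f_plain() -> Awaitable[int]:
    # Plain function that hands back an awaitable; callers use it the same way.
    return f_async()


async def main() -> tuple[int, int]:
    # Both spellings are consumed with a single `await`.
    return await f_async(), await f_plain()


result = asyncio.run(main())

# Only the first one is a coroutine function at runtime.
flags = (
    inspect.iscoroutinefunction(f_async),
    inspect.iscoroutinefunction(f_plain),
)
```

Here `result` holds the two awaited values and `flags` records which definition Python itself treats as a coroutine function.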

However, I noticed some discrepancy with regard to how an async generator function’s signature should be interpreted.

Before diving into the details, I would like to recap what role a function’s signature plays (for me). Echoing Steve Klabnik’s view, the signature:

- For its body or implementation, dictates what parameters are available for use, and what type of value it should return.
- For its caller, dictates how the function should be used.

If we consider async to be part of a Python function’s signature, then Python meets the above criteria pretty well:

| Signature | Use case |
| --- | --- |
| `async def f() -> int` | `await f()` as an int |
| `def f() -> Awaitable[int]` | `await f()` as an int |
| `def g() -> Generator[int, None, None]` | `for x in g()` where x’s are int |
| `async def j() -> AsyncGenerator[int, None]` | ? |

But what should we put at the question mark? There are actually two possible answers, depending on the body of j!

See below:

async def j() -> AsyncGenerator[int, None]:
    yield 0
    
async for x in j():
    assert x == 0
    
async def k() -> AsyncGenerator[int, None]:
    return j()
    
async for x in (await k()):
    assert x == 0

Here j and k seemingly have the same signature, while their use cases are quite different. An unequivocal and equivalent signature of j is actually

def j() -> AsyncGenerator[int, None]

and that of k is

def k() -> Awaitable[AsyncGenerator[int, None]]

Note that the interpretation of k aligns better with what we’ve seen with async and Awaitable in f.
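For anyone who wants to run the j/k comparison as a script rather than in an async REPL, here is a minimal sketch. The wrapper `main` and the `kinds` dictionary are my own additions for illustration; `j` and `k` are the same as above.

```python
import asyncio
import inspect
from collections.abc import AsyncGenerator


async def j() -> AsyncGenerator[int, None]:
    # The yield makes this an async generator function.
    yield 0


async def k() -> AsyncGenerator[int, None]:
    # No yield here, so this is a coroutine function despite the same annotation.
    return j()


async def main() -> tuple[list[int], list[int]]:
    from_j = [x async for x in j()]          # iterate directly
    from_k = [x async for x in (await k())]  # await first, then iterate
    return from_j, from_k


results = asyncio.run(main())

# (is async generator function, is coroutine function) for each definition.
kinds = {
    "j": (inspect.isasyncgenfunction(j), inspect.iscoroutinefunction(j)),
    "k": (inspect.isasyncgenfunction(k), inspect.iscoroutinefunction(k)),
}
```

The two functions produce the same items, but `inspect` classifies them differently, which is the discrepancy the annotations alone do not show.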

In reality, however, we’ve rarely seen anyone define a function like k. The rather incoherent case of j is prevalent—incoherent because the async keyword does not modify the actual signature here, and what actually plays the crucial role is the yield inside its body.

Why?

There are several factors in play here:

1. While def j() -> AsyncGenerator[int, None] more precisely captures the use case for j, without the async keyword we will not be able to use await inside its body.
2. For a normal function like f, putting async on the LHS ‘cancels out’ the Awaitable on the RHS, but in the case of j, there is nothing to cancel out.
3. For a normal generator function, its signature is also informed by the use of yield inside the body.

Let’s look at factor #3 here more closely. What it means is that

def g() -> Generator[int, None, None]: 
    yield 1  # This makes `g` a generator

def h() -> Generator[int, None, None]:
    return g()  # This makes `h` a function

But a caveat for normal generator functions is that the semantic difference in their signatures does not imply a practical one:

for x in g():
    assert x == 1

for x in h():
    assert x == 1

Which is, unfortunately, not true for async generator functions.
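A minimal runnable sketch of the synchronous case, reusing `g` and `h` from above. The variable names `same_items` and `generator_flags` are only for illustration.

```python
import inspect
from collections.abc import Generator


def g() -> Generator[int, None, None]:
    yield 1  # generator function


def h() -> Generator[int, None, None]:
    return g()  # plain function returning a generator


# Callers cannot tell the two apart: both calls give an iterable of ints.
same_items = list(g()) == list(h()) == [1]

# The distinction is still visible through introspection.
generator_flags = (
    inspect.isgeneratorfunction(g),
    inspect.isgeneratorfunction(h),
)
```

So in the sync world the yield changes what kind of function it is, but not how it is called. That is exactly the property the async version loses.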

On the other hand, JS/TS does better than Python on this issue, with their * marker for generators:

async function* j(): AsyncGenerator {
    yield 1;
}
async function k(): Promise<AsyncGenerator> {
    return j();
}

Admittedly, from PEP 362’s perspective, a function’s signature has nothing to do with the async keyword:

from inspect import signature
def a() -> int: ...
async def b() -> int: ...
assert signature(a) == signature(b)

But I believe this is because PEP 362 predates the introduction of async/await.
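To make this concrete, here is a small sketch showing that `inspect.signature` does not separate the three kinds of function, while the `inspect` predicates do. The names `plain_fn`, `coro_fn`, and `agen_fn` are hypothetical and exist only for this example.

```python
import inspect
from collections.abc import AsyncGenerator


def plain_fn() -> AsyncGenerator[int, None]:
    raise NotImplementedError  # body is irrelevant for this comparison


async def coro_fn() -> AsyncGenerator[int, None]:
    raise NotImplementedError  # no yield: a coroutine function


async def agen_fn() -> AsyncGenerator[int, None]:
    yield 0  # yield: an async generator function


# Signature objects compare parameters and return annotation only.
same_signature = (
    inspect.signature(plain_fn)
    == inspect.signature(coro_fn)
    == inspect.signature(agen_fn)
)

# (is coroutine function, is async generator function) per definition.
kinds = [
    (inspect.iscoroutinefunction(fn), inspect.isasyncgenfunction(fn))
    for fn in (plain_fn, coro_fn, agen_fn)
]
```

All three signatures compare equal, so any tool that needs to know how to call the function has to consult something other than the `Signature` object.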

hauntsaninja · November 17, 2024, 9:59pm

Thanks for the writeup. Note you can see mypy’s documentation that covers this over here: More types - mypy 1.13.0 documentation

Given the nested Callable[..., Coroutine[Any, Any, AsyncGenerator[X, None]]] type is rare in practice (but not non-existent), I think it’s usually still intuitive to readers of the signature what the type is. The main difficulty I’ve seen comes up in the cases when users omit the body (e.g. protocols, base classes with dummy impls, stubs, overloads). See also Overloads of async generators: inconsistent Coroutine wrapping - #12 by hauntsaninja
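As a rough illustration of the omitted-body case, here is a sketch of the pattern the mypy docs describe for protocols: declare the method with plain `def` so there is no implicit coroutine wrapping, and implement it as an async generator. The names `NumberSource`, `Countdown`, and `collect` are hypothetical, made up for this example.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Protocol


class NumberSource(Protocol):
    # Plain `def`: with no body to yield from, `async def` here would be read
    # by type checkers as a coroutine that returns an async iterator.
    def numbers(self) -> AsyncIterator[int]: ...


class Countdown:
    async def numbers(self) -> AsyncIterator[int]:
        # The yield makes this an async generator, so calling it returns
        # an async iterator directly, matching the protocol above.
        for n in (2, 1, 0):
            yield n


async def collect(source: NumberSource) -> list[int]:
    # No await before iterating: the call itself yields the iterator.
    return [n async for n in source.numbers()]


collected = asyncio.run(collect(Countdown()))
```

The other workaround mentioned in the docs is to keep `async def` in the declaration and put a dummy `yield` in the body, which is the body-dependence this thread is about.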

I think the options here are:

1. Rely on documentation and special cased type checker diagnostics (mypy gained some, but could have more)
2. Add special casing in tools that consume annotations so that async def never adds a wrapping to AsyncGenerator return type. See sterliakov’s post here for some discussion of this: Overloads of async generators: inconsistent Coroutine wrapping - #15 by sterliakov
3. Add some new special form or syntax that type checkers would interpret as if there was a yield in the body. I think this could be confusing for backward compat and runtime type checking reasons.

mikeshardmind · November 18, 2024, 4:26am

I said it in the thread you linked, but the problem with reading the function body isn’t limited to backwards compatibility or stubs. It also means someone can erroneously change the function body without touching the type signature and have the type of the function change.

I think your second listed option is the only reasonable one.

For idiomatic async code, nothing changes. For odd cases like a coroutine function that returns an AsyncIterator without being an async generator itself, you’d need:

async def foo() -> Coroutine[Any, Any, AsyncIterator[int]]:
    ...

ddun · November 18, 2024, 5:29am

It also means someone can erroneously change the function body without touching the type signature and have the type of the function change.

Agreed, this is also my concern. The type signature is not reliable without peeking inside the body of the function.

The problems with backward compatibility here are:

1. The semantic difference between an async function and an async generator function already exists: async def f() -> int works differently from async def g() -> AsyncGenerator[int] because the former asynchronously returns an int while the latter synchronously returns an AsyncGenerator. So when explaining async return types, we need to say “an async function’s de facto return type is its return type expression wrapped in a coroutine, except when the return type involves an AsyncGenerator, in which case the return type should be treated as is.”
2. There are existing code bases relying on async def g() -> AsyncGenerator defining an async generator, while some other (rare) code bases rely on it defining an async function that asynchronously returns an async generator. Changing the semantic interpretation of this type signature will have to break one of the two use cases anyway, and in the end we may need to pick the lesser evil.

So my two cents: maybe we leave async def g() -> AsyncGenerator ambivalent, open to two valid interpretations. But we could encourage the use of a new marker in the signature to clearly signify when a function is a generator, without having to use a yield inside the body.

For example, the following would be equivalent:

async def g() -> *AsyncGenerator: ...

async def g() -> AsyncGenerator: yield

And async def g() -> AsyncGenerator without yield is just a quirky async function. (Shameless plug: I find the use of * here quite fitting as it implies the function gives out individual items of an iterator)