Mixing async with operators for collection ABCs

jace · September 16, 2024, 8:22am

Hello! I’m adding async support to code that makes liberal use of collection ABCs for IO-bound operations (e.g. 'key' in cache). There doesn’t seem to be a good way to mix operators with async. await 'key' in cache won’t work: await binds tighter than in, and even await ('key' in cache) fails because the in operator coerces whatever __contains__ returns to bool, so it can never hand back an awaitable.
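To make the failure concrete, here is a minimal sketch. The `Cache` class and the `misleading_membership` helper are hypothetical names used only for illustration. It shows that when `__contains__` is declared `async`, the `in` operator turns the returned coroutine into `True` no matter which key is asked for.

```python
import warnings


class Cache:
    """Hypothetical cache whose membership test is (wrongly) a coroutine."""

    def __init__(self) -> None:
        self._data = {"present": 1}

    async def __contains__(self, key: str) -> bool:
        return key in self._data


def misleading_membership(cache: Cache, key: str) -> bool:
    # The coroutine returned by __contains__ is never awaited; silence the
    # resulting "coroutine ... was never awaited" RuntimeWarning for the demo.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        return key in cache


print(misleading_membership(Cache(), "present"))  # True
print(misleading_membership(Cache(), "missing"))  # True as well
```

The second result is the dangerous one: the code runs without an exception and reports a key that is not there.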

If I have to drop magic methods and rewrite everything as function and method calls, I’d like to adopt standard conventions. Complementing built-in functions for async seems straightforward enough:

from abc import abstractmethod
from collections.abc import Sized

class AsyncSized(Sized):
    @abstractmethod
    async def __alen__(self) -> int:
        """Default implementation defers to sync, to be overridden."""
        return len(self)
    # Implement __subclasshook__ to check for `__alen__`

async def alen(obj: AsyncSized | Sized) -> int:
    if isinstance(obj, AsyncSized):
        return await obj.__alen__()
    return len(obj)
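One way the `__subclasshook__` mentioned in the comment could look is sketched below. This is an assumption about the intent, not a settled design: here `AsyncSized` is a standalone ABC rather than a subclass of `Sized`, so that async-only containers do not also need `__len__`, and `RemoteList` is a hypothetical class used for the demo.

```python
import asyncio
from abc import ABC, abstractmethod


class AsyncSized(ABC):
    """Sketch of an ABC recognising any class that defines __alen__."""

    @abstractmethod
    async def __alen__(self) -> int: ...

    @classmethod
    def __subclasshook__(cls, C):
        # Same structural check collections.abc uses for its one-method ABCs.
        if cls is AsyncSized:
            if any("__alen__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented


async def alen(obj) -> int:
    if isinstance(obj, AsyncSized):
        return await obj.__alen__()
    return len(obj)  # plain Sized objects fall back to the sync protocol


class RemoteList:
    """Hypothetical container; does not inherit from AsyncSized."""

    def __init__(self, items) -> None:
        self._items = list(items)

    async def __alen__(self) -> int:
        await asyncio.sleep(0)  # stands in for real IO
        return len(self._items)


async def main():
    return await alen(RemoteList("abc")), await alen([1, 2])


print(asyncio.run(main()))  # (3, 2)
```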

But what about operators? Should the in operator be replaced with a method such as async def ahas(self, item), or with another top-level function such as ain(container, item)? I can’t find an established convention for either.

I have two questions:

Is anyone working on a PEP for async operators that adds __alen__, __acontains__, __agetitem__, etc, with proposed syntax?
Are there examples of libraries that have tackled this in a good way?

jace · September 16, 2024, 8:55am

Found this old comment from Nick Coghlan, which suggests operators are unlikely to get async variants because their rules are already too complicated (forward and reverse methods, fallbacks, tuple unpacking, etc.):

But what does an “asynchronous assignment” do? Do we need __asetattr__ and __asetitem__ protocols, and only allow it when the target is a subscript operation or an attribute? What if we’re assigning to multiple targets, do they run in parallel? How is tuple unpacking handled? How is augmented assignment handled?

If we allow asynchronous assignment, do we allow asynchronous deletion as well?

As you start working through some of those possible implications of offering an asynchronous assignment syntax, the explicit method based “await table.set(key, value)” construct may not look so bad after all.

jace · September 17, 2024, 5:30am

While I’m at a dead end, maybe I can engage in hypotheticals. Here is my understanding of how the in operator currently works:

If __contains__ is defined, return bool(type(rhs).__contains__(rhs, lhs)). Since bool(awaitable) will not await it, a coroutine always evaluates as True, and because its refcount drops to zero right after, it’ll also cause a “never awaited” RuntimeWarning.
If __iter__ is defined, return any(lhs is item or lhs == item for item in type(rhs).__iter__(rhs)). This too is not an async for.
If __getitem__ is defined, it’s called with sequentially increasing integers until there’s a match or an IndexError. I’m not really sure how this works because I’ve not seen any custom __getitem__ implementation that raises IndexError.
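The third fallback can be observed directly with a small sketch. `Squares` is a hypothetical class that defines only `__getitem__` and records which indices the `in` operator asks for; the search stops at the first match or at the first `IndexError`.

```python
class Squares:
    """Hypothetical sequence-like class defining only __getitem__."""

    def __init__(self, count: int) -> None:
        self.count = count
        self.requested: list[int] = []  # indices the `in` operator asked for

    def __getitem__(self, index: int) -> int:
        self.requested.append(index)
        if not 0 <= index < self.count:
            raise IndexError(index)  # signals the end of the sequence
        return index * index


hit = Squares(5)
print(9 in hit)        # True
print(hit.requested)   # [0, 1, 2, 3]

miss = Squares(5)
print(10 in miss)      # False
print(miss.requested)  # [0, 1, 2, 3, 4, 5]
```

The `IndexError` at index 5 is swallowed by the operator and treated as "not found", which is why a `__getitem__` that never raises it would loop forever on a miss.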

A hypothetical await in operator could tweak this to be aware of awaitables being returned:

If __contains__ is defined, return bool(await type(rhs).__contains__(rhs, lhs)). It should be an error if a non-awaitable is returned.
If __aiter__ is defined, use that (not __iter__) and return any(lhs is item or lhs == item async for item in type(rhs).__aiter__(rhs)).
If __getitem__ is defined, call in a loop and await the return value of each.

This has problems:

If __contains__ returns an awaitable, it’ll break the regular in operator.
Using __aiter__ for the fallback implies x await in y is a different operator from await (x in y). However, x not in y has identical behaviour to not (x in y), so this is confusing.
It is not possible to produce an awaitable object that is awaited later.

A hypothetical async in operator could have entirely distinct behaviour:

If __acontains__ (new) is defined, return type(rhs).__acontains__(rhs, lhs) without awaiting it.
If __aiter__ is defined, return an awaitable construct that will iterate looking for a match (effectively an async lambda?).
No longer support old-style iteration with __getitem__ and incrementing integers (same behaviour as async for).

Usage (looks verbose!):

await lhs async in rhs
await lhs async not in rhs
callback(lhs async in rhs)

Because this async in operator does not have an implicit await, it can even be used in a sync context.
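Since the syntax does not exist, the proposed semantics can only be approximated with a plain function. In the sketch below, `async_contains` is a hypothetical stand-in for `lhs async in rhs`, and `Evens`, `Countdown` and `build_check` are made up for the demo. The function returns an awaitable without awaiting it, so a sync caller can create the check and hand it to someone else.

```python
import asyncio
from typing import Any, Awaitable


async def _search(lhs: Any, rhs: Any) -> bool:
    # Fallback: scan the async iterator for an identical or equal item.
    async for item in rhs:
        if item is lhs or item == lhs:
            return True
    return False


def async_contains(lhs: Any, rhs: Any) -> Awaitable[bool]:
    """Hypothetical stand-in for `lhs async in rhs`; note it is not `async def`."""
    cls = type(rhs)  # magic methods are looked up on the type
    if hasattr(cls, "__acontains__"):
        return cls.__acontains__(rhs, lhs)
    if hasattr(cls, "__aiter__"):
        return _search(lhs, rhs)
    raise TypeError(f"argument of type {cls.__name__!r} is not async-iterable")


class Evens:
    async def __acontains__(self, item: Any) -> bool:
        return isinstance(item, int) and item % 2 == 0


class Countdown:
    def __init__(self, start: int) -> None:
        self.start = start

    async def __aiter__(self):
        for value in range(self.start, 0, -1):
            yield value


def build_check(item: int) -> Awaitable[bool]:
    # Sync context: the awaitable is created here but awaited elsewhere.
    return async_contains(item, Evens())


async def main():
    pending = build_check(4)
    return await pending, await build_check(3), await async_contains(2, Countdown(3))


print(asyncio.run(main()))  # (True, False, True)
```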

Thoughts?

Edit: Corrected to note that magic methods are called on the type, not the instance.

jace · September 17, 2024, 7:20am

I noticed the Sage library has something called an infix operator, to create objects that behave like operators (eg: u *dot* v), so this gave me an idea. Can I implement an async_in operator? I tried with the relatively-unused @ matmul operator here:

from typing import Any, AsyncGenerator, Awaitable, Callable

_marker = object()


class AsyncOp:
    def __init__(
        self,
        func: Callable[[Any, Any], Awaitable[bool]],
        /,
        lhs: Any = _marker,
        rhs: Any = _marker,
    ) -> None:
        self._func = func
        self._lhs = lhs
        self._rhs = rhs

    def __matmul__(self, rhs: Any):
        if self._lhs is not _marker:
            return self._func(self._lhs, rhs)
        return AsyncOp(self._func, rhs=rhs)

    def __rmatmul__(self, lhs: Any):
        if self._rhs is not _marker:
            return self._func(lhs, self._rhs)
        return AsyncOp(self._func, lhs=lhs)

    async def __call__(self, lhs: Any, rhs: Any) -> bool:
        return await self._func(lhs, rhs)


@AsyncOp
async def async_in(lhs: Any, rhs: Any) -> bool:
    """Async `in` operator."""
    cls = type(rhs)
    if hasattr(cls, "__acontains__"):
        return bool(await cls.__acontains__(rhs, lhs))
    async for item in rhs:
        if item is lhs or item == lhs:
            return True
    return False


if __name__ == "__main__":
    import asyncio

    class EvenContainer:
        async def __acontains__(self, item: Any) -> bool:
            # Always return a bool; non-int items are simply not members.
            return isinstance(item, int) and item % 2 == 0

    class EvenIterator:
        def __init__(self, limit: int = 100) -> None:
            self.limit = limit

        def __aiter__(self) -> AsyncGenerator[int, None]:
            return self.iterator()

        async def iterator(self) -> AsyncGenerator[int, None]:
            for value in range(0, self.limit, 2):
                yield value

    class NotIterable:
        pass

    ec = EvenContainer()
    ei = EvenIterator(100)
    ni = NotIterable()

    async def test():
        # Functional syntax with `__acontains__`
        print(await async_in(1, ec), "== False")
        print(await async_in(10, ec), "== True")
        # Operator syntax with `__acontains__`
        print(await (1 @async_in@ ec), "== False")
        print(await (10 @async_in@ ec), "== True")
        # Functional syntax with `__aiter__`
        print(await async_in(1, ei), "== False")
        print(await async_in(10, ei), "== True")
        # Operator syntax with `__aiter__`
        print(await (1 @async_in@ ei), "== False")
        print(await (10 @async_in@ ei), "== True")

        # Non iterables
        try:
            await async_in(1, ni)
        except TypeError as exc:
            print(f"{exc.__class__.__name__}: {exc}")
        else:
            print("TypeError was not raised")
        try:
            await (1 @async_in@ ni)
        except TypeError as exc:
            print(f"{exc.__class__.__name__}: {exc}")
        else:
            print("TypeError was not raised")

    asyncio.run(test())

This works, but with caveats:

Code formatters will put spaces around the binary operators. x @ async_in @ y is not obvious at all.
The @ operator has lower precedence than await, so the expression needs brackets: per the operator precedence chart, await binds tighter than every binary operator. @ also binds far tighter than in, which can cause other unintended mistakes. (| sits just above in.)
This will need some refactoring for type checkers to understand.
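The precedence caveat can be checked with the standard `ast` module rather than taken on trust. In this sketch `a`, `op` and `b` are placeholder names and `returned_expr` is a hypothetical helper; it parses the expression with and without brackets and shows what `await` ends up wrapping.

```python
import ast


def returned_expr(expression: str) -> ast.expr:
    """Parse `expression` inside an async function and return its AST node."""
    source = f"async def f():\n    return {expression}\n"
    func = ast.parse(source).body[0]
    return func.body[0].value


bare = returned_expr("await a @ op @ b")
# The outermost node is the second `@`; the await wraps only `a`.
print(type(bare).__name__)            # BinOp
print(ast.unparse(bare.left.left))    # await a

bracketed = returned_expr("await (a @ op @ b)")
# With brackets, the await wraps the whole operator expression.
print(type(bracketed).__name__)       # Await
print(ast.unparse(bracketed.value))   # a @ op @ b
```

Without the brackets the expression means `((await a) @ op) @ b`, which fails at runtime whenever the left operand is not itself awaitable.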