How can async support dispatch between sync and async variants of the same code?

Problem

A common scenario for library authors is that they accept some callable as a callback for user-defined logic.

If the library author wants to add support for async methods, some high-level changes are usually needed, but there’s a problem which ends up percolating down through all sorts of utility functions.

Rather than toy examples, I’ll use some of the code I’ve been working on in a branch of the webargs library:

    def _load_location_data(self, *, schema, req, location):
        loader_func = self._get_loader(location)
        return loader_func(req, schema)

    async def _async_load_location_data(self, *, schema, req, location):
        loader_func = self._get_loader(location)
        if asyncio.iscoroutinefunction(loader_func):
            data = await loader_func(req, schema)
        else:
            data = loader_func(req, schema)
        return data

Both of these functions are “the same”, but we need them both. This isn’t so bad for a single function, but stack up a few distinct hooks and methods, and you end up effectively doubling the size of a lot of the plumbing in the project to allow a completely async call path alongside the sync one.

Existing solution for decorators

The interface provided to users sometimes needs to show the difference between the two versions of the same code, e.g. Parser.parse is sync, Parser.async_parse is async. That happens anywhere that the library exposes a bare function call which must become async-capable.
But we can hide it a lot of the time using a decorator and a quick check:

def decorator(func):
    if asyncio.iscoroutinefunction(func):

        @functools.wraps(func)
        async def wrapper(*args, **kwargs): ...

    else:

        @functools.wraps(func)
        def wrapper(*args, **kwargs): ...

    return wrapper
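
For concreteness, a sketch of what those elided wrapper bodies might contain; before_call and after_call are hypothetical stand-ins for whatever shared logic the library wraps around the user's function, and the point is that the two branches differ only in the await:

    import asyncio
    import functools


    def before_call(*args, **kwargs):  # hypothetical shared setup logic
        pass


    def after_call(result):  # hypothetical shared teardown logic
        return result


    def decorator(func):
        if asyncio.iscoroutinefunction(func):

            @functools.wraps(func)
            async def wrapper(*args, **kwargs):
                before_call(*args, **kwargs)
                result = await func(*args, **kwargs)  # await the user's coroutine
                return after_call(result)

        else:

            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                before_call(*args, **kwargs)
                result = func(*args, **kwargs)  # plain call for a sync callable
                return after_call(result)

        return wrapper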

This works great for cases like

import flask
from webargs.flaskparser import parser

app = flask.Flask(__name__)

@app.route("/foo")
@parser.use_args(...)
async def foo(...): ...

I am therefore not super interested in trying to find a better way of presenting an interface for users to call sync or async variants of library code. Decorators solve this pretty well where we can use them, and it's okay to have to support foo() and async_foo() as different entry points into library code where necessary. The problem is that it's not just a matter of having foo() and async_foo() at the top level: you end up maintaining a "shadow copy" of your code inside the library to keep the sync and async call paths separate.

Past discussions

This issue has been discussed before, in particular in a couple of earlier threads which both seem relevant.

However, I don’t see anyone asking for what – as a library author – seems like the best solution:
Is there a way in which the language could be changed such that building the async and non-async variants of the same function could be automated or simplified?

If there’s another past thread I should read, please let me know.

Ideal solution

Today I have this:

class Parser:
    def _load_location_data(self, *, schema, req, location):
        loader_func = self._get_loader(location)
        return loader_func(req, schema)

    async def _async_load_location_data(self, *, schema, req, location):
        loader_func = self._get_loader(location)
        if asyncio.iscoroutinefunction(loader_func):
            data = await loader_func(req, schema)
        else:
            data = loader_func(req, schema)
        return data

    async def _async_other_helper_func(self, ...):
        return await self._async_load_location_data(...)

    def _other_helper_func(self, ...):
        return self._load_location_data(...)

    def public_func(self, ...):
        return self._other_helper_func(...)

    async def async_public_func(self, ...):
        return await self._async_other_helper_func(...)

and what I want to write instead is this:

class Parser:
    maybe_async def _load_location_data(self, *, schema, req, location):
        loader_func = self._get_loader(location)
        if (
            asyncio.iscoroutinefunction(loader_func) and
            MAGIC_is_currently_async
        ):
            data = await loader_func(req, schema)
        else:
            data = loader_func(req, schema)
        return data

    maybe_async def _other_helper_func(self, ...):
        # other magic -- strip the await in synchronous calls
        return await self._load_location_data(...)

    def public_func(self, ...):
        return self._other_helper_func(...)

    async def async_public_func(self, ...):
        return await self._other_helper_func.call_async(...)

    # why limit 'maybe_async' to internal methods?
    # if it's part of the language, we also get to avoid the split in public
    maybe_async def alternative_public_func(self, ...): ...

I’m aware that some of this could be done with code generation. However, maintaining maybe_async codegen would be quite difficult for any individual library maintainer. Certainly harder than finding ways of sharing code between my own internal sync and async variants of the same set of functions.

Conclusion and final question

Is there a solution which can be written to do the above (obviously with less syntactic sugar) in the language today? Or would this require language changes, as I suspect it would?

The goal is to improve library maintenance, so adding runtime dependencies on other PyPI packages, or adopting very complex solutions, doesn't really solve it.

Are there known techniques for doing code-sharing between the two paths which make this problem less severe? Perhaps some clever method of passing around and chaining calls on objects which may be awaitable?
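
One small piece of this does seem achievable today: on the async side, the branching on the callback type can be funnelled through a tiny helper. A minimal sketch (this collapses the sync-vs-async callback check, but does nothing about the separate sync call path):

    import inspect


    async def maybe_await(value):
        """Await the value if it is awaitable, otherwise return it unchanged."""
        if inspect.isawaitable(value):
            return await value
        return value


    # The async helper from the problem statement could then collapse to:
    #     async def _async_load_location_data(self, *, schema, req, location):
    #         loader_func = self._get_loader(location)
    #         return await maybe_await(loader_func(req, schema))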


Not to imply any opinion on the proposal as I’m not well-informed on the topic, but you might want to at least consider moving this to the async category, which might reach more experienced async-using devs, before proposing here. But that’s your call, ultimately.

Minor sidenote, but both branches of your example appear to be identical. Did you mean to omit the async keyword in one, or make some other change?

I wasn’t really sure. async-sig seems very quiet relative to the higher-traffic Ideas forum. Maybe that’s a positive reason to use async-sig? I’m happy to move this, if that’s possible on Discourse.

Exactly that, thank you for the catch. I’ve adjusted the example to drop async in one branch.

It’s up to you; if so, you can use the Edit (pencil) button next to your post title, and then change the category in the dropdown to the left. It should be possible for regular users on their own posts, but I can do it for you if you’d like, just in case it’s not.

I think the reason this hasn’t been solved in its full generality is that there’s no perfect solution. Adding another keyword to the language (maybe_async) just isn’t in the cards.

Library and framework authors are usually best off having an opinionated convention, aided by a decorator or metaclass fitted to the needs of the library or framework. (I believe I’ve seen a metaclass that looked for methods named async_spam and added a synchronous version named spam for each one.)
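
A minimal sketch of that kind of metaclass, assuming it is acceptable to drive each coroutine to completion with asyncio.run() on the sync path (the names here are illustrative, not from any particular library):

    import asyncio
    import functools


    class SyncVariantsMeta(type):
        """Add a sync 'spam' method for every coroutine method named 'async_spam'."""

        def __new__(mcls, name, bases, namespace):
            for attr, value in list(namespace.items()):
                if attr.startswith("async_") and asyncio.iscoroutinefunction(value):
                    sync_name = attr[len("async_"):]
                    if sync_name not in namespace:
                        namespace[sync_name] = mcls._make_sync(value)
            return super().__new__(mcls, name, bases, namespace)

        @staticmethod
        def _make_sync(coro_func):
            @functools.wraps(coro_func)
            def wrapper(*args, **kwargs):
                # only safe when no event loop is already running in this thread
                return asyncio.run(coro_func(*args, **kwargs))

            return wrapper


    class Client(metaclass=SyncVariantsMeta):
        async def async_spam(self):
            return "spam"


    # Client().spam() now runs async_spam() to completion via asyncio.run().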

I’m not surprised; if it were as “easy” as adding a keyword, it would probably have been part of the original design of async. But if people are finding their own ways of achieving this today, is there any possibility of getting one of those conventions + helpers into the stdlib?

The following is as good a solution as any other if it works:

async def async_spam(): ...
spam = create_sync_variant(async_spam)

But I don’t know of a way to do that. Generating a sync variant from an async function is a good step, but it’s only part of the problem. If we have

async def async_spam():
    return await async_eggs()

and create_sync_variant renames the function and strips the awaits, we’d get

# the generated function from create_sync_variant(async_spam)
def spam():
    return async_eggs()  # <-- but we wanted eggs() !

I would be tremendously thankful for links to any existing tooling which does this, just for the purpose of learning. I’ve already looked at asgiref a bit, but it seems to mostly wrap calls in background threads.

Maybe there are not that many people who would benefit from the addition of create_sync_variant? It seems that a lot of people struggle when joining together async and sync code, but perhaps not in this particular way.


On the other topic, I’m not able to move this to async-sig. I get “You are not permitted to view the requested resource.” Perhaps I’m not allowed to post there?

I moved this thread to Async-SIG.

On the main problem, I feel like I have to repeat myself – you’re better off inventing your own solution that works right for the framework.

One trick I’ve seen is a decorator that takes an async function, and adds a function attribute (e.g. named ‘sync’) that is a wrapper that calls the async version and waits for the result. E.g.

def add_sync_version(func):
    assert asyncio.iscoroutine(func)
    def wrapper(*args, **kwds):
        # run the coroutine to completion on a fresh event loop
        return asyncio.new_event_loop().run_until_complete(func(*args, **kwds))
    func.sync = wrapper
    return func

I haven’t tested this version and there are dangers associated with creating a new event loop for this purpose, but you get the idea.

I should probably add how this is used.

For the library developer, you just add @add_sync_version to those (public) async functions and methods for which you want to add a sync version. E.g.

@add_sync_version
async def spam(): ...

For the user of the library, if they want the async version they can just write

    x = await spam()

If they want the sync version they can write

    x = spam.sync()

Thanks for moving this to async-sig! 🙂

I don’t want to give the impression that I’m not listening, and I apologize if I said anything to suggest that.
I’m trying to understand what the best way of handling this scenario is. And I’d like to codify that – perhaps in asyncio docs or somewhere else appropriate – so that anyone else trying to do similar things has that same best practice available.

If the solution were as simple as a 6 line function, there would be no reason not to add it to asyncio. So those hidden dangers are actually the hard part. Am I at least following the situation correctly up to this point?

If nothing else, we have to be concerned about the caller already having a running loop. asgiref’s AsyncToSync uses a background thread to run a separate loop, and clocks in at around 200 LOC. Has that team found a safe and reliable workaround for most cases? The underlying question is: is it unreasonable to hope for something like AsyncToSync to make it into asyncio?

Yeah, you’re following; the reality is messy, and that’s why we don’t want to put a solution in the stdlib – there are different compromises possible and you will have to choose based on the characteristics of your library. Indeed, you may have to offer a less than perfect solution and warn your users about possible downsides. Ultimately it’s better to wean your users off synchronous calls altogether.

Thanks for pointing out asgiref @sirosen. I had come up with something similar while exploring how to support both sync and async classes for MongoDB and Jupyter Client: Wrap an Asynchronous Class · GitHub


I want to support sync and async usage without introducing potential fragility just to save myself some lines of source. I would have thought this shows up for all sorts of use cases. If any HTTP-based client lib (elasticsearch comes to mind) wanted to support use of aiohttp, it would face the same issue.

Maybe my case is unusually bad for sync vs async. I don’t think I can use the background thread strategy because one of the contexts in which I want the library to work is under uwsgi, where threading is often disabled. I could say “you need to set --enable-threads or make your application async”, but I don’t feel that I can justify that demand of users.

If the goal is to be “async capable” and play nicely with whatever stack users are already using, then I don’t think it’s wise to make a previously synchronous library fully async.

However, it seems like the background thread strategy is a common one. At least, it has been invented independently twice! Is there room for documenting this strategy as part of the asyncio docs, in a narrative doc like the logging HOWTO?
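
The core of the strategy is small enough to sketch. The sketch below is a simplified illustration, not asgiref’s implementation: it ignores shutdown, error handling, and reentrancy (which is where the real ~200 LOC go), and, per the uwsgi concern above, it does require threads. The names are illustrative.

    import asyncio
    import threading


    class BackgroundLoop:
        """Run coroutines to completion on an event loop in a dedicated thread."""

        def __init__(self):
            self._loop = asyncio.new_event_loop()
            self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
            self._thread.start()

        def run_sync(self, coro):
            # schedule the coroutine on the background loop and block this thread
            # until it finishes; unlike asyncio.run(), this never tries to start
            # a new event loop in the calling thread
            future = asyncio.run_coroutine_threadsafe(coro, self._loop)
            return future.result()


    # Usage: result = BackgroundLoop().run_sync(some_coroutine_function(...))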

You may also be interested in some of the writeup at Network protocols, sans I/O — Sans I/O 1.0.0 documentation


Do you see a future where most (all) of Python moves towards async/await as a general model? Or is this just a recommendation to decide upfront whether to use plain Python or the async variant?


I do not foresee such a future. The synchronous model is here to stay. Python is not JavaScript.

Async has a place, and sometimes you need to convert from sync APIs to async APIs for a particular scenario. But trying to offer both at the same time is fraught with difficulties and is at best seen as a transitional approach.


Hello - I tried using this and had to change iscoroutine to iscoroutinefunction.

Apologies for reviving this, but as someone who has been looking at Django’s async internals a lot recently, trying to think of ways forward, I have a couple of thoughts:

  • If we want sync programming to stick around, then it’s good for underlying libraries to have ways to support sync and async, since otherwise they have to choose. Psycopg supported sync and async through a purpose-built script that generates a sync version of the lib from the async version (a toy sketch of that kind of transform follows after this list).

  • Some of the non-codegen tricks like asgiref seem to impose a pretty real latency cost (asgiref’s async_to_sync costs me about 3-5 ms per call, though I could be doing something wrong). I imagine it’s possible to keep these costs low when you do a single jump from one domain to the other, but the cost is high enough that if you have a deep call stack you really can’t jump between both worlds. Makes sense, of course.
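
To make the codegen idea concrete, here is a toy sketch of an async-to-sync source transform built on the stdlib ast module. It only strips async/await and drops an async_ name prefix from definitions and call sites, and it loses comments and formatting entirely, which is exactly the gap noted below; a production tool such as Psycopg’s script has to do much better.

    import ast


    class AsyncToSyncTransform(ast.NodeTransformer):
        """Rewrite 'async def async_spam' into 'def spam', stripping awaits."""

        def visit_AsyncFunctionDef(self, node):
            self.generic_visit(node)
            # rebuild the node as a plain FunctionDef, copying every field over
            new = ast.FunctionDef(
                **{field: getattr(node, field) for field in node._fields}
            )
            new.name = new.name.removeprefix("async_")
            return ast.copy_location(new, node)

        def visit_Await(self, node):
            self.generic_visit(node)
            return node.value  # 'await x' becomes plain 'x'

        def visit_Name(self, node):
            # retarget call sites at the generated sync names: async_eggs() -> eggs()
            if node.id.startswith("async_"):
                node.id = node.id.removeprefix("async_")
            return node


    def make_sync_source(async_source: str) -> str:
        tree = AsyncToSyncTransform().visit(ast.parse(async_source))
        return ast.unparse(ast.fix_missing_locations(tree))


    # make_sync_source("async def async_spam():\n    return await async_eggs()\n")
    # returns "def spam():\n    return eggs()"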

I think libraries want to offer sync and async APIs, but in practice it’s very hard to extract “the bit that is sync or async” from the rest of the logic just by writing code, at least not without really doing a number on code legibility.

Having said that, I think codegen feels sufficient for this problem, because it’s mainly a problem for tricky internal libraries rather than top-level code. Probably the one “core Python” ask that would make this easier would be one of the following:

  • Make it easier to do source code -> AST -> source code transformations that preserve things like comments (I think comments are the big missing piece there in ast).

  • Given the current philosophy around the standard library I don’t think this would happen, but… I think there’s a place for something akin to CPython’s Argument Clinic “extract blocks from this file, do some transformation, then write out some things” process (+ the hashing/staleness stuff). But in this day and age it feels like this would be an external package.

But at the very least, with codegen it looks like there is a pattern that libraries can reach for. The “ideal” thing (from my perspective) would be for await some_async_function() or some_async_function.sync() to be “transparently” cheap, but if at worst you have a fixed number of things to tie together, it’s not too tough.


If it can be done using codegen, that’s great – it can be a third-party tool.

Instead of asking the ast module to preserve comments (which is technically very complicated and unlikely to happen) I recommend looking at a different library – for example LibCST, which IIRC is built for this kind of thing. It supports Python 3.12 and I presume it will start supporting 3.13 once it’s out.


I think wsproto is a good example of what is possible. It’s just a state machine; bring your own networking. If you can contain the core logic to something usable by either a sync or an async pipeline, maintaining both a sync and an async user API is just a matter of wrapping the same core. It might still be useful to pair this with codegen, but it should be easier to generate the sync and async paths if the core logic and state are decoupled.
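
In that spirit, here is a minimal sketch of the sans-I/O shape (all names are made up for illustration): the core never performs or awaits I/O, so the sync and async wrappers differ only in how they invoke the user-supplied loader, much like the webargs example at the top of the thread.

    class ParserCore:
        """I/O-free core: pure logic, no awaits, no blocking calls."""

        def build_request(self, location):
            # decide what needs to be loaded and describe it; no I/O here
            return {"location": location}

        def handle_data(self, raw):
            # all shared validation/transformation lives here
            return {"parsed": raw}


    class SyncParser:
        def __init__(self, load):
            self._core, self._load = ParserCore(), load

        def parse(self, location):
            request = self._core.build_request(location)
            return self._core.handle_data(self._load(request))


    class AsyncParser:
        def __init__(self, load):
            self._core, self._load = ParserCore(), load

        async def parse(self, location):
            request = self._core.build_request(location)
            return self._core.handle_data(await self._load(request))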
