Adding a C API for coroutines/awaitables

As of now, there isn’t really any good way to work with coroutines from extensions, or return a C function as awaitable. This makes working with anything that needs async really tedious, as you have to either write a wrapper Python function that does async for you or try and mess with asyncio from C.

An API for this could be added, possibly looking something like this:

static PyObject* module_do_something(PyObject* self, PyObject* args) {
    PyObject* awaitable;
    if (!PyArg_ParseTuple(args, "O", &awaitable))
        return NULL;

    PyObject* coro = PyAwaitable_NewCoro();
    PyObject* awaitable_coro = PyAwaitable_GetCoro(awaitable /* ... */);
    PyAwaitable_AddAwait(coro, awaitable_coro);

    return Py_NewRef(coro);
}

Ideally, this would do the following:

  • Build some magic “awaitable” object (PyAwaitable_NewCoro in my example)
  • Get the coroutine from the awaitable passed in (PyAwaitable_GetCoro)
  • Add the coroutine to be awaited by the event loop (PyAwaitable_AddAwait)
  • Return our magic awaitable, to be later awaited by the event loop

Well, the problem with such an API is that C doesn’t have syntactic support for suspending and then resuming a function partway through - you’d really need to write a class operating as a state machine. The other issue is that it’d be difficult to make this a generic API - asyncio is not the only event loop implementation being used, it’d be bad to make an API which privileges its decisions as the only ones.


Yeah, that’s kinda what the “magic awaitable” would do. In my example, the idea is to automatically build a state machine at runtime.

I’m not sure I follow what your use case is or what module_do_something represents. I see that you are postulating three new APIs with a PyAwaitable_ prefix: NewCoro, GetCoro and AddAwait. But your description of what those do is lacking precision – there’s a bunch of magic, and something about an event loop.

Perhaps you can expand on your example by showing equivalent Python code and by explaining more what each of those APIs is supposed to do?

Also, I would guess that this hasn’t been requested before because in a typical use case the C code just does some synchronous computation and the async code is all in Python – what is it about the application you have in mind that doesn’t fit this model?

Have you actually written an application or library that would benefit from what you’re proposing, or are you just speculating?

The use case here is for really anything in C that needs to work with async Python code. Most of my code here was just a draft of what it could look like, not a definite working idea. Essentially, I was proposing what could be a C version of the following Python code:

async def module_do_something(awaitable: Callable[[], Awaitable[Any]]) -> None:
    coro = awaitable()
    await coro

I should have been clearer on what exactly the PyAwaitable_ methods did, that’s my bad. Here’s what I was thinking:

  • First, we generate a utility object that holds our awaits (PyAwaitable_NewCoro).
  • Next, we get the actual coroutine object from PyAwaitable_GetCoro. Ideally, we shouldn’t actually even need a new function for that, since calling it normally should return the coroutine anyway.
  • Then, the real magic comes in with PyAwaitable_AddAwait. I’m thinking this could store the coroutine in our utility for when module_do_something is awaited later. Then, once our C coroutine is awaited, our special utility will go through the coros added from PyAwaitable_AddAwait and can suspend properly. This does make it impossible to use values from the awaited coroutines, though, since it won’t be awaited until after the C function has exited. Perhaps there could be a callback function for when it’s ready?
  • The utility is then returned as a coroutine to be accessed from Python. I do realize now that we probably couldn’t just return it directly, since we want a whole new object to store our C state machine, and that object is not a coroutine. Instead, we could maybe just convert it to a coroutine through another method (with my imaginary API, it could be something like PyAwaitable_AsCoro) and then return that.

TL;DR PyAwaitable_NewCoro just builds a ready state-machine that can be used from Python, and PyAwaitable_AddAwait adds functions to be awaited by it later.

Now, the hope here is that these methods are magic to the developer, and don’t force them to try and understand the internals of async. I do realize that this may still be a bit difficult to understand, but I’ll work on some proof-of-concept code to try and demonstrate what this might look like.

Regarding your last question, I’m proposing this as I was writing a library that relies heavily on C extensions, but also needs to interop with async Python code (specifically ASGI), and found that trying to work with async from the C API was more complicated than it should be.


I think we’re getting closer to the reason why such an API doesn’t exist and is likely going to be quite a bit more complicated than you thought.

Your Python code creates an async function that uses the await primitive. If you wanted literally that in C, you’d have to write the body of the async function as a separate “callback” function, so that the caller of module_do_something, when they call it, get a coroutine back, because the caller of an async function isn’t required to immediately await it – it may save it in a variable, do some other work (maybe create more coroutines), and then await it (or pass several coroutines to an operation like gather() or wait() which wraps the coroutines in tasks and awaits them collectively).
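To make the point about deferred awaiting concrete, here is a small sketch (the names do_something and main are made up for illustration). It shows that calling an async function only builds a coroutine object, and that the body runs later, once something like gather() drives it:

```python
import asyncio

log = []

async def do_something(name):
    log.append(f"start {name}")
    await asyncio.sleep(0)
    log.append(f"end {name}")
    return name.upper()

async def main():
    # Calling an async function only creates a coroutine object;
    # none of the body has run at this point.
    first = do_something("a")
    second = do_something("b")
    log_before_await = list(log)

    # gather() wraps both coroutines in tasks and awaits them together.
    results = await asyncio.gather(first, second)
    return log_before_await, results

log_before_await, results = asyncio.run(main())
```

Any C-level equivalent has to preserve this behaviour: the C function must return quickly with an object that can be awaited later, rather than doing the work inline.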

Moreover, when the body of your coroutine finally gets to run, for every await in that body you need another callback that gets executed once the I/O loop decides to resume it. Basically if you have

async def module_do_something():
    setup()
    await foo()
    middle()
    await bar()
    final()

you need one callback for the initial part, which calls setup() and foo() but doesn’t await the latter; another callback for middle() and bar(), and a third callback for final().

I’ll give you that your original example is simple enough that it really only requires a single callback (we can cheat a little and do the setup synchronously). You could also argue that since you’re not using the result from await there’s no need for a callback function. But it’s not very satisfactory to have a primitive that emulates await except that it doesn’t let you do anything with the return value from await: usually await has a return value and the code wants to do something with it.

So if you wanted to design a C API that provides access to the await primitive, at the very least it should handle this case:

async def something():
    x = await foo()
    return bar(x)

This could be a primitive that takes an awaitable (the result of foo() in the example) and a callback (representing return bar(x)). It can return another awaitable (not a coroutine but something that can be awaited using await).
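As a rough illustration of that primitive, here is a pure-Python sketch. The class name Then is hypothetical and not part of any existing API. It takes an awaitable and a callback, and is itself awaitable without being a coroutine:

```python
import asyncio

class Then:
    """Awaitable that awaits `inner`, then returns callback(result)."""

    def __init__(self, inner, callback):
        self._inner = inner
        self._callback = callback

    def __await__(self):
        # Delegate suspension to the inner awaitable, so the event loop
        # sees the same thing it would see for a plain `await inner`.
        result = yield from self._inner.__await__()
        return self._callback(result)

async def foo():
    await asyncio.sleep(0)
    return 20

def bar(x):
    return x + 1

def something():
    # A plain function, not `async def`: it just returns an awaitable.
    return Then(foo(), bar)

async def main():
    return await something()

value = asyncio.run(main())
```

A C version would presumably need the same two pieces: an object holding the inner awaitable plus a function pointer, and an iterator that forwards to the inner awaitable until it finishes.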

I’m not sure why it should be a “state machine” – that’s a pretty general word that in my mind doesn’t mean much more than “object”.

I’ll leave the next step to you. (I’m personally not at all convinced that such an API is actually needed.) Maybe you can elaborate a bit more on the complications you encountered when interacting with async from C.


Ok, I see your point. I do see how this can get complicated pretty quickly, but I still believe that having some API for this is better than having nothing at all. I came up with an API that looks like this:

static int module_something_callback(PyObject* awaitable, PyObject* result) {
    int a;
    int b;
    PyObject* bar;

    PyAwaitable_Unpack(awaitable, &a, &b, NULL, &bar);
    // call bar(result + (a + b)); the "l" format wraps the value in a one-element argument tuple
    PyObject* value = PyObject_CallFunction(bar, "l", PyLong_AsLong(result) + (a + b));
    if (value == NULL)
        return -1;

    PyAwaitable_SetResult(awaitable, value);
    return 0;
}

static PyObject* module_something(PyObject* self, PyObject* args) {
    PyObject* awaitable = PyAwaitable_New();
    int a;
    int b;
    PyObject* foo;
    PyObject* bar;
    if (!PyAwaitable_ParseTuple(awaitable, args, "iiOO", &a, &b, &foo, &bar))
        return NULL;

    // call foo() with no arguments to get its coroutine object
    PyObject* foo_coro = PyObject_CallNoArgs(foo);
    if (foo_coro == NULL)
        return NULL;
    // let's pretend that PyAwaitable_AddAwait checks that it's a coroutine, just for this example
    PyAwaitable_AddAwait(awaitable, foo_coro, module_something_callback);
    return PyAwaitable_AsCoroutine(awaitable);
}

The above would be equivalent to the following Python code:

async def something(a: int, b: int, foo: Callable[[], Awaitable[int]], bar: Callable[[int], int]):
    foo_coro = foo()
    result = await foo_coro
    return bar(result + (a + b))

  • Starting in module_something, we build our “magic awaitable” through PyAwaitable_New. Then, we initialize our variables for PyAwaitable_ParseTuple. The idea with that is to do the same job as PyArg_ParseTuple, but also store the parsed arguments on the awaitable object to be used in callbacks later (PyAwaitable_Unpack).
  • Next, I changed the original idea of PyAwaitable_GetCoro to a simple PyObject_Call, since all we are doing is calling the object.
  • Now, with PyAwaitable_AddAwait, it’s essentially just adding the coroutine object to an array on the object to be used later, as well as storing the callback. More on this later.
  • Finally, we return our magic awaitable as some sort of coroutine. At this point, all awaits have been registered to our awaitable and will be managed by it.
  • In module_something_callback, I stated above that the arguments parsed by PyAwaitable_ParseTuple would be stored on the object. Ideally, this would be how you access them, passing NULL for unneeded parameters.
  • We take the result of foo (result parameter), and then add it with parameters a and b.
  • Then, we take the result of bar, and set it to the “result” of our coroutine. PyAwaitable_RETURN_SET_RESULT would be a macro that just tells our awaitable that a return value was set inside the callback.

Regarding the “state machine”, that’s what Python coroutines are, right? From some research I did before, I thought coroutines were just generators with a state value that keeps track of where they are.

I came up with an object model that looks like this:

typedef int (*awaitcallback)(PyObject*, PyObject*);

typedef struct {
    PyObject* coro; // python coroutine object
    awaitcallback callback; // actual callback function
} AwaitableCallback;

typedef struct _Awaitable {
    AwaitableCallback** callbacks; // array containing callbacks and their coroutines
    Py_ssize_t callback_size; // size of the array above
    int state; // current state of the coro
    PyObject* result; // final return value
    void** tuple; // arguments saved by PyAwaitable_ParseTuple
    Py_ssize_t tuple_size; // size of the array above
} Awaitable;
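To show how such an object might behave once awaited, here is a rough Python model of the struct. The method names add_await and set_result are assumptions that mirror the draft C functions, and the model leaves out the saved argument tuple:

```python
import asyncio

class Awaitable:
    """Python model of the proposed C struct (illustrative names only)."""

    def __init__(self):
        self.callbacks = []   # (coroutine, callback) pairs, like `callbacks`
        self.state = 0        # index of the next pair to run, like `state`
        self.result = None    # final return value, like `result`

    def add_await(self, coro, callback):
        self.callbacks.append((coro, callback))

    def set_result(self, value):
        self.result = value

    def __await__(self):
        # Await each stored coroutine in order, handing its value to the
        # matching callback, which may set the final result.
        while self.state < len(self.callbacks):
            coro, callback = self.callbacks[self.state]
            self.state += 1
            value = yield from coro.__await__()
            if callback is not None and callback(self, value) < 0:
                raise RuntimeError("callback reported an error")
        return self.result

async def foo():
    await asyncio.sleep(0)
    return 5

def on_foo(awaitable, value):
    awaitable.set_result(value + 3)
    return 0

async def main():
    aw = Awaitable()
    aw.add_await(foo(), on_foo)
    return await aw

final = asyncio.run(main())
```

The integer return value of the callback follows the C convention from the draft (negative means failure). How errors would actually propagate in C is left open here.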

I do realize that this API isn’t exactly perfect, but more just a start of what could be something useful. I’m still looking for some feedback on what needs to be improved and/or fixed here.

In terms of why this is needed, take a look at ASGI as an example:

async def app(scope, receive, send):
    await send(...)

In my case, I would like to write my app function in C, but I need to await send, so I can only mix C in with the actual function.

Also, when using C, a lot of the time you end up performing operations that block I/O, so we would want something like asyncio.to_thread, which is completely ok, but may be inconvenient for some cases, and in my opinion having to access a module just to use normal Python syntax seems a little bit tedious.
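For reference, the asyncio.to_thread approach looks roughly like this. blocking_work is a stand-in for a blocking C function, and the timings are arbitrary:

```python
import asyncio
import time

def blocking_work(seconds):
    # Stand-in for a blocking call implemented in C (hypothetical).
    time.sleep(seconds)
    return "done"

async def main():
    # to_thread runs the blocking call in a worker thread, so the event
    # loop can keep running other code in the meantime.
    task = asyncio.create_task(asyncio.to_thread(blocking_work, 0.05))
    ticks = 0
    while not task.done():
        await asyncio.sleep(0.01)
        ticks += 1
    return await task, ticks

result, ticks = asyncio.run(main())
```

This only helps if the blocking call releases the GIL while it waits, which time.sleep does and which a C extension has to do explicitly.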

Technically, asyncio.to_thread will actually slow down your code as well. A big reason people use the C API is to, well, make their code faster. I get that we shouldn’t really care about performance in Python, but it’s always nice when we can make it a little bit faster.

Above all, I just think it’s a little silly to have to access Python code to use async/await from C.

Have you profiled code and seen that calling out to python appears to cause significant overhead? Could you share an example?

Gut feel (though I have worked through this problem in the past), the best thing to do here is to go and work on pybind11’s projection of C++20 coroutines: [FEAT] def_async · Issue #2658 · pybind/pybind11 · GitHub

C doesn’t natively have any coroutine support, which is what would be needed to make this at all usable. So you’re going to end up with a very complicated class with one function for each step and the boilerplate in between to handle the “next step” calls.[1] The macro gymnastics required to make this a “nice” experience for developers would be completely spoiled by the macro gymnastics :wink: Almost certainly users would rather write their own synchronous functions and wrap them up in a Python (or better yet, Cython) coroutine that calls them as needed.

However, recent C++ does have native coroutines. For users who want to write those, they’ll have a good experience (I assume - not having used them myself), and pybind11 support would mean they could be made available easily in Python. Much easier than trying to somehow add coroutines into C.

I’ll also note that there should be nothing stopping the hypothetical macro gymnastics being implemented as a totally separate project. It’s probably not clearly documented in the C API, but all the entry points needed to implement an awaitable value exist, and as others have pointed out, an “async function” is really just one that returns an awaitable result.
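A minimal Python sketch of that last point, with made-up names (Ready, compute): an ordinary function whose return value implements the awaitable protocol by hand. In C, the corresponding slots would be am_await in the type's tp_as_async table and tp_iternext on the iterator type.

```python
import asyncio

class _ReadyIterator:
    """Iterator returned from __await__; it finishes on the first step."""

    def __init__(self, value):
        self._value = value

    def __iter__(self):
        return self

    def __next__(self):
        # Raising StopIteration(value) is how an awaitable's iterator
        # hands its result back to the `await` expression.
        raise StopIteration(self._value)

class Ready:
    def __init__(self, value):
        self._value = value

    def __await__(self):
        return _ReadyIterator(self._value)

def compute(x):
    # Not `async def`, yet callers can still write `await compute(x)`.
    return Ready(x * 2)

async def main():
    return await compute(21)

answer = asyncio.run(main())
```

This example never suspends. A real implementation would forward to another awaitable's iterator from __next__ until that iterator is exhausted.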

  1. This is the “state machine” mentioned earlier.

Ok, I think this should be stated somewhere in the documentation then. Right now, the only resources I’ve had on this issue are this thread and some old Stack Overflow posts.

Can you propose a documentation update? Where would you have been looking for this information, and what would it have said?

“C does not have native coroutine support” isn’t really our thing to document.

I think there just should be something regarding async and the C API. Something saying “you can’t do coroutines from the C API, just wrap it with a Python function” should do.

Rather than an array of callbacks, why not use a single function with a switch inside?
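The single-function idea can be sketched in Python as follows. SwitchCoroutine and its step method are hypothetical, and the if chain stands in for a C switch on the state number:

```python
import asyncio

class SwitchCoroutine:
    """Awaitable driven by one step function that dispatches on `state`."""

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
        self.state = 0
        self.x = None

    def step(self, value):
        # One function with one branch per resume point.
        if self.state == 0:
            self.state = 1
            return ("await", self.foo())
        if self.state == 1:
            self.state = 2
            self.x = value            # result of `await foo()`
            return ("await", self.bar(value))
        return ("return", self.x + value)

    def __await__(self):
        value = None
        while True:
            action, payload = self.step(value)
            if action == "return":
                return payload
            value = yield from payload.__await__()

async def foo():
    await asyncio.sleep(0)
    return 2

async def bar(x):
    await asyncio.sleep(0)
    return x * 10

async def main():
    return await SwitchCoroutine(foo, bar)

total = asyncio.run(main())
```

Local variables that must survive a suspension (x here) have to live on the object, which is the same bookkeeping the callback-array design needs.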

I think that would be more of a proposal for a new generator API as opposed to async. Granted async is just generators, but still I’m not too sure if it would fit here.

Apologies for taking a few days on this, here’s an (unfinished) implementation of this:

As of now, it can be accessed with the following C code:

static int
awaitable_callback(PyObject *awaitable, PyObject *result) {
    puts("hello, world!");
    PyObject_Print(result, stdout, Py_PRINT_RAW);
    if (PyAwaitable_SetResult(awaitable, PyLong_FromLong(10)) < 0) return -1;
    return 0;
}

static PyObject *
awaitable_add_await(PyObject *self, PyObject *args) {
    PyObject *awaitable;
    PyObject *coro;
    if (!PyArg_ParseTuple(args, "O!O", &PyAwaitable_Type, &awaitable, &coro)) return NULL;

    // PyCoro_CheckExact returns 1 or 0, never a negative value
    if (!PyCoro_CheckExact(coro)) {
        PyErr_SetString(PyExc_TypeError, "argument 2 must be a coroutine");
        return NULL;
    }

    PyAwaitable_AddAwait(awaitable, coro, awaitable_callback);
    Py_RETURN_NONE;
}

Then, awaitable_add_await may be used from Python like a normal coroutine:

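import asyncio
import unittest

import _testcapi

async def _proxy():
    return 42

class TestAwaitable(unittest.TestCase):
    def test_add_await(self):
        coro = _testcapi.awaitable_new()
        _testcapi.awaitable_add_await(coro, _proxy())

        async def transport():
            result = await coro
            print("result:", result)

        # run the coroutine so the callback actually fires
        asyncio.run(transport())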

This then outputs:

hello, world!
result: 10

Following up here, what do you guys think? I think this API would probably fit better in CPython directly, as opposed to being implemented in a separate project, since my earlier idea of PyAwaitable_ParseTuple isn’t really possible without modifying the source code of getargs.c.