What are the advantages of asyncio over threads?

The purpose of both async methods and threads is to make it possible to process several tasks concurrently.

The threads approach looks simple and intuitive. If a (e.g. Python) program processes several tasks concurrently, we have a thread (maybe with sub-threads) for each task, and the stack of each thread reflects the current stage of processing of the corresponding task. It is the operating system that manages the threads and their call stacks.

The asyncio approach is not very different. Instead of actual threads we have coroutines, so it is the coroutine stack that reflects the current stage of processing of the corresponding task. No OS threads are involved, and the call stacks are Python objects managed by Python code (the event loop).
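A minimal side-by-side sketch of the two models (the `handle_*` functions and the three-task workload are illustrative assumptions): the same three waits run once in OS threads and once as asyncio tasks. Both overlap the waits; what differs is who owns the call stacks and who schedules them.

```python
import asyncio
import threading
import time
from typing import List

def handle_blocking(name: str, results: List[str]) -> None:
    # Runs in its own OS thread with its own OS-managed call stack.
    time.sleep(0.1)  # stands in for blocking I/O
    results.append(name)

def run_with_threads() -> List[str]:
    results: List[str] = []
    threads = [threading.Thread(target=handle_blocking, args=(f"t{i}", results)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)  # completion order is up to the OS scheduler

async def handle_async(name: str) -> str:
    # Suspended at the await; its frame is a Python object kept alive by the event loop.
    await asyncio.sleep(0.1)  # stands in for non-blocking I/O
    return name

async def run_with_tasks() -> List[str]:
    # gather wraps each coroutine in a Task; all of them share one OS thread.
    return await asyncio.gather(*(handle_async(f"t{i}") for i in range(3)))

print(run_with_threads())             # ['t0', 't1', 't2']
print(asyncio.run(run_with_tasks()))  # ['t0', 't1', 't2']
```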

I can see several inconveniences with the async approach; I’ll name only a couple of them. We now have two types of methods: usual methods and async methods. 90% of the time the only difference is that you need to remember that a method is async and not forget to use the await keyword when calling it. And yes, you can’t call (await) an async method from normal ones.
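The two kinds of function can be seen in a minimal sketch (`fetch_value`, `sync_caller` and `async_caller` are hypothetical names). Calling an async function from ordinary code does not run it; it only creates a coroutine object, and something such as `asyncio.run()` has to drive that coroutine to completion.

```python
import asyncio

async def fetch_value() -> int:
    await asyncio.sleep(0)  # a point where the event loop could switch tasks
    return 42

def sync_caller() -> int:
    coro = fetch_value()            # calling it does NOT run it...
    print(type(coro).__name__)      # ...it only builds a 'coroutine' object
    coro.close()                    # discard it unrun (avoids a "never awaited" warning)
    # A plain function cannot use `await`; it has to start an event loop instead.
    return asyncio.run(fetch_value())

async def async_caller() -> int:
    return await fetch_value()      # inside async code, `await` is all that is needed

print(sync_caller())                # prints "coroutine", then 42
print(asyncio.run(async_caller()))  # 42
```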

I agree that writing multi-threaded code is difficult and error-prone. But the asyncio approach does not make writing code any easier (at least for me). Well, asyncio multitasking is cooperative multitasking, so I can be sure that my code will not be interrupted between await calls (which, by the way, may be disguised under async for, async with, etc.). But as an application programmer I have never taken advantage of this feature. For me, await something always means “the function I am calling is async, so I have to use the await keyword here”. Situations where concurrent tasks conflict when accessing some Python resource are very rare (in my experience), and I am totally OK using mutexes in these situations. Much more often, tasks have to synchronize access to some external resource (database records or files), and the asyncio approach does not help in these cases.
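A minimal sketch of the guarantee mentioned above (the shared `counter` and the task counts are illustrative): because a task can only be switched out at an `await`, a read-modify-write with no `await` in the middle cannot be interleaved with another task on the same event loop, so no lock is needed here.

```python
import asyncio

counter = 0

async def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # No await between the read and the write, so no other task can run here.
        current = counter
        counter = current + 1
        await asyncio.sleep(0)  # explicit point where other tasks may run

async def main() -> int:
    global counter
    counter = 0
    await asyncio.gather(*(increment(1000) for _ in range(10)))
    return counter

print(asyncio.run(main()))  # 10000
```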

I understand that the asyncio approach allows processing many more concurrent tasks than the threads approach does. And I suppose the reason is that using OS threads is expensive.
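A rough illustration of the scale argument (10,000 is an arbitrary number chosen for the sketch): each asyncio task is only a suspended coroutine frame plus a `Task` object on one OS thread, so ten thousand tasks that each wait 0.1 s typically finish in a fraction of a second, whereas the same number of OS threads would each need their own OS-managed stack.

```python
import asyncio

async def idle_task(i: int) -> int:
    # No OS thread or OS-managed stack is allocated for this task.
    await asyncio.sleep(0.1)
    return i

async def main(n: int = 10_000) -> int:
    results = await asyncio.gather(*(idle_task(i) for i in range(n)))
    return len(results)

print(asyncio.run(main()))  # 10000
```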

So my question: is the only reason for using asyncio that OS threads are expensive? I can see no other advantages of asyncio over threads.

If there are no other advantages, wouldn’t it be better to decouple Python threads from OS threads? Whenever Python code starts a new thread, no new OS thread would be created, just a Python structure (let’s call it a pythread). The Python interpreter or some standard Python library would manage/schedule these pythreads.
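The “pythread” idea can be sketched as a toy scheduler. This is only an illustration that assumes generators as the lightweight thread object; it is not how asyncio, gevent or any green-thread library is actually implemented. Each pythread is a plain Python object, and a loop written in Python decides which one runs next. The switch points still have to exist somewhere: here they are marked with `yield`, whereas green-thread libraries such as gevent switch implicitly inside their blocking I/O calls.

```python
from collections import deque
from typing import Deque, Generator, List

# A "pythread" here is just a generator; `yield` marks where it gives up control.
PyThread = Generator[None, None, None]

def worker(name: str, steps: int, log: List[str]) -> PyThread:
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point

def run(pythreads: List[PyThread]) -> None:
    # Round-robin scheduler in plain Python: no OS threads are created.
    ready: Deque[PyThread] = deque(pythreads)
    while ready:
        current = ready.popleft()
        try:
            next(current)       # resume until its next yield
        except StopIteration:
            continue            # finished; drop it
        ready.append(current)   # still running; requeue it

log: List[str] = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```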

Sorry if this post is off-topic here. It is more a question than a suggestion. I really hope there are good reasons why the asyncio approach was chosen, and I want to understand them. Before writing here I read lots of articles but found no answers. I posted a similar question to Stack Overflow quite a long time ago, but all I got was several upvotes. Discussions with other developers usually end with me explaining that x = await some_method() does not start parallel processing of the method.

For what it’s worth, these are usually called green threads, and you can find plenty of Python libraries that implement them.

I think the key thing is that, for me personally, you reversed what you’re talking about with those statements. :wink: I find all the locking in threading a pain and reasoning about race conditions a pain, while async code, since it’s inherently single-threaded, is much easier to reason about and work with.

And that’s why I personally find async better: you get concurrency while reasoning in a single-threaded context.


Well, I know. But some very popular frameworks still use asyncio, and I want to understand why.

Ironically, on the very first day I started working on a project that used asyncio, I investigated a deadlock problem. The deadlock happened while acquiring some external resources. So no, it’s not OK to just reason in a single-threaded context.
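Cooperative scheduling does not rule out deadlocks. A minimal sketch, with two `asyncio.Lock` objects standing in for the external resources (an assumption; the post does not say what the resources were): two tasks take the locks in opposite order with an `await` in between, and `asyncio.wait_for` is used only to detect that neither task can make progress.

```python
import asyncio

async def worker(first: asyncio.Lock, second: asyncio.Lock) -> str:
    async with first:
        await asyncio.sleep(0.01)  # yields while still holding `first`
        async with second:         # waits forever if the other task holds `second`
            return "done"

async def main() -> str:
    lock_a, lock_b = asyncio.Lock(), asyncio.Lock()
    try:
        # Opposite acquisition order: the classic lock-ordering deadlock.
        await asyncio.wait_for(
            asyncio.gather(worker(lock_a, lock_b), worker(lock_b, lock_a)),
            timeout=0.5,
        )
        return "finished"
    except asyncio.TimeoutError:
        return "deadlock"

print(asyncio.run(main()))  # deadlock
```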

I have very rarely encountered situations where concurrent tasks fight over some common Python object(s). And as I mentioned, I am totally OK with paying attention and guarding such pieces of code with mutexes. And I do not think the projects I work on are uncommon: a service, a database, processing incoming requests.

There’s no “etc” here :-). There are exactly three syntactic forms that can do an await, and those are await, async for, and async with.
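For reference, a small sketch with all three forms in one place (`connection` and `rows` are hypothetical helpers): `async with` awaits the context manager’s `__aenter__` and `__aexit__`, `async for` awaits `__anext__` on each iteration, and `await` suspends explicitly. Each of them is a point where the current task may be suspended.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List

@asynccontextmanager
async def connection(log: List[str]) -> AsyncIterator[str]:
    await asyncio.sleep(0)      # may suspend on entry
    log.append("open")
    try:
        yield "conn"
    finally:
        log.append("close")     # exit also runs as part of a coroutine

async def rows(n: int) -> AsyncIterator[int]:
    for i in range(n):
        await asyncio.sleep(0)  # may suspend before each item
        yield i

async def main() -> List[str]:
    log: List[str] = []
    async with connection(log) as conn:                 # 1. async with
        async for row in rows(2):                       # 2. async for
            value = await asyncio.sleep(0, result=row)  # 3. plain await
            log.append(f"{conn}:{value}")
    return log

print(asyncio.run(main()))  # ['open', 'conn:0', 'conn:1', 'close']
```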

Anyway, in Python, the three fundamental advantages of async/await over threads are:

  • Cooperative multi-tasking is much lighter-weight than OS threads, so you can reasonably have millions of concurrent tasks, versus maybe a dozen or two threads at best.
  • Using await makes visible where the schedule points are. This has two advantages:
    • It makes it easier to reason about data races
    • A downside of cooperative multi-tasking is that if a task doesn’t yield then it can accidentally block all other tasks from running; if schedule points were invisible it would be more difficult to debug this issue.
  • Tasks can support cancellation. (Which also benefits from making await visible, because it makes it possible to reason about which points are cancellation points.)
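A minimal sketch of the cancellation point made in the last bullet (`job` and its ten-second sleep are illustrative): `Task.cancel()` does not interrupt the task at an arbitrary instruction; it raises `asyncio.CancelledError` inside the task at the `await` where the task is suspended, so cleanup code can be placed around those visible points.

```python
import asyncio
from typing import List

async def job(log: List[str]) -> None:
    log.append("started")
    try:
        await asyncio.sleep(10)  # cancellation can only arrive at an await like this
        log.append("not reached")
    except asyncio.CancelledError:
        log.append("cancelled at await")
        raise                    # re-raise so the task is really marked cancelled
    finally:
        log.append("cleanup")

async def main() -> List[str]:
    log: List[str] = []
    task = asyncio.create_task(job(log))
    await asyncio.sleep(0)       # let the job start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append(f"task.cancelled() = {task.cancelled()}")
    return log

print(asyncio.run(main()))
# ['started', 'cancelled at await', 'cleanup', 'task.cancelled() = True']
```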

You might not find these reasons compelling for your particular situation. That’s fine. You can still use threads if those are more appropriate. If you only want lighter-weight cooperative multitasking and don’t care about the other issues, then you can still use gevent. But those are the reasons that async/await works the way it does.


You mean there is still no async if? Interesting :slight_smile:

But ok, you mentioned three fundamental advantages:

I know very little about green thread implementation(s), so I want to clarify: can more or less the same level of performance be achieved with the green threads approach?

I guess this is the main argument. Sorry, I do not understand how a downside of cooperative multitasking suddenly became an advantage. But I somewhat agree that it can be easier to reason about data races when the points where your code can be interrupted are clearly visible.

Isn’t it possible to cancel tasks with the green threads approach?

Yes.

If you’re doing cooperative multi-tasking (either via async/await, or via green threads), then this downside is something you have to deal with. If on top of that, you decide to use explicit syntax to mark schedule points, then that helps you deal with it.

Yes, though it’s harder to keep track of the points where cancellation can happen.

So the summary.

The major advantage of the asyncio approach over the green threads approach is that with asyncio we have cooperative multitasking in which the places in the code where a task can yield control are clearly marked by await, async with and async for. This makes it easier to reason about data races, a common concurrency problem.

It is very important to note that concurrency problems are not gone completely. You cannot simply ignore other concurrent tasks. With the asyncio approach you need mutexes to guard “critical sections” of your code much less often, but you should understand that every await call breaks your critical section.
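A minimal sketch of that last point (the `balance` example is illustrative): a check-then-update that contains an `await` is no longer atomic, and an `asyncio.Lock` held across the `await` restores the critical section.

```python
import asyncio

balance = 100

async def withdraw_unsafe(amount: int) -> None:
    global balance
    if balance >= amount:       # check...
        await asyncio.sleep(0)  # ...an await here lets another task run...
        balance -= amount       # ...so the check may be stale by now

async def withdraw_locked(amount: int, lock: asyncio.Lock) -> None:
    global balance
    async with lock:            # the lock spans the await, keeping check and update together
        if balance >= amount:
            await asyncio.sleep(0)
            balance -= amount

async def main() -> tuple:
    global balance
    balance = 100
    await asyncio.gather(withdraw_unsafe(80), withdraw_unsafe(80))
    unsafe_result = balance     # both checks passed before either update

    balance = 100
    lock = asyncio.Lock()
    await asyncio.gather(withdraw_locked(80, lock), withdraw_locked(80, lock))
    return unsafe_result, balance

print(asyncio.run(main()))  # (-60, 20)
```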

Now to the disadvantages.

asyncio multitasking is cooperative multitasking.

Now we have two types of functions: usual and async. This is a very big language feature. I would say it is quite a strange feature, because it is almost impossible to use it without a carefully crafted library. (Give an ordinary Python programmer an async function and ask them to execute it without any libraries; I bet most of them would need to read the documentation to do it.)

This feature makes the language more complicated. Many library decorators now have to check what type of function is being decorated: if it is a usual function, do this, but if it is an async function, do something different.
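A sketch of the kind of branching described above (`logged` is a hypothetical decorator, not taken from any particular library). The async branch needs its own `async def` wrapper, because a plain wrapper could not `await` the wrapped function.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable, List

calls: List[str] = []

def logged(func: Callable[..., Any]) -> Callable[..., Any]:
    # Hypothetical decorator: it has to branch on the kind of function it wraps.
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            calls.append(f"async {func.__name__}")
            return await func(*args, **kwargs)  # the wrapper itself must be async to await
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        calls.append(f"sync {func.__name__}")
        return func(*args, **kwargs)
    return sync_wrapper

@logged
def add(a: int, b: int) -> int:
    return a + b

@logged
async def add_later(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a + b

print(add(1, 2), asyncio.run(add_later(3, 4)))  # 3 7
print(calls)  # ['sync add', 'async add_later']
```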

As I mentioned previously, it is not possible to call async functions from usual functions. This looks like a major problem. I’ve seen at least one proposal to implement workarounds for it, but of course any workaround will break the only advantage of the asyncio approach: you will no longer be sure that your code will not switch to another task somewhere between awaits.
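A minimal sketch of where this bites in practice (`fetch_value`, `sync_helper` and `handler` are hypothetical): a sync function can run a coroutine with `asyncio.run()` only when no event loop is running in the current thread. The same helper called from inside async code fails, because it would have to block the very loop it is running on.

```python
import asyncio

async def fetch_value() -> int:
    await asyncio.sleep(0)
    return 42

def sync_helper() -> int:
    # A sync function can only run a coroutine by starting its own event loop.
    coro = fetch_value()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # never started; close it to avoid a "never awaited" warning
        raise

async def handler() -> str:
    try:
        sync_helper()  # called while an event loop is already running
        return "ok"
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"

print(sync_helper())           # 42: fine when no loop is running
print(asyncio.run(handler()))  # RuntimeError: asyncio.run() cannot be called from a running event loop
```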

If a library has an async interface, you simply can’t use it unless your application uses some asyncio framework. At least I do not know any simple way to do it. Should SQLAlchemy provide an async interface? Probably not, because then they would need to provide two interfaces or force people to switch to asyncio frameworks. But what if some database requests are heavy? My coroutine would get stuck waiting for results, blocking all other tasks (because multitasking is cooperative). So I have to run some request handlers in separate threads, and the only potential advantage of the asyncio approach disappears.
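This is the usual workaround, and a minimal sketch shows both sides of it (`heavy_query` is a hypothetical stand-in for a synchronous database call; `asyncio.to_thread` requires Python 3.9 or later). Called directly from a coroutine, the blocking call stalls every other task; handed to `asyncio.to_thread`, it runs in a worker thread while the event loop keeps serving the other tasks.

```python
import asyncio
import time
from typing import List

def heavy_query(log: List[str]) -> int:
    # Hypothetical stand-in for a slow, synchronous database call.
    time.sleep(0.5)
    log.append("query done")
    return 42

async def heartbeat(log: List[str]) -> None:
    # Another task that should keep making progress during the query.
    for _ in range(3):
        log.append("tick")
        await asyncio.sleep(0.05)

async def blocking_handler(log: List[str]) -> int:
    return heavy_query(log)  # blocks the whole event loop for 0.5 s

async def threaded_handler(log: List[str]) -> int:
    return await asyncio.to_thread(heavy_query, log)  # runs in a worker thread

async def run(handler) -> List[str]:
    log: List[str] = []
    await asyncio.gather(handler(log), heartbeat(log))
    return log

print(asyncio.run(run(blocking_handler)))  # ['query done', 'tick', 'tick', 'tick']
print(asyncio.run(run(threaded_handler)))  # ['tick', 'tick', 'tick', 'query done']
```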

And you have to use the await and async keywords all over your code, even when a particular piece of code is completely concurrency-safe.

I can see many problems with the asyncio approach, and when I started this thread I sincerely hoped there were some advantages I did not know about. It looks like there are not: just one advantage, which in my opinion is not worth all the problems. Just curious, does anyone else share this opinion?

Sorry if my post was not very calm, and for my not very smooth English.

Cooperative multitasking always requires rewriting all your libraries in some way, because regular libraries don’t include scheduler yields and aren’t prepared to handle cancellation. (Cancellation is a tricky feature, because it’s very useful, but it only works reliably if every library author is thinking about it all the time.)

Gevent has this problem just as much as async/await libraries like asyncio. The difference is that since gevent doesn’t require different function signatures, they have another option instead of making new libraries: they can try to monkey-patch existing libraries to convert them to cooperative multitasking in-place.

This definitely has some advantages, but also a lot of disadvantages: monkey-patching is inherently fragile, it’s hard to know which libraries will work with monkey-patching and which won’t, you’re using a configuration that the library authors probably haven’t tested, and it’s random luck how well the libraries handle unexpected events like cancellation.

I’m not saying you’re wrong and async/await is always 100% the best. I’m saying the tradeoffs are complicated, and a lot of people have good reasons for deciding that async/await is the best option for their particular situation. Even before asyncio and async/await, the twisted and tornado libraries were very popular, and their APIs were much more difficult to work with than modern asyncio, because they didn’t have any help from the language and had to do everything with callbacks.

If you think that threads or gevent are better for your particular situation, then that’s great; they still exist and lots of people still choose to use them.

This is a minor point, but it’s a common misconception, so it’s worth pointing out: trying to automatically detect whether a function is sync or async is almost always a bad idea, because it’s very difficult to do reliably. Instead it’s almost always better to make the user say explicitly which one they mean, for example by having two versions of a decorator and telling the user to use @mydecorator_sync on sync functions and @mydecorator_async on async functions.
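A minimal sketch of that advice (only the two decorator names come from the post; the call-logging behavior and `fetch_via_factory` are illustrative assumptions). It also shows one reason detection is unreliable: an ordinary function that returns a coroutine is not an `async def`, so `inspect.iscoroutinefunction` reports `False` even though its result must be awaited.

```python
import asyncio
import functools
import inspect
from typing import Any, Awaitable, Callable, List

calls: List[str] = []

def mydecorator_sync(func: Callable[..., Any]) -> Callable[..., Any]:
    # For plain functions only: the result is returned as-is.
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper

def mydecorator_async(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    # For anything returning an awaitable: no detection needed, the user said so.
    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        calls.append(func.__name__)
        return await func(*args, **kwargs)
    return wrapper

async def _fetch(x: int) -> int:
    await asyncio.sleep(0)
    return x + 1

def fetch_via_factory(x: int) -> Awaitable[int]:
    # A plain function that returns a coroutine: detection would misclassify it.
    return _fetch(x)

print(inspect.iscoroutinefunction(fetch_via_factory))  # False, yet it must be awaited
print(asyncio.run(mydecorator_async(fetch_via_factory)(1)))  # 2
```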

This is very interesting! Could you please give an example (maybe simplified) of some API improvement that became possible with the asyncio approach? I sincerely hope this would be interesting and useful not only to me.

It’ll be pretty obvious if you read any tutorials on old-school Twisted or Tornado, because all control flow has to be expressed by chains of callbacks, instead of using Python’s regular control flow constructs.
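A toy contrast in the spirit of those tutorials (it is not taken from Twisted, Tornado or the linked article; `fetch_cb` and `fetch` are hypothetical): two sequential operations written first as a chain of callbacks and then with `await`.

```python
import asyncio
from typing import Callable

def fetch_cb(loop: asyncio.AbstractEventLoop, value: int, callback: Callable[[int], None]) -> None:
    # Hypothetical callback-style API: deliver the result later via `callback`.
    loop.call_later(0.01, callback, value * 10)

def callback_style(loop: asyncio.AbstractEventLoop, done: asyncio.Future) -> None:
    # Control flow is split across nested functions; each step only knows its callback.
    def on_first(a: int) -> None:
        def on_second(b: int) -> None:
            done.set_result(a + b)
        fetch_cb(loop, 2, on_second)
    fetch_cb(loop, 1, on_first)

async def fetch(value: int) -> int:
    # The same operation as a coroutine.
    await asyncio.sleep(0.01)
    return value * 10

async def await_style() -> int:
    # Ordinary sequential control flow; results are plain local variables.
    a = await fetch(1)
    b = await fetch(2)
    return a + b

async def main() -> tuple:
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    callback_style(loop, done)
    return await done, await await_style()

print(asyncio.run(main()))  # (30, 30)
```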

One example is at the beginning of this long post about async API design: https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/

Specifically, compare the traditional asyncio code (“Example 1”) to the async/await-based asyncio code (“Example 3”).

This is a very nice article. I want my code to look like “Example 2”, not like “Example 3” and not at all like “Example 1”. But I do not understand why it is not possible to write code like “Example 2” using the threads approach. The only advantage of the async/await approach that I understand (one can be sure that code is not interrupted between awaits) does not help at all here.

Ideally, (“green”) thread-based code would look almost exactly like “Example 2”, with one major difference: all the async and await keywords are gone. Whenever some blocking operation is executed (such as source_sock.recv or source_sock.sendall), the corresponding Python thread just blocks. It could be the Python executable itself that understands that a Python thread is waiting for some particular I/O operation and schedules the thread when possible; or, if it has to be a library, some good old IPC producer-consumer mechanism could be used. As I understand it, async frameworks already do something like this, and the “IPC mechanism” is the send method.
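The thread-based shape described here can be sketched with the standard library (the `pump` function, the 4096-byte buffer and the `socketpair()` demo wiring are assumptions for illustration, and only one direction of a proxy is shown). Each direction gets a thread that simply blocks in `recv()` and `sendall()`. These are OS threads rather than green threads, but the code has the form described above, with no `async` or `await`.

```python
import socket
import threading

def pump(source_sock: socket.socket, dest_sock: socket.socket) -> None:
    # Plain blocking calls: the thread just sleeps in recv() until data arrives.
    while True:
        data = source_sock.recv(4096)
        if not data:                     # peer closed its sending side
            break
        dest_sock.sendall(data)
    dest_sock.shutdown(socket.SHUT_WR)   # propagate end-of-stream

# Demo wiring with socketpair() to stay self-contained; a real proxy would use TCP sockets.
client, proxy_in = socket.socketpair()
proxy_out, server = socket.socketpair()

t = threading.Thread(target=pump, args=(proxy_in, proxy_out))
t.start()

client.sendall(b"hello")
client.shutdown(socket.SHUT_WR)

received = b""
while chunk := server.recv(4096):
    received += chunk
t.join()
print(received)  # b'hello'
for s in (client, proxy_in, proxy_out, server):
    s.close()
```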

Is the “Example 2” code much better because it uses the async/await approach, or maybe because it uses a better library?

One more minor question. Are you sure the server can’t accept two incoming connections, somewhere between the moment the first connection is received and the moment the await main_task.cancel() in line 13 is actually processed? I have to admit that a little more code would be required to enforce this requirement with the threads approach.

Cool, that was the point of the article, so I’m glad it worked :slight_smile:

You could. In Curio, and in my newer library Trio, all the APIs could work with a green thread system and just deleting all instances of async and await. One of the main reasons I got frustrated with Curio though was that it uses await somewhat idiosyncratically, and it doesn’t always mark schedule/cancel points, and I was really struggling to write correct code without race conditions or starvation problems.

The reason I brought up that article was to point out: there are a lot of people who find the pure (green) threads approach so difficult that back when their only options were (green) threads or “Example 1”-style callback chains, then they chose the callback chains. Twisted and Tornado and asyncio wouldn’t even exist if there weren’t people who wanted this enough to spend huge amounts of energy making it happen. I don’t know what you’re doing; maybe gevent is the best solution for your problems! But it seems unlikely to me that it’s the best solution for everyone’s problems. It’s more likely that they have a different experience or problems than you.

I’m not sure, maybe. It’s not really possible to guarantee that you only accept one connection because of limitations of how TCP stacks work inside operating systems, and it’s not really important to support in real applications or relevant to the concepts I was talking about in the article, so I didn’t spend a lot of time thinking about it.