What are the advantages of asyncio over threads?

akorshkov · August 9, 2019, 1:55pm

The purpose of both async methods and threads is to make it possible to process several tasks concurrently.

The threads approach looks simple and intuitive. If a program (a Python one, for example) processes several tasks concurrently, we have a thread (maybe with sub-threads) for each task, and the stack of each thread reflects the current stage of processing of the corresponding task. It is the operating system that manages the threads and their call stacks.

The asyncio approach is not very different. Instead of actual threads we have coroutines, so it is the coroutine stack that reflects the current stage of processing of the corresponding task. No OS threads are involved, and the call stacks are Python objects managed by Python code (the event loop).

I can see several inconveniences with the async approach. I’ll name only a couple of them. We now have two types of methods: usual methods and async methods. 90% of the time the only difference is that you need to remember that a method is async and not forget to use the await keyword when calling it. And yes, you can’t call an async method from a normal one.
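A minimal sketch of what "forgetting await" looks like in practice. The helper name fetch_value is hypothetical; the point is only that calling an async function without await produces a coroutine object and does not run the body.

```python
import asyncio


async def fetch_value():
    # Hypothetical stand-in for any async operation.
    await asyncio.sleep(0)
    return 42


async def caller():
    forgotten = fetch_value()            # no await: only a coroutine object is created
    is_coroutine = asyncio.iscoroutine(forgotten)
    value = await forgotten              # awaiting it is what actually runs the body
    return is_coroutine, value


result = asyncio.run(caller())
print(result)
```

A regular (non-async) function can make the same call, but it only ever gets the coroutine object back; it has no way to await it without handing it to an event loop.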

I agree that writing multi-threaded code is difficult and error-prone. But using the asyncio approach does not make writing code any easier (at least for me). Well, asyncio multitasking is cooperative multitasking, so I can be sure that my code will not be interrupted between await calls (which, by the way, may be disguised under async for, async with, etc.). But as an application programmer I have never taken advantage of this feature. For me await something always means “the function I am calling is async, so I have to use the await keyword here”. Situations where concurrent tasks conflict over access to some Python resource are very rare (in my experience), and I am totally OK with using mutexes in these situations. Much more often tasks have to synchronize access to some external resource (database records or files), and the asyncio approach does not help in these cases.

I understand that the asyncio approach allows processing many more concurrent tasks than the threads approach does. And I suppose the reason is that using OS threads is expensive.

So my question: is the only reason for using asyncio that os threads are expensive? I can see no other advantages of asyncio over threads.

If there are no other advantages, wouldn’t it be better to decouple Python threads from OS threads? Whenever Python code starts a new thread, no new OS thread would be created, just a Python structure (let’s call it a pythread). The Python interpreter or some standard Python library would manage and schedule these pythreads.

Sorry if this post is off-topic here. It is more a question than a suggestion. I really hope that there are good reasons why the asyncio approach was chosen, and I want to understand these reasons. Before writing here I read lots of articles but found no answers. I posted a similar question on Stack Overflow quite a long time ago, but all I got was several upvotes. Discussions with other developers usually end up with me explaining that x = await some_method() does not start parallel processing of the method.
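To illustrate that last point, here is a small sketch, assuming a hypothetical some_method that just sleeps. Two plain awaits run one after the other; concurrency only appears when the coroutines are scheduled together, for example with asyncio.gather.

```python
import asyncio
import time


async def some_method(delay):
    # Hypothetical stand-in for an I/O-bound operation.
    await asyncio.sleep(delay)
    return delay


async def sequential():
    start = time.perf_counter()
    await some_method(0.1)               # the second call starts only after this finishes
    await some_method(0.1)
    return time.perf_counter() - start


async def concurrent():
    start = time.perf_counter()
    await asyncio.gather(some_method(0.1), some_method(0.1))   # both run concurrently
    return time.perf_counter() - start


sequential_time = asyncio.run(sequential())
concurrent_time = asyncio.run(concurrent())
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```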

ammaraskar · August 9, 2019, 4:21pm

For what it’s worth, these are usually called green threads and you can find plenty of Python libraries that implement them.

brettcannon · August 9, 2019, 5:33pm

I think the key thing is that for me personally, you reversed what you’re talking about with those statements. I find all the locking in threading a pain and reasoning about race conditions a pain, while I find reasoning about async code much easier since it’s inherently single-threaded.

And that’s why I personally find async better: you get concurrency while reasoning in a single threaded context.

akorshkov · August 9, 2019, 5:44pm

Well, I know. But some very popular frameworks still use asyncio, and I want to understand why.

akorshkov · August 9, 2019, 5:58pm

Ironically, on the very first day I started working on a project using asyncio, I investigated a deadlock problem. The deadlock happened while acquiring some external resources. So no, it’s ok to reason in a single threaded context.

Only very rarely have I encountered situations where concurrent tasks fight over some common Python object(s). And as I mentioned, I am totally OK with paying attention and guarding such pieces of code with mutexes. And I do not think the projects I am working on are uncommon: a service, a database, processing of incoming requests.

njs · August 9, 2019, 8:42pm

There’s no “etc” here :-). There are exactly three syntactic forms that can do an await, and those are await, async for, and async with.
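A short sketch that uses all three forms. The Resource class and the ticker generator are made up for illustration; the comments mark where each form hides a schedule point.

```python
import asyncio


class Resource:
    # Hypothetical async context manager.
    async def __aenter__(self):
        await asyncio.sleep(0)           # schedule point hidden inside "async with"
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)
        return False


async def ticker(n):
    # Hypothetical async generator.
    for i in range(n):
        await asyncio.sleep(0)           # schedule point hidden inside "async for"
        yield i


async def main():
    seen = []
    await asyncio.sleep(0)               # form 1: await
    async with Resource():               # form 2: async with
        async for i in ticker(3):        # form 3: async for
            seen.append(i)
    return seen


seen = asyncio.run(main())
print(seen)
```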

Anyway, in Python, the three fundamental advantages of async/await over threads are:

1. Cooperative multi-tasking is much lighter-weight than OS threads, so you can reasonably have millions of concurrent tasks, versus maybe a dozen or two threads at best.
2. Using await makes visible where the schedule points are. This has two advantages:
   - It makes it easier to reason about data races
   - A downside of cooperative multi-tasking is that if a task doesn’t yield then it can accidentally block all other tasks from running; if schedule points were invisible it would be more difficult to debug this issue.
3. Tasks can support cancellation. (Which also benefits from making await visible, because it makes it possible to reason about which points are cancellation points.)
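A rough sketch of two of the points above, using asyncio (one possible async/await library; the worker and sleeper names are hypothetical). It runs ten thousand tasks on a single OS thread, then cancels a task and shows that the cancellation is delivered at the task's await.

```python
import asyncio


async def worker(i):
    await asyncio.sleep(0.01)
    return i


async def sleeper(log):
    try:
        await asyncio.sleep(3600)        # the only cancellation point in this task
    except asyncio.CancelledError:
        log.append("cancelled at the await")
        raise                            # re-raise so the task is marked as cancelled


async def main():
    # Ten thousand concurrent tasks, all on one OS thread.
    results = await asyncio.gather(*(worker(i) for i in range(10_000)))

    log = []
    task = asyncio.create_task(sleeper(log))
    await asyncio.sleep(0)               # let the task start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(results), log


count, log = asyncio.run(main())
print(count, log)
```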

You might not find these reasons compelling for your particular situation. That’s fine. You can still use threads if those are more appropriate. If you only want lighter-weight cooperative multitasking and don’t care about the other issues, then you can still use gevent. But those are the reasons that async/await works the way it does.

akorshkov · August 10, 2019, 9:07am

You mean there is still no async if? Interesting.

But ok, you mentioned three fundamental advantages:

I know very little about green thread implementations, so I want to clarify: can more or less the same level of performance be achieved with the green threads approach?

I guess this is the main argument. Sorry, I do not understand how a downside of cooperative multi-tasking suddenly became an advantage. But I somewhat agree that it can be easier to reason about data races when the points where your code can be interrupted are clearly visible.

Isn’t it possible to cancel tasks with the green-threads approach?

njs · August 11, 2019, 12:30am

Yes.

If you’re doing cooperative multi-tasking (either via async/await, or via green threads), then this downside is something you have to deal with. If on top of that, you decide to use explicit syntax to mark schedule points, then that helps you deal with it.

Yes, though it’s harder to keep track of the points where cancellation can happen.

akorshkov · August 11, 2019, 3:04pm

So, the summary.

The major advantage of the asyncio approach over the green-threads approach is that with asyncio the places in the code where a task can yield control are clearly indicated by await, async with and async for. This makes it easier to reason about the common concurrency problem of data races.

It is very important that concurrency problems are not gone completely. You cannot simply ignore other concurrent tasks. With the asyncio approach it is much less common to need mutexes to guard “critical sections” of your code. But you should understand that every await call breaks your critical section.
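A small sketch of how an await breaks a critical section, assuming a shared dictionary as the "resource" (all names here are hypothetical). The first version yields in the middle of a read-modify-write and loses updates; the second keeps the read-modify-write free of awaits and needs no lock.

```python
import asyncio


async def unsafe_increment(state):
    current = state["counter"]
    await asyncio.sleep(0)               # schedule point inside the read-modify-write
    state["counter"] = current + 1       # may overwrite updates made by other tasks


async def safe_increment(state):
    await asyncio.sleep(0)               # schedule point outside the critical section
    state["counter"] += 1                # no await here, so no other task can interleave


async def run(increment, n=100):
    state = {"counter": 0}
    await asyncio.gather(*(increment(state) for _ in range(n)))
    return state["counter"]


unsafe_total = asyncio.run(run(unsafe_increment))
safe_total = asyncio.run(run(safe_increment))
print(unsafe_total, safe_total)
```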

Now to the disadvantages.

asyncio multitasking is a cooperative multitasking.

Now we have two types of functions: usual and async. This is a very big language feature. I would say it is quite a strange feature, because it is almost impossible to use it without a carefully crafted library. (Give a typical Python programmer an async function and ask them to execute it without any libraries; I bet most would need to read the documentation to do it.)
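For reference, a minimal sketch of executing an async function with no library at all. It only works for coroutines that never actually suspend; run_simple is a hypothetical helper, not a standard API.

```python
async def add(a, b):
    return a + b


def run_simple(coro):
    # Drive the coroutine by hand. The return value arrives in StopIteration.value.
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value
    # Reaching this point means the coroutine suspended and is waiting on something.
    coro.close()
    raise RuntimeError("coroutine suspended; a real event loop is needed")


total = run_simple(add(2, 3))
print(total)
```

Anything that really waits (on a timer, a socket, another task) needs a scheduler to resume it, which is the part libraries such as asyncio provide.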

This feature makes the language more complicated. Many library decorators now have to check what type of function is being decorated: if it is a usual function, do this, but if it is an async function, do something different.

As I mentioned previously, it is not possible to call async functions from usual functions. Looks like this is a major problem. I’ve seen at least one proposal to implement workarounds to this problem, but of course any workaround will break the only advantage of asyncio approach. You will not be sure any more that your code will not switch to another task somewhere between awaits.

If you have a library with an async interface, you simply can’t use it unless your application uses some asyncio framework. At least I do not know any simple way to do it. Should SQLAlchemy provide an async interface? Probably not, because then they would need to provide two interfaces or force people to switch to asyncio frameworks. But what if some database requests are heavy? My coroutine would get stuck waiting for results, blocking all other tasks (because multitasking is cooperative). So I have to run some request handlers in separate threads, and the only potential advantage of the asyncio approach disappears.
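A sketch of the "run it in a separate thread" workaround mentioned here, using asyncio's default thread pool. heavy_query is a hypothetical stand-in for a blocking database call, and heartbeat stands for the other tasks that must keep running.

```python
import asyncio
import time


def heavy_query():
    # Hypothetical blocking call (for example, a synchronous database driver).
    time.sleep(0.2)
    return "rows"


async def heartbeat(ticks):
    # Represents other tasks that should not be starved.
    for _ in range(5):
        await asyncio.sleep(0.02)
        ticks.append(time.perf_counter())


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    rows, _ = await asyncio.gather(
        loop.run_in_executor(None, heavy_query),   # blocking call runs in a worker thread
        heartbeat(ticks),                          # event loop stays free for this task
    )
    return rows, len(ticks)


rows, tick_count = asyncio.run(main())
print(rows, tick_count)
```

Calling heavy_query() directly inside a coroutine would instead block the event loop for the whole 0.2 seconds, and heartbeat would not tick until it returned.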

And you have to use the await and async keywords all over your code, even when a particular piece of code is completely concurrency-safe.

I can see many problems with the asyncio approach, and when I started this thread I sincerely hoped that there were some advantages I did not know about. It looks like there are not. There is one advantage, which in my opinion is not worth all the problems. Just curious, does anyone else share this opinion?

Sorry if my post was not very calm, and for my not very smooth English.

njs · August 11, 2019, 8:19pm

Cooperative multitasking always requires rewriting all your libraries in some way, because regular libraries don’t include scheduler yields and aren’t prepared to handle cancellation. (Cancellation is a tricky feature, because it’s very useful, but it only works reliably if every library author is thinking about it all the time.)

Gevent has this problem just as much as async/await libraries like asyncio. The difference is that since gevent doesn’t require different function signatures, they have another option instead of making new libraries: they can try to monkey-patch existing libraries to convert them to cooperative multitasking in-place.

This definitely has some advantages, but also a lot of disadvantages: monkey-patching is inherently fragile, it’s hard to know which libraries will work with monkey-patching and which won’t, you’re using a configuration that the library authors probably haven’t tested, and it’s random luck how well the libraries handle unexpected events like cancellation.

I’m not saying you’re wrong and async/await is always 100% the best. I’m saying the tradeoffs are complicated, and a lot of people have good reasons for deciding that async/await is the best option for their particular situation. Even before asyncio and async/await, the twisted and tornado libraries were very popular, and their APIs were much more difficult to work with than modern asyncio, because they didn’t have any help from the language and had to do everything with callbacks.

If you think that for your particular situation that threads or gevent are better, then that’s great, they still exist and lots of people still choose to use them.

This is a minor point, but it’s a common misconception so worth pointing out: trying to automatically detect whether a function is sync or async is almost always a bad idea, because it’s very difficult to do reliably. Instead it’s almost always better to make the user say explicitly which one they mean, for example by having two versions of a decorator and telling the user to use @mydecorator_sync on sync functions and @mydecorator_async on async functions.
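A minimal sketch of the two-decorator idea, using the names from the paragraph above. What the decorators do (recording the call in a list) is invented purely for illustration.

```python
import asyncio
import functools

calls = []


def mydecorator_sync(func):
    # For regular functions: the wrapper is a regular function.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


def mydecorator_async(func):
    # For async functions: the wrapper is itself async and awaits the original.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return await func(*args, **kwargs)
    return wrapper


@mydecorator_sync
def double(x):
    return 2 * x


@mydecorator_async
async def triple(x):
    await asyncio.sleep(0)
    return 3 * x


results = (double(2), asyncio.run(triple(2)))
print(results, calls)
```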

akorshkov · August 12, 2019, 7:27am

This is very interesting! Could you please give an example (maybe simplified) of some API improvement that became possible with the asyncio approach? I sincerely hope that this would be interesting and useful not only for me.

njs · August 12, 2019, 7:42am

It’ll be pretty obvious if you read any tutorials on old-school Twisted or Tornado, because all control flow has to be expressed by chains of callbacks, instead of using Python’s regular control flow constructs.

One example is at the beginning of this long post about async API design: https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/

Specifically, compare the traditional asyncio code (“Example 1”) to the async/await-based asyncio code (“Example 3”).
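As a rough illustration of the difference (this is not the code from the linked post), here is the same two-step sequence written once with chained callbacks and once with async/await. The step helper is hypothetical: it returns a Future that resolves a little later with value + 1.

```python
import asyncio


def step(loop, value):
    # Hypothetical async primitive: a Future resolved shortly afterwards.
    fut = loop.create_future()
    loop.call_later(0.01, fut.set_result, value + 1)
    return fut


def callback_style(loop, done):
    # Control flow is spread across nested callbacks.
    def after_first(fut):
        step(loop, fut.result()).add_done_callback(after_second)

    def after_second(fut):
        done.set_result(fut.result())

    step(loop, 0).add_done_callback(after_first)


async def await_style(loop):
    # The same two steps, written as ordinary sequential code.
    first = await step(loop, 0)
    return await step(loop, first)


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    callback_style(loop, done)
    return await done, await await_style(loop)


callback_result, await_result = asyncio.run(main())
print(callback_result, await_result)
```

With callbacks, adding a loop, a try/except, or an early return means restructuring the chain; in the awaiting version these are just the usual Python statements.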

akorshkov · August 13, 2019, 8:18pm

This is a very nice article. I want my code to look like “Example 2”, not like “Example 3” and not at all like “Example 1”. But I do not understand why it is not possible to write code like “Example 2” using the threads approach. The only advantage of the async/await approach that I understand (one can be sure that code is not interrupted between awaits) does not help at all here.

Ideally, (“green”) thread-based code would look almost exactly like “Example 2” with one major difference: all async and await keywords are gone. Whenever some blocking operation is executed (such as source_sock.recv or source_sock.sendall), the corresponding Python thread just blocks. It could be the Python executable itself that understands whether a Python thread is waiting for some particular I/O operation and schedules the thread when possible, or, if it has to be a library, some good old IPC producer-consumer mechanism could be used. As I understand it, async frameworks do something like this already, and the “IPC mechanism” is the send method.

Is the “Example 2” code much better because it uses the async/await approach, or maybe because it uses a better library?

One more minor question. Are you sure the server can’t accept two incoming connections, somewhere between the moment the first connection is received and the moment the await main_task.cancel() in line 13 is actually processed? I have to admit that a little more code would be required to enforce this requirement using the threads approach.

njs · August 13, 2019, 10:19pm

Cool, that was the point of the article, so I’m glad it worked.

You could. In Curio, and in my newer library Trio, all the APIs could work with a green thread system by just deleting all instances of async and await. One of the main reasons I got frustrated with Curio though was that it uses await somewhat idiosyncratically, and it doesn’t always mark schedule/cancel points, and I was really struggling to write correct code without race conditions or starvation problems.

The reason I brought up that article was to point out: there are a lot of people who find the pure (green) threads approach so difficult that back when their only options were (green) threads or “Example 1”-style callback chains, then they chose the callback chains. Twisted and Tornado and asyncio wouldn’t even exist if there weren’t people who wanted this enough to spend huge amounts of energy making it happen. I don’t know what you’re doing; maybe gevent is the best solution for your problems! But it seems unlikely to me that it’s the best solution for everyone’s problems. It’s more likely that they have a different experience or problems than you.

I’m not sure, maybe. It’s not really possible to guarantee that you only accept one connection because of limitations of how TCP stacks work inside operating systems, and it’s not really important to support in real applications or relevant to the concepts I was talking about in the article, so I didn’t spend a lot of time thinking about it.

aeros · September 15, 2019, 12:04am

It’s worth clarifying that await/async expressions are not at all tied directly to the asyncio module. The await/async expressions effectively act as their own separate API. The asyncio module is dependent on the await/async expressions, but not the other way around. As far as I’m aware, this was done intentionally to allow different approaches, such as curio.
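To make that separation concrete, here is a toy round-robin scheduler that drives async functions without importing asyncio at all. It is only an illustration of the language feature, not a description of how curio or any real library is implemented; yield_control and run_all are hypothetical names.

```python
import types
from collections import deque


@types.coroutine
def yield_control():
    # A bare yield in a generator-based coroutine suspends the whole await chain.
    yield


async def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        await yield_control()            # hand control back to the scheduler


def run_all(coros):
    # Toy scheduler: resume each coroutine in turn until all have finished.
    ready = deque(coros)
    while ready:
        coro = ready.popleft()
        try:
            coro.send(None)
        except StopIteration:
            continue                     # this coroutine is done
        ready.append(coro)               # it suspended, so queue it again


log = []
run_all([worker("a", 2, log), worker("b", 2, log)])
print(log)
```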

If anything, I would consider this to be an advantage. This was done very intentionally so that generators not designed asynchronously were not mistakenly used as such. Attempting to utilize multiple points of exit and entry on something that was designed to have a single exit point would almost certainly lead to issues.

I’m not certain that I understand why this is considered a disadvantage. Asynchronous programming can be quite complex, and we fully expect users to read the documentation. Certainly it shouldn’t be more complicated than it has to be. But designing an API without expecting users to read the documentation would lead to severe limitations.

I’ve yet to see an asynchronous implementation where you can interchangeably use subroutines and coroutines without unpredictable behavior, lack of thread safety, or other significant issues. This “only advantage” is quite a strong one.

Also, that being the “only advantage of asyncio” is highly subjective. To many users, asyncio provides a significantly easier to utilize implementation of asynchronous programming compared to other approaches. Especially with the more recently implemented API using asyncio.run().

That seems a bit needlessly dismissive of the massive amount of work that the active developers of asyncio have poured into the module, such as @asvetlov and @yselivanov. I’ve recently worked on asyncio myself, but it doesn’t scratch the surface of the efforts that several others have made.

It’s perfectly okay if asyncio doesn’t suit your preferences or needs, but that does not mean there is only “one advantage” to using an entire module. From my understanding, it seems more so that the other advantages are just not your main priorities. This could be more considerately phrased as something along the lines of “For my purposes, the advantages of asyncio don’t outweigh the disadvantages”.

akorshkov · September 21, 2019, 8:57am

I am an application developer, not core or framework developer. I am using features provided to me by async/await-based framework and frankly speaking quite happy about it.

But sometimes I try to imagine what my application code would be if the framework I am using was based not on async/await, but on threads (or some kind of “green threads”). As far as I can see the application code will remain pretty much the same with several differences:

- I will have to watch out for race conditions more closely. With the async approach I can be sure my code will not be interrupted anywhere between awaits; with threads, execution can switch between concurrent threads at any moment.
- The syntax for spawning a new task will be slightly different.
- All the async/await keywords are gone.
- As a result, there is no restriction that one can’t call an async function from a usual one.

The first item of the list is definitely a disadvantage of the threads approach. But I have not seen a single real-life example where this feature of the async/await approach helped to deal with concurrency-related problems.
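For comparison, a sketch of what the first item means with OS threads: because a switch can happen at any moment, a shared read-modify-write is guarded by a mutex. The add_many helper and the counter are hypothetical.

```python
import threading


def add_many(state, lock, n):
    for _ in range(n):
        with lock:                       # guards the read-modify-write on shared state
            state["counter"] += 1


state = {"counter": 0}
lock = threading.Lock()
threads = [
    threading.Thread(target=add_many, args=(state, lock, 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = state["counter"]
print(total)
```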

The last two items are huge advantages of the threads approach, as far as I can see. Chances are I just can’t see far enough. So I am trying to understand what other advantages the async/await approach has. I do appreciate the huge amount of work invested in async/await functionality, but this is not an argument in the “async vs threads” discussion.

njs · September 21, 2019, 10:17am

This is an interesting bug that caused a bunch of different mysql libraries to return incorrect results when used with gevent: https://github.com/PyMySQL/PyMySQL/issues/275

The root cause was cancellation: in gevent, green threads can be cancelled when they block on network operations, and these libraries weren’t written with that possibility in mind, so it caused corruption of internal state. One query was returning the results of another, etc. So it’s an example of how you can’t just drop in a green threads library and expect existing code to work correctly, and why it’s useful to be able to see cancellation points when reviewing code.
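A simplified analogue of this failure mode, written with asyncio and not taken from gevent or from any of those MySQL libraries. FakeConnection is an invented class that assumes replies come back in the order queries were sent; the first query is cancelled by a timeout while its reply is still in flight, and the next query on the same connection picks up the stale reply.

```python
import asyncio


class FakeConnection:
    """Hypothetical connection: replies arrive in the order queries were sent."""

    def __init__(self):
        self._replies = asyncio.Queue()

    async def query(self, sql, delay):
        # "Send" the query: the fake server answers after `delay` seconds.
        loop = asyncio.get_running_loop()
        loop.call_later(delay, self._replies.put_nowait, f"result of {sql}")
        # Cancellation point: if cancelled here, the reply is still on its way.
        return await self._replies.get()


async def main():
    conn = FakeConnection()
    try:
        await asyncio.wait_for(conn.query("SELECT 1", delay=0.05), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    # The connection is reused without discarding the reply to the cancelled query.
    return await conn.query("SELECT 2", delay=0.1)


answer = asyncio.run(main())
print(answer)
```

The second query returns the first query's result. Seeing the await inside query is what tells a reviewer that the function can be abandoned at exactly that point and that the connection state needs cleaning up.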

aeros · September 21, 2019, 10:25pm

This seems to be a commonly occurring theme when subroutines (designed to have a single point of entry and exit) are attempted to be used as coroutines (designed to have blocking/suspension and cancellation). That’s a large part of why the restriction is in place.

As far as I’m aware, there’s no practical way to safely use a subroutine (such as a standard function or method) as a coroutine (such as an async function or method) without causing significant issues. Subroutines and coroutines have fundamental design differences, and even if async is removed from the declaration, anything that properly supports concurrency should be designed or modified with it being a consideration.

aeros · September 21, 2019, 10:37pm

Also, I’m glad that you’re happy with the features. The questions you’re asking aren’t at all unreasonable. I just wanted to make sure the discussion remained constructive and the amount of work placed into it wasn’t forgotten. It’s easy to forget that there are real people behind it when criticizing a framework (or an entire language in some cases). Apologies if I misunderstood you. (:

akorshkov · September 23, 2019, 5:56pm

Thank you and Nathaniel for your attempts. But I have to confess that I do not understand the arguments in the last three messages. It’s not your fault, it’s my problem. As I mentioned, I am an application developer, and at the moment I do not quite understand the problems core developers and framework developers have to solve. I will just trust you that the async/await approach helps to deal with these more low-level tasks.