What are the advantages of asyncio over threads?

So the summary.

The major advantage of the asyncio approach over the green-threads approach is that asyncio gives us cooperative multitasking in which the places where a task can yield control are clearly marked by await, async with and async for. That makes it easier to reason about the common concurrency problem of data races.

It is very important to note that concurrency problems are not gone completely. You cannot simply ignore other concurrent tasks. With the asyncio approach you need mutexes to guard “critical sections” of your code much less often. But you should understand that every await breaks your critical section.
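
To make the point about await breaking a critical section concrete, here is a minimal sketch. The account dictionary and the withdraw_* function names are made up for illustration; asyncio.sleep(0) stands in for any real await, such as a database call.

```python
import asyncio


async def withdraw_unsafe(account, amount):
    if account["balance"] >= amount:      # check
        await asyncio.sleep(0)            # any await lets other tasks run here
        account["balance"] -= amount      # act on a check that may be stale


async def withdraw_locked(account, amount, lock):
    async with lock:                      # the lock is held across the await
        if account["balance"] >= amount:
            await asyncio.sleep(0)
            account["balance"] -= amount


async def demo():
    unsafe = {"balance": 100}
    await asyncio.gather(
        withdraw_unsafe(unsafe, 100),
        withdraw_unsafe(unsafe, 100),
    )

    guarded = {"balance": 100}
    lock = asyncio.Lock()
    await asyncio.gather(
        withdraw_locked(guarded, 100, lock),
        withdraw_locked(guarded, 100, lock),
    )
    return unsafe["balance"], guarded["balance"]


unsafe_balance, guarded_balance = asyncio.run(demo())
print(unsafe_balance, guarded_balance)
```

In the unguarded version both tasks pass the balance check before either one subtracts, so the account is overdrawn. With asyncio.Lock the second task only performs its check after the first has finished.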

Now to the disadvantages.

asyncio multitasking is cooperative multitasking.

Now we have two types of functions: regular and async. This is a very big language feature. I would say it is quite a strange one, because it is almost impossible to use without a carefully crafted library. (Give an ordinary Python programmer an async function and ask them to execute it without any libraries; I bet most would need to read the documentation to do it.)
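
For reference, this is roughly what executing an async function without any library looks like. It is only a sketch: run_without_event_loop is a hypothetical helper name, and it only handles coroutines that never actually suspend. Anything that really waits for I/O needs a scheduler behind it.

```python
async def add(a, b):
    return a + b


def run_without_event_loop(coro):
    """Drive a coroutine by hand; works only if it never suspends."""
    try:
        coro.send(None)                 # start running the coroutine body
    except StopIteration as stop:
        return stop.value               # the coroutine's return value
    # The coroutine suspended, so something would have to resume it later.
    coro.close()
    raise RuntimeError("coroutine suspended; a real scheduler is needed")


result = run_without_event_loop(add(2, 3))
print(result)
```

The return value travels inside the StopIteration exception, which is the kind of detail most people would have to look up.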

This feature makes the language more complicated. Many library decorators now have to check what type of function is being decorated: if it is a regular function, do this, but if it is an async function, do something different.

As I mentioned previously, it is not possible to call async functions from regular functions. This looks like a major problem. I’ve seen at least one proposal to implement workarounds for this problem, but of course any workaround will break the only advantage of the asyncio approach. You will no longer be sure that your code will not switch to another task somewhere between awaits.

If you have a library with an async interface, you simply can’t use it unless your application uses some asyncio framework. At least I do not know any simple way to do it. Should SQLAlchemy provide an async interface? Probably not, because then they would need to provide two interfaces or force people to switch to asyncio frameworks. But what if some database requests are heavy? My coroutine would be stuck waiting for results, blocking all other tasks (because multitasking is cooperative). So I have to run some request handlers in separate threads, and the only potential advantage of the asyncio approach disappears.

And you have to use the await and async keywords all over your code, even when a particular piece of code is completely concurrency-safe.

I can see many problems with the asyncio approach, and when I started this thread I sincerely hoped there were some advantages I did not know about. It looks like there are not: there is one advantage, which in my opinion is not worth all the problems. Just curious, does anyone else share this opinion?

Sorry if my post was not very calm, and for my not very smooth English.

Cooperative multitasking always requires rewriting all your libraries in some way, because regular libraries don’t include scheduler yields and aren’t prepared to handle cancellation. (Cancellation is a tricky feature, because it’s very useful, but it only works reliably if every library author is thinking about it all the time.)

Gevent has this problem just as much as async/await libraries like asyncio. The difference is that since gevent doesn’t require different function signatures, they have another option instead of making new libraries: they can try to monkey-patch existing libraries to convert them to cooperative multitasking in-place.

This definitely has some advantages, but also a lot of disadvantages: monkey-patching is inherently fragile, it’s hard to know which libraries will work with monkey-patching and which won’t, you’re using a configuration that the library authors probably haven’t tested, and it’s random luck how well the libraries handle unexpected events like cancellation.

I’m not saying you’re wrong and async/await is always 100% the best. I’m saying the tradeoffs are complicated, and a lot of people have good reasons for deciding that async/await is the best option for their particular situation. Even before asyncio and async/await, the twisted and tornado libraries were very popular, and their APIs were much more difficult to work with than modern asyncio, because they didn’t have any help from the language and had to do everything with callbacks.

If you think that threads or gevent are better for your particular situation, then that’s great; they still exist and lots of people still choose to use them.

This is a minor point, but it’s a common misconception so worth pointing out: trying to automatically detect whether a function is sync or async is almost always a bad idea, because it’s very difficult to do reliably. Instead it’s almost always better to make the user say explicitly which one they mean, for example by having two versions of a decorator and telling the user to use @mydecorator_sync on sync functions and @mydecorator_async on async functions.
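
A minimal sketch of the two-decorator idea follows. The names logged_sync and logged_async and the calls list are invented for this example; the only point is that the user picks the variant explicitly and nothing tries to detect the function type.

```python
import asyncio
import functools

calls = []  # names of decorated functions, in the order they were called


def logged_sync(func):
    """Variant for regular functions."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


def logged_async(func):
    """Variant for async functions; the wrapper is itself a coroutine function."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return await func(*args, **kwargs)
    return wrapper


@logged_sync
def double(x):
    return 2 * x


@logged_async
async def triple(x):
    await asyncio.sleep(0)
    return 3 * x


sync_result = double(2)
async_result = asyncio.run(triple(2))
print(sync_result, async_result, calls)
```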

This is very interesting! Could you please give an example (it may be simplified) of some API improvement that became possible with the asyncio approach? I sincerely hope that this would be interesting and useful not only for me.

It’ll be pretty obvious if you read any tutorials on old-school Twisted or Tornado, because all control flow has to be expressed by chains of callbacks, instead of using Python’s regular control flow constructs.

One example is at the beginning of this long post about async API design: https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/

Specifically, compare the traditional asyncio code (“Example 1”) to the async/await-based asyncio code (“Example 3”).

This is a very nice article. I want my code to look like “Example 2”, not like “Example 3” and not at all like “Example 1”. But I do not understand why it is not possible to write code like “Example 2” using the threads approach. The only advantage of the async/await approach that I understand (one can be sure that code is not interrupted between awaits) does not help at all here.

Ideally, (“green”) thread-based code would look almost exactly like “Example 2” with one major difference: all async and await keywords are gone. Whenever some blocking operation is executed (such as source_sock.recv or source_sock.sendall), the corresponding Python thread just blocks. It could be the Python executable itself that understands when a Python thread is waiting for some particular IO operation and schedules the thread when possible, or, if it has to be a library, some good old IPC producer-consumer mechanism could be used. As I understand it, async frameworks do something like this already; the “IPC mechanism” is the send method.

Is the “Example 2” code much better because it uses the async/await approach, or maybe because it uses a better library?

One more minor question. Are you sure the server can’t accept two incoming connections, somewhere between the moment the first connection is received and the moment the await main_task.cancel() in line 13 is actually processed? I have to admit that a little more code would be required to enforce this requirement using the threads approach.

Cool, that was the point of the article, so I’m glad it worked :)

You could. In Curio, and in my newer library Trio, all the APIs could work with a green thread system and just deleting all instances of async and await. One of the main reasons I got frustrated with Curio though was that it uses await somewhat idiosyncratically, and it doesn’t always mark schedule/cancel points, and I was really struggling to write correct code without race conditions or starvation problems.

The reason I brought up that article was to point out: there are a lot of people who find the pure (green) threads approach so difficult that back when their only options were (green) threads or “Example 1”-style callback chains, then they chose the callback chains. Twisted and Tornado and asyncio wouldn’t even exist if there weren’t people who wanted this enough to spend huge amounts of energy making it happen. I don’t know what you’re doing; maybe gevent is the best solution for your problems! But it seems unlikely to me that it’s the best solution for everyone’s problems. It’s more likely that they have a different experience or problems than you.

I’m not sure, maybe. It’s not really possible to guarantee that you only accept one connection because of limitations of how TCP stacks work inside operating systems, and it’s not really important to support in real applications or relevant to the concepts I was talking about in the article, so I didn’t spend a lot of time thinking about it.

It’s worth clarifying that await/async expressions are not at all tied directly to the asyncio module. The await/async expressions effectively act as their own separate API. The asyncio module is dependent on the await/async expressions, but not the other way around. As far as I’m aware, this was done intentionally to allow different approaches, such as curio.

If anything, I would consider this to be an advantage. This was done very intentionally so that generators not designed asynchronously were not mistakenly used as such. Attempting to utilize multiple points of exit and entry on something that was designed to have a single exit point would almost certainly lead to issues.

I’m not certain that I understand why this is considered a disadvantage. Asynchronous programming can be quite complex, and we fully expect users to read the documentation. Certainly it shouldn’t be more complicated than it has to be. But designing an API without expecting users to read the documentation would lead to severe limitations.

I’ve yet to see an asynchronous implementation where you can interchangeably use subroutines and coroutines without unpredictable behavior, lack of thread safety, or other significant issues. This “only advantage” is quite a strong one.

Also, that being the “only advantage of asyncio” is highly subjective. To many users, asyncio provides a significantly easier to utilize implementation of asynchronous programming compared to other approaches. Especially with the more recently implemented API using asyncio.run().
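
For readers who have not used it, here is a minimal sketch of asyncio.run() as the bridge from ordinary code into async code. fetch_greeting is a made-up coroutine and asyncio.sleep stands in for real I/O.

```python
import asyncio


async def fetch_greeting(name):
    await asyncio.sleep(0.01)          # stands in for a network round trip
    return f"hello, {name}"


def main():
    # A regular function: asyncio.run() creates an event loop, runs the
    # coroutine to completion, closes the loop, and returns the result.
    return asyncio.run(fetch_greeting("world"))


greeting = main()
print(greeting)
```

Note that asyncio.run() cannot be called while another event loop is already running in the same thread, so it works as an entry point rather than as a general way to call async code from anywhere.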

That seems a bit needlessly dismissive of the massive amount of work that the active developers of asyncio have poured into the module, such as @asvetlov and @yselivanov. I’ve recently worked on asyncio myself, but it doesn’t scratch the surface of the efforts that several others have made.

It’s perfectly okay if asyncio doesn’t suit your preferences or needs, but that does not mean there is only “one advantage” to using an entire module. From my understanding, it seems more so that the other advantages are just not your main priorities. This could be more considerately phrased as something along the lines of “For my purposes, the advantages of asyncio don’t outweigh the disadvantages”.

I am an application developer, not a core or framework developer. I am using features provided to me by an async/await-based framework and, frankly speaking, I am quite happy about it.

But sometimes I try to imagine what my application code would look like if the framework I am using were based not on async/await, but on threads (or some kind of “green threads”). As far as I can see, the application code would remain pretty much the same, with several differences:

  • I will have to watch out for race conditions more closely. With the async approach I can be sure my code will not be interrupted anywhere between awaits, whereas with threads execution can switch between concurrent threads at any moment
  • the syntax for spawning a new task will be slightly different
  • all the async/await keywords are gone
  • as a result, there is no restriction that one can’t call an async function from a regular one

The first item of the list is definitely a disadvantage of the threads approach. But I have not seen a single real-life example where this feature of the async/await approach helped to deal with concurrency-related problems.

The last two items are a huge advantage of the threads approach, as far as I can see. Chances are I just can’t see far enough. So I am trying to understand what other advantages the async/await approach has. I do appreciate the huge amount of work invested in async/await functionality, but this is not an argument in the “async vs threads” discussion.

This is an interesting bug that caused a bunch of different mysql libraries to return incorrect results when used with gevent: https://github.com/PyMySQL/PyMySQL/issues/275

The root cause was cancellation: in gevent, green threads can be cancelled when they block on network operations, and these libraries weren’t written with that possibility in mind, so it caused corruption of internal state. One query was returning the results of another, etc. So it’s an example of how you can’t just drop in a green threads library and expect existing code to work correctly, and why it’s useful to be able to see cancellation points when reviewing code.
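
As an illustration of what “seeing cancellation points” means in async/await code (this is not the PyMySQL code, just a sketch with made-up names): cancellation is delivered at an await, so a reviewer can tell where the function may be interrupted and where cleanup is needed.

```python
import asyncio

events = []  # records what the worker did, in order


async def worker():
    events.append("started")
    try:
        await asyncio.sleep(10)        # cancellation is delivered at this await
        events.append("finished")      # not reached in this run
    except asyncio.CancelledError:
        events.append("cancelled at await")
        raise                          # re-raise so the task counts as cancelled
    finally:
        events.append("cleanup")       # e.g. reset connection state here


async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)             # let the worker run up to its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
print(was_cancelled, events)
```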

This seems to be a recurring theme when people try to use subroutines (designed to have a single point of entry and exit) as coroutines (designed to support blocking/suspension and cancellation). That’s a large part of why the restriction is in place.

As far as I’m aware, there’s no practical way to safely use a subroutine (such as a standard function or method) as a coroutine (such as an async function or method) without causing significant issues. Subroutines and coroutines have fundamental design differences, and even if async is removed from the declaration, anything that properly supports concurrency should be designed or modified with it being a consideration.

Also, I’m glad that you’re happy with the features. The questions you’re asking aren’t at all unreasonable. I just wanted to make sure the discussion remained constructive and the amount of work placed into it wasn’t forgotten. It’s easy to forget that there are real people behind it when criticizing a framework (or an entire language in some cases). Apologies if I misunderstood you. (:

Thank you and Nathaniel for your attempts. But I have to confess that I do not understand the arguments in the last three messages. It’s not your fault, it’s my problem. As I mentioned, I am an application developer and at the moment I do not quite understand the problems core developers and framework developers have to solve. I will just trust you that the async/await approach helps to deal with these more low-level tasks.

Well, asyncio was developed to get rid of the GIL, as far as I know. I mean, I don’t know how green threads work, but threads suffer from the GIL if there’s IO.

I usually try to use multiprocessing. It’s simple to use and there’s no GIL problem. I also want to investigate a library, ray, that promises a faster implementation of multiprocessing and easy support for remote machines. The “only” problem of multiprocessing is that all objects must be picklable.

I developed with asyncio for 2 years, and I must say it’s really interesting… the problem is that it breaks encapsulation. I mean, an asynchronous function must have the async keyword in its signature. This is very problematic, because if you change your mind (and this happened to me very often) and you need a “normal” function instead, you have to change the signature and the code of that function and of all the functions that call it.

Furthermore, I’ve not investigated it very well, but it seems that you can’t mix “old style code” with asyncio. Either your .py uses asyncio for everything, or you can’t use it. Maybe it’s just that I do not know asyncio very well and I missed the latest improvements.

I only partially agree with you, Marco Sulla. I’ll try to explain now. If there are some inaccuracies in my explanations, I encourage others to correct me.

Asyncio does not solve the GIL problem and it was not designed to solve it. asyncio is good in cases where your application needs to process many tasks concurrently, but each of these tasks does not require much computation from your application. That is, processing each task may take a long time, but most of that time your application is just waiting for external parties, e.g. IO operations or responses from other applications. Your application starts processing a task, makes some external request, and instead of just waiting for the response it can (partially) process other tasks in the meantime.

To achieve such concurrency your application has to do some bookkeeping: it should remember the tasks it is processing and the stage of processing of each task. One (but not the only) approach is to start a new thread for each task. All the bookkeeping comes almost automatically: the point of execution of each thread corresponds to the stage of processing of the corresponding task. When you write such code you “only” have to think about concurrency where the code of different threads can potentially use common resources (either “internal” Python structures or external ones, such as a database). I enclosed the word “only” in quotes because it’s not at all easy.

Because of the GIL only one thread is executed at a given moment of time. But the purpose of using threads in this case is not to make calculations simultaneously in several threads, but to organize the bookkeeping of the tasks. Whenever the thread blocks waiting for some IO the operating system can switch your application to another thread.

The problem with the threads approach is that threads are expensive for the operating system. Your application can’t create too many threads.

The asyncio approach is quite similar to threads, but it does not actually use threads provided by the operating system. Instead there are coroutines: pure Python structures representing the same thing as a thread, namely some partially executed code whose execution can be resumed. The scheduling in this case is done not by the operating system, but by the framework your application is using.

asyncio does not solve the GIL problem. There is still no more than one task being processed by your application at any given moment. Other tasks may be in progress at that moment, but not inside your application: your application is waiting for their results.
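
A small sketch of this behaviour, with made-up task names and asyncio.sleep standing in for waiting on an external party: both tasks are in flight at once, yet everything runs on a single thread, and the shorter wait finishes first.

```python
import asyncio
import threading

log = []            # order in which tasks start and finish
thread_ids = set()  # which OS threads executed task code


async def handle(name, delay):
    thread_ids.add(threading.get_ident())
    log.append(f"{name} start")
    await asyncio.sleep(delay)         # waiting; the loop runs other tasks
    thread_ids.add(threading.get_ident())
    log.append(f"{name} done")
    return name


async def main():
    # gather() returns results in argument order, not completion order
    return await asyncio.gather(handle("slow", 0.1), handle("fast", 0.01))


results = asyncio.run(main())
print(results, log, len(thread_ids))
```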

The price of using coroutines instead of threads is all the inconveniences you mentioned.

I think you have the right general idea, but I want to clarify a few points.

  1. Coroutines within Python are not specifically associated with asyncio. With how PEP 492 [1] was implemented (and the legacy generator-based coroutines implemented with PEP 342 [2]), any library or framework can make use of them, as well as the associated async/await syntax.

The main purpose of asyncio is to provide a high-level API for implementing IO-bound concurrency through asynchronous programming. This often comes in the form of coroutines or other objects that use them, but my point is that coroutines are not dependent on asyncio.

  2. While they can be used for a similar purpose, I wouldn’t say that coroutines necessarily “represent the same thing as a thread”. OS threads have their own individual program counters and separate stacks from one another; this is not true for coroutines.

A clearer way to describe coroutines at a high level is that they’re essentially objects that represent the state of a function/method (subroutine) and can be suspended and resumed (through usage of await) at multiple points. This is unlike a subroutine [3], which has only one point of entry and exit.
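
To show the “suspended and resumed at multiple points” part without any framework, here is a sketch that drives a coroutine by hand. suspend and conversation are made-up names; types.coroutine is used only to get a bare suspension point that an event loop would normally provide.

```python
import types


@types.coroutine
def suspend(label):
    # Hand control back to whoever is driving the coroutine; the value
    # they send in on resumption becomes the result of the await.
    received = yield label
    return received


async def conversation():
    first = await suspend("point A")   # first suspension point
    second = await suspend("point B")  # second suspension point
    return first + second


coro = conversation()
trace = [coro.send(None)]              # run until the first suspension
trace.append(coro.send(10))            # resume with 10, run to the second
try:
    coro.send(32)                      # resume with 32, run to completion
except StopIteration as stop:
    result = stop.value

print(trace, result)
```

An event loop does essentially this driving for you, deciding when each suspended coroutine gets resumed.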

Also, OS threads do still have a use case within asyncio. Specifically, if you want to run an IO-bound subroutine without blocking the event loop, it can be run within the event loop’s ThreadPoolExecutor (from concurrent.futures) through loop.run_in_executor() [4]. This is especially useful when implementing concurrency for existing code or libraries that were not implemented with async in mind.
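
A minimal sketch of that usage follows. blocking_read is a made-up stand-in for existing blocking code, and the heartbeat task is only there to show that the event loop keeps running while the blocking call is in progress.

```python
import asyncio
import time


def blocking_read(delay):
    time.sleep(delay)                  # stands in for blocking I/O
    return "payload"


async def main():
    loop = asyncio.get_running_loop()
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    # None selects the loop's default executor (a ThreadPoolExecutor)
    data = await loop.run_in_executor(None, blocking_read, 0.1)
    beat.cancel()
    return data, ticks


data, ticks = asyncio.run(main())
print(data, ticks)
```

If blocking_read were called directly inside main(), the heartbeat would not tick at all until it returned.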

Not only does asyncio not solve the GIL problem, the GIL is also not a significant factor when dealing with IO-bound tasks. The GIL only becomes significant when implementing concurrency for CPU-bound tasks, which is not the primary focus of asyncio.

For CPU-bound concurrency in Python, we have subprocesses. Process pools can be used in asyncio via loop.run_in_executor(), by passing an instance of concurrent.futures.ProcessPoolExecutor to the executor parameter (instead of using the default one, which is ThreadPoolExecutor).

Note: We’re currently planning on improving the API for using pools in asyncio in Python 3.9. The goal is to provide a more intuitive and user friendly way of using thread pools and process pools, instead of using loop.run_in_executor(). I’m currently in the early stages of implementing an asyncio.ThreadPool().

[1] https://www.python.org/dev/peps/pep-0492/

[2] https://www.python.org/dev/peps/pep-0342/

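[3] A generator also has more than one point of entry/exit and can suspend via yield. A plain generator used only for iteration does not receive values or exceptions when it is resumed; PEP 342 added send() and throw() for that purpose, which is what the legacy generator-based coroutines relied on.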

[4] https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor

FWIW I’ve run python programs with 1000s of threads. They use some GB of memory but work fine. I haven’t benchmarked threads against async though. I like threads and have avoided lock hazards by having them never share mutable data, but only communicate through queues, like Erlang does with mailboxes.
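
A rough sketch of that style, assuming a simple worker that squares numbers (the worker function and the None sentinel convention are choices made for this example): the threads share no mutable data and only exchange messages through queue.Queue.

```python
import queue
import threading


def worker(inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:               # sentinel: no more work for this thread
            break
        outbox.put(item * item)


inbox, outbox = queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=worker, args=(inbox, outbox)) for _ in range(4)
]
for t in threads:
    t.start()

for n in range(10):
    inbox.put(n)
for _ in threads:
    inbox.put(None)                    # one sentinel per worker
for t in threads:
    t.join()

# Completion order is not deterministic, so sort before comparing.
results = sorted(outbox.get() for _ in range(10))
print(results)
```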

Erlang and GHC use green threads that are transparent to the user, so you get the advantages of lightweight concurrency and the illusion of single path blocking i/o. It would be great to have Python work that way but adapting CPython to that model doesn’t sound practical off the top of my head. It could be a new Python implementation running on the Erlang BEAM, sort of like Elixir is a Ruby dialect running on BEAM. Maybe PyPy could also do something like that. I once imagined Python 4 could work this way, but it doesn’t seem like a realistic hope.

Kyle Stanley wrote:
“if you want to run an IO-bound subroutine without blocking the event loop, it can be run within the event loop’s ThreadPoolExecutor (from concurrent.futures) through loop.run_in_executor() [4]. This is especially useful when implementing concurrency for existing code or libraries that were not implemented with async in mind”

Hi, I’d like a clarification on this. If I had to implement async work that is mostly I/O-bound, and I used loop.run_in_executor() on some existing code instead of rewriting it with proper async methods, because of lack of time or out of laziness, what would be the cost of this?

I mean, apart from the fact that threads would be more costly, what would be the drawback?

In other words, should we always recommend writing async code instead of using loop.run_in_executor() when it is possible, and why? What arguments should we, as architects, give a developer to make them understand the benefit or necessity of this, when it is possible to rewrite existing code with the async/await paradigm?

Thanks.

I would not necessarily recommend rewriting as async/await instead of using run_in_executor() in all situations. For example, if you have a perfectly working program with threads and, based on its use case, don’t anticipate that a significant number of concurrent workers (100s to 1000s+) will be needed in the future, sticking with the current approach instead of rewriting to async/await is a perfectly viable option.

However, if it is reasonable to expect that the number of concurrent workers in the program will eventually scale to the 1000s+, you will benefit from using coroutines over threads through significantly lower memory overhead and the faster context switching of coroutines (switching between threads has the overhead of interfacing with the OS scheduler, unlike coroutines).

It’s important to keep in mind, though, that going with async/await from the start will result in lower long-term maintenance if you expect the number of concurrent workers to keep growing, compared with starting with threads and switching to async/await once threads become unreasonable. IMO, it’s much better done as a gradual process early on, rather than as a last-minute decision when you start to reach bottlenecks.

Also, it can be beneficial to have more explicit control over exactly when in the program flow the context switch happens with async/await, instead of threads, where the context switching occurs largely outside of your control. You can use sys.setswitchinterval() to configure the duration between thread switches within the CPython interpreter, but not where the switch happens. So, proper handling of resource contention can be more complicated when working with threads. That said, the explicitness can be a drawback in some simple programs where resource contention isn’t an issue, and you can simply allow the thread context switching to occur without much thought.

@yselivanov, @asvetlov, and @njs might have more to add about the potential architectural pros/cons of using coroutines vs threads.

PS: In Python 3.9+, I fairly recently added asyncio.to_thread(), which is a bit simpler to work with than loop.run_in_executor() for working with threads in asyncio.
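
A short sketch of asyncio.to_thread() on Python 3.9+, with a made-up blocking_lookup function standing in for existing synchronous code. It also records thread ids to show that the blocking calls do not run on the event loop’s thread.

```python
import asyncio
import threading
import time


def blocking_lookup(key):
    time.sleep(0.05)                   # stands in for a blocking library call
    return key.upper(), threading.get_ident()


async def main():
    loop_thread = threading.get_ident()
    # Each call runs in a worker thread; the event loop stays responsive.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_lookup, "a"),
        asyncio.to_thread(blocking_lookup, "b"),
    )
    values = [value for value, _ in results]
    ran_off_loop = all(tid != loop_thread for _, tid in results)
    return values, ran_off_loop


values, ran_off_loop = asyncio.run(main())
print(values, ran_off_loop)
```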

Thanks for the reply, @aeros. This is the kind of answer I was expecting. Also, couldn’t we say that with a thread pool, the number of concurrent operations would be limited by the number of threads available in the pool, with some incoming operations waiting for a thread to be freed, while with the loop, any incoming request would be scheduled immediately?

I mean, the coroutine wrapping the thread would be scheduled on the loop immediately, but the operation wouldn’t actually start until a thread is free to process it. So an I/O-bound operation would still need to wait for one of the I/O operations already in progress to finish before actually starting, for instance by opening a socket and initiating a request.

Yep, that’s definitely a factor to consider as well. If, for example, you have a thread pool with a set maximum of 100 threads that is continuously near the peak, you’ll experience a delay in the I/O-bound operation starting until a thread is free (within both ThreadPoolExecutor and ProcessPoolExecutor, this is implemented via a semaphore that starts at 0 and increments when there is a free thread/process; if the workers are below the maximum, it creates a new one, but otherwise the submitted work waits until there is a free thread).

In the case of using an event loop within a single thread, there is no such limitation, because you can have a nearly unlimited number of coroutines compared to threads, and they use no resources other than memory at the OS level (IOW there’s no OS limit on coroutines, since they exist purely as Python objects).
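
The difference can be sketched by counting how many jobs are in flight at the same time. PeakCounter and the job functions are made up for this example, and the pool size of 2 is deliberately tiny so the cap is easy to see.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Tracks how many jobs are in flight and the highest value seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self):
        with self._lock:
            self.current -= 1


def blocking_job(counter):
    counter.enter()
    time.sleep(0.02)                   # stands in for blocking I/O
    counter.leave()


async def async_job(counter):
    counter.enter()
    await asyncio.sleep(0.02)          # stands in for non-blocking I/O
    counter.leave()


def peak_with_pool(jobs, max_workers):
    counter = PeakCounter()
    # Leaving the with-block waits for all submitted jobs to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(jobs):
            pool.submit(blocking_job, counter)
    return counter.peak


async def peak_with_coroutines(jobs):
    counter = PeakCounter()
    await asyncio.gather(*(async_job(counter) for _ in range(jobs)))
    return counter.peak


pool_peak = peak_with_pool(jobs=6, max_workers=2)
coro_peak = asyncio.run(peak_with_coroutines(jobs=6))
print(pool_peak, coro_peak)
```

With the pool, no more than two jobs are ever in progress and the rest wait for a free thread, while all six coroutines start their wait before any of them finishes.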