What are the advantages of asyncio over threads?

Marco_Sulla · December 16, 2019, 2:51am

Well, asyncio was developed to get rid of the GIL, as far as I know. I mean, I don’t know how green threads work, but threads suffer from the GIL if there’s I/O.

I usually try to use multiprocessing. It’s simple to use and there’s no GIL problem. I also want to investigate a library, ray, that promises a faster implementation of multiprocessing and easy support for remote machines. The “only” problem with multiprocessing is that all objects must be picklable.

I developed with asyncio for 2 years, and I must say it’s really interesting… the problem is that it breaks encapsulation. I mean, an asynchronous function must have the async keyword in its signature. This is very problematic, because if you change your mind (and this happened to me very often) and you need a “normal” function instead, you have to change the signature and the code of that function and of all the functions that call it.

Furthermore, I haven’t investigated it very well, but it seems that you can’t mix “old style” code with asyncio: either your .py uses asyncio for everything, or you can’t use it at all. Maybe it’s just that I don’t know asyncio very well and I missed the latest improvements.
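To make the “async spreads to every caller” problem concrete, here is a minimal sketch (get_config, load_settings and load_settings_sync are hypothetical names). Calling an async def function from plain code only creates a coroutine object without running it, so either every caller becomes async too, or a synchronous entry point bridges the gap with asyncio.run(), which only works when no event loop is already running in that thread.

```python
import asyncio


async def get_config():
    # Hypothetical async helper: because it awaits, its callers must await it too.
    await asyncio.sleep(0)
    return {"debug": True}


async def load_settings():
    config = await get_config()  # fine: this caller is itself async
    return config["debug"]


def load_settings_sync():
    # A plain function can't use `await`; calling get_config() here only
    # creates a coroutine object, nothing inside it has run yet.
    pending = get_config()
    print(type(pending).__name__)  # coroutine
    pending.close()  # discard it so Python doesn't warn it was never awaited
    # The usual bridge from synchronous code: start an event loop and run to completion.
    return asyncio.run(load_settings())


print(load_settings_sync())  # True
```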

akorshkov · December 16, 2019, 5:47pm

I only partially agree with you, Marco Sulla. I’ll try to explain why. If there are any inaccuracies in my explanation, I encourage others to correct me.

Asyncio does not solve the GIL problem, and it was not designed to solve it. asyncio is good when your application needs to process many tasks concurrently, but each of these tasks does not require much computation from your application. That is, processing each task may take a long time, but for most of that time your application is just waiting for external parties, e.g. I/O operations or responses from other applications. Your application starts processing a task, makes some external request, and instead of just waiting for the response it can (partially) process other tasks in the meantime.
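A minimal sketch of that idea, assuming asyncio.sleep() as a stand-in for waiting on an external party (fake_request is a hypothetical name): three “requests” that each wait 0.1 s finish in roughly 0.1 s total rather than 0.3 s, because the event loop runs the other tasks while one is waiting.

```python
import asyncio
import time


async def fake_request(name, delay):
    # asyncio.sleep stands in for waiting on a network reply; while this task
    # waits, the event loop is free to run the other tasks.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    names = await asyncio.gather(
        fake_request("a", 0.1),
        fake_request("b", 0.1),
        fake_request("c", 0.1),
    )
    return names, time.perf_counter() - start


names, elapsed = asyncio.run(main())
# The total is close to the longest single wait, not the sum of all three.
print(names, f"{elapsed:.2f}s")
```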

In order to achieve such concurrency, your application has to do some bookkeeping: it must remember the tasks it’s processing and the stage of processing of each task. One (but not the only) approach is to start a new thread for each task. The bookkeeping then comes almost automatically: the point of execution of each thread corresponds to the stage of processing of the corresponding task. When you write such code you “only” have to think about concurrency where code in different threads can use common resources (either “internal” Python structures or external ones, such as a database). I put the word “only” in quotes because it’s not at all easy.

Because of the GIL, only one thread executes Python code at any given moment. But the purpose of using threads in this case is not to run calculations simultaneously in several threads, but to organize the bookkeeping of the tasks. Whenever a thread blocks waiting for I/O, it releases the GIL, and the operating system can switch to another thread of your application.
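A minimal sketch of the thread-per-task approach described in the last two paragraphs, assuming time.sleep() as a stand-in for blocking I/O (handle_task and stats are hypothetical names). Five tasks that each block for 0.1 s finish in roughly 0.1 s, because a blocked thread doesn’t hold the GIL; the shared counter is a read-modify-write, so it is guarded with a lock.

```python
import threading
import time

lock = threading.Lock()
stats = {"done": 0}  # shared resource touched by every thread


def handle_task(task_id):
    # time.sleep stands in for blocking I/O; CPython releases the GIL while a
    # thread is blocked here, so the other threads can run in the meantime.
    time.sleep(0.1)
    # `+=` is a read-modify-write, so without the lock two threads could
    # read the same old value and one update would be lost.
    with lock:
        stats["done"] += 1


start = time.perf_counter()
threads = [threading.Thread(target=handle_task, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(stats["done"], f"{elapsed:.2f}s")  # about 0.1s, not 0.5s
```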

The problem with the threads approach is that threads are expensive for the operating system, so your application can’t create too many of them.

The asyncio approach is quite similar to threads, but it does not actually use threads provided by the operating system. Instead there are coroutines: purely Python structures representing the same thing as a thread, i.e. some partially executed code whose execution can be resumed. The scheduling in this case is done not by the operating system, but by the framework your application is using.
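A minimal sketch of that framework-side scheduling, assuming the default asyncio event loop (task and log are hypothetical names): each await is a point where a coroutine hands control back to the loop, and the loop resumes whichever task is ready next, so two tasks interleave step by step inside a single OS thread.

```python
import asyncio

log = []


async def task(name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")
        # Each await hands control back to the event loop, which then resumes
        # the next ready task; nothing else can interrupt between awaits.
        await asyncio.sleep(0)


async def main():
    await asyncio.gather(task("A", 3), task("B", 3))


asyncio.run(main())
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```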

asyncio does not solve the GIL problem either. Your application still processes no more than one task at any given moment. Other tasks may be progressing at that moment, but elsewhere (e.g. in a database or on the network), not in your application, which is just waiting for their results.

The price of using coroutines instead of threads is all the inconveniences you mentioned.

aeros · December 17, 2019, 9:21pm

I think you have the right general idea, but I want to clarify a few points.

Coroutines in Python are not specifically tied to asyncio. Given how PEP 492 [1] was implemented (and the legacy generator-based coroutines from PEP 342 [2]), any library or framework can make use of them, as well as the associated async/await syntax.

The main purpose of asyncio is to provide a high-level API for implementing IO-bound concurrency through asynchronous programming. This often comes in the form of coroutines or other objects that use them, but my point is that coroutines are not dependent on asyncio.

While they can be used for a similar purpose, I wouldn’t say that coroutines necessarily “represent the same thing as a thread”. Each OS thread has its own program counter and its own stack, separate from the other threads; this is not true for coroutines.

A clearer way to describe coroutines at a high level is that they’re essentially objects that represent the state of a function/method (subroutine) and can be suspended and resumed (through the use of await) at multiple points. This is unlike a subroutine [3], which has only one point of entry and exit.
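To illustrate both points (coroutines aren’t tied to asyncio, and they can be suspended and resumed at several points), here is a minimal sketch that drives a native coroutine by hand with send(), with no event loop at all. The suspend(), worker() and drive() names are hypothetical; types.coroutine and inspect.getcoroutinestate() are standard library.

```python
import inspect
import types


@types.coroutine
def suspend(tag):
    # A generator-based awaitable: `yield` hands `tag` to whoever is driving
    # the coroutine and suspends it until the driver calls send() again.
    received = yield tag
    return received


async def worker():
    a = await suspend("first pause")   # suspension point 1
    b = await suspend("second pause")  # suspension point 2
    return a + b


def drive(coro, replies):
    """Drive a coroutine manually via send(); no asyncio involved."""
    states = [inspect.getcoroutinestate(coro)]  # CORO_CREATED
    yielded = []
    for value in [None, *replies]:  # the first send() must be None
        try:
            yielded.append(coro.send(value))
        except StopIteration as stop:  # the coroutine returned
            states.append(inspect.getcoroutinestate(coro))  # CORO_CLOSED
            return stop.value, yielded, states
        states.append(inspect.getcoroutinestate(coro))  # CORO_SUSPENDED
    raise RuntimeError("coroutine did not finish with the replies given")


result, yielded, states = drive(worker(), [1, 2])
print(result, yielded, states)
```

Here the “framework” is just a for loop; asyncio’s event loop plays the same role, only with I/O readiness deciding when to resume.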

Also, OS threads still have a use case within asyncio. Specifically, if you want to run an IO-bound subroutine without blocking the event loop, it can be run in the event loop’s default ThreadPoolExecutor (from concurrent.futures) through loop.run_in_executor() [4]. This is especially useful when implementing concurrency for existing code or libraries that were not written with async in mind.
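A minimal sketch of that pattern, assuming time.sleep() as a stand-in for an existing blocking library call (legacy_blocking_io and heartbeat are hypothetical names). Passing None as the executor selects the loop’s default ThreadPoolExecutor; the heartbeat task keeps ticking while the blocking call runs in a worker thread, which shows the event loop isn’t blocked.

```python
import asyncio
import time


def legacy_blocking_io(delay):
    # Stand-in for synchronous code that wasn't written with async in mind.
    time.sleep(delay)
    return f"slept {delay}s"


async def heartbeat(stop, ticks):
    # Keeps running only if the event loop stays responsive.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main():
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    ticks = []
    beat = asyncio.create_task(heartbeat(stop, ticks))
    # executor=None -> the loop's default ThreadPoolExecutor.
    result = await loop.run_in_executor(None, legacy_blocking_io, 0.2)
    stop.set()
    await beat
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)  # several ticks happened during the blocking call
```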

Not only does asyncio not solve the GIL problem, the GIL is also not a significant factor when dealing with IO-bound tasks. The GIL only becomes significant when implementing concurrency for CPU-bound tasks, which is not the primary focus of asyncio.

For CPU-bound concurrency in Python, we have separate processes. Process pools can be used in asyncio via loop.run_in_executor(), by passing an instance of concurrent.futures.ProcessPoolExecutor as the executor argument (instead of using the default one, which is a ThreadPoolExecutor).
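A minimal sketch of that, assuming a toy CPU-bound function (sum_of_squares is a hypothetical stand-in). The function is defined at module level so it can be pickled and sent to the worker processes, which is the picklability constraint Marco mentioned earlier, and the entry point sits under a __main__ guard because with the “spawn” start method the children re-import the main module.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    # CPU-bound work: in a thread this would hold the GIL the whole time,
    # but in a separate process it runs in parallel with the event loop.
    return sum(i * i for i in range(n))


async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        jobs = [
            loop.run_in_executor(pool, sum_of_squares, n)
            for n in (10, 100_000, 200_000)
        ]
        return await asyncio.gather(*jobs)


if __name__ == "__main__":
    # Guard needed: child processes may re-import this module.
    print(asyncio.run(main()))
```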

Note: We’re currently planning on improving the API for using pools in asyncio in Python 3.9. The goal is to provide a more intuitive and user-friendly way of using thread pools and process pools, instead of using loop.run_in_executor(). I’m currently in the early stages of implementing an asyncio.ThreadPool().

[1] PEP 492 – Coroutines with async and await syntax | peps.python.org

[2] PEP 342 – Coroutines via Enhanced Generators | peps.python.org

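[3] A generator also has more than one point of entry/exit and can suspend via yield, but a generator driven only with next() can’t receive values or exceptions when it is resumed; PEP 342 added send() and throw() for that purpose, and generator-based coroutines are built on them.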

[4] Event Loop — Python 3.12.1 documentation

phr · May 15, 2020, 5:08am

FWIW I’ve run Python programs with 1000s of threads. They use a few GB of memory but work fine. I haven’t benchmarked threads against async, though. I like threads, and I have avoided lock hazards by having them never share mutable data and only communicate through queues, like Erlang does with mailboxes.
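A minimal sketch of that queue-only style using the standard library’s thread-safe queue.Queue (worker and SENTINEL are hypothetical names): the workers share no mutable data, receive work from an inbox queue, send results to an outbox queue, and stop when they receive a sentinel.

```python
import queue
import threading

SENTINEL = None  # tells a worker there is no more work


def worker(inbox, outbox):
    # No shared mutable state: the thread only talks through the two queues,
    # so no explicit locks are needed.
    while True:
        item = inbox.get()
        if item is SENTINEL:
            break
        outbox.put(item * item)


inbox, outbox = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(inbox, outbox)) for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    inbox.put(n)
for _ in threads:
    inbox.put(SENTINEL)  # one sentinel per worker, queued after all the work
for t in threads:
    t.join()

results = sorted(outbox.get() for _ in range(10))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```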

Erlang and GHC use green threads that are transparent to the user, so you get the advantages of lightweight concurrency along with the illusion of single-path blocking I/O. It would be great to have Python work that way, but adapting CPython to that model doesn’t sound practical off the top of my head. It could be a new Python implementation running on the Erlang BEAM, sort of the way Elixir, a language with Ruby-inspired syntax, runs on BEAM. Maybe PyPy could also do something like that. I once imagined Python 4 could work this way, but it doesn’t seem like a realistic hope.

jpl · September 16, 2020, 9:41am

Kyle Stanley
if it is desired to run an IO-bound subroutine without blocking the event loop, they can be run within the event loop’s ThreadPoolExecutor (from concurrent.futures) through loop.run_in_executor() [4]. This is especially useful when implementing concurrency for existing code or libraries that were not implemented with async in mind

Hi, I’d like some clarification on this. If I had to implement an async workload that is mostly I/O-bound, and I used loop.run_in_executor() on some existing code instead of rewriting it with proper async methods, because of lack of time or out of laziness, what would be the cost of this?

I mean, apart from the fact that threads would be more costly, what would be the drawback?

In other words, should we always recommend writing async code instead of using loop.run_in_executor() when possible, and why? As architects, what arguments should we give developers to help them understand the benefit or necessity of rewriting existing code with the async/await paradigm, when that is possible?

Thanks.

aeros · September 16, 2020, 7:58pm

I would not necessarily recommend rewriting with async/await instead of using run_in_executor() in all situations. For example, if you have a perfectly working program with threads and, based on its use case, don’t anticipate that a significant number of concurrent workers (100s to 1000s+) will be needed in the future, sticking with the current approach instead of rewriting it with async/await is a perfectly viable option.

However, if it is reasonable to expect that the number of concurrent workers in the program will eventually scale into the 1000s+, you will benefit from using coroutines over threads: they have significantly less memory overhead, and switching between coroutines is faster (switching between threads has the overhead of going through the OS scheduler, unlike coroutines).

Keep in mind, though, that if you expect the number of concurrent workers to keep growing, going with async/await results in lower long-term maintenance than starting with threads and switching to async/await once threads become unreasonable. IMO, it’s much better done as a gradual process early on, rather than as a last-minute decision when you start to reach bottlenecks.

Also, it can be beneficial to have explicit control over exactly where in the program flow the context switch happens with async/await, whereas with threads the context switching occurs largely outside of your control. You can use sys.setswitchinterval() to configure the interval between thread switches in the CPython interpreter, but not where the switch happens. So proper handling of resource contention can be more complicated when working with threads. That said, this can be a drawback in some simple programs where resource contention isn’t an issue, and you can simply let the thread context switching happen without much thought.
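A minimal sketch of that difference, assuming the default asyncio event loop, which resumes ready tasks in FIFO order (Counter, increment_atomically and increment_with_gap are hypothetical names). Code between two awaits runs without interruption from other tasks, so a read-modify-write needs no lock unless an await sits in the middle of it; with threads, a switch could land between the read and the write at any point.

```python
import asyncio


class Counter:
    def __init__(self):
        self.value = 0


async def increment_atomically(counter):
    # No await between the read and the write, so no other task can run in between.
    counter.value += 1


async def increment_with_gap(counter):
    current = counter.value
    await asyncio.sleep(0)       # explicit switch point: every other task runs here
    counter.value = current + 1  # overwrites the increments made meanwhile


async def run(worker, n=100):
    counter = Counter()
    await asyncio.gather(*(worker(counter) for _ in range(n)))
    return counter.value


safe = asyncio.run(run(increment_atomically))
racy = asyncio.run(run(increment_with_gap))
print(safe, racy)  # 100 1
```

The lost updates in the second variant are easy to spot because the only place a switch can happen is the visible await.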

@yselivanov, @asvetlov, and @njs might have more to add about the potential architectural pros/cons of using coroutines vs threads.

PS: I fairly recently added asyncio.to_thread() (available in Python 3.9+), which is a bit simpler to work with than loop.run_in_executor() for running threads in asyncio.
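A minimal side-by-side sketch of the two spellings, assuming Python 3.9+ (blocking_lookup is a hypothetical stand-in for a synchronous call). One practical difference is that asyncio.to_thread() forwards keyword arguments directly, while loop.run_in_executor() only forwards positional ones, so keyword arguments need functools.partial.

```python
import asyncio
import functools
import time


def blocking_lookup(key, *, delay=0.05):
    # Hypothetical stand-in for a synchronous library call.
    time.sleep(delay)
    return key.upper()


async def main():
    # asyncio.to_thread(): runs in the loop's default thread pool and
    # accepts keyword arguments directly.
    a = await asyncio.to_thread(blocking_lookup, "alpha", delay=0.01)
    # loop.run_in_executor(): positional arguments only, so wrap with partial.
    loop = asyncio.get_running_loop()
    b = await loop.run_in_executor(
        None, functools.partial(blocking_lookup, "beta", delay=0.01)
    )
    return a, b


pair = asyncio.run(main())
print(pair)  # ('ALPHA', 'BETA')
```

asyncio.to_thread() also copies the current contextvars context into the worker thread, which run_in_executor() does not do on its own.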

jpl · September 17, 2020, 6:40am

Thanks for the reply, @aeros. This is the kind of answer I was expecting. Also, couldn’t we say that with a thread pool, the number of concurrent operations would be limited by the number of threads available in the pool, with some incoming operations waiting for a thread to be freed, while with the loop, any incoming request would be scheduled immediately?

I mean, the coroutine wrapping the thread would be scheduled on the loop immediately, but the operation wouldn’t actually start until a thread is free to process it. So an I/O-bound operation would still need to wait for one of the I/O operations already in progress on the loop to finish before actually starting, for instance by opening a socket and initiating a request.

aeros · September 17, 2020, 8:19pm

Yep, that’s definitely a factor to consider as well. If, for example, you have a thread pool with a set maximum of 100 threads that is continuously near that peak, you’ll experience a delay before the I/O-bound operation starts, until a thread is free. (Within both ThreadPoolExecutor and ProcessPoolExecutor, this is implemented via a semaphore that starts at 0 and is incremented when a thread/process becomes idle. If there is no idle worker and the workers are below the maximum, a new one is created; otherwise, the work item waits until a worker is free.)
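A minimal sketch of that queueing delay, scaled down to 2 workers and 4 jobs (blocking_call, via_small_pool and via_coroutines are hypothetical names; time.sleep and asyncio.sleep stand in for I/O). With the pool, jobs 3 and 4 can only start once a thread frees up, so the batch takes about two rounds; the coroutines all start waiting immediately.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call():
    time.sleep(0.1)  # stand-in for a blocking I/O operation


async def via_small_pool(n_jobs, max_workers):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        start = time.perf_counter()
        # All jobs are submitted at once, but only `max_workers` run at a time;
        # the rest wait for a free thread.
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_call) for _ in range(n_jobs))
        )
        return time.perf_counter() - start


async def via_coroutines(n_jobs):
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.sleep(0.1) for _ in range(n_jobs)))
    return time.perf_counter() - start


pool_elapsed = asyncio.run(via_small_pool(n_jobs=4, max_workers=2))
coro_elapsed = asyncio.run(via_coroutines(n_jobs=4))
print(f"pool: {pool_elapsed:.2f}s, coroutines: {coro_elapsed:.2f}s")
```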

In the case of an event loop within a single thread, there is no such limitation, because you can have a nearly unlimited number of coroutines compared to threads, and at the OS level they use no resources other than memory (IOW there’s no OS limit on coroutines, since they exist purely as Python objects).
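A minimal sketch of that scale, assuming asyncio.sleep() as a stand-in for waiting on a socket (fake_request is a hypothetical name): 10,000 concurrent waits in one thread are just 10,000 small Python objects, with no OS thread per task.

```python
import asyncio


async def fake_request(i):
    await asyncio.sleep(0.01)  # stand-in for waiting on a socket
    return i


async def main(n):
    # gather() returns results in the order the awaitables were passed.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


results = asyncio.run(main(10_000))
print(len(results))  # 10000
```

Starting the same number of OS threads is possible too, as phr noted, but each one reserves its own stack and is scheduled by the kernel.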