I am very curious to learn what the community thinks about asyncio’s place in the upcoming nogil world.
More precisely, I mean the ‘asyncio’ style of network programming involving non-blocking I/O, cooperative green threads (tasks) and colored functions (two flavors of functions [sync and async] and explicit suspension points marked by await); so frameworks like Trio and anyio are also included.
Alternatives would include:

- just using large thread pools with sync I/O. I haven’t written Java in a long time, but IIRC they do this by default?
- async programming but with non-colored functions; Go and gevent-style.
Interestingly, there are easy examples of languages supporting async programming with colored functions while having free threading: C# and Rust. So this gives me faith colored functions still make sense in a free-threaded world.
The reason it occurred to me to reconsider colored functions in a nogil world is that this style has two benefits: it’s clear where a suspension point is (that will still hold), and between suspension points the world (i.e. state) isn’t expected to change (this might not hold anymore). These constraints mean that many operations which would need locks in a different style don’t need them here; the critical section can simply be performed between suspension points. This is very natural and efficient.
OK, and if we as a community decide to stick with asyncio and try adapting it to take advantage of nogil, I see the following strategies available to us:

- Continue running asyncio services essentially single-threaded; the main difference is that running CPU-bound things in a thread pool becomes more straightforward (before, you had to use a ProcessPoolExecutor if your workload didn’t release the GIL). So just like now, except a little easier.
- Run multiple event loops in multiple threads. It’s a little more efficient than running multiple processes, since the memory cost of the interpreter and stdlib can be shared. Resources scoped to the process (ports, signals?) become more complex to manage.
- Write an event loop that can multiplex active tasks across a number of threads, also known as M:N threading. I think this approach is the most efficient in theory, but using it loses the fundamental asyncio assumption of no state changes between suspension points. I would be surprised if existing asyncio libraries could run without issues under this model, so I’d call this model something else (nogil-asyncio?) instead.
- More complex models higher up in the application layer. For example, I could imagine an actor-like framework being able to bridge between the existing asyncio ecosystem and coroutines multiplexed on many threads in parallel. The community could probably come up with innovative stuff.
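To make the first strategy concrete, here’s a minimal sketch of the “single-threaded loop, CPU-bound work offloaded to threads” pattern, using the real `asyncio.to_thread` API (3.9+); the `cpu_bound`/`handler` names are just illustrative:

```python
import asyncio
import hashlib

def cpu_bound(data: bytes) -> str:
    # Stand-in for pure-Python work that previously needed a
    # ProcessPoolExecutor; under free threading, threads can run it in parallel.
    return hashlib.sha256(data).hexdigest()

async def handler(payload: bytes) -> str:
    # asyncio.to_thread runs the function in the loop's default thread pool
    # without blocking the event loop.
    return await asyncio.to_thread(cpu_bound, payload)

async def main() -> list[str]:
    # Four concurrent "requests", each offloading its CPU-bound part.
    return await asyncio.gather(*(handler(f"req-{i}".encode()) for i in range(4)))

digests = asyncio.run(main())
print(len(digests))
```

The program structure is unchanged from today’s asyncio; nogil only changes whether the offloaded work truly runs in parallel.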
Since there are so many options, I’m really curious what other folks interested in this space think makes sense. Part of me is also sure that, our community being so diverse and creative, all of these will probably see the light of day in some form or another.
Threads have overhead. Let’s say you want to run a web server where you spawn a task for each incoming request (read request, process it, send response); done with a thread pool, the pool size will limit your number of concurrent requests. And since it’s trivially easy for an attacker to exploit this (open a ton of connections, start requests on all of them, and never finish them), this will quickly result in either a huge thread pool with most threads idle, or requests getting dropped. So asyncio will still have a place there, since it scales far better than threads do.
I would be VERY curious to see whether a nogil Python would allow a hybrid whereby you have a ThreadEventLoopPool that has some number of threads, each running an asyncio event loop, and thus able to scale up to vast numbers of tasks (since idle tasks aren’t consuming much), while also able to run multiple actual jobs concurrently (because threads), with minimal overhead for moving data between threads (unlike a process pool). Basically this
but much much simpler and better abstracted. I personally think that the “event loop that can multiplex active tasks” approach is more clunky than simply having independent event loops on the separate threads, although I’m open to examples showing otherwise.
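As a rough sketch of what such a hybrid might look like: the `ThreadEventLoopPool` class below is invented for illustration (it’s not an existing asyncio API), but it’s built entirely from real primitives, `asyncio.new_event_loop`, `loop.run_forever` in a thread, and `asyncio.run_coroutine_threadsafe`:

```python
import asyncio
import itertools
import threading

class ThreadEventLoopPool:
    """Hypothetical sketch: N threads, each running its own asyncio event loop.

    Coroutines are assigned to loops round-robin. Name and API are made up
    here for illustration.
    """

    def __init__(self, num_threads: int) -> None:
        self._loops: list[asyncio.AbstractEventLoop] = []
        for _ in range(num_threads):
            loop = asyncio.new_event_loop()
            # Each loop runs forever on its own daemon thread.
            threading.Thread(target=loop.run_forever, daemon=True).start()
            self._loops.append(loop)
        self._next = itertools.cycle(self._loops)

    def submit(self, coro):
        # run_coroutine_threadsafe returns a concurrent.futures.Future,
        # so callers on any thread can wait for the result.
        return asyncio.run_coroutine_threadsafe(coro, next(self._next))

    def shutdown(self) -> None:
        for loop in self._loops:
            loop.call_soon_threadsafe(loop.stop)

async def work(n: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for real I/O
    return n * 2

pool = ThreadEventLoopPool(4)
futures = [pool.submit(work(i)) for i in range(8)]
print([f.result() for f in futures])
pool.shutdown()
```

This already works on GIL-ful Python; the open question in this thread is whether nogil makes the cross-thread sharing (connection pools, caches) cheap and safe enough to be worth it.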
Yeah, threads + sync IO are not the best. You’d also lose asyncio cancellation semantics which are great. I’m personally also not enthused by this approach, but folks are using it I guess.
Your tasks will presumably be using a bunch of libraries (like aiohttp, sqlalchemy, httpx, aioredis…) to do their work. All of these libraries maintain connection pools internally. These connection pools are bound to an event loop. So if you have N independent event loops, you’ll need a connection pool per loop. While it’s not the end of the world, it’s still pretty bad for a bunch of reasons. For example, you risk having a connection pool starved in one thread while an identical connection pool has available connections in another thread. Likewise, (depending on your library) you risk hogging database resources with unnecessary idle connections. All of this would be alleviated by a pool that’s a little bigger but shared between threads.
That said, I mentioned I think none of these libraries are “nogil-asyncio” safe today. But they could be adapted, if that’s what we decide on.
Hmm, that’s fair. I’m not sure whether that could be solved, but it’s probably a good reason to go with the single event loop, yeah. I’m just not sure what the mental model should be here - it’s a bit of a weird hybrid between threads and tasks. Will it end up feeling like “asyncio, but with more concurrency”, which would be great? Or will it be “oh <bleep> there’s a bug that only happens when this half of the task runs on a different thread”?
In any case, I’m excited for the future, and hopeful of being able to put this to some real-world use soon!
The basic trade-off between async and threads is the cost of the threads. Each thread needs its own stack; when there are thousands of threads, that memory will be huge. There is also the cost of context switching between the threads, which can use more CPU than async.
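To make the scaling point concrete, here’s a toy example from the asyncio side (numbers are illustrative): ten thousand concurrent waiters as tasks on a single thread, where each task costs a small Python object rather than an OS thread with its own stack:

```python
import asyncio

async def handle(i: int) -> int:
    # Stand-in for waiting on a slow client; 10,000 of these can sleep
    # concurrently on one thread, one stack.
    await asyncio.sleep(0.05)
    return i

async def main() -> list[int]:
    # gather preserves argument order in its result list.
    return await asyncio.gather(*(handle(i) for i in range(10_000)))

results = asyncio.run(main())
print(len(results))
```

Doing the same with 10,000 OS threads would cost on the order of a stack per thread plus scheduler overhead, which is exactly the trade-off described above.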
And this is why I plan to continue doing async programming even with free threading, when the work is I/O-bound and threading doesn’t bring some massive performance win. I find it way easier to reason about async programming than worrying about locks and race conditions.
We never figured out why though, right? I remember Guido had a hunch, but that’s about it. Also presumably @ambv was benchmarking it using the stdlib event loop, which no one who cares about performance uses in prod anyway.