I’ve been thinking about this a bit today, particularly around compatibility and how we can get there from here.
- We want existing code to “just work”.
- Existing code has blocking OS calls.
- Existing calls use a C stack.
- A virtual threading model must allow multiple virtual threads to multiplex to a single OS thread.
- Calls using a C stack that call a Python callback must be pinned to a thread to preserve the C stack.
- Existing code that has blocking OS calls makes the whole thread block.
- Pinning two virtual threads to the same OS thread causes unwanted contention.
- asyncio and at least some other async/await run loops are single-threaded, limiting parallelism while enabling concurrency. Virtual threads can simultaneously unlock both, with the same benefits and drawbacks as with free threading.
Taken together, what this means to me is that virtual threads, by default, should quickly fall back to behavior that is very similar to regular threading. As far as the virtual thread scheduler is concerned, a virtual thread that blocks its OS thread is still running.
So, you might create a virtual thread, then if you use synchronization from the existing implementations of the threading module, it would block the OS thread. We’d need the virtual thread scheduler to be on a separate thread to manage that.
Virtual threads, when using the same primitives as regular threads, would end up with performance similar to, but almost always worse than, regular threads, because they would incur both the overhead of managing the OS thread and the overhead of dealing with the virtual thread run loop.
If that’s as far as we go, we’d end up mostly worse off, but in relatively marginal ways. But it’s where we could go from there that could be exciting.
We can either create new virtual-threading aware APIs, or extend some of our standard APIs to be virtual-threading aware, which means that they would delegate their blocking calls and IO to the virtual thread run loop, much like asyncio, but without asyncio’s single-threaded requirement or differently colored functions. I’m not sure exactly what measurement to use to determine which things could be rewritten to natively support virtual threading, and which would require alternative implementations.
For example, if we could rewrite queues to support virtual threading, that would make enabling virtual threading much easier. Better yet if we could rewrite locking from the threading module, which would unlock even more things automatically.
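To make "delegate the blocking call to the run loop" concrete, here is a toy sketch. Python has no virtual threads today, so generators stand in for them, and `CooperativeQueue` and `run` are hypothetical names, not an existing or proposed API. Note that the `yield from` is exactly the kind of function coloring a real virtual thread implementation would hide, and the round-robin loop busy-polls rather than waking tasks on demand.

```python
from collections import deque


class CooperativeQueue:
    """Hypothetical queue whose get() suspends the task, not the OS thread."""

    def __init__(self):
        self._items = deque()

    def put(self, item):
        self._items.append(item)

    def get(self):
        # Hand control back to the run loop until an item is available.
        while not self._items:
            yield
        return self._items.popleft()


def run(tasks):
    """Toy single-threaded run loop: resume each task in round-robin order."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)


def consumer(q, out):
    for _ in range(3):
        item = yield from q.get()
        out.append(item)


def producer(q):
    for i in range(3):
        q.put(i)
        yield  # give the consumer a chance to run


results = []
q = CooperativeQueue()
run([consumer(q, results), producer(q)])
print(results)
```

The consumer never blocks the OS thread while the queue is empty, so the producer can make progress on that same thread.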
We shouldn’t want every program to require a separate thread for the virtual thread scheduler, especially if there is no concurrency in the program anyway. My basic thinking is that a dedicated thread for the virtual threading run loop should be created as soon as multiple threads or virtual threads are using virtual threading suspension calls.
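A minimal sketch of that lazy start-up, assuming a hypothetical `LazyRunLoop` with a `register()` call made by each thread or virtual thread the first time it uses a suspension call. The threshold of two participants and the placeholder loop body are my assumptions for illustration.

```python
import threading


class LazyRunLoop:
    """Hypothetical run loop that only gets its own OS thread when needed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._participants = 0
        self._stop = threading.Event()
        self.thread = None

    def register(self):
        # Called when a thread or virtual thread first uses a suspension call.
        with self._lock:
            self._participants += 1
            if self._participants >= 2 and self.thread is None:
                self.thread = threading.Thread(target=self._run, daemon=True)
                self.thread.start()

    def _run(self):
        # Placeholder for real scheduling work.
        self._stop.wait()

    def shutdown(self):
        self._stop.set()
        if self.thread is not None:
            self.thread.join()


loop = LazyRunLoop()
loop.register()
started_after_one = loop.thread is not None
loop.register()
started_after_two = loop.thread is not None
loop.shutdown()
```

A program with a single participant never pays for the extra thread.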
When blocking calls go through virtual-threading-aware APIs, and no call from a C stack into a Python callback has caused a virtual thread to be pinned to an OS thread, virtual threads should be free to move between OS threads. We’d need to be able to notice when C implementations call into Python callbacks, so that we can mark a virtual thread as pinned to that particular OS thread, and unmark it when the same callback completes. While a virtual thread is pinned, even if it is suspended using a virtual-thread-aware blocking API, no other virtual threads should run on that OS thread, in order to avoid undue contention.
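The mark and unmark bookkeeping could look roughly like this. All names here (`VirtualThread`, `c_calls_python_callback`, `may_run_on`) are hypothetical, and a depth counter is my assumption for handling nested callbacks; a real implementation would live in the interpreter, not in Python code.

```python
from contextlib import contextmanager


class VirtualThread:
    def __init__(self, name):
        self.name = name
        self.pin_depth = 0  # active C-to-Python callback frames

    @property
    def pinned(self):
        return self.pin_depth > 0


@contextmanager
def c_calls_python_callback(vthread):
    """Pin for the duration of a callback; nested callbacks stack."""
    vthread.pin_depth += 1
    try:
        yield
    finally:
        vthread.pin_depth -= 1


def may_run_on(candidate, residents):
    """True if `candidate` may be scheduled on an OS thread whose current
    virtual threads are `residents`: nobody else there may be pinned."""
    return all(r is candidate or not r.pinned for r in residents)


a = VirtualThread("a")
b = VirtualThread("b")

with c_calls_python_callback(a):
    # While a is pinned, b must not share a's OS thread.
    shared_while_pinned = may_run_on(b, [a])

shared_after_unpin = may_run_on(b, [a])
```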
It would be great if we could offer a native C way to cooperate with virtual threading. I don’t know enough about implementation details to know what this would look like. Probably a bit of a song and dance to avoid the C stack in C extensions.
The virtual thread run loop would have some (handwave, handwave) algorithm for deciding when to re-use threads, wait for threads, or spin up new threads to handle these requirements of pinning and of existing non-virtual-thread-aware APIs blocking. Maybe as simple as “if there’s no thread free, make some new threads”.
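The simplest version of that policy, sketched with plain OS threads: reuse an idle worker if there is one, otherwise start a new one. `ElasticPool` is a hypothetical name, and the sketch leaves out everything a real scheduler would need (an upper bound on threads, retiring idle workers, pinning).

```python
import queue
import threading


class ElasticPool:
    """Hypothetical pool: if no worker is free, make a new one."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._idle = 0  # workers that have finished and are not yet reserved
        self.threads_started = 0

    def submit(self, fn, *args):
        with self._lock:
            if self._idle > 0:
                self._idle -= 1  # reserve an idle worker for this task
            else:
                self.threads_started += 1
                threading.Thread(target=self._worker, daemon=True).start()
        self._tasks.put((fn, args))

    def _worker(self):
        while True:
            fn, args = self._tasks.get()
            try:
                fn(*args)
            finally:
                with self._lock:
                    self._idle += 1
                self._tasks.task_done()

    def join(self):
        self._tasks.join()


pool = ElasticPool()
gate = threading.Event()
pool.submit(gate.wait)  # occupies worker 1
pool.submit(gate.wait)  # no idle worker, so worker 2 is started
gate.set()
pool.join()
pool.submit(print, "reused an idle worker")
pool.join()
```

A blocked worker simply counts as busy here, which matches the idea that a virtual thread blocking its OS thread is "still running" from the scheduler's point of view.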
Virtual threads may not be for everyone, and that’s OK. People who prefer to rely on asyncio’s single-threading requirement should be free to do so. But I think that a virtual threading API gives Python some real opportunities to reduce the number of concepts that a beginner needs to learn in order to do concurrent and parallel work. For much of the work that I do, it hits the middle ground that makes me reach for Python instead of something that needs more semantic declaration, like Rust.
async/await isn’t bad. I think it’s very useful for writing state machines in a way that’s more approachable for many developers. If we ever figure out a good way to reasonably serialize async/await functions, that would be a killer feature, and one that virtual threads could not reasonably copy, afaict.