Add Virtual Threads to Python

tl;dr

Java has virtual threads. Virtual threads are a better way of doing concurrency than Python’s async and await. We should add virtual threads to Python.

Virtual Threads

Virtual threads were added to Java a few years ago.
Virtual Threads combine the best of async Tasks and normal threads.

Like normal threads, virtual threads:

  • require no new syntax
  • provide a more intuitive mode of execution than async/await.

Like tasks, virtual threads:

  • only switch execution at well defined points in the code
  • can support structured concurrency
  • are lightweight

Unlike Python’s coroutines, virtual threads:

IMO, virtual threads offer a superior programming model to adding async and await all over your code and having to duplicate all your libraries.
But don’t just believe me, Armin Ronacher says so too.

Implementation

There is no point in suggesting a new feature if we can’t implement it.
There are two tricky parts to implementing virtual threads:

  • Context switching: switching from one thread to another
  • Not blocking on apparently blocking calls.

Continuations for context switching

Virtual threads in Java are implemented as pure Java objects using Continuation objects provided by the JVM to do the context switching.
We can do the same in Python. By adding Continuation objects to the CPython VM, we can implement Virtual Threads as pure Python objects.

Continuations, strictly “delimited continuations”, are a form of coroutine. Unlike Python’s coroutines which are stackless and asymmetric, continuations are stackful and symmetric.
Being stackful means that they can yield from within calls, not just at the top level.
Being symmetric means that they can yield to each other, not just their caller.
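To make "stackless" concrete, here is a minimal sketch using today's generators (the function names are illustrative only). A `yield` inside a helper suspends only the helper's own frame, so a caller that wants to be suspended along with it has to delegate explicitly with `yield from`:

```python
def helper():
    # This yield makes helper() a generator function. It can suspend only
    # its own frame, not the frame of whoever called it.
    yield "from helper"


def outer_plain_call():
    # A plain call merely creates a generator object; helper's body never runs.
    helper()
    yield "from outer"


def outer_delegating():
    # Every frame in the call chain has to delegate explicitly.
    yield from helper()
    yield "from outer"


plain = list(outer_plain_call())
delegated = list(outer_delegating())
print(plain)
print(delegated)
```

A stackful continuation would not need the `yield from` chain: the helper could suspend the whole call stack from an ordinary call.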

Although I expect Continuations to be mostly hidden, it might be informative to see an example using them to context switch.
This program prints “ping”, then “pong”, then “ping” and so on, until interrupted:

def bounce(other, msg):
    while True:
        print(other.send(msg))

pinger = Continuation(bounce)
ponger = Continuation(bounce)
pinger.start(ponger, "ping")
ponger.start(pinger, "pong")
ponger.run(None)

Non blocking calls

Suppose we want to send some data over a socket.
To do that, we would normally call socket.send(data), but socket.send blocks.
If we are using traditional threads, the operating system will run another thread for us, and it doesn’t matter that this thread is blocked.

If we are using asyncio, or equivalent, then we cannot make blocking calls, so instead of
calling socket.send(data) we must await loop.sock_sendall(socket, data) from an async function.
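For reference, a small runnable sketch of the asyncio form described above. It uses a local `socket.socketpair()` so that it needs no network; the function name `send_and_receive` is just for illustration:

```python
import asyncio
import socket


async def send_and_receive(payload: bytes) -> bytes:
    loop = asyncio.get_running_loop()
    left, right = socket.socketpair()
    # The loop.sock_* methods require non-blocking sockets.
    left.setblocking(False)
    right.setblocking(False)
    try:
        # Suspends this coroutine instead of blocking the thread.
        await loop.sock_sendall(left, payload)
        return await loop.sock_recv(right, len(payload))
    finally:
        left.close()
        right.close()


received = asyncio.run(send_and_receive(b"hello"))
print(received)
```

Note that every caller of `send_and_receive` must itself be `async` (or drive an event loop), which is the coloring the proposal wants to avoid.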

With virtual threads we want to be able to call socket.send(data) without all the async and await baggage and not have it block.

We can do this by using continuations. We can have one continuation running an async event loop and have that continuation manage the asynchronous operations. Something like this:

# module virtual_threads
class Socket:
    def send(self, data):
        return event_loop_continuation.send((SOCKET_SEND, self, data))
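Since `Continuation` does not exist yet, the request/response pattern can only be approximated today. The sketch below emulates it with plain generators: each worker hands a `(SOCKET_SEND, sock, data)` request to a toy event loop and is resumed with the result. `FakeSocket`, `worker` and `run_event_loop` are hypothetical names for this illustration, and the visible `yield` is exactly the part a stackful continuation would hide inside `Socket.send`:

```python
from collections import deque

SOCKET_SEND = "socket_send"  # hypothetical request tag, mirroring the post


class FakeSocket:
    """In-memory stand-in for a socket; records what was 'sent'."""

    def __init__(self):
        self.sent = []


def worker(sock, messages):
    # Each yield hands a request to the event loop, playing the role of
    # event_loop_continuation.send(...) in the post.
    for msg in messages:
        nbytes = yield (SOCKET_SEND, sock, msg)
        assert nbytes == len(msg)


def run_event_loop(workers):
    # Round-robin over the workers, servicing one request per turn.
    ready = deque((w, None) for w in workers)
    while ready:
        task, result = ready.popleft()
        try:
            op, sock, data = task.send(result)
        except StopIteration:
            continue  # this worker has finished
        if op == SOCKET_SEND:
            sock.sent.append(data)  # a real loop would do non-blocking I/O here
            ready.append((task, len(data)))
        else:
            raise ValueError(f"unknown operation: {op!r}")


a, b = FakeSocket(), FakeSocket()
run_event_loop([worker(a, [b"a1", b"a2"]), worker(b, [b"b1"])])
print(a.sent, b.sent)
```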
44 Likes

This sounds great. While I don’t expect the lower level continuation object to be something that people would routinely use, I’d support exposing that to Python as well (for specialised or advanced use cases).

+1

Would this be similar to Fibers in Ruby and PHP?

I ask because for many years I’ve been wishing that Python had Fibers instead of asyncio.

And I say that because, despite being a big fan of Python asyncio, I am very much aware of the “What color is your function?” problem and constantly have to work around it. I spend far too much time in my day-to-day refactoring code between various models (blocking, threaded, asyncio, evented, …), mostly because of this one problem. I considered using trio at some point, since after casually spending a few hours reading their docs one day, I felt it’s a great idea, but I ultimately decided against bothering to learn it precisely because I felt it’d just add yet another model to that set of models I’m so frequently refactoring between.

(Side note - was NodeJS the inspiration for asyncio in Python? Was NodeJS where it originally came from, or did they take it from somewhere else too?)

4 Likes

Sounds cool, but what’s the actual proposal? This?

Do you plan on writing this up as a PEP?

3 Likes

I basically like this idea. But could you please briefly explain main differences between it and greenlet / eventlet libraries approach?

2 Likes

This sounds like a good idea, but would still require a separate set of e.g. networking stdlib modules or implementations, right?

It reminds me a lot of the gevent library I used years ago, which also implements a coroutine based approach for networking, building on the low level greenlets, which Armin and Christian created a longer while ago (based on work Christian had done with Stackless).

The downside with gevent is that it requires replacing stdlib implementations with new ones supporting the greenlet approach.

If your suggestion can overcome this, it’d be a really good idea.

Also: Could you comment on how C extensions would be able to play nicely with these Continuations?

The nice thing about greenlets is that they support stack context switches with C functions on the stack, which makes them very versatile when dealing with e.g. callbacks.

6 Likes

You can’t just say “add continuations to CPython” without specifying
exactly HOW said continuations are to be implemented.

There have been projects along these lines before – see Stackless
Python, greenlets, eventlets.

The core difficulty with these kinds of things is that Python code can
call C code which can call Python code, etc., so that the Python and C
stacks become intertwined. There are ways to deal with that, but they
tend to be somewhat fragile and platform-specific, so it’s unlikely
they’ll ever be accepted into CPython.

If you have an idea for implementing continuations that is reliable,
portable across platforms, doesn’t require rewriting large parts of
CPython, and is compatible with all existing C extensions, we’ll be
interested, please explain.

5 Likes

Having used all of the options available to me in Python’s standard library for concurrency and parallelism (threading, async/await, subinterpreters, multiprocessing), and several other options in other languages not mentioned for comparison in this proposal, I don’t see async as “baggage”, and it’s pretty hard to take a proposal that sees it that way seriously.

You can call sync code from async code, and if you have a need for async code, then you don’t have the problem of needing to call sync code from async code.

We have threading, async/await, and subinterpreters already[1]. By far, the most useful way to use these is together not separately, and there’s really no function color problem for most cases here.

Even setting aside my disagreement that function coloring is a real problem or “baggage”, and taking your stance on it: in the best-case scenario you’ve just added another concurrency method that won’t solve the problem for 5+ years. And that lower bound assumes it’s so successful that everyone agrees it should immediately replace asyncio, resulting in fast-tracked deprecation and removal of the async/await syntax.

During those 5 years, you have to ensure that this coexists with all of the other concurrency methods correctly. This is already challenging, but manageable with the current concurrency methods.

Your 2nd hypothetical continuation example appears to require an event loop. If this is required for all useful non-blocking IO operations, this could become something I’d actually call a problem if it couldn’t coexist with async event loops.[2]


  1. Multiprocessing usually isn’t worth mixing with these, but you can, and I’ve seen it done to positive effect. ↩︎

  2. In fact, for the one place where I have found an issue here, I’ve written a utility to help: it places an asyncio event loop in a separate thread and communicates with it, because GUI frameworks often want to own the main thread ↩︎
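The pattern in the second footnote can be sketched with the standard library alone. This is not the poster's utility, only a minimal illustration of running an asyncio loop in a background thread and submitting work to it from the main thread; `start_background_loop` and `add_later` are made-up names:

```python
import asyncio
import threading


def start_background_loop():
    # Create a loop and run it forever in a daemon thread, leaving the
    # main thread free (for a GUI toolkit, for example).
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def add_later(x, y):
    await asyncio.sleep(0.01)
    return x + y


loop, thread = start_background_loop()

# Thread-safe submission; returns a concurrent.futures.Future.
future = asyncio.run_coroutine_threadsafe(add_later(2, 3), loop)
result = future.result(timeout=5)

# Shut the loop down cleanly from the main thread.
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
loop.close()
print(result)
```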

4 Likes

This is how I would like coroutines to work. Generator functions introduced in 2.2 disappointed me. One suggestion: do not use methods as interface. Use global functions which call dunder methods. So we would be able to add special slots for these dunder methods in future, this may non-trivially reduce overhead. Later, global functions can be replaced by special syntax.

But there is a C stack issue. How do you solve it without going beyond the limitation of C?

Would it not block when we want to send data to several sockets? Or will we need a separate OS thread per connection?

1 Like

These might be silly questions, but what’s the difference between a virtual thread and something like a goroutine in golang?

It sort of sounds like that with Barriers.

If you are using today’s threading, don’t you get the same abilities already? (The GIL is already released for I/O calls without you having to yield to the other thread. You also have barriers to force checkpoints.)

Is it just more lightweight since each thread is no longer an os thread?

Edit since I forgot to mention: I don’t really understand how adding more implementations of things like socket (since it looks like you would need that here) helps the multicolor problem. It sounds like it makes it even worse. If somehow the regular modules could be used as is maybe it would be better.

Hi. For a little context, I’ve been involved with asyncio for a long time now: I’m the author of aiofiles, original author of pytest-asyncio, worked on asyncio.timeout in 3.11, among other efforts.

I’m completely on board with this idea. I think in a nogil world function coloring doesn’t make sense in the long term.

A few comments: if we want to support structured concurrency in a usable way (and we do), I believe we need to support something similar to asyncio cancellation semantics. So virtual threads need to be cancellable. If we want to avoid function coloring, normal threads need to at least approximate the idea of cancellability too. Otherwise, we still have function coloring, but hidden, which I think is objectively worse.
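For readers unfamiliar with the asyncio cancellation semantics referred to here, a minimal runnable sketch: `Task.cancel()` causes `CancelledError` to be raised inside the task at the point where it is suspended, which gives the task a chance to clean up before the cancellation propagates. The names `worker` and `events` are illustrative:

```python
import asyncio

events = []


async def worker():
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # Cancellation arrives as an exception at the suspension point.
        events.append("cleanup")
        raise  # re-raise so the task is actually marked as cancelled


async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let the worker start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return task.cancelled()


was_cancelled = asyncio.run(main())
print(events, was_cancelled)
```

A cancellable virtual thread would presumably need an equivalent: some way to raise an exception in the thread at its next switch point.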

You also say that virtual threads “only switch execution at well defined points in the code”. I don’t think this should be a design goal, and I also don’t think it’s a very useful property without async / await. It would, again, introduce hidden coloring versus normal, non-virtual-thread code. If memory serves, the Golang runtime can, and will, preempt a Goroutine that’s hogging a native thread. I think that’s a very useful property to potentially have.

And finally: I think it would behoove us to design this system with free threading in mind from the start. In particular, the possibility of the event loop to do M:N scheduling of virtual threads onto native threads. I don’t see this happening in a straightforward way for asyncio since a change in execution semantics of this magnitude would, most likely, break the entire ecosystem.

19 Likes

I’m a little confused as to the goals of “virtual threads”. If they’re supposed to function like normal threads and “just work”, and not demand function colouring, then they should preempt just like regular threads do; and if that’s the case, their semantics would be identical to regular threads. In other words, this wouldn’t be a new API, it would simply be a change [1] to the threading module to use virtual threads rather than OS threads, giving a performance improvement.

But if they are to “only switch execution at well defined points in the code”, how can that be done without function colouring? Either a function has such well-defined points in it, or it doesn’t. Or is it that every function becomes red under this plan, and thus all code pays a price for the mere existence of virtual threads, even if it doesn’t use them?

Sorry if this is a dumb question.


  1. possibly an optional change ↩︎

13 Likes

I’m having trouble following this argument. It seems like one which could make sense for a new language, but if the only benefit is avoiding function coloring, we will still have it from async/await.

I’m also not sure how you’d simultaneously:

  • have clearly defined context switches
  • not have function coloring (even excluding the interaction with async/await)
  • not have an issue with ffi

Any two of these are trivially achievable, but all three together need an actual demonstration that it is possible. People have explored this before, and I remember libraries like gevent monkeypatching the standard library in ways that caused hard-to-diagnose problems when it went wrong.

6 Likes

Sounds cool, but what’s the actual proposal?

The proposal is that we should add virtual threads to Python. The details are up for discussion.

Do you plan on writing this up as a PEP?

Let’s have a discussion first, but adding continuations would need a PEP.

Greenlets support switching the whole stack, C stack included.

Personally, I don’t think it is worth the extra memory use and maintenance burden to do that, but adopting greenlets into CPython is not a terrible idea.

Continuations are just Python, so if there is C code in the stack when you context switch, it will raise an Exception.
I think that is an acceptable limitation, but might be too limiting for some.

Also: Could you comment on how C extensions would be able to play nicely with these Continuations?

They would be able to call into Python code which then does a context switch. Anything else should be fine.

1 Like

You can’t just say “add continuations to CPython” without specifying
exactly HOW said continuations are to be implemented.

Clearly I can. I just did :wink:
Each continuation would contain a stack, much like the thread state does.
Continuations form a cactus stack, like coroutines. The difference between continuations and coroutines is that the entire stack is detached at once instead of doing so frame-by-frame as coroutines do. One continuation can switch to another by detaching its own stack and attaching the stack of the other.

The core difficulty with these kinds of things is that Python code can call C code which can call Python code, etc.

Not supporting switching of a mixed C/Python stack is a limitation of the continuations I am proposing. Allowing that would involve doing what greenlet does, or providing a C API that supports suspension (which would not be easy to use: where do you put the state?).

Do you think that without the ability to support interleaved C and Python calls, this would be useless?

we’ll be interested, please explain.

OOI, who are “we”?

Would it not block when we want to send data to several sockets? Or will we need a separate OS thread per connection?

If you want to send data to 3 sockets, you’ll need 3 threads. They could be virtual threads or OS threads. Virtual threads don’t block in the same way that asyncio doesn’t block when sending data to a socket.
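As a concrete baseline, this is roughly what the "3 sockets, 3 threads" case looks like with OS threads today (local socket pairs stand in for real connections; `send_all` is an illustrative name). Under the proposal the same blocking-style code would run on virtual threads instead:

```python
import socket
import threading


def send_all(sock, payload):
    # A blocking call: only the calling thread waits.
    sock.sendall(payload)


pairs = [socket.socketpair() for _ in range(3)]
threads = [
    threading.Thread(target=send_all, args=(tx, f"msg{i}".encode()))
    for i, (tx, _rx) in enumerate(pairs)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

received = [rx.recv(16) for _tx, rx in pairs]
for tx, rx in pairs:
    tx.close()
    rx.close()
print(received)
```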

Goroutines are, I believe, OS threads with some nice syntax.

If you are using today’s threading, don’t you get the same abilities already?

Yes you do. But you also have the problems of using today’s threads. More memory consumption, a higher risk of race conditions and no support for structured concurrency.

1 Like

The two color problem is not from the number of implementations, but from syntax and what functions are allowed to call what. Bob Nystrom explains it well.
We have many ways to format a string, but which one you choose doesn’t impact what other functions you can call.

So virtual threads need to be cancellable.

Yes. Absolutely.

If we want to avoid function coloring, normal threads need to at least approximate the idea of cancellability too.

It might be tricky to add cancellability to normal threads, but I wouldn’t object at all to adding it.

You also say that virtual threads “only switch execution at well defined points in the code”. I don’t think this should be a design goal, and I also don’t think it’s a very useful property without async / await. It would, again, introduce hidden coloring versus normal, non-virtual-thread code. If memory serves, the Golang runtime can, and will, preempt a Goroutine that’s hogging a native thread. I think that’s a very useful property to potentially have.

Only switching at well defined points does not introduce coloring. Syntax introduces coloring.

I don’t think making virtual threads preemptable is a good idea. Partly because it might introduce race conditions, but mainly because it adds a lot of complexity.