Add Virtual Threads to Python

How do virtual threads solve this? Every concurrency pattern has its own locks: OS threads have threading.Lock, and asyncio has asyncio.Lock. Virtual threads would have their own lock too.

I’m not sure what you mean by introducing virtual threads: do you expect them to replace coroutines or OS threads, or to add another layer of concurrency? I guess you want to keep OS threads for their performance benefits, but then you need an OS-level lock to access shared data structures.

import asyncio
import threading

class Pool:
  def __init__(self):
    self.lock = threading.Lock()  # this is an OS thread lock
    self._waiters = []

  async def get_connection(self):
    while True:
      with self.lock:
        if conn := self._get_free_connection():
          return conn
        fu = asyncio.Future()
        # when a connection becomes available, one future in self._waiters
        # should be woken up
        self._waiters.append(fu)
      await fu  # suspend outside the lock

Seeing the suspension points helps me keep them out of the with self.lock block.


Actually, I call “virtual threads” “user-mode threads”, as they are implemented entirely in user space, and OS threads “kernel-mode threads”, as the OS kernel provides support for them. For developers using them, they basically feel the same, except that the latter can run in parallel (which is too useful to give up nowadays).

If functions have no colors, how do I know the “well defined points” where execution switches? greenlets and Lua coroutines can both switch execution deep inside a function call without me noticing, which makes every function call (including implicit ones) a possible switching point. (See also unyielding.)
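To make this concrete, here is a minimal sketch using the third-party greenlet package (an assumption: it must be installed separately with pip install greenlet). The suspension happens deep inside what looks like an ordinary function call:

```python
from greenlet import greenlet  # third-party package

log = []

def library_helper():
    # Deep inside this ordinary-looking call, execution switches away.
    main_glet.switch()

def worker():
    log.append("worker: before helper")
    library_helper()  # a plain call -- yet it suspends this greenlet
    log.append("worker: after helper")

main_glet = greenlet.getcurrent()
task = greenlet(worker)
task.switch()  # runs worker until the hidden switch
log.append("main: worker is suspended mid-call")
task.switch()  # resumes worker inside library_helper
print(log)
```

Nothing in worker’s source marks library_helper() as a suspension point, which is exactly the concern above.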

Lua coroutines do not run across OS threads, but I guess the proposed “virtual threads” need to? Then you have two kinds of locks, just like the multi-threaded asyncio example above. Or you use only one kind of lock and go the “Go” route of letting some OS threads block while tasks run on other threads.

1 Like

For @A5rocks’ example:

def f(func: Callable[[], None]) -> None:
  func()  # is this a schedule point?

What else can we say about func?

  • Can it switch tasks?
  • Can it raise an exception?
  • Can it deadlock? _exit? Segfault?
  • Will it reliably handle task cancellation?
  • Does it run its own event loop?
  • Can it randomly rewrite its caller’s locals?
  • Is it pure?
  • Is it actually callable?

There are a lot of properties that could have explicit syntax. If the entire ecosystem meticulously tracked any of these properties, our programs would be easier to reason about.

Maybe async is Python’s version of checked exceptions. Sad. I like async.

1 Like

Yeah. They would need their own sockets too, and time.sleep, and lots of other things. The idea is you wouldn’t have to know about it, you would just use threading.Lock and the runtime does the right thing. That’s what’s happening with Java virtual threads from what I gather. The goal is to have the same code run correctly on OS threads and virtual threads, thus avoiding coloring.

They would exist on the layer of coroutines, being a replacement. Whether Python would completely remove coroutines is above my pay grade. Normal threading would remain, there’s no chance of that going away. I think normal threading would have to change a little to become cancellable in some way, again to avoid coloring.

Sure. I use “virtual threads” in this thread since the discussion was originally motivated by Java’s virtual threads.

You don’t really, just like you don’t when programming for multi-threaded environments. You decide where your critical sections begin and end and protect them.
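In threaded code that decision looks like this (a minimal sketch; Counter is a made-up example): the critical section begins and ends exactly where the author draws the with block, regardless of where the OS preempts.

```python
import threading

class Counter:
    """The critical section is chosen by the author, not the runtime."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        with self._lock:      # critical section begins here
            self.value += 1   # ...and ends when the block exits

counter = Counter()
threads = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000, regardless of where each thread was preempted
```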

They don’t need to, but I think the case for them is significantly weaker if they don’t eventually. If this is going to just be another flavor of asyncio, we already have asyncio.

2 Likes

Fun fact: if you’re using an eager task factory, a function being sync doesn’t necessarily mean it cannot switch tasks. :innocent:

I feel like this is a little weird to suddenly say – if one of the claimed benefits is that virtual threads (to quote the original post) “only switch execution at well defined points in the code”, I feel like I should be able to look at the code and see how well defined those points are :slight_smile:

This is, in my opinion, one of the biggest benefits of the async/await style: you can identify at a glance the points in a given coroutine where execution might suspend and resume, because they stand out syntactically. Giving that up in exchange for the traditional threading approach of “anything might happen any time, anywhere, and it’s the programmer’s fault for not anticipating that” would be a big step backwards in my view, especially since the function-color problem isn’t (for me) much of a problem in actual practice.

4 Likes

In my opinion, it’s not even a step backwards, it’s just… exactly what we already have. We HAVE threads. They can context-swap at equivalently well-defined points - namely, between any two bytecode instructions, and if you don’t like that, use critical sections (or, more commonly, a less sledgehammery lock).

There is one way to avoid the function colouring problem, while still having the syntactically-obvious yield points. EVERY function becomes red. We just change all functions to be asynchronous, change all calls to be await points unless otherwise specified, and now we have a monocoloured language with obvious yields in it. But I don’t think that would be an improvement. It would mean that all kinds of actions would now hard-lock the entire process, and we’re back to 1990s levels of cooperative multitasking. I’d much rather have raw preemption, with all its attendant costs, than that.

2 Likes

Thanks for the explanation, I now understand what it is supposed to be. (Basically the same as Go: use user-mode threads so that the performance is much better for I/O tasks.)

I’m sorry that my reaction is: I don’t like it. You know what? I try to avoid multithreading as much as I can, with only two exceptions: Rust and concurrent.futures. I don’t know why, but I have the feeling that locking is used much more frequently with multithreading than with explicit coroutine code, and I often get it wrong. Heck, I still don’t know for sure which objects in requests are thread-safe.

Perhaps with structured concurrency in mind when designing the APIs, this issue will be much less severe. I don’t know.

1 Like

Supporting greenlets like in gevent but natively would be amazing, and honestly it’s something that should’ve been done a long time ago. In my opinion, Python’s asyncio design is terrible and has likely disrupted the ecosystem in a way comparable to the transition from Python 2 to Python 3.

And I’ve seen a lot of people agree with this opinion.

Big thanks to the maintainers of gevent, their work is appreciated. It simply works flawlessly. I hope we can bring similar functionality to native Python in the future.

7 Likes

I will go a bit further than I did in my original post: the “function color” problem is too often presented in isolation. Sure, if the only options were “have a problem or don’t have a problem”, we choose “don’t have a problem” every time. But the actual alternative being presented is not “don’t have a problem”. Usually, the alternative being presented is “have the problems of free threading”. And those are some pretty big problems! The brightest minds in our industry have spent decades trying to solve those, so to me, having two “colors” of function with different calling conventions seems pretty minor in comparison.

Which is all to say that I don’t see threads in any form as “a superior programming model”, and I feel like trying to push people to threading instead of async/await is asking them to take on much larger and more difficult problems than “function color” and so is on net a negative, rather than a positive “solution”.

2 Likes

Having something like this would sound very exciting for libraries like Django. Django could use this to support async and sync APIs (using virtual threads under the hood to support a “clean”-ish single implementation, while still maintaining some performance guarantees in both spaces).

Without a way to have a single implementation of sync and async APIs, various libraries have landed on codegen to support both sides; we are effectively asking all libraries to do codegen to support both APIs. What is going to happen in the end? Libraries will move more and more into async-only API territory for ease of maintenance. The future of sync Python depends on libraries being able to support sync and async variants cleanly.

With more libs being async-only, there will be even more pressure to improve the (IMO) lackluster debugging support for async code (breakpoint() into an async method call and try to get at any other async value).

We want good things and better support in this space! But do we want libraries across the board to be doing codegen to do the good thing?

And the subtler thing: users of these libraries end up wanting codegen themselves to preserve sync and async API layering.


So my short version is that I think if you support structured concurrency through async/await, you should consider supporting this idea, because it will allow libraries to offer good ergonomics in this space.

Libraries like Django could rely on continuations deep in the database code to have a single, “clean” implementation that yields only when the user is expecting an async variant of an API.

I had previously written a very silly idea to basically turn async “on or off”. But this would provide a pretty principled way to sidestep this problem entirely, while still giving me dual sync/async variant API support.

1 Like

I’ve been thinking about this a bit today, particularly around compatibility and how we can get there from here.

  • We want existing code to “just work”.
  • Existing code has blocking OS calls.
  • Existing calls use a C stack.
  • A virtual threading model must allow multiple virtual threads to multiplex to a single OS thread.
  • Calls using a C stack that call a Python callback must be pinned to a thread to preserve the C stack.
  • Existing code that has blocking OS calls makes the whole thread block.
  • Pinning two virtual threads to the same OS thread causes unwanted contention.
  • asyncio and at least some other async/await run loops are single-threaded, limiting parallelism while enabling concurrency. Virtual threads can simultaneously unlock both, with the same benefits and drawbacks as free threading.

Taken together, what this means to me is that virtual threads should, by default, quickly fall back to behavior that is very similar to regular threading. As far as the virtual thread scheduler is concerned, a virtual thread that blocks the OS is still running.

So, you might create a virtual thread, then if you use synchronization from the existing implementations of the threading module, it would block the OS thread. We’d need the virtual thread scheduler to be on a separate thread to manage that.

Virtual threads, when using the same primitives as regular threads, would end up with performance similar to, but almost always worse than, that of regular threads, because they would incur both the overhead of managing the OS thread and the overhead of the virtual thread run loop.

If that’s as far as we go, we’d end up mostly worse off, but in relatively marginal ways. But it’s where we could go from there that could be exciting.

We can either create new virtual-threading aware APIs, or extend some of our standard APIs to be virtual-threading aware, which means that they would delegate their blocking calls and IO to the virtual thread run loop, much like asyncio, but without asyncio’s single-threaded requirement or differently colored functions. I’m not sure exactly what measurement to use to determine which things could be rewritten to natively support virtual threading, and which would require alternative implementations.

For example, if we could rewrite queues to support virtual threading, that would make enabling virtual threading much easier. Better yet if we could rewrite locking from the threading module, which would unlock even more things automatically.
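To illustrate what “virtual-threading aware” locking could mean, here is a toy sketch (all names are hypothetical, and generators stand in for virtual threads): acquiring a contended lock parks the virtual thread in the scheduler instead of blocking the OS thread.

```python
from collections import deque

class VLock:
    """Hypothetical virtual-thread-aware lock."""
    def __init__(self):
        self.locked = False
        self.waiters = deque()

    def acquire(self):
        yield ("acquire", self)  # ask the scheduler; parks us if contended

    def release(self):
        yield ("release", self)

class Scheduler:
    """Toy run loop: one OS thread multiplexing many virtual threads."""
    def __init__(self):
        self.ready = deque()

    def spawn(self, virtual_thread):
        self.ready.append(virtual_thread)

    def run(self):
        while self.ready:
            vt = self.ready.popleft()
            try:
                op, lock = next(vt)
            except StopIteration:
                continue  # virtual thread finished
            if op == "acquire":
                if lock.locked:
                    lock.waiters.append(vt)  # park; the OS thread stays free
                else:
                    lock.locked = True
                    self.ready.append(vt)
            elif op == "release":
                lock.locked = False
                if lock.waiters:
                    lock.locked = True
                    self.ready.append(lock.waiters.popleft())  # wake one waiter
                self.ready.append(vt)

log = []
lock = VLock()

def worker(name):
    yield from lock.acquire()
    log.append(f"{name} holds the lock")
    yield from lock.release()

sched = Scheduler()
sched.spawn(worker("vt1"))
sched.spawn(worker("vt2"))
sched.run()
print(log)  # ['vt1 holds the lock', 'vt2 holds the lock']
```

A real implementation would live in the runtime rather than in user code, but the shape is the same: blocking becomes a message to the scheduler.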

We shouldn’t want every program to require a separate thread for the virtual thread scheduler, especially if there is no concurrency in the program anyway. My basic thinking is that as soon as there are multiple threads or virtual threads using virtual-threading suspension calls, a dedicated thread for the virtual threading run loop should be created.

When blocking calls are made in a virtual-threading-aware way, and when no C-stack-to-Python callback has caused a virtual thread to be pinned to an OS thread, virtual threads should be free to move between threads. We’d need to be able to notice when C implementations call into Python callbacks, so that we can mark a virtual thread as pinned to that particular thread, and unmark it when the same callback completes. While pinned to a thread, even when suspended using a virtual-thread-aware blocking API, no other virtual threads should run on that OS thread, in order to avoid undue contention.

It would be great if we could offer a native C way to cooperate with virtual threading. I don’t know enough about implementation details to know what this would look like. Probably a bit of a song and dance to avoid the C stack in C extensions.

The virtual thread runloop would have some (handwave, handwave) algorithm for deciding when to re-use threads, wait for threads, or spin up new threads to handle these requirements of pinning and existing non-virtual-thread-aware APIs blocking. Maybe as simple as “if there’s no thread free, make some new threads”.

Virtual threads may not be for everyone, and that’s OK. People who prefer to rely on asyncio’s single-threading requirement should be free to do so. But I think that a virtual threading API gives Python some real opportunities to simplify the number of concepts that a beginner needs to learn in order to do concurrent and parallel work, and for much of the work that I do, it really hits the right middle ground for why I reach for Python instead of something that needs more semantic declaration, like Rust.

async/await isn’t bad. I think it’s very useful for writing state machines in a way that’s more approachable for many developers. If we ever figure out a good way to reasonably serialize async/await functions, that would be a killer feature, and one that virtual threads could not reasonably copy, afaict.

3 Likes

Having worked with C#, I frequently used async/await to manage asynchronous operations.
However, I encountered a function where nearly every line was prefixed with await, including the terrible ConfigureAwait(false).

The function included:

  • var result = await Function(await HttpClient.PostAsync(url).ConfigureAwait(false), await Database.QueryAsync(connstr).ConfigureAwait(false))
  • await foreach (var item in GetNumbersAsync()) {await ProcessAsync(item).ConfigureAwait(false);}
  • await using (var resource = new AsyncDisposableResource()) { await resource.DoWorkAsync().ConfigureAwait(false); }

When the asynchronous portions of a function outweigh the synchronous ones, the abundance of await statements can become visually cluttered.
This led me to ponder: why doesn’t the language default to automatically awaiting asynchronous functions, except when concurrent execution is explicitly desired?
(I understand that the await operator is used to transform the method into a state machine.)

This realization highlighted why Go’s syntax feels more concurrency-friendly.

3 Likes

I think that, even more subtly, even people who use threads might only be using them for low-level capabilities and end up using async/await for their high-level code anyways. That is to say, people aren’t going to choose one or the other.

I’ve seen this in Rust, where people are “fine” using structured concurrency for the most part, but then reach for the threading capabilities when hitting some “very soft” realtime requirements (or otherwise want to get some resolution to their problems without having to have that problem touch their entire codebase).

There are definitely people who would chomp at the bit to “just” use threads for workload distribution across the board and ignore structured concurrency completely. But there are loads of people where it’s more “I want to do the ‘right thing’ most of the time except here”.

Because that brings you right back to “anything could be a context switch”. The entire point of async/await is that context switching can ONLY happen at a syntactically-clear marker, namely await [1]. Making them implicit just means that everything’s an await point.


  1. or, in Python, constructs like async for ↩︎

I don’t see how this follows – the fact that some libraries have done a thing does not, to me, imply that all libraries are forced to do it. There’s a whole range of different approaches to async that various libraries have taken, and I don’t think code generation is anywhere near being the most common one. Do you have some statistics (say, a survey of popular PyPI packages or similar) to back up your claims about it?

1 Like

I used to be of the exact same opinion as you. If you look upthread, you’ll see my first post and a short description of my asyncio work. I’m invested in asyncio.

The thing that changed my mind is free threading. I cannot think of a multithreaded asyncio event loop implementation, running multiple tasks in parallel on different OS threads, that can still maintain the asyncio invariant (no task switches except at suspension points). I think the best asyncio can do in a free-threaded world is run N independent event loops in N threads, sharing the interpreter and possibly the pool of threads used by asyncio.to_thread. This is a very minor gain compared to just running N independent processes, each with its own event loop, like we do today.

2 Likes

For the moment, at least, I’m allowed to write single-threaded Python programs.

If you want to write multi-threaded Python programs you’re free to do so. And if you want to try to mix threads and async/await in a single program, well, you can do that, too, though I’d advise against it.

In general I think any given program should pick one approach to “have a process containing a pool of thingies (important technical term) which can all suspend and resume their execution”, and then stick to that choice, because trying to mix multiple approaches in the same program leads to sorrow. The fact that it leads to sorrow is not due to inherent shortcomings of any particular approach, but due to the inherent shortcomings of trying to mix them.

Anyway. It’s not a bad thing that Python offers you the ability to choose which approach you like, and my goal in continued posting here is pushing back on people who are either outright saying or seemingly implying that a goal of virtual threads would be to obsolete and eventually get rid of async/await, because I personally see async/await as quite good and useful even if they don’t.

(though I still don’t see how virtual threads offer the same clarity async/await does about suspend/resume points, which was where I first entered this discussion)

1 Like

I totally agree that it doesn’t. Someone pointed out that OS threads technically have “well-defined” suspend points as well (syscalls), but they’re so hidden that they’re irrelevant to the code author most of the time.

Someone else pointed out that awaits can get pretty infectious and their sheer quantity can really make them cumbersome. I agree with that too, and it’s why I’m personally so enthusiastic about virtual threads. But if async/await and its trade-offs are working for you and feel good to you, that’s great. Digging into async/await has taught me a lot about how things work, even if I think many programmers shouldn’t need to know about those things.

I mean this earnestly: if you want to pay only a small performance cost on the sync flow to support your async API, what approaches have you seen libraries take?

What I have seen done (purely anecdotal):

  • new libraries that just make “everything” async def, and don’t have sync APIs. At least then people subclassing your library class can use async libs!
  • libraries that do codegen (psycopg3 being my canonical example, where the sync variant is generated from the async variant)
  • hard splits of I/O from state machine logic (see wsproto) where you really embrace “split logic from I/O”, with all the API consequences that entails (and we don’t have monads to make command pattern code look imperative! We do have generators of course…)
  • The “async version” of a previously sync library, which is just a different lib (often written by different people!), where someone took the original lib and replaced def with async def.
  • People reaching for gevent and friends (though it’s not clear to me if that actually resolves the problem)
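As a concrete illustration of the “split logic from I/O” bullet, a sans-I/O core (in the spirit of wsproto; LineProtocol here is a made-up toy, not a real library) keeps the protocol logic free of I/O so both sync and async shells can drive it:

```python
class LineProtocol:
    """Toy sans-I/O core: no sockets, no awaits -- just bytes in, events out."""
    def __init__(self):
        self._buf = b""

    def receive_data(self, data: bytes) -> list[bytes]:
        # Accumulate bytes and emit complete lines as "events".
        self._buf += data
        lines = []
        while b"\n" in self._buf:
            line, _, self._buf = self._buf.partition(b"\n")
            lines.append(line)
        return lines

# Any shell -- blocking sockets, asyncio, trio -- feeds it whatever it read:
proto = LineProtocol()
print(proto.receive_data(b"hello\nwor"))  # [b'hello']
print(proto.receive_data(b"ld\n"))        # [b'world']
```

The cost, as noted, is that the library’s public API becomes “feed me bytes, I give you events”, which is a very different shape from an imperative one.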

ranty non-facts now:

When I’ve discussed this stuff with people, nobody seems to have a satisfactory answer. In one discussion here, codegen plus “functional core, imperative shell” was mentioned. Rust, when faced with a similar problem, started an initiative for “maybe_async” support (which has made some progress, but not much, and their problems are slightly different).

Javascript has “solved” the problem by only doing async APIs for things (on the “nothing should ever block” principle).

Like if someone has a good inventory of clean ways to support an imperative API that doesn’t boil down to “spin up an event loop and use that to call into the async version” then they should give a talk on it, because I think a lot of lib developers would be interested, not just Python people!

Or maybe that’s just the answer and it’s actually performant enough to not matter. But even then, asyncio’s lack of re-entrancy means that you run into the color problem if within an async call you go back to sync world and then try to go back into async world. This is a problem for the frameworks because frameworks get called, but then call back into user code.
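Here is what that facade and its failure mode look like (a minimal sketch; query_async and query are hypothetical names standing in for a library’s real API):

```python
import asyncio

async def query_async(sql: str) -> str:
    # the single "real" implementation
    await asyncio.sleep(0)
    return f"rows for {sql!r}"

def query(sql: str) -> str:
    # sync facade: spin up an event loop just for this call
    return asyncio.run(query_async(sql))

print(query("SELECT 1"))  # works fine from plain sync code

async def framework_callback():
    # ...but call the sync facade from inside a running loop and the
    # color problem is back: asyncio.run() refuses to nest.
    try:
        query("SELECT 1")
    except RuntimeError as exc:
        print(f"RuntimeError: {exc}")

asyncio.run(framework_callback())
```

So the facade works for leaf calls from sync code, but breaks exactly in the framework-calls-user-code-calls-framework case described above.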

Usual disclaimer with my rants: I might be missing something extremely obvious. I am definitely a bit in the weeds after having looked at Django’s issues in particular on this topic for a while.

Fully agree.