Add Virtual Threads to Python

I think people are missing @ryanhiebert’s broader point. He’s not saying that threads are easier than async/await. He’s saying that learners are often faced with dealing with async/await even when they are not doing anything concurrent at all. Here are the examples on the front pages of two major HTTP libraries:

import aiohttp
import asyncio

async def main():

    async with aiohttp.ClientSession() as session:
        async with session.get('http://python.org') as response:

            print("Status:", response.status)
            print("Content-type:", response.headers['content-type'])

            html = await response.text()
            print("Body:", html[:15], "...")

asyncio.run(main())

>>> import httpx
>>> r = httpx.get('https://www.example.org/')
>>> r
<Response [200 OK]>
>>> r.status_code
200
>>> r.headers['content-type']
'text/html; charset=UTF-8'
>>> r.text
'<!doctype html>\n<html>\n<head>\n<title>Example Domain</title>...'

A student using aiohttp has no choice but to deal with concurrency and function coloring. A student using httpx can simply focus on the procedure. Of course, the httpx devs also have to support asyncio, so some portion of their effort is dedicated to maintaining two APIs.

2 Likes

With respect, I understood that just fine. I just don’t agree.

If you could teleport into a parallel universe where Python never had async/await and always had either free threading or a built-in lightweight thread-like construct with similar semantics, you would find plenty of cases where people wrote code they thought was fine, only to discover it wasn’t: some other code, which they didn’t notice or didn’t know about, decided to do threading with concurrent access to a resource that turned out not to be thread-safe.

So let’s take the HTTP client example, since everyone seems to like it: imagine that in this parallel universe I’m showing someone how to write a script that takes a data structure containing a bunch of URLs and turns it into a data structure containing a bunch of responses (or results of processing the responses). Even if the developer-facing public API is as simple as get_batch(urls, response_hook), I don’t get to ignore threading! I have to worry about the thread-safety of the data structure I use to pass in the URLs, because it’s not obvious from that signature whether the author of get_batch() worried about it. I have to worry about the thread-safety of my response hook and any data structures it works with. I never get to just gloss over these problems and pretend they don’t exist.
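
To make that concrete, here is a minimal sketch of the hazard (get_batch() is the hypothetical API from above; the hook, the response attributes, and the data structures are invented for illustration):

import threading

results = {}   # shared with whatever threads get_batch() may use
failures = []

def response_hook(url, response):
    # If get_batch() invokes hooks from multiple threads, these operations
    # on shared structures can interleave; nothing in the signature says.
    if response.ok:
        results[url] = response.text
    else:
        failures.append(url)

hook_lock = threading.Lock()

def defensive_hook(url, response):
    with hook_lock:  # defensive: assume the worst about get_batch()'s threading
        response_hook(url, response)

# get_batch(urls, defensive_hook)  # hypothetical API from the post above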

Doing batched URL fetching and response processing in a thread-safe way is actually not trivial! In a free-threaded, or virtual-threaded, or green-threaded, or whatever-threaded world there is no escaping thread-safety concerns, even for this “simple” example. So no, it is not automatically easier to teach or to learn than an async/await version: either way you have to teach the problems that can come from concurrent access to a data structure or resource. But in the async/await world there is at least a syntactic indicator of where this might happen, while in the threading world there is no indicator whatsoever (in Python; as mentioned previously, some languages do have syntax to mark threading-sensitive functions/methods/code blocks).

Have you noticed that Armin mentions Scratch in Playground Wisdom: Threads Beat Async/Await | Armin Ronacher's Thoughts and Writings? Have you noticed that the whole thing is multitasking all the time without anybody (especially the kids using it) noticing? That’s the point, I believe: we need far better ways to make these things palatable for ordinary users. I don’t think the discussion should be so much about threading versus event loops (and yes, many present it as such, and that’s a mistake IMHO) as about different ways to make multitasking useful and easy to use. What’s going on in the background (multithreading, multiprocessing, or an event loop) should IMHO be secondary.

1 Like
  1. Library maintainers are supposed to know more about what they are doing. It’s end users who really matter, and toward whom ease of use should be oriented.
  2. End users doing sequential things fortunately don’t need to know anything about threads. Users of aio* libraries do need to know a lot about multitasking even when they don’t need it.
2 Likes

Yes, we are off-topic here, so I will just note that if you say “semaphore”, then you completely missed my point. I was talking about ways to hide multitasking complexities behind some better API. When you pull out the most internal technicality of the multitasking world possible (semaphores), you are going straight against what I meant.

Scratch does an enormous amount of hidden work to pull that off, and mostly is able to get away with it because it’s a visual programming environment with a preset catalog of “blocks” whose implementations are designed to be safe for the concurrency model in use.

Building something with the broader expressiveness of a language like Python and the safety guarantees of Scratch is not a trivial or easy task. Again I’ll point to Rust as an instructive example: it provides a lot of safety guarantees and a lot of freedom and expressiveness to the developer, at the cost of a learning experience which is not generally considered simple or easy.

You could build a language where every variable, every data structure, every resource secretly does locking and synchronization behind the scenes, I suppose, but that feels likely to wind up with performance issues on par with early garbage-collected runtimes. I don’t think we’re at the point yet where things like network daemons could be written in such a language with acceptable performance. And even if you did that, it would not look like the threading interfaces people are asking for.

1 Like

I haven’t noticed. Have you ever written a web server in Scratch, and if so, how many transactions per second did it achieve?

If those questions make no sense even to ask, perhaps that’s a key difference between Python and Scratch.

Who cares? Do they care in Java land (I haven’t seen it come up on /r/java)? I don’t care most of the time when I code with gevent.

Feel free to provide as complex an example as you like.

Don’t allow virtual threads to move between physical threads, and provide the ability to “spawn a virtual thread in the current physical thread”.

Then you have two types of objects: global ones, which can be changed/seen by any thread and need locking, and physical-thread-local ones, which can only be accessed by virtual threads of the same physical thread, where you don’t need locking most of the time.
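
A hypothetical sketch of that split (the vthreads module and its spawn functions are invented names, not an existing API):

# Hypothetical API: "vthreads" and its functions are invented for illustration.
request_stats = {}  # intended to be physical-thread-local: no locking needed
                    # if only virtual threads pinned to this OS thread touch it

def record(request_id):
    # Safe without a lock under this model: switches are cooperative and
    # every accessor runs on the same physical thread.
    request_stats[request_id] = request_stats.get(request_id, 0) + 1

# vthreads.spawn_local(record, 42)  # pinned to the current physical thread
# vthreads.spawn(record, 42)        # may migrate; shared data would need locks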

1 Like

Do you ever care about whether there are switch points? If not, then yay, you have threading. Python already has this as an option.

I’m still very confused as to what “virtual threads” are. They seem to be … just threads.

1 Like

I try to split tasks so I don’t have to care most of the time. Tasks are either completely separate (a task per request) or have a “merge” operation at the end.

Threads have overhead, and I will need to add locking (especially with free-threading, where two threads can execute at the same time). I also can’t mimic “keep these threads on one vcore so I know this variable will only be accessed by one green thread at all times, because it’s local, and I don’t need to add locking” (kind of like shared-nothing in the Seastar framework).

They are green threads with less overhead than normal threads and automatic suspension on blocking calls, so the code stays synchronous. You can spawn 10K+ threads (I’ve spawned 5K+ on a crawl job with gevent).
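
A minimal sketch of that pattern (assuming gevent is installed; monkey-patching makes blocking stdlib calls cooperative):

from gevent import monkey
monkey.patch_all()  # patch stdlib sockets before other imports use them

import gevent
import urllib.request

def fetch(url):
    # Blocks only this greenlet; the other greenlets keep running.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return url, resp.status

urls = ['http://example.org'] * 100  # scale to thousands in practice
jobs = [gevent.spawn(fetch, u) for u in urls]
gevent.joinall(jobs, timeout=60)
print(sum(1 for j in jobs if j.value is not None), 'completed')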

At the (rare?) points when you have to care about switch points, you do locking.

I think that’s mostly true. There are APIs that could be better, but they apply to real threads as well. Historically, I think there has been overhead, and there have been OS limitations, around spinning up real threads that an N:M threading model can help alleviate. It’s not clear to me what those limits are.

If I’m thinking about it right (which I’m not at all sure I am), Linux will time-share based on the process (and threads are lightweight processes), so if you use a zillion threads then Linux will give you a zillion timeshares, whereas an N:M system lets you tune how many total OS threads your program uses and thereby limit how your program affects other programs on the same machine.

There are also some problems whose solutions might be a little more obvious to find if the interpreter owned the loop and threading model; in particular, unifying all the various ways you might wait for something.

async/await is able to have the awesome asyncio.gather and asyncio.wait functions because it uses futures/promises/tasks to unify the different ways of waiting. Users can do that without async/await using concurrent.futures.wait, but it still requires user management of futures. Go has a blessed concurrency primitive (the channel) that allows for a language-level “wait on the first ready” ability (select) and can avoid promises entirely.
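
For example, the non-async/await version looks roughly like this, with the Future objects created and tracked by hand:

import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def slow_square(n):
    time.sleep(n)  # stand-in for real work
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    pending = {pool.submit(slow_square, n) for n in (3, 1, 2)}
    while pending:
        # Wake as soon as any one future finishes, similar to a select().
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for fut in done:
            print('finished:', fut.result())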

Virtual threads in the interpreter seem like they might give a good place to unify how to wait on things (as tasks and futures do currently), but at a lower level, so that users might not usually need to know they exist, whereas colored functions do require that knowledge. I acknowledge you might not need virtual threads to do that, but I do wonder whether it almost ends up happening out of necessity when implementing virtual threads.

Semaphores are a pretty basic and fundamental concept. If someone can’t use them appropriately, they are in for a hard time.

You’re right that in the particular case of simultaneous HTTP connections they could be the wrong tool, but the executor is definitely the wrong place to limit them. If you care about the number of simultaneous connections, and aren’t making the lowest-level requests yourself and managing them, you should explicitly limit that in your HTTP library of choice[1], not elsewhere. Otherwise, it’s possible for an HTTP library to decide to spawn its own worker threads/tasks to try to do more at once, or to keep connections alive longer than you think it will for potential session reuse[2].
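
For instance, both libraries in the footnote expose connection limits directly (a sketch; check your version’s documentation for the exact parameters):

import httpx

# Cap the connection pool itself rather than an executor around it.
limits = httpx.Limits(max_connections=10, max_keepalive_connections=5)
with httpx.Client(limits=limits) as client:
    print(client.get('https://www.example.org/').status_code)

# aiohttp equivalent: pass aiohttp.TCPConnector(limit=10) to ClientSession.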

These kinds of issues, which people already get wrong while believing they have properly limited something by limiting something else, worry me when it comes to the future with free-threading, and when it comes to determining whether people’s concerns about future directions for concurrency in Python are best handled by particular APIs and concurrency models.

People want something that “looks easier”, but they inadvertently scope multiple things incorrectly or limit resource use incorrectly with the current tools we have, and I don’t think trying to hide more of the details helps people in this case.


  1. e.g. httpx or aiohttp ↩︎

  2. an example of this likely to come up with some people writing it with threading is found in requests’ use of urllib3, which keeps an internal connection pool and keeps connections alive. ↩︎

1 Like

This is a strange argument. I have written thousands of scripts without using a single lock or await. You can write an entire data pipeline that way. If calling get_batch() requires you to worry about threading then that is a terrible API. If a library uses threads internally, that is the library’s problem. It’s why we advertise whether classes are thread-safe, or not. async/await will not magically make threads safe for you. All of the “concerns” that you mention are exactly as likely whether you await or not.

Here is MIT’s Intro to CS course curriculum. Where are threads? Where are continuations? This is how it’s been done for decades. Why are you now trying to claim that we need to put this front and center for every student?

There seems to be an idea in this thread that if we just use await everywhere, everything will be safe. Cooperative scheduling does not mean you don’t have to worry about locks anymore. You just don’t have to worry about them as often.
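
A small sketch of why: a single await in the middle of a read-modify-write reintroduces the race, so asyncio.Lock is still needed.

import asyncio

balance = 100
lock = asyncio.Lock()

async def withdraw(amount):
    global balance
    async with lock:            # without this, the await below races
        current = balance
        await asyncio.sleep(0)  # yields to other tasks mid-update
        balance = current - amount

async def main():
    await asyncio.gather(withdraw(30), withdraw(30))
    print(balance)  # 40 with the lock; could be 70 without it

asyncio.run(main())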

Threads are not going anywhere. They are a perfectly reasonable solution to a far broader range of problems than continuations will ever provide. In fact, we’re about to get a whole lot more threading in Python.

1 Like

If they’re just “threads but more efficient”, then this isn’t something to add to Python. It could very well be a completely invisible change to the way that threads already work. But somehow I can hear, off in the distance, a horde of people screaming “GIL! GIL! GIL! GIL!” and refusing to believe that Python can do threading. To an extent, they’re correct. If you want to magically have “threads but more efficient”, what you need is a way to guarantee that they aren’t going to trample on each other’s data, and that means you need locks. Locks are inefficient; a single broad lock is therefore much cheaper than a huge number of locks; and so we get the highly efficient global lock. But any given lock can only be held by one thread - virtual or otherwise - at a time. So we’re right back where we started. Maybe with less overhead, but right back where we started with concurrency.

Hmm. It depends somewhat on what “blocking calls” means. Either that’s literally just threads, or you’re going to run into a fundamental problem with gethostbyname(). How do you deal with slow system calls if not by using OS threads? Can you spawn 5K threads and have them all simultaneously inside a call to gethostbyname()?

1 Like

Virtual threads are just like asyncio coroutines without the explicit async/await syntax and with a thread-like API. In that sense, they are already in the language. This thread is suggesting that the virtual thread interface is a better way to work with them.

For any question, just look at what gevent does. It’s been around for 15 years and has most of the answers. For CPU-heavy tasks where I can’t “yield” in the middle of the computation, I use native threads.

The gevent.socket.gethostbyname() function has the same interface as the standard socket.gethostbyname() but it does not block the whole interpreter and thus lets the other greenlets proceed with their requests unhindered.

I have? What can the issue be here? gethostbyname() can have multiple levels of cache (one in-process, one in the OS; I haven’t seen the internals), and you protect against a cache stampede with a dogpile lock keyed on the domain name. You spawn them all, they all have a cache miss, the OS makes one DNS request, and they all get it from the cache the next time they resume.
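
A minimal sketch (requires gevent): after monkey-patching, the standard socket.gethostbyname() suspends only the calling greenlet, so thousands of lookups can be in flight at once.

from gevent import monkey
monkey.patch_all()

import socket
import gevent

jobs = [gevent.spawn(socket.gethostbyname, 'example.org') for _ in range(5000)]
gevent.joinall(jobs, timeout=30)
print({j.value for j in jobs if j.value})  # the resolved address(es)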

Blocking means every call outside the process where the CPU will wait: DB, syscall, subprocess, HTTP, DNS, socket, filesystem, etc.

3 Likes

But if that’s all they are, then (a) they can context-switch between any Python bytecodes, which means they’re NOT like async/await; and (b) they can NOT context-switch in the middle of a slow system call like gethostbyname. So they’re worse than threads AND worse than async/await.

When you use async/await, you are guaranteed to not context switch until you explicitly choose to. This is efficient and easy to get your head around. It means that x[y] += 1 is safe to do and will always add one to the value atomically. So you’re losing that, and gaining very little.

They switch at IO and yields, just like asyncio does. It’s not explicit, but it’s just as safe.

1 Like

This isn’t true in the context of what you replied to. Asyncio code can often protect against data races via control flow (by not yielding in a section that requires at most one task to be doing something); the equivalent threading code, especially in the example case of incrementing a shared value, needs locks (or an atomic integer type, which the standard library doesn’t provide).
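
A sketch of that contrast on a shared counter: the asyncio version is safe only because there is no await between the read and the write, while the threaded version needs the lock.

import asyncio
import threading

counts = {'hits': 0}

async def task_increment():
    counts['hits'] += 1  # atomic w.r.t. other tasks: no await in here

async def run_tasks():
    await asyncio.gather(*(task_increment() for _ in range(1000)))

asyncio.run(run_tasks())

lock = threading.Lock()

def thread_increment():
    with lock:  # without this, the read-modify-write can interleave
        counts['hits'] += 1

threads = [threading.Thread(target=thread_increment) for _ in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts['hits'])  # 2000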

The OP explicitly mentions Java, which has recently standardized virtual threads (finalized in Java 21). So if you are still confused about virtual threads, you can start there.

Another useful reference would be how goroutines work in Go.

1 Like