Add Virtual Threads to Python

Since dothesum() is a function call, you already have that problem, though. That function could be replaced with a wrapper that has an await point in it. My point is that, by allowing ANY PYTHON FUNCTION to await, you get right back to the problem of not knowing where your await points are, so that if you truly need to guarantee consistency, you need locks.
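
To make the concern concrete, here is a minimal asyncio sketch (the helper audit_log is invented for illustration) of how an await point hidden inside an ordinary-looking function call breaks a read-modify-write that the caller assumed was atomic:

```python
import asyncio

counter = 0

async def audit_log(message):
    # Invented helper: looks like an ordinary function call from the
    # caller's side, but it awaits internally, so calling it is a
    # context-switch point.
    await asyncio.sleep(0)

async def increment():
    global counter
    value = counter           # read
    await audit_log("inc")    # hidden await point: other tasks run here
    counter = value + 1       # write back a now-stale value

async def main():
    await asyncio.gather(*(increment() for _ in range(10)))
    return counter

print(asyncio.run(main()))  # 1, not 10: every task read 0 before any wrote
```

With the context switch hidden inside the call, every task reads the counter before any of them writes it back, so ten increments produce 1. This is exactly why “await anywhere” pushes you back toward locks.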

I said “thread pool”, not “one thread for every task”.

You can have as many threads as you want concurrency for, and then you farm tasks off to them all.
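
For what it’s worth, the pool-plus-farming pattern being described is a stdlib one-liner today; work below is just a stand-in for real blocking work:

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # Stand-in for blocking work, e.g. a network request or disk read.
    return n * n

# A fixed-size pool: as many threads as you want concurrency for,
# with tasks farmed out across all of them.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(work, range(100)))

print(sum(results))  # 328350
```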

Just a question: in Java, virtual threads are always daemons. AFAIK Python wants to remove support for daemon threads. Maybe the fact that virtual threads will be daemon threads won’t be a problem, or perhaps they simply won’t be daemons?

For me, Virtual Threads are worth doing if they solve the following problems, at minimum:

  1. Ability to use existing “blocking” code AS IS, including 3rd-party packages with extensions, supporting callbacks to python, back to extensions, back to python, and so on. This has long been the biggest limitation of asyncio - there are many packages that don’t have asyncio twins - and such multi-color development overhead is truly unwarranted because Java’s Virtual Threads support has already proven that it’s both possible and practical to abstract all that inside the runtime layer. I would much rather that developers use their creativity and precious time to invent and produce more useful features than do busy work porting between different frameworks. There is far more python-related code being written outside of the Python interpreter than in the interpreter itself, so the interpreter is the most economical place to abstract the differences between thread-based preemptive multitasking and Virtual Threads (cooperative multitasking).
  2. Ability to create thousands, hundreds of thousands, even millions of Virtual Threads - for concurrency, which goes without saying
  3. Ability to use I/O objects - e.g. sockets - created by a Virtual Thread executing inside one preemptive thread from a Virtual Thread executing inside another preemptive thread.
  4. Ability for Virtual Threads executing in one preemptive thread to communicate with Virtual Threads executing in another preemptive thread; e.g. via message queues.

I used gevent in the past - huge shoutout to its maintainer(s)! - but when it comes to integration of 3rd-party packages via gevent monkey-patching, this approach is a non-starter for production grade applications in my organization (your experience may vary).

Virtual Threads is the way forward.

How do you plan to achieve both of these? You can ask for the moon, but unless you have a plan, all you’ll get is a lump of cheese.

Just curious, does this mean these problems aren’t solvable in Python right now, or are they more like pain points?

@Rosuav - it’s what Java’s Virtual Threads do, which demonstrates that it’s doable.

  1. When executing code in a Virtual Thread, the interpreter can map operations like synchronization primitives to Virtual Thread-compatible synchronization primitives, I/O operations to the event loop-based I/O operations (including Virtual Thread context switch), etc.
  2. Virtual Threads are supposed to be lightweight, so you should be able to have lots and lots of them. That’s the whole point of Virtual Threads and their application to concurrency.

Oh, so they can still context switch at any point? In that case, yes, it’s doable, but it means that anyone using virtual threads has to think about the possibility of a context switch anywhere, and so not ALL existing synchronous code will be safe. In order to use existing blocking code as is, which was one of your stated goals, you need to ensure that context switches only happen at well-defined points (which is what async/await does), or alternatively, if you want to have context switches happen anywhere, you need to go through all blocking code and ensure that it’s thread-safe.
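
A small sketch of that last point: code that shares state across preemptive context switches needs a lock around its read-modify-write sections, whether the threads are real or virtual (the worker here is illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        # Without the lock, this read-modify-write could interleave with
        # other threads and lose updates (more likely on free-threaded
        # builds, but the code is incorrect without it either way).
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: deterministic because of the lock
```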


OK. Does anyone have a viable implementation for Python, then? Because if it’s doable, as you say, then I doubt anyone would object to having the capabilities you describe - so the big questions are likely to be:

  • Is it implementable on all the platforms Python supports?
  • Is the implementation maintainable?
  • Are there practical limitations that make it less attractive than it seems in theory?

All of those would need a real implementation to evaluate.

We can debate the theory here endlessly. And indeed, that’s what we’ve been doing so far. Everyone has their own opinions, but to make any realistic progress, someone is going to have to come up with working code, if only to answer the sorts of questions I ask above.

I’m not personally convinced that your argument “Java has it, so it must be possible” is correct. After all, Java is famed for its “Pure Java” drive to reimplement everything in Java. Python doesn’t do that - we explicitly support extensions written in C, Rust or any other low level language you might prefer. So how would a virtual thread map an operation like a SQLite database write, or a Numpy vectorised array operation, onto a “virtual thread compatible” version? But regardless of whether I agree or disagree with your assertion, you can convince me easily by providing a working implementation. Are you able to do that?

@elis.byberi, I described in my initial post what I consider to be the core problem that asyncio didn’t solve. This discussion thread started with

Java has virtual threads. Virtual threads are a better way of doing concurrency than Python’s async and await. We should add virtual threads to Python.

It went on to call out the multi-color function issue in asyncio as a primary concern because it forces all 3rd party packages to have twin implementations - one for blocking and another for asyncio. Developing and maintaining twin implementations is simply not practical, and a terrible burden for developers, considering that a huge amount of 3rd party package work is done by unpaid volunteers who work full time jobs. As a result, it’s not uncommon to run into great, mature packages that don’t have an asyncio “twin”.
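
For readers unfamiliar with the problem, here is a tiny invented example of what “twin implementations” means in practice; read_config is not a real library function, just a stand-in:

```python
import asyncio

# Invented example: `read_config` stands in for any library function that
# today needs two implementations, one per "color".

def read_config(path):
    # Blocking version: usable from ordinary threaded code.
    with open(path) as f:
        return f.read()

async def read_config_async(path):
    # asyncio "twin" of the same logic. Here it cheats by delegating to a
    # worker thread; many real packages instead maintain a full parallel
    # implementation on top of async I/O.
    return await asyncio.to_thread(read_config, path)
```

Multiply that duplication across an entire package API and you get the maintenance burden described above.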

The solutions out in the wild that rely on monkey-patching (along with greenlet) to make blocking code run with asyncio context switching almost always carry a disclaimer in their documentation along the lines of “your mileage may vary”. So, you can’t rely on something like that in a production-grade application.

Just to be clear - my point was that Java showed that Virtual Threads are doable. I didn’t claim that it’s trivial. Java Virtual Threads were developed as part of Project Loom over a period of several years, beginning around 2017 and becoming a final feature in Java 21 in September 2023.

Java Virtual Threads support native code via JNI and most other Java extension mechanisms, likely with minor limitations, so it’s not just “Pure Java” either.


And you would get the exact disclaimer the moment you use any of these with virtual threads. The reason for the disclaimer is that it’s impossible to know what assumptions the code is making, so there could be something that breaks due to a context switch. Virtual threads do the exact same thing. There’s no such thing as a free lunch.

Context Switching: The premise of Virtual Threads - which are lightweight abstractions, not real OS threads - is that they context switch when invoking well-defined APIs that would or could block, such as socket I/O, mutexes, semaphores, time.sleep(), etc. They work a lot like asyncio tasks in this sense, but without various asyncio limitations, so the same code that uses the Python API can run in a real OS thread or in a Virtual Thread.
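
The mapping being described can be seen from the other direction in today’s asyncio, where a blocking call that is not rerouted through the scheduler stalls every task on the loop (a small demonstration):

```python
import asyncio
import time

events = []

async def ticker():
    for _ in range(3):
        await asyncio.sleep(0.01)
        events.append("tick")

async def blocker():
    # A blocking call that is NOT mapped to the scheduler: the whole event
    # loop stalls here, so no ticks can fire until it returns. A virtual
    # thread runtime would have to reroute calls like this internally.
    time.sleep(0.05)
    events.append("blocker done")

async def main():
    await asyncio.gather(blocker(), ticker())

asyncio.run(main())
print(events)  # ['blocker done', 'tick', 'tick', 'tick']
```

The ticker wants to fire every 10 ms, yet nothing happens until the 50 ms blocking call finishes; a Virtual Thread runtime would need to intercept time.sleep() (and socket I/O, locks, etc.) so the scheduler can run other Virtual Threads in the meantime.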

Race Conditions: Virtual Threads don’t introduce any new race conditions. If an app executes blocking code (its own and/or a 3rd-party package’s) in multiple OS threads, it needs to protect shared resources with the appropriate synchronization primitives. Virtual Threads map all those synchronization primitives to corresponding Virtual Thread primitives - so, if that code is sufficiently protected against race conditions when running in real OS threads, then it will also be sufficiently protected when running in Virtual Threads.

Right. So, virtual threads require that your code be thread-safe. And if your code is already thread-safe, then monkeypatching and greenlet will work with it too.

So what you’re saying is… what? That I should wait 6 years and then if someone has bothered to work on this feature I’ll have my answer? Or that I should support this proposal now and encourage the Python community to invest that sort of effort with no idea if it will deliver on its claimed benefits?

At this point I don’t actually have any clue what you’re proposing. Or even if you’re proposing anything beyond “it would be nice if someone implemented virtual threads in a way that satisfied the requirements I stated”. Which is true, I guess (it would be nice) but irrelevant (because there’s no sign that anyone will).


This is the one concern I’ve seen that I have real agreement with, but virtual threads as proposed wouldn’t solve it, just create a third function color that looks somewhat like an existing one. (It has different properties from existing thread-naive synchronous code)

It would also still require explicit synchronization points.

I mentioned previously that I viewed solving this as orthogonal to adding another concurrency model earlier, and months later, only feel stronger about that.

The work other languages are doing to add this is also non-trivial, though having followed it with interest, to me it seems that Zig’s approach of passing an io interface is possible for libraries to do today, with some slight differences that would be ergonomically worse in Python, and that something more similar to Rust’s keyword generics initiative would be an ergonomic win.

Macros (PEP 638) could also help here.

This only scales well for things implemented by or written in Java, and only ends up feeling reasonably ergonomic for developers because of Java’s synchronized. It took years to add to Java. It would be more complicated to add to Python, given the differences between the languages and between the existing ecosystems and use cases at the time it was proposed to each.

This is already possible in existing concurrency models available in python, so I’m not sure why you’re mentioning it here.

There is an issue here. Blocking code is not designed to be thread-safe and therefore cannot be safely used in any type of threads, including OS threads or virtual threads. I believe this point has already been raised multiple times by several participants, and I’m not sure how to phrase it differently. Introducing virtual threads would add another function color, turning a two-function-color problem into a three-function-color problem.

Comparing virtual threads to asyncio is misleading, as they address completely different problems.


Self-Answer: Yes, all of these are possible with OS threads. The only limitation is memory usage.
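
To back up the self-answer: requirement 4 in particular (message queues between threads) is a stdlib pattern already; a minimal sketch with OS threads:

```python
import threading
import queue

q = queue.Queue()   # thread-safe; works between any OS threads
results = []

def producer():
    for i in range(5):
        q.put(i)
    q.put(None)     # sentinel: tell the consumer to stop

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 10)

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()

print(results)  # [0, 10, 20, 30, 40]
```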

From my reading of the recent posts this seems to be what people want:

  • get rid of function colouring
  • VirtualThreads seem (correct me if I’ve got this wrong) to be cooperatively scheduled within a CPU thread, and preemptively scheduled across multiple CPU threads, and this behaviour is desired
  • python thread synchronisation primitives which are VirtualThread compatible
  • whatever is done should maximise compatibility with existing libraries
  • whatever is done should keep cross- and future-platform maintenance down

Two approaches already exist, both of which tackle the function colouring issue:

greenlets provides a thread-safe coroutine framework. Switches between coroutines are achieved by copying sections of C stack to and from the heap, and assembler fragments.

jonathanroach/cpython-await-anywhere adapts the asyncio system to avoid the need for await myfunc() and async def and other async primitives. Switches between coroutines (asyncio.Tasks) are achieved by managing the C stack as a heap, and setjmp() & longjmp().

greenlets is well understood, with a number of client frameworks, such as gevent and eventlet, which provide plug-in alternative, greenlet-aware synchronisation and communication classes.

jonathanroach/cpython-await-anywhere, as it is asyncio adapted, retains all the support of libraries for asyncio, and all the tools built into asyncio.

However, as far as I know, greenlet hasn’t gained significant traction with the higher level libraries, such as Django. Django has gone in the direction of asyncio, probably because of ASGI.

greenlet does have the downsides of needing assembler fragments, a platform support cost, additional coroutine switch costs from the stack copying, and it is another, different system for Python users to understand.

jonathanroach/cpython-await-anywhere has the hidden limit (in the implementation at the time of writing) that coroutine stacks are confined to a thread’s C stack space; it is also new and untested, and each Task stays in one thread and uses asyncio synchronisation.

I’m happy to push forward with jonathanroach/cpython-await-anywhere to remove the stack space limit, provide thread- & Task-aware synchronisation, and multi-thread-scheduled Tasks (i.e. multi-thread asyncio) if that’s what people want.

This feature set, I think, would give VirtualThread functionality to asyncio, not another new system, high compatibility with existing libraries, and low/no platform support overhead.

What are people’s thoughts?

Personally, I understand the asyncio and threading models reasonably well. With asyncio, context switches only happen at await points, and functions are coloured. With threading, context switches can happen anywhere, and functions are not coloured. Both of these seem like clear trade-offs between predictability and convenience, and feel like they are at opposite ends of the spectrum of choices.

On the other hand, I don’t understand the mental model for virtual threads. It seems to be getting presented as a “best of both worlds” solution, but it’s not clear how it would work in practice. No function colouring, fine, but then how do you get predictability of where context switches can happen? If you can’t, then we simply have the threading trade-off, so how is this better than threading? Is it just intended to be “threads, but with less overhead so you can create thousands of threads without worrying about resource limits”? If so, then doesn’t a worker thread pool handle that use case just as well?

To put this another way, when would I use virtual threads rather than asyncio or threading (maybe via a worker pool)? What use cases don’t work with one of those models?

If virtual threads are just “threads, with less overhead”, then clearly they should be implemented, as a replacement for the existing threading module. But that’s a straw-man argument, because I don’t genuinely believe that virtual threads have no downsides compared to normal threads. So what I really want is for someone to explain to me what those downsides are, and not to handwave them away with suggestions that the downsides “don’t matter in practice”. Maybe they don’t - but I’d like to decide that for myself, please :slightly_smiling_face:


Or as a reimplementation of the existing threading module. However, I suspect that you’re right and they can’t achieve that. “Threads, but with less overhead” is a great marketing line, but if virtual threads get “stuck” (gethostbyname being a common culprit), they need to be different.

When compared with normal threads, I see these trade-offs:

  • Making and destroying them does not require (but can cooperate with) kernel threads.
  • They don’t require allocating a C stack, so they may avoid some memory overhead.
  • There are some challenges related to interoperability with the C stack.
  • When single-threaded, they can block the process because they are not preemptive.

I’m willing to accept some of these trade-offs in many cases. In constrained environments like my son’s Lego robotics controller, system threads are not an option because MicroPython doesn’t support them at all, but even if it did, the memory constraints would make stackful threads pretty limiting. Since there are no virtual threads, they are having to learn about explicit cooperation in order to do even the simplest multitasking. I find it a rough learning curve when Scratch can do concurrency without introducing function colors, but Python can’t.

For systems with more resources where threads aren’t resource-prohibitive, I think threads may often be a better go-to. But even with resources, optimizing those resources by using lightweight threads can be really helpful, and a generally good trade-off.

For some problems, taking on the complexity of cooperating with the system is worth the overhead. I don’t feel microcontrollers aimed at new learners are one of those places, except that Python just doesn’t have virtual threads, so there is little alternative; I wish there were a better option to point to.