Add Virtual Threads to Python

I don’t see why having a thread-like interface requires that they be preemptable.

But if they are to “only switch execution at well defined points in the code”, how can that be done without function coloring?

Blocking calls (blocking the virtual thread, not the OS thread) like time.sleep() must include context switching points, but time.sleep can be called like any other function.

Or is it that every function becomes red under this plan, and thus all code pays a price for the mere existence of virtual threads, even if it doesn’t use them?

Yes, but the price is almost zero.

And voilà, function coloring here too. This places a limit on which functions may call which other functions, except it’s worse than the async/await case because there’s no syntactic indication of it.
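
For comparison, here is a minimal illustration of the async/await coloring the post contrasts against. An `async def` function can only be awaited from another coroutine. Calling it from ordinary synchronous code returns a coroutine object rather than a result. The function names are hypothetical.

```python
import asyncio
import inspect


async def fetch_value() -> int:
    # "Red" function: it may suspend at the await below.
    await asyncio.sleep(0)
    return 42


def sync_caller():
    # A "blue" function cannot await; calling the coroutine function
    # only creates a coroutine object, it does not run it.
    return fetch_value()


async def async_caller() -> int:
    # Another "red" function can await it; the await marks the switch point.
    return await fetch_value()


coro = sync_caller()
print(inspect.iscoroutine(coro))  # True
coro.close()  # avoid the "coroutine was never awaited" warning
print(asyncio.run(async_caller()))  # 42
```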

I’m also not sure how you’d simultaneously:

  • have clearly defined context switches
  • not have function coloring (even excluding the interaction with async/await)
  • not have an issue with ffi

It depends on what you mean by an issue with ffi. Switching context from Python code that was called from C code, which was itself called from Python within the continuation, will raise an exception.
That is a limitation of any portable implementation of continuations without significant C API extension.
If this is deemed insufficient by the community, then we can look to implement a (complex) C API to support suspension, or incorporate what gevent does and swap the C stack.

Any two of these are accomplishable trivially

I’m glad you consider it trivial, others in this discussion seemed to be implying it would be too difficult :slight_smile:

I would find this to interact badly with the main reason I use Python: it’s a high-level language that’s quick to write correct code in, and I can accelerate specific tasks with native code. A limitation like this adds another function color, a hidden one. Rather than being syntactically obvious, it becomes something functions now have to document and users have to be aware of.

I’m also wary of combining clearly defined context switches with library code that has no syntax indicating them, unless functions that use continuations are their own function type, always callable in the same way normal functions are. That way their type at least tells the caller that they may yield. Currently, async/await covers this with syntax, and with threading as a model it is implicitly always possible via preemption.

I think function coloring actually serves a useful purpose that far outweighs its costs. Very few people want to write state machines in C for cooperative userspace scheduling; there’s a reason people use Python. And almost nobody currently writes generators with the intent of using them as coroutines, even though it is possible to do so.
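
To make the last point concrete, here is a minimal sketch of plain generators used as cooperative coroutines. A hypothetical round-robin scheduler (`run_round_robin`) stands in for a virtual-thread runtime. Each `yield` is an explicit switch point, which is exactly the coloring under discussion: only a generator can suspend, and its caller must know to drive it.

```python
from collections import deque
from typing import Generator, List

Task = Generator[None, None, None]


def worker(name: str, steps: int, log: List[str]) -> Task:
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # explicit, visible switch point


def run_round_robin(tasks: List[Task]) -> None:
    # Hypothetical scheduler: resume each task in turn until all finish.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)


log: List[str] = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```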

I’m trying to decide if I understand the difference between a virtual thread and a standard Python thread. My impression is:

  • I would only use/encounter a virtual thread in a place where I might currently use threads. If some library I’m using is using virtual threads under the hood, that’s their business.
  • Virtual threads block and switch contexts at the same places that normal threads do (e.g. I/O, waiting for a lock, etc). Under the hood this might be implemented differently but I believe this is what I would observe as a user?
  • Python running in a virtual thread can use extensions, but there is potentially some issue with that extension code calling back into Python. I’m not sure what an example of such a usage might be, but I don’t write a whole lot of extensions (and never in C).

The main difference is the plumbing–a virtual thread isn’t running in its own OS thread, and so there’s almost no limit to how many I can ask for in my application.

Is all of that accurate?

Anyone who’s interested enough in the idea to want to explore it further. That includes me – even though I invented yield-from with the express intent of using it for lightweight threading, these days I’m rather dismayed at the direction all the async/await stuff has gone in, and I’d quite like to have a viable alternative. But as always, the devil is in the details.

I agree that there’s no need for virtual threads to be preemptable. But
the conclusion I’ve come to is that if you’re not going to have
syntactic function colouring, then you shouldn’t rely on knowing which
parts of the code can switch contexts and which can’t. In other words,
you should approach things as though preemption could occur at any
time, and use the same means of dealing with it – locks, queues, etc –
that you would use for native threads.
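
Here is a minimal sketch of that advice using today’s native threads. A read-modify-write on shared state is guarded by a `threading.Lock`, so the code stays correct wherever a switch happens. Following the post, it assumes the same code would run unchanged on virtual threads. `Counter` and `add_many` are hypothetical names.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Read-modify-write is not atomic; without the lock a context
        # switch between the read and the write could lose an update.
        with self._lock:
            current = self.value
            self.value = current + 1


def add_many(counter: Counter, n: int) -> None:
    for _ in range(n):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=add_many, args=(counter, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 80000
```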

So I would say that if you’re trying to sell the idea of virtual
threads, sell them on the basis that they’re lightweight, not that the
potential context switch points are known. Because in practice, they
really aren’t.

That is my understanding too.

I expect the “hidden coloring” that Elizabeth noted will be an issue. Compiling a function with Cython or Numba shouldn’t make it unusable in some contexts.

As shown by Greenlet (born 1998 as part of Stackless Python), C stack switching is possible with relatively minor changes to CPython, though it takes some effort to adjust to the implementation details of each release, and some platform-specific code is involved.
IMO, if Python grows “virtual threads”, it should bite the bullet and integrate Greenlet/Stackless – and possibly only raise a “C function on stack” exception on platforms for which we don’t (yet) have the necessary blob of assembly.

I don’t think they are, but I’m not an expert on Go. My understanding is that the Go runtime multiplexes Goroutines onto a number of OS threads (controlled by GOMAXPROCS). It’s an example of M:N scheduling, where M goroutines are multiplexed onto N OS threads. Usually M is much larger than N.
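
As a rough illustration of M:N scheduling in general (not of how Go’s runtime actually works), the sketch below multiplexes M generator-based tasks onto N OS worker threads that share a run queue. Any worker may resume a task at its next step. `run_m_on_n` and `countdown` are hypothetical names.

```python
import queue
import threading
from typing import Dict, Generator, List, Optional

Task = Generator[None, None, None]


def countdown(name: str, n: int, results: Dict[str, int]) -> Task:
    # A lightweight task: do a bit of work, then yield to the scheduler.
    total = 0
    for i in range(n):
        total += i
        yield
    results[name] = total


def run_m_on_n(tasks: List[Task], n_threads: int) -> None:
    run_queue: "queue.Queue[Optional[Task]]" = queue.Queue()
    for task in tasks:
        run_queue.put(task)

    def worker() -> None:
        while True:
            task = run_queue.get()
            try:
                if task is None:  # shutdown sentinel
                    return
                try:
                    next(task)  # run the task until its next yield
                except StopIteration:
                    pass  # task finished
                else:
                    run_queue.put(task)  # re-queue; any worker may resume it
            finally:
                run_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_threads)]
    for w in workers:
        w.start()
    run_queue.join()  # every task has run to completion
    for _ in workers:
        run_queue.put(None)
    for w in workers:
        w.join()


results: Dict[str, int] = {}
tasks = [countdown(f"t{i}", 100, results) for i in range(1000)]  # M = 1000
run_m_on_n(tasks, n_threads=4)  # N = 4
print(len(results), results["t0"])  # 1000 4950
```

Each generator sits in the queue at most once, so no two workers ever resume the same task concurrently. Under the GIL this shows the scheduling shape only, not a parallel speedup.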

We don’t have to make them preemptible to start, but I think the main use case for virtual threads is that code will run about the same on normal threads and virtual threads, and normal threads are preemptible. So users and library authors will need to write code that expects preemption anyway. Because there’s no coloring syntax, it’ll be very hard to tell when a suspension point may be hit, so again code will need to expect it. A more complex “event loop”/runtime that does preemption can come later (or be a third-party component, like uvloop is today?), although it might need help from the runtime.

While this would probably provide a fast path to implement your idea, greenlets are written in C++, so they would have to be rewritten in C.

I can’t comment on how much added memory overhead this would incur, since this was never a problem for our use case (a server using gevent), but the overall experience of using gevent and greenlets in the project was excellent.

In our case, we did have callbacks from C to Python, so having these work in the gevent async context was important.

This is also something I find lacking in the existing Python asyncio support: it should be (easily) possible to write async code in a C extension that plays nicely with the asyncio event loop. Example: code which interfaces to a database client lib in C and performs a long-running query.

AFAIK, the only safe way to get this working today is by having the C extension run a new thread to avoid blocking the event loop, which kind of misses the point of using asyncio in the first place.
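
From the Python side, that workaround looks roughly like the sketch below. The blocking call is a hypothetical `blocking_query`, with `time.sleep` standing in for a long-running C library call. It is pushed onto a worker thread with `asyncio.to_thread` (Python 3.9+) so the event loop keeps serving other tasks meanwhile.

```python
import asyncio
import time


def blocking_query(sql: str) -> str:
    # Stand-in for a C extension call that blocks the calling OS thread.
    time.sleep(0.2)
    return f"rows for {sql!r}"


async def heartbeat(ticks: list) -> None:
    # Keeps running on the event loop while the query blocks its worker thread.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main() -> tuple:
    ticks: list = []
    rows, _ = await asyncio.gather(
        asyncio.to_thread(blocking_query, "SELECT 1"),
        heartbeat(ticks),
    )
    return rows, len(ticks)


print(asyncio.run(main()))  # ("rows for 'SELECT 1'", 3)
```

This keeps the loop responsive, but it costs one OS thread per in-flight call, which is the overhead the post objects to.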

It’s not the end of the world, but far from the usual elegance we have in Python, so having an improved context switching approach to build on, with support for C functions on the stack, such as greenlets, would go a long way.

There was a pre-PEP to add an API for that, but since it didn’t need any changes to CPython, it’s now a third-party library.

Thanks for the pointer. I wasn’t aware that something actually became of that PEP.

What I find sad, though, is that such an important addition to the C API has not made it into CPython.

Not needing changes to CPython doesn’t sound like a good argument against including the new PyAwaitable API. Or was the inclusion just postponed?

The first thing I thought of when reading this was how Erlang’s processes are scheduled. Are you thinking about preemptive scheduling for virtual threads?

Starting with an external module is normal practice, in cases where it’s possible.

As the current maintainer of greenlet and gevent, I would be very happy to see greenlet, replete with its C-stack switching ability, added as a standard part of CPython, just as it is with PyPy.

The primary motivation for switching greenlet's implementation from C to C++ was to get the compiler’s help with reference counting/memory management. Prior to that, there were some leaks that were difficult to find and solve; the C++ implementation solved them. A secondary reason was to ensure correct behaviour in the event of C++ exceptions raised by native code (especially common on Windows); previously, such exceptions could result in undefined behaviour, up to and including crashing the process. Finally, I hoped that some of the syntax sugar (such as operator overloading and exceptions) would improve code readability, though I fear this goal was only partially reached. I expect most of the improvements (with the possible exception of handling C++ exceptions) can be replicated by hand in a port back to C, albeit somewhat tediously.

An example:

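```python
from scipy.optimize import minimize

def square(x):
    return x**2

# minimize() repeatedly calls the Python function `square` from inside the optimizer.
result = minimize(square, x0=0.5)
xmin = result.x  # minimize() returns an OptimizeResult; .x holds the minimizer
```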

The C-stack switching ability is really cool, and a killer feature. If I’ve understood correctly (and I very well may not have), I’ve gotten the impression that gevent is not compatible with free-threading or with moving green threads between OS threads. Is that correct, and if so, what are the challenges that we’d face in trying to lift those restrictions?

Making greenlet compatible with GIL-less CPython would be a matter of introducing the correct locks to substitute for the GIL. Alternately, with integration into the interpreter, it may be possible to eliminate the mutable global state currently protected with the GIL.

I doubt it would be possible to move greenlets between OS threads, specifically because of its ability to swap C-level stacks. Because there may be pointers to the C stack on the C stack, we have to guarantee that we swap stacks back to the same memory location they came from so those pointers remain valid; that means remaining in the same OS thread and its allocated stack area.

Thanks, that’s what I was wondering. Is that because entering the C stack ends up pinning any Python calls made from that C code to the thread? That sounds like a really useful compromise for compatibility, but pinning is an annoying requirement in principle. I’m hoping we’d be able to let new C extension code that is aware of this feature keep that mobility through a different API, so that there’s at least a path for scheduling virtual threads onto any free OS thread.