Add Virtual Threads to Python

What needs to be awaited? What doesn’t? EVERY EXPRESSION could be a yield point.

I guess I’m trying to say that when you understand concurrency, you have a pretty good idea how to structure your programs such that IO switch points aren’t really a problem. Of course, it may be the case that I was working in an area of asynchronous programming that isn’t prone to these issues. Can you give me a sample of code demonstrating how a non-obvious switch point could create a bug?

How non-obvious do you want?

x[y] += 1

could cause a problem, since it’s implemented as “look up x[y], perform the aug-add, assign back to x[y]”.

Once again, my point is that I find code easier to reason about when written in the async/await style.

I didn’t say that it will “prevent bugs”. I just said I find it easier to reason about. Since this is not the first or the second time I’ve had to post a reminder of what I actually said, I’m going to ignore further responses which try to claim I said something else.

That makes me think… why don’t we just get rid of await? async def is fine as a hint your editor can use to fill in the awaits if you want them. Why can’t we approach this problem the way we did with typing?

Kotlin is prior art for what you describe. It has function colouring (suspend, a la async), but only on the declaration: the call site does not require an await keyword, and the IDE is responsible for marking the suspension points.

Though you don’t get those indications in, e.g., GitHub pull request views or a plain text editor.

On the other hand, you have languages like Swift doubling down on async-await, using it to signify not only I/O suspensions but also:

  • non-blocking lock waits and thread hops in their new actor-based data isolation model.
  • region-based isolation for non-concurrency-safe mutable types (like a plain dataclass in Python, for instance) at static-analysis time, allowing race-free data transfer between isolation domains (and indirectly between threads, because of M:N scheduling) while preventing unintended concurrent access. (P.S. There has been a PEP discussion that alluded to region-based isolation, among many other things.)

Not sure how much of this applies, or is relevant, to Python as an interpreted language.

But otherwise, my two cents — coming from a long-time Swift & Kotlin background — are that folks should perhaps consider the following:

  • Separate the discussion of async-await the language syntax (explicit suspension points) from asyncio the implementation (a single-threaded event loop).
  • Think about the isolation (synchronisation) story once PEP 703 fully opens up the multi-threaded world… maybe drawing inspiration from other languages with prior art (e.g., Zig, Rust, Java, Swift, Kotlin), rather than trying to extrapolate from existing Python code — most of which is de facto built around the shared-nothing, multi-process paradigm.
  • Be aware that you have the option to bake virtual threads into the language, but as far as I can tell, that probably comes with another PEP-703-scale multi-year surgery (e.g., all stdlib synchronization primitives would need to integrate with, or provide a version that works with, the virtual-thread scheduler for non-blocking waits).
  • Be aware that you also have the option of a library-level M:N cooperative task runtime on top of the async-await syntax. It likely requires only targeted language changes around coroutines to enable library-level solutions, plus some thinking about asyncio interop/integration; a rough sketch follows this list. This is somewhat similar to how Kotlin Coroutines run happily on top of the JVM as a mostly library-based solution, with fairly thin coroutine support in the Kotlin compiler/language and without M:N support from the JVM. But it does mean “doubling down” on async function coloring — a problem for some; a blessing for others.
  • Edit: Worth noting that these two options are not mutually exclusive. They can co-exist and interoperate with each other.
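
As a rough illustration of the library-level option (my own sketch with assumed names, not a proposal): M coroutine tasks multiplexed over N OS threads, each thread running its own asyncio event loop. It ignores work stealing, cancellation, and asyncio interop, which is where the real effort would go.

import asyncio
import itertools
import threading

class MiniMN:
    """Round-robin coroutines across N OS threads, each running its own
    asyncio event loop. Tasks are pinned to one loop at spawn time; a real
    M:N runtime would also migrate or steal work between threads."""

    def __init__(self, n_threads: int = 4) -> None:
        self._loops: list[asyncio.AbstractEventLoop] = []
        for _ in range(n_threads):
            loop = asyncio.new_event_loop()
            threading.Thread(target=loop.run_forever, daemon=True).start()
            self._loops.append(loop)
        self._next = itertools.cycle(self._loops)

    def spawn(self, coro):
        # Returns a concurrent.futures.Future for the coroutine's result.
        return asyncio.run_coroutine_threadsafe(coro, next(self._next))

async def work(i: int) -> int:
    await asyncio.sleep(0.1)
    return i

runtime = MiniMN()
print([f.result() for f in [runtime.spawn(work(i)) for i in range(8)]])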

It’s more like, “Danger! Everything you know so far may not be true after await.”

Great post, enjoyed reading it.

This was my thinking a few years ago, when it was becoming increasingly clear that free threading was going to happen.

This property of asyncio (which I keep referring to as the asyncio invariant), where programmers can reasonably expect (some) state not to change between explicit suspension points, is the problem here. (Correct me if I’m wrong, but Rust, for example, does not have this, and Rust programmers don’t expect it.) I don’t see how the asyncio invariant can be kept in the context of M:N scheduling, but maybe this is a failure of my imagination; counter-examples welcome.
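
To make the invariant concrete, a minimal sketch (my own illustration): on a single-threaded event loop, a read-modify-write with no await in between cannot be interleaved with another task.

import asyncio

balance = 0

async def deposit() -> None:
    global balance
    # The "asyncio invariant": no other task on this event loop can run
    # between these two lines, because there is no await between them.
    current = balance
    balance = current + 1

async def main() -> None:
    await asyncio.gather(*(deposit() for _ in range(1000)))
    print(balance)  # reliably 1000 on a single-threaded event loop

asyncio.run(main())

Under M:N scheduling, two deposits could run on different OS threads at once, and nothing in the source marks this block as needing protection.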

We could say all right, let’s keep using async/await while dropping the asyncio invariant. The resulting code will not be asyncio but something different; call it asyncio++. No existing asyncio library will work correctly in this context. To my mind, we’re actually introducing a third color, instead of removing coloring via virtual threads. So now, instead of needing to hunt down the asyncio version of the library you want, you need to try and hunt down the asyncio++ version. Ugh.

For this reason, I don’t see these being orthogonal.

This isn’t actually true currently; it’s only true in a single-threaded application. The better wording is the one I provided above: tasks on the same event loop will not be switched between except at explicit yield points.

It’s important to call this out because even without new event loop things, I can see this coming up with free-threading if someone ends up in a situation with one library spawning threads as an implementation detail.
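
A sketch of that situation (hypothetical names, my own illustration): a library thread mutating shared state between two statements that contain no await at all.

import asyncio
import threading

state = {"n": 0}

def library_worker() -> None:
    # Imagine a library that starts this thread as an implementation detail.
    for _ in range(100_000):
        state["n"] += 1

async def observer() -> None:
    threading.Thread(target=library_worker).start()
    before = state["n"]
    after = state["n"]  # no await between these reads...
    # ...yet before == after is not guaranteed: no other *task* ran,
    # but the OS thread was free to run the whole time.
    print(before, after)

asyncio.run(observer())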


It is a problem, but not a world-ending one.

An M:N runtime typically comes with the concept of “dispatchers” to which tasks are submitted. With that, existing code can run unconverted in a few ways:

  1. A thread-confined dispatcher — useful for, e.g., representing the main event loop of a GUI application framework, an I/O event loop, or working with C libraries that have niche threading requirements.
  2. A dispatcher “view” that enforces serial order on its task submissions but delegates the task execution to the global M:N cooperative pool. In other words, you get queue-based concurrency: as many serial task queues as you need, all executing concurrently with each other (see the sketch after this list).
  3. Appropriate mutual exclusion, e.g., asyncio.Lock. It can be fine-grained on the objects themselves — though that requires refactoring — or, more preferably, coarse-grained on blocks of application logic and state. The challenge here is identical to that of moving GIL code to free-threading.

An M:N runtime tends to offer all of these tools.
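
For illustration, a minimal sketch of option 2 in today’s asyncio terms (my own example, assumed names): each queue serializes its own submissions, while different queues run concurrently.

import asyncio

class SerialQueue:
    """Dispatcher "view": submissions to one queue run in serial order,
    while separate queues execute concurrently on the shared loop/pool."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def run(self, coro):
        async with self._lock:  # one task per queue at a time
            return await coro

async def main() -> None:
    q1, q2 = SerialQueue(), SerialQueue()
    await asyncio.gather(
        q1.run(asyncio.sleep(0.1)),  # serial with other q1 submissions
        q1.run(asyncio.sleep(0.1)),
        q2.run(asyncio.sleep(0.1)),  # concurrent with q1
    )

asyncio.run(main())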

That is a somewhat contrived example, but OK, it’s not entirely without basis. I have come across libraries that provide mapping interfaces to online services such as Redis; redis-dict is one example.

I was asking for an actual demonstration of the issue, but I’ll give it a shot this once. As far as I know, it’s not possible to implement __getitem__ or __iadd__ using async/await, but the following should approximate a theoretical implementation of your example:

import asyncio
from collections import defaultdict

class IODict:
    def __init__(self) -> None:
        self._d: dict[str, int] = defaultdict(int)

    async def get(self, key: str) -> int:
        await asyncio.sleep(0)  # stands in for a real I/O suspension
        return self._d[key]

    async def set(self, key: str, value: int) -> None:
        await asyncio.sleep(0)  # stands in for a real I/O suspension
        self._d[key] = value

async def incrementer(name: str, key: str, t: IODict) -> None:
    for _ in range(5):
        await t.set(key, await t.get(key) + 1)
        print(name, await t.get(key))

async def main() -> None:
    t = IODict()

    await asyncio.gather(
        incrementer('a', 'k', t),
        incrementer('b', 'k', t),
        )

asyncio.run(main())

The calls to asyncio.sleep() represent IO switch points. I’m sure you can spot the bug. What I have been trying to communicate is that async/await will not protect you from a bug like this. And that’s the thing… I can find countless examples of issues like the above for both threads and continuations. I cannot find examples where async/await actually helps at all. You can’t avoid thinking about these problems whether your switch points are explicit or not. redis-dict actually had a very similar bug with setdefault. I’m not saying these aren’t real issues. I’m saying async/await won’t save you. If anything, it gives people a false sense of security.
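
For completeness, the standard repair is the same one you would reach for with threads: a lock around the read-modify-write. A sketch, reusing IODict and the incrementer shape from the snippet above:

lock = asyncio.Lock()

async def safe_incrementer(name: str, key: str, t: IODict) -> None:
    for _ in range(5):
        async with lock:  # make the read-modify-write atomic across awaits
            await t.set(key, await t.get(key) + 1)
        print(name, await t.get(key))

That the fix is identical to the threaded version is rather the point.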

But all of this is secondary to my main point, which is that I can’t use the example above outside of an asyncio context. Even when I legitimately have no need for concurrency, I’m still forced to deal with it. There is a (very good) reason why most libraries also provide a synchronous API, but even if we set aside the fact that this creates an extra burden on library maintainers, there can still be compatibility issues with other event loops. My main point is that the very real cost of function coloring outweighs the supposed benefit to safety.

Why can’t we approach this problem the way we did with typing?

I’m still interested in an answer to this. To be clear, I’m suggesting that async def would remain. I’m not sure if there are technical challenges to make it work, but the idea is that you would still have visible awaits provided by your editor if you want them.


It doesn’t even matter what the data type is. It’s implemented with three steps like I said. Now, there MAY be some protections in some situations, but in general, this is what it takes, and context switches can occur between these operations.

>>> import dis
>>> def inc(key):
...     count[key] += 1
...     
>>> dis.dis(inc)
  1           RESUME                   0

  2           LOAD_GLOBAL              0 (count)
              LOAD_FAST                0 (key)
              COPY                     2
              COPY                     2
              BINARY_OP               26 ([])
              LOAD_SMALL_INT           1
              BINARY_OP               13 (+=)
              SWAP                     3
              SWAP                     2
              STORE_SUBSCR
              LOAD_CONST               0 (None)
              RETURN_VALUE
>>> 

… and async/await helps you with that by …


I don’t think you’ve been reading my posts. I’m not replying further.

I can look at the single line of code await t.set(key, await t.get(key) + 1) and see that it context-switches during a mutation, which probably isn’t concurrency-safe. If the awaits weren’t there, I would know (not probably know, not by convention) that there wouldn’t be any issues. Unless you use raw threads somewhere; with those, I have to audit all code that could possibly access that object and verify it couldn’t be running simultaneously, to get the same guarantee without locks (next to impossible).

That’s a reasonable way to see it. I can see that it could spare you from worrying about some lines of code, but that quickly loses its value when nearly every line is awaited. In the case of actually implemented software like redis-dict, I think that risk is significantly mitigated by the fact that it is a self-contained library. There is a reasonable expectation that it will handle its own shared memory; otherwise it should be considered a bug in the library. I still don’t think the benefit is worth it.

@markshannon Beautiful proposal! I’m a developer and don’t know much about compilers etc., but “colored” functions cause me a lot of trouble. I have no hope of Django becoming fully async. I have never found a language that did not benefit from adding virtual threads: Fibers (Ruby), Virtual Threads (Java), and Coroutines (Kotlin, Go) are all excellent implementations. It would be so much easier to be able to specify a function that runs on a forked virtual thread and yields back to the original thread after it finishes execution, without tracking async everywhere and without needing to write sync and async versions of libraries. Concurrency by default would be awesome. Please go ahead and design a preview version of such a mechanism in CPython. I don’t have the know-how to write the interpreter, but I can certainly help test the experimental new feature.


Is there a reason you’re blind to gevent? It’s the elephant in the room that we keep mentioning time and time again, but that you all fail to acknowledge.

And gevent doesn’t get stuck on gethostbyname.

It doesn’t switch in gevent.


It’s easier to talk about the single-threaded case with the GIL enabled, I believe.

You’re also implying that you won’t have context-switching issues with shared variables in asyncio with multiple hubs under free-threading.


Strictly speaking, this is not true. The gevent version of the sample script above demonstrates this:

import gevent
from collections import defaultdict

class IODict:
    def __init__(self) -> None:
        self._d: dict[str, int] = defaultdict(int)

    def __getitem__(self, key: str) -> int:
        gevent.sleep()  # an I/O switch point, like asyncio.sleep(0) above
        return self._d[key]

    def __setitem__(self, key: str, value: int) -> None:
        gevent.sleep()  # an I/O switch point, like asyncio.sleep(0) above
        self._d[key] = value

def incrementer(name: str, key: str, t: IODict) -> None:
    for _ in range(5):
        t[key] += 1
        print(name, t[key])

def main() -> None:
    t = IODict()

    gevent.joinall(
        [
            gevent.spawn(incrementer, "a", "k", t),
            gevent.spawn(incrementer, "b", "k", t),
        ]
    )

main()

It has the same bug as the asyncio version. This, however…

… is completely irrelevant to this conversation. It only applies to Python threads. There has been quite a lot of misunderstanding and misinformation in this thread. It has only further convinced me that async/await is a bad idea. I used to think they at least served as hints, but now I think they’re more like blinders.

It all goes back to the fact that concurrency is difficult to understand. Yes, cooperative threading is safer than preemptive threading, but only because switching is not as frequent. If you don’t fully understand it, you will eventually run into the same bugs. I fear that in my attempts to clarify things I have only contributed to the distraction from the core topic.

I respectfully suggest that anyone still confused about cooperative threading (and the fact that virtual threads are cooperative) seek support elsewhere, so that we can return to the topic at hand and stop drowning out some of the very insightful comments that have been made.

The main point of Virtual Threads is their potential for parallel execution. That’s the focus of this discussion. If that weren’t the case, we could simply use generators. Within a single thread, virtual threads don’t provide any advantage beyond what generators already offer.


At this point, I’m no longer sure what the value of this thread even is, so I’m going to bow out. Whatever insightful comments you wish to highlight, I shall not drown them out.

It’s worth noting, though, that this isn’t going to achieve ANY changes in Python without the support of a core dev (or at least someone who can sponsor a PEP), and I don’t think I’ve seen any of that. Just lots of people arguing past each other about what’s great and what’s terrible about concurrency. So, have fun debating!