Please take race conditions seriously when discussing threading

markshannon · May 26, 2025, 9:19am

On discussions of free-threading, one comment I see repeated is that free-threading doesn’t introduce race-conditions. The claim is that the race conditions were already present, and free-threading just makes it more likely that they are exposed.

This isn’t true, and in my opinion dangerous. It allows advocates of PEP 703 to dismiss race conditions as not a problem. They are a problem.

Consider this code:

def make_payment(buyer, seller, amount):
    buyer.account -= amount
    seller.account += amount

or this:

def increase_dosage(patient, amount):
    if patient.dosage + amount < SAFE_DOSAGE:
        patient.dosage += amount

Both these examples are thread safe in recent versions of CPython.
With PEP 703 free-threading, these are no longer thread-safe.

Switching to free-threading by default is going to create bugs where there were none.

In case anything thinks I’m exaggerating the problem, and this will all be fine, I’d like to remind you of Therac-25 which caused six cases of death or serious injury. The general software quality was poor, but the reason there were six cases rather than one, was that the bug was a race condition and couldn’t be reproduced by the manufacturer.

I’m not claiming that there are no race conditions in Python with the GIL enabled. There are and we should take those seriously too. It is just that PEP 703 means that race conditions will become much more common.

Rosuav · May 26, 2025, 9:35am

What guarantees this? (Your first example doesn’t have any race conditions in it, but besides that.) Is it merely a coincidence of implementation? If so, the race conditions ARE present, same as we would consider code buggy for relying on refcounting and immediate disposal.

markshannon · May 26, 2025, 9:59am

According to wikipedia a race condition is when " system’s substantive behavior is dependent on the sequence or timing of other uncontrollable events, leading to unexpected or inconsistent results"
By that definition the code examples I gave have no race conditions using Python 3.10+. They did in 3.9.

Is it merely a coincidence of implementation?

That 3.10 doesn’t have those race conditions was as a result of fixing a specific race condition: Issue #74174.

I’ll let you decide whether that counts as a “coincidence of implementation” or not, but I’d rather we keep removing race conditions, rather than adding them.

Rosuav · May 26, 2025, 10:21am

Yes, and I’m not disputing what a race condition is. (You possibly intended for the first example to disallow if the balance falls below zero?)

I only skimmed that but it doesn’t look like a guarantee to me, just a coincidence. These sorts of “check, then mutate” idioms are inherently flawed and that isn’t changing (unless you use well-defined yield points, eg with asyncio).

pitrou · May 26, 2025, 10:29am

Are they? I would never assume so. They were not in the past. I don’t think we are guaranteeing anything about it in the future. It would be spectacularly foolish to write multi-threaded Python code this way.

The snippets you show are buggy to me if they are meant to be called from multiple threads.

There is basically zero chance that the author of such a piece of code is such an expert on CPython bytecode execution internals ^[1] that they concluded - after detailed analysis - that not taking a lock was a correct (and reasonable!) thing to do. What actually happened is that they forgot to take a lock, or perhaps they didn’t know that someone would be calling their code from multiple threads.

“It works by chance despite being worryingly brittle” is not the same as “it does not have any latent bugs”.

Is it bad if some piece of code that appeared to work stops to work after an upgrade? Yes. Is it CPython’s fault if someone unwillingly relied on some obscure detail of bytecode execution semantics? I don’t think so.

What is this example supposed to prove exactly? That multiple factors can conspire to catastrophic consequences?

How many people in the world qualify? 10? 20? ↩︎

JamesParrott · May 26, 2025, 11:00am

Does the new free threading really mean those examples now don’t need to be:

async def make_payment(...):

async def increase_dosage(...):

malemburg · May 26, 2025, 11:07am

I think we need to add some more light weight lock tooling to Python to make protecting such sections easier.

The threading.Lock() / .RLock() we have seems to heavy weight and verbose for using in larger code bases:

#!/usr/bin/env python3
"""
    Micro benchmark for thread locks

    dependencies = [
        'egenix-micro-benchmark',
    ]

"""
import micro_benchmark

###

@micro_benchmark.configure(iterations=10000)
def bench_without_lock():

   # Init
   x = 1

   # Bench
   x += 1

@micro_benchmark.configure(iterations=1000)
def bench_threading_lock():

   # Init
   x = 1
   import threading

   # Bench
   with threading.Lock():
       x += 1

@micro_benchmark.configure(iterations=1000)
def bench_threading_rlock():

   # Init
   x = 1
   import threading

   # Bench
   with threading.RLock():
       x += 1

@micro_benchmark.configure(iterations=1000)
def bench_threading_lock_with_reuse():

   # Init
   x = 1
   import threading
   lock = threading.Lock()

   # Bench
   with lock:
       x += 1

@micro_benchmark.configure(iterations=1000)
def bench_threading_rlock_with_reuse():

   # Init
   x = 1
   import threading
   rlock = threading.RLock()

   # Bench
   with rlock:
       x += 1

###

if __name__ == '__main__':
    runner = micro_benchmark.run(globals())

Results on my machine:

bench_without_lock: Mean +- std dev: 17.4 ns +- 1.4 ns
bench_threading_lock: Mean +- std dev: 261 ns +- 6 ns
bench_threading_rlock: Mean +- std dev: 366 ns +- 20 ns
bench_threading_lock_with_reuse: Mean +- std dev: 197 ns +- 10 ns
bench_threading_rlock_with_reuse: Mean +- std dev: 214 ns +- 8 ns

Rosuav · May 26, 2025, 11:13am

Hmm. I worry that adding a new type of lock will lead more to confusion than to anything else. Using threading.Lock() is safe and reliable; will people know when it’s safe to use a faster lock?

If it’s possible, I would much prefer to have the API stay the same, but the implementation to automatically choose a level of overhead sufficient to provide safety. In other words, a single Lock() type that might use an OS-level lock if you’re using OS threads, but might use a lighter-weight lock if you are multiplexing virtual threads onto a single OS thread^[1]. That way, if you deploy a single-OS-thread application that relies on these locks, and then subsequently go “hmm, I want to scale this up, can I add a few additional OS threads to share the load?”, you don’t have to search through your codebase for things that will stil work, but are no longer safe.

Maybe that’s a pipe dream though.

though I’m still fuzzy on what exactly that means and how it’s different from asyncio ↩︎

malemburg · May 26, 2025, 11:31am

If we’re heading into a free-threading future (which I believe we are), then we should think about making locks a native Python system feature, much like we have async and await for asyncio code.

The compiler could then potentially optimize such locking sections.

pitrou · May 26, 2025, 11:48am

It is pointless to instantiate a lock in a local function and never use it elsewhere. A lock is meant to prevent from concurrent execution, so will obviously be some piece of state shared between different threads of execution, and slightly long-lived.

So instead of:

@micro_benchmark.configure(iterations=1000)
def bench_threading_lock():
   # Init
   x = 1
   import threading

   # Bench
   with threading.Lock():
       x += 1

try this:

import threading
lock = threading.Lock()

@micro_benchmark.configure(iterations=1000)
def bench_threading_lock():
   # Init
   x = 1
   # Bench
   with lock:
       x += 1

malemburg · May 26, 2025, 12:02pm

Sure and that’s what I did in the bench_*_with_reuse() benchmarks.

The way these micro benchmarks work is that the “Init” part is run once and only the “Bench” part iterated over many times. There’s also an optional “Verify” part which gets run after the “Bench” part to check whether the code actually did the right thing, but I left that out to keep the code short.

With a native language feature, the compiler could manage such lock objects and reuse them as necessary.

PS: I should have used code like this for the inner loop:

if x != 1000:
   x += 1

to produce a more obvious a race condition.

pitrou · May 26, 2025, 12:50pm

Ahah, I see. Thanks for the explanation!

smontanaro · May 26, 2025, 1:52pm

This seems to clearly not be thread-safe. Control could switch to another thread after the buyer’s account was debited but before the seller’s account was credited, then see incorrect balances. Even in a world with a GIL the call to make_payment or its code would have to be protected with some sort of lock.

Rosuav · May 26, 2025, 1:55pm

Skip Montanaro:

markshannon:
Consider this code:
def make_payment(buyer, seller, amount):
    buyer.account -= amount
    seller.account += amount
This seems to clearly not be thread-safe.

TBH money-movement examples really show that you need atomicity, which isn’t the same thing as thread safety. If an exception is raised between those two lines, you’ve just done a half a transaction. This makes an excellent example for transactional integrity in a database, but a poor example for locking in threads.

The other example (with a comparison followed by an update) is a better example of this, but it’s still clearly bad code regardless of the version of CPython it’s running on.

csm10495 · May 27, 2025, 4:01am

I generally agree with the thought process here. Additionally we can’t really assume that everyone who wrote synchronous python code understands the issues that can arise from it being used in parallel.

One side says: the code was buggy.
The other says: how do we protect them from the potential issues?

If at the start of python, it’s easier to start with the nogil mindset and teach more safety here. I’m not sure how to do that this far in. Nothing really is clear about that code being bad from the perspective of a single synchronous caller.

It kind of makes me think that free threading by default may be more of a Python 4 aspiration unless we have much better tooling to detect and mitigate potential issues.

So I ask a different question: how do we propose detecting and mitigating potential issues like the ones given by the OP? (I feel as though that’s a reasonable place to take the conversation to understand best paths forward)

TeamSpen210 · May 27, 2025, 9:54am

Well, one thing Python probably needs is documentation of the guarantees it does make for atomicity (beyond memory safety). For example, can you use dict.setdefault() in multiple threads to ensure a key is set only once? I’m pretty sure that’s safe even in free threading, but it’s not mentioned.

If we’re looking to add syntax for locking, it seems a waste to target such a low-level form of synchronisation - instead design something to ensure you don’t access attributes without locking, things like that.

Liz · May 27, 2025, 3:01pm

setdefault is both safe and consistent (multiple threads doing it at once will agree on the result) to use like this, but it’s possible you’re doing something wrong if multiple threads want to set an initial value.

While we definitely need better documentation about what is and isn’t threadsafe, when it comes to the builtins, all of them are. Beyond the builtins, some of the standard library is only theadsafe in a hollow manner, while it won’t crash the interpreter, it won’t do what a developer wants.

It’s consistency that’s the problem, and many examples of failure in free-threading fail to be consistent without free-threading as well.

nas · May 27, 2025, 5:06pm

I think having a bunch of locks is an anti-pattern. Better to share mutable objects between threads as little as possible. Instead, pass objects between threads using queues, for example. Shared objects should be immutable or persistent data structures.

a-reich · May 28, 2025, 12:55am

Genuine question: what are you basing that on or is there a place where this is mentioned? I mean particularly that they’ll behave consistently, not just “won’t crash the interpreter”.
It seems to me the docs page about free-threading only has a somewhat convoluted statement about “builtin types like… behave similarly to the GIL…”. There’s also an explicit negative thread-safety statement about iterators, of which there are several builtins.

ngoldbaum · May 28, 2025, 1:17am

See also the python wrappers: rpds-py · PyPI