On the notion of what it means to be "thread safe"

In discussion of Sam Gross’s nogil fork of Python 3.12 (happening in other groups here and here), I’ve seen a few comments which make me think some folks don’t really understand thread safety. In fact, I think some people might believe their code is thread-safe simply because the GIL is present. That’s incorrect, and I’d like to open a discussion here to try to bring everybody up to a common baseline understanding of what it means to be thread-safe.

Disclaimer: I am no threading guru. I’ve used threads a bit over the years, generally in a producer/consumer context where I can simplify my life by using queue.Queue as the communication channel. Every now and then I might need to lock something. I can’t recall ever having to lock two separate bits of data (two locks, acquired at different times).

Note that I am only concerning myself with thread programming at the Python level.

To start with, I have a mental model of thread safety which contains two basic elements (there might well be more):

  1. No shared data is accessed by a thread which hasn’t first locked it appropriately.
  2. There are no deadlocks in the code (this is the “two lock” situation I alluded to above).

This topic has been discussed here before. In that thread, @steven.daprano provides a concrete example of what can go wrong when access to a shared bit of data isn’t properly mediated. That’s worth reading. Steven’s example of x += 1 is an excellent demonstration of my item #1.
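That x += 1 race is easy to sketch. Here is a minimal, self-contained version (the function names and the thread/iteration counts are mine, and whether the unsafe version actually drops updates on any given run depends on the interpreter version and its thread-switch interval; the race itself is real either way):

import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1          # load, add, store: a thread switch can land in the middle

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # the lock makes the read-modify-write atomic
            counter += 1

def run(target, n_threads=4, n=100_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=target, args=(n,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run(unsafe_increment))  # may print less than 400000 (lost updates), depending on the interpreter
print(run(safe_increment))    # always 400000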

Let me expand on the deadlock idea a little. Suppose we have two threads, A and B, each of which wants to manipulate x and y. For some (probably not so) strange reason, thread A generally discovers it needs to manipulate x first, then sometime later (say, after many bytecodes have been executed) discovers it needs to also fiddle with y. In contrast, thread B always wants to access y first, then might later discover it needs to also manipulate x. Flow of control might look something like this:

  • Thread A gets control, locks x
  • Electrons are shuffled back-and-forth for a bit
  • Thread B gets control, locks y
  • Electrons are shuffled back-and-forth for a bit
  • Thread A regains control, discovers that it really wants to also work with shared object y, so attempts to acquire a lock for it
  • The attempt by A to lock y causes it to lose control of the virtual machine
  • Thread B regains control, executes for a bit, then decides it really wants to also work with shared object x, so attempts to acquire a lock for it

Oops. We have deadlocked. Neither A nor B can acquire the second lock they want, because the other thread is holding it. My purpose here isn’t to solve the deadlock problem, just to note that it exists when you don’t exercise some care.
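Here is a minimal sketch of that interleaving. The lock names, the sleeps used to force the unlucky timing, and the daemon/timeout plumbing that lets the demo exit cleanly are all mine; the point is only that each thread ends up waiting on the lock the other holds:

import threading
import time

lock_x = threading.Lock()
lock_y = threading.Lock()

def thread_a():
    with lock_x:              # A locks x first
        time.sleep(0.1)       # electrons shuffle; gives B time to lock y
        with lock_y:          # blocks forever: B is holding y
            print("A got both locks")

def thread_b():
    with lock_y:              # B locks y first
        time.sleep(0.1)       # electrons shuffle; gives A time to lock x
        with lock_x:          # blocks forever: A is holding x
            print("B got both locks")

a = threading.Thread(target=thread_a, daemon=True)
b = threading.Thread(target=thread_b, daemon=True)
a.start()
b.start()
a.join(timeout=2)
b.join(timeout=2)
if a.is_alive() and b.is_alive():
    print("deadlocked: each thread is waiting for the lock the other holds")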

I’m sure there are other examples of threading problems which can occur, but I suspect that most are assembled in various ways from these two basic problems: unmediated access to shared data, and deadlock.

Notice that neither the simpler example of two threads modifying a single shared object, nor the deadlock (deadly embrace) example, says anything about the GIL. Both of these problems can (and do) occur in current Python code today. They are well-understood issues in multithreaded code, and existed long before Python did. An external library (in a separate thread someone mentioned that glib’s setenv function isn’t thread-safe) can have (and hopefully document) threading problems of its own. Using such libraries today requires proper safeguards at the Python level. @cameron provided pseudo-code in that other thread showing how to deal with thread-unsafe code over which you have no control.

So, that’s my minimal contribution. A couple of days ago I came across a set of slides from a talk given by Aahz at OSCON 2001. I had to resort to the Wayback Machine to find them. Despite the fact that Python 1.5.2 was current at the time, they’re still worth your time if you’re just getting started with threads.

5 Likes

To avoid deadlock, the rule is to always acquire locks in the same order in all threads.
In the example above, the design must impose an order on x and y.

Let’s assume the order is x first, then y.

If a thread has acquired y and later decides it also needs x, it must do the following:
release y, acquire x, then reacquire y.
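A sketch of that rule under the assumed x-before-y ordering (the lock and function names are mine):

import threading

# agreed ordering: lock_x is always acquired before lock_y
lock_x = threading.Lock()
lock_y = threading.Lock()

def worker():
    with lock_y:
        ...                   # work that only needs y
        # we discover we also need x; taking lock_x here would violate the ordering
    # so: back out of y, then acquire both in the agreed order
    with lock_x:
        with lock_y:
            ...               # work that needs both x and y
            # y's protected state may have changed while we held neither lock,
            # so any assumptions made under the first hold of y must be rechecked

The cost of backing out is that whatever was true while y was first held may no longer be true once it is reacquired, so the code has to recheck its assumptions at that point.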

1 Like

A lifetime ago, I was a Java developer and I really enjoyed working on multithreaded code. (The OpenJDK JVM [the implementation I used] had free threading back then, and still does.)

To make it possible to reason about multithreaded Java at all, we had the Java memory model: Java memory model - Wikipedia. This is essentially a specification of what you can expect from a Java implementation with regard to how threads read and write shared data; for example, when CPU caches can be expected to flush to main memory so that other cores (and the threads running on them) see the changes.

My understanding is that, thanks to the GIL, Python has had no need of a spec like this, but with nogil it will. So this part will be new to Python developers, and I am curious myself.

But this is somewhat lower level than applying locks to critical sections to prevent race conditions; as you say, we need to do that even with the GIL.
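For what it’s worth, the kind of question a memory model has to answer can be sketched even now. In the hand-off below, using threading.Event (rather than, say, a bare boolean flag the consumer spins on) makes the ordering between the write and the read explicit; I don’t know what guarantees free-threaded CPython will end up specifying for the bare-flag version, and that gap is exactly what a written-down model would close. The names and values here are mine:

import threading

data = None
ready = threading.Event()

def producer():
    global data
    data = 42        # write the shared data...
    ready.set()      # ...then publish; the Event is the explicit synchronization point

def consumer():
    ready.wait()     # block until the producer has published
    print(data)      # prints 42: the read is ordered after the write via the Event

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
p.join()
c.join()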

3 Likes

To avoid deadlock …

Yeah, I didn’t mean to solve the deadlock problem, just show that it is a problem.

Knowing that there is a way to avoid deadlocks does not prevent new code from having deadlocks. There is a need for tooling to help debug them.

Does Python need a deadlock detector?

There is already a need to debug deadlocks today (even with the GIL). Thankfully there are a few options out there; I have used py-spy with success: py-spy · PyPI

py-spy dump --pid 12345

prints the Python tracebacks of all threads of an already-running (and potentially deadlocked) Python program.

2 Likes

I agree py-spy is a useful tool, especially for deadlock detection.