Thread safety now and in the future (no-GIL)

I’m interested in visibility guarantees for changes. Let’s consider a simpler case like this (sketched in code below):

  1. x = 1
  2. thread 1 sets x = 2
  3. thread 2 waits in a loop until x = 2
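
A minimal, runnable sketch of this scenario (this terminates promptly on current CPython in my experience, but whether that is guaranteed is exactly the question):

import threading

x = 1

def writer():
    global x
    x = 2  # step 2: thread 1 sets x = 2

def waiter():
    while x != 2:  # step 3: thread 2 busy-waits until it sees x == 2
        pass

t2 = threading.Thread(target=waiter)
t2.start()
threading.Thread(target=writer).start()
t2.join()  # hangs forever if thread 2 never observes the write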

Now questions:

  1. Are there any guarantees that thread 2 will ever see the change if no synchronization is present? I assume there are, but I wasn’t able to find this in the documentation the last time I tried.
  2. Is using threading.Lock enough? I assume it is, but the problem I have is that its effect on change visibility is not documented for Lock itself.
  3. Is using threading.Event enough? I assume it is, but again this is not documented in any way. (See the sketch after this list.)
  4. Is using queue.Queue enough? The same problem.
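
For concreteness, here is what question 3 looks like in code; whether Event.set()/Event.wait() formally guarantee that the write to x is visible is precisely what I could not find documented:

import threading

x = 1
x_changed = threading.Event()

def writer():
    global x
    x = 2
    x_changed.set()   # does set() publish the preceding write to x?

def waiter():
    x_changed.wait()  # does returning from wait() guarantee we see x == 2?
    assert x == 2

t = threading.Thread(target=waiter)
t.start()
writer()
t.join()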

I have experience with Java, and basically I am missing something like the Java memory model, with formal guarantees of visibility.

Hello Roman and welcome to the Python community :waving_hand:

CPython doesn’t detail visibility guarantees, as you point out, mostly because there’s no need to. Perhaps this should be better documented, but essentially Python has what we might call a “sequential” memory model, i.e. one in which bytecode instructions are not reordered by the compiler.
The consequence is that operations that affect memory are always executed in the order specified by the source program.

Yes, and I think you’ll find that for such a simple example, this holds for any memory model. (Including Python’s lack of a memory model.)
Not only should this hold across languages, but there’s also no need to add synchronization to it, be it a lock, an event, or a queue.
It simply holds because there’s only one thread that ever writes to x.
See correction below.

In Java, there are complicated examples in which you need to know the details of the memory model in order to predict the outcome. In Python, I think it’s fair to say that the outcome is going to be the least unexpected one, because the interpreter is going to execute the instructions in a program exactly in the order in which they appear in the source code.

Data visibility is not related to instruction order.
The traditional issue is that 2 threads may not see the same data in a memory location because of the cache design of the system.
In C level code a memory-barrier instruction is used to force visibility.

As I understand it these memory-barrier instructions are there for Python, but having a statement to that effect in the docs would be useful.

1 Like

Yeah, in the Java memory model thread 2 may never see the change (assuming no synchronisation etc).

At present it is not guaranteed by the language that it will in Python either, as Python does not have a memory model. In practice, AIUI, it will always work on CPython, at least so far. Presumably it might not on GraalPy, since that inherits Java’s memory model.

1 Like

I will note that this has always been true. It’s not something new with free threading (i.e., the removal of the GIL). Therefore, I’d be strongly against any suggestion that such a guarantee (with its inevitable associated performance cost) be added without real-world examples of Python code that needs it. And even then, I’d prefer that it’s opt-in (i.e., you have to ask for the guarantee if you need it) rather than being imposed on all Python code - even single-threaded programs.

One of the critical acceptance criteria for free threading was that the performance penalty on single threaded code for supporting it wasn’t unacceptably high. The developers did a great job achieving that goal, and I’d hate it if that success was eroded away by imposing more and more overhead like this.

I think we need to remember that Python isn’t a low level language. If you want cross-thread cache consistency, maybe you should be writing an extension in C, or Rust, or some other language that supports that level of control.

I don’t want to seem like I’m constantly repeating the same message, but IMO what Python needs is better high-level abstractions that ensure that people can write safe multi-threaded code without having to worry about low level details like memory barriers and visibility guarantees. Leave those details for extension authors, who have the knowledge and tools to deal with them.

3 Likes

FTR, I totally agree. Aside from the single-threaded cost, it would also necessarily limit parallelism. If we document anything, perhaps it should be that this is implementation dependent.

1 Like

@colesbury will know for sure, but I think CPython is only safe if memory-barrier instructions, or locks that make threads see the same view of memory, are used. And those will be in the CPython internals already.

@barry-scott Does it mean that a change made by one thread to shared data is not seen by other threads (always/never/sometimes)? This would make shared data hard to use from within threads in general.

Being pedantic, all memory is shared in a multi-threaded process.

At the low level (machine code/C/Rust), sometimes threads do not see changes made in other threads. If it matters, code must be written to ensure that all threads see a change that is critical to an algorithm.

In Python I think that all changes to Python variables will be seen in all threads. But we need the experts on free threading to confirm I’ve understood this correctly.

Being clear on low level vs. Python’s high level is important.
What is true for low-level code will not necessarily be true in Python.

I agree that such a guarantee is a performance killer.

For now we assume that we need a way to synchronize threads to make sure thread 2 sees changes made to variables in thread 1. But there’s no documented way to do that (especially in a portable way, that is, in a way that would be guaranteed to work in both GIL and free-threaded builds, and ideally in conformant non-CPython implementations). How do I do that reliably?

My questions above about whether Lock/Event/Queue are enough to make sure changes are visible are specifically about this.

The behaviour of CPython suggests (at least in my experience with the GIL) that there is some internal memory barrier that makes changes visible (I assume the GIL performs a memory barrier). But because this is not documented, that is an implementation detail which might change.

I agree about the need for high-level abstractions, but I can’t see how you can avoid specifying visibility guarantees. This seems to be a very basic property of a multithreaded program, and some foundation for how changes made by one thread are propagated to another should be part of the language specification.

E.g. the documentation might say “changes to variables are not automatically propagated between threads; you need to use concurrency primitives that do memory synchronization”, and then, for example, Lock (or any other high-level abstraction) would carry a note saying “does memory sync”.

If this is not specified in any way, how does one create a correct program in this regard?

What level of reliability do you need? For all of the history of Python until now, “it works fine for me, and my tests don’t fail” has been sufficient. What has changed that we suddenly need more than that?

Indeed. But I don’t see why that’s a problem. It hasn’t changed so far, and you can test if it still works in the free threaded build, so what makes you more cautious now than you were (say) one release ago?

The same way as we always have, surely?

I am not sure I understand your argument and question here. By reliable here I mean a way that is backed by a language specification or documentation. From the discussion above it seems that the language and the library guarantee nothing about visibility of changes between threads.

I don’t buy your argument about testing. I don’t know a way to test a multithreaded program reliably. If I’m lucky, my test fails. If my test doesn’t fail, that says nothing about correctness. It may work locally, in CI and in the test environment, but sometimes fail miserably in production.

I agree the issue is nothing new. This concerns me all the time. I’m not sure I understand why you seemingly think it is a bad idea to have the requirement and behaviour clearly defined and documented.

2 Likes

I have said something wrong in my previous post. I think the correction is very relevant to this discussion, so I’ll try to explain my mistake as clearly as I can.

What was wrong

I said that in this example program, thread 2 will see the write x = 2 in all memory models:

That is not true. In fact, a very similar example is given in Java’s memory model specification, stating that the write may never be read by thread 2. Let me reproduce the Java code from the example in the specification (section 3.1):

class LoopThatMayNeverEnd {
  boolean done = false;

  void work() {
    while (!done) {
      // do work
    }
  }

  void stopWork() {
    done = true;
  }
}

According to the Java MM, and assuming that one thread is executing the work() method and a different thread eventually calls the stopWork() method, it is entirely possible that the loop never ends. (Note that the MM does not specify whether it will end.) I think it’s worth investigating the reason why the Java specification allows for this.

The reason why the work() thread may never exit the loop is that a Java compiler can optimize the loop to completely eliminate the read on the variable done. After all, the thread that is performing the loop never writes into the variable, so why bother checking it? The Java compiler simply doesn’t know that done is a shared variable.

The significance for Python

At this point, it’s important to mention that CPython (the implementation of the language that people commonly use) does not currently host a compiler that does this kind of optimization. Therefore, this sort of problem never arises in Python code. Let’s look at the Python version of the example above:

done = False

def work():
    while not done:
        pass  # do work

def stopWork():
    global done  # without this, done = True would just create a local variable
    done = True
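
To illustrate what such an optimization would mean: a compiler that assumes done is never modified inside the loop could hoist the read out of it, effectively transforming work() into something like this (illustrative pseudocode, not what any current compiler emits):

def work():
    done_once = done      # read done a single time...
    while not done_once:  # ...and never re-read it, so the loop may never end
        pass  # do work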

Even though the current JIT compiler will not eliminate the while not done check, it is not inconceivable that one day it might, for the same reasons that Java compilers do this kind of optimization. Hence, if we want to seriously address the main problem of this thread, we must answer two questions:

  1. is this Python code safe now? Yes.
  2. will this Python code be safe in the future? Maybe.

If current proposals like Shannon’s PEP 805 do go forward, then I don’t see why the above code should become unsafe in the future. (And it’s very desirable that it stays safe, so I reckon there’s a very good chance it will.) Nevertheless, as it stands we should face the fact that we’re not promising anything here.

I think that the answer to the first question is not at all to be dismissed: in my view, ongoing efforts to improve Python’s documentation should include an explanation of the fact that this example is safe in Python, unlike in other languages, notably including Java, a very popular high-level language. People with that background have every reason to be concerned about these problems, and we shouldn’t make the mistake of dismissing their concerns; something which was done not long ago by yours truly.

3 Likes

Sorry, I didn’t mean to give that impression. I’m perfectly fine with this getting documented, where we can. There are the usual concerns that if we over-specify, we could end up limiting our implementation choices in future, but I trust whoever writes the documentation to make that choice appropriately. I just don’t want people to get the impression that it’s any harder to write threaded code now than it was before free threading was introduced - hence why I’m making a point that these concerns aren’t new.

2 Likes

Just my 2 cents, but it is exactly my problem and the reason I started this thread. Coming from other languages, I have quite a good understanding of how those languages handle thread concurrency. With Python I felt a bit lost when trying to find out how it is handled. And before starting this thread I spent endless hours reading messages on forums, tutorials and docs, but there were so many contradictory or inaccurate statements spread over the internet, which could not be clarified by official documentation, that in the end I was left with an uncomfortable feeling.

I think reading all the messages in this thread gives a good impression of what can and maybe should be improved.

7 Likes

I agree and disagree :). I’m virtually certain that today with the GIL, one can write such code as @romank0 sketches and get away with it, because:

  1. The GIL ensures that each thread accessing x has to take the GIL and this involves an atomic global variable, so it is synchronising (causes writes before the GIL-switch to be visible to reads after it).
  2. There is no optimisation that recognises that the value of done is not changed within the loop.

In future (without the GIL, or with some future optimisation) you won’t get away with that. In this sense, it has become harder to write incorrectly synchronised code that works.

I agree that no-one should have written that code even with the GIL, without at least making x and done atomic. So writing good code is just the same.

1 Like

My current understanding is that the only thing the GIL provides is a guarantee that no more than one Python opcode is running at once. The problem is that it’s basically undocumented what happens in a single opcode[1]. So you can’t in any practical sense reason about what is “safe” and what isn’t, unless you implement the safety guarantees yourself[2]. In particular, I don’t believe there’s any guarantee that the GIL enforces serialisation of writes. It seems like a plausible behaviour, but like everything else, I don’t think it’s documented.

I do think that the GIL makes it less likely that you’ll get caught out by the problems in your code. Enough serialisation will lower the risk of data races, etc. But that doesn’t mean the risks weren’t there. So I agree that people will find that they get more failures with free threading, and they will need to fix their code if they want it to work under free threading. But I don’t think that means that free threading “broke” their code - just that it exposed the problems that already existed. And it’s no harder to write correct code, it’s just no longer as easy to get away with “mostly correct” code.

What free threading does is expose a group of people[3] who are not well equipped to reason about the trade-offs involved in writing threaded code to increased risk. It’s way too easy to get the impression that you should just slap locks round everything to ensure safety. But then, you’re just reimplementing a GIL for yourself (and almost certainly implementing it far worse than the actual GIL did). I love the potential for increased performance that the free threading builds offer, but I hate the way it’s being presented as a user-facing change, when IMO it should be essentially an implementation detail that paves the way for better multithreading libraries that focus on end user experience.

I, for one, am waiting for those libraries…


  1. For very good reasons - it’s by definition implementation defined behaviour, and generally not appropriate to document ↩︎

  2. using low-level tools like locks, or more appropriately IMO using higher-level “structured concurrency” APIs ↩︎

  3. Like me!!! ↩︎

3 Likes

@jeff5 how do I make a variable atomic? Using C or a third-party library?

These claims confuse me quite a bit. I understand there are no guarantees and I’m still missing how to properly implement the check in the scenario:

  1. x = 1
  2. T1: x = 2
  3. T2: waits in the loop till x == 2

How do I do that with the existing Python APIs? I’ve asked about locks, events and queues earlier and didn’t get an answer.

Or do you mean I need to use C (basically to set a memory barrier) to make my python code correct? This would be rather inconvenient.

Fair point. (I’m forgetting what language I’m in.) You would have to associate it with your own lock. But I think we will get atomics as a feature.
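
As a minimal sketch of “associate it with your own lock” (LockedValue is a hypothetical helper written for this post, not anything in the stdlib):

import threading

class LockedValue:
    """A value guarded by its own lock; taking the lock on every access
    also serves as the synchronization point for visibility."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value

x = LockedValue(1)
# thread 1:  x.set(2)
# thread 2:  while x.get() != 2: ...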

A reason you might not have atomics, or need them as much, is that dictionaries claim to be atomic memory, which will make all globals atomic, and all (non-__slots__) attributes of objects. However, I know others are developing a fuller account of what promises Python should make about which objects.

Function-local variables do not need any protection, because they are confined to the thread executing the call. For this reason I think much data access will be properly concurrent. Cell variables (local variables of an enclosing function) can become shared between threads, so you would need a lock, unless cells become atomic.
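
For example, here is a sketch of how a local variable becomes a shared cell variable; the lost-update counter is the classic illustration (the names here are made up for this post):

import threading

def make_counter():
    count = 0  # local to make_counter, but captured below as a cell

    def increment():
        nonlocal count
        count += 1  # read-modify-write on a shared cell: not atomic

    def value():
        return count

    return increment, value

increment, value = make_counter()

def hammer():
    for _ in range(100_000):
        increment()

threads = [threading.Thread(target=hammer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Without a lock around the read-modify-write, updates can be lost and
# the result may be less than 400000 (this is possible even with the GIL).
print(value())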

It’s no harder, but it’s no easier, either. I doubt that you can implement that check with the guarantees the language offers. But the more important question is why you’re trying to implement that check. Presumably you have a business requirement, and you should be able to implement that without needing to use a busy-wait on a variable being changed in another thread.

In the longer term, it’s quite likely we will get additional features in multi-threaded Python. And I wouldn’t be surprised if some of them were low level primitives like atomic variables or memory barriers. But you can do these already as a 3rd party library (cereggii and ft_utils have been mentioned previously in this thread). Having them in the stdlib might be nice, but it’s not necessary. I personally hope the core devs working on free threading choose to focus on higher level APIs, but even if they do, they’ll need low-level primitives, and exposing them for end users seems like a good choice.

I should be clear here - I’m focusing very much on your apparent desire for a documented-correct solution. If all you want is something that should be fine for any real case, just use a global variable. T1 will set it to 2, and T2 will see that change in due course. You can run some stress tests to confirm this for yourself. The only thing you won’t have is documentation saying that nothing can go wrong if you do this. And that’s the thing that I struggle with - why is having the behaviour documented and guaranteed so critical to you? Practicality beats purity here, in my view.
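
If it helps, here is a rough stress test of that “just use a global variable” approach (the timeout and iteration count are arbitrary; a dict is used so the nested functions can mutate shared state without global):

import threading

def run_once(timeout=5.0):
    state = {"x": 1}

    def writer():
        state["x"] = 2

    def waiter():
        while state["x"] != 2:  # busy-wait for the change to become visible
            pass

    t = threading.Thread(target=waiter, daemon=True)
    t.start()
    threading.Thread(target=writer).start()
    t.join(timeout)
    return not t.is_alive()  # True if the waiter saw the change in time

for i in range(100):
    assert run_once(), f"waiter never saw the write on run {i}"
print("all 100 runs saw the write")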