What is Thread Safety as applied to Python?

So I’m just looking for a bit of clarity about this. I’m not new to threads and all that, but I guess I’ve also not tinkered enough around them to find issues.

The reason I’m asking is because I see lots of methods and functions in the asyncio library which are declared not thread safe. Now I just take that religiously, but I don’t exactly understand what makes something not thread safe.
Anyone care to help me understand?

A lack of thread safety means that the methods/functions don’t have protection against multiple threads interacting with the same data at the same time - they don’t take locks around that data to keep it consistent. The async stuff isn’t thread safe because it doesn’t need to be. Async code on a given event loop all runs in the same thread, so one task can’t be preempted by another; switching between tasks only occurs at await expressions.
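To make the point about await concrete, here is a minimal sketch (the names `counter`, `bump` and `main` are made up for illustration). Two tasks do an unprotected read-modify-write on a shared counter, but since there is no await between the read and the write, neither task can interrupt the other in the middle of it:

```python
import asyncio

counter = 0

async def bump(times):
    """Increment the shared counter without any lock."""
    global counter
    for _ in range(times):
        current = counter        # read
        # No await between the read and the write,
        # so no other task can run at this point.
        counter = current + 1    # write
        await asyncio.sleep(0)   # the only place another task may take over

async def main():
    # Both coroutines run as tasks on the same event loop, in one thread.
    await asyncio.gather(bump(1000), bump(1000))

asyncio.run(main())
print(counter)  # 2000
```

If you moved the `await` in between the read and the write, the same lost-update problem that threads have could show up here too.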

Note that Python does have the Global Interpreter Lock protecting C-level code, so regardless you won’t have issues like memory corruption, immortal or erroneously destroyed objects, etc. Builtin types are going to generally be threadsafe, if they’re not calling Python code. But your own objects can definitely end up in an inconsistent state.
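As a rough illustration of the builtin-types point, assuming CPython: appending to a shared list from several threads does not lose items, because each `append` is a single built-in operation that runs no Python-level code. The names `shared` and `worker` are just for this sketch.

```python
import threading

shared = []

def worker(n):
    for i in range(n):
        # One built-in call; no Python code runs inside it.
        shared.append(i)

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000
```

That only covers the single operation, though. A sequence like "check the length, then append" is made of several steps, and another thread can get in between them.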

See Python GIL. Python is not by itself thread safe. But there are moves to change this: NoGil, etc.

Removing the GIL does not make functions thread-safe.

Thread-safety relates to this question:

If you have a data structure (say, an object, or a list) that is being modified in one thread, can another thread come along and modify it right in the middle of that calculation?

Let’s say you have a global variable x which equals 0, and two different threads attempt to increment it using this:

x += 1

If incrementing is thread-safe, then you are guaranteed that the result can only be x=2 after the two threads have incremented it. Thread safety guarantees that the only way those two threads can increment x looks like this:

  1. thread A locks x, so nothing else can touch it;
  2. thread A reads the value of x, and gets 0;
  3. thread A adds 1 to the value, giving 1;
  4. thread A writes 1 to x;
  5. thread A unlocks x, so other threads can use it;
  6. thread B locks x;
  7. thread B reads the value of x, and gets 1;
  8. thread B adds 1 to the value, giving 2;
  9. thread B writes 2 to x;
  10. and thread B unlocks x.

Of course the locks and unlocks aren’t free, that takes time, and it also means that thread B has to wait for thread A to finish, which slows it down. So thread-safety isn’t cheap.
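The locked sequence above could be sketched like this with `threading.Lock`. This is only one way to do it, and the names `x_lock` and `safe_increment` are invented for the example:

```python
import threading

x = 0
x_lock = threading.Lock()

def safe_increment(times):
    global x
    for _ in range(times):
        with x_lock:   # lock: nothing else can touch x
            x += 1     # read, add 1, write back
        # leaving the with block unlocks x for other threads

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(x)  # 200000
```

Each thread has to wait whenever the other one holds the lock, which is exactly the cost being described.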

But without thread-safety, you can have this:

  1. thread A reads the value of x, and gets 0;
  2. thread B reads the value of x, and gets 0;
  3. thread A adds 1 to the value, giving 1;
  4. thread A writes 1 to x;
  5. thread B adds 1 to the value, giving 1;
  6. thread B writes 1 to x.

So now you have a bug where two threads have both tried to increment x, both think they have done so, but x only equals 1 instead of 2.
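In real code this interleaving only happens now and then, which is what makes it nasty to debug. The sketch below forces it to happen every time by using a `threading.Barrier` so that both threads read before either one writes. The barrier is there purely to make the demonstration repeatable; it is not something you would put in real code.

```python
import threading

x = 0
both_have_read = threading.Barrier(2)

def unsafe_increment():
    global x
    value = x               # read: both threads get 0
    both_have_read.wait()   # hold here until the other thread has read too
    x = value + 1           # add 1 and write: both threads write 1

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(x)  # 1, although two increments ran
```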

Welcome to the Quadrant Of Hell:

Alright… I get it now. I just didn’t know that that was thread safety I guess, I kinda thought of it as ordinary stuff one should know while working with threads. Thanks for clearing it up for me.

It very much is, but keep in mind that most code wasn’t written in the context of “working with threads”. Issues of thread-safety are explicitly considered when writing code to work in a multithreaded context, but if the code wasn’t originally written in that context, explicitly, then it’s unlikely all of those pains were taken. (Particularly since, like Steven said, locking isn’t free — even in a single thread. So the decision to bother with it anyway is far from a no-brainer, when writing single-threaded code.)

So the thing to keep in mind is that “code that wasn’t written for thread-safety” — at this point in the evolution of… like… “software” in general — still describes the majority of existing code. When writing new multithreaded code, it’s a bad idea to blindly trust any library or routine unless it’s explicitly declared to be thread-safe. (The more polite libraries explicitly warn when they’re not, though that’s a courtesy as it should really be the default assumption.)

Not that this means you can’t use such code in your multithreaded app.
But without knowing its access patterns or what thread safety guarantees
it makes, if any, you need to provide your own locking to ensure that
the library code itself does not run concurrently. Crude example:

from threading import Lock
global_mutex = Lock()

Then in some threaded code:

... stuff you know is thread safe ...
with global_mutex:
    ... stuff not known to be safe ...
... resume known safe stuff ...

which protects the arbitrary third party code from concurrent use.

There are nuances and complications, but this is the basic deal: manage
your own thread safety because you know what’s going on, and protect
unsafe stuff from concurrent use.

Cheers,
Cameron Simpson cs@cskk.id.au