Atomic and thread safe in the Python world

I recently read something related to asyncio that, like seemingly everything else, touched on thread safety, and that
led me to wonder what atomic and thread safe really mean in Python.

From my shallow understanding, an operation is atomic if it takes one instruction to complete; in Python the interpreter is the one executing instructions, and those instructions are op codes like LOAD_NAME, CALL_FUNCTION, etc.

And atomic operations are a way to achieve thread safety.

And thread safety is about exclusive control: basically it means that when I am accessing or modifying something, I have a guarantee that no one else has a chance to meddle in; they have to wait.

And the GIL provides thread safety to some extent, because only one thread can run at a time, but thread switching can still happen in between op codes.

And I found this link, and I felt kind of confused about why x = L[i] is considered thread safe.

Let’s take a look at the op codes for L.append(x):

In [72]: dis.dis("L.append(x)")
  1           0 LOAD_NAME                0 (L)
              2 LOAD_METHOD              1 (append)
              4 LOAD_NAME                2 (x)
              6 CALL_METHOD              1
              8 RETURN_VALUE

append takes one instruction to complete (the CALL_METHOD), so even when there are multiple threads calling append, only one thread at a time can acquire the GIL and actually execute the append.
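
If you want to see this hold up in practice, here’s a quick sketch (assuming CPython with the GIL; the interleaving varies from run to run, but since each append completes atomically, the final length always comes out right):

    import threading

    L = []

    def appender(n):
        for i in range(n):
            L.append(i)  # the CALL_METHOD runs while holding the GIL

    threads = [threading.Thread(target=appender, args=(10_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(len(L))  # always 40000: no append is lost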

Or maybe there’s one thread calling L.append and another one calling L.pop(); pop takes one instruction to complete as well.

In [73]: dis.dis("L.pop()")
  1           0 LOAD_NAME                0 (L)
              2 LOAD_METHOD              1 (pop)
              4 CALL_METHOD              0
              6 RETURN_VALUE

Even if the two threads switch between op codes, say thread 1 pauses at LOAD_METHOD, thread 2 runs up to its LOAD_METHOD, and then we go back to thread 1, there’s still only one thread holding the GIL while executing the op code CALL_METHOD.

So there’s always only one thread manipulating the list at a time, either appending data to it or popping data from it, by executing the op code CALL_METHOD.
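
Here’s a sketch of that claim with one appender and one popper (again assuming CPython; how many pops succeed varies, but every element stays accounted for):

    import threading

    L = []
    popped = []

    def appender(n):
        for i in range(n):
            L.append(i)

    def popper(n):
        for _ in range(n):
            try:
                popped.append(L.pop())  # pop is also a single CALL_METHOD
            except IndexError:
                pass  # the list may momentarily be empty

    t1 = threading.Thread(target=appender, args=(100_000,))
    t2 = threading.Thread(target=popper, args=(100_000,))
    t1.start(); t2.start()
    t1.join(); t2.join()

    print(len(L) + len(popped) == 100_000)  # True: nothing corrupted or lost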

And here are the op codes for x = x + 1:

In [74]: dis.dis("x = x + 1")
  1           0 LOAD_NAME                0 (x)
              2 LOAD_CONST               0 (1)
              4 BINARY_ADD
              6 STORE_NAME               0 (x)
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE

Suppose x was initialized to 0 and thread 1 is running. It has finished LOAD_NAME, so thread 1 knows that x is zero. Before it executes BINARY_ADD, thread 2 acquires the GIL and thread 1 is paused. Let’s say thread 2 executes its LOAD_NAME, BINARY_ADD, and STORE_NAME before we switch back to thread 1: thread 1 still adds 1 to the zero it loaded earlier, its STORE_NAME overwrites thread 2’s update, and the data is corrupted.

So x = x + 1 is not thread safe.
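
Here’s a sketch that tries to provoke this (whether the lost updates reproduce depends on the CPython version and scheduling; routing the read through a function call gives the interpreter a chance to switch threads between the load and the store):

    import threading

    x = 0

    def read(v):
        # a function call is an opportunity for a thread switch
        return v

    def worker(n):
        global x
        for _ in range(n):
            x = read(x) + 1  # load, add, store: separate op codes

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(x)  # typically well below 400000: increments were lost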

And the op codes for x = L[i]:

In [75]: dis.dis("x = L[i]")
  1           0 LOAD_NAME                0 (L)
              2 LOAD_NAME                1 (i)
              4 BINARY_SUBSCR
              6 STORE_NAME               2 (x)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

What about thread 1 pausing after the LOAD_NAMEs but before BINARY_SUBSCR?

Suppose i was 2 and there were 3 elements in the list L. Thread 2 might just pop the last element from L, which would shrink L to a size of 2.

Then thread 1 resumes and executes L[2]: index out of range.
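
A sketch that provokes exactly this (assuming CPython; the subscript itself is atomic, but the pair “compute the index, then subscript” is not):

    import threading

    L = list(range(100_000))

    def reader():
        try:
            while True:
                i = len(L) - 1  # index computed here...
                x = L[i]        # ...but L may have shrunk before this line
        except IndexError:
            print("IndexError: the list shrank underneath us")

    def popper():
        while L:
            L.pop()

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=popper)
    t1.start(); t2.start()
    t2.join(); t1.join()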

An operation is atomic if it appears to occur instantaneously. The number of instructions isn’t strictly relevant in general: the further towards hardware you get, the more you find things that look like a single instruction actually aren’t (e.g. microinstructions, or LL/SC on ARM). Python achieves this appearance of instantaneous action with a global interpreter lock (GIL), which it holds around every interpreter instruction, so your intuition turns out to be correct (edit: or not, see next post), at least in current versions of Python. However, it helps to know the underlying idea, as you can achieve atomicity yourself with correct use of mutexes or other algorithms. Or take asyncio: as scheduling is cooperative, not pre-emptive, any block between two awaits can be considered atomic, provided you don’t run any background threads accessing the same data.
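
For example, here’s a minimal sketch of that asyncio point: the first coroutine’s check-then-update block contains no await, so it is atomic with respect to other tasks; adding an await in the middle breaks that.

    import asyncio

    balance = 100

    async def withdraw(amount):
        global balance
        # no await between the check and the update, so no other
        # task can run in between: this block is atomic
        if balance >= amount:
            balance -= amount
            return True
        return False

    async def racy_withdraw(amount):
        global balance
        if balance >= amount:
            await asyncio.sleep(0)  # yields: another task may run here
            balance -= amount       # the check above may no longer hold
            return True
        return False

    async def main():
        results = await asyncio.gather(*(racy_withdraw(60) for _ in range(2)))
        print(results, balance)  # both checks pass before either update: [True, True] -20

    asyncio.run(main())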


What is and isn’t atomic is indeed quite subtle. (You mention asyncio; that’s a little different and easier to reason about because it’s cooperative: unless you await, nothing else will run.)

I personally think reasoning about it in terms of bytecodes is more confusion than it’s worth. Sure, if an operation spans multiple bytecodes, certainly don’t treat it as atomic. But CALL is a single bytecode and that’s clearly not atomic! (Also, the details of when bytecodes can drop the GIL change, e.g. in Python 3.10 bpo-29988: Only check evalbreaker after calls and on backwards egdes. by markshannon · Pull Request #18334 · python/cpython · GitHub means a lot of trivial attempts to demonstrate non-atomicity will no longer reproduce in Python 3.10 and newer, but this isn’t guaranteed by the language.)

I think a way to think about it is in terms of segments of uninterrupted C code. If you call into C and don’t give up the GIL and don’t call into Python code and don’t decref (since reference count hitting zero can call __del__), whatever you do will (probably) be atomic with respect to all other Python code.
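
The decref caveat is easy to demonstrate (CPython-specific, since it relies on reference counting):

    class Noisy:
        def __del__(self):
            # when the reference count hits zero, arbitrary Python runs here,
            # and it could observe or mutate shared state
            print("__del__ running")

    x = Noisy()
    x = None  # dropping the last reference runs __del__ during this "simple" assignment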

See also delete misleading faq entry about atomic operations · Issue #89598 · python/cpython · GitHub and PEP 703 – Making the Global Interpreter Lock Optional in CPython | peps.python.org


Thanks for the replies.

@alicederyn

With regard to the definition of atomicity in the programming world,

I agree that many things that seem like one instruction might actually not be.
There are a lot of talks, articles, and videos around atomicity,

and some people refer to properties like all-or-nothing, etc., but I guess we can just narrow the scope here.

The term instantaneously, IMHO, implies that one-instruction, uncuttable, cannot-be-interrupted property.

If one operation can be interrupted, then it cannot be performed instantaneously, since someone can just jump in and make that operation stop (the term instantaneously here does not refer to speed, right?). The operation then won’t finish instantaneously but only at some point in the future, meaning it stops at some smaller instruction.

But given that we are talking about the Python world, let’s just put that definition of atomicity aside and say, roughly speaking, that an operation is atomic if it cannot be interrupted, at least from a higher-level perspective, and the underlying OS or libraries will take care of the details for us.

I hope that we can reach a consensus on this, but please correct me if I am wrong.

@hauntsaninja

Yes, we can manually release the GIL in C code, and Python does release the GIL at certain points, like when handling network events.

Let’s just not include any C code and consider things at the Python level only, and for simplicity make some assumptions here, for instance that L.append will not release the GIL (I haven’t dug into the implementation details of append; I hope I am right).

And first, let’s say we all agree that in the context of the GIL, thread switching happens between op codes.

What I wanted to figure out is why x = L[i] is thread safe.

Or could someone explain, or do a dry run or simulation of how the interpreter works and switches between two threads, step by step, to demonstrate that x = L[i] would be thread safe?

I read delete misleading faq entry about atomic operations · Issue #89598 · python/cpython · GitHub and posted that question, and I hope that someone can help.

At last, if there are so many kinds of exceptions in terms of atomicity, maybe we should just treat no Python code as thread safe?

Let’s start with a definition of “thread-safe”. The simplest way to think about it is that, given two threads doing two operations, it is exactly the same as if thread A performed its operation before thread B does, or the other way around. I’ll use a simple increment as an example:

  1. [Thread A] x = x + 1
  2. [Thread B] x = x + 1

This really means:

  1. [Thread A] What is X?
  2. [Thread A] Calculate the number one higher than that
  3. [Thread A] Store that into X.
  4. [Thread B] … as above.

A failure of thread-safety here would look like this:

  1. [Thread A] What is X?
  2. [Thread B] What is X?
  3. [Thread B] Calculate the number one higher than that
  4. [Thread B] Store that into X.
  5. [Thread A] Calculate the number one higher than that (ie the number fetched in step 1)
  6. [Thread A] Store that into X (overwriting thread B’s update).

And you can see the problem here.

There’s WAY less risk when it comes to a simple lookup like L[i]. What we have looks something like this:

  1. Take the list referenced as L
  2. Find its internal state: an array of objects, and the number of those objects.
  3. Check that i is a valid index (otherwise raise)
  4. Grab a reference to the object, and return it.
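
In Python-level pseudocode, those steps look roughly like this (illustrative only; the real thing is C code executed while holding the GIL, and the function name here is made up):

    def getitem_sketch(array, length, i):
        # steps 2-3: bounds check against the length captured alongside the array
        if i < 0:
            i += length
        if not 0 <= i < length:
            raise IndexError("list index out of range")
        # step 4: grab a reference to the object and return it
        return array[i]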

Failure of thread safety simply cannot happen with two threads doing this at the same time, since they’re both just reading. So the failure would have to involve a thread changing the list while you’re reading it. A simple assignment like L[4] = 123 isn’t going to break this, as you’ll either get the value before assignment or the value after it. [1] So that’s still safe. The biggest risk is going to be resizing the list, since that can result in replacing the entire array. Here’s what an append operation would look like:

  1. Get the same internals we were seeing before
  2. Does the array have capacity for another element? (It’s common to overallocate a bit to speed up repeated appends.) If so, slot the thing in there, increment the list’s length, we’re done.
  3. Okay, we need to expand that. Make a new array with more capacity.
  4. Copy in all the references (this isn’t like assignment in Python, it’s just copying pointers).
  5. Switch to the new array with its increased capacity.
  6. Slot the new element in, increment length, as per step 2.
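
A rough Python-level sketch of those steps (illustrative; CPython’s real implementation is C in Objects/listobject.c and uses a different growth factor):

    class MiniList:
        def __init__(self):
            self._array = [None] * 4  # overallocated backing storage
            self._length = 0          # number of slots actually in use

        def append(self, item):
            if self._length == len(self._array):
                # steps 3-5: make a bigger array, copy the references, switch over
                bigger = [None] * (len(self._array) * 2)
                for i in range(self._length):
                    bigger[i] = self._array[i]  # copying references, not objects
                self._array = bigger
            # steps 2/6: slot the element in and bump the length
            self._array[self._length] = item
            self._length += 1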

This can definitely run into parallelism problems, particularly if two threads are appending simultaneously. It’s not too hard to keep this safe against simple reads (note that the old array is still there right up until step 5, at which point the new one is in position and ready), but if another thread tries to assign to the array, there’ll be problems.

Solution? Locks. There are lots of different granularities that can be chosen, and they offer a tradeoff between time spent acquiring/releasing locks and time lost through a loss of parallelism. Traditionally, CPython has used a Global Interpreter Lock which basically says “if two threads are trying to run CPython bytecode, serialize”. This is actually the highest performance choice for many use-cases, but it gets a horrible reputation due to being abysmal at certain other use-cases (like hammering the CPU with two threads constantly incrementing numbers). Another option would be to have a lock on every Python-level object - before using or mutating any object, you have to acquire its lock. While that might seem like a really smart idea, have a quick think about (a) the vast number of locks you’d need to acquire, and (b) how likely it is for two threads to both need None at the same time. Finding a good locking model is essential to good multithreaded performance!

I’ve omitted a ton of details here but hopefully that’ll give you a bit of an overview.


  1. I’m assuming here that the interpreter is capable of atomic writes of pointer-sized integers; if it isn’t, basically everything is broken, so it’ll be built with that in mind. ↩︎

In this case, “interrupted” means “seen to be in a partial state” or “seen to have taken effect but then seen to have not taken effect afterwards”.

“Atomic” means other things in other places (e.g. in database theory), but for concurrency, it means “there is an instant in time before which everyone agrees the operation has not happened, and after which everyone agrees the operation has happened”.

Importantly, that “everyone” is anyone else doing operations in the system. This is why locking is a solution: while you hold the lock, nobody can measure the state! So you can say “the instant in time is the moment I first hold all the locks I need”, and that works for the definition. Anyone trying to see what state the system is in will have to take the locks, and so will either observe the complete state before you started (they got the lock first) or the complete state after you finished (you got the lock first).
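
In Python terms, here’s what that looks like for the increment example from earlier (a minimal sketch):

    import threading

    x = 0
    lock = threading.Lock()

    def worker(n):
        global x
        for _ in range(n):
            with lock:       # nobody can observe or modify x while we hold this
                x = x + 1    # the load, add and store now form one atomic unit

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(x)  # always 400000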

There’s a whole world of “lock free” algorithms that achieve the same effect without locking. They are generally much much harder to reason about and prove correct, but it comes down to the same trick: pick a point in time that the operation “happens at”, and prove consistency.

Note that atomic doesn’t mean “cannot be cancelled”. I could raise an exception in a critical section and abort; provided I have some meaningfully defined end state that other threads observe afterwards, that can still be atomic.

@Rosuav @alicederyn

Guys, I have to say I really appreciate your thoughtful, provocative, and informative replies.

And I guess for me it’s time to revisit some fundamentals. Thanks again!


Though it usually means “if cancelled, it must be ENTIRELY cancelled”. If you raise an exception in a critical section and abort, the end state should normally be the start state.

That’s an exception safety guarantee rather than an atomic guarantee.

If you want to reason about what is atomic, you need to think about C code. If list was implemented in pure Python, L[i] / list.__getitem__ would not be atomic; you’d need to do locking.

I think the FAQ says this well:

When in doubt, use a mutex!