I don’t think that’s quite right. Sequential consistency is a property of the memory model that a runtime might offer the user’s code, not a property of the user’s code. The user may have assumed it at statement or byte code level, and may be incorrect, but if they’ve relied on it falsely for correctness, isn’t that just “not correctly synchronised”?
Conversely if they haven’t relied on it but access to shared data is properly synchronised (possibly what you mean by “non-deterministic”), I don’t think it needs a new name.
Just to be clear, I’m very supportive of (your main intent) having clear language and e.g. specifying which operations must be atomic in every implementation.
I think the statement “concurrent actions on built-in (and stdlib) objects will not crash the interpreter” is worth making, and that the interpreter is therefore “thread-safe” in that limited sense, while reminding people that (as ever) that doesn’t mean your code is correct.
The atomic ops are, possibly, OK for trivial multi-threaded algorithms that have very limited state to manage. But as soon as the algorithm is non-trivial you will need to lock while mutating state.
I think all the atomic ops in 3.14 are only there to prevent Python from corrupting its internal state, not to make multi-threaded algorithms just work.
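To illustrate the kind of locking that non-trivial state requires, here is a minimal sketch (the names safe_increment and counter are mine, not part of any proposed API):

```python
import threading

lock = threading.Lock()
counter = [0]  # shared mutable state

def safe_increment(n):
    for _ in range(n):
        with lock:  # the whole read-modify-write happens under the lock
            counter[0] += 1

threads = [threading.Thread(target=safe_increment, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter[0])  # 40000 — deterministic, because the lock serializes updates
```

Without the lock, the same program can print less than 40000 on a free-threaded build, because increments from different threads can overwrite each other.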
I feel obliged to say that the tricky details being referred to are encapsulated by cereggii’s AtomicDict.reduce. Example usage here.
Though I’m sorry to continue the sidetrack and promise to stop here.
My feeling in general towards the topic’s title “Thread safety now and in the future” is that thread safety is better served by data structures that are not the standard dict/list/etc, and thus don’t have to make compromises with single-threaded code.
I am new to threading but I have a basic classic question. What would happen if, say, the first element of a list were to be incremented by two threads? Operations on complex data types such as dict and list involve multiple steps like hashing, probing, etc. Wouldn’t it be an issue to read, or maybe write, in between these steps?
The increment isn’t atomic. my_list[idx] += 1 is desugared to my_list[idx] = my_list[idx] + 1 where even if the read and write are individually atomic, the combined operation isn’t.
To answer your broader question, list and dictionary modifications happen while holding the per-object lock.
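To make the lost update concrete, here is a single-threaded replay of one possible interleaving of the two threads (the variable names are purely illustrative):

```python
# Two threads both run my_list[0] += 1, starting from 5.
# One possible interleaving, replayed deterministically:
my_list = [5]

a = my_list[0]      # thread A reads 5
b = my_list[0]      # thread B reads 5 (before A writes back)
my_list[0] = a + 1  # thread A writes 6
my_list[0] = b + 1  # thread B also writes 6 — A's increment is lost

print(my_list[0])  # 6, not 7
```

Each individual read and write may be atomic, but nothing stops the scheduler from interleaving them exactly like this.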
Thank you for answering about the per-object lock. I still doubt whether it is right to call a non-atomic operation “safe” if the result is 6 rather than 7 (assuming a +1 increment to 5 by two threads that both read at the same time).
I think it’s fair to say that a lot of this discussion has been about what people mean by “safe”. Probably the main conclusions so far are:
“Safe” means different things to different people. Always establish what you mean by “safe” (or use a more specific term) when having a discussion.
What’s most important is which operations are atomic. Unfortunately, Python doesn’t document that yet, though there are people working on it.
It’s definitely not obvious what is atomic. Even lst[n] = 1 isn’t necessarily atomic, because it calls __setitem__, which could be user code (depending on the type of lst).
Wherever possible, by far the simplest approach is to avoid using shared objects across threads. Use queues, or some form of higher level abstraction like concurrent.futures, to communicate data between threads.
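As a sketch of that higher-level style, concurrent.futures lets each task return its result instead of mutating shared state (the worker function here is just an example):

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    # No shared mutable state: each call gets its own input
    # and communicates by returning a value.
    return x + 1

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(100)))

print(sum(results))  # 5050: the same answer regardless of thread scheduling
```

Because no thread ever writes to an object another thread reads, there is nothing to synchronize and no lost-update hazard.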
None of this is new with the free threaded builds. Your example could give a result of 6 instead of 7 even with the GIL.
Using either one of these two types as items in the list should provide the safety you’re looking for.
Unless you were interested in the problem of setting generic items of lists. If that is the case, I don’t know of any atomic list implementations yet, but there are atomic dictionaries out there:
You could, however, use an atomic reference as a list item, so that you move the problem of atomically setting an item to the problem of atomically changing the object inside an atomic reference, which can be done safely with either:
I’ll note (just for information, not as a criticism of the linked libraries) that even with atomic types, you can’t just blindly use them and expect to be “safe”. There are many ways that data races can be introduced into your code, so you still have to take care to understand when other threads can pre-empt yours. This is why I’m much happier recommending that people avoid shared data and use structured concurrency mechanisms, rather than low-level constructs like locks and atomic types.
You are of course correct, though I think your point is conveyed in the linked documentation. (If not, feel free to open issues on the projects.)
I don’t think that advice is applicable to all circumstances, and using locks or atomics is also not advisable in all circumstances.
I think we need to be clear about what tools are out there and when to use them, and the work on docs that is currently ongoing is a step in the right direction.
Thank you for your comment. Since I am new to threading, I am curious: how can the result still be 6 with the GIL? Is it because the two threads read the same value one after another, executing concurrently (interleaved, rather than truly in parallel, because of the GIL)?
Be mindful that the GIL is used by the CPython interpreter to ensure the correctness of its own internals. It does not make any concurrency guarantees about the correctness of Python code.
As Sam was saying:
This means that the GIL (and now free-threading builds) guarantees only that:
the read my_list[idx] is atomic
the write my_list[idx] = … is atomic
but, crucially, their combination is not atomic
The GIL ensures that individual bytecodes are effectively atomic, but my_list[idx] += 1 is desugared to more than one bytecode.
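You can see the desugaring with the dis module. The exact opcode names vary between CPython versions, but the augmented assignment always compiles to several instructions, with the read, the add, and the write as separate steps:

```python
import dis

# Disassemble the statement; dis accepts a source string directly.
instructions = [i.opname for i in dis.get_instructions("my_list[idx] += 1")]
print(instructions)
# On CPython 3.12 this includes BINARY_SUBSCR (read), BINARY_OP (add)
# and STORE_SUBSCR (write) — distinct bytecodes, and another thread
# can run in between any two of them, with or without the GIL.
```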
If one thread is iterating over a dict or list by a for-in loop and another thread is adding/removing an item to/from it, what should one expect without additional locking measures?
Possibilities I can think of:
1. The reading thread retrieves all items, including the added one.
2. The reading thread retrieves all items, including the removed one.
3. The reading thread retrieves all items except the added one.
4. The reading thread retrieves all items except the removed one.
5. The reading thread retrieves an undefined number of elements, possibly missing more than just the added or removed one, due to changes of the internal structure between iteration steps.
6. A crash of the interpreter due to internal inconsistencies.
Normally I would expect possibilities 1 to 4, but I’m unsure about 5, and I would definitely not expect 6.
Iterating over a dict while modifying it already fails in single-threaded code:
>>> d = {1:2,3:4,5:6}
>>> for k in d: d[k+10] = 'foo'
...
Traceback (most recent call last):
File "<python-input-1>", line 1, in <module>
for k in d: d[k+10] = 'foo'
^
RuntimeError: dictionary changed size during iteration
If you only replace values by assigning to existing keys, and don’t add or remove keys, then I think it is fine, but this is quite subtle and possibly not guaranteed in any docs.
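For example, replacing values in place (no keys added or removed) completes without the RuntimeError, at least in single-threaded code:

```python
d = {1: 2, 3: 4, 5: 6}
for k in d:
    d[k] = d[k] * 10  # same keys, new values: the dict's size never changes

print(d)  # {1: 20, 3: 40, 5: 60}
```

Whether the same is guaranteed when another thread does the replacing during iteration is exactly the part that, as far as I know, isn’t documented.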
If you have a list then there are a bunch of different things that you could do:
l[i] = 3
l.pop()
l.pop(i)
del l[:i]
l.clear()
...
I would expect that, much like for dicts, an assignment like l[i] = 3 that doesn’t change the size of the list can be viewed as an atomic operation. If another thread then iterates over the list, the iterator will yield either the old or the new value, but will not yield duplicate values, crash, or do anything else. Another case that is probably not guaranteed explicitly, but perhaps could be, is append-only usage: if you only append to a list, a concurrent iterator would effectively see some consistent version of the list’s items at one point in time between the append operations.
Another case not listed in your possibilities 1 to 6 is that rather than a seg-fault type crash there could be a Python-level exception. That is in fact what happens in the dict example above, but I expect that iterating over a list would never raise an exception.
I would expect that much like for dicts an assignment like l[i] = 3 without changing the size of the list can be viewed as an atomic operation and then if another thread iterates over the list the iterator will yield either the old or the new value but not give duplicate values or crash or anything else.
That part was my initial question and also confirmed by @colesbury and @pf_moore so I would consider this a well-established fact.