I was surprised to find out that there’s a known crashy function in the time module: time — Time access and conversions — Python 3.14.0 documentation
For Python 3.14, I think that an average user who does not require increased performance or managing greater complexity (scale and coupling) will likely not need to worry about free-threading now (other than perhaps awareness).
Concurrency and parallelism can be viewed from multiple perspectives: hardware, runtime, and application levels.
Before looking at higher-level abstractions added to the language, we, as a community, will benefit from better explanations and increased documentation. This documentation needs to reflect the runtime language guarantees and considerations as well as the application developer patterns to best approach concurrency and parallelism.
The GIL does not ensure that an algorithm is thread-safe.
Free-threaded Python will also not ensure that an algorithm is thread-safe.
You have to use locks etc. to ensure an algorithm is thread-safe.
What is new is that free-threaded Python is likely to expose locking issues in algorithms that seemed to work under the GIL.
It depends on what you mean by “thread-safe”, which different people may interpret in different ways. If by thread-safe you mean that you won’t get a crash or segfault, but that operations from different threads may occur in an unspecified order, then both GIL and free-threading Python make it safe to replace an item in a list in one thread while reading from the same list in another. Without locks there is no way to know whether the reading thread will see the old or the new value, but if that does not matter (as specified in the OP) then it is fine.
It is not necessarily the case that locks are needed even if some synchronisation across threads is needed because both GIL and free-threading Python have internal locks and perform many operations atomically. A guide for what is thread-safe should also document which operations are atomic and whether or not atomicity is implementation-defined or can be depended on as a language feature.
I do not see the locks on list, dict etc. being anything but protections for the CPython interpreter against crashing.
As soon as you have read-modify-write or test-then-access in your algorithm that is running across threads you will need locks.
I do not agree that the list and dict locks are of any use in making algorithms safe to run across multiple threads.
So it seems we have two contradictory statements regarding my question here, which I think is a strong indication that things need to be clarified.
But which statement is correct?
Again, the question was: one thread is updating an already existing element in a list or dict with a fixed value which is not related to its previous value, while the other thread is reading from the dict or list.
What else should I expect, besides the possibility that the reading thread gets either the old or the new value?
Nothing. As stated above the reading thread will get the old or the new value.
There are some subtleties though which depend on understanding which operations are atomic. The operation
```python
items[3] = 'red'
```
is atomic. The corresponding read
```python
value = items[3]
```
is atomic. If you do these operations in two threads then you cannot assume that they occur in a particular order but the behaviour will be equivalent to the two operations happening atomically in one of the two possible orders.
You should think of the assignment to value as a separate operation from the read items[3] though:
```python
_tmp = items[3]
# another thread can execute something in this gap
value = _tmp
```
That is not an important distinction if value is just a local variable, but it might matter if it is a global shared across threads, or if you are assigning into something else that might be shared, e.g. this is not atomic:
```python
other_items[10] = items[3]
```
If different threads can change the size of the list then e.g. this can fail due to a race condition:
```python
if len(items) > 3:
    value = items[3]
```
It can fail if another thread calls items.clear() in between the len check and the items[3] read.
This is why you need atomic operations. The list.pop method is atomic so if two threads pop they don’t get the same item because the read-modify-write inside list.pop is atomic. Likewise append is atomic and if two threads append then both items are appended but in an unspecified order. This atomicity depends on locks but the Python programmer does not need to use locks explicitly because the runtime does it implicitly using either the GIL or per-object locks.
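A rough sketch of what that buys you in practice (assuming the atomicity just described): two threads draining the same list with pop never receive the same item and never need an explicit lock in the Python code.

```python
import threading

items = list(range(1000))
taken = [[], []]          # one result list per worker, nothing shared between them

def worker(idx):
    while True:
        try:
            # list.pop is atomic, so two threads never receive the same item
            taken[idx].append(items.pop())
        except IndexError:
            break         # the list is empty

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every original item ends up in exactly one of the two result lists
assert sorted(taken[0] + taken[1]) == list(range(1000))
```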
Precisely.
Knowing (and documenting!) which operations are atomic is often crucial for proving a multi-threaded algorithm works as you intended. Python has traditionally been rather bad at documenting such details, claiming that making atomicity guarantees limits flexibility when code is being maintained. That was a bad argument under the GIL and remains a bad argument now, but conversely I don’t know if the new energy around free threading makes it more likely that people will be willing to commit to atomicity guarantees.
Having said that, as soon as you bring 3rd party code into the mix, it’s highly likely you’ll encounter APIs that don’t document their atomicity guarantees, so you’re back into “read the code, take a chance, or add locks just in case” territory.
This is atomicity, though, as opposed to thread safety - which IMO is a meaningless term because people should always be able to assume that the interpreter will not crash.
The one grey area I see is with something like lst[len(lst)-1]. That looks safe, although it isn’t, because other threads can run between when the subscript is calculated and when the list element is accessed, and that code could change the length of lst. That still can’t crash the interpreter, but it could cause an exception to be raised. Some people (not me!) might argue that unexpected exceptions are somehow worse than data races (“so you mean we have to put try...except blocks around everything?”)
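A minimal fragment to illustrate, assuming lst is a list shared with other threads:

```python
# Looks safe, but another thread can shrink lst between the len() call
# and the subscript, so this can raise IndexError (it still won't crash):
value = lst[len(lst) - 1]

# A single negative-index read has no such gap (assuming single-item
# subscripting is atomic, as discussed above), although it still raises
# IndexError if the list happens to be empty:
value = lst[-1]
```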
It’s still about atomicity guarantees, though, not about “thread safety” in the way people are trying to define it (as “not crashing the interpreter”). And I don’t think that reframing “thread safe” as “won’t raise unexpected exceptions” is a remotely usable definition, FWIW.
@oscarbenjamin Thanks again for clarifying. What you described is exactly the behaviour I would have expected.
I can say with some certainty that we’re not getting that message across very well. The hype around free threading is real, and is strongly at odds with the idea that “the average user doesn’t need to worry about free threading”.
Better documentation is always beneficial. I don’t necessarily agree that we need better documentation before higher-level abstractions. I think both can be worked on at the same time.
Agreed, but:
- We could have done that under the GIL. In my previous message I noted that Python has never provided good documentation of what operations are atomic. We could have done that years ago. We are starting to do it now, and that’s great, but even under the GIL having that information would have improved things[1]. The same is true of documentation on recommended patterns.
- Higher-level abstractions can very easily be implementations of recommended best practices. For example, I imagine that the map-reduce pattern is one that would be recommended. But we aren’t limited to describing that pattern. Why not provide it as a library as well? That way, we don’t demand that everyone has to implement the tricky bits of the algorithm[2].
- The persistent suggestion that “doesn’t crash the interpreter” is a property that needs mentioning at all indicates to me that we’re not integrating the conversation around free threading into the wider context of Python as a language. It never even occurs to me in any other context that an interpreter crash is anything other than a language implementation bug[3]. It’s a great technical achievement in terms of the free threading implementation, and the developers have a right to be proud that they achieved it, but it isn’t (and shouldn’t be) something that end users need to know about.
On the subject of the “wider context”, people see Python as a language that’s easy to program in. I want free threading to reinforce that view, not contradict it. And I don’t think that documenting how to do “safe concurrency”, no matter how useful that will be in itself, will take us in that direction. Better abstractions and frameworks in the stdlib will. After all, concurrent.futures is a great example of “making concurrency easier”, so it’s not like there’s no precedent here. If we don’t have a “better way” to offer alongside discussions of atomic operations and explicit locks, we risk losing momentum and community goodwill[4].
We could, but we always felt those were better left as implementation details, not part of the language spec (for various reasons, such as the ability to change CPython behavior later, and alternative implementations of Python).
And regardless, situations where you can avoid explicit locking purely by relying on atomic ops are rare in practice. They might occur often in toy examples, however.
It is, and AFAICT you don’t really explain why it is not sufficient as is.
Precisely. What’s different now that makes it important that we document atomicity guarantees for free threading? None of the questions I’ve seen raised are any different whether or not the GIL is present.
One specific example, which I’ve used a few times now, is running some code in multiple threads and collecting the results. The parallel execution part of that is ideal for concurrent.futures, but the collection is tricky to get right. The “obvious”[1] answer is to update a shared result object, but that either introduces data races or hurts scalability, depending on whether you do locking correctly. The correct answer is probably to use the map/reduce pattern, but to be honest, that’s only words to me and I don’t know how I’d apply it in any specific context[2]. Documenting the pattern might help, but I suspect there are tricky details to get correct, or you still have data race/scalability problems. Having a library that handles the difficult bits would be ideal here.
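To make the pitfall concrete, here is a rough sketch of that “obvious” shared-result approach (the key calculation is just a stand-in): every single update serialises on one lock.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# The "obvious" approach: every worker updates one shared Counter.
# Without the lock, += on a counter entry is a read-modify-write data race;
# with it, every update contends for the same lock, which hurts scalability.
totals = Counter()
totals_lock = threading.Lock()

def work(item):
    key = item % 17                 # stand-in for the real calculation
    with totals_lock:
        totals[key] += 1

with ThreadPoolExecutor() as executor:
    list(executor.map(work, range(100_000)))   # drain the iterator
```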
It’s not that concurrent.futures needs replacing, just that there are other parts of a concurrent application that could benefit from similar library support.
Another example is some form of cancellation support - if you have tasks running in an executor, safely terminating the executor is tricky to get right (even with the limitation that cancellation will always wait for all current tasks to complete). I’ve tried to implement this in a concurrent CLI application, to support Ctrl-C. It can be done, but it’s awfully low level. Having that in the stdlib would again be useful to reinforce the idea of Python as easy to code correctly in.
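Roughly the kind of low-level handling I mean, using only the existing executor API (task() and args are stand-ins, and cancel_futures needs Python 3.9+):

```python
from concurrent.futures import ThreadPoolExecutor

def task(arg):
    ...  # stand-in for the real long-running work

args = range(100)
executor = ThreadPoolExecutor()
try:
    futures = [executor.submit(task, arg) for arg in args]
    results = [fut.result() for fut in futures]
except KeyboardInterrupt:
    # cancel_futures drops tasks that have not started yet, but tasks
    # already running still run to completion before shutdown returns
    executor.shutdown(wait=True, cancel_futures=True)
    raise
else:
    executor.shutdown()
```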
And yes, these could likely be done as 3rd party libraries. My point here is that if all the stdlib has[3] is manual locking and a bunch of advice on what can go wrong, it feels like threaded programming from 20 years ago.
I suppose threading is still better than asyncio, which doesn’t even have something like concurrent.futures.
I guess this approach is obvious to you, but to me the obvious answer is to use executor.map and collect the results in the main thread. This could be better documented with some examples, though.
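Something like this minimal sketch, where work() stands in for the real per-item task:

```python
from concurrent.futures import ThreadPoolExecutor

def work(item):
    return item * item     # stand-in for the real per-item task

items = range(1000)

with ThreadPoolExecutor() as executor:
    # the worker threads only return values; the results are collected
    # here in the main thread, so there is no shared mutable state and
    # no explicit locking anywhere
    results = list(executor.map(work, items))
```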
Well, the best advice is to use algorithms that don’t need additional locking at all, but for most people to write code that way, they do need to know at least some amount of guarantees about the data structures in the standard library. I wholeheartedly agree that locking shouldn’t be prominent advice for concurrency; it should be a last choice when there is no other option, and people shouldn’t sprinkle locks in everywhere hoping to fix a data race, because lock contention can and will dominate runtime in many cases, to the point of being slower than the single-threaded version.
If we want the documentation to be better than the threading of 20 years ago, we need to be guiding users towards better patterns, not just the correct tools.
The simplest example of this for the standard library is probably a concurrent futures executor, and collecting the results in the thread it was launched from.
There’s also producer consumer patterns with queues, where you only have to rely on the atomicity of getting the next item from a queue.
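A small sketch of that producer/consumer shape; the print is a stand-in for the real per-item work, and queue.Queue handles all the locking internally:

```python
import queue
import threading

tasks = queue.Queue()
SENTINEL = object()

def producer(items):
    for item in items:
        tasks.put(item)
    tasks.put(SENTINEL)        # signal that nothing more is coming

def consumer():
    while True:
        item = tasks.get()     # the only shared operation is the queue get
        if item is SENTINEL:
            break
        print("handled", item) # stand-in for the real per-item work

threading.Thread(target=producer, args=(range(10),)).start()
consumer()
```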
There are also binning functions, for cases where you have a reason to mutate a shared data structure: binning ensures only one worker thread is ever mutating a given section of that data structure (binning on dict keys is a good example of this).
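A rough sketch of the binning idea, with squaring as a stand-in for the real work: the data is partitioned by key up front, so each worker thread is the only one that ever appends to its bin.

```python
from concurrent.futures import ThreadPoolExecutor

# the shared dicts are fully populated before any thread starts and are
# never resized afterwards; each worker only ever mutates its own bin
data = {"evens": range(0, 100, 2), "odds": range(1, 100, 2)}
results = {key: [] for key in data}

def fill_bin(key):
    bucket = results[key]          # no other thread touches this list
    for item in data[key]:
        bucket.append(item * item)

with ThreadPoolExecutor() as executor:
    for key in data:
        executor.submit(fill_bin, key)
```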
I can keep going on for quite a while depending on the actual needs of an application, but I think providing the first two of those and maybe the third is enough to nudge people into not thinking about it in terms of “what needs locking?” but “how do I ensure changes to this are well-ordered while minimizing the need for explicit synchronization?”
The Executor methods submit and map are skewed towards returning results from the tasks, rather than updating a shared object. From that point, your main thread has to aggregate the collected results. How to do that depends on the application; for example, if your individual tasks return dicts with disjoint keys, the main thread could just merge the dicts as they arrive.
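For example, a rough sketch where work() is a stand-in returning a dict whose keys don’t overlap with any other chunk’s:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(chunk):
    # stand-in task: each chunk produces keys disjoint from the others
    return {item: item ** 2 for item in chunk}

chunks = [range(0, 10), range(10, 20), range(20, 30)]
merged = {}

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(work, chunk) for chunk in chunks]
    for fut in as_completed(futures):
        merged.update(fut.result())   # aggregation only happens in the main thread
```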
Sorry, I oversimplified my example. In my real code I have millions of calculations, with results in a very limited range (10-20). Returning a list of a million values from map is a lot less efficient than returning a counter with 10 keys.
But that’s really just details. map often is the right answer. Maybe all that’s needed is some sort of optional accumulator for map, to cover a slightly wider range of use cases.
Map will return a generator of values that plays nicely with a counter.
You can do that with executor.map as well. Divide your millions of calculations into chunks and apply map to the list of chunks. This is exactly what map-reduce is: your map function turns a chunk of inputs into a counter of outputs and the reduce function combines the counters from separate chunks to produce the total counts. You probably don’t need any multithreading for the reduce part so you can loop over executor.map in the main thread combining the counts returned from the map function in each thread.
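A sketch of that shape, where calculate() is a stand-in for the real per-item computation:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def calculate(x):
    return x % 17          # stand-in: results fall in a small range

def map_chunk(chunk):
    # the "map" step: each thread returns one small Counter per chunk
    # instead of a million individual values
    return Counter(calculate(x) for x in chunk)

values = range(1_000_000)
chunk_size = 100_000
chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

totals = Counter()
with ThreadPoolExecutor() as executor:
    # the "reduce" step: combine the per-chunk counters in the main thread
    for partial in executor.map(map_chunk, chunks):
        totals.update(partial)
```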
That being said, from my previous experiments this sort of thing may not scale well with free-threading. I’m not sure exactly what causes that, but a big difference between doing this sort of thing in C and doing it in Python is that integers in CPython are heap allocated and reference counted. If the main bottleneck is creating millions of small objects in the heap and twiddling all their reference counts then write contention to RAM may be more significant than any of the actual calculations you are trying to do. Throwing more threads at this problem is potentially like using more threads for better download speeds: it might help but perhaps not if you’ve already maxed out your internet connection while using only one thread.
… and this is the sort of thing I’m characterising as a “tricky detail”, that would be better encapsulated in a library utility (how to do this, and the fact that it’s important to do so, isn’t as obvious to the average user as people seem to think). But let’s not sidetrack the discussion any further.
I don’t think it’s a sidetrack, though; I think this is a good example of where a piece of the Python documentation could be improved. The tools do exist in the stdlib, but the instructions are not clear enough for users who aren’t already familiar with multithreaded idioms.
The docs for this stuff have a lot of vague advice like “chunksize might matter a lot” and not a lot of detail on why. This is partially because it all depends. But it could be clearer.