Also, this is a perfect example (IMO) of code that should not include locking. For the vast majority of use cases (possibly all, depending on the answers to the points @oscarbenjamin made) this code is completely safe, and adding a lock would impose unnecessary (and non-trivial) overhead. And we can’t reasonably expect every Python developer who writes code like this to consider thread safety.
So who does do the locking? Callers may not even be aware that there is shared state here.
Should all library code from any source be considered as potentially unsafe for use in a multi-threaded context unless there’s explicit documentation otherwise? Or do we just shrug our shoulders and YOLO it?
I know none of this is new. This isn’t about free threading, it’s simply about the subject of this thread - how do we discuss threading, and how do we ensure that such discussions take race conditions seriously (which IMO includes considering how library code gets written in the real world, by people who may not have threading in mind when they write it).
No, it isn’t. It might overwrite a key that was inserted by another thread, and in a situation where keys are potentially being removed it might fail with KeyError. It also adds overhead to what should be the fast path - inserting a new key when there is no contention - by requiring an additional dict lookup and function call.
Higher-level atomic operations need to be built on lower-level atomic primitives, which is why it is important to have things like an atomic dict.setdefault method.
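To make that concrete, here’s a minimal sketch of the two patterns (the `cache` dict, the `make_value` factory, and the helper names are all hypothetical stand-ins, not anything from the code under discussion):

```python
cache: dict[str, object] = {}   # hypothetical shared state, mutated by several threads

def make_value() -> object:     # stand-in factory for whatever is being cached
    return object()

def get_racy(key: str) -> object:
    # Check-then-insert: NOT atomic. Another thread can insert between the
    # membership test and the assignment (its value is then silently
    # overwritten), and the final lookup can raise KeyError if keys are
    # concurrently deleted.
    if key not in cache:
        cache[key] = make_value()
    return cache[key]

def get_atomic(key: str) -> object:
    # setdefault does the lookup and the insert as one operation, so a value
    # stored by another thread is never clobbered. (Note the default is still
    # evaluated eagerly, whether or not it ends up being used.)
    return cache.setdefault(key, make_value())
```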
In the current state? People who care about this already treat it that way, because so many libraries are written without it in mind.
It’s not new to free threading, but because free threading has the potential to improve performance where threading previously couldn’t, it’s making people much more aware that code hasn’t been written or documented with thread safety in mind.
I think the first step here is good terminology for users, followed by actually determining a few behaviors that are worth guaranteeing. I think dict.setdefault consistency is a good example of something worth guaranteeing, with the caveat that dict subclasses may not uphold it.
As for terminology, the difference between “won’t crash the interpreter” (threadsafe), “will be consistent across threads” (consistent), and “data race-free” is commonly not expressed in discussions. I’m glad it has been in the back and forth here, because it’s a core component of understanding whether something is safe in a specific use.
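To make that middle distinction concrete, a minimal sketch (a throwaway counter, not anything from the thread above): incrementing from several threads won’t crash the interpreter, but the result is not consistent.

```python
import threading

counter = 0

def bump() -> None:
    global counter
    for _ in range(100_000):
        counter += 1  # not atomic: a separate load, add, and store

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # often less than 400000 under contention: lost updates, yet no crash
```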
Agreed. Hopefully, we can find a way to improve things without making it impossible for people to continue writing code and documentation without worrying about thread safety (if they don’t want to).
Agreed.
What frustrates me is that free threading merely ensures “won’t crash the interpreter” levels of thread safety, but the messaging around free threading is failing to make that clear. We have people who think the GIL promised consistency and/or protection against data races, who don’t understand why the free threading work hasn’t “preserved” those protections. And people who expected more of the free threading work, because “the interpreter won’t crash” is a bare minimum, and nobody’s telling them about any other benefits they’ll get. Etc.
Having the terminology is great. We need to work out how to use that terminology to get a message across that the average Python user[1] can appreciate.
And remember, a huge number of Python users aren’t developers by trade, so that average is a lot less informed than you think it is! ↩︎
Yeah, it’s easy to forget that the average Python user probably only knows the algorithmic complexity for one or two data types. And how Unicode works, of course. Of course.
In your version, there’s a risk of a false positive, where a value that has already been deleted is returned instead of the default. I don’t believe this approach is free from race conditions:
```python
if new_ref is keyed_ref:
    return default
else:
    return new_ref()
```
Honestly, I suspect the average Python user has almost no exposure to parallel computing and no desire to learn how to do it themselves. Their most direct use of multi-threading/processing would be trying out the n_threads/n_processes/n_jobs argument in a function they are calling. And they will be very pleased if that starts to work more efficiently and show up more often.
There’s a lot to document, and there’s a balance to strike between documenting everything for the average user and efficiently enabling the developers most eager to actually take advantage of free threading. That latter group are the ones who will really drive the success of free-threaded Python, because their work will make the benefits available to the whole community.
I’m still confused as to how the whole community will benefit, though. If we don’t provide some form of easy-to-use structured concurrency that the average user can understand and use, then the majority of people will stick with single-threaded code.
The concurrent.futures module is the closest we have to that sort of effortless parallelism right now, but it’s still a long way from being easy and safe[1]. And the message I’m hearing (repeatedly) is that free threading does nothing to improve the usability of concurrent.futures. Conversely, subinterpreters, with their isolated-by-default model, might just do so…
I’ve had people tear my naive code to shreds enough times that I can say that with some confidence ↩︎
I was thinking of users[1] who mostly use popular third-party packages to perform data analysis and the like. This section from PEP 703 goes into this use-case a little bit. For instance, users of scikit-learn have long taken advantage of the n_jobs parameter to parallelize their model fitting, etc. That parameter is the only interface they need to worry about, but the sklearn folks have done a ton of work maintaining joblib to make it performant most of the time.
Based on the quote in the PEP, if scikit-learn could switch entirely to free-threading they’d both simplify their code and improve performance. Their users wouldn’t need to think about threading at all to take advantage of that, and that’s 100M downloads per month from PyPI. That’s why enabling the developers of major packages will benefit the wider community that isn’t comfortable with writing free-threaded code.
I’m a bit surprised you have that impression. To me, free threading means I can replace concurrent.futures.ProcessPoolExecutor[2] with ThreadPoolExecutor and I’ll a) improve performance and b) be able to do more stuff with it (because I won’t have to worry about pickle-ability). This is a lot of trivially parallel stuff - there’s no locking or cross-thread communication. I don’t think subinterpreters would make that code any simpler to write - they’d prevent me from breaking isolation, but I wasn’t trying to do that in the first place.
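As a minimal sketch of the kind of swap I mean (the lambda is just a stand-in for real per-item work):

```python
from concurrent.futures import ThreadPoolExecutor  # was: ProcessPoolExecutor

# With ProcessPoolExecutor, this lambda would fail to pickle; with threads it
# just works, and under free threading the workers can genuinely run in parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda x: x * x, range(100)))
```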
You might be alright in that situation, because code that currently uses ProcessPoolExecutor has to avoid shared state (as I understand it). But as soon as you “do more stuff with it” you’ll have to ensure safety yourself - there’s no built-in isolation to help you, and all the stdlib offers is “the interpreter won’t crash”.
If you’re able to write safe, race-free code in shared-memory multithreading, you’re not the sort of “average Python user” I was talking about. (My go-to example, which has unfortunately been thoroughly ripped to shreds by now[1], is a hobbyist with very little Python experience doing Monte Carlo simulation, and thinking “hey, if I run this in a thread pool using my 8 cores, I’ll get 8x the performance!”)
because people thought I wanted them to show me how to fix the problems with it, when the fact that the problems were there is essentially the point… ↩︎
You might not even “do more” - you might just find that your logging setup, which previously put os.getpid() in the log filename, is now massively corrupted because you’ve got 20 threads interleaved into one file rather than 20 files.
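Something like this hypothetical setup, for instance:

```python
import os

# Hypothetical sketch: with a process pool each worker has its own pid and so
# its own file; with a thread pool every worker shares one pid, and all twenty
# now write, interleaved, into the same file.
log_path = f"worker-{os.getpid()}.log"
```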
Fixable, sure, but the reason we’re all pushing on this is because you’re[1] expecting non-developers who pulled together some bits of sample code to either (a) know that they need to do this, or (b) diagnose and debug that they need to do it.[2]
I suspect subinterpreters wouldn’t help here either, but I can easily contrive an example where they would. I know we’re all problem solvers here, but the focus is on the people, not the problem. ↩︎
By “do more” I just meant “pass objects I couldn’t pass before”… like a lambda function, or an unpickleable class. I agree that it’d be easy to slip into stuff that isn’t safe - that’s true with processes too, and I’m used to it.[1]
I was just pushing back a little on the “free threading does nothing to improve concurrent.futures” comment. To me, it definitely improves that experience. It doesn’t make everything perfect, but it’s a noticeable improvement in usability.
and yes, I was just fishing for compliments about how great a programmer I am ↩︎
There are three complexities worth noting:
-Inherently sequential algorithms (like logistic regression) wouldn’t benefit that much from free threading.
-The most vectorizable algorithms (like PCA, using SVD under the hood) already leverage SIMD/LAPACK/BLAS routines that are heavily multicore-optimized.
-For some applications (outside of sklearn, probably), a multithreaded version might even be slower than the single-threaded one, depending on cache/RAM I/O or CPU-bound aspects, the machine architecture, the working set size (in memory), lock contention, etc.
→ So there are a lot of parameters involved (and of course some tasks naturally benefit from multithreading, while others really don’t).
For sure–I am not trying to downplay the complexity of threading overall. I was just trying to convey that there are projects out there that understand the complexity[1] and would take it on, because it would improve their code and provide a benefit to their users.
To be fair, I meant to say “does nothing to improve the usability of concurrent.futures.ThreadPoolExecutor”. What I don’t understand is why free threading makes any difference to whether you can (or should) switch from ProcessPoolExecutor to ThreadPoolExecutor.
And to be clear, the reason I’m interested is not for my own education, but because we need to be able to articulate the logic behind that decision in a way that lets the average Python user correctly evaluate the risks and trade-offs. At the moment, no-one seems able to do that without triggering a debate that clearly demonstrates the answer isn’t as straightforward as it needs to be to qualify as suitable for an average user…
Average Python developer here, confirming this. It’s not that I have no exposure to parallel computing; it’s rather that I have no positive experience with parallel code at all (except that it scales).
That said, we enjoy using numpy, for example, which uses LAPACK, which is written with OpenMP. So our most optimistic expectation is that more of our libraries will just get faster with free threading. Such a fine topic as “the details of Python bytecode and how it affects race conditions across different Python versions” has no relevance to my daily job, or to millions of others, I believe.
Sorry, I was wrong about this. I saw a discussion about making the builtin iterators (map, range, iter) threadsafe and thought that had happened for 3.14, but I misremembered - it doesn’t appear to have.
From source diving to verify: the primitive builtin data types - int, str, bytes, dict, set, frozenset, list, and tuple - are mostly threadsafe, with the exception of sharing and reusing an iterator created from them across threads (iter, .keys, etc). In addition, collections.deque is also threadsafe aside from iteration. For the mutable ones, there are safe ways and unsafe ways to use them outside of iteration, which isn’t currently safe.
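As a sketch of that distinction (CPython-specific behaviour today, not a guarantee):

```python
import threading

# Mutating the container itself from several threads is the "mostly
# threadsafe" part: no interpreter crash, and appends are handled internally.
items: list[int] = []

def appender() -> None:
    for i in range(10_000):
        items.append(i)

workers = [threading.Thread(target=appender) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert len(items) == 40_000  # every append landed

# The called-out exception: one iterator object shared across threads.
shared_it = iter(items)
# Having several threads call next(shared_it) concurrently is the pattern
# described above as unsafe - don't do this.
```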
None of this is guaranteed currently, and it is specific to CPython, but it would be hard to imagine a world where anyone argues to make these less safe than they currently are, given that discussion of making iteration safe for them is happening right now.
I agree with people above that it would be better if some of these things became language-level guarantees, but the most that many things can offer is consistency, not race-free behavior. Race-free behavior involves application design.
To be clear, when you say “threadsafe” do you mean “can’t make the interpreter crash” or are you referring to some stronger property than that?
Because honestly, I don’t consider “can’t make the interpreter crash” to be a very interesting property - as far as I’m concerned, if you can crash the interpreter that’s just a bug. Certainly not something worth documenting.