My broader point wasn’t about the vagaries of dict.__setitem__ but that we might be panicking a little too hard about the difficulties of multithreading. I’m not saying they’re zero, obviously, but let’s not get scared into thinking all of this is incomprehensible.
But in this particular case (threads fetching URLs), using a dict would be both safe and consistent. Presumably you wouldn’t fetch the same URL twice, and even if you did, so what? Nothing bad would happen. Right?
Sure, there are plenty of cases where it’s fine, but also plenty where it’s not. There’s nothing to gain from counter-counter-examples on this particular point; just say that you want writing expression-level thread-safe code to be a core part of learning Python, and allow others to say that they would rather it weren’t.
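A minimal sketch of the URL-fetching case, assuming each thread writes only its own key into a shared dict. `fake_fetch` is a hypothetical stand-in for a real HTTP call (it just sleeps), so the example runs offline. The safety argument relies on CPython not tearing a single dict assignment, and a duplicate URL simply overwrites an equivalent value.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP fetch; sleeps to mimic I/O."""
    time.sleep(0.01)
    return f"<html>{url}</html>"


def fetch_all(urls: list[str], max_workers: int = 4) -> dict[str, str]:
    results: dict[str, str] = {}

    def worker(url: str) -> None:
        # Each thread writes only its own key. A repeated URL just
        # overwrites an identical value, so nothing bad happens.
        results[url] = fake_fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so worker exceptions surface here.
        list(pool.map(worker, urls))
    return results


pages = fetch_all(["https://a.example", "https://b.example", "https://a.example"])
print(sorted(pages))
```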
It also doesn’t benefit from parallelisation today, and so it doesn’t arise as an issue. It may benefit from multiprocessing, but because that model[1] basically forces you to accumulate per-process and then merge at the end, you end up with a pattern that works (and would also work for threads, but the trick is in discovering the pattern, not that there’s anything magical about it).
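A sketch of the accumulate-then-merge pattern, written with threads to show that nothing about it is process-specific. The word-count task and function names are hypothetical: each worker fills a private `Counter`, and only the calling thread merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: list[str]) -> Counter:
    # Each worker builds its own private Counter: no shared mutable state.
    local = Counter()
    for line in chunk:
        local.update(line.split())
    return local


def parallel_word_count(lines: list[str], n_workers: int = 4) -> Counter:
    # Split the input into interleaved chunks, one per worker.
    chunks = [lines[i::n_workers] for i in range(n_workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Merge step: only the calling thread ever touches `total`.
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


text = ["the cat sat", "the dog sat", "a cat ran"]
print(parallel_word_count(text).most_common(3))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` (with the usual `if __name__ == "__main__":` guard) should leave the shape unchanged; whether threads give any speedup for CPU-bound work like this depends on running a free-threaded build.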
Once everyone “knows” that you can just run the same code on multiple threads on the same objects and get some amount of speedup, just watch it happen more often: more and more people will discover that they need to review and fix all their code line by line to figure out why the results don’t add up. Or they’ll whack great big locks around library functions that need fixing and wonder why the contention outweighs any performance improvement.
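To make the contention point concrete, here is a minimal sketch with a hypothetical `Tally` class. `add_unsafe` shows the read-modify-write that can lose updates under concurrent use; `add_locked` is the “big lock” fix, which is correct but forces every increment from every thread through one lock, so the threads mostly take turns.

```python
import threading


class Tally:
    """A shared counter used to contrast an unsafe and a locked increment."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def add_unsafe(self) -> None:
        # Load, add, store: another thread can interleave between steps,
        # so concurrent callers can lose increments.
        self.value += 1

    def add_locked(self) -> None:
        # Correct, but every increment from every thread now queues on
        # the same lock, so the work is effectively serialized.
        with self._lock:
            self.value += 1


def hammer(tally: Tally, n_threads: int = 4, n_iters: int = 10_000) -> int:
    def work() -> None:
        for _ in range(n_iters):
            tally.add_locked()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tally.value


print(hammer(Tally()))  # always n_threads * n_iters (40000) with the lock
```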
I feel like the talk at the language summit from the people behind Project Verona/Lungfish is very relevant to this discussion. They have a plan to add Rust-like ownership semantics to Python, with the net effect of being able to turn races on any Python object into a runtime error, among other benefits. I’m not sure if there’s a public version of the slides that we saw at the language summit, but the general idea is summarized in the discussion about their first step: adding a way to immortalize Python objects.
It’s a long-term plan, but it also appeals to me that in the process we would fix incorrect multithreaded, GIL-enabled workflows that trigger races on state tracked by the interpreter.
Maybe @stw can elaborate more on how his team’s work directly speaks to the concerns people are raising in this thread.
What I’d say users want is essentially the ability to write code in a performance-agnostic way (i.e., the most straightforward implementation) and then, if they later decide they want it to be faster, to be able to “make it faster” by leveraging multiple cores, with as few changes to the original code as possible. (Whether they need that, I’m not sure.)
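In the simplest case, “as few changes as possible” might look like swapping the built-in `map` for `Executor.map`. This sketch assumes a hypothetical `process` function that touches no shared state; any real speedup would also depend on the work releasing the GIL or running on a free-threaded build.

```python
from concurrent.futures import ThreadPoolExecutor


def process(item: int) -> int:
    # Hypothetical per-item work; any pure function of its input works.
    return item * item


items = list(range(10))

# Straightforward, performance-agnostic version.
serial = list(map(process, items))

# "Make it faster": same function, same call shape, different map.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(process, items))

assert parallel == serial  # Executor.map preserves input order
```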
The problem is it’s hard (at least for me) to see from here how these different options for parallelism will translate to that, because what matters for users is not the underlying parallelism model but the tools that are built on top of that to make it easy for users to accomplish practical tasks.
It does seem to me that many users perceive free-threading as more likely to be best in the end. One reason is that some of the problems people have currently with multiprocessing (like the process startup time cost) seem pretty hard to overcome. Another is that it’s easier to share data within a single process than across processes. (As far as I can tell, subinterpreters are closer to separate processes than to threads along these dimensions.) Of course, the fact that it’s easier also means it’s easier to do it in an unsafe way that will blow up later. But I think what people are hoping for is a future where there is free-threading and also a convenient set of tools that let us write more or less the same code for single- and multi-threaded use, with little to no penalty in clarity or performance on either side. It may be that that’s unrealistic[1], but I think that’s what drives a lot of anticipation about free-threading.
[1] In particular, I think it would be hard to get there without overhauling the builtin types to be more thread-friendly.
To add to this, I’ve been searching in vain for years for an explanation of when, why, and whether parallelising a bulk loop operation using asyncio would be faster than using ThreadPoolExecutor().map(). Likewise when I have several different, sometimes IO-bound, tasks to run in parallel and can choose between asyncio and a vanilla threading.Thread() for each task.
As someone who’d be an end user, I don’t quite see the advantage of not having free threading, when presumably there would be some number of libraries that could abstract away some of its difficulties.
Or I guess it’s that I don’t particularly see why it would be a good idea to restrict the capabilities of parallelism based on some design decisions in Python and the stdlib. To me it makes more sense to let go of the GIL and, as much as possible, let people play and figure out what common patterns can be abstracted into libraries. Rayon, for example, is a third-party Rust library. It found a way to use the type system to reduce the footguns when doing threading. Python doesn’t have a strong type system like that, but I think the general idea still applies: “we have this language, and we should let other people be creative about how to abstract away threading safely in Python.”
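This doesn’t answer the general question, but a small side-by-side sketch at least shows the structural difference. The sleeps are assumptions standing in for blocking and non-blocking I/O: threads can wrap ordinary blocking calls, while asyncio needs every call in the chain to be awaitable and interleaves tasks only at `await` points. On simulated waits like these, both versions should finish in roughly one sleep interval rather than twenty.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(n: int) -> int:
    time.sleep(0.05)  # stands in for blocking I/O (e.g. a synchronous HTTP call)
    return n * 2


async def async_task(n: int) -> int:
    await asyncio.sleep(0.05)  # stands in for non-blocking I/O
    return n * 2


def with_threads(ns: list[int]) -> list[int]:
    # One OS thread per in-flight task, bounded by max_workers.
    with ThreadPoolExecutor(max_workers=max(1, len(ns))) as pool:
        return list(pool.map(blocking_task, ns))


async def with_asyncio(ns: list[int]) -> list[int]:
    # All tasks share one thread; they interleave only at each `await`.
    return list(await asyncio.gather(*(async_task(n) for n in ns)))


ns = list(range(20))
for label, run in [("threads", lambda: with_threads(ns)),
                   ("asyncio", lambda: asyncio.run(with_asyncio(ns)))]:
    start = time.perf_counter()
    result = run()
    print(label, result[:3], f"{time.perf_counter() - start:.2f}s")
```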
This is basically “assuming everything is alright, everything will be alright.”
Other languages have largely abstracted away the difficulty by adding async/await (task parallelism, rather than data parallelism), web workers, or thread isolation. We’ve got those already, so while every other language is tacking them on to try and save their developers from traditional multithreading, why are we trying to ignore them and push towards traditional multithreading?
Thanks, I’m well aware of how it was implemented in Python. But I was actually referring to other languages, where async/await was universally (as far as I’m aware) added to abstract away the complexity of the language’s futures/promises abstractions, which were themselves added to abstract away the complexity of thread joins that return no result, which led devs to use mutable shared memory instead of passing results back safely.
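A Python sketch of the contrast described above (the `compute` function is hypothetical): because `Thread.join()` returns `None`, the first version has to push results into a shared list, whose order depends on scheduling, while a `Future` hands each result, or its exception, back to the caller in submission order.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def compute(x: int) -> int:
    return x + 1


# Thread.join() returns None, so results have to escape via shared state.
shared: list[int] = []


def target(x: int) -> None:
    shared.append(compute(x))  # append order depends on scheduling


threads = [threading.Thread(target=target, args=(x,)) for x in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A future carries the result (or exception) back to whoever asked for it.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(compute, x) for x in range(5)]
    results = [f.result() for f in futures]  # ordered by submission

print(sorted(shared), results)
```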
Why didn’t Python add it for the same reason? Because we didn’t have the same problem.
Because they are prior art, and so we can learn a lot by looking at the lengths (and directions) they go to avoid using it.
For instance, have they developed “libraries that could abstract away some of the difficulty”? If so, what do those libraries offer? What are the difficulties they considered most important to abstract away? Which ones are successful, to the point of being codified into core runtimes or languages, and which ones didn’t really have an impact?
And once that information is gathered, where can/should Python go with it? Do we need to develop the same libraries? Can we develop equivalent functionality/abstractions without exposing the underlying hazards that led to their creation in the first place?
It often seems like we’re approaching free-threading as if we’ve only just invented it.
Isn’t this kind of understating the capabilities that @eric.snow has implemented for PEP 734? It implies that sharing has the same limitations as multiprocessing. Yes, sharing is currently limited at a basic level to certain primitive types, but those types can be shared without duplicating memory. Also, anything supporting the buffer protocol can be shared as a memoryview, and this has very low latency no matter how large the buffer is!
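The reason the cost is independent of buffer size can be seen even within a single interpreter: a `memoryview` is a window onto existing memory, so slicing or passing it around copies no bytes. This sketch does not use the PEP 734 API itself; it only illustrates the buffer-protocol property that sharing builds on.

```python
# Within a single interpreter: a memoryview is a window onto an existing
# buffer, so slicing it or handing it around copies no data.
big = bytearray(10_000_000)       # ~10 MB buffer
view = memoryview(big)
window = view[10:20]              # no bytes are copied, whatever len(big) is

window[0] = 255                   # writes go straight through to `big`
print(big[10], window.nbytes, view.obj is big)  # 255 10 True
```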
(It also omits mention of the Project Verona team’s proposal at the language summit to add immutable conversion of objects and ability to share them across interpreters.)
With the exception of async(), the standard-library facilities are low-level, machine-oriented, threads-and-lock level. This is a necessary foundation, but we have to try to raise the level of abstraction: for productivity, for reliability, and for performance. This is a potent argument for using higher level, more applications-oriented libraries (if possible, built on top of standard-library facilities).
I’ve definitely seen similar discussions for .NET and JavaScript, though I don’t recall whether those were public (I’m pretty sure the .NET one was), and the C++ ones were easier to find because they hold most of their fun discussions in public.
(The async mentioned in that quote is a standard C++ construct, by the way, not a generic reference or a third-party library.)
I think the point @pitrou is making is that those languages still have those low-level abstractions, and anyone can use them. The document you linked is just a guideline. I definitely think Python will benefit from good guidelines!
Are you proposing “implement free-threading but don’t expose the low-level parts”? I think that would be too limiting, as some large and popular libraries have expressed interest in low-level access.
Even if the average Python programmer doesn’t want to deal with threads, the developers of packages like numpy and scikit-learn seem willing to deal with the complexity, and they have a vast user-base who would be very happy to take advantage.
And the point that I thought was implicit is that those languages started with the low level features and can’t now take them back.
Of course you can argue that C++ would never hide the lowest level system functionality from its users, but that’s also their explicit domain. If you want to argue that Python is for implementing operating systems, go ahead, but I don’t think you’ll get much buy-in.
The problem with totally open free-threading is that “the average Python programmer” doesn’t get a choice. It actually only gets more complicated when libraries try to use it and hide it from the user (see all the issues that arise when libraries try to do this with asyncio). It tends to work out better for the average Python programmer to learn how to do multithreading properly and to do it around libraries, rather than within them.
Besides, libraries that really want to deal with the complexity already have all the tools they need. They’re a bit limited by backwards compatibility (whereas something like polars can be designed from the start to perform extended computation without having to go back into Python), but tools like Dask and numba have already shown ways to minimally extend numpy and scikit-learn so they can parallelise without drastically changing the underlying runtime. Doing more on threads within that computation is absolutely possible without Python-level free-threading.
I’m quitting this thread now. Nobody is going to change their mind, so I’ll leave it to the silent readers to decide whether we’ve actually done a thorough job of understanding how free-threading is going to impact our users, and whether it’s the best option.
Well, other people like me are of the opposite opinion: that it will give them more choice than the status quo, where the GIL leads to complex, brittle and less performant alternatives such as multiprocessing.
Tools like Dask and Numba have their own CPU-heavy bottlenecks (such as the JIT compiler in Numba and the scheduler in Dask) that may actually benefit from removing the GIL.
I suppose I view Python as a hacky way to do stuff, footguns and all, so this wouldn’t change my usage of Python that much, but I do get the argument that free threading could be a bit annoying for beginners.
On the other hand, multithreading is just hard in general, and I would guess that a normal Python user writing scripts would find the subinterpreter stuff a little unpythonic by comparison. Python hasn’t really been the language that forces a good coding style on people to avoid bugs. It’s been “how can I get this working as fast as possible, bugs be damned.” In that sense, “safer” solutions feel like they would just get in the way of that style of coding. Is that the correct way? I don’t know, but it’s definitely how a lot of people use Python.
From my understanding, the benefits of removing the GIL, as opposed to a safer solution, are:
- It’s faster
- It gets less in the way
- Safer libraries can be built on top of it
The disadvantages of removing the GIL are:
- It has more footguns for someone using it
- It requires some amount of churn in the current library ecosystem
I’m generally of the opinion that type safety is the best model for avoiding footguns like that, so maybe my opinion is just that we should have free threading, and then think of cool ways to use the type system we’re building in Python to offer an optional way to guarantee safety. That’s certainly a future problem to solve, but I feel like it’s the most useful path: give users a hacky way to do it, and then give them the tools to avoid some of those footguns when they want to do it more safely.
As a note, I’m not super invested in my view here. I was hoping the view of a user might be valuable to hear.