I wonder what the outcome of this topic is supposed to be…
The answer to Eric’s question is clearly: no, since we already have multiple ways of getting Python to run well on multi-core systems, but it’s certainly one of the possible choices we want to have as well.
And IMO, it’s good to have choices, since there isn’t just one way of getting the most out of available compute. Which option to choose depends a lot on the application being run, data locality requirements, availability options and communication needs.
I know about it, but it is not much better than what fork() can share between processes.
Each interpreter needs to import all of its dependency modules, which can take several seconds and many MBs of RAM anyway for large projects.
Multiprocessing module has SharedMemory too.
Multiple interpreters with a per-interpreter GIL look like multiprocessing without the processes to me.
I am not denigrating or attacking multiple interpreters. I am just saying that, when you look at their characteristics objectively, they are similar in many ways to the multiprocessing we have seen so far.
Multiprocessing has its drawbacks, especially on Windows, where subinterpreters are better than multiprocessing in many situations.
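For reference, the SharedMemory facility mentioned above can be sketched roughly like this (both the creating and the attaching side are shown in one process here for brevity; normally the attach-by-name would happen in a second process):

```python
from multiprocessing import shared_memory

# One process creates a named block of shared memory...
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b"hello"
    # ...and another process attaches to it by name and reads.
    other = shared_memory.SharedMemory(name=shm.name)
    data = bytes(other.buf[:5])
    other.close()
finally:
    shm.close()
    shm.unlink()  # free the block once no one needs it

print(data)  # b'hello'
```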
I didn’t attend the Language Summit, and I only saw Project Verona today.
If this project is successful, the usability of subinterpreters will be greatly improved.
However, I am not sure how many of the modules and classes loaded by import sqlalchemy can be shared between interpreters.
I also do not know if it is possible to have a shared connection pool among multiple interpreters, from which an interpreter can borrow a connection.
I feel Project Verona’s goal is farther off than free threading.
Please note that fork() only works on select platforms, and comes with a large number of caveats and pains. It is generally discouraged. In 3.14, the multiprocessing module has switched to using the forkserver model by default, where there is no sharing between a parent and its logical children.
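To illustrate the platform differences: a quick sketch of how an application can see which start methods exist on the current platform and pin one explicitly via a context, rather than relying on the version/platform default (the specific fallback choice here is just an example):

```python
import multiprocessing as mp

# "spawn" is available everywhere; "fork" and "forkserver" are
# POSIX-only, so the list varies by platform.
available = mp.get_all_start_methods()
print(available)

# Pin a start method explicitly instead of relying on the default.
method = "forkserver" if "forkserver" in available else "spawn"
ctx = mp.get_context(method)
print(ctx.get_start_method())
```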
That’s how I view it too, but it could be an improvement in many situations still. First because launching a new process is often expensive, and second because if a child process dies, then it’s difficult to guarantee anything about any shared primitive such as a Lock or a Queue.
It’s rather annoying to set up, though. Not many people will want to deal with the complication.
Of course, I know. The characteristics of multiple interpreters are similar to those of multiple processes, but they have many advantages over multiple processes. I have no intention of ignoring or downplaying that point.
This thread is not about whether to stop the multi-interpreter work, but whether to step free threading down (stop releasing the build). So I did not feel the need to elaborate on the advantages of multiple interpreters over multiple processes.
Multithreading has completely different characteristics from multiple interpreters or multiprocessing, and those who need both threads and multiple cores must use multithreading together with multiple interpreters or multiprocessing at the same time. Once free threading is achieved, it becomes by far the easiest and most efficient option.
To erase the advantage of free threading, asyncio and multiple interpreters would have to be improved until multithreading itself is rarely needed. I feel that is more difficult than implementing free threading.
Once I am in free-threaded mode, as far as I understand, threading.Thread utilises all cores.
What if I want the old behaviour (just run many threads on a single core)? Can I do that in free-threaded mode? Say I need both the old and the new behaviour of threading.Thread in the same application.
No, threading.Thread can always be scheduled on any core. When the GIL is enabled, though, each thread needs to hold the GIL to be able to do most things, and only one thread can hold it at a time. That means that, usually, if any one thread is doing something, the other threads are all waiting for the GIL, which prevents the threads from running in parallel even if they are running on different cores. The only way to get this “old behaviour” in the free-threading build is to enable the GIL process-wide (which can be done with a command-line option).
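Concretely, on the free-threaded build the GIL can be forced on process-wide with `-X gil=1` or the `PYTHON_GIL=1` environment variable (per PEP 703). A small sketch of how code can introspect the situation at runtime:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded ("t") builds, unset otherwise.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("free-threaded build:", free_threaded_build)

# On 3.13+, sys._is_gil_enabled() reports whether the GIL is active
# in this process (e.g. after running with -X gil=1 or PYTHON_GIL=1).
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
```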
FWIW, I’ve proposed the following a couple times: make the GIL a per-interpreter option [1], and maybe make the main interpreter have the GIL by default [2].
IIUC, the free-threaded build is currently implemented to enable the GIL automatically on a per-interpreter basis, in response to loading single-phase init modules. It would then be a matter of giving users the option when creating a subinterpreter.
Conceivably, some of the performance penalties of free-threading could be mitigated when the GIL is enabled. Thus, enabling the GIL on the main interpreter by default would reduce the cost-without-benefit paid by users that do not need threads. Of course, that isn’t much of an issue if the overhead of free-threading is sufficiently low.
Just to be clear, I started this thread to discuss the idea that we do have other options for multi-core parallelism; we don’t need to feel like “either we continue with free-threading or we give up on multi-core parallelism”. That’s definitely been an underlying, unspoken conclusion I’ve been sensing from parts of the core team, so I wanted us to talk about it directly.
I am definitely not arguing that “we shouldn’t pursue free-threading due to having other options”. Instead, I’m saying that we don’t have to pursue free-threading if we have strong reasons not to, since we do have other options.
Personally, I think free-threading is a huge headache for everyone, especially non-experts, and something to which I’m hesitant to expose all users. However, Python has always been a language that provides a wide selection of tools and programming paradigms. Plus, I trust the judgement of the core team as a whole and of the steering council, as long as we have all the information.
One thing that excited me about free threading (as a regular user) is lower memory and other resource usage for parallel tasks.
Right now, if I have random code that does some IO and API calls in parallel via threads, the GIL slows me down a lot. When I use processes, it works nicely, BUT I’m using a lot more CPU and RAM.
I rewrote the same code in golang and it was a bit faster than both but used much much less resources. I was fine with python being a bit slower but I couldn’t really handle the large difference in resource usage.
The resource issue came from multiple processes all having a bunch of the same duplicated data. So more memory and then also more CPU usage between (probably) process context switching.
Long story short: free threading would give me the best of both worlds: performance without the GIL, and having the same (mostly read-only) data in memory once.
I don’t think it’s the only option, but a nice one to have available for folks willing to properly use locks, etc.
Now we also have subinterpreters, but that didn’t exist back then. (Also, I have no idea how to share complex data between them… leading me back to free threading.)
Providing good documentation about this is definitely a short-term priority.
Likewise we will be working on making it easier to share data between subinterpreters efficiently and on making them start up faster and use less memory (i.e. due to the duplication you mentioned). Right now we’re at the minimal correct implementation stage, so there’s plenty of room to make improvements.
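For readers wondering what this looks like today: as of 3.14 there is, for example, `concurrent.futures.InterpreterPoolExecutor`, which runs each task in its own subinterpreter with its own GIL. The sketch below falls back to a thread pool on older versions so it stays runnable; the fallback choice is just for illustration:

```python
import sys
from concurrent import futures

def square(n):
    return n * n

if sys.version_info >= (3, 14):
    # New in 3.14: each task runs in a separate subinterpreter,
    # each with its own GIL, so CPU-bound work can run in parallel.
    Executor = futures.InterpreterPoolExecutor
else:
    # Fallback so the sketch runs on older versions too.
    Executor = futures.ThreadPoolExecutor

with Executor(max_workers=2) as pool:
    results = list(pool.map(square, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```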
I have a bet on this, starting from the following assumptions:
The GIL is the best (most simple) safety available.
Opting for free-threading would offer more performance.
Users want a clear and minimal syntax to enforce all safe concurrency requirements.
The minimal safe concurrency requirements consist in properly declaring which objects have to be protected by a lock before being processed outside of the GIL.
Users do not want to implement locks or call acquire(), release(), …
Bonus: users want a playground to experiment with / learn / teach the difficulties induced by the concurrency paradigm.
I can think of one possible approach to satisfy these assumptions… by deferring locking responsibilities to a “sub-GIL”: a dedicated instance temporarily replacing the GIL.
(I do not have the required expertise to know whether a sub-GIL is possible, nor to state the limitations of this idea in typical real-world use cases.)
Here is a “playground pseudo-code” to reflect the semantics :
```python
@Rlocks(x, y)  # reading x or y will raise in subGIL without this decorator
@Wlocks(z)     # writing to z will raise in subGIL without this decorator
def process():
    global z
    z = x + y

with SubGIL():  # defers GIL responsibilities regarding x, y, z (inferred from decorator inspection)
    process()
    ...
```
(Note: things can be added: an RWLock, Barriers (possibly with await), Queues/Pools/Runs of Threads/Processes…
Note: while this approach should protect against race conditions, I don’t think it can be made absolutely safe against deadlocks.)
The idea here is to allow sync operations to be shifted outside of the GIL gradually. A first step can be taken from sync code (by adding decorators and a context manager), and later the atomicity can be made finer-grained (by splitting big functions into nested smaller ones), increasing performance (and complexity) while consistently ensuring all locks are declared.
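A rough approximation of the declare-then-lock idea is possible with today's primitives: a hypothetical `locks` decorator (my name, not an existing API) that assigns each declared name a lock and acquires all of them around the call. It does not enforce the raise-on-undeclared-access semantics the sub-GIL proposal describes; it only shows the "no manual acquire()/release()" part:

```python
import threading
from functools import wraps

# One lock per declared name, shared across decorated functions.
_locks = {}

def locks(*names):
    for name in names:
        _locks.setdefault(name, threading.Lock())

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Acquire in sorted-name order so every decorated function
            # uses the same lock ordering (avoids ordering deadlocks).
            ordered = [_locks[n] for n in sorted(names)]
            for lock in ordered:
                lock.acquire()
            try:
                return func(*args, **kwargs)
            finally:
                for lock in reversed(ordered):
                    lock.release()
        return wrapper

    return decorator

x, y, z = 1, 2, 0

@locks("x", "y", "z")
def process():
    global z
    z = x + y

process()
print(z)  # 3
```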
It absolutely isn’t. It’s designed to make the interpreter thread-safe. Users writing Python code should not be relying on it to make their own code thread-safe.
That’s right; the idea is that the sub-GIL is not a GIL… → It is rather something spawned by the GIL itself to defer the responsibility of locking the registered objects in another frame, with functions akin to what threading.Lock does, but with more automation, finer-grained optimization, and more syntactic sugar. Thus the GIL is not removed, but serves as the basis for constructing an external frame for threads to run within, where non-readability and non-writability are the default for every unregistered object.
Perspective of a lurking layman (who prefers subinterpreters):
Subinterpreters can provide most users with solid support for parallelism - not the maximum CPython could provide, but still reasonably good (IMO). Free threading goes all the way and extends support to that additional X% of use cases that may call for it.
The subinterpreter effort goes with the grain of CPython’s original, relatively simple GIL-based design. From what I gather, a lot of the work to make subinterpreters possible has not only improved CPython’s general performance and embeddability, but has also boosted the code’s quality and maintainability. Future work in this area will likely continue to align with the design and provide additional broad benefits that are independent of parallelism (faster interpreter startup, faster module loading).
Free threading (necessarily?) goes against the grain of CPython’s design. The work represents a very complex engineering effort that significantly raises the barrier to entry for contributors. And because the original design can’t be completely shed (due to API/ABI backward compat) the result will be a hybrid under constant friction that seems unlikely to ever reach a clean, coherent state (like a system built for free threading from the start).
Does reaching that added X% justify the “carrying cost” of CPython free threading? I don’t know, but from the outside the cost seems pretty high.
I think free-threading should be viewed (and communicated) as elevating CPython to a better baseline, not as a solution for parallelism, similar to how musl conforms to the spec more closely than glibc does. I feel that the GIL makes reality deviate from typical Python programmers’ expectations, even if it allows for specific shortcuts.
This means any actual solutions for parallelism would be built on top of free-threading, treating it as a fact of life. I think a modern language should not require devs to use primitives such as locks to achieve consistency in parallelised apps, but should instead use more intelligent mechanisms (such as async, dependency graphs, etc.).
Note this is less about the language and more about the ecosystem of libraries available (not necessarily the standard library, either, though the concurrent.futures module already provides a bunch of useful functionality that can be sufficient in the simple cases).
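For the simple cases mentioned, the existing concurrent.futures pattern is already enough: fan a function out over inputs on a pool and collect the results as they finish (the `work` function here is just a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    return n * 2

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit all tasks, then gather results in completion order.
    pending = {pool.submit(work, n): n for n in range(5)}
    results = sorted(f.result() for f in as_completed(pending))

print(results)  # [0, 2, 4, 6, 8]
```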
I am curious as to why this discussion is here. PEP 703 indicates the PEP category as the appropriate venue. What are the intended audience and participants for this discussion thread?