PEP 684: A Per-Interpreter GIL

CPython’s runtime relies on some global state that is shared between all interpreters. That will remain true with a per-interpreter GIL, though there will be less shared state.

From what I understand, WASM does not support any mechanism for sharing state between web workers (the only equivalent to threads of which I’m aware). So using multiple interpreters isn’t currently an option, regardless of a per-interpreter GIL. IIUC, at best you could run one runtime per web worker, which is essentially multiprocessing.

I just want to add that per-interpreter GIL would greatly increase Python’s usefulness for User-Defined Functions (UDFs) in DuckDB. DuckDB automatically parallelises SQL queries, including those with UDFs. However, thus far, we have been severely blocked from doing this with Python as a UDF implementation language because of the GIL. The only way around this currently is to fork additional processes and to ship inputs and outputs between processes, with all the associated headaches. So yes, please add this!

Thanks for the insight!

Thanks for all the hard work and insights on this! Is PEP 684 still targeted for the 3.12 release?

Yeah, we’re still aiming for 3.12, assuming the PEP is accepted by the Steering Council.

On behalf of the Steering Council, I’m happy to report that we have accepted PEP 684.

@eric.snow, thanks for all of your efforts on this PEP and all of the supporting work it took to get us here over the years!

With just this PEP, is there a performance gain from using subinterpreters in threading.Threads as opposed to just raw threading.Threads?

I’m trying to understand any additional level of concurrency we get via just this PEP. It sort of sounds like it’s the same as threading.Threads for now until we get per-interpreter GIL.

Compared to regular threads

Threads running in different subinterpreters each hold their own GIL, so work that needs the GIL can run in parallel.

Compared to multiprocessing (fork, forkserver)

Fork has many limitations and pitfalls, and Windows doesn’t support fork.
On the other hand, fork can share some RAM between interpreters.
Forkserver can be used to avoid some pitfalls.

Compared to multiprocessing (spawn)

Subinterpreter threads are similar to spawned processes in some ways.
Both work on Windows, and neither can share RAM between interpreters.
But a subinterpreter thread is much faster to start than a process.

In the future, we may get faster inter-subinterpreter communication and some memory sharing between subinterpreters. After that, subinterpreter threads could be faster and more efficient than spawn.

Not quite true, but there are a few hoops you have to jump through to share RAM (such as converting the memory address to an int, passing it across as bytes, reconstituting it and wrapping it in some kind of accessor object).

What you can’t share is Python objects.
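
A minimal sketch of those hoops, in case it helps. It runs inside a single interpreter and only uses ctypes, so the actual hand-off between interpreters is not shown; the names `buf`, `payload` and `attach` are made up for illustration. The lifetime problem is visible here too: nothing stops `buf` from being freed while the reconstituted accessor is still in use.

```python
import ctypes

# A raw buffer owned by the "producer" side. It must stay alive for as long
# as anyone uses its address; nothing enforces that automatically.
buf = ctypes.create_string_buffer(b"hello", 16)

# Step 1: convert the memory address (and size) to ints, pass them as bytes.
payload = (ctypes.addressof(buf).to_bytes(8, "little")
           + len(buf).to_bytes(8, "little"))


def attach(payload: bytes):
    """Step 2 ("consumer" side): rebuild the address and wrap it in an accessor."""
    addr = int.from_bytes(payload[:8], "little")
    size = int.from_bytes(payload[8:], "little")
    # from_address() does not copy: the array aliases the producer's memory.
    return (ctypes.c_char * size).from_address(addr)


view = attach(payload)
view[0] = b"J"        # write through the reconstituted accessor
print(buf.value)      # b'Jello': the producer sees the change
```

Only raw bytes are shared this way; the Python objects wrapping them (`buf`, `view`) are still separate objects.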

Of course. All processes and sub-interpreters can share some RAM.
My point was about the rough performance and memory-efficiency characteristics compared to forking (for example, if the concurrent.futures package adds a SubinterpreterPoolExecutor).

Have been discussing this PEP with people at work and we have not been able to answer this question:

In Python up to 3.11, user created objects are shared among threads and can be accessed and updated by any thread, although the GIL effectively limits concurrency. It seems that with PEP 684, this is no longer the case, i.e. the GIL does not limit concurrency but also no longer allows cross-thread access to objects, which would imply fork-style copy-on-write(?).

This raises questions about how things will actually work.

E.g.

  • code run by several threads inserts objects into a dict or list obj allocated on the main thread (think map-style operation). Will the main thread see/be able to read/update these inserted objects? What happens to objects created by a thread and inserted into a global obj, once that thread terminates?

  • a thread accesses and updates an obj created on the main thread (say an element in a list). Will the main thread see the update, and when?

  • can threads be selectively created with either a new interpreter (thus its own GIL, i.e. unconstrained concurrency relative to the main & other threads, possibly losing shared object semantics), or by using the main process GIL (thus limiting concurrency but keeping shared object semantics)?

We assume the answers to these questions boil down to “works like multiprocessing fork”, but would appreciate clarity.

If our assumption is correct, it seems this PEP would introduce a new “fully concurrent/shared nothing thread model” that is not always backwards compatible. With this in mind, will the default threading model in 3.12 be to continue working with a global GIL to maintain backward compatibility?

This makes the GIL per-interpreter. Multiple interpreters are a different concept than threads. Within an interpreter, all threads still obey that interpreter’s GIL. Multiple interpreters no longer share a single GIL though, so the interpreters themselves are more independent of each other.

But that doesn’t come with an easy and safe way to handle lifetimes. A big improvement would be some sort of cross-interpreter memoryview object.

I don’t dispute that. Luckily, that memoryview object can be implemented on top of what I described, so we don’t need to block the initial implementation upon having such an object available :)

Agreed, but given that it interacts with memory management and potentially interpreter finalization, its implementation probably needs to be provided by core Python.

Using multiple interpreters does not change the behavior of threads in a single interpreter, and never has. Each interpreter in the process is isolated from the others. Within a single interpreter, its threads will continue to share objects/data. A per-interpreter GIL does not change that. Basically, there shouldn’t be any change in behavior from Python 3.11, especially if you don’t already use multiple interpreters.
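
To make the single-interpreter case concrete, here is a small sketch of the map-style scenario from the question, using only the threading module. The names `results` and `worker` are made up for illustration; the lock is there because good practice calls for guarding shared state, not because PEP 684 requires it.

```python
import threading

results = {}                 # created on the main thread
lock = threading.Lock()      # guards updates to the shared dict


def worker(n: int) -> None:
    square = n * n           # object created by the worker thread
    with lock:
        results[n] = square  # inserted into the main thread's dict


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The inserted objects outlive the threads that created them, and the
# main thread sees every update once the workers have been joined.
print(sorted(results.items()))   # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

All of these threads live in the same interpreter, so they share one GIL and one set of objects, exactly as in 3.11.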

There is no copy-on-write between interpreters. When an additional interpreter is created, it is completely fresh. Anything you might have done in another interpreter must be done all over again in the new one. We would certainly benefit from mechanisms that facilitate explicitly crossing the isolation boundary between interpreters in a limited way, e.g. safely sharing objects, but that’s not where things are at right now. The first step is to get isolation correct, including a per-interpreter GIL.
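
A rough sketch of what "completely fresh" means in practice. It assumes the `concurrent.interpreters` module from PEP 734, which only exists in Python 3.14 and later (3.12 itself ships no stdlib Python API for this); on older versions the helper simply returns None. The names `marker` and `new_interpreter_sees_marker` are made up for illustration.

```python
try:
    # Assumption: Python 3.14+ (PEP 734). Not available in 3.12 or 3.13.
    from concurrent import interpreters
except ImportError:
    interpreters = None

marker = "set in the main interpreter"


def new_interpreter_sees_marker():
    """Return True/False, or None when concurrent.interpreters is unavailable."""
    if interpreters is None:
        return None
    interp = interpreters.create()
    try:
        # 'marker' exists only in the main interpreter's namespace, so
        # evaluating it in the new interpreter should raise NameError there.
        interp.exec("marker")
    except interpreters.ExecutionFailed:
        return False
    finally:
        interp.close()
    return True


print(new_interpreter_sees_marker())
```

Where the module is available, I would expect this to report False: nothing defined in the main interpreter is carried over, so any setup has to be repeated in the new one.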

I would say there is a difference. If you write a C extension that uses a static global to share an object between interpreters, you could get away with this in 3.11 and before. But in 3.12 it will invite race conditions since access to that object (from refcount to “real” mutations) is not protected by a single GIL.

Of course, this is all well understood and the solution is for extension modules not to do this, and to advertise this fact using the two-phase module initialization.

This should only be the case if using per-interpreter GILs, though, which is not the default (probably should never be the default) and isn’t something you accidentally do. If not using per-interpreter GILs, everything should work as before: one GIL for the process, the same allocators for all interpreters.

Fair, but that choice is up to the user or application, not the extension. Basically such behavior used to be harmless and now it may be a hazard. And the problem is that it’s hard to debug: an app may usually work in this configuration and only rarely do something wrong.

Thanks for explaining and making it clear that interpreters and threads are different and no change in behavior is expected (rereading the PEP now, I realize we somehow jumped to the wrong conclusions).