I’d also like to mention my use case, as it’s not AI/ML and would significantly benefit from GIL removal.
I’m developing a desktop app that runs dozens of soft-realtime systems in a single process, and a GIL stall (when some native code holds the GIL for too long) can make several of those systems stutter at once.
The app is Talon. It’s extremely user-scriptable with Python, and its features include eye tracking, head tracking, speech recognition, noise recognition, audio processing, a 60fps screen overlay, and keyboard/mouse input taps, any of which might make blocking calls into the Python layer.
If the GIL is held for, say, >100ms by a C extension, the user’s eye-tracking mouse cursor may freeze in place, an input tap could block physical keyboard/mouse usage until a Python callback can next be scheduled, and any overlay UI will drop frames. Stalls and jitter over 16ms (roughly one frame at 60fps) may be noticeable (e.g. I upstreamed a fix for CPython lock jitter on Windows). We also can’t run soft-realtime audio code directly in Python at all, even though Python is nominally fast enough for it, because waiting on a long-held lock (the GIL) can block the OS audio stack and cause it to drop audio frames.
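To make the kind of stall described above concrete, here is a minimal sketch of a heartbeat thread that records how late each of its wakeups is. It is an illustration only, not Talon code: `StallMonitor`, `demo`, and the 1ms interval are hypothetical names and values, and the 16ms budget is just the frame time mentioned above. It assumes a regular GIL build of CPython, where a single long C-level call such as `sum(range(...))` typically holds the GIL for its whole duration.

```python
import threading
import time


class StallMonitor:
    """Hypothetical heartbeat thread that records how late each wakeup is."""

    def __init__(self, interval=0.001, budget=0.016):
        self.interval = interval  # how often the heartbeat tries to run
        self.budget = budget      # lateness treated as a visible stall
        self.samples = 0
        self.over_budget = 0
        self.worst = 0.0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        deadline = time.perf_counter() + self.interval
        # wait() returns False on timeout and True once stop() sets the event.
        # After the timeout the thread still has to reacquire the GIL, so the
        # measured lateness includes any time spent waiting for it.
        while not self._stop.wait(self.interval):
            now = time.perf_counter()
            late = max(0.0, now - deadline)
            self.samples += 1
            self.worst = max(self.worst, late)
            if late > self.budget:
                self.over_budget += 1
            deadline = now + self.interval

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def demo():
    monitor = StallMonitor()
    monitor.start()
    sum(range(20_000_000))  # one long C-level call on the main thread
    time.sleep(0.05)        # let the heartbeat run unobstructed for a while
    monitor.stop()
    return monitor
```

Calling `demo()` and inspecting `monitor.worst` gives a rough idea of how long the heartbeat was kept from running. The number also includes ordinary OS timer jitter, so it is an upper bound on the GIL wait rather than an exact measurement.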
I’ve been working around this manually by playing whack-a-mole with the most egregious GIL stalls, but it’s still a problem in edge cases, and on heavily loaded user systems or weaker CPUs.
Solutions like multiprocessing are a bad fit here, as a given user will have hundreds of custom scripts running. We can’t give each user script its own process, because that would use far too many resources. There’s no easy place to implicitly split the workload. The closest solution I have in mind (besides nogil) would be to provide a WebWorker-like API where you can ask for specific functions to run in another process or in a subinterpreter with a separate GIL, but the ergonomics of that would be weird and I’d rather avoid it if possible.
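For illustration, a WebWorker-like API of the kind described above might look roughly like the sketch below. This is an assumption about the shape of such an API, not something Talon provides: `Worker`, `post`, and `close` are hypothetical names, layered on the standard `concurrent.futures` executors.

```python
from concurrent.futures import Executor, Future, ProcessPoolExecutor
from typing import Any, Callable, Optional


class Worker:
    """Hypothetical WebWorker-like handle: post a call, get a Future back."""

    def __init__(self, executor: Optional[Executor] = None) -> None:
        # Default to a single separate process, which has its own GIL, so a
        # posted call cannot stall threads in the calling process.
        if executor is None:
            executor = ProcessPoolExecutor(max_workers=1)
        self._executor = executor

    def post(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        # With a process-backed executor, func and its arguments are pickled,
        # so func has to be importable (no lambdas or closures) and nothing is
        # shared with the caller except what is passed in and returned.
        return self._executor.submit(func, *args, **kwargs)

    def close(self) -> None:
        self._executor.shutdown(wait=True)

    def __enter__(self) -> "Worker":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()
```

A script would call something like `worker.post(math.factorial, 5000)` and later read `.result()` from the returned future. The pickling restriction and the lack of shared state are the awkward ergonomics in question: every user script would have to be restructured around functions that can be shipped to another process or interpreter. An executor backed by subinterpreters could presumably be passed in the same way.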