This would actually be extremely helpful for PyO3, and possibly extension authors in general. We were considering adding our own pseudo-GIL, but having this in the interpreter would be much better because it would then be more semantically equivalent than a localised “GIL” per PyO3-based extension.
Perhaps it would even make sense for the existing PyGIL_x APIs to lock/unlock this pseudo-GIL, as a backwards-compatibility aid for existing extensions.
If I’m reading this PEP correctly, the thread-safety requirement for native threads making Python C API calls changes from “this thread must have acquired the GIL” to “this thread must have a Python thread state”, so there may need to be PyThreadState_Attach() and PyThreadState_Detach() APIs so that native threads correctly pause for the stop-the-world GC before interacting with Python objects. Or would this be done internally to the Python C API?
We were considering adding our own pseudo-GIL, but having this in the interpreter would be much better because it would then be more semantically equivalent than a localised “GIL” per PyO3-based extension.
It’s not clear to me what you mean by a pseudo-GIL. What are the precise semantics? Here are two possibilities given in the current “nogil” prototype:
Local mutual exclusion: This is provided by the critical section API. In the context of PyO3, this is what you’d want for something like GILOnceCell because it avoids deadlocks in the same way as the GIL (i.e., for the same reasons you provide GILOnceCell as an alternative to once_cell).
Global Mutual Exclusion: In other words, preventing all other threads from accessing Python objects. This is provided by the “stop-the-world” functionality, but is not exposed as a public API to extensions. Using this at a fine-grained level would, I think, be catastrophic for performance. I think the community would be better served by an extension simply not building for the --disable-gil mode than by any widespread use of stop-the-world by extensions.
If I’m reading this PEP correctly, the thread-safety requirement for native threads making Python C API calls changes from “this thread must have acquired the GIL” to “this thread must have a Python thread state”
As mentioned in the PEP, the API calls remain unchanged. For example, before this PEP you need to call PyGILState_Ensure (or a similar function) before accessing the C-API from a native thread. After this PEP, you still need to call PyGILState_Ensure (or a similar function) before accessing the C-API from a native thread.
I’m proposing Global Mutual Exclusion, with the existing API, but for a different reason. Python is now thread-safe, but many extensions can be presumed to be thread-unsafe unless they have had their internal data access evaluated for nogil use and ported.
Right now, the PEP changes I’ve identified as annoying for existing extensions are:
GIL removal: I suspect this will introduce non-Python data races in many extensions.
PyDict_GetItem / PyList_GetItem. This may be avoidable as I mentioned here, though I am unsure of the performance impact of hiding this change from extensions. (However, it’s another subtle safety-by-default issue, so I think it’s worth trying)
Refcount changes. There’s no way around this.
Memory API changes. I haven’t really evaluated these.
For the GIL, I suggest we give “legacy extensions” the old GIL semantics for the sake of their code unless they opt out somehow. It changes the incentives for extension authors. Instead of users reporting “your extension (and others) crashes if I compile them for nogil” they will report “we can improve performance on nogil by switching from legacy gil to critical sections”. If the public perception of nogil is that existing extensions now have subtle crashes, I think it could hurt the adoption. This suggestion changes it from a safety problem (nondeterministic crashes on nogil), to a performance problem (your extension hasn’t been updated so it’s holding the pseudo-GIL).
People will definitely hit nondeterministic crashes on legacy extensions otherwise, and will have no idea how to debug them. They will just know that switching back to GIL fixes the issue.
I know this means existing extensions may not be as performant for multithreading, as they will still hold a GIL. But it means they shouldn’t crash, so compatibility is better. And there’s a way forward to improve their performance, without having to go through the crashing step first.
(Side note: I don’t actually know what PyO3 wants here, @davidhewitt should definitely go into detail on their use case)
I wonder what this means for package users. For package developers to benefit from the GIL removal, do they need to develop only for the nogil variant, or can the benefits carry over to the GIL-ed variant? If the benefits are only possible with the nogil target, it’s a strong incentive for developers to only develop for it. The perceived benefit of having two variants will be largely, if not entirely, compromised.
Thanks for the feedback Ryan. Based on your suggestions and @eric.snow’s prior suggestions, here is a possible modification to the PEP:
The --enable-gil build of Python (the default) always runs with the GIL.
The --disable-gil build typically runs without the GIL, but this behavior can be affected at runtime by extension loading and overridden by an environment variable.
In --disable-gil builds, when loading an extension, CPython checks for a new PEP 489-style Py_mod_gil slot. If the slot is set to Py_mod_gil_not_used, then extension loading proceeds as normal. If the slot is not set, the interpreter pauses all threads and enables the GIL before continuing. Additionally, the interpreter issues a visible warning that names the extension, states that the GIL was enabled (and why), and lists the steps the user can take to override it.
PYTHONGIL environment variable
In --disable-gil builds, the user can also override the behavior at runtime by setting the PYTHONGIL environment variable. Setting PYTHONGIL=0 forces the GIL to be disabled, overriding the module slot logic. Setting PYTHONGIL=1 forces the GIL to be enabled.
The PYTHONGIL=0 override is important because extensions that aren’t thread-safe can still be useful in multi-threaded applications. You may want to use the extension from only a single thread or guard access by locks. For context, there are already some extensions that aren’t thread-safe even with the GIL, and users already have to take these sorts of steps.
The PYTHONGIL=1 override is occasionally useful for debugging.
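To make the proposed rules easier to check against each other, here is a rough Python model of the decision. It is only a sketch of the behavior described above, not how CPython would implement it: the function name gil_enabled and its parameters are hypothetical, slot values are represented as plain strings, and treating any PYTHONGIL value other than "0" or "1" as unset is my own assumption.

```python
from typing import Iterable, Optional

# Hypothetical string stand-in for the proposed Py_mod_gil_not_used slot value.
NOT_USED = "Py_mod_gil_not_used"


def gil_enabled(
    disable_gil_build: bool,
    pythongil: Optional[str],
    extension_slots: Iterable[Optional[str]],
) -> bool:
    """Model whether the interpreter ends up running with the GIL.

    extension_slots holds one entry per loaded extension: NOT_USED if the
    extension set the slot, or None if it did not set it.
    """
    # The default (--enable-gil) build always runs with the GIL.
    if not disable_gil_build:
        return True
    # In --disable-gil builds the environment variable overrides the slot logic.
    if pythongil == "0":
        return False
    if pythongil == "1":
        return True
    # Assumption: any other value is treated like an unset variable.
    # Without an override, one extension lacking the slot enables the GIL.
    return any(slot != NOT_USED for slot in extension_slots)


print(gil_enabled(True, None, [NOT_USED, None]))
print(gil_enabled(True, "0", [NOT_USED, None]))
```

The first call models a nogil build that loads one extension without the slot and no override; the second adds PYTHONGIL=0 on top of the same extensions.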
This solution would work very well for PyO3 - we would be able to default to requiring GIL and then extension authors can opt out of the GIL when they are comfortable.
I fear that the downside of this approach is that nogil would take a long time to build traction, because it becomes a problem of extensions being built for it plus extensions being upgraded for it. Nevertheless it’s probably worth it for the safety advantages: a slow and stable rollout would be good for how users perceive nogil.
I’ve been thinking the transition to a GIL-free world would look a lot like the 2-to-3 transition, dragging out interminably. The details would obviously be much different, but giving extension authors an escape hatch would likely delay the widespread adoption of nogil. Would an application writer know that their presumed GIL-free application had been hamstrung by some extension they might well have no control over? Would they be able to pinpoint the source?
This is assuming a GIL-free world is the end goal, which, as long as the GIL-ful version is considerably faster in many common Python applications, is a notion I disagree with. Being able to choose between GIL-ful and GIL-free at run-time feels like a better end goal to me but I’m sure there are many reasons why that is bad/hard/impractical.
In case nogil doesn’t make it for 3.12, wouldn’t it already be nice to have a convention for how wheels for this alternate reality should be named? For example, replacing the “c” of “cp” with an “n” for nogil CPython?
cramjam-2.6.2-cp312-none-win_amd64.whl = with gil
cramjam-2.6.2-np312-none-win_amd64.whl = nogil
cramjam-2.6.2-cp312.np312-none-win_amd64.whl = with or without you nogil
It would allow the existing PyPI infrastructure to be used.
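As a sketch of how a tool could read the naming idea above, here is a small Python helper that looks at the Python tag of a wheel filename. The “np” prefix is only the convention suggested in this post, not an existing tag, and the helper name gil_variants is made up for illustration.

```python
from typing import Set


def gil_variants(wheel_filename: str) -> Set[str]:
    """Return which interpreter variants a wheel claims under the suggested naming."""
    name = wheel_filename
    if name.endswith(".whl"):
        name = name[: -len(".whl")]
    # Wheel filenames end with python tag, ABI tag, platform tag (in that order).
    python_tag = name.split("-")[-3]
    variants = set()
    # A compressed tag set such as "cp312.np312" lists several Python tags.
    for tag in python_tag.split("."):
        if tag.startswith("np"):
            variants.add("nogil")  # hypothetical nogil CPython tag from this post
        elif tag.startswith("cp"):
            variants.add("gil")
    return variants


for wheel in (
    "cramjam-2.6.2-cp312-none-win_amd64.whl",
    "cramjam-2.6.2-np312-none-win_amd64.whl",
    "cramjam-2.6.2-cp312.np312-none-win_amd64.whl",
):
    print(wheel, sorted(gil_variants(wheel)))
```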
wouldn’t it already be nice to have a convention for how wheels for this alternate reality should be named?
No, because I don’t even necessarily agree that it’s an interpreter-level tag difference rather than an ABI tag difference. And I personally don’t want to expend the energy on that debate unless it’s going to be useful (and my active COVID recovery says it wouldn’t be at this time).
It looks like there are about 80 commits still to apply in the todo of nogil-3.12, plus about 420 commits from cpython-3.12a4 to now, so nogil-3.12 is roughly 500 commits behind.
Given the benefit it brings to some workloads in the cloud, I would expect an unofficial nogil-3.12 to attract a bigger community than PyPy. Offering it a way to flourish on PyPI, independent of any SC approval for “mainline CPython”, would be great.
(I’m not sure where PEP 703 discussion is happening now that several of the major threads here have been locked by the moderators, so apologies if I should post this somewhere else.)
We at Anaconda have been watching the nogil work with great interest for some time. After some internal discussion, I wanted to announce that we are able to commit Anaconda engineer time toward the packaging challenges that will be associated with adopting PEP 703, including any work on pip, cibuildwheel, and conda-forge that will be needed to get nogil-compatible packages into the hands of the Python community. We can coordinate with the PyPA folks to see where we can be most useful if the PEP moves forward.