My vote goes to no: make 3.12 safe, then remove the limitations.
For example, PyMem_SetAllocator with PYMEM_DOMAIN_MEM or PYMEM_DOMAIN_OBJ could block creating independent GILs, and a new PyMem_SetGlobalAllocator could be added.
And, I guess setting memory allocators should be blocked if multiple GILs exist? Apparently, after Python is initialized, PyMem_SetAllocator should only be used for hooks that wrap the current allocator (is that right @vstinner?), but creating such a hook using PyMem_GetAllocator gets you a race condition. IMO the best thing the initial implementation can do is to fail, and leave a better solution for later.
A wrinkle is that PyMem_SetAllocator has no way to signal failure: it silently ignores errors. Guess it predates PyStatus?
IMO, the solution is to not opt in for now. If synchronization/introspection API is missing, let's add it after the PEP is in place. (IMO there are many issues in this area; that's why I'm trying to convince Eric to make the initial implementation safe but limited.)
That said, a thread-safety restriction on the allocators is the simplest way forward for a safe 3.12 (under a per-interpreter GIL). Or were you talking only about the constraint on extension modules?
Do you mean if someone sets a custom mem/object allocator then subinterpreters with their own GIL should not be allowed? That is reasonable, if we don't have enough information to conclude that existing custom allocators (used with PyMem_SetAllocator()) are thread-safe.
What would this do?
Yeah, that's a race we'd have to resolve. However, rather than disallowing it, I'd expect a solution with a granular global lock, like we have for the interpreters list.
Right. We'd have to do something like leave the current allocator in place and return. Then you'd have to call PyMem_GetAllocator() afterward to see if your allocator is set. A function that returned a result could be helpful.
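To make the race and the lock-based fix concrete, here is a toy Python model. It is only a sketch: `ToyAllocatorSlot`, `wrap`, and the string "allocators" are hypothetical stand-ins for one allocator domain, not CPython APIs, and the interleaving of the two "threads" is written out by hand so the result is deterministic.

```python
import threading


class ToyAllocatorSlot:
    """Hypothetical stand-in for one allocator domain (not a CPython API)."""

    def __init__(self):
        self.current = "base"
        self.lock = threading.Lock()

    def get(self):
        # Plays the role of PyMem_GetAllocator().
        return self.current

    def set(self, allocator):
        # Plays the role of PyMem_SetAllocator().
        self.current = allocator

    def wrap(self, hook_name):
        # Read-modify-write under one lock, so no other wrapper can
        # slip in between the get and the set.
        with self.lock:
            self.set(f"{hook_name}({self.get()})")


# Unsynchronized get-then-set, with the bad interleaving spelled out:
racy = ToyAllocatorSlot()
seen_by_a = racy.get()            # thread A reads "base"
seen_by_b = racy.get()            # thread B also reads "base"
racy.set(f"hook_a({seen_by_a})")  # A installs its wrapper
racy.set(f"hook_b({seen_by_b})")  # B overwrites it; A's hook is lost
print(racy.current)

# The same two hooks installed through the locked helper:
safe = ToyAllocatorSlot()
safe.wrap("hook_a")
safe.wrap("hook_b")
print(safe.current)
```

In the unsynchronized case the slot ends up as `hook_b(base)`, so `hook_a` never sees any allocations; with the lock both hooks are chained.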
Regardless, it would make more sense to me if we had a separate API for wrapping the existing allocator after init (e.g. PyMem_WrapAllocator()). Then PyMem_SetAllocator() would apply only to the actual allocator and only be allowed before runtime init. However, that is definitely not part of this PEP (nor necessary for it).
Yes, that seems like the easiest safe way forward.
Same as PyMem_SetAllocator, but allow subinterpreters with their own GILs; i.e. that allocator would be assumed to be thread-safe.
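As a rough illustration of that rule, the decision logic could be modelled like this. Everything here is hypothetical: `ToyRuntime` and its methods only mimic the proposed behaviour in Python, PyMem_SetGlobalAllocator does not exist, and the model assumes the default allocator is safe to use with multiple GILs.

```python
class ToyRuntime:
    """Toy model of the proposed rule; not CPython's real runtime state."""

    def __init__(self):
        # Assumption: the built-in default allocator is thread-safe.
        self.allocator_thread_safe = True

    def set_allocator(self):
        # Models PyMem_SetAllocator(): nothing is known about the
        # thread safety of the custom allocator, so assume the worst.
        self.allocator_thread_safe = False

    def set_global_allocator(self):
        # Models the hypothetical PyMem_SetGlobalAllocator(): the caller
        # promises that the allocator is thread-safe.
        self.allocator_thread_safe = True

    def can_create_own_gil_interpreter(self):
        # Interpreters with their own GIL are only allowed while the
        # active allocator is known (or declared) to be thread-safe.
        return self.allocator_thread_safe


legacy = ToyRuntime()
legacy.set_allocator()
print(legacy.can_create_own_gil_interpreter())

declared_safe = ToyRuntime()
declared_safe.set_global_allocator()
print(declared_safe.can_create_own_gil_interpreter())
```

The first runtime refuses interpreters with their own GIL and the second allows them, which is the only difference between the two setters in this model.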
(Yes, it needs a better name.)
Thanks for clarifying. I agree that we should look into a new allocator set/get API that relates to interpreters. However, I don't think this PEP needs that.
That's a good idea. I'll make a separate post just about this.
Regardless, I was hoping to leave specific APIs that help extension modules out of this PEP. From PEP 684:
We will work with popular extensions to help them support use in multiple interpreters. This may involve adding to CPython's public C-API, which we will address on a case-by-case basis.
I'm sure we will add a fair number of utility APIs that might help extension maintainers reach multi-interpreter and per-interpreter GIL compatibility. It seems like the PEP would be out-of-phase with that effort, so it would be better to not include specific additions in the proposal.
+1
Yeah, that's certainly something to look into (but not for this PEP). I know @steve.dower has some thoughts in this area, and certainly @vstinner does and I do. That said, I'd rather any further discussion on this get its own DPO thread, to avoid side-tracking the PEP discussion.
faulthandler, the crash reporting feature, would remain per-process. Just as it can dump the current traceback of each thread in the VM, it should presumably be extended to do that for each subinterpreter, so that it is clear which tracebacks belong to which interpreter.
faulthandler.dump_traceback* APIs could just dump thread stacks related to the calling interpreter? Or easier: simply restrict all faulthandler APIs to being called from the main interpreter rather than allowing them from subinterpreters. Given they deal with process-wide state, just don't let subinterpreters call them at all.
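For reference, this is roughly what the existing per-thread dump looks like when called from the main interpreter today. It is a minimal sketch: `dump_all_thread_stacks` is a hypothetical helper, and it says nothing about how a per-interpreter variant would behave.

```python
import faulthandler
import tempfile
import threading


def dump_all_thread_stacks():
    """Return faulthandler's traceback dump for every thread as text."""
    # faulthandler writes straight to a file descriptor, so it needs a
    # real file (something with fileno()), not an in-memory buffer.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


# Keep one extra thread alive so the dump has more than one stack.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="worker")
worker.start()
try:
    report = dump_all_thread_stacks()
finally:
    stop.set()
    worker.join()

print(report)
```

The report groups frames under one header per thread; nothing in it identifies an interpreter, which is the gap discussed above.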
It's not a clear-cut answer as it all depends on how you want to utilize per-interpreter GILs. WebAssembly does not natively have threads, so it would be no different than the situation today. If those Emscripten-based WebAssembly runtimes support some version of threads and that can be used from a pthread API, then it should be transparent. But all of that is up to Pyodide and Emscripten.
CPython's runtime relies on some global state that is shared between all interpreters. That will remain true with a per-interpreter GIL, though there will be less shared state.
From what I understand, WASM does not support any mechanism for sharing state between web workers (the only equivalent to threads of which I'm aware). So using multiple interpreters isn't currently an option, regardless of a per-interpreter GIL. IIUC, at best you could run one runtime per web worker, which is essentially multiprocessing.