Yes, that seems like the easiest safe way forward.
It would be the same as PyMem_SetAllocator, but would allow subinterpreters with their own GILs, i.e. the allocator would be assumed to be thread-safe.
(Yes, it needs a better name.)
Yes. It’s out of scope for this PEP, but:
We probably should expose an API for user-defined granular global locks. AFAIK we don’t have a good way to “allocate a lock if not already allocated” that would work with multiple GILs.
Such a lock would be useful for one-per-process modules (the isolation opt-out), as well as for Marc-André’s use case. IMO, this should be addressed relatively quickly, so people don’t start writing extensions that are only usable in the main interpreter. (I see relying on a single main interpreter as technical debt. Eventually I’d like to allow a library to call Py_Initialize without caring whether there’s already an interpreter around. The concept of a main interpreter complicates that, but if it’s contained in the core, it should be manageable.)
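
To make the “allocate a lock if not already allocated” problem concrete, here is a rough sketch of how an extension could do it today without relying on any GIL, assuming POSIX threads are available. This is not a CPython API: `get_global_lock`, `global_lock_init` and `bump_shared_counter` are hypothetical names made up for illustration, and a real API would presumably also need to cover Windows and lock cleanup.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical sketch, not CPython API: a process-wide lock that is
 * allocated at most once, no matter how many interpreters (each with
 * its own GIL) race to use it. */
static pthread_once_t global_lock_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t *global_lock = NULL;

/* Runs exactly once per process; pthread_once guarantees that. */
static void global_lock_init(void)
{
    pthread_mutex_t *m = malloc(sizeof(*m));
    if (m != NULL && pthread_mutex_init(m, NULL) != 0) {
        free(m);
        m = NULL;
    }
    global_lock = m;
}

/* Return the process-wide lock, allocating it on first use.
 * Returns NULL if the one-time allocation failed. */
static pthread_mutex_t *get_global_lock(void)
{
    pthread_once(&global_lock_once, global_lock_init);
    return global_lock;
}

/* Example of process-global state guarded by the lock. */
static long shared_counter = 0;

/* Increment the shared counter; returns 0 on success, -1 on failure. */
static int bump_shared_counter(void)
{
    pthread_mutex_t *m = get_global_lock();
    if (m == NULL) {
        return -1;
    }
    pthread_mutex_lock(m);
    shared_counter++;
    pthread_mutex_unlock(m);
    return 0;
}
```

The point of `pthread_once` here is that the initialization itself cannot be guarded by a GIL, since two interpreters with separate GILs may reach it at the same time. A one-per-process module would take this lock around any access to its process-global state.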