PEP 684: A Per-Interpreter GIL

This seems to imply that Cython doesn’t support multiple interpreters at all? Is that so? If not, under what conditions does or doesn’t it support them? This would be important since so much of the scientific Python ecosystem is built on Cython.

Would things be different if there was a single GIL-free interpreter (like Sam Gross’s nogil project)?

If this is an OS thread that doesn’t “belong” to any interpreter (yet), in most cases that should probably be an error. The problem doesn’t seem to be dependent on whether each thread has a GIL or not – if there is more than one interpreter active, a thread created by C code outside of Python would have to figure out which interpreter it wants to belong to before it could do anything. I think we currently lean heavily in the direction of always defaulting to the main interpreter, even though that could already be incorrect.

In the new design basically all objects are sensitive to interpreter context, since you cannot touch any object’s refcount without holding the correct interpreter’s GIL. (The exceptions are trivial constants like None or False which will likely be immortal per PEP 683 and can be shared between all interpreters.)

It is an interesting idea to consider a function’s globals as providing a pointer to the interpreter, but I think that’s too late – without the right interpreter’s GIL you cannot even safely follow a pointer from a function object to its globals, let alone do a lookup in that globals dict.

That’s right. The current status is:

  • In the main compilation mode most globals (including the type objects of extension types) are defined as static C variables at the global scope rather than on a module object. I made a PR today that should improve that dramatically but it isn’t the end point. The next step after that involves accessing the module state from PyType_GetModule and similar. That’ll probably be an optional compile-time option since I imagine it’d have a speed cost.
  • In the limited API mode (which is much less tested/supported/complete) the globals are put on a module-state struct. That struct is looked up from PyState_FindModule which by nature implies a single interpreter. As far as I can tell that’s basically the only option with the limited API.

It is an aim to improve this, although I wouldn’t like to put much of a timeline on it.

I don’t think so but I’m not fully sure of the implications of that. Sam did patch Cython so that it works with his nogil project (although that doesn’t use any special features of the project, it just ensures that Cython modules compile and run with it).

Chiming in with some thoughts we had in HPy regarding “the question of if supports-multiple-interpreters is equivalent to supports-per-interpreter-gil”.

From our NumPy porting effort: not only is the global state of libraries an issue, but extensions may also have, for example, some global cache that does not hold PyObject*, but results of some costly computation that is otherwise pure C. If I understand it correctly, with a single GIL this just works fine even with sub-interpreters. Without a shared GIL, you’d have to put your own lock around accesses to this cache or put the cache into module state.

All in all, I think that this discussion shows that it would be best to decouple these things: make them more explicit and more future-proof. There may be other “execution modes” in the future, for example the no-GIL mode mentioned above (putting aside how realistic it is that it lands in main soon). Will there then be some “multi-phase module init with I-can-even-deal-with-no-gil” thing?

In HPy we want to split the initialization into something like “extension initialization”, where the extension would tell us what expectations it has (e.g., I need GIL, but per interpreter is OK), but without making any API calls yet. (For HPy specifically it would also tell us which HPyContext version the extension is compiled against). Once that is settled, we can actually call the module init (locking the GIL if it told us so, for example, and for HPy also using the right HPy ABI version).

I also think that the “extension initialization” API should be designed so that by default the Python engine assumes the most restrictive case, i.e., no subinterpreters, GIL required, and assumes more only if some very explicit flag is set. If I, as an extension dev, have to do something like extension_info->supports_per_subinterpreter_gil = true and I am not sure what this subinterpreter GIL thing means, it should be clear that I had better find out (and supports_per_subinterpreter_gil will be a good place to document that). On the other hand, when I start my new extension by copy-pasting some preexisting extension example that happens to use multi-phase module init, nothing tells me so clearly that I am communicating assumptions to Python by using multi-phase module init.
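
To make the “conservative by default, explicit opt-in” idea concrete, here is a small Python model. It is only a sketch: `ExtensionInfo`, `supports_subinterpreters` and `may_load` are hypothetical names invented for illustration, and only `supports_per_subinterpreter_gil` echoes the field name used above. Nothing here is an existing HPy or CPython API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionInfo:
    """Hypothetical declaration an extension hands over before module init."""
    name: str
    # Both flags default to the most restrictive assumption.
    supports_subinterpreters: bool = False
    supports_per_subinterpreter_gil: bool = False


def may_load(info, in_main_interpreter, interpreter_has_own_gil):
    """Decide whether the runtime would load the extension in this context."""
    if in_main_interpreter:
        return True
    if not info.supports_subinterpreters:
        return False
    if interpreter_has_own_gil and not info.supports_per_subinterpreter_gil:
        return False
    return True
```

An extension that sets nothing loads only in the main interpreter, so copy-pasting an example cannot silently promise more than its author intended.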

Well, I did think about it a bit; the HOWTO says, relatively vaguely:

In these cases, the Python module should provide access to the global state, rather than own it. If possible, write the module so that multiple copies of it can access the state independently (along with other libraries, whether for Python or other languages). If that is not possible, consider explicit locking.

Whether anyone actually does this correctly is another question. In the stdlib, we have readline which IMO should do this, but I was never involved with that module. (Reviewing it is on my TODO list but, sadly, so low that I’m not likely to get to it.)

Also note that cryptography is a widely used module that gets tested in a lot of different use cases – like with mod_wsgi, which tends to expose lots of issues related to multiple interpreters.
Other modules don’t get that kind of testing, so I wouldn’t be surprised if many were subtly broken.

Yes, PEP 489 says that multi-phase init modules “are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly”, but honestly, when it was written, no one knew what that meant. For example, I thought that it’s OK to share objects like ints. We know now that even those must be per-interpreter (or possibly immortal), but for modules that were written in the meantime, interpreting PEP 489 as requiring “perfect” isolation feels like moving the goalposts – even if it is technically correct.

Despite what the spec says, it’s so hard to test isolation, so multi-phase init might actually not be a good indicator for “supporting subinterpreters” in general – not just for the GIL :(

I guess there is a big-ish decision to make. Sub-interpreters & per-interpreter GIL won’t be perfect in 3.12, and we can choose between:

  • supporting all existing modules, but crashing/misbehaving in some race conditions, making the feature seem unstable, or
  • only supporting “updated” modules in subinterpreters with their own GIL, making the feature less useful.

Case in point:

What would this flag do?
Revert to the mechanism used for single-phase init (probably crashing/misbehaving in some cases), or make Python refuse to load the module in non-main interpreter (making multiple interpreters less useful)?
Or refuse to load the module more than once per process, which the HOWTO currently recommends doing manually? (BTW, note how that recipe relies on a global GIL…)

Thanks for pointing this out (and about the benefit of a multiple-interpreters opt-out).

Would Cython defer to CPython doing the multiple-interpreters check? Is there something inherent to Cython that makes it incompatible with multiple interpreters?

UPDATE: you already answered this. :)

Would Cython defer to CPython doing the multiple-interpreters check?

If there was an official mechanism to indicate (in)compatibility then we’d likely use it.

Is there something inherent to Cython that makes it incompatible with multiple interpreters?

I think this was what I already answered. However: I’m currently not sure whether we can support all of the current Cython feature-set with multiple interpreters. I’m mainly thinking about C functions that can access Python globals. (I’m also not sure this is something that Python can help with). So even in future when we support it properly it may be that some Cython modules will never be able to support multiple interpreters. But that’s very much a future problem…

Perhaps it is an error (in sqlite) to define a function in one interpreter and name it in a query submitted from another. I think there are environments where a new platform thread might invoke a call-back but I’m influenced by Java not C in that conviction.

Good point. The interpreter has to be identifiable in a thread-safe way before PyGILState_Ensure() completes. I wonder if only case-specific solutions can exist, in this case in the callback_context.

At this point I’m strongly leaning toward adding a moduledef slot for “supports use in multiple interpreters” (i.e. an opt-in flag). However, I don’t see the point of a distinct “supports per-interpreter GIL” slot since there doesn’t seem to be much interest for one without the other currently.

Contrary to what PEP 489 says, the default would be “does not support use in multiple interpreters”. Ideally the opposite would be the default, but it seems like there are enough extensions out there that would be a problem, even among those that implement multi-phase init.

That said, I expect we could switch the default at some point in the future. With that in mind, it would make sense to add an explicit “does not support use in multiple interpreters” moduledef slot now (matching the current default).

I’ve updated PEP 684 after the last set of feedback. You can see the changes in https://github.com/python/peps/pull/2807.

The PEP text is still at https://peps.python.org/pep-0684/.

Significant changes:

  • settled on keeping the allocators global but requiring that they all be thread-safe
  • the state of the existing “small block” will be moved to PyInterpreterState
  • dropped references to mimalloc
  • simplified the C-API changes
  • clarified the situation with incompatible extension modules
  • proposed that extensions always opt in to per-interpreter GIL support with a new PyModuleDef slot (at least until we have enough evidence that multi-phase init is sufficient)
  • expanded “How to Teach This”

For me the most critical things to settle are:

  • Are we okay to require that the “mem” and “object” allocators be thread-safe, whereas currently we say they can rely on the GIL?
  • Can we avoid making extensions opt in to supporting per-interpreter GIL (if they already implement multi-phase init)?

Open questions (from the PEP):

  • Are we okay to require “mem” and “object” allocators to be thread-safe?
  • How would a per-interpreter tracemalloc module relate to global allocators?
  • Would the faulthandler module be limited to the main interpreter (like the signal module) or would we leak that global state between interpreters (protected by a granular lock)?
  • How likely is it that a module works under multiple interpreters (isolation) but doesn’t work under a per-interpreter GIL?
  • If it is likely enough, what can we do to help extension maintainers mitigate the problem and enjoy use under a per-interpreter GIL?
  • What would be a better (scarier-sounding) name for importlib.util.allow_all_extensions?

9 posts were split to a new topic: How to share module state among multiple instances of an extension module?

My vote goes to no: make 3.12 safe, then remove the limitations.
For example, PyMem_SetAllocator with PYMEM_DOMAIN_MEM or PYMEM_DOMAIN_OBJ could block creating independent GILs, and new PyMem_SetGlobalAllocator could be added.

And, I guess setting memory allocators should be blocked if multiple GILs exist? Apparently, after Python is initialized, PyMem_SetAllocator should be used only for hooks that wrap the current allocator (is that right @vstinner?), but creating such a hook using PyMem_GetAllocator gets you a race condition. IMO the best thing the initial implementation can do is to fail, and leave a better solution for later.

A wrinkle is that PyMem_SetAllocator has no way to signal failure – it silently ignores errors. Guess it predates PyStatus?

IMO, the solution is to not opt in for now. If synchronization/introspection API is missing, let’s add it after the PEP is in place. (IMO there are many issues in this area – that’s why I’m trying to convince Eric to make the initial implementation safe but limited.)

Agreed. The PEP shouldn’t need more than that.

That said, a thread-safety restriction on the allocators is the simplest way forward for a safe 3.12 (under a per-interpreter GIL). Or were you talking only about the constraint on extension modules?

Do you mean if someone sets a custom mem/object allocator then subinterpreters with their own GIL should not be allowed? That is reasonable, if we don’t have enough information to conclude that existing custom allocators (used with PyMem_SetAllocator()) are thread-safe.

What would this do?

Yeah, that’s a race we’d have to resolve. However, rather than disallowing it, I’d expect a solution with a granular global lock, like we have for the interpreters list.

Right. We’d have to do something like leave the current allocator in place and return. Then you’d have to call PyMem_GetAllocator() afterward to see if your allocator is set. A function that returned a result could be helpful.

Regardless, it would make more sense to me if we had a separate API for wrapping the existing allocator after init (e.g. PyMem_WrapAllocator()). Then PyMem_SetAllocator() would apply only to the actual allocator and only be allowed before runtime init. However, that is definitely not part of this PEP (nor necessary for it).
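
The get-then-set race and the “wrap as one step” alternative can be modeled in a few lines of Python. This is a toy sketch, not CPython’s allocator API: `AllocatorSlot`, `base_alloc` and `counting_hook` are hypothetical names, and `wrap` merely illustrates what something like the PyMem_WrapAllocator() idea above would have to guarantee.

```python
import threading


class AllocatorSlot:
    """Toy model of a single process-wide allocator slot."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._lock = threading.Lock()

    def get(self):
        return self._allocator

    def set(self, allocator):
        # Separate get() and set() calls are racy: two threads can both read
        # the same allocator and the later set() then drops the other hook.
        self._allocator = allocator

    def wrap(self, make_hook):
        # Read and replace under one lock so no concurrent hook is lost.
        with self._lock:
            self._allocator = make_hook(self._allocator)


def base_alloc(size):
    """Stand-in for the real allocator."""
    return bytearray(size)


def counting_hook(sizes):
    """Build a hook factory that records every requested size in `sizes`."""
    def make_hook(inner):
        def hooked(size):
            sizes.append(size)
            return inner(size)
        return hooked
    return make_hook
```

If two threads each call `slot.wrap(counting_hook(...))`, both hooks end up in the chain regardless of ordering, which is the property the get/set pair cannot provide.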

Agreed.

I was talking about both :)

Yes, that seems like the easiest safe way forward.

Same as PyMem_SetAllocator, but allow subinterpreters with their own GILs – i.e. that allocator would be assumed to be thread-safe.
(Yes, it needs a better name.)

Yes. It’s out of scope for this PEP, but:

We probably should expose API for user-defined granular global locks. AFAIK we don’t have a good way to “allocate lock if not already allocated” that would work with multiple GILs.
Such a lock would be useful for one-per-process modules (the isolation opt-out), as well as for Marc-André’s use case. IMO, this should be addressed relatively quickly, so people don’t start writing extensions that are only usable in the main interpreter. (I see relying on a single main interpreter as technical debt. Eventually I’d like to allow a library to call Py_Initialize without caring whether there’s already an interpreter around. The concept of a main interpreter complicates that, but if it’s contained in the core, it should be manageable.)
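
The “allocate lock if not already allocated” pattern looks roughly like this when modeled in Python. It is a sketch under a big assumption: a Python-level global lives inside one interpreter, so this only models the shape of the API, not a lock that is truly shared across interpreters with separate GILs. `_registry_lock`, `_named_locks` and `get_global_lock` are hypothetical names.

```python
import threading

# The one lock that has to exist before anything else can be created safely.
_registry_lock = threading.Lock()
_named_locks = {}


def get_global_lock(name):
    """Return the lock registered under `name`, creating it on first use."""
    with _registry_lock:
        lock = _named_locks.get(name)
        if lock is None:
            lock = _named_locks[name] = threading.Lock()
        return lock
```

The sketch shows why the runtime has to supply the bootstrap lock: an extension cannot safely create its first lock without something that already serializes the creation.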

Thanks for clarifying. I agree that we should look into a new allocator set/get API that relates to interpreters. However, I don’t think this PEP needs that.

That’s a good idea. I’ll make a separate post just about this.

Regardless, I was hoping to leave specific APIs that help extension modules out of this PEP. From PEP 684:

We will work with popular extensions to help them support use in multiple interpreters. This may involve adding to CPython’s public C-API, which we will address on a case-by-case basis.

I’m sure we will add a fair number of utility APIs that might help extension maintainers reach multi-interpreter and per-interpreter GIL compatibility. It seems like the PEP would be out-of-phase with that effort, so it would be better to not include specific additions in the proposal.

+1

Yeah, that’s certainly something to look into (but not for this PEP). I know @steve.dower has some thoughts in this area, and certainly @vstinner does and I do. That said, I’d rather any further discussion on this get its own DPO thread, to avoid side-tracking the PEP discussion.

I started a thread at https://discuss.python.org/t/a-new-c-api-for-extensions-that-need-runtime-global-locks/20668.

faulthandler, the crash reporting feature, would remain per process. Just as it can dump the current traceback of each thread in the VM, it should presumably be extended to do that for each subinterpreter so that it is clear which tracebacks belong to what.

faulthandler.dump_traceback* APIs could just dump thread stacks related to the calling interpreter? Or easier: simply restrict all faulthandler APIs to being called from the main interpreter rather than allowing them from subinterpreters. Given they deal with process wide state, just don’t let subinterpreters call them at all.
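
For reference, the existing per-thread dump can be exercised from Python as below. This only demonstrates today’s `faulthandler.dump_traceback(file=..., all_threads=True)` behavior within one interpreter; it says nothing about how subinterpreters would be labeled or restricted. The helper name `dump_all_thread_stacks` and the idle worker thread are illustrative assumptions.

```python
import faulthandler
import tempfile
import threading


def dump_all_thread_stacks():
    """Return faulthandler's traceback dump for every thread as text."""
    release = threading.Event()
    # An idle worker so that the dump contains more than one thread.
    worker = threading.Thread(target=release.wait, name="idle-worker")
    worker.start()
    try:
        # faulthandler writes straight to the file descriptor, so it needs
        # a real file (something with fileno()), not an in-memory buffer.
        with tempfile.TemporaryFile() as f:
            faulthandler.dump_traceback(file=f, all_threads=True)
            f.seek(0)
            return f.read().decode("utf-8", errors="replace")
    finally:
        release.set()
        worker.join()
```

Each thread gets its own “most recent call first” section in the output; a per-subinterpreter extension would presumably need an extra level of grouping on top of that.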

Will a per-interpreter GIL work in a WASM context, to bring parallelism to the web as well?
(Pyodide and JupyterLite come to mind.)

It’s not a clear-cut answer as it all depends on how you want to utilize per-interpreter GILs. WebAssembly does not natively have threads, so it would be no different than the situation today. If those Emscripten-based WebAssembly runtimes support some version of threads and that can be used from a pthread API, then it should be transparent. But all of that is up to Pyodide and Emscripten.

CPython’s runtime relies on some global state that is shared between all interpreters. That will remain true with a per-interpreter GIL, though there will be less shared state.

From what I understand, WASM does not support any mechanism for sharing state between web workers (the only equivalent to threads of which I’m aware). So using multiple interpreters isn’t currently an option, regardless of a per-interpreter GIL. IIUC, at best you could run one runtime per web worker, which is essentially multiprocessing.