How to share module state among multiple instances of an extension module?

Hi Eric, I don’t know if this has been mentioned before, but there’s an important case you will need to address in both the multiple interpreters per process and per interpreter GIL setup:

External C libraries often use global structures and corresponding initialization procedures which can only be run once per process. An extension module that wants to wrap such a library will have to know in which copy of the interpreter to run this initialization (i.e. similar to the main thread), and it’ll have to share the resulting pointers to the initialized structures between the interpreters.

A good example is the ODBC C API. When interfacing to an ODBC manager or driver, you have to initialize an environment struct once per process. This is then used to create connections and cursors. ODBC managers and drivers are thread safe, so the GIL is not needed to protect those, but the GIL may still be needed by extensions, since they may not be re-entrant or they may use internal structures or caches which would require locks.

  • Are there APIs to allow for such sharing of structs and pointers between interpreters?
  • Is it possible to determine whether an extension is running in the main interpreter or a later copy?

Thanks.

PyMem_SetAllocator() can be called while Python is running: tracemalloc.start() uses that. The PyStatus API is only to initialize Python. Currently, PyMem_SetAllocator() cannot fail. It’s the responsibility of the caller to only add a hook but still call the same allocator. It’s only safe to replace the allocators before Python initialization (see “Python Initialization Configuration” in the Python 3.13.0a2 documentation):

PyMem_SetAllocator() can be called after Py_PreInitialize() and before Py_InitializeFromConfig() to install a custom memory allocator. It can be called before Py_PreInitialize() if PyPreConfig.allocator is set to PYMEM_ALLOCATOR_NOT_SET.

Sadly, despite the two-level quoting I lost track of what “that” refers to in this sentence. Can you clarify?

Thanks for pointing this out, Marc-André. This is indeed something we’ve considered at length. PEP 489, 630, etc. are effective for isolating modules, but only for their own state. However, we don’t provide much accommodation for modules that depend on the state of linked libraries.

Note that, generally, this is not a new problem. For example, it affected the cryptography package 7 years ago.

Interpreter-isolated (multi-phase init) extensions already have to manage isolation relative to their library dependencies, including data that is initialized only once per process. Per-interpreter GIL adds an extra consideration. Until the module is managing that isolation, it should not indicate (via multi-phase init) that it is compatible with multiple interpreters, much less with a per-interpreter GIL.

Yeah, an extension must sort out thread safety relative to its linked dependencies before it claims to support use under a per-interpreter GIL. It should be a smaller change than the isolation that’s needed for multiple interpreter support, but maybe a little trickier (depending on the linked library).
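
To make the thread-safety point concrete, here is a minimal sketch in plain C of serializing calls into a non-reentrant library with one process-wide lock. It assumes POSIX threads; `lib_lookup`, `lib_cache_hits`, and `wrapped_lookup` are hypothetical names used only for illustration, not part of any real library or of the CPython API.

```c
#include <pthread.h>

/* Hypothetical stand-in for a library function that updates an
 * internal, unsynchronized cache. It is not a real API. */
static long lib_cache_hits = 0;

static long lib_lookup(long key)
{
    lib_cache_hits++;   /* unsynchronized mutation of global state */
    return key * 2;
}

/* One lock for the whole process, shared by every interpreter that
 * loads the extension. A single GIL provides this serialization
 * implicitly; with one GIL per interpreter it has to be explicit. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* What the extension's wrapper would call instead of lib_lookup(). */
long wrapped_lookup(long key)
{
    pthread_mutex_lock(&lib_lock);
    long result = lib_lookup(key);
    pthread_mutex_unlock(&lib_lock);
    return result;
}
```

Whether one coarse lock is good enough, or the library needs finer-grained locking, depends on the linked library.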

We don’t have anything like that currently. It sounds really useful.

It isn’t part of this PEP though. If such utilities meant we could be confident that multi-phase init extensions were (sufficiently) compatible with per-interpreter GIL, then I’d probably pursue that. However, I’m not sure any number of helpers will lead us to reach that confidence.

With the public API (but not limited API):

    if (PyInterpreterState_Get() == PyInterpreterState_Main()) {
        ...
    }

What would you use that for?

“make 3.12 safe, then remove the limitations.”

The idea is to have the extension only initialize the external C library in the main interpreter and then pass around the global state (pointer or handle) to the other interpreters via the sharing APIs and a shared key-value store managed by the main interpreter.

Thinking about this some more: it’s not necessarily the case that the main interpreter will be the first to import the C extension, making things even more complex. E.g. what happens if the sub-interpreter, which owns the global C lib struct, terminates, while the other interpreters happily continue to use it?

The C extension would probably have to prevent imports in sub-interpreters to avoid such situations. Similar to what you do with threads that want to install signal handlers.

I see two main ways:

  • Simple but still safe: Make the module fail to import more than once per process. There’s a simple recipe in the HOWTO, but it’ll need an additional lock to work with multiple GILs.
  • Use lock-protected process-global state, with its own reference counting to ensure it’s cleaned up when no longer needed. Don’t treat one interpreter specially.
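
A rough sketch of the first option, in plain C. Note that it swaps the "additional lock" for a C11 `atomic_flag`, which gives the same guarantee for a simple yes/no guard; this is not the HOWTO recipe itself. `claim_single_load` is a hypothetical helper name, and the Python error reporting is only indicated in a comment.

```c
#include <stdatomic.h>

/* Set by the first interpreter that runs the module's exec function. */
static atomic_flag module_loaded = ATOMIC_FLAG_INIT;

/* Returns 0 for the first caller in the process and -1 for every later
 * one. atomic_flag_test_and_set() is a single atomic step, so two
 * interpreters importing at the same time cannot both see "not loaded". */
int claim_single_load(void)
{
    if (atomic_flag_test_and_set(&module_loaded)) {
        /* A real exec function would set ImportError here, for example
         * with PyErr_SetString(PyExc_ImportError, ...), then return -1. */
        return -1;
    }
    return 0;
}
```

The module's exec function would call this first and bail out on -1, so the library initialization that follows runs at most once per process.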

If you have shared state that needs to be correctly cleaned up when everyone’s done with it, you should do one of these now – even with a single GIL.
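
The second option could look roughly like the following plain C sketch, assuming POSIX threads. `lib_env`, `lib_env_create`, and `lib_env_destroy` are hypothetical stand-ins for whatever once-per-process handle the wrapped library has (an ODBC environment, say); the acquire/release names are made up as well.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a library's once-per-process environment. */
typedef struct { int ready; } lib_env;

static lib_env *lib_env_create(void)
{
    lib_env *env = malloc(sizeof *env);
    if (env != NULL) {
        env->ready = 1;
    }
    return env;
}

static void lib_env_destroy(lib_env *env) { free(env); }

/* Process-global state, protected by its own lock rather than a GIL. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;
static lib_env *shared_env = NULL;
static size_t env_users = 0;

/* Call when a module instance is created (e.g. from its exec function).
 * The first caller creates the environment, whichever interpreter it
 * runs in. Returns NULL if creation failed. */
lib_env *shared_env_acquire(void)
{
    pthread_mutex_lock(&env_lock);
    if (shared_env == NULL) {
        shared_env = lib_env_create();
    }
    lib_env *env = shared_env;
    if (env != NULL) {
        env_users++;
    }
    pthread_mutex_unlock(&env_lock);
    return env;
}

/* Call when a module instance goes away (e.g. from the module's m_free).
 * The last user destroys the environment. */
void shared_env_release(void)
{
    pthread_mutex_lock(&env_lock);
    assert(env_users > 0);
    if (--env_users == 0) {
        lib_env_destroy(shared_env);
        shared_env = NULL;
    }
    pthread_mutex_unlock(&env_lock);
}
```

No interpreter owns the environment here, so it doesn't matter which one imports first or exits first: the struct lives until the last module instance releases it.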

(Next thing to think about: how do you safely share per-process state with other people’s modules, and perhaps even non-Python wrappers? But to solve that you’d probably need improvements in the library you’re wrapping.)


@moderators, the thread starting with Marc-André’s first message could be extracted into its own topic.

You want a separate thread on how to share module state among multiple instances?

Yes. It’s a good discussion to have, I’d love to continue, but it drowns out the PEP issues like global memory allocators.
The PEP now explicitly leaves this topic for later, and it’s about subinterpreters in general rather than multiple GILs.

FYI, I started a thread to talk about sharing global locks: https://discuss.python.org/t/a-new-c-api-for-extensions-that-need-runtime-global-locks/20668/.

This came up in gh-99127.