A New (C) API For Extensions that Need Runtime-Global Locks

Related: https://discuss.python.org/t/how-to-share-module-state-among-multiple-instances-of-an-extension-module/20663/

How is this different from the extension using global variables and us calling back into it when we’re unloading?

Motivations:

  • avoid races when creating locks
  • provide an alternative to static variables
  • let the runtime manage lifetime (at least for locks)

Honestly, though, whatever we do should be motivated by the needs of extension maintainers as they try to support multiple interpreters (and a per-interpreter GIL). My thoughts above are just me spit-balling. :smile:

3 Likes

A possible motivating case: https://github.com/python/cpython/issues/99127#issuecomment-1306095643.

Do we need to provide an alternative to static variables? If you really have process-global state, it’s the one time to use statics, isn’t it?

Also, we already have the PyThread_* locking API. It would be nice to reuse it if we can, so that only allocation and freeing of the lock would be new.
Here’s a sketch of how I’d like to use the API, omitting lots of details for clarity:

static PyThread_type_lock my_lock;
// internally, PyThread_type_lock has a new internal refcount field

mod_init() {
    PyGlobalLock_AllocateOrIncref(&my_lock);
    if (lock_was_just_allocated) {
        setup_my_global_state();
    }
}

my_func() {
    PyThread_acquire_lock(my_lock, WAIT_LOCK);
    do_my_stuff();
    PyThread_release_lock(my_lock);
}

mod_free() {
    PyGlobalLock_DecrefOrFree(my_lock);
    if (lock_was_just_freed) {
        teardown_my_global_state();
    }
}

Of course it needs more locking (and additional calls) to ensure setup_my_global_state and teardown_my_global_state are themselves protected by the lock. That’s the tricky part, so I’m sending the sketch first, to see if it’s a good direction.
But this hard part is also the main reason we need a new API :)
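
To make that hard part concrete, here is a minimal sketch of what a hypothetical PyGlobalLock_AllocateOrIncref could do internally. Everything in it is an assumption rather than existing CPython API: the entry struct (used here instead of a new refcount field inside PyThread_type_lock), the runtime-internal allocation lock, and the setup callback – one possible shape for the “additional calls” that keep setup_my_global_state protected.

#include "Python.h"
#include "pythread.h"

/* Hypothetical: a single runtime-internal lock, created during runtime
 * initialization, used only to serialize allocation/freeing of
 * extension-global locks and the one-time setup/teardown they guard. */
static PyThread_type_lock _global_lock_alloc_lock;

typedef struct {
    PyThread_type_lock lock;   /* the extension's shared lock */
    Py_ssize_t refcount;       /* how many loaded copies use it */
} _PyGlobalLock_entry;

/* Hypothetical API; returns 0 on success, -1 on memory error. */
static int
PyGlobalLock_AllocateOrIncref(_PyGlobalLock_entry *entry, void (*setup)(void))
{
    PyThread_acquire_lock(_global_lock_alloc_lock, WAIT_LOCK);
    if (entry->lock == NULL) {
        entry->lock = PyThread_allocate_lock();
        if (entry->lock == NULL) {
            PyThread_release_lock(_global_lock_alloc_lock);
            return -1;
        }
        if (setup != NULL) {
            /* The one-time global setup runs while the allocation lock is
             * held, so no other interpreter can race on creating the lock
             * or observe half-initialized global state. */
            setup();
        }
    }
    entry->refcount++;
    PyThread_release_lock(_global_lock_alloc_lock);
    return 0;
}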

2 Likes

I like Petr’s proposal, except I question why they need to be locks provided by CPython if they’re not protecting CPython state?

What’s wrong with using your platform’s locks for protecting platform resources? (If you want a cross-platform synchronization library, you’re not going to choose libpython for that :wink: )

1 Like

Oh yes you are :‍)
Given how many modules will need this to function safely with multiple GILs, I firmly believe that Python should abstract away the platform differences here.
Python already does most of that work with PyThread_*_lock – except for the need to hold a global lock to allocate another global lock. If we plan to remove Python’s current global interpreter lock, we need a replacement.
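
To make the race concrete: with a per-interpreter GIL, nothing stops two interpreters from running an extension’s exec slot at the same time. A minimal sketch of what can go wrong (the function and variable names are made up):

#include "Python.h"
#include "pythread.h"

static PyThread_type_lock my_lock = NULL;

static int
my_exec(PyObject *module)
{
    /* With a per-interpreter GIL, two interpreters can execute this slot
     * concurrently.  Both may observe my_lock == NULL ... */
    if (my_lock == NULL) {
        /* ... and both allocate a lock: one of them is leaked, and the two
         * interpreters end up "protecting" the shared state with two
         * different locks, which protects nothing. */
        my_lock = PyThread_allocate_lock();
        if (my_lock == NULL) {
            PyErr_NoMemory();
            return -1;
        }
    }
    return 0;
}

Today the GIL happens to serialize module execution, so this pattern is safe; without a shared GIL, some runtime-global lock has to exist before the extension can safely create its own.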

3 Likes

That is a good question.

Let’s take a step back. Perhaps my understanding of how loading an extension in multiple interpreters would work when interfacing with a C library that has per-process global state is wrong. So let’s see how that would work:

  • interpreter A is the first to load the extension
  • the dynamic linker loads the shared lib
  • A calls the module init function
  • the init function sets up the global state – in the single interpreter mode, this would normally happen using static C vars to hold that state, e.g. for ODBC, the SQLHENV henv.

So far, so good. Now another interpreter loads the extension:

  • interpreter B loads the extension
  • the dynamic linker sees that the shared lib is already loaded, so it points interpreter B to it
  • B calls the module init function
  • the init function now has to check whether henv is set, to avoid a second ODBC env init
  • the extension module continues initializing the module with objects, using henv where necessary

This would work without locks. ODBC is thread safe, so no additional locks are needed for managing ODBC calls.

Now, interpreter A wants to terminate.

  • A calls the module cleanup function
  • this cleanup function would need to free the ODBC SQLHENV henv if A were the last interpreter to use it – but B is still using it

At this point, we have a problem: how can A know that henv is still needed by B?

In this scenario, locks could be used, but would not necessarily be the best solution (see below).

A better solution would be to have some sort of communication between the interpreters to check whether the ODBC henv is still needed. This could be done by having lock-protected variables shared between interpreters (the key-value storage we discussed in the previous topic), or by providing a reference counting feature, where each extension running in a different interpreter can register its ongoing use of shared resources.

Alternatively, the extension could use a static C variable for this and protect it with a lock (using a thread lock as discussed above). This doesn’t seem like a good solution, though, since every single extension would have to jump through the same hoops to make this happen.
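
For concreteness, here is a sketch of that static-variable-plus-lock approach for the henv case. The helper names are made up, and henv_lock itself is assumed to already exist and to have been allocated safely – which is exactly the bootstrapping problem this thread is about.

#include "Python.h"
#include "pythread.h"
#include <sql.h>   /* SQLHENV, SQLAllocHandle, SQLFreeHandle */

/* All three statics are shared by every interpreter that loads the
 * extension. */
static PyThread_type_lock henv_lock;    /* assumed to be allocated safely elsewhere */
static SQLHENV henv = SQL_NULL_HENV;
static Py_ssize_t henv_users = 0;       /* number of interpreters using henv */

/* Called from each interpreter's module init. */
static int
acquire_henv(void)
{
    int rc = 0;
    PyThread_acquire_lock(henv_lock, WAIT_LOCK);
    if (henv_users == 0) {
        /* First user in the process: create the shared ODBC environment
         * (setting SQL_ATTR_ODBC_VERSION is omitted for brevity). */
        if (!SQL_SUCCEEDED(SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv))) {
            rc = -1;
            goto done;
        }
    }
    henv_users++;
done:
    PyThread_release_lock(henv_lock);
    return rc;
}

/* Called from each interpreter's module cleanup. */
static void
release_henv(void)
{
    PyThread_acquire_lock(henv_lock, WAIT_LOCK);
    if (--henv_users == 0) {
        /* Last user in the process: free the shared environment. */
        SQLFreeHandle(SQL_HANDLE_ENV, henv);
        henv = SQL_NULL_HENV;
    }
    PyThread_release_lock(henv_lock);
}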

PS: After writing the above, I researched the ODBC SQLAllocHandle API and found that it is possible, in theory, to have multiple environments per process. I’ve never used or tried such a setup, and given my experience with ODBC drivers, this kind of setup would likely introduce compatibility issues, so I would not recommend it.

2 Likes

Same here.

As Petr noted, without a GIL between them, there may be a race on creating that shared lock. So we need a runtime-global lock that can guard creating new locks.

I suppose there may be other reasons (but I stopped at per-interpreter GIL :smile:).

How about a pair of new module slots, Py_mod_global_init and Py_mod_global_exit, combined with some refcounting mechanism in, for example, the import machinery?

1 Like

That sounds great!
Scratch my idea :‍)

Could you elaborate on that a bit more? I’m not sure I follow.

Note that extensions may need to manage multiple external resources, not just one as in the case of the ODBC example, so having just a single ref count per loaded extension copy would not be enough.

E.g. let’s say an extension loads a C library and enables a number of extensions in that library. Interpreter A may have just used extension 1, while interpreter B uses extension 2. A would then want to free (just) the resources for extension 1 when terminating, while keeping the main C library globals and extension 2 untouched.

Having a global key value storage for extension module copies to manage their resource state would help a lot and avoid much of the locking logic which would otherwise be needed in the extension.

It is similar to Petr’s idea (so we should not scratch that), but it simplifies the API for the extension modules by providing convenient PEP 489 module slots.

Pseudo-code without error checking, partly borrowed from PEP 489:

def PyModule_ExecDef(module, def):
    # ...

    exec = None
    g_init = None
    for slot, value in def.m_slots:
        if slot == Py_mod_exec:
            exec = value
        if slot == Py_mod_global_init:
            g_init = value  # In Petr's example, this would be setup_my_global_state()

    if g_init:
        acquire_global_lock()
        if global_state_refcnt(def) == 0:
            g_init(module)
        global_state_incref(def)
        release_global_lock()

    if exec:
        exec(module)

# Called when module is unloaded (or dealloc'd), for example at interpreter shutdown
def unload_module(module, def):
    # ...

    for slot, value in def.m_slots:
        if slot == Py_mod_global_exit:
            g_exit = value  # In Petr's example, this would be teardown_my_global_state()
            acquire_global_lock()
            global_state_decref(def)
            if global_state_refcnt(def) == 0:
                g_exit(module)
            release_global_lock()

The nice thing about such an API is that extension modules can get rid of possibly-hard-to-get-right boilerplate locking code; they simply provide functions for setting up and tearing down global state. The runtime makes sure to call these functions when needed; that responsibility is not on the extension module author.

Unless I’m misreading you, I believe you should be able to solve that using a global lock API and the ordinary Py_mod_exec and m_free/m_clear, coupled with the proposed Py_mod_global_* slots.

1 Like

Extension modules already have storage for global state – plain static variables. We just need locking.

You’d need your own refcounting for those extensions (an array of refcounts, or a C-level map if the list of possible extensions isn’t known at build time).
In Py_mod_global_init you’d initialize that structure and allocate a lock to protect it.
When loading an extension, you’d add it to that structure (or incref), with the lock held.
When unloading an extension (in m_clear at the latest), remove/decref with the lock held.
In Py_mod_global_exit, the structure must be empty. Destroy it and the lock.
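
Roughly what those steps could look like in C – assuming the proposed Py_mod_global_init / Py_mod_global_exit hooks existed (their exact signatures would still need to be designed) and that the set of sub-extensions of the wrapped library is known at build time. All names here are made up:

#include "Python.h"
#include "pythread.h"
#include <string.h>

#define NUM_FEATURES 8                    /* sub-extensions known at build time */

static PyThread_type_lock features_lock;  /* protects the refcount array */
static Py_ssize_t feature_refcounts[NUM_FEATURES];

/* Proposed Py_mod_global_init hook: runs once, before the first module
 * object is created. */
static int
my_global_init(PyObject *module)
{
    features_lock = PyThread_allocate_lock();
    if (features_lock == NULL) {
        return -1;
    }
    memset(feature_refcounts, 0, sizeof(feature_refcounts));
    return 0;
}

/* Called from Py_mod_exec when this interpreter starts using feature i. */
static void
use_feature(int i)
{
    PyThread_acquire_lock(features_lock, WAIT_LOCK);
    if (feature_refcounts[i]++ == 0) {
        /* first user anywhere in the process: initialize the C-library
         * feature here */
    }
    PyThread_release_lock(features_lock);
}

/* Called from m_clear/m_free when this interpreter stops using feature i. */
static void
drop_feature(int i)
{
    PyThread_acquire_lock(features_lock, WAIT_LOCK);
    if (--feature_refcounts[i] == 0) {
        /* last user anywhere in the process: tear the feature down here */
    }
    PyThread_release_lock(features_lock);
}

/* Proposed Py_mod_global_exit hook: runs once, after the last module
 * object is freed; by then every feature refcount is back to zero. */
static void
my_global_exit(PyObject *module)
{
    PyThread_free_lock(features_lock);
    features_lock = NULL;
}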

I don’t think Python should provide the C-level map (key-value storage).

2 Likes

Ok, fair enough. Python takes care of making sure that global init and teardown are protected with the global lock, and the modules have to manage their resources in some custom way.

Is it possible to use Python objects for such management? They’d have to be allocated in the main interpreter, but shared across all interpreters via static C vars in the extensions.

No.
(Technically some objects can be used across interpreters, but that’s an implementation detail – it would tie you to an exact build of CPython, and you’d need to re-verify the assumptions with each update, which would be pretty tricky for anything non-trivial. And depending on how per-interpreter allocators end up being implemented, you might not be able to allocate anything in a non-main interpreter, not even a string or a bigger int. I don’t think Python objects would help much given that constraint.)

1 Like

Hmm, so each extension will have to tackle the same problem on its own.

Please consider that each extension that deals with more than just a bit of data will have to:

  • count the number of times the extension is loaded (to figure out when to finalize)
  • refcount all shared resources (to figure out when to finalize those)
  • find its own way to communicate with instances running in other interpreters (to share allocated internal data structures for more efficient use, e.g. loaded models for ML [1])
  • figure out a way to do thread locking in a portable way to protect shared resources (since Python’s thread locking API likely won’t help with this, if I understand correctly – unless you want to halt all loaded interpreters using runtime-global locks)
  • (possibly more, which I’m not seeing now)

For existing external thread-safe C libraries, the above will mostly have been figured out in one way or another (e.g. ODBC comes with a complete handle infrastructure for these things), but think about extensions which currently rely on the GIL to protect them against threading issues and implement their logic mostly by themselves.

Those will now each need a whole new stack of APIs to handle the extra locking, sharing and refcounting.

IMO, it would be better, and would attract a wider following, to have support for these things right in the Python C API. That would avoid many of the subtle bugs you can introduce in such APIs, and it would also be more inviting for extension writers considering adding support for multiple interpreters.


  1. One of the use cases for having multiple interpreters in one process was being able to share already-loaded ML models. I don’t remember which company this was; it could have been Facebook. ↩︎

1 Like

If you’re hinting at a world where multiple interpreters are loaded into a single process using different allocators, I’m pretty sure the whole idea is doomed to fail :frowning:

This whole multi-interpreter thing is already complex enough.

  • count the number of times the extension is loaded (to figure out when to finalize)

As I understand Erlend’s proposal, Py_mod_global_init will only be called for the first module object to be initialized, and Py_mod_global_exit will be called after the last one is freed.
(There are also Py_mod_exec and m_free, which are called for all modules; those would run with the global state already set up.)

  • refcount all shared resources (to figure out when to finalize those)

Yes.

  • find its own way to communicate with instances running in other interpreters (to share allocated internal data structures for more efficient use, e.g. loaded models for ML [1])

Yes, but it can use static variables protected by a lock (see below).

  • figure out a way to do thread locking in a portable way to protect shared resources (since Python’s thread locking API likely won’t help with this, if I understand correctly – unless you want to halt all loaded interpreters using runtime-global locks)

Python’s thread locking should definitely help here.
IMO, best practice would be to allocate a module-specific static lock in Py_mod_global_init, and free it in Py_mod_global_exit. You can even use several locks for more granularity.

Why?
If we don’t do that, we might need a global allocator lock – which sounds worse than the GIL, performance-wise.
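
To sketch what that best practice could look like – again, Py_mod_global_init and Py_mod_global_exit are the proposed slots, not existing API, their signatures are guesses, and the two locks are made-up examples of using several locks for more granularity:

#include "Python.h"
#include "pythread.h"

/* Two unrelated pieces of process-global state get their own locks, so
 * contention on one does not block users of the other. */
static PyThread_type_lock cache_lock;
static PyThread_type_lock registry_lock;

static int
my_exec(PyObject *module)
{
    /* per-interpreter setup, exactly as today */
    return 0;
}

/* Proposed slot: called only for the first loaded copy of the module. */
static int
my_global_init(PyObject *module)
{
    cache_lock = PyThread_allocate_lock();
    if (cache_lock == NULL) {
        return -1;
    }
    registry_lock = PyThread_allocate_lock();
    if (registry_lock == NULL) {
        PyThread_free_lock(cache_lock);
        cache_lock = NULL;
        return -1;
    }
    return 0;
}

/* Proposed slot: called only after the last copy of the module is freed. */
static void
my_global_exit(PyObject *module)
{
    PyThread_free_lock(cache_lock);
    PyThread_free_lock(registry_lock);
}

static PyModuleDef_Slot my_slots[] = {
    {Py_mod_exec, my_exec},
    {Py_mod_global_init, my_global_init},  /* proposed, does not exist today */
    {Py_mod_global_exit, my_global_exit},  /* proposed, does not exist today */
    {0, NULL},
};

The only process-wide lock the runtime itself would need is the one that makes the Py_mod_global_init/exit calls race-free; everything else stays per-extension.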

2 Likes