A New (C) API For Extensions that Need Runtime-Global Locks

eric.snow · November 2, 2022, 7:47pm

This is split from the PEP 684 (per-interpreter GIL) discussion: PEP 684: A Per-Interpreter GIL - #27 by encukou. I’d normally post this to the capi-sig (which isn’t on DPO quite yet) but I’d like to continue the already-started discussion on the same messaging platform. Sorry if this is noise to any of you.

CC @encukou

This definitely makes sense. It would help with the race on creating a new shared lock and reduce the burden on extension authors: the runtime can deal with the creation race, store the lock, and clean it up during runtime finalization.

I imagine we’d want something like:

/* Return the ID for the requested lock, allocating if necessary. */
Py_ssize_t PyThread_EnsureGlobalLock(PyModuleDef *mod, const char *name);

/* Return the identified lock, verifying the matching module and lock name. */
PyThread_type_lock PyThread_GetGlobalLock(PyModuleDef *mod, const char *name,
                                          Py_ssize_t *index);

/* Indicate that the lock is no longer needed in the current interpreter. */
void PyThread_ReleaseGlobalLock(PyModuleDef *mod, const char *name,
                                Py_ssize_t *index);

eric.snow · November 2, 2022, 7:48pm

After thinking about it some more, would it make sense to add a general mechanism to store process-global state for an extension but protected/managed by the runtime (in the same way as above)? It actually feels similar to how we do thread-specific storage, so I drew some inspiration from the Py_tss_*() API (see PEP 539):

typedef struct _PyRuntime_store_t *PyRuntime_store_t;

/* Return the named storage key for the module.
   Create the key, if necessary. */
PyRuntime_store_t PyRuntime_store_ensure(PyModuleDef *mod, const char *name);

/* Deallocate the key (and storage), if no longer used
   (e.g. in another interpreter). */
void PyRuntime_store_release(PyRuntime_store_t key);

/* Return 1 if the key has a value set (even if NULL). */
int PyRuntime_store_is_set(PyRuntime_store_t key);

/* Set the given value for the key. */
int PyRuntime_store_set(PyRuntime_store_t key, void *value);

/* Return the value currently set for the key. */
void * PyRuntime_store_get(PyRuntime_store_t key);

/* Clear the stored value for the key. */
void PyRuntime_store_delete(PyRuntime_store_t key);

/* a possible data layout */
struct _PyRuntime_store_t {
    PyModuleDef *mod;
    const char *name;
    Py_ssize_t index;  /* 0 means "not set" */
    Py_ssize_t refcount;
};

A lock-specific API could be a wrapper around all that.
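To make the ensure/release semantics above concrete, here is a toy model of such a store. It is only a sketch under stated assumptions: it uses a POSIX mutex and a fixed-size table rather than anything in CPython, and every name in it (store_ensure, store_release, store_set, store_get, struct store_entry) is hypothetical, not the proposed PyRuntime_store_* API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

enum { MAX_KEYS = 16 };          /* arbitrary capacity for the sketch */

struct store_entry {
    const void *owner;           /* stands in for the PyModuleDef pointer */
    const char *name;            /* assumed to outlive the entry (e.g. a literal) */
    void *value;
    int refcount;                /* 0 means the slot is free */
};

/* Statically initialized, so creating the store's own lock cannot race. */
static pthread_mutex_t store_lock = PTHREAD_MUTEX_INITIALIZER;
static struct store_entry store[MAX_KEYS];

/* Find or create the entry for (owner, name) and take a reference.
   Returns NULL if the table is full. */
struct store_entry *store_ensure(const void *owner, const char *name)
{
    struct store_entry *free_slot = NULL, *found = NULL;
    pthread_mutex_lock(&store_lock);
    for (int i = 0; i < MAX_KEYS; i++) {
        struct store_entry *e = &store[i];
        if (e->refcount == 0) {
            if (free_slot == NULL) {
                free_slot = e;   /* remember the first free slot */
            }
        } else if (e->owner == owner && strcmp(e->name, name) == 0) {
            found = e;
            break;
        }
    }
    if (found == NULL && free_slot != NULL) {
        found = free_slot;
        found->owner = owner;
        found->name = name;
        found->value = NULL;
    }
    if (found != NULL) {
        found->refcount++;
    }
    pthread_mutex_unlock(&store_lock);
    return found;
}

/* Drop one reference; the slot becomes reusable when the count hits 0. */
void store_release(struct store_entry *e)
{
    pthread_mutex_lock(&store_lock);
    assert(e->refcount > 0);
    e->refcount--;
    pthread_mutex_unlock(&store_lock);
}

void store_set(struct store_entry *e, void *value)
{
    pthread_mutex_lock(&store_lock);
    e->value = value;
    pthread_mutex_unlock(&store_lock);
}

void *store_get(struct store_entry *e)
{
    pthread_mutex_lock(&store_lock);
    void *value = e->value;
    pthread_mutex_unlock(&store_lock);
    return value;
}
```

In this model, two interpreters that ask for the same (owner, name) pair get the same entry, and the slot is only recycled once both have released it.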

eric.snow · November 2, 2022, 7:51pm

Related: https://discuss.python.org/t/how-to-share-module-state-among-multiple-instances-of-an-extension-module/20663/

steve.dower · November 2, 2022, 7:52pm

How is this different from the extension using global variables and us calling back into it when we’re unloading?

eric.snow · November 2, 2022, 7:55pm

Motivations:

avoid races when creating locks
provide an alternative to static variables
let the runtime manage lifetime (at least for locks)

eric.snow · November 2, 2022, 7:57pm

Honestly, though, whatever we do should be motivated by the needs of extension maintainers as they try to support multiple interpreters (and a per-interpreter GIL). My thoughts above are just me spit-balling.

eric.snow · November 7, 2022, 7:39pm

A possible motivating case: https://github.com/python/cpython/issues/99127#issuecomment-1306095643.

encukou · November 8, 2022, 9:34am

Do we need to provide an alternative to static variables? If you really have process-global state, it’s the one time to use statics, isn’t it?

Also, we already have the PyThread_ locking API. It would be nice to reuse it if we can, so that only allocation and freeing of the lock would be new.
Here’s a sketch of how I’d like to use the API, omitting lots of details for clarity:

static PyThread_type_lock my_lock;
// internally, PyThread_type_lock has a new internal refcount field

mod_init() {
    PyGlobalLock_AllocateOrIncref(my_lock);
    if (lock_was_just_allocated) {
        setup_my_global_state();
    }
}

my_func() {
    PyThread_acquire_lock(my_lock);
    do_my_stuff();
    PyThread_release_lock(my_lock);
}

mod_free() {
    PyGlobalLock_DecrefOrFree(my_lock);
    if (lock_was_just_freed) {
        teardown_my_global_state();
    }
}

Of course it needs more locking (and additional calls) to ensure setup_my_global_state and teardown_my_global_state are still protected by the lock. That’s the tricky part, so I’m sending the sketch first, to see if it’s a good direction.
But this hard part is also the main reason we need a new API :)
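For comparison, here is roughly what the "more locking" looks like when an extension does it by hand today. This is a minimal sketch, assuming a POSIX platform: a statically initialized pthread mutex stands in for the proposed global lock calls, and the counters exist only to make the behavior observable. None of this is the proposed API.

```c
#include <assert.h>
#include <pthread.h>

/* Statically initialized, so there is no race on creating the lock itself. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int state_refcount = 0;   /* number of live module instances */
static int setup_calls = 0;      /* counters, only to observe the behavior */
static int teardown_calls = 0;

static void setup_my_global_state(void)    { setup_calls++; }
static void teardown_my_global_state(void) { teardown_calls++; }

/* Stand-in for mod_init(): called once per interpreter that loads the module. */
void mod_init(void)
{
    pthread_mutex_lock(&state_lock);
    if (state_refcount++ == 0) {
        /* First user: setup runs while the lock is held, so a second
           interpreter cannot observe half-initialized state. */
        setup_my_global_state();
    }
    pthread_mutex_unlock(&state_lock);
}

/* Stand-in for mod_free(): called once per interpreter that drops the module. */
void mod_free(void)
{
    pthread_mutex_lock(&state_lock);
    assert(state_refcount > 0);
    if (--state_refcount == 0) {
        /* Last user: teardown also runs under the lock. */
        teardown_my_global_state();
    }
    pthread_mutex_unlock(&state_lock);
}
```

The refcount check, the setup or teardown call, and the refcount update all happen inside one critical section; splitting them apart is what reintroduces the race.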

steve.dower · November 8, 2022, 1:17pm

I like Petr’s proposal, except I question why they need to be locks provided by CPython if they’re not protecting CPython state?

What’s wrong with using your platform’s locks for protecting platform resources? (If you want a cross-platform synchronization library, you’re not going to choose libpython for that.)

encukou · November 8, 2022, 1:43pm

Oh yes you are :‍)
Given how many modules will need this to function safely with multiple GILs, I firmly believe that Python should abstract away the platform differences here.
Python already does most of that work with PyThread_*_lock – except for the need to hold a global lock to allocate another global lock. If we plan to remove Python’s current global interpreter lock, we need a replacement.

malemburg · November 8, 2022, 2:28pm

That is a good question.

Let’s take a step back. Perhaps I misunderstand how loading an extension in multiple interpreters would work when interfacing to a C library with per-process global state. So let’s see how that would work:

interpreter A is the first to load the extension
the dyn linker loads the shared lib
A calls the module init function
the init function sets up the global state – in single-interpreter mode, this would normally happen using static C vars to hold that state, e.g. for ODBC, the SQLHENV henv.

So far, so good. Now another interpreter loads the extension:

interpreter B loads the extension
the dyn linker sees that the shared lib is already loaded, so points interpreter B to it
B calls the module init function
the init function now has to check whether henv is set, to avoid a second ODBC env init
the extension module continues initializing the module with objects, using henv where necessary

This would work without locks. ODBC is thread safe, so no additional locks are needed for managing ODBC calls.

Now, interpreter A wants to terminate.

A calls the module cleanup function
this cleanup function would need to free the ODBC SQLHENV henv, if it were the last interpreter to use it, but B is still using it

At this point, we have a problem: how can A know that henv is still needed by B?

In this scenario, locks could be used, but would not necessarily be the best solution (see below).

A better solution would be to have some sort of communication between the interpreters to check whether the ODBC henv is still needed. This could be done by having lock-protected variables shared between interpreters (the key-value storage we discussed in the previous topic), or by providing a reference counting feature, where each extension running in a different interpreter can register its ongoing use of shared resources.

Alternatively, the extension could use a static C variable for this and protect it with a lock (using a thread lock as discussed above). This doesn’t seem like a good solution, though, since every single extension would have to jump through the same hoops to make this happen.

PS: After writing the above I researched the ODBC SQLAllocHandle API and found that it is possible, in theory, to have multiple environments per process. I’ve never used or tried such a setup and given my experience with ODBC drivers, this kind of setup would likely introduce compatibility issues, so I would not recommend it.

eric.snow · November 8, 2022, 4:55pm

Same here.

As Petr noted, without a GIL between them, there may be a race on creating that shared lock. So we need a runtime-global lock that can guard creating new locks.

I suppose there may be other reasons (but I stopped at per-interpreter GIL).
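To illustrate the creation race and the "lock that guards creating new locks" idea, here is a minimal sketch using POSIX threads rather than any CPython API. The names creation_guard and ensure_shared_lock are hypothetical; in the proposal, the runtime would own the guard so that extensions would not need one of their own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* The one lock that exists before any code runs; it only guards creation. */
static pthread_mutex_t creation_guard = PTHREAD_MUTEX_INITIALIZER;

/* Return *slot, creating the lock on first use. Safe to call from several
   threads at once: only one of them allocates. Returns NULL on failure. */
pthread_mutex_t *ensure_shared_lock(pthread_mutex_t **slot)
{
    assert(slot != NULL);
    pthread_mutex_lock(&creation_guard);
    if (*slot == NULL) {
        pthread_mutex_t *lock = malloc(sizeof(*lock));
        if (lock != NULL && pthread_mutex_init(lock, NULL) != 0) {
            free(lock);          /* initialization failed: do not publish it */
            lock = NULL;
        }
        *slot = lock;
    }
    pthread_mutex_t *result = *slot;
    pthread_mutex_unlock(&creation_guard);
    return result;
}
```

Without the guard, two interpreters could both see an empty slot and each create their own lock, at which point the "shared" lock protects nothing.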

erlendaasland · November 9, 2022, 12:23pm

How about a pair of new module slots combined with some refcounting mechanism in, for example, the import machinery? Py_mod_global_init and Py_mod_global_exit?

encukou · November 10, 2022, 10:47am

That sounds great!
Scratch my idea :‍)

malemburg · November 10, 2022, 11:13am

Could you elaborate on that a bit more? I’m not sure I follow.

Note that extensions may need to manage multiple external resources, not just one as in the case of the ODBC example, so having just a single ref count per loaded extension copy would not be enough.

E.g. let’s say an extension loads a C library and enables a number of extensions in that library. Interpreter A may have just used extension 1, while interpreter B uses extension 2. A would then want to free (just) the resources for extension 1 when terminating, while keeping the main C library globals and extension 2 untouched.

Having a global key value storage for extension module copies to manage their resource state would help a lot and avoid much of the locking logic which would otherwise be needed in the extension.

erlendaasland · November 10, 2022, 12:04pm

It is similar to Petr’s idea (so we should not scratch that), but it simplifies the API for the extension modules by providing convenient PEP 489 module slots.

Pseudo-code without error checking, partly borrowed from PEP 489:

def PyModule_ExecDef(module, def):
    # ...

    exec = None
    g_init = None
    for slot, value in def.m_slots:
        if slot == Py_mod_exec:
            exec = value
        if slot == Py_mod_global_init:
            g_init = value  # In Petr's example, this would be setup_my_global_state()

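    if g_init:
        # Check and update the refcount under the lock, so that two
        # interpreters cannot both see 0 and both run g_init.
        acquire_global_lock()
        if global_state_refcnt(def) == 0:
            g_init(module)
        global_state_incref(def)  # every load counts, not only the first
        release_global_lock()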

    if exec:
        exec(module)

# Called when module is unloaded (or dealloc'd), for example at interpreter shutdown
def unload_module(module, def):
    # ...

    for slot, value in def.m_slots:
        if slot == Py_mod_global_exit:
            g_exit = value  # In Petr's example, this would be teardown_my_global_state()
            # Check and decrement under the lock, mirroring the load path.
            acquire_global_lock()
            if global_state_refcnt(def) == 1:
                g_exit(module)
            global_state_decref(def)
            release_global_lock()

The nice thing with such an API is that the extension modules can get rid of possibly-hard-to-get-right boilerplate locking code; they can simply provide functions for setup and tear-down of global state. The runtime will make sure to call these functions when needed; that responsibility is not on the extension module author.

Unless I’m misreading you, I believe you should be able to solve that using a global lock API and the ordinary Py_mod_exec and m_free/m_clear coupled with the proposed Py_mod_global_ slots.

encukou · November 10, 2022, 12:39pm

Extension modules already have storage for global state – plain static variables. We just need locking.

You’d need your own refcounting for those extensions (an array of refcounts, or a C-level map if the list of possible extensions isn’t known at build time).
In Py_mod_global_init you’d initialize that structure and allocate a lock to protect it.
When loading an extension, you’d add it to that structure (or incref), with the lock held.
When unloading an extension (in m_clear at the latest), remove/decref with the lock held.
In Py_mod_global_exit, the structure must be empty. Destroy it and the lock.
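The steps above could look roughly like this for the simple case where the library extensions are known at build time. This is a sketch under assumptions: a static POSIX mutex stands in for the lock that Py_mod_global_init would allocate, and FEATURE_COUNT, feature_acquire, and feature_release are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

enum { FEATURE_COUNT = 2 };      /* hypothetical: fixed at build time */

static pthread_mutex_t feature_lock = PTHREAD_MUTEX_INITIALIZER;
static int feature_refs[FEATURE_COUNT];    /* zero-initialized refcounts */
static int feature_active[FEATURE_COUNT];  /* 1 while resources are held */

/* Called when an interpreter starts using a library extension.
   Returns the new refcount. */
int feature_acquire(int id)
{
    assert(id >= 0 && id < FEATURE_COUNT);
    pthread_mutex_lock(&feature_lock);
    if (feature_refs[id]++ == 0) {
        feature_active[id] = 1;  /* stand-in for enabling the extension */
    }
    int refs = feature_refs[id];
    pthread_mutex_unlock(&feature_lock);
    return refs;
}

/* Called from m_clear at the latest. Returns the remaining refcount. */
int feature_release(int id)
{
    assert(id >= 0 && id < FEATURE_COUNT);
    pthread_mutex_lock(&feature_lock);
    assert(feature_refs[id] > 0);
    if (--feature_refs[id] == 0) {
        feature_active[id] = 0;  /* stand-in for freeing its resources */
    }
    int refs = feature_refs[id];
    pthread_mutex_unlock(&feature_lock);
    return refs;
}
```

With this layout, interpreter A releasing extension 1 frees only that extension's resources, while extension 2 stays active for interpreter B.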

I don’t think Python should provide the C-level map (key-value storage).

malemburg · November 10, 2022, 1:07pm

Ok, fair enough. Python takes care of making sure that global init and teardown are protected with the global lock and the modules have to manage their resources in some custom way.

Is it possible to use Python objects for such management? They’d have to be allocated in the main interpreter, but shared across all interpreters via static C vars in the extensions.

encukou · November 10, 2022, 2:10pm

No.
(Technically some objects can be used across interpreters, but that’s an implementation detail – it would tie you to an exact build of CPython, and you’d need to verify the assumptions with each update, which would be pretty tricky to verify for anything non-trivial. And depending on how per-interpreter allocators end up being implemented, you might not be able to allocate anything in a non-main interpreter, not even a string or a bigger int. I don’t think Python objects would help much given that constraint.)

malemburg · November 10, 2022, 4:13pm

Hmm, so each extension will have to tackle the same problem on its own.

Please consider that each extension that deals more than just a bit with data will have to:

count the number of times the extension is loaded (to figure out when to finalize)
refcount all shared resources (to figure out when to finalize those)
find its own way to communicate with instances running in other interpreters (to share allocated internal data structures for more efficient use, e.g. loaded models for ML ^[1])
figure out a way to do thread locking in a portable way to protect shared resources (since Python’s thread locking API likely won’t help with this, if I understand correctly – unless you want to halt all loaded interpreters using runtime-global locks)
(possibly more, which I’m not seeing now)

For existing external thread-safe C libraries the above will mostly have been figured out in some way or another (e.g. ODBC comes with a complete handle infrastructure for these things), but think about extensions which currently rely on the GIL to protect them against threading issues and implement their logic mostly by themselves.

Those will now each need a completely new stack of APIs to handle the extra locking, sharing and refcounting.

IMO, it would be better, and attract more adoption, to have support for these things right in the Python C API. This would avoid many of the subtle bugs that are easy to introduce in such APIs, and would be more inviting for extension writers considering support for multiple interpreters.

[1] One of the use cases for having multiple interpreters in one process was being able to share already loaded ML models. I don’t remember which company this was; it could have been Facebook.