Changing the PyCapsule API to better support versions

Hi everyone!

The current PyCapsule API is a bit lacking, especially in terms of versioning an ABI. I’ve worked with @encukou on developing a draft PEP on this topic. It mainly addresses these three problems:

  • Capsules, in general, have no proper way to version themselves.
  • PyCapsule_Import has a bug when it comes to submodules – but technically, fixing it would be a breaking change.
  • Nowadays, many modules are not immortal, so if a capsule needs to reference its own module, it needs a destructor. This isn’t necessarily a problem, just an inconvenience – this PEP makes it automatic.

Here’s what we came up with (apologies for any formatting errors between the conversion of reST to Markdown):

Abstract

CPython capsules are typically only known to those that are familiar with the C API – in short, they smuggle a void * between Python code and C code. This makes them the preferred way for a library to expose an ABI, since the Python packaging ecosystem does not allow you to link extension modules against each other (see why this is the case here).

However, the current capsule API doesn’t have any standard way to manage ABI versions. This PEP takes some inspiration from SemVer, and solves that problem by modifying the internal PyCapsule structure to hold an ABI major version, as well as a reference to its module, and the size of the ABI structure.

Motivation

Generally, a capsule contains a struct or array containing function pointers, which are then unpacked by a header file (which is installed by pip or something similar – this is conventionally stored in $DATAROOT/include) that defines ABI functions. For example, an ABI could be exported by defining a structure that looked like the following:

typedef struct {
    PyObject *(*foo)(int); // PyObject *foo(int);
} FooABI;

Then, a capsule would hold a pointer to this structure, and the header file could define foo as such:

static PyObject *
foo(int whatever) {
    FooABI *foo_abi = PyCapsule_Import("foo.foo_abi", 0);
    // Throughout this PEP, caching is omitted for simplicity
    if (foo_abi == NULL) {
        return NULL;
    }
    return foo_abi->foo(whatever);
}

However, the header file and the capsule at runtime may be written for different versions, causing some ambiguous segfaults if not handled properly. Capsules have no “proper” way to specify a version, so it’s up to the developer to decide how to do it. For example, that could be a version field containing an integer (but again, this is totally unspecified). In the FooABI example from above, this could look like:

#define MAJOR_VERSION 2

// In the real world, this would be in a header file - ignore that for simplicity purposes
typedef struct {
    int32_t major_version;
    PyObject *(*foo)(int); // PyObject *foo(int);
} FooABI;

static PyObject *
foo(int whatever) {
    FooABI *foo_abi = PyCapsule_Import("foo.foo_abi", 0);
    if (foo_abi == NULL) {
        return NULL;
    }

    if (foo_abi->major_version != MAJOR_VERSION)
    {
        PyErr_SetString(PyExc_RuntimeError, "major version does not match");
        return NULL;
    }

    return foo_abi->foo(whatever);
}

This is quite a bit of boilerplate just for adding a very basic versioning system. Note that in any approach, this does not cover using the ABI from an FFI, such as ctypes, which has to know how to handle the version. Many ABIs choose to be completely frozen because of this lack of support, which isn’t exactly an ideal standard. It would be much easier – and much less error-prone – if a capsule did all of this automatically.
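The same hand-rolled check can be tried from Python today with the existing capsule API. The following is only a minimal sketch: the FooABI layout, the value field, and the make_capsule/load_abi helpers are hypothetical stand-ins for illustration, not part of any real library or of this PEP.

```python
import ctypes

# Hypothetical ABI layout with a hand-rolled version field (illustration only).
class FooABI(ctypes.Structure):
    _fields_ = [
        ("major_version", ctypes.c_int32),
        ("value", ctypes.c_int),
    ]

EXPECTED_MAJOR = 2
CAPSULE_NAME = b"foo.foo_abi"  # kept alive globally: the capsule stores only the pointer

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]


def make_capsule(abi):
    # The capsule does not copy the structure, so the caller must keep
    # `abi` alive for as long as the capsule is in use.
    return api.PyCapsule_New(ctypes.addressof(abi), CAPSULE_NAME, None)


def load_abi(capsule):
    # Raises ValueError if the capsule name does not match.
    pointer = api.PyCapsule_GetPointer(capsule, CAPSULE_NAME)
    abi = FooABI.from_address(pointer)
    # Nothing in the capsule API enforces this check; every consumer
    # has to remember to do it by hand.
    if abi.major_version != EXPECTED_MAJOR:
        raise RuntimeError("major version does not match")
    return abi
```

A consumer that forgets the major_version comparison still gets a usable-looking pointer, which is exactly the failure mode described above.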

Incidentally, this PEP can also solve two problems regarding capsules:

  • More and more modules are not immortal singletons – assuming data is stored on the module and not on the capsule, capsules need to hold a reference to their module if they use it in any way. So, nearly all modern capsules need a PyCapsule_Destructor just for that. Ideally, this could be done automatically.
  • If the C extension is a submodule of the package, and importing the package doesn’t also import the submodule, then PyCapsule_Import will fail.

For example, if the ABI function foo() from earlier wanted to access foo.bar in the package, it would need to add an additional PyCapsule_Destructor to its capsule, just for Py_DECREF-ing the module.

static void
decref_module(PyObject *capsule)
{
    FooABI *abi = PyCapsule_GetPointer(/* ... */);
    Py_DECREF(abi->module);
}

Regarding version problems, some changes to the capsule API are needed for backwards compatibility reasons. By adding a version API into capsules directly, frozen ABIs could safely “unlock” themselves – otherwise, they’re going to remain frozen, since they cannot add a version field. This could have happened simply because the developer forgot to, such as with the FooABI structure from earlier – it would be impossible for FooABI users to check if a function is supported on their version safely.

A dedicated function for importing ABI versions will also fix the problems with FFIs. For example, using an ABI from ctypes would be much safer and easier – the developer would not have to remember to check the version, nor reverse engineer the ABI’s version management to do that.

Rationale

This PEP addresses these problems by adding fields for each of the following to the PyCapsule structure directly:

  • Py_ssize_t size, representing the size of the ABI structure (e.g., in the example above, this would be sizeof(FooABI)).
  • int32_t major_version, containing the major version number of the ABI (this is unspecified in FooABI from above). Note that this is in relation to SemVer, but more on that in a moment.
  • PyObject *module, holding a strong reference to the module where the ABI was created.

This PEP has some ideas inspired by SemVer, namely:

  • A major version bump is a breaking change
  • Minor versions are for additions (note that in this specification, size is used in place of a minor version number).

Note that these are the only rules that ABI developers need to follow from SemVer – they don’t need to follow the entire specification. Micro/patch versions from SemVer are disregarded in this context, as they don’t affect ABI compatibility (e.g., FooABI V1.0.0 is more or less equivalent to FooABI V1.0.1 here).

Sizes are used instead of minor versions as a clever way to make things simpler – since ABI developers are expected to follow SemVer (or really, this PEP’s version of SemVer), a size does all of the following:

  • It prevents any odd errors related to minor version checking – users don’t have to check against arbitrary numbers for version checking (e.g., if (foo_abi->minor_version >= 14) /* Do something with foo */). Instead, the user can do if (PyCapsule_GetSize(capsule) > offsetof(FooABI, foo)), which is much less error-prone (and easier, considering the user doesn’t have to check whatever documentation to figure out when foo was added).
  • Technically speaking, it automatically bumps the “minor version” every time something is added. Since any change will automatically update the size, the ABI developer doesn’t have to remember to change it or document when something was added (although, they probably should, they just don’t have to).
  • Makes it clearer to an ABI developer that following SemVer for minor versions is a must, since changing the structure could risk memory errors if the header definition doesn’t match the runtime version. For example, if foo() were to be removed from FooABI from earlier, the existing code that thinks it has a foo field would access out-of-bounds memory at runtime. By adding an explicit sizeof(), it may be more intuitive to an ABI developer to not mess with it between versions. (Although, this is C we’re talking about – it’s difficult to foolproof things).
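To make the "size as a minor version" idea concrete, here is a small sketch using ctypes structures. The two layouts and the has_field helper are hypothetical, and the helper checks that the whole field fits inside the reported size, which is slightly stricter than comparing against the offset alone.

```python
import ctypes

# Hypothetical layouts: "1.1" appends a field to "1.0" without reordering anything.
class FooABI_1_0(ctypes.Structure):
    _fields_ = [("foo", ctypes.c_void_p)]  # function pointer slot


class FooABI_1_1(ctypes.Structure):
    _fields_ = [
        ("foo", ctypes.c_void_p),
        ("bar", ctypes.c_void_p),  # added in a later, compatible release
    ]


def has_field(runtime_size, layout, name):
    """Return True if `name` lies entirely within `runtime_size` bytes."""
    field = getattr(layout, name)  # ctypes field descriptor
    return runtime_size >= field.offset + field.size


# A consumer compiled against 1.1 meets a provider that only ships 1.0.
provider_size = ctypes.sizeof(FooABI_1_0)
foo_available = has_field(provider_size, FooABI_1_1, "foo")
bar_available = has_field(provider_size, FooABI_1_1, "bar")
```

No minor version number appears anywhere: appending bar changed the size, and the size alone tells the consumer that bar is missing from the older provider.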

Technically, instead of a major version number being kept as an integer, it could probably be sufficient to just keep an ABI version in a capsule name (e.g., foo.foo_abi_v1 would always return the ABI at version one). In fact, this is the recommended option for backporting this PEP. However, a “stringly typed” API is generally less convenient (in the sense that, if you have a structure to your data, you should keep it structured), so this PEP adds a new module slot instead of something like overriding the module-level __getattr__.

With that approach, an entirely new ABI would be needed to make breaking changes – this PEP allows you to update the ABI itself depending on the major version number. For example:

static PyObject *
capsule_getter(PyObject *self, const char *qualname, int32_t major_version) {
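    // strcmp() returns nonzero when the names differ
    if (strcmp(qualname, "foo.foo_abi") != 0) {
        PyErr_SetString(PyExc_RuntimeError, "unknown abi");
        return NULL;
    }

    FooABI *foo_abi = malloc(sizeof(FooABI));
    if (foo_abi == NULL) {
        return PyErr_NoMemory();
    }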
    /* Initialize ABI */

    if (major_version < 2) foo_abi->foo = old_foo;
    return PyCapsule_NewVersioned(foo_abi, /* Other arguments */);
}

This wouldn’t be possible with the string approach – you would have to make a brand new ABI just to change a field.

Specification

Additions to the API

Changes to PyCapsule

This PEP proposes the following new fields to the internal PyCapsule structure:

typedef struct {
    /* ... */
    PyObject *module; // NULL by default
    Py_ssize_t size; // 0 by default
    int32_t major_version; // 0 by default
} PyCapsule;

Addition to ctypes

This PEP adds a PyABI class to ctypes:

class PyABI(ctypes.Structure):
    _capsule_: types.CapsuleType
    _capsule_size_: ctypes.c_ssize_t

    @classmethod
    def from_capsule(
        cls,
        capsule_or_module: str | types.ModuleType,
        capsule_name: str | None = None,
        major_version: int = 0,
        min_size: int = 0,
    ) -> typing.Self: ...

    def __init_subclass__(
        cls,
        size_field: str | None = None,
        default_size: int | ctypes.c_ssize_t = 0
    ) -> None: ...

However, it’s difficult to understand PyABI without understanding the C API additions to this PEP, so more on that later.

Additions to the C API

This PEP introduces the following additions to the C API:

PyObject *PyCapsule_NewVersioned(void *pointer, const char *capsule_name, PyCapsule_Destructor destructor, PyObject *module, int32_t major_version, Py_ssize_t size);
Py_ssize_t PyCapsule_GetSize(PyObject *capsule);
int PyCapsule_GetModule(PyObject *capsule, PyObject **module);
int32_t PyCapsule_GetMajorVersion(PyObject *capsule);
PyObject *PyCapsule_GetFromModule(PyObject *module, const char *capsule_name, int32_t version, Py_ssize_t min_size);
PyObject *PyCapsule_ImportVersioned(const char *capsule_name, int32_t version, Py_ssize_t min_size);
int PyCapsule_IsValidWithVersion(PyObject *op, const char *name, PyObject *module, int32_t major_version, Py_ssize_t min_size);

// New module slot
#define Py_mod_capsule 5

This PEP also deprecates PyCapsule_Import (as a soft deprecation) – more on that later. Regardless, there’s a lot to take in here – let’s go through each section.

New ABIs

On 3.14+, capsules intended to expose an ABI should use PyCapsule_NewVersioned instead of PyCapsule_New. NewVersioned will set the new module, size, and major_version, and New will initialize each of them to empty values (NULL and 0). For example, defining an ABI would look like:

FooABI abi = {
    /* ABI Fields */
};

PyMODINIT_FUNC PyInit_foo(void)
{
    PyObject *m = PyModule_Create(/* ... */);
    PyObject *capsule = PyCapsule_NewVersioned(
        &abi, // ABI Pointer
        "foo_abi", // Capsule name
        NULL, // Capsule destructor
        m, // Reference to module
        1, // Major version number
        sizeof(FooABI) // Size of the structure
    );
    /* Initialize attributes */
    return m;
}

For backwards compatibility purposes, a size and major version of 0 denote that the capsule is the first version. PyCapsule_NewVersioned will allow explicitly setting both of these values to 0 to let CPython know that the ABI is compatible with the legacy version. Note that PyCapsule_NewVersioned will raise a ValueError if the size or major version number is negative, to reserve negative values for error indicators in the getter functions.

ABI Sizes

GetSize is the getter for the size of the underlying ABI structure. So, in the previous example, the size field in PyCapsule would be sizeof(FooABI), so GetSize would return that. Since PyCapsule_New implicitly sets the size field to zero, GetSize returns 0 on capsules created with it.

For example:

void
downstream_user_function()
{
    PyObject *foo_capsule = PyCapsule_ImportVersioned(/* ... */);
    FooABI *foo_abi = PyCapsule_GetPointer(foo_capsule, "foo_abi");
    // Skip error checking for simplicity

    if (PyCapsule_GetSize(foo_capsule) > offsetof(FooABI, foo))
    {
        // foo() is available on this version
        foo_abi->foo(42);
    } else {
        // foo() is not available! Use a fallback or throw an error!
        /* ... */
    }
}

Version and Module Getters

PyCapsule_GetModule is the interface for getting a module object out of the capsule. Since this might be NULL, this function returns an integer denoting the result of the call (-1, 0, or 1) and takes a pointer to an output parameter to store the module (PyObject **).

If this function fails, the pointer is set to NULL, and returns -1. If it succeeds, but the capsule does not have a module set, then this function returns 0 with the output pointer set to NULL. Otherwise, this returns 1 and sets the pointer to a strong reference to the module.

PyCapsule_GetMajorVersion, on the other hand, returns the major version number passed to NewVersioned, 0 if it was initialized the old way, and -1 if an error occurred. For both of these functions, if the object passed was not a capsule, a TypeError is raised.

It’s up to the developer to decide how to deal with backwards compatibility with these functions – ideally, they will just do it the old way, and drop support for it as they decide to support new versions. For example, if foo() from earlier wanted to access bar on the module:

PyObject *
foo_impl(int whatever)
{
    PyObject *capsule = PyCapsule_ImportVersioned("foo.foo_abi", MAJOR_VERSION, sizeof(FooABI)); // See specification below
    if (capsule == NULL)
        return NULL;

    PyObject *module;
    if (PyCapsule_GetModule(capsule, &module) < 0)
        return NULL;

    if (module == NULL)
    {
        /* ABI was initialized with the legacy PyCapsule_New, we'll just throw an error */
        PyErr_SetString(PyExc_RuntimeError, "your python version is too low!");
        return NULL;
        // Note that in a real-world scenario, the ABI would have some fallback module field
    }

    PyObject *attr = PyObject_GetAttrString(module, "bar");
    if (attr == NULL)
        return NULL;
    /* Rest of foo() */
}

Import Utilities

PyCapsule_GetFromModule takes the following:

  • Any module object passed to PyCapsule_NewVersioned, or a module that contains a Py_mod_capsule module slot.
  • Qualified capsule name. This may be NULL, per the current capsule API.
  • Major version number.
  • Minimum size of the ABI structure.

GetFromModule will raise a RuntimeError if the major version or module on the returned capsule does not match what was passed.

Now, with PyCapsule_GetFromModule, the ABI function for foo from earlier would look like:

#define MAJOR_VERSION 2

static PyObject *
foo(int whatever) {
    PyObject *foo_mod = PyImport_ImportModule("foo");
    if (foo_mod == NULL)
        return NULL;

    // This will either call the Py_mod_capsule slot, if it's defined, or
    // try and access the foo_abi attribute on the module.
    PyObject *capsule = PyCapsule_GetFromModule(
        foo_mod,
        "foo.foo_abi",
        MAJOR_VERSION,
        sizeof(FooABI)
    );
    Py_DECREF(foo_mod);
    if (capsule == NULL)
        return NULL;

    FooABI *foo_abi = PyCapsule_GetPointer(capsule, "foo.foo_abi");
    // We can't DECREF the capsule until after foo() has been called:
    // ours may be the only reference keeping it alive, so the capsule
    // (and the ABI behind it) could be deallocated as soon as we call Py_DECREF
    if (foo_abi == NULL) {
        Py_DECREF(capsule);
        return NULL;
    }

    PyObject *result = foo_abi->foo(whatever);
    Py_DECREF(capsule);
    return result;
}

The main upside of this function is that now, if you were to use this ABI from outside the header file – such as through an FFI – then it would be trivial to implement. Example with ctypes:

import ctypes
from ctypes import pythonapi, Structure, POINTER
import foo

# Initialize argtypes and restype for GetFromModule and GetPointer
# ...

class FooABI(Structure):
    _fields_ = [
        ("foo", ctypes.PYFUNCTYPE(ctypes.py_object, ctypes.c_int))
    ]

capsule = pythonapi.PyCapsule_GetFromModule(
    foo,
    b"foo.foo_abi",  # const char * parameters take bytes, not str
    2,
    0
)
abi = ctypes.cast(
    pythonapi.PyCapsule_GetPointer(capsule, b"foo.foo_abi"),
    POINTER(FooABI)
)

PyCapsule_ImportVersioned is a utility for making this process shorter. Unlike PyCapsule_Import, ImportVersioned returns the capsule object, not the underlying ABI, since the user needs to hold a reference to the capsule. It takes all the same parameters as GetFromModule (with the exception of no module parameter, and the capsule_name should be a qualified name including the module), and should be preferred over GetFromModule where possible.

Now, the foo function from earlier looks like:

#define MAJOR_VERSION 2

static PyObject *
foo(int whatever) {
    PyObject *capsule = PyCapsule_ImportVersioned(
        "foo.foo_abi",
        MAJOR_VERSION,
        sizeof(FooABI)
    );
    if (capsule == NULL)
        return NULL;

    FooABI *foo_abi = PyCapsule_GetPointer(capsule, "foo.foo_abi");
    if (foo_abi == NULL) {
        Py_DECREF(capsule);
        return NULL;
    }

    PyObject *result = foo_abi->foo(whatever);
    Py_DECREF(capsule);
    return result;
}
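For comparison, the behaviour of the existing PyCapsule_Import can be observed from Python today. This sketch uses ctypes and the standard library's datetime capsule (datetime.datetime_CAPI, picked here only as a convenient stand-in for foo.foo_abi) to show that the current function hands back the raw pointer rather than the capsule object. It assumes CPython with the C-accelerated datetime module available.

```python
import ctypes
import datetime

api = ctypes.pythonapi
api.PyCapsule_Import.restype = ctypes.c_void_p
api.PyCapsule_Import.argtypes = [ctypes.c_char_p, ctypes.c_int]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

NAME = b"datetime.datetime_CAPI"

# Existing API: imports the module, looks up the attribute, and returns
# only the void * stored inside the capsule.
raw_pointer = api.PyCapsule_Import(NAME, 0)

# The capsule object itself is just a module attribute.
capsule = datetime.datetime_CAPI
pointer_from_capsule = api.PyCapsule_GetPointer(capsule, NAME)

# Both routes lead to the same address; only the second one leaves the
# caller holding a reference to the capsule.
same_address = raw_pointer == pointer_from_capsule
```

With only raw_pointer in hand there is no capsule to query for a size, module, or major version, which is why the proposed ImportVersioned returns the capsule instead.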

Changes to the current capsule import implementation

Note that as stated earlier, an edge case exists in the current implementation of PyCapsule_Import. Since fixing PyCapsule_Import would technically be a breaking change, ImportVersioned is the fixed alternative.

Due to this problem, this PEP will deprecate PyCapsule_Import("foo.foo_abi", 0) in favor of PyCapsule_ImportVersioned("foo.foo_abi", 0, 0). As stated earlier, this is only a soft deprecation: there is no plan to remove PyCapsule_Import, but new uses of it will be discouraged.

With that being said, it’s important to make some changes to PyCapsule_Import to allow a proper migration path for existing ABIs. If a module imported by PyCapsule_Import has a Py_mod_capsule slot, as shown below, then Import will call it with a major version of 0 and the qualified name instead of attempting to access the attribute of the module. Note that, unlike the new capsule import API, if the major version was 0 and a different major version is on the returned capsule, this will not raise an error. This may seem a bit counterintuitive and error-prone – but it’s important for backwards compatibility.

This means that if foo.foo_abi_v1 were to be passed to PyCapsule_Import, then the call to the module slot would have:

  • The imported module object for foo.
  • foo.foo_abi_v1 as the qualified name.
  • Major version of 0.

We’re getting a bit ahead of ourselves, though. More on this in the backwards compatibility section.

Capsule Validation

Similar to the existing PyCapsule_IsValid function, this PEP introduces a PyCapsule_IsValidWithVersion function. Like the other additions to the C API in this PEP, this function takes a capsule object, the capsule name, a reference to the module (which may be NULL, if it was initialized the old way), a major version, and a minimum size. IsValidWithVersion acts as an extension of PyCapsule_IsValid: it makes every check that IsValid does, plus all of the following:

  • Ensuring the module passed and the module on the capsule are equivalent (including when they are both NULL).
  • Checking if the major_version matches that on the capsule.
  • Making sure the size on the capsule is greater than or equal to the minimum size passed.

Once again, this function can be used to check compatibility with the legacy ABI by passing NULL, 0, and 0 for the module, major version, and size arguments. Note that the existing PyCapsule_IsValid will not check these values by default, instead it will just ignore them. To stay similar to IsValid, IsValidWithVersion returns 1 indicating true, and 0 indicating false.

This function is useful if you got a capsule through some other means, and want to make sure it’s the one you want. For example:

int
some_user_function(PyObject *capsule)
{
    // Ensure we have the legacy ABI
    if (!PyCapsule_IsValidWithVersion(capsule, "foo.foo_abi", NULL, 0, 0)) {
        PyErr_SetString(PyExc_RuntimeError, "expected legacy ABI");
        return -1;
    }
    /* ... */

    return 0;
}
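The checks described above can be modelled in a few lines of plain Python. This is only a sketch of the intended semantics, not the proposed implementation: CapsuleModel is a hypothetical stand-in for the extended PyCapsule structure, and is_valid approximates what PyCapsule_IsValid checks today (a non-NULL pointer and a matching name).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapsuleModel:
    """Hypothetical stand-in for the PyCapsule structure with the new fields."""
    name: Optional[str]
    pointer: int
    module: Optional[object] = None  # NULL by default
    major_version: int = 0           # 0 by default
    size: int = 0                    # 0 by default


def is_valid(capsule, name):
    # Approximates the existing PyCapsule_IsValid checks.
    return (
        isinstance(capsule, CapsuleModel)
        and capsule.pointer != 0
        and capsule.name == name
    )


def is_valid_with_version(capsule, name, module, major_version, min_size):
    # The existing checks, plus the three additional ones listed above.
    return (
        is_valid(capsule, name)
        and capsule.module is module                  # also true when both are None
        and capsule.major_version == major_version
        and capsule.size >= min_size
    )


legacy = CapsuleModel("foo.foo_abi", pointer=0x1000)
module = object()
versioned = CapsuleModel("foo.foo_abi", 0x2000, module, major_version=2, size=16)
```

Passing None, 0, and 0 for the last three arguments accepts only the legacy capsule in this model, mirroring the NULL, 0, 0 call in the C example.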

New module slot

This PEP adds a Py_mod_capsule slot, which will be paired with a PyObject *(*capsule_getter)(PyObject *module, const char *qualified_name, int32_t version) – this will be the preferred alternative over adding the capsule to an attribute.

PyCapsule_* functions (including the existing ones) will use the new slot if the module defines it; otherwise, they will do it the old way. For example, with foo previously, it would now expose an ABI as such:

const FooABI foo_abi_v1 = {
    /* ... */
};

static PyObject *
capsule_getter(PyObject *self, const char *qualname, int32_t major_version) {
    return PyCapsule_NewVersioned(
        (void *)&foo_abi_v1,
        "foo.foo_abi",
        NULL,
        self,
        1,
        sizeof(FooABI)
    );
}

static PyModuleDef_Slot ModuleSlots[] = {
    {Py_mod_capsule, capsule_getter},
    {0, NULL}
};

static struct PyModuleDef FooModule = {
    PyModuleDef_HEAD_INIT,
    "foo",
    /* Other fields */
    .m_slots = ModuleSlots
};

A module may not define multiple Py_mod_capsule slots.

ctypes ABI Support

As mentioned earlier, this PEP introduces a new ctypes.PyABI class. Also mentioned earlier, this class behaves like, and inherits from, ctypes.Structure. Users should set the _fields_ class attribute to define an ABI structure, just like a Structure. PyABI also comes with two instance attributes:

  • _capsule_, which is a reference to the capsule object itself. This is of type types.CapsuleType.
  • _capsule_size_, which is a ctypes.c_ssize_t containing the size of the ABI structure, if it’s known.

This class also takes two subclass arguments:

  • size_field, which is a string containing the name of the field that provides the size of the ABI structure. For example, if this was size, then PyABI would use the value of the size field at runtime to set the _capsule_size_ attribute. By default, this is None, in which case the value of _capsule_size_ is deferred to the default_size parameter below.
  • default_size is pretty self-explanatory – it’s the default value of _capsule_size_ if the size on the capsule object is zero (due to it being explicitly set that way or created with a legacy API).

If both of these arguments are passed with non-default values (e.g., class FooABI(ctypes.PyABI, size_field="size", default_size=42)), a ValueError is raised, since these parameters counteract each other.
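The subclass-argument handling can be sketched with __init_subclass__ on a plain Python class. PyABIModel below is a hypothetical stand-in rather than a real ctypes.Structure subclass, and the _size_field_ and _default_size_ attribute names are assumptions made for this illustration only.

```python
class PyABIModel:
    """Plain-Python stand-in for the proposed ctypes.PyABI (illustration only)."""

    _size_field_ = None
    _default_size_ = 0

    def __init_subclass__(cls, size_field=None, default_size=0, **kwargs):
        super().__init_subclass__(**kwargs)
        # The two arguments answer the same question in different ways,
        # so passing both with non-default values is rejected.
        if size_field is not None and default_size != 0:
            raise ValueError("size_field and default_size are mutually exclusive")
        cls._size_field_ = size_field
        cls._default_size_ = int(default_size)


class WithSizeField(PyABIModel, size_field="size"):
    pass


class WithDefaultSize(PyABIModel, default_size=42):
    pass
```

Defining a class with both size_field="size" and default_size=42 raises the ValueError at class-creation time, so the conflict shows up on import rather than on first use.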

Now, since PyABI inherits from Structure, it can be instantiated the same way – but the user probably doesn’t want to do that. Instead, they want to use from_capsule, which will take a capsule object and map it to the structure automatically, as well as doing the proper version checking. from_capsule has four parameters:

  • A module object, or a string containing a fully qualified location of the capsule. Depending on the value of this parameter, ctypes will choose which capsule import function to use.
  • The capsule name itself (this has been foo.foo_abi in the examples throughout this PEP). This is None by default, in which case NULL is passed as the name.
  • The major version number of the ABI. This is 0 by default.
  • The minimum size of the ABI. This is also 0 by default, meaning that the check is technically skipped (since an ABI size cannot be less than zero).

As mentioned, the mechanism to import the capsule object is dependent on the first parameter. If the argument is a module, then PyCapsule_GetFromModule is used to get the capsule object – otherwise PyCapsule_ImportVersioned is used.

Finally, the last quirk of PyABI is that, if _capsule_size_ is set, attribute lookup checks that the size is greater than the offset of the requested field. If not, a RuntimeError is raised.

For example, to use FooABI from ctypes would look like:

import ctypes
import foo

class FooABI(ctypes.PyABI):
  _fields_ = [
      ("foo", ctypes.PYFUNCTYPE(ctypes.py_object, ctypes.c_int))
  ]

abi = FooABI.from_capsule(foo, "foo.foo_abi", major_version=2)
abi.foo(42)  # Will ensure that size > offsetof(foo)

Backwards Compatibility

Capsule Imports

PyCapsule_New initializes both the size and major_version to zero – this means that it’s the first version of the ABI. As mentioned earlier, PyCapsule_NewVersioned can also explicitly set these values to zero to denote that it’s compatible with the first version.

Likewise, PyCapsule_Import will call the Py_mod_capsule slot with a major version of 0. This retains C API compatibility – if an ABI were to use PyCapsule_New and its users were to use PyCapsule_Import forever, nothing would change. However, if this ABI wanted to make changes someday, it could then bump the major version for new changes, and users that explicitly ask for the new version will receive it.

Sizes

In the rejected ideas section later, a standard Py_ssize_t size at the beginning of an actual ABI structure was part of this specification. This was decided against, but it can still be used as a replacement for the capsule-level size in the meantime. If an ABI does not provide a way to access the size, then they lack that backport mechanism for this PEP – users will just have to ignore size/minor version checking.

For example, if FooABI from the beginning of this PEP wanted to provide a size on their ABI, it could add the field like so:

typedef struct {
    PyObject *(*foo)(int); // PyObject *foo(int);
    Py_ssize_t size;
} FooABI;

const FooABI foo_abi = {
    foo,
    sizeof(FooABI)
};

Now, a downstream user could use it like such:

#if PY_MINOR_VERSION < 14
#define SIZE(abi) ((abi)->size)
#else
// Technically, PyCapsule_GetSize could return
// an error, but we're treating it like a
// field - ignore that.
#define SIZE(abi) PyCapsule_GetSize(abi)
#endif

static int
user_function() {
    // We should use PyCapsule_Import here, since
    // it's backwards compatible and the major version
    // for foo_abi is zero
    FooABI *foo_abi = PyCapsule_Import("foo.foo_abi", 0);
    if (foo_abi == NULL) {
        return -1;
    }

    if (offsetof(FooABI, foo) < SIZE(foo_abi)) {
        foo_abi->foo(42);
    } else {
        // Use some fallback
    }
    return 0;
}

Major Versions

On Python versions <3.14, an ABI can store the major version in the capsule name itself (e.g., foo_abi_v2), and then use PyCapsule_Import to import it. Going with the previous example, that would be PyCapsule_Import("foo.foo_abi_v2", 0). Then, on newer versions, a Py_mod_capsule slot could return the v2 ABI when foo_abi_v2 is requested. This is the purpose of the special case when returning different major versions in PyCapsule_Import – existing code that imports the ABI the old way will still work, assuming the library handles it in their Py_mod_capsule slot. For example, the capsule getter for the previous example could look like:

static PyObject *
capsule_getter(PyObject *self, const char *qualname, int32_t major_version)
{
    FooABI *foo_abi = malloc(sizeof(FooABI));
    /* Initialize fields... */

    if (major_version == 2 || !strcmp(qualname, "foo.foo_abi_v2"))
    {
        foo_abi->foo = new_foo;
    }

    return PyCapsule_NewVersioned(/* ... */);
}
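On the consumer side, the name-based convention can already be exercised with the existing capsule API. The sketch below probes versioned attribute names on a module and returns the first valid capsule; the find_abi helper, the _v<N> suffix scheme, and the stand-in foo module are assumptions for illustration.

```python
import ctypes
import types

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_IsValid.restype = ctypes.c_int
api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]


def find_abi(module, base, versions):
    """Return (version, capsule) for the first versioned name found on `module`."""
    for version in versions:
        attr = f"{base}_v{version}"
        qualified = f"{module.__name__}.{attr}".encode()
        candidate = getattr(module, attr, None)
        # PyCapsule_IsValid checks the type, the name, and a non-NULL pointer.
        if candidate is not None and api.PyCapsule_IsValid(candidate, qualified):
            return version, candidate
    raise ImportError(f"no supported version of {module.__name__}.{base}")


# Stand-in provider: a module object that only ships version 1.
foo = types.ModuleType("foo")
payload = ctypes.c_int(1)      # kept alive; the capsule does not own it
NAME_V1 = b"foo.foo_abi_v1"    # kept alive; the capsule stores only the pointer
foo.foo_abi_v1 = api.PyCapsule_New(ctypes.addressof(payload), NAME_V1, None)

# Prefer version 2, fall back to version 1.
found_version, found_capsule = find_abi(foo, "foo_abi", [2, 1])
```

The caller still has to pick the matching structure layout for the version that was found, which is the part the capsule-level major version is meant to standardize.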

I really don’t see at all how this is so much better than the foo_abi_v2 convention to be worth all the effort. It seems you can quite easily implement all the helpers you propose around a fixed set of names, and there are plenty of precedents for unversioned interfaces that use entirely new names when the major version changes.

Could you elaborate more on what this makes possible that is currently not possible?

Alternatively, what convenience does this add for the final end user - that is, not the author of the helpers/wrappers, but the actual person who is depending on two modules being able to negotiate to share information - that is not available when using distinct names?


Thanks for the feedback!

Specifically, as mentioned in the motivation, it is currently impossible (or not necessarily “impossible”, but extremely unsafe) to make a breaking change without making an entirely new capsule. I agree that making a new capsule is the proper approach if you’re making several breaking changes, but if e.g. there was an underlying security problem, there would be no way to make that change whatsoever. Note that currently with a capsule, an ABI cannot make a breaking change for any reason, including security vulnerabilities.

This is especially an issue for existing ABIs that did not previously include a way to check the version at runtime (perhaps because the developer forgot to). This might be a bad example, but take a look at _socket’s ABI. It’s stated in the comment to always add things on the end for compatibility – but there’s nothing on the structure that could actually differentiate the structure between versions (e.g. a size field, per the proposal, or something like a hard coded minor version number). Granted, any end user can use CPython’s version instead, but imagine this was in an external library, without CPython’s version management to fall back to.

I’m a little confused on what you mean here, could you elaborate? If by “final end user” you mean someone writing in Python, there’s not much benefit – ABIs are irrelevant to the Python code.

Although, plenty of people use ctypes for calling C code instead of writing an extension module – that’s where this PEP can come in handy on the Python side. The proposed ctypes.PyABI will do version management for those people, where something like a version macro defined in a header file is not available.

How is a new “version” of a capsule different from being an entirely new capsule? (I assume “entirely new” just means it has a different name. If “entirely new” means something other than the name and the breaking change itself, I’d appreciate the clarification. It may be that I missed it in the text.)

Someone has to define the PyABI instance, and that’s the wrapper. If you want flexibility here, you have to define one for each ABI version that you want to support, and then whether you choose by the version number returned or the name in the capsule doesn’t seem to matter that much.

By “entirely new,” I meant that it’s a new ABI structure, and a different capsule instance (because ideally, you want to serve both ABIs for a period of time – going with your convention, abi_v1 would exist for a while, and abi_v2 would exist for longer)

In terms of the PyABI instance, the library will likely define that for you, and the user can just import it – then the user just has to ask for specifically what version they want via from_capsule.

If the library defines it, then it knows which version it has. It would need to at least be independent from the one that implements the interface, but unless it has its own complex dependencies, it’d likely end up just matching the implementation.

My understanding was that the ABI module is mainly for C implementations, so that native code can interact with another native module without needing a direct dependency on the version.

This is unavoidable though, since if your ABI changes then you must have a new/different structure, and if you want to keep the old one around then you have two. But this is easy to do - you just have to be careful with your API changes (which you already are doing if you’ve gone down this path, otherwise you’d just YOLO and let your users sort out the mess…)


I don’t see any problem with writing up guidelines on how best to implement an ABI pointer that’s available through a capsule, but I also don’t see anything that needs to be changed in the existing implementation in order to use it safely.

Notice the word likely. It’s possible for the extension module to not match the Python code (at least, that’s what I’ve been told – I don’t work with Python packaging, please correct me if I’m wrong!)

That’s fair – but you do get a little more flexibility with the Py_mod_capsule approach. The main improvement that I can think of is that you can change a field without modifying the type (e.g. making a PyObject * refer to something else).

I actually did write that at first – but it seemed like it was just putting more work on ABI developers. If a capsule did all the version management automatically, it would be much more convenient. Petr specifically made a point about this – if a PEP (at least, that’s what I’m assuming you meant by “guidelines”) is going to be written, why not be bolder?

(I was thinking more blog post, tutorial, or book about exposing ABI via capsules, rather than another PEP. Historically, writing advisory PEPs on stuff like this hasn’t achieved much. Your best bet is to write the helper library for projects to use and make it popular.)

This is perhaps true, but I don’t think this idea achieves it. The only part that is going to be automatic is the equivalent of testing each possible name to find the one that is available, after which the caller will have to use the selected version to choose the ABI struct to cast to. So I suspect it works out about even, apart from it taking about 5-7 years until your code is actually simpler because you no longer worry about the old method. Using a set of names and selecting the most appropriate one can be fully implemented today, and the provider can also provide as many of them as they like.

(You might be interested in my proposal for the C API more broadly, which basically follows this pattern, though the ABI “name”/“version” is hidden behind an arbitrary constant.)

Where you actually get some automatic benefit from version management is if you have a rich object on the consumer’s side - something that is able to provide a stable ABI and adapt it to whichever inter-module ABI is in the capsule. But now you have another ABI to maintain! So the only real benefit appears in Python code, but at that point you can safely import the Python code from the module itself without worrying about compilers or any of the reasons you’d need to stabilise or strictly define a native-level ABI.

I really think this would benefit most from being implemented as a helper library for packages that need to do this. You can use as many nasty hacks as needed to work with capsules as they exist, and then we’ll have some real context for making changes if it turns out they’d be helpful.

Right now, this feels like a nice idea that isn’t actually grounded in a problem - and so “YAGNI” applies. But if there is a need for it, a helper library will get popular and prove it.

I’m a little confused on what you meant by this — part of the point of this proposal is so you don’t have to check names, and instead check a number. The caller doesn’t have to do much choosing, they should already know what they want beforehand.

For example, PyObject *capsule = PyCapsule_ImportVersioned("foo.foo_abi", 1, 0) will ensure that capsule is always going to be major version 1.

Regarding turning this into a library, this lack-of-versioning problem came up when trying to write another ABI in an extension module (see this thread, where it was decided to turn that one into a library instead), so that’s saying something. Turning this into a library is probably doable, but I’m not sure it’s the best approach. It is probably doomed to fail: as an ABI developer, the thought of installing an extra ABI just for managing the version of mine sounds a bit weird, so I doubt it’s something that would get any traction.

Right, but so is PyCapsule_Import("foo.foo_abi_v1"). And if you want the latest available version that you know about (because it’s no good getting a void * to something you don’t understand), then you do this:

struct ABI3 *abi3 = NULL;
struct ABI2 *abi2 = NULL;
struct ABI1 *abi1 = NULL;
abi3 = (struct ABI3*)PyCapsule_Import("foo.foo_abi_v3", 0);
if (!abi3) {
    PyErr_Clear();
    abi2 = (struct ABI2*)PyCapsule_Import("foo.foo_abi_v2", 0);
    if (!abi2) {
        PyErr_Clear();
        abi1 = (struct ABI1*)PyCapsule_Import("foo.foo_abi", 0);
        if (!abi1) {
            return NULL;
        }
    }
}
// Of course, you'll be checking these three pointers for the rest
// of the function, but that's the nature of supporting multiple ABIs.
if (abi3) {
    abi3->func(1, 2, 3);
} else if (abi2) {
    abi2->func(1, 2);
} else {
    abi1->func(1);
}

This doesn’t get much easier by putting the version number into the capsule. You might as well put the version number into the top of the struct, which would flatten that tree of “imports” just as effectively as a totally new API - either way, you still need to cast the pointer to a different C struct, which leaves most of the complication there.
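
As a rough illustration of that last point, here is a sketch with the version number at the top of each struct. Everything in it is hypothetical (ABIHeader, the ABI1/ABI2/ABI3 layouts, the sum functions). In real code the pointer would come from a single PyCapsule_Import call; static tables stand in for it here so the example compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>

/* Common header that every version of the hypothetical ABI starts with. */
typedef struct { int version; } ABIHeader;

typedef struct { ABIHeader head; int (*func)(int); } ABI1;
typedef struct { ABIHeader head; int (*func)(int, int); } ABI2;
typedef struct { ABIHeader head; int (*func)(int, int, int); } ABI3;

static int sum1(int a) { return a; }
static int sum2(int a, int b) { return a + b; }
static int sum3(int a, int b, int c) { return a + b + c; }

/* Stand-ins for whatever the provider would put in its capsule. */
static const ABI1 abi_v1 = { {1}, sum1 };
static const ABI2 abi_v2 = { {2}, sum2 };
static const ABI3 abi_v3 = { {3}, sum3 };

/* One lookup yields an opaque pointer; the header says which struct
 * to cast it to. Returns -1 for a version this consumer does not know. */
static int call_latest_known(const void *ptr)
{
    assert(ptr != NULL);
    const ABIHeader *head = ptr;
    switch (head->version) {
    case 3: return ((const ABI3 *)ptr)->func(1, 2, 3);
    case 2: return ((const ABI2 *)ptr)->func(1, 2);
    case 1: return ((const ABI1 *)ptr)->func(1);
    default: return -1;
    }
}
```

The nested imports collapse into one switch, but the per-version casts remain, which is the part that stays complicated either way.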

Thanks for the clarification, I see what you meant now. I’m not totally sure this is a real-world scenario, though – the draft doesn’t address anything about getting the “latest major version.” Ideally, you would just get one ABI, and then check for different functions with PyCapsule_GetSize.

// You could use sizeof(struct ABI1) to ensure that the ABI is exactly what you're looking for, or instead check the size for fallbacks
PyObject *abi1_cap = PyCapsule_ImportVersioned("foo.foo_abi", 1, 0); // Skipping the size check here so we can do it dynamically

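struct ABI1 *abi1 = PyCapsule_GetPointer(abi1_cap, "foo.foo_abi");
Py_ssize_t size = PyCapsule_GetSize(abi1_cap);
// new_func is only usable if the reported size covers the whole field
if (size < (Py_ssize_t)(offsetof(struct ABI1, new_func) + sizeof(abi1->new_func))) {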
    // Fallback to func()
    abi1->func("42");
} else {
    // It's available!
    abi1->new_func(42);
}

Obligatory ping to developers of capsule-providing libraries @seberg @rgommers @jorisvandenbossche .

For the capsule ABI developers – note that this is backwards compatible, so existing code with capsules will still work without changes.

A bit on the go, but was excited and wanted to comment, so sorry if I missed discussion! I was not aware of the Py_mod_capsule, is it new/properly documented?

In general, I like this idea. We just broke the ABI in NumPy, although versioning there is trivial and has always existed. I also had another case (a capsule, not a module!) where I had to add versioning, and having that in the capsule would:

  • make adding versioning much easier (since you have an implicit 0th version, you wouldn’t need a new name)
  • And I think it is also just a bit simpler, because you don’t need the additional “safe” struct step.

Also, I think it is often an oversight to not have versioning, so adding it here is a great nudge (maybe even effectively adding versioning for free).

I will note that I am not sure that the proposed PyCapsule_ImportVersioned is as useful as it seems unless you actually store multiple versions.
In practice, packages that are not part of CPython cannot break ABI freely. That is, you must have either:

  • A single capsule/struct/API table which retains a “sufficient” level of backwards compatibility
  • Multiple capsules so that users can keep using the old capsule for a while, but in that case you need a DeprecationWarning when it is accessed (at some point)!

Neither of those gets easier with PyCapsule_ImportVersioned as far as I can tell (on first quick sight)!?

Maybe to say what I did for NumPy 2 right now:

  • I force everyone to recompile with NumPy 2 headers and we changed where the API capsule lives (so you can’t keep using it, this behaves like a version, but not quite; there were other reasons for this as well).
  • If you compile with NumPy 2, it doesn’t matter if the capsule contains the NumPy 1 or 2 API table: wherever they differ (almost nowhere), the headers do a version-dependent lookup.
  • If you want to use new API, then you need to compile with a newer “target version”, and importing NumPy will then fail at runtime if the installed NumPy is older than that target. (Note that NumPy headers do these checks for the user.)
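
To sketch what a “version-dependent lookup” in a header can look like: the following is not NumPy’s actual code. ExampleTable, the slot numbers and example_negate are all hypothetical, and the assumed layout change (a function moving from slot 0 to slot 1 between major versions) is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function pointer type used to store mixed signatures. */
typedef void (*generic_fn)(void);

typedef struct {
    int major;                /* major version of the provider at runtime */
    const generic_fn *slots;  /* function table; layout depends on major */
} ExampleTable;

/* Assumed layouts: v1 keeps `negate` in slot 0, v2 moved it to slot 1. */
#define NEGATE_SLOT_V1 0
#define NEGATE_SLOT_V2 1

/* What the shipped header would define: callers use example_negate()
 * and never see which table layout they were handed. */
static inline int example_negate(const ExampleTable *t, int x)
{
    assert(t != NULL);
    int slot = (t->major >= 2) ? NEGATE_SLOT_V2 : NEGATE_SLOT_V1;
    /* Cast back to the real signature before calling. */
    return ((int (*)(int))t->slots[slot])(x);
}

/* Stand-ins for the two providers. */
static int negate_impl(int x) { return -x; }
static void placeholder_impl(void) {}

static const generic_fn v1_slots[] = { (generic_fn)negate_impl };
static const generic_fn v2_slots[] = { placeholder_impl, (generic_fn)negate_impl };

static const ExampleTable v1_table = { 1, v1_slots };
static const ExampleTable v2_table = { 2, v2_slots };
```

This stays manageable only while the tables differ in a handful of places; with many differences, every wrapper needs its own branch.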

This allowed me to break ABI by changing API. Besides having to recompile, most users don’t notice anything. Now, of course, the table changed very little, otherwise you would need a lot of version dependent macros which wouldn’t be great…
But if the API changed a lot, I would need to ship both an old and a new API side-by-side (i.e. two capsules).
(This is better for downstream: You introduce new API, but nothing needs to recompile right away. It is a bit harder for maintainers/me: You cannot actually finalize any change for at least a year or so. We limited changes instead; experience may prove we should have gone down the two-capsules path!)

Py_mod_capsule is part of this proposal – you can read about it in the “New module slot” section of my original post.

(PyCapsule_ImportVersioned is just for convenience – most of the work is done by PyCapsule_GetFromModule. It also handles submodules properly, unlike PyCapsule_Import.)

The main improvement that NumPy would probably benefit from with ImportVersioned or GetFromModule is that you can have finer control through the Py_mod_capsule slot – you could do either of the things you specified pretty easily with it.

Ah, sorry, I should have noticed that you have a getter with the version! So you do solve the issue of being able to introduce two versions side by side, and I guess even something like version=-1 to get the newest version, which makes things nice even for what we did for NumPy 2 (try to use the NumPy 2 table, but keep compatibility with a NumPy 1 API table if that is what we get at runtime).

So, it does seem like it indeed helps implementing any of these transition policies, thanks!
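
For what it’s worth, here is a minimal sketch of a provider-side getter along those lines. It does not use the draft’s Py_mod_capsule slot or its real signature. example_get_abi, NEWEST_VERSION and the two tables are hypothetical, and treating -1 as “newest” is only the idea floated above, not something the draft specifies.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per ABI major version the provider still ships. */
typedef struct {
    int major;
    const void *table;
} VersionedEntry;

/* Stand-ins for the real ABI structs. */
static const int table_v1 = 100;
static const int table_v2 = 200;

/* Sorted from oldest to newest. */
static const VersionedEntry entries[] = {
    { 1, &table_v1 },
    { 2, &table_v2 },
};

#define NEWEST_VERSION (-1)

/* Returns the table for the requested major version, the newest table
 * for NEWEST_VERSION, or NULL if that version is not served. */
static const void *example_get_abi(int major)
{
    size_t count = sizeof(entries) / sizeof(entries[0]);
    assert(count > 0);
    if (major == NEWEST_VERSION) {
        return entries[count - 1].table;
    }
    for (size_t i = 0; i < count; i++) {
        if (entries[i].major == major) {
            return entries[i].table;
        }
    }
    return NULL;
}
```

A getter like this is also a natural place to emit a deprecation warning when an old major version is requested.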

As someone who uses the C API a bit to optimize parts of my own code, I find that this proposed ABI would at least help me detect incompatible runtime versions of the capsules I depend on before they become a problem.

Let’s use the builtin datetime module as an example. Suppose one of the functions exposed through its capsule pointer was removed entirely in a patch release of 3.14, but I had shipped a binary wheel of my C extension that depends on that function. The extension will now crash at runtime, because there is currently nothing that would alert me at runtime that:

  • a function was removed and which function was removed.
  • the version of the capsule changed since a previous version of 3.14.

As such, I agree with this proposal and encourage all capsules, including the ones within python/cpython on GitHub, to migrate to the new ABI proposed here. It would also allow the issues with the old capsule ABI to be bypassed, since they could be addressed with a new function in Python’s public C API.

That’s an option, but it would probably be up to the core devs and not an actual part of this proposal. As mentioned earlier, you can use CPython’s version to deal with most version-checking needs.