Pre-PEP: Unified slot system for the C API

Here’s a proposal for the C API that’s been cooking for… um, a long time.
Still rough around some of the edges, but I hit a self-imposed deadline to get it out, so here it is.

(Please ignore formatting, this is reST rendered as Markdown.)

Motivation

Python’s C API currently contains two extendable structs used to carry
information, notably:

  • PyType_Spec
  • PyModuleDef

Each has a family of C API functions that use the structure as input,
creating a Python object from it. (Each family works as a single function,
with optional arguments that got added over time.) These are:

  • PyType_From* functions for PyType_Spec
  • PyModule_FromDef* for PyModuleDef

Separating “input” structures from runtime objects allows the internal
structure of the object to stay opaque (in both the API and the ABI),
allowing future CPython versions (or even alternative implementations) to
change the details.

Both structures contain a slots field – an array of
tagged unions <https://en.wikipedia.org/wiki/Tagged_union>_,
which allows for future expansion.
(In practice, these are void pointers tagged with an int ID.)
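
For readers who have not looked at the current structs, here is a minimal sketch of the "void pointer tagged with an int ID" pattern. `LegacySlot` only mirrors the shape of the existing `PyType_Slot` (an `int` plus a `void *`); the `DEMO_*` IDs and the `legacy_find` helper are made up for illustration and are not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring the shape of the existing PyType_Slot:
 * an int tag plus an untyped pointer. */
typedef struct {
    int slot;      /* tag: which piece of information this is */
    void *pfunc;   /* payload: its real type depends on the tag */
} LegacySlot;

/* Hypothetical IDs for this demo only (not CPython's values). */
enum { DEMO_END = 0, DEMO_DOC = 1, DEMO_NAME = 2 };

/* Walk a zero-terminated array and return the payload stored under `id`,
 * or NULL if the array has no such entry. */
static void *legacy_find(const LegacySlot *slots, int id)
{
    assert(slots != NULL);
    for (; slots->slot != DEMO_END; slots++) {
        if (slots->slot == id) {
            return slots->pfunc;
        }
    }
    return NULL;
}

static LegacySlot demo_slots[] = {
    {DEMO_NAME, (void *)"mymod.MyClass"},
    {DEMO_DOC, (void *)"An example class"},
    {DEMO_END, NULL},
};
```

Note that storing a function in `pfunc` means converting a function pointer to `void *`, which ISO C does not define. That is the kind of problem a typed union can avoid.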

This spec aims to update these structures, in particular to allow:

  • making backward and forward compatibility easier to maintain (for
    example, avoiding the need for new functions with additional arguments)
  • improved type safety, and avoiding what’s technically undefined behaviour

Replacing ModuleDef

The more immediate motivation for this proposal is that

  • the PyObject memory layout differs between regular and free-threading builds,
  • the PyModuleDef struct is usually allocated statically (using PyModuleDef_HEAD_INIT), and
  • PyModuleDef is a Python object.

If we want to introduce an ABI subset usable with both builds, we need to change the main mechanism to create modules.

Let’s design it as best as we currently can, so it can last at least as long as PyModuleDef_Slot.

Example

This proposal adds API to create classes and modules from arrays of slots,
which can be specified as C literals using macros, like this::

static PySlot myClass_slots[] = {
   PySlot_STATIC(tp_name, "mymod.MyClass"),
   PySlot_SIZE(tp_extra_basicsize, sizeof(struct myClass)),
   PySlot_FUNC(tp_repr, myClass_repr),
   PySlot_INT64(tp_flags, Py_TPFLAGS_DEFAULT | Py_TPFLAGS_MANAGED_DICT),
   PySlot_END,
};

...
    PyObject *MyClass = PyType_FromSlots(myClass_slots, -1);

The macros simplify hand-written literals.
For more complex use cases, like compatibility between several Python versions,
or templated/auto-generated slot arrays, as well as for non-C users of the
C API, the slot struct definitions can be written out.
For example, if the transition from tp_getattr to tp_getattro
was happening now and the user wanted to support CPython with and without
tp_getattro, they could add a HAS_FALLBACK flag:

static PySlot myClass_slots[] = {
   ...
   {   // only used if supported
       .sl_id=Py_tp_getattro,
       .sl_flags=PySlot_HAS_FALLBACK,
       .sl_func=myClass_getattro,
   },
   {    // only used if the slot above is not supported
       .sl_id=Py_tp_getattr,
       .sl_func=myClass_old_getattr,
   },
   PySlot_END,
};

Rationale

Here we explain the design decisions in this proposal.

Using slots

The main alternative to slots is using a versioned struct for input.

There are two variants of such a design:

  • A large struct with fields for all info. As we can see with
    PyTypeObject, most of such a struct tends to be NULLs in practice.
    As more fields become obsolete, either the wastage grows, or we introduce
    new struct layouts (while keeping compatibility with the old ones for a while).

  • A small struct with only the info necessary for initial creation, with other
    info added afterwards (with dedicated function calls, or Python-level
    setattr). This design:

    • makes it cumbersome to add/obsolete/adjust the required info (for example,
      in :PEP:697 I gave meaning to negative values of an existing field; adding
      a new field would be cleaner in similar situations);
    • increases the number of API calls between an extension and the interpreter.

    We believe that a “batch” API for type/module creation makes sense,
    even if it partially duplicates an API to modify “live” objects.

Using slots only

The structs PyType_Spec and PyModuleDef have explicit fields
in addition to a slots array. These include:

  • Required information – the names: PyType_Spec.name and PyModuleDef.m_name.
    This proposal adds name slots, and makes them required.
  • Non-pointers (basicsize, flags) – originally, slots were intended to
    only contain function pointers; they now contain data pointers as well as
    integers or flags. This proposal uses a union to handle types cleanly.
  • Items added before the slots mechanism (PyModuleDef.m_slots itself was
    repurposed from m_reload which was always NULL; m_traverse or
    m_methods predate it).

We can do without these fields, and have only an array of slots.
A wrapper struct around the array would complicate the design.
Also, if fields in such a struct ever become obsolete,
they’d need their own deprecation mechanism.

Nested slot tables

The array of slots can reference another array of slots, which is treated
as if it was merged into its “parent”, recursively.
This complicates slot handling inside the interpreter, but allows:

  • Mixing dynamically allocated (or stack-allocated) slots with static ones.
    This solves the issue that led to the PyType_From* family of
    functions expanding with values that typically can’t be static
    (i.e. it’s often a symbol from another DLL, which can’t be static
    data on Windows).
  • Sharing a subset of the slots to implement functionality
    common to several classes/modules.
  • Easily including some slots conditionally, e.g. based on the Python version.
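
To make the "merged into its parent, recursively" idea concrete, here is a rough sketch of how a consumer might walk nested arrays. `DemoSlot`, the `DEMO_*` IDs and `demo_flatten` are hypothetical stand-ins (the real struct has more fields), so treat this as an illustration of the traversal order rather than the interpreter's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the proposed slot struct (flags omitted). */
typedef struct DemoSlot {
    uint16_t sl_id;
    const void *sl_ptr;
} DemoSlot;

/* Hypothetical IDs for this demo only. */
enum {
    DEMO_slot_end = 0,
    DEMO_slot_subslots = 1,
    DEMO_tp_name = 2,
    DEMO_tp_doc = 3
};

/* Visit slots in order, descending into nested arrays as if their entries
 * were written inline. Stores up to `cap` leaf IDs in `out`, starting at
 * index `n`, and returns the new total number of leaf slots seen. */
static size_t demo_flatten(const DemoSlot *slots, uint16_t *out,
                           size_t cap, size_t n)
{
    assert(slots != NULL);
    for (; slots->sl_id != DEMO_slot_end; slots++) {
        if (slots->sl_id == DEMO_slot_subslots) {
            n = demo_flatten((const DemoSlot *)slots->sl_ptr, out, cap, n);
        }
        else {
            if (n < cap) {
                out[n] = slots->sl_id;
            }
            n++;
        }
    }
    return n;
}

/* A table shared between several classes. */
static const DemoSlot common_slots[] = {
    {DEMO_tp_doc, "shared docstring"},
    {DEMO_slot_end, NULL},
};

/* A per-class table that pulls in the shared one. */
static const DemoSlot my_slots[] = {
    {DEMO_tp_name, "mymod.MyClass"},
    {DEMO_slot_subslots, common_slots},
    {DEMO_slot_end, NULL},
};
```

A real implementation would presumably also want a recursion limit, so that a table that references itself cannot overflow the stack.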

Nested “legacy” slot tables

Similarly to nested arrays of PySlot, we also propose supporting
arrays of “legacy” slots (PyType_Slot and PyModuleDef_Slot) in
the “new” slots, and vice versa.

This way, users can reuse code they already have written, without rewriting/reformatting
it, and only use the “new” slots if they need any new features.

Fixed-width integers

This proposal uses fixed-width integers, uint16_t, for slot IDs and
flags.
With the C int type, using more than 16 bits would not be portable,
but it would silently work on common platforms. We can carefully avoid values
over UINT16_MAX, but we’d still waste 16 bits on common platforms.

With these defined as uint16_t, it seems natural to use fixed-width
integers for everything except pointers and sizes.

The proposal does not use bit-fields and enums, whose memory representation is
compiler-dependent, causing issues when using the API from languages other
than C.

The structure is laid out assuming that a type’s alignment matches its size.

Memory layout

On common 64-bit platforms, we can keep the size of the new struct the same
as the existing PyType_Slot and PyModuleDef_Slot. (The existing
structs waste 6 out of 16 bytes due to int portability and padding;
this proposal puts those bits to use for new features.)
On 32-bit platforms, this proposal calls for the same layout as on 64-bit,
doubling the size compared to the existing structs (from 8 bytes to 16).
For “configuration” data that’s usually static, it should be OK.
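
The size claims can be checked with a couple of compile-time assertions. This is a sketch under assumptions: `LegacySlot` only mirrors the shape of the existing slot structs, `DemoSlot` is a simplified copy of the struct from the Specification section, and `ptrdiff_t` stands in for `Py_ssize_t` so the snippet compiles without `Python.h`. It needs C11 for anonymous unions and `static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the shape of the existing PyType_Slot / PyModuleDef_Slot. */
typedef struct {
    int slot;
    void *value;
} LegacySlot;

/* Simplified copy of the proposed layout. */
typedef struct {
    uint16_t sl_id;
    uint16_t sl_flags;
    uint32_t sl_array_size;
    union {
        void *sl_ptr;
        void (*sl_func)(void);
        ptrdiff_t sl_size;      /* stand-in for Py_ssize_t */
        int64_t sl_int64;
        uint64_t sl_uint64;
    };
} DemoSlot;

/* ID, flags and array size fill the first 8 bytes with no padding... */
static_assert(offsetof(DemoSlot, sl_ptr) == 8,
              "data union should start at byte 8");
/* ...and the 64-bit-wide data union brings the total to 16 bytes,
 * regardless of pointer size. */
static_assert(sizeof(DemoSlot) == 16, "slot should be 16 bytes");
```

On a typical 64-bit platform `sizeof(LegacySlot)` is also 16, while on a typical 32-bit platform it is 8, which is the doubling mentioned above.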

Single ID space

Currently, the numeric values of module and type slots overlap:

  • Py_bf_getbuffer == Py_mod_create == 1
  • Py_bf_releasebuffer == Py_mod_exec == 2
  • Py_mp_ass_subscript == Py_mod_multiple_interpreters == 3
  • Py_mp_length == Py_mod_gil == 4

This proposal uses a single sequence for both, so future slots avoid this
overlap. This is to:

  • Avoid accidentally using type slots for modules, and vice versa
  • Allow external libraries or checkers to determine a slot’s meaning
    (and type) based on the ID.

The 4 existing overlaps mean we don’t reach these goals right now,
but we can gradually migrate to new numeric IDs in a way that’s transparent
to the user.

The main disadvantage is that any internal lookup tables will be bigger
(if we use separate ones for types & modules, they’ll contain blanks),
or harder to manage (if they’re merged).
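
As an illustration of what a single ID space buys a checker, here is a small sketch that classifies a slot purely by its number. Everything in it is hypothetical: the `DEMO_*` values are invented, and the four legacy overlapping IDs are deliberately left out because they cannot be classified this way.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_KIND_UNKNOWN,
    DEMO_KIND_ANY,      /* usable for both types and modules */
    DEMO_KIND_TYPE,
    DEMO_KIND_MODULE
} DemoKind;

/* Hypothetical, non-overlapping IDs for this demo only. */
enum {
    DEMO_slot_end = 0,
    DEMO_tp_name = 100,
    DEMO_tp_flags = 101,
    DEMO_mod_name = 102,
    DEMO_mod_doc = 103
};

/* With unique IDs, the number alone says what a slot is for. */
static DemoKind demo_slot_kind(uint16_t id)
{
    switch (id) {
    case DEMO_slot_end:
        return DEMO_KIND_ANY;
    case DEMO_tp_name:
    case DEMO_tp_flags:
        return DEMO_KIND_TYPE;
    case DEMO_mod_name:
    case DEMO_mod_doc:
        return DEMO_KIND_MODULE;
    default:
        return DEMO_KIND_UNKNOWN;
    }
}

/* Module creation can reject type slots up front instead of
 * reinterpreting their payload as something else. */
static int demo_usable_in_module(uint16_t id)
{
    DemoKind kind = demo_slot_kind(id);
    return kind == DEMO_KIND_ANY || kind == DEMO_KIND_MODULE;
}
```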

Supporting both NULL-terminated arrays and explicit sizes

In C, it is natural to write array literals terminated by a NULL/zero element.

An alternative is accepting non-terminated arrays with an associated element
count. Notably, this allows:

  • Treating a pointer as an array of a single element
  • Specifying an arbitrary subset of a larger array, without copying memory
  • Better performance if an extra loop is needed to count the elements

This proposal allows arrays-with-a-count as an alternative to zero-terminated
ones. The API for them is meant more for code generators than for people
hand-writing C literals.

Specification

The following functions will be added:

PyObject *PyType_FromSlots(PySlot *slots, Py_ssize_t n_slots);
PyObject *PyModule_FromSlots(PySlot *slots, Py_ssize_t n_slots);
PyObject *PyModuleDef_FromSlots(PySlot *slots, Py_ssize_t n_slots);

The first two create the corresponding
Python object from the given array of slots.
PyModuleDef_FromSlots creates a ModuleDef that describes multi-phase
initialization, to be returned from a PyInit_* function. (Its result
will be an internal subclass of PyModuleDef.)

The n_slots argument may be -1, which means slots is zero-terminated.
Otherwise, it gives the size of the array.
(In this case, zero entries inside the array are invalid.)
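
The two calling conventions could be normalized along these lines. This is only a sketch: `DemoSlot` is a cut-down stand-in for the proposed struct, `demo_slot_count` is a hypothetical helper, and rejecting negative counts other than -1 is my assumption, since the text above does not say what they mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the proposed slot struct. */
typedef struct {
    uint16_t sl_id;
    uint16_t sl_flags;
    const void *sl_ptr;
} DemoSlot;

/* A "zero entry": ID 0 with no flags set. */
static int demo_is_zero_entry(const DemoSlot *slot)
{
    return slot->sl_id == 0 && slot->sl_flags == 0;
}

/* Return the number of array elements to process, or -1 if the input is
 * invalid. n_slots == -1 means "zero-terminated"; any other negative
 * value is rejected (an assumption of this sketch). */
static ptrdiff_t demo_slot_count(const DemoSlot *slots, ptrdiff_t n_slots)
{
    assert(slots != NULL);
    if (n_slots == -1) {
        ptrdiff_t n = 0;
        while (!demo_is_zero_entry(&slots[n])) {
            n++;
        }
        return n;
    }
    if (n_slots < 0) {
        return -1;
    }
    /* Explicit size: a zero entry inside the array is an error. */
    for (ptrdiff_t i = 0; i < n_slots; i++) {
        if (demo_is_zero_entry(&slots[i])) {
            return -1;
        }
    }
    return n_slots;
}
```

Passing `slots + 1` with a count of 1 treats a pointer into a larger array as a single-element array, which is one of the uses listed in the Rationale.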

Slot structure

The PySlot structure will be defined as follows::

typedef struct PySlot {
    uint16_t sl_id;
    uint16_t sl_flags;
    union {
        uint32_t sl_array_size;
    };
    union {
        void *sl_ptr;
        void (*sl_func)(void);
        Py_ssize_t sl_size;
        int64_t sl_int64;
        uint64_t sl_uint64;
    };
} PySlot;

(The actual definition will be more complex, mainly for C/C++ compiler
compatibility.)

  • sl_id: A slot number, identifying what the slot does.
  • sl_flags: Flags, defined below.
  • A 32-bit union, whose meaning depends on sl_flags. This specification
    defines only one option:
    • sl_array_size: explicit array size; see PySlot_SIZED_ARRAY below
  • A union with the data, whose type depends on the slot.

General slot semantics

When slots are passed to a function that applies them, the function will not
modify the slot array, nor any data it points to (recursively).

After the function is done, the user is allowed to modify or deallocate the
array, and any data it points to (recursively), unless it’s explicitly marked
as “static” (see PySlot_STATIC below).
This means the interpreter typically needs to make a copy of all data
in the struct, including char * text.
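
Here is a sketch of what that copy-unless-static rule might look like for a string payload. The flag value `DEMO_SLOT_STATIC`, the cut-down `DemoSlot` struct and the `demo_retain_string` helper are all assumptions for illustration; the proposal does not assign bit values or describe the interpreter's internals.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SLOT_STATIC 0x02  /* hypothetical bit value */

/* Cut-down stand-in for the proposed slot struct, string payload only. */
typedef struct {
    uint16_t sl_id;
    uint16_t sl_flags;
    const char *sl_ptr;
} DemoSlot;

/* Return a string that stays valid after the caller of the slot API
 * modifies or frees its own data. Sets *owned to 1 if the result was
 * allocated here and must be released with free(), 0 otherwise.
 * Returns NULL if allocation fails. */
static const char *demo_retain_string(const DemoSlot *slot, int *owned)
{
    assert(slot != NULL && slot->sl_ptr != NULL && owned != NULL);
    *owned = 0;
    if (slot->sl_flags & DEMO_SLOT_STATIC) {
        /* The user promised the data is static: borrow it. */
        return slot->sl_ptr;
    }
    size_t len = strlen(slot->sl_ptr) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, slot->sl_ptr, len);
    *owned = 1;
    return copy;
}
```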

Flags

sl_flags may set the following bits. Unassigned bits must be set to zero.

  • PySlot_OPTIONAL: If the slot ID is unknown, the interpreter should
    ignore the slot entirely. (For example, if nb_matrix_multiply was being
    added to CPython now, your type could use this.)

  • PySlot_STATIC: The contents of the slot (and all data it points to,
    recursively) are statically allocated. Thus, the interpreter does not need
    to copy the information.
    Implied for function pointers.

  • PySlot_SIZED_ARRAY: sl_ptr points to an array, whose size is given
    in sl_array_size. Without this flag, arrays are zero-terminated
    (as with the existing Py_tp_members, for example).
    Must not be used for numbers or function pointers.

  • PySlot_SKIP_IF_NULL: Skip this slot if its data is NULL/zero. Intended
    for templated or auto-generated slots.
    (The check will be done before type specific handling, so all fields of
    the data union (sl_ptr, sl_int64, …) must be zeroed.)

  • PySlot_HAS_FALLBACK: If the slot ID is unknown, the interpreter will
    ignore the slot. If it’s known, it should ignore subsequent slots up to (and including)
    the first one without HAS_FALLBACK.

    Effectively, consecutive slots with the HAS_FALLBACK flag, plus the first
    non-HAS_FALLBACK slot after them, form a “block” where the interpreter
    will only consider the first slot in the block that it understands.
    If the entire block is to be optional, it should end with a Py_slot_end
    with the OPTIONAL flag.
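
The OPTIONAL and HAS_FALLBACK rules are easier to follow as code, so here is a rough sketch of one way to read them. The flag bit values, the two-field `DemoSlot` struct, and the pretend set of "known" IDs (1 through 9) are all assumptions for this demo. Treating an unknown slot with neither flag as an error is also my reading, not something the text spells out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the proposal does not assign numbers. */
#define DEMO_OPTIONAL     0x01
#define DEMO_HAS_FALLBACK 0x10

typedef struct {
    uint16_t sl_id;
    uint16_t sl_flags;
} DemoSlot;

/* Pretend this interpreter understands IDs 1..9 only. */
static int demo_is_known(uint16_t id)
{
    return id >= 1 && id <= 9;
}

/* ID 0 with no flags ends a zero-terminated array. */
static int demo_is_terminator(const DemoSlot *s)
{
    return s->sl_id == 0 && s->sl_flags == 0;
}

/* Store the IDs that would be applied in out[] (capacity `cap`).
 * Returns how many were applied, or -1 for an unknown slot that is
 * neither OPTIONAL nor HAS_FALLBACK, or if out[] is too small. */
static int demo_select(const DemoSlot *s, uint16_t *out, int cap)
{
    int n = 0;
    assert(s != NULL && out != NULL);
    while (!demo_is_terminator(s)) {
        uint16_t flags = s->sl_flags;
        if (s->sl_id == 0) {
            s++;                /* slot_end with flags set: ignored */
            continue;
        }
        if (!demo_is_known(s->sl_id)) {
            if (!(flags & (DEMO_OPTIONAL | DEMO_HAS_FALLBACK))) {
                return -1;
            }
            s++;                /* unknown but skippable */
            continue;
        }
        if (n == cap) {
            return -1;
        }
        out[n++] = s->sl_id;
        s++;
        if (flags & DEMO_HAS_FALLBACK) {
            /* Skip the rest of the block: any further HAS_FALLBACK
             * slots, then the first slot without the flag. */
            while (!demo_is_terminator(s)
                   && (s->sl_flags & DEMO_HAS_FALLBACK)) {
                s++;
            }
            if (!demo_is_terminator(s)) {
                s++;
            }
        }
    }
    return n;
}
```

For the array `{20, HAS_FALLBACK}, {5, HAS_FALLBACK}, {4}, {7}`, this picks 5 (the first understood slot of the block), skips 4, and then applies 7 as an ordinary slot.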

Convenience macros

The following macros will be added to the API::

#define PySlot_DATA(NAME, VALUE) \
   {.sl_id=Py_ ## NAME, .sl_ptr=VALUE}

#define PySlot_FUNC(NAME, VALUE) \
   {.sl_id=Py_ ## NAME, .sl_func=VALUE}

#define PySlot_SIZE(NAME, VALUE) \
   {.sl_id=Py_ ## NAME, .sl_size=VALUE}

#define PySlot_INT64(NAME, VALUE) \
   {.sl_id=Py_ ## NAME, .sl_int64=VALUE}

#define PySlot_UINT64(NAME, VALUE) \
   {.sl_id=Py_ ## NAME, .sl_uint64=VALUE}

#define PySlot_STATIC(NAME, VALUE) \
   {.sl_id=Py_ ## NAME, .sl_flags=PySlot_STATIC, .sl_ptr=VALUE}

#define PySlot_END {.sl_id=0}

New slot IDs

The following new slot IDs will be added:

  • Py_slot_end (just a new name for zero)
    • With sl_flags=0, marks the end of a zero-terminated slots array.
    • With sl_flags=PySlot_OPTIONAL, this slot is ignored.
  • Py_slot_subslots: array of PySlot structures, treated as if
    they appeared at this point in the array. (XXX: HAS_FALLBACK blocks can’t span subslots?)
  • Py_tp_slots: array of “legacy” PyType_Slot structures.
  • Py_mod_slots: array of “legacy” PyModuleDef_Slot structures.

New slots will be added to cover existing members of PyType_Spec and
PyModuleDef:

  • Py_tp_name (mandatory for type creation)
  • Py_tp_basicsize (of type Py_ssize_t!)
  • Py_tp_extra_basicsize (equivalent to setting PyType_Spec.basicsize
    to -extra_basicsize)
  • Py_tp_itemsize
  • Py_tp_flags
  • Py_mod_name (mandatory for module creation)
  • Py_mod_doc
  • Py_mod_size
  • Py_mod_methods
  • Py_mod_traverse
  • Py_mod_clear
  • Py_mod_free

New slots will have unique numbers (that is, Py_slot_*, Py_tp_*
and Py_mod_* won’t share IDs).

Slots numbered 1 through 4
(Py_bf_getbuffer through Py_mp_length, and Py_mod_create through Py_mod_gil)
will get new (larger) numbers.
The old numbers will remain as aliases, and will be used when compiling for
Python versions below 3.14.

Backwards Compatibility

Yes!
Forward compatibility too please!

Security Implications

None known

How to Teach This

Rewrite the “Extending and Embedding” tutorial to use this.

Reference Implementation

Not yet.

Rejected Ideas

Stuff that could be neat but is out of scope for this proposal:

  • PyType_ApplySlots
  • slots for adding constants (à la PyModule_AddIntConstant)
  • module slot to create a type and add it to a module

Open Issues

Add yours.

Copyright

The usual.

A few small comments:

The data union is going to be a bit awkward to use in C++ versions less than C++20 because the designated initializer .sl_uint64 = value syntax won’t work, which makes it hard to build static arrays of them. I don’t see an easy way around that. And I’m also not sure at what point it’s reasonable to say “just upgrade C++ if it’s a problem”, but I think C++20 might be a bit too recent for that.

Presumably PySlot_STATIC is intended as a suggestion rather than a requirement - you don’t guarantee that it won’t be copied?

How much overlap is there between the slots that types accept and the slots that modules accept? My impression is that there is very little. I’d be a bit worried that making them come from one big unified list of numbers implies more interchangeability than really exists. That’s probably something that people will work out fairly quickly though.

Named unions it is then.
(The convenience macros are needed anyway to make the simple cases convenient, so hopefully it can be a bit more cumbersome to write out the initializer.)

Yes.

No overlap; only Py_slot_subslots & Py_slot_end should be usable in both. (Plus the 4 that overlap now.)
So, if a user uses the wrong one they should get a meaningful exception, not a bunch of bits cast to the wrong type.

I’m not sure named unions make it either better or worse than anonymous unions.

The problem in C++ <20 is just that you can’t statically initialize anything except the first member of a union in C++ (because there’s no way to initialize the second member without having to initialize the first).

I don’t think that should really change your proposal - unions are clearly the right datatype to use. But it’s just a bit awkward without an up-to-date C++ compiler.

I suspect Cython wouldn’t use this initially because we’re trying not to impose too many unnecessary compiler requirements. But I’d probably happily use it in anything personal where I do control the compiler. That’s fine since the existing mechanisms aren’t going away.

Remove first ‘needs’.