PEP 697 – Limited C API for Extending Opaque Types

encukou · October 6, 2022, 2:18pm

Hello,
I’ve posted PEP 697, dealing with a pretty specific pain point when extending classes using the C API: namely, tight coupling to the superclass.
I floated the idea on python-dev in May and got positive feedback, so I wrote a PEP that tries to give (lots of) background and a possible API.
AFAIK the feature would be helpful for pybind11 (point 2 in this list) and HPy, but I still need to check if this iteration would be good for them.

PEP 697 text

PEP: 697
Title: Limited C API for Extending Opaque Types
Author: Petr Viktorin <encukou@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Aug-2022
Python-Version: 3.12


Abstract
========

Add `Limited C API <https://docs.python.org/3.11/c-api/stable.html#stable-application-binary-interface>`__
for extending types with opaque data,
by allowing code to only deal with data specific to a particular (sub)class.

Make the mechanism usable with ``PyHeapTypeObject``.


Motivation
==========

The motivating problem this PEP solves is creating metaclasses (subclasses of
:py:class:`python:type`) in “wrappers” – projects that expose another type
system (e.g. C++, Java, Rust) as Python classes.
These systems typically need to attach information about the “wrapped”
non-Python class to the Python type object -- that is, extend
``PyHeapTypeObject``.

This should be possible to do in the Limited API, so that these generators
can be used to create Stable ABI extensions. (See :pep:`652` for the benefits
of providing a stable ABI.)

Extending ``type`` is an instance of a more general problem:
extending a class while maintaining loose coupling – that is,
not depending on the memory layout used by the superclass.
(That's a lot of jargon; see Rationale for a concrete example of extending
``list``.)


Rationale
=========

Extending opaque types
----------------------

In the Limited API, most ``struct``\ s are opaque: their size and memory layout
are not exposed, so they can be changed in new versions of CPython (or
alternate implementations of the C API).

This means that the usual subclassing pattern -- making the ``struct``
used for instances of the *base* type be the first element of the ``struct``
used for instances of the *derived* type -- does not work.
To illustrate with code, the `example from the tutorial <https://docs.python.org/3.11/extending/newtypes_tutorial.html#subclassing-other-types>`_
extends :external+python:c:type:`PyListObject` (:py:class:`python:list`)
using the following ``struct``:

.. code-block:: c

    typedef struct {
        PyListObject list;
        int state;
    } SubListObject;

This won't compile in the Limited API, since ``PyListObject`` is opaque (to
allow changes as features and optimizations are implemented).

Instead, this PEP proposes using a ``struct`` with only the state needed
in the subclass, that is:

.. code-block:: c

    typedef struct {
        int state;
    } SubListState;

    // (or just `typedef int SubListState;` in this case)

The subclass can now be completely decoupled from the memory layout (and size)
of the superclass.

This is possible today. To use such a struct:

* when creating the class, use ``PyList_Type.tp_basicsize + sizeof(SubListState)``
  as ``PyType_Spec.basicsize``;
* when accessing the data, use ``PyList_Type.tp_basicsize`` as the offset
  into the instance (``PyObject*``).

However, this has disadvantages:

* The base's ``basicsize`` may not be properly aligned, causing issues
  on some architectures if not mitigated. (These issues can be particularly
  nasty if alignment changes in a new release.)
* ``PyTypeObject.tp_basicsize`` is not exposed in the
  Limited API, so extensions that support Limited API need to
  use ``PyObject_GetAttrString(obj, "__basicsize__")``.
  This is cumbersome, and unsafe in edge cases (the Python attribute can
  be overridden).
* Variable-size types are not handled (see `var-sized`_ below).

To make this easy (and even *best practice* for projects that choose loose
coupling over maximum performance), this PEP proposes an API to:

1. During class creation, specify that ``SubListState``
   should be “appended” to ``PyListObject``, without passing any additional
   details about ``list``. (The interpreter itself gets all necessary info,
   like ``tp_basicsize``, from the base).

   This will be specified by a negative ``PyType_Spec.basicsize``:
   ``-sizeof(SubListState)``.

2. Given an instance, and the subclass ``PyTypeObject*``,
   get a pointer to the ``SubListState``.
   A new function will be added for this.

The base class is not limited to ``PyListObject``, of course: it can be used to
extend any base class whose instance ``struct`` is opaque, unstable across
releases, or not exposed at all -- including :py:class:`python:type`
(``PyHeapTypeObject``) mentioned earlier, but also other extensions
(for example, NumPy arrays [#f1]_).

For cases where no additional state is needed, a zero ``basicsize`` will be
allowed: in that case, the base's ``tp_basicsize`` will be inherited.
(With the current API, the base's ``basicsize`` needs to be passed in.)

The ``tp_basicsize`` of the new class will be set to the computed total size,
so code that inspects classes will continue working as before.


.. _var-sized:

Extending variable-size objects
-------------------------------

Additional considerations are needed to subclass
:external+python:c:type:`variable-sized objects <PyVarObject>`
while maintaining loose coupling as much as possible.

Unfortunately, in this case we cannot decouple the subclass from its superclass
entirely.
There are two main memory layouts for variable-sized objects, and the
subclass's author needs to know which one the superclass uses.

In types such as ``int`` or ``tuple``, the variable data is stored at a fixed
offset.
If subclasses need additional space, it must be added after any variable-sized
data::

   PyTupleObject:
   ┌───────────────────┬───┬───┬╌╌╌╌┐
   │ PyObject_VAR_HEAD │var. data   │
   └───────────────────┴───┴───┴╌╌╌╌┘

   tuple subclass:
   ┌───────────────────┬───┬───┬╌╌╌╌┬─────────────┐
   │ PyObject_VAR_HEAD │var. data   │subclass data│
   └───────────────────┴───┴───┴╌╌╌╌┴─────────────┘

In other types, like ``PyHeapTypeObject``, variable-sized data always lives at
the end of the instance's memory area::

   heap type:
   ┌───────────────────┬──────────────┬───┬───┬╌╌╌╌┐
   │ PyObject_VAR_HEAD │Heap type data│var. data   │
   └───────────────────┴──────────────┴───┴───┴╌╌╌╌┘

   type subclass:
   ┌───────────────────┬──────────────┬─────────────┬───┬───┬╌╌╌╌┐
   │ PyObject_VAR_HEAD │Heap type data│subclass data│var. data   │
   └───────────────────┴──────────────┴─────────────┴───┴───┴╌╌╌╌┘

The first layout enables fast access to the items array.
The second allows subclasses to ignore the variable-sized array (assuming
they use offsets from the start of the object to access their data).
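
The difference between the two layouts can be modelled with plain offset
arithmetic. The following is a minimal standalone sketch, not CPython code:
``InstanceLayout`` and the four helper functions are hypothetical names made up
for illustration, and alignment padding is ignored to keep the model small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one instance; all sizes are in bytes. */
typedef struct {
    size_t head_size;      /* fixed-size part of the base, header included */
    size_t item_size;      /* size of one variable-sized item */
    size_t item_count;     /* number of items stored in this instance */
    size_t subclass_size;  /* extra data added by the subclass */
} InstanceLayout;

/* Tuple-like layout: the items start at a fixed offset ... */
size_t tuple_like_items_offset(const InstanceLayout *l)
{
    assert(l != NULL);
    return l->head_size;
}

/* ... so subclass data must follow the items and moves with item_count. */
size_t tuple_like_subclass_offset(const InstanceLayout *l)
{
    assert(l != NULL);
    return l->head_size + l->item_size * l->item_count;
}

/* Heap-type-like layout: subclass data sits at a fixed offset ... */
size_t items_at_end_subclass_offset(const InstanceLayout *l)
{
    assert(l != NULL);
    return l->head_size;
}

/* ... and the items are pushed to the end of the instance. */
size_t items_at_end_items_offset(const InstanceLayout *l)
{
    assert(l != NULL);
    return l->head_size + l->subclass_size;
}
```

In the second layout the offset of the subclass data does not depend on
``item_count``, which is why a subclass can ignore the variable-sized array.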

Which layout is used is, unfortunately, an implementation detail that the
subclass code must take into account.
Correspondingly, if a variable-sized type is designed to be extended in C,
its documentation should note the mechanism used.
Since this PEP focuses on ``PyHeapTypeObject``, it proposes API for the second
variant.

Like with fixed-size types, extending a variable-sized type is already
possible: when creating the class, ``base->tp_itemsize`` needs to be passed
as ``PyType_Spec.itemsize``.
This is cumbersome in the Limited API, where one needs to resort to
``PyObject_GetAttrString(obj, "__itemsize__")``, with the same caveats as for
``__basicsize__`` above.

This PEP proposes a mechanism to instruct the interpreter to do this on its
own, without the extension needing to read ``base->tp_itemsize``.

Several alternatives for this mechanism were rejected:

* The easiest way to do this would be to allow leaving ``itemsize`` as 0 to
  mean “inherit”.
  However, unlike ``basicsize``, zero is a valid value for ``itemsize`` --
  it marks fixed-sized types.
  Also, in C, zero is the default value used when ``itemsize`` is not specified.
  Since extending a variable-sized type requires *some* knowledge of the
  superclass, it would be a good idea to require a more explicit way
  to request it.
* It would be possible to reserve a special negative value like ``itemsize=-1``
  to mean “inherit”.
  But this would rule out a possible future where negative ``itemsize``
  more closely matches negative ``basicsize`` -- a request for
  additional space.
* A new flag would also work, but ``tp_flags`` is running out of free bits.
  Reserving one for a flag only used in type creation seems wasteful.

So, this PEP proposes a new :external+python:c:type:`PyType_Slot` to mark
that ``tp_itemsize`` should be inherited.
When this slot is used, ``itemsize`` must be set to zero.
Like with ``tp_basicsize``, ``tp_itemsize`` will be set to the computed value
as the class is created.


Normalizing the ``PyHeapTypeObject``-like layout
''''''''''''''''''''''''''''''''''''''''''''''''

Additionally, this PEP proposes a helper function to get the variable-sized
data of a given instance, assuming it uses the ``PyHeapTypeObject``-like layout.
This is mainly to make it easier to define and document such types.

This function will not be exposed in the Limited API.


Relative member offsets
-----------------------

One more piece of the puzzle is ``PyMemberDef.offset``.
Extensions that use a subclass-specific ``struct`` (``SubListState`` above)
will get a way to specify “relative” offsets -- offsets based on this ``struct``
-- rather than “absolute” ones (based on ``PyObject*``).

One way to do it would be to automatically assume “relative” offsets
if this PEP's API is used to create a class.
However, this implicit assumption may be too surprising.

To be more explicit, this PEP proposes a new flag for “relative” offsets.
At least initially, this flag will serve only as a check against misuse
(and a hint for reviewers).
It must be present if used with the new API, and must not be used otherwise.


Specification
=============

In the code blocks below, only function headers are part of the specification.
Other code (the size/offset calculations) is a detail of the initial CPython
implementation, and subject to change.

Relative ``basicsize``
----------------------

The ``basicsize`` member of ``PyType_Spec`` will be allowed to be zero or
negative.
In that case, its absolute value will specify the *extra* storage space that
instances of the new class require, in addition to the basicsize of the base class.
That is, the basicsize of the resulting class will be:

.. code-block:: c

   type->tp_basicsize = _align(base->tp_basicsize) + _align(-spec->basicsize);

where ``_align`` rounds up to a multiple of ``alignof(max_align_t)``.
When ``spec->basicsize`` is zero, ``base->tp_basicsize`` will be inherited
directly instead (i.e. set to ``base->tp_basicsize`` without aligning).
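
The three cases (positive, zero, negative) can be sketched as a small
standalone function. This is only a model of the arithmetic described above,
not the CPython implementation: ``align_up`` and ``resulting_basicsize`` are
hypothetical names, and the alignment is passed in explicitly (CPython would
use ``alignof(max_align_t)``) so the result does not depend on the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of alignment (both in bytes). */
ptrdiff_t align_up(ptrdiff_t size, ptrdiff_t alignment)
{
    assert(size >= 0 && alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

/* Model of how tp_basicsize is derived from PyType_Spec.basicsize. */
ptrdiff_t resulting_basicsize(ptrdiff_t base_basicsize,
                              ptrdiff_t spec_basicsize,
                              ptrdiff_t alignment)
{
    if (spec_basicsize > 0) {
        /* Existing behaviour: the spec gives the full instance size. */
        return spec_basicsize;
    }
    if (spec_basicsize == 0) {
        /* Inherit the base's size directly, without aligning. */
        return base_basicsize;
    }
    /* Negative: -spec_basicsize bytes of extra space after the base. */
    return align_up(base_basicsize, alignment)
         + align_up(-spec_basicsize, alignment);
}
```

For example, with a 40-byte base, 16-byte alignment and ``basicsize = -4``,
the model pads the base to 48 bytes and the extra space to 16 bytes, for a
total of 64.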

On an instance, the memory area specific to a subclass -- that is, the
“extra space” that subclass reserves in addition to its base -- will be available
through a new function, ``PyObject_GetTypeData``.
In CPython, this function will be defined as:

.. code-block:: c

   void *
   PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls) {
       return (char *)obj + _align(cls->tp_base->tp_basicsize);
   }

Another function will be added to retrieve the size of this memory area:

.. code-block:: c

   Py_ssize_t
   PyObject_GetTypeDataSize(PyTypeObject *cls) {
       return cls->tp_basicsize - _align(cls->tp_base->tp_basicsize);
   }

The new ``*Get*`` functions come with an important caveat, which will be
pointed out in documentation: They may only be used for classes created using
negative ``PyType_Spec.basicsize``. For other classes, their behavior is
undefined.
(Note that this allows the above code to assume ``cls->tp_base`` is not
``NULL``.)
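
To see how the two functions relate, here is a standalone model that does not
need ``Python.h``. Everything in it is an assumption made for illustration:
``MockType`` stands in for ``PyTypeObject``, the ``mock_*`` functions mirror
the definitions above, and the alignment is fixed at 16 bytes rather than
taken from ``alignof(max_align_t)``.

```c
#include <assert.h>
#include <stddef.h>

enum { MOCK_ALIGNMENT = 16 };  /* assumed value, for illustration only */

/* Hypothetical stand-in for the two PyTypeObject fields used here. */
typedef struct MockType {
    const struct MockType *tp_base;
    ptrdiff_t tp_basicsize;
} MockType;

/* Round size up to a multiple of MOCK_ALIGNMENT. */
ptrdiff_t mock_align(ptrdiff_t size)
{
    assert(size >= 0);
    return (size + MOCK_ALIGNMENT - 1) / MOCK_ALIGNMENT * MOCK_ALIGNMENT;
}

/* Pointer to the memory that belongs to cls (not to its base). */
void *mock_get_type_data(void *obj, const MockType *cls)
{
    assert(cls != NULL && cls->tp_base != NULL);
    return (char *)obj + mock_align(cls->tp_base->tp_basicsize);
}

/* Size of that memory area, including any alignment padding. */
ptrdiff_t mock_get_type_data_size(const MockType *cls)
{
    assert(cls != NULL && cls->tp_base != NULL);
    return cls->tp_basicsize - mock_align(cls->tp_base->tp_basicsize);
}
```

With a 40-byte base and a subclass whose total size came out as 64 bytes, the
model places the subclass data at offset 48 and reports 16 bytes of space,
even if the subclass asked for fewer.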


Inheriting ``itemsize``
-----------------------

If a new slot, ``Py_tp_inherit_itemsize``, is present in
``PyType_Spec.slots``, the new class will inherit
the base's ``tp_itemsize``.

If this is the case, CPython will assert that:

* ``PyType_Spec.itemsize`` is set to zero.
* The ``Py_tp_inherit_itemsize`` slot's
  ``~PyType_Slot.pfunc`` is set to NULL.

A new function, ``PyObject_GetItemData``, will be added to safely access the
memory reserved for items, taking subclasses that extend ``tp_basicsize``
into account.
In CPython it will be defined as:

.. code-block:: c

   void *
   PyObject_GetItemData(PyObject *obj) {
       return (char *)obj + Py_TYPE(obj)->tp_basicsize;
   }

This function will *not* be added to the Limited API.

Note that it **is not safe** to use **any** of the functions added in this PEP
unless **all classes in the inheritance hierarchy** only use
``PyObject_GetItemData`` (or an equivalent) for per-item memory, or don't
use per-item memory at all.
(This issue already exists for most current classes that use variable-length
arrays in the instance struct, but it's much less obvious if the base struct
layout is unknown.)

The documentation for all API added in this PEP will mention
the caveat.


Relative member offsets
-----------------------

In types defined using negative ``PyType_Spec.basicsize``, the offsets of
members defined via ``Py_tp_members`` must be relative to the
extra subclass data, rather than the full ``PyObject`` struct.
This will be indicated by a new flag, ``PY_RELATIVE_OFFSET``.

In the initial implementation, the new flag will be redundant. It only serves
to make the offset's changed meaning clear, and to help avoid mistakes.
It will be an error to *not* use ``PY_RELATIVE_OFFSET`` with negative
``basicsize``, and it will be an error to use it in any other context
(i.e. direct or indirect calls to ``PyDescr_NewMember``, ``PyMember_GetOne``,
``PyMember_SetOne``).

CPython will adjust the offset and clear the ``PY_RELATIVE_OFFSET`` flag when
initializing a type.
This means that the created type's ``tp_members`` will not match the input
definition's ``Py_tp_members`` slot, and that any code that reads
``tp_members`` will not need to handle the flag.
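
The adjustment can be pictured with a small standalone model. It is a sketch
under stated assumptions rather than the CPython code: ``MockMemberDef``,
``MOCK_RELATIVE_OFFSET`` (with an arbitrary bit value) and
``mock_resolve_member`` are hypothetical names, and the already-aligned size
of the base is passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_RELATIVE_OFFSET 0x8  /* arbitrary bit chosen for this model */

/* Hypothetical stand-in for the PyMemberDef fields used here. */
typedef struct {
    const char *name;
    ptrdiff_t offset;
    int flags;
} MockMemberDef;

/* Turn a relative offset into an absolute one and clear the flag.
 * Returns 0 on success, -1 if the flag does not match the basicsize sign. */
int mock_resolve_member(MockMemberDef *member,
                        ptrdiff_t spec_basicsize,
                        ptrdiff_t aligned_base_size)
{
    assert(member != NULL);
    int relative = (member->flags & MOCK_RELATIVE_OFFSET) != 0;
    if (relative != (spec_basicsize < 0)) {
        /* Flag missing with negative basicsize, or used without it. */
        return -1;
    }
    if (relative) {
        member->offset += aligned_base_size;
        member->flags &= ~MOCK_RELATIVE_OFFSET;
    }
    return 0;
}
```

After a successful call the member looks like any other absolute-offset
member, which is why code reading ``tp_members`` never sees the flag.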


Changes to ``PyTypeObject``
---------------------------

Internally in CPython, access to ``PyTypeObject`` “items”
(``_PyHeapType_GET_MEMBERS``) will be changed to use ``PyObject_GetItemData``.
Note that the current implementation is equivalent: it only lacks the
alignment adjustment.
The macro is used a few times in type creation, so no measurable
performance impact is expected.
Public API for this data, ``tp_members``, will not be affected.


List of new API
===============

The following new functions/values are proposed.

These will be added to the Limited API/Stable ABI:

* ``void * PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls)``
* ``Py_ssize_t PyObject_GetTypeDataSize(PyTypeObject *cls)``
* ``Py_tp_inherit_itemsize`` slot for ``PyType_Spec.slots``

These will be added to the public C API only:

* ``void *PyObject_GetItemData(PyObject *obj)``


Backwards Compatibility
=======================

No backwards compatibility concerns are known.


Assumptions
===========

The implementation assumes that an instance's memory
between ``type->tp_base->tp_basicsize`` and ``type->tp_basicsize`` offsets
“belongs” to ``type`` (except variable-length types).
This is not documented explicitly, but CPython up to version 3.11 relied on it
when adding ``__dict__`` to subclasses, so it should be safe.


Security Implications
=====================

None known.


Endorsements
============

XXX: The PEP mentions wrapper libraries, so it should get review/endorsement
from nanobind, PyO3, JPype, PySide &c.

XXX: HPy devs might also want to chime in.


How to Teach This
=================

The initial implementation will include reference documentation
and a What's New entry, which should be enough for the target audience
-- authors of C extension libraries.


Reference Implementation
========================

XXX: Not quite ready yet


Possible Future Enhancements
============================

Alignment
---------

The proposed implementation may waste some space if instance structs
need smaller alignment than ``alignof(max_align_t)``.
Also, dealing with alignment makes the calculation slower than it could be
if we could rely on ``base->tp_basicsize`` being properly aligned for the
subtype.

In other words, the proposed implementation focuses on safety and ease of use,
and trades space and time for it.
If it turns out that this is a problem, the implementation can be adjusted
without breaking the API:

- The offset to the type-specific buffer can be stored, so
  ``PyObject_GetTypeData`` effectively becomes
  ``(char *)obj + cls->ht_typedataoffset``, possibly speeding things up at
  the cost of an extra pointer in the class.
- Then, a new ``PyType_Slot`` can specify the desired alignment, to
  reduce space requirements for instances.
- Alternatively, it might be possible to align ``tp_basicsize`` up at class
  creation/readying time.


Rejected Ideas
==============

None yet.


Open Issues
===========

Is negative basicsize the way to go? Should this be enabled by a flag instead?


Footnotes
=========

.. [#f1] This PEP does not make it “safe” to subclass NumPy arrays specifically.
   NumPy publishes `an extensive list of caveats <https://numpy.org/doc/1.23/user/basics.subclassing.html>`__
   for subclassing its arrays from Python, and extending in C might need
   a similar list.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

wjakob · October 6, 2022, 2:36pm

Hi @encukou,

thank you for this! It looks great, and it would put current practice for stashing data in PyTypeObject onto a more solid foundation. Of course it’s fine for the PEP to mention nanobind. If there is a reference implementation, I’d be happy to port nanobind to this and give more technical feedback. For now, I cannot see a major issue.

Thanks,
Wenzel

fangerer · October 10, 2022, 4:05pm

Hi @encukou,

thank you for the PEP. As @wjakob already wrote, it looks good to me in general as well.
This will definitely help C extensions, and HPy in particular.

There are just two minor points I want to bring up here:

1. You proposed to use a negative basicsize (or alternatively some flag) to mark it as relative. Is that really necessary? In the limited API, a custom type structure will not embed the base type. So, IMO, the sign does not add any extra info, right? The only reason I see is to keep the type spec compatible with the unlimited API (I mean, if one does not use the limited API; not sure what you call it).
2. I’m slightly concerned that the function PyObject_GetTypeData is a bit too expensive in certain situations: where the pointer computation would usually be a plain offset computation, it now needs to load the type, the base and the base’s basicsize to compute the pointer. But I assume that if extension authors are aware of this, it should be acceptable, since the cost of loading the data pointer can often be amortized.

Again, thank you for that work. I’m looking forward to having that.

– Florian

encukou · October 11, 2022, 10:30am

The Limited API is a subset of the “unlimited” API, so this reason is rather important :‍)

Do you think it would be OK to store the offset on the class? You’d still need to load the type & the offset from it, but not the base. The important part is that the API would not change that way.
Do you have another possible way to do this, which would need a different API?

The PEP shows a naive implementation because:

The code nicely illustrates what’s happening (if you know how CPython does things currently), so it’s useful in the PEP even if the actual implementation ends up entirely different
I’d like to test an implementation that pythoncapi-compat, Cython or HPy can easily backport to earlier versions (using either non-limited API or a slow getattr).
I don’t know if the speedup is worth adding an extra field to every type object.

encukou · October 19, 2022, 4:30pm

An initial implementation is in my extend-opaque branch. I’ll rebase it as I work on it.
It’s currently missing:

tests for Relative member offsets
PyType_GetSlot safeguard (see below)
docs for everything

Unsurprisingly, while writing it I found some errata for the PEP:

The slot name should be Py_slot_inherit_itemsize, since Py_tp_* slots should correspond to actual tp_* members of the type struct.
Using Py_slot_inherit_itemsize with PyType_GetSlot will raise SystemError.
The result of PyObject_GetTypeDataSize may be higher than requested by -basicsize (e.g. due to alignment). It is safe to use all of it (e.g. with memset).
[edit] Mention that basicsize == 0 always worked. I’ll add explicit docs & tests though.

wjakob · October 19, 2022, 9:37pm

Hi @encukou,

I’ve started playing around with this. Two first observations:

I think that there is a typo: Py_tp_inherit_itemsize is mentioned in the PEP, but it’s called Py_slot_inherit_itemsize in your branch. Edit: I see now that you actually mentioned this above.

After compiling your branch in debug mode and running the nanobind test suite in ABI3 mode, I get an assertion failure very early on when nanobind creates various internal types that are needed for the framework. Specifically, it creates a custom subclass of the property type that is internally used to implement static fields of C++ classes.

This type is defined as

static PyType_Slot nb_static_property_slots[] = {
    { Py_tp_base, &PyProperty_Type /* lookup for this pointer is actually done at runtime */ },
    { Py_tp_descr_get, (void *) nb_static_property_get },
    { 0, nullptr }
};

static PyType_Spec nb_static_property_spec = {
    .name = "nanobind.nb_static_property",
    .flags = Py_TPFLAGS_DEFAULT,
    .slots = nb_static_property_slots
};

In particular, itemsize and basicsize are zero in the PyType_Spec type. This now asserts in typeobject.c:3604:

* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #4: 0x0000000100237950 python3` PyType_FromMetaclass.cold.4  + 40 at typeobject.c:3604
   3601                     assert(spec->basicsize <= 0);
   3602                 }
   3603                 else {
-> 3604                     assert(spec->basicsize > 0);
   3605                 }
   3606             }
   3607             break;

This used to work. I am curious about what is the right way to handle this. Basically the storage layout is identical to property, all it does is to bend around Py_tp_descr_get to implement something differently.

wjakob · October 28, 2022, 3:41pm

Hi @encukou — any ideas regarding the above? I would love to track this down, it’s the last missing bit for Python 3.12 limited API bliss

encukou · October 31, 2022, 3:35pm

Sorry for the delay.

This is weird. Did “inheriting” the size by setting PyType_Spec.basicsize to 0 always work? AFAIK, it’s not documented or tested. Well, at least the PEP will add the docs/tests…

The assert failure is weird: that’s code for Py_tp_members, not descr_get. Are you sure it’s coming from initialization of nb_static_property?
… Oh, I see: the nb_static_property_slots is actually different than in your example – it also contains Py_tp_methods and Py_tp_members, which are copied from PyProperty_Type. Why can’t these be inherited from property?

encukou · October 31, 2022, 4:36pm

I’ve removed the assert from my branch – the PEP doesn’t call for it.
Still, I think it would be best to not rely on PyType_GetSlot(&PyProperty_Type, Py_tp_members) – in the future, property might decide to initialize its descriptors in a different way.

wjakob · November 1, 2022, 9:35am

Hi @encukou,

many thanks for looking into this. Here is a two-part response:

Would you find it reasonable to add the PyType_Spec.basicsize == 0 special case to the specification in your PEP? You’ve now added a way of extending the basicsize without knowing the size of the original type, but not wanting to extend that size represents an (IMO) important use case of such an API.

FWIW, nanobind has been doing that all this time down to its minimum supported version (Python 3.8). This has been tested on Linux/macOS/Windows by the CI & test suite. As far as I can tell, this would standardize behavior that is already functional in present Python version.
Why the hackery of looking up and then reassigning Py_tp_members and Py_tp_methods?

This is something I don’t quite understand myself, perhaps you have an idea? nanobind needs to create a custom property type to deal with some subtleties related to static class properties. But that is not even the issue, it seems to me that anything deriving from property using PyType_FromSpec is broken by default. Let me show you what I mean – let’s create a simple property with a dummy getter.
```
>>> property(lambda x:5)
```
(yay, no error message)

In contrast, here is a tiny CPython API extension module (not even using nanobind): inheritance_issue/inheritance_issue.c at master · wjakob/inheritance_issue · GitHub, which you can install with
```
pip install git+https://github.com/wjakob/inheritance_issue
```
All this extension does is to call PyType_FromSpec to create a new type inheritance_issue.my_property that inherits from property. If no docstring is specified via the doc=.. parameter (which is a case I need to handle), this now fails:
```
>>> import inheritance_issue
>>> inheritance_issue.my_property(lambda x:5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'nanobind.my_property' object attribute '__doc__' is read-only
```
In the C file inheritance_issue/inheritance_issue.c at master · wjakob/inheritance_issue · GitHub, there is an #ifdef FIX_ISSUE, which looks up tp_methods and tp_members from the base type and reassigns them. If you set this define and recompile, then it works.
```
>>> import inheritance_issue
>>> inheritance_issue.my_property(lambda x:5)
```
I would love to remove this workaround with a better solution. In any case, this perhaps provides more context on why it is there.

encukou · November 1, 2022, 12:22pm

It’s already there :‍)

That is curious. I’ll take a look. Thank you for the sans-nanobind reproducer!
It looks like the workaround isn’t needed for Py_tp_methods, could you start by removing that?

wjakob · November 1, 2022, 2:47pm

Great!

Indeed, I can delete the workaround for Py_tp_methods and the test suite still passes.

encukou · November 1, 2022, 2:57pm

I filed this as issue #98963. It’s a nasty one, due to the way the __doc__ descriptor interacts with the __doc__ class-level attribute :‍(
I guess your workaround is the best you can do now. Other options aren’t too appealing:

if we decide to ignore the AttributeError, do nothing (on fixed Python versions), but I don’t think that’s a great option for CPython
if you don’t mind nb_static_property objects being mutable:
- use Py_TPFLAGS_MANAGED_DICT if/when it makes its way into the limited API
- before that, use tp_dictoffset, and allocate extra space for a dict pointer
add your own __doc__ member (and allocate space for the pointer)

wjakob · November 1, 2022, 9:36pm

Thanks for tracking this down! Adding a custom __doc__ member sounds like perfectly fine workaround. I don’t think the extra space matters at all in this case.

I have adapted nanobind to use the functionality in this PEP via your extend-opaque branch. The result is committed in nanobind’s limited_api branch (GitHub - wjakob/nanobind at limited_api, specifically the top two commits). The set of changes entails:

Creating the static property type using a negative basicsize along with a Py_tp_members for the __doc__ field that is referenced using the new PY_RELATIVE_OFFSET flag.
Using negative basicsize to add extra storage to instances of nanobind’s type metaclasses (nb_type, nb_enum).
Using PyObject_GetTypeData() to get access to said extra storage.
Generally refactoring the codebase a bit to rely on auto-inheritance of GC slots (tp_traverse, tp_clear) rather than manually copy-pasting them from parent to child classes via PyType_GetSlot and PyType_FromSpec.

With these changes, the test suite passes with a debug build of nanobind and a debug build of your branch. I had to slightly cheat by cherry-picking the PR that made outgoing vector calls limited API-compatible on top of your changes.

It wasn’t 100% clear to me what the final convention for inheriting both basicsize and/or itemsize is. Can both be set to zero to request this? I am doing so now, and it appears to work. It’s possible that the PEP document is slightly out of sync with what is done in the PR – for example, the Py_tp_inherit_itemsize is mentioned in the PEP but does not appear in your branch.

There are also some places in the implementation where I still need to call PyType_GetSlot(). It was not clear to me to what extent that is a potential limited API violation. Just to give an example, let’s look at the basic example of a tp_dealloc function implementation from the CPython documentation: Type Objects — Python 3.11.0 documentation

static void foo_dealloc(foo_object *self) {
    PyTypeObject *tp = Py_TYPE(self);
    // free references and buffers here
    tp->tp_free(self);
    Py_DECREF(tp);
}

In the limited API, the tp->tp_free access isn’t legal, and one needs to use PyType_GetSlot(tp, Py_tp_free). Is that okay?

By the way: the main function of the PR I am not using in my changes is PyObject_GetTypeDataSize(), but it could of course still be useful for other applications.

encukou · November 2, 2022, 12:28pm

My current thinking is that Py_slot_inherit_itemsize (renamed from Py_tp_*) will be required to inherit from variable-sized types using negative basicsize.

With basicsize > 0, the subclass presumably “knows” the superclass’s memory layout – including any variable part.
With basicsize==0, the subclass either doesn’t use the underlying memory directly, or “knows” the layout.
With basicsize < 0, you can subclass an arbitrary class without knowing much about it. It’s dangerous to do this with tuple-like classes, so I’ll add a safeguard – requiring Py_slot_inherit_itemsize if the superclass has tp_itemsize>0.

It’s not. These are often cases where CPython could provide better API, and in the far future we might need to deprecate certain slots, but PyType_GetSlot itself is perfectly fine.

By the way: the main function of the PR I am not using in my changes is PyObject_GetTypeDataSize(), but it could of course still be useful for other applications.

That makes sense. Guess we could do without it, but it shouldn’t be a burden, and I prefer to expose introspection tools.

encukou · November 16, 2022, 12:36pm

Writing the docs for this, I found that the Py_slot_inherit_itemsize is hard to explain. That suggests it’s a bad idea.

So, I’m thinking about dropping the slot and going back to a type flag – something like this:

a class can set Py_TPFLAGS_ITEMS_AT_END to say its variable-sized data is at the end of the instance memory (rather than a fixed offset). The flag is inherited in subclasses.
if a class has tp_itemsize>0, then to subclass it with basicsize < 0 either:
- the superclass must have Py_TPFLAGS_ITEMS_AT_END set, or
- Py_TPFLAGS_ITEMS_AT_END must be given in the spec (here, the programmer is asserting that the base class layout is compatible, even though the base class doesn’t (yet) advertise it using the new flag).

This seems much friendlier – and easier to explain – to authors of both base and derived classes.
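
As a rough illustration, the rule above boils down to a single predicate. This is a standalone sketch, not code from the branch: the function name is made up, and MOCK_TPFLAGS_ITEMS_AT_END uses an arbitrary bit rather than whatever value the real flag would get.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_TPFLAGS_ITEMS_AT_END (1UL << 0)  /* arbitrary bit for this model */

/* Returns 1 if a base with the given itemsize and flags may be extended
 * using a negative basicsize, given the flags in the subclass's spec. */
int can_extend_with_negative_basicsize(ptrdiff_t base_itemsize,
                                       unsigned long base_flags,
                                       unsigned long spec_flags)
{
    assert(base_itemsize >= 0);
    if (base_itemsize == 0) {
        /* Fixed-size base: there are no items to collide with. */
        return 1;
    }
    /* Variable-sized base: either the base advertises items-at-end,
     * or the subclass author asserts it in the spec. */
    return ((base_flags | spec_flags) & MOCK_TPFLAGS_ITEMS_AT_END) != 0;
}
```

A tuple-like base (itemsize > 0, flag not set anywhere) is the case this rejects.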

wjakob · November 17, 2022, 11:30pm

I can’t really imagine a case where a subclass would want to mess with the items themselves. To me it seems that itemsize must always be inherited so that the binary layout makes sense for both the parent and child class. The only thing the subclass can do is to re-use the item memory layout as is and potentially append to the basicsize, and that memory region is separate from the items.

Likely you have more advanced use cases in mind. I would be curious to hear about them.

encukou · November 21, 2022, 4:35pm

I can’t really imagine a case where a subclass would want to mess with the items themselves. To me it seems that itemsize must always be inherited so that the binary layout makes sense for both the parent and child class. The only thing the subclass can do is to re-use the item memory layout as is and potentially append to the basicsize, and that memory region is separate from the items.

All true, but that’s not the issue I’m trying to solve.
The API makes it easy to extend an arbitrary class – one you don’t know anything about. I’d like to make it hard to extend an incompatible class. Like a tuple, for example.
You should definitely always use PyTuple_* API to manipulate tuple contents, and not mess with the item size.

The issue is that you can’t extend tuple using the mechanism I’m proposing, because it stores the items in the place PyObject_GetTypeData expects the subclass’s data. And there’s currently no way a class can advertise whether it uses tuple-like layout (items at fixed offset), or a PyHeapTypeObject-like one (items at end). Only the latter works with the API I’m proposing.

encukou · December 1, 2022, 4:31pm

I’ve updated the PEP.

The changes include:

Use a flag, Py_TPFLAGS_ITEMS_AT_END, rather than a slot. This way the subclass doesn’t need to worry about items (if the superclass is set up right).
The result of PyObject_GetTypeDataSize may be higher than requested by -basicsize (e.g. due to alignment). It is safe to use all of it (e.g. with memset).
Mention that basicsize == 0 and itemsize == 0 already work. (I’ll add explicit docs & tests though.)

encukou · January 12, 2023, 3:19pm

Submitted to the SC: PEP 697 – Limited C API for Extending Opaque Types · Issue #161 · python/steering-council · GitHub