PEP 697 – Limited C API for Extending Opaque Types

It’s already there :‍)

That is curious. I’ll take a look. Thank you for the sans-nanobind reproducer!
It looks like the workaround isn’t needed for Py_tp_methods, could you start by removing that?

1 Like

Great!

Indeed, I can delete the workaround for Py_tp_methods and the test suite still passes.

I filed this as issue #98963. It’s a nasty one, due to the way the __doc__ descriptor interacts with the __doc__ class-level attribute :‍(
I guess your workaround is the best you can do now. Other options aren’t too appealing:

  • if we decide to ignore the AttributeError in CPython, you could do nothing (on fixed Python versions) – but I don’t think that’s a great option for CPython
  • if you don’t mind nb_static_property objects being mutable:
    • use Py_TPFLAGS_MANAGED_DICT if/when it makes its way into the limited API
    • before that, use tp_dictoffset, and allocate extra space for a dict pointer
  • add your own __doc__ member (and allocate space for the pointer)
2 Likes

Thanks for tracking this down! Adding a custom __doc__ member sounds like a perfectly fine workaround. I don’t think the extra space matters at all in this case.

I have adapted nanobind to use the functionality in this PEP via your extend-opaque branch. The result is committed in nanobind’s limited_api branch (GitHub - wjakob/nanobind at limited_api, specifically the top two commits). The set of changes entails:

  • Creating the static property type using a negative basicsize along with a Py_tp_members for the __doc__ field that is referenced using the new PY_RELATIVE_OFFSET flag.
  • Using negative basicsize to add extra storage to instances of nanobind’s type metaclasses (nb_type, nb_enum).
  • Using PyObject_GetTypeData() to get access to said extra storage.
  • Generally refactoring the codebase a bit to rely on auto-inheritance of GC slots (tp_traverse, tp_clear) rather than manually copy-pasting them from parent to child classes via PyType_GetSlot and PyType_FromSpec.
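For reference, the first bullet looks roughly like this in C. This is a hedged sketch rather than nanobind’s actual code: the struct and variable names are made up, and I’m using the PY_RELATIVE_OFFSET spelling from the branch.

```c
#include <Python.h>
#include <structmember.h>
#include <stddef.h>

/* Hypothetical per-instance data appended to the base property type. */
typedef struct {
    PyObject *doc;  /* storage backing the custom __doc__ member */
} propdata;

static PyMemberDef propdata_members[] = {
    /* offsetof() here is relative to propdata, not to the whole
     * instance -- that is what PY_RELATIVE_OFFSET requests */
    {"__doc__", T_OBJECT, offsetof(propdata, doc), PY_RELATIVE_OFFSET, NULL},
    {NULL},
};

static PyType_Slot propdata_slots[] = {
    {Py_tp_members, propdata_members},
    {0, NULL},
};

static PyType_Spec static_property_spec = {
    .name = "nanobind.nb_static_property",        /* illustrative name */
    /* negative: request sizeof(propdata) extra bytes on top of the
     * base class, wherever the runtime decides to place them */
    .basicsize = -(int)sizeof(propdata),
    .flags = Py_TPFLAGS_DEFAULT,
    .slots = propdata_slots,
};
```

The type itself would then be created with PyType_FromSpecWithBases as usual.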

With these changes, the test suite passes with a debug build of nanobind and a debug build of your branch :tada:. I had to cheat slightly by cherry-picking the PR that made outgoing vector calls limited-API-compatible on top of your changes.

It wasn’t 100% clear to me what the final convention for inheriting basicsize and/or itemsize is. Can both be set to zero to request this? I am doing so now, and it appears to work. It’s possible that the PEP document is slightly out of sync with what is done in the PR – for example, Py_tp_inherit_itemsize is mentioned in the PEP but does not appear in your branch.

There are also some places in the implementation where I still need to call PyType_GetSlot(). It was not clear to me to what extent that is a potential limited API violation. Just to give an example, let’s look at the basic example of a tp_dealloc function implementation from the CPython documentation: Type Objects — Python 3.11.0 documentation

static void foo_dealloc(foo_object *self) {
    PyTypeObject *tp = Py_TYPE(self);
    // free references and buffers here
    tp->tp_free(self);
    Py_DECREF(tp);
}

In the limited API, the tp->tp_free access isn’t legal, and one needs to use PyType_GetSlot(tp, Py_tp_free). Is that okay?
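Concretely, my adaptation of the example above looks something like this (my own sketch, not an official snippet):

```c
#include <Python.h>

/* Limited-API variant of the dealloc example: fetch tp_free through
 * PyType_GetSlot() instead of reading tp->tp_free directly. */
static void foo_dealloc(PyObject *self) {
    PyTypeObject *tp = Py_TYPE(self);
    /* free references and buffers here */
    freefunc tp_free = (freefunc)PyType_GetSlot(tp, Py_tp_free);
    tp_free(self);
    Py_DECREF(tp);  /* instances of heap types hold a reference to their type */
}
```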

By the way: the main function of the PR I am not using in my changes is PyObject_GetTypeDataSize(), but it could of course still be useful for other applications.

My current thinking is that Py_slot_inherit_itemsize (renamed from py_tp_*) will be required to inherit from variable-sized types using negative basicsize.

  • With basicsize > 0, the subclass presumably “knows” the superclass’s memory layout – including any variable part
  • With basicsize == 0, the subclass either doesn’t use the underlying memory directly, or “knows” the layout.
  • With basicsize < 0, you can subclass an arbitrary class without knowing much about it. It’s dangerous to do this with tuple-like classes, so I’ll add a safeguard – requiring Py_slot_inherit_itemsize if the superclass has tp_itemsize > 0.

It’s not. These are often cases where CPython could provide better API, and in the far future we might need to deprecate certain slots, but PyType_GetSlot itself is perfectly fine.

By the way: the main function of the PR I am not using in my changes is PyObject_GetTypeDataSize(), but it could of course still be useful for other applications.

That makes sense. Guess we could do without it, but it shouldn’t be a burden, and I prefer to expose introspection tools.

2 Likes

Writing the docs for this, I found that Py_slot_inherit_itemsize is hard to explain. That suggests it’s a bad idea.

So, I’m thinking about dropping the slot and going back to a type flag – something like this:

  • a class can set Py_TPFLAGS_ITEMS_AT_END to say its variable-sized data is at the end of the instance memory (rather than a fixed offset). The flag is inherited in subclasses.
  • if a class has tp_itemsize > 0, then to subclass it with basicsize < 0, either:
    • the superclass must have Py_TPFLAGS_ITEMS_AT_END set, or
    • Py_TPFLAGS_ITEMS_AT_END must be given in the spec (here, the programmer is asserting that the base class layout is compatible, even though the base class doesn’t (yet) advertise it using the new flag).

This seems much friendlier – and easier to explain – to authors of both base and derived classes.

1 Like

I can’t really imagine a case where a subclass would want to mess with the items themselves. To me it seems that itemsize must always be inherited so that the binary layout makes sense for both the parent and child class. The only thing the subclass can do is to re-use the item memory layout as is and potentially append to the basicsize, and that memory region is separate from the items.

Likely you have more advanced use cases in mind. I would be curious to hear about them.

1 Like

I can’t really imagine a case where a subclass would want to mess with the items themselves. To me it seems that itemsize must always be inherited so that the binary layout makes sense for both the parent and child class. The only thing the subclass can do is to re-use the item memory layout as is and potentially append to the basicsize, and that memory region is separate from the items.

All true, but that’s not the issue I’m trying to solve.
The API makes it easy to extend an arbitrary class – one you don’t know anything about. I’d like to make it hard to extend an incompatible class. Like a tuple, for example.
You should definitely always use PyTuple_* API to manipulate tuple contents, and not mess with the item size.

The issue is that you can’t extend tuple using the mechanism I’m proposing, because it stores the items in the place PyObject_GetTypeData expects the subclass’s data. And there’s currently no way a class can advertise whether it uses tuple-like layout (items at fixed offset), or a PyHeapTypeObject-like one (items at end). Only the latter works with the API I’m proposing.

1 Like

I’ve updated the PEP.

The changes include:

  • Use a flag, Py_TPFLAGS_ITEMS_AT_END, rather than a slot. This way the subclass doesn’t need to worry about items (if the superclass is set up right).
  • The result of PyObject_GetTypeDataSize may be higher than requested by -basicsize (e.g. due to alignment). It is safe to use all of it (e.g. with memset).
  • Mention that basicsize == 0 and itemsize == 0 already work. (I’ll add explicit docs & tests though.)
3 Likes

Submitted to the SC: PEP 697 – Limited C API for Extending Opaque Types · Issue #161 · python/steering-council · GitHub

4 Likes

Great, I am happy to see this submitted. I have one clarifying question with regard to the decision tree lines

Py_TPFLAGS_ITEMS_AT_END used: itemsize is inherited.
Py_TPFLAGS_ITEMS_AT_END not used: Fail. (Possible conflict.)

This flag could be specified in two different places (spec and base). Does it need another sub-decision tree, or is it okay if either class has the flag?

It’s okay if either class has the flag. It’s in the previous section, so I hope the simple “used” is OK for the big-picture decision tree.

2 Likes

Here’s my belated critique of PEP 697.

In summary

Overall, I agree that this is a problem that needs solving, and the idea
of decoupling the structs needed for extension classes from the object layout
is important.

However, the details are too tied to the current VM implementation and will hurt
our ability to improve the VM in the future.

The overhead of using this API is likely to be too large for tools like Cython and
MyPyC, as well as extensions like NumPy. If the API isn’t good enough to implement tuple or list, then it is unreasonable to expect third parties to use it either.

Detailed criticisms

The section on variable-size layouts is too tied to the existing implementation, which we are trying to move away from.
We already moved the __dict__ to a fixed offset in Python objects, which allows much faster attribute access.
The only reason we can’t do that for variable sized objects like subclasses of tuple, is the constraints imposed by the C API. Let’s not make the situation worse.

The decision tree for layout seems too complicated. This is likely to lead to errors in understanding, implementation, or both.

PyObject_GetTypeData and PyType_GetTypeDataSize should be defined by what they do, not how they are implemented.
In other words, instead of defining PyObject_GetTypeData(obj, cls) as return (char *)obj + _align(cls->tp_base->tp_basicsize);
it should be defined as “returns a pointer to the struct for class cls within the object obj”.

The parts I like

The Py_RELATIVE_OFFSET flag is good, as it decouples field access from the absolute layout, which is what we want.
Putting the offset calculations behind an API is also a step forward.

What do I suggest instead?

Something like this:

Grand Unified Python Object Layout · Issue #553 · faster-cpython/ideas · GitHub

Right, extensions that mess with the internals for speed aren’t the intended use case. But there are plenty of extensions that can trade a few nanoseconds for safety & consistency.

I think the API could be good enough for most uses of list/tuple. But doing that would mean changing their layout, which would be a much bigger change.

That’s mostly because the PEP explains how the API ties into the current implementation. If you only look at the proposed API, I hope you’ll find it a reasonable abstraction. For example, if your Grand Unified Python Object Layout comes to be, it shouldn’t be hard to adapt the proposed API to it.

Similar situation here. The status quo is complex.
The leaves of the tree tell you how the proposal deals with each case, but the structure of the tree is mostly status quo – what to think of when you try to extend arbitrary classes today.

Yes, they should be, and I tried to make that clear with this disclaimer:

In the code blocks below, only function headers are part of the specification. Other code (the size/offset calculations) are details of the initial CPython implementation, and subject to change.

Saying what they do with the current class layout makes the proposal much more understandable for readers who are already familiar with the current layout. That’s the only reason the function bodies are in the PEP.

For example, if your Grand Unified Python Object Layout comes to be, it shouldn’t be hard to adapt the proposed API to it.

Almost.
The Py_TPFLAGS_ITEMS_AT_END flag specifies the layout (at least the name implies it does).
The PEP says:

Py_TPFLAGS_ITEMS_AT_END, which will indicate the PyHeapTypeObject-like layout.

What does “PyHeapTypeObject-like” mean here?
Is defining the placement necessary, or is it just saying that the struct data is variable sized?
Would Py_TPFLAGS_VARIABLE_SIZED_STRUCT be OK?

One other thing I overlooked is the reference to the “base class” in the PEP. The problem with that is that it assumes single inheritance. By defining the size of the object, rather than the size of the struct, multiple inheritance becomes impossible (or somewhat fragile, at least):

Suppose we have a class C1 that needs a struct S1 and a class C2 that needs a struct S2, both defined to inherit from object.
Now define two classes,

class D(C1, C2): pass
class E(C2, C1): pass

To compute the offsets of S1 and S2 in D() and E() we need to reverse the computation used to get basicsize in the first place.

If the extension defines the size, itemsize and alignment of the struct, not the class, and leaves placement to the VM, then things are a lot more robust.

Yes, the flag names a specific pre-existing layout, so the interpreter can know that a class uses it. If a better layout is added in the future, this one can be phased out (along with the name that the PEP adds for it).

Items at end, rather than at a fixed offset (the second set of diagrams).

No, that would imply just tp_itemsize > 0.

That’s, again, part of the status quo that the PEP does not try to change. When C structs are involved, you’re currently limited to single inheritance.
However, the new API should be compatible with leaving placement to the VM. Again, don’t look at the proposed function bodies, those are for the current CPython and subject to change.

The Python Steering Council has reviewed PEP 697 and is accepting it, pending a couple of things to clarify in the text:

(1) We’re accepting the negative spec->basicsize. The PEP text still lists that as an Open Issue to be resolved. Please mark it as resolved and just mention the flag alternative under “Rejected Ideas”.

Rationale: The meaning of this internal field is changed by this PEP regardless, so if there were code accessing these internals without knowing about the special negative value or an equivalent flag, the runtime consequences would be similar: memory or data corruption. We think the negative value is a practical approach. The implementation detail will remain hidden within CPython internals.

One caveat: We suspect some debugger-related code that directly uses internal values might need to be updated. Take a look at Tools/gdb/libpython.py when making your PR; that might need updating. If users have debugging tools that do the same, they’ll need to do similar, but as these are internals, this is a normal kind of change to expect to adapt to within a release.

(2) While the current PEP-697 text ultimately covers the details of its specific scope, it isn’t immediately obvious to a reader of the Abstract that certain variable-size types are intentionally excluded from that scope.

While re-reading a few times so I could write this up, I noticed some typos and realized that I should just propose edits to take care of both suggestions above along with those typos, so: Cleanup PEP-697 before PSC acceptance. by gpshead · Pull Request #3041 · python/peps · GitHub

Thanks for your work on this and writing the PEP!

1 Like

Thank you!
To clarify, does this mean the PEP is accepted now that I merged your PR?

It’ll remain hidden in the internals of whatever code creates types, not just CPython.
PyType_Spec is used to create types, but can’t be retrieved from a PyTypeObject. Usually it’s defined as a static (immortal and local to one source file), or on the heap (so the spec doesn’t outlive the type created from it).
If some extension decides to make its type specs public for some reason – well, it doesn’t need to switch to negative basicsize if that would break its users.

Negative values will only appear in spec->basicsize. When the actual type is created, type->tp_basicsize is set to the computed positive value, so introspection tools (including Tools/gdb/libpython.py) won’t be affected.

1 Like

To clarify, does this mean the PEP is accepted now that I merged your PR?

Yep! Accepted!

1 Like