C-API, dynamic metaclass instances, and ABI stability


In NumPy, I have decided to go the MetaClass route to describe datatypes. There are various reasons for this, but generally, I like the outcome and system:

  • array.dtype is an instance of a DType (type/class). The DType (like a typical type) describes behaviour with respect to the possible values.
  • array.dtype additionally can store “parameters”:
    • Storage parameters (fixed length strings)
    • “homogeneous properties” that might be attached to a single value normally, but apply to all array elements. For example a physical unit.
  • Most importantly, the split gives a clear level of abstraction: users write new DTypes (classes/types), and functions operate on arrays with certain DTypes. Dispatch happens on the class, while the user-provided functionality deals with the instances. (This is multi-method-like dispatching.)

This may be a bit different from Python, where dispatching based on the type directly is not common (multi-methods). But I don’t see much of an alternative: NumPy requires a bit more structure in its (d)types than most Python types.

This means we have roughly:

class DTypeMeta(type):
    """Metaclass of all DTypes (a C-level subclass of type in NumPy)."""

class DType(metaclass=DTypeMeta):
    parametric : bool
    abstract : bool

    @classmethod
    def __common_dtype__(cls, other : DTypeMeta) -> DTypeMeta:
        """
        Return a DType that can describe the values of both cls
        and other (or return NotImplemented).
        """
        # some logic.
        return cls

    def __dtype_setitem__(self, item_pointer : "char *", value):
        """ C-level method, setting `item_pointer` to represent value. """

    # And some more functions/metadata.

A few of these are classmethods and the others are instance methods, but in general I like the idea of using the type (and thus a metaclass instance) as the level of abstraction that the user can modify.
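
As an illustration of how a classmethod such as `__common_dtype__` might be used, here is a small sketch. The concrete DTypes (`Int64`, `Float64`) and the `common_dtype` helper are assumptions made for the example, including the choice to try the reflected call when the first one returns `NotImplemented`; the actual NumPy logic may differ.

```python
# Hypothetical sketch of promotion via __common_dtype__.
class DTypeMeta(type):
    pass


class DType(metaclass=DTypeMeta):
    @classmethod
    def __common_dtype__(cls, other):
        return NotImplemented


class Int64(DType):
    @classmethod
    def __common_dtype__(cls, other):
        # Int64 only knows how to combine with itself.
        if other is Int64:
            return cls
        return NotImplemented


class Float64(DType):
    @classmethod
    def __common_dtype__(cls, other):
        # Float64 can also describe Int64 values (assumed for illustration).
        if other in (Int64, Float64):
            return cls
        return NotImplemented


def common_dtype(a, b):
    """Ask `a` first, then fall back to the reflected call on `b`."""
    result = a.__common_dtype__(b)
    if result is NotImplemented:
        result = b.__common_dtype__(a)
    if result is NotImplemented:
        raise TypeError(f"no common DType for {a.__name__} and {b.__name__}")
    return result


print(common_dtype(Int64, Float64).__name__)  # Float64
```

Because the method lives on the class, promotion needs no DType instance at all; only the DType classes take part.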

The tricky part

The methods above are fairly natural in C, and I want to have them easily and quickly available in C. NumPy has to call many of them frequently at the C level.

I have done just that: DTypeMeta is a C subclass of type and extends the (heap)type struct with additional slots. In a sense, I am adding my own slot fields (although right now not through an nb_slots-like pointer).

That is a bit awkward: the limited-API PyType_FromSpec functions do not support metaclasses that extend the type struct.

The Problem

I need to allow users to write new DTypes, ideally in C and dynamically. I am not worried about Python ABI stability (unless it concerns things like HPy).

However, I need users to subclass np.dtype and call a (PyType|PyDType)_FromSpec API provided by NumPy to “fill” in the DType specific slots. (Basically PyType_Ready, but with a FromSpec API.)

I have not yet figured out a good way to do this:

  • I can fix DTypeMeta.__basicsize__ by storing only a pointer to a separately allocated, opaque struct (void *npy_dtype_slots), so that I can extend that struct in the future. This should make things fairly clean for static types/DTypes. I would then add my own InitDType_FromSpec(...), which calls PyType_Ready() for the user.
  • From Python, things seem OK: the metaclass can call type.__new__, which I think allocates the correct size, and then fill the slots (based on Python-side definitions, which could be capsules).

Both of those together may just be good enough: Static declaration can do most things, and from Python dynamic declaration should be fine (or at least solvable). Some things might not work (or be ugly), but so be it.

But I am wondering if I am missing some better solution to all of this. Maybe there are ways to avoid the whole problem, or even a good pattern for extending the type struct. (I am aware that ABCs store things in a special slot, but that seems a bit convoluted.)


  • One thing that I could probably try is to modify/fix PyType_FromSpec to gracefully allocate the larger struct size for metaclasses (that might allow me to create a function that calls PyType_FromSpec internally)? That might solve the problem of allowing dynamic creation of new DTypes from C (and it would be nice if users don’t have to use the static type API!).
  • I thought that I could allow the user to only create a “mixin” baseclass NewDType(user_mixin, np.dtype) created by NumPy. But that seems