Trying to understand exact behavior of interpreter for a 'class' statement

AndySomogyi · February 29, 2020, 9:02pm

Hi, I’m trying to understand exactly what code the interpreter runs for a class statement. I’m writing some extensions with the C API that provide metaclasses.

When the interpreter encounters a ‘class’ statement, does it end up calling

static PyObject *
type_call (PyTypeObject *type, PyObject *args, PyObject *kwds)

in typeobject.c?

I’m assuming the interpreter parses a ‘class’ statement, packages up all the items it finds, and then presumably it calls the tp_call of the PyType_Type object. Would this be correct?

thanks

ammaraskar · March 1, 2020, 12:37am

The best way to find out what the interpreter does for a particular piece of Python code is to start by disassembling it into opcodes. For example:

>>> def f():
...   class A:
...     def __init__(self):
...        pass
...     def a_method(self):
...        pass
...
>>> import dis
>>> dis.dis(f)
  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               1 (<code object A at 0x03A01910, file "<stdin>", line 2>)
              4 LOAD_CONST               2 ('A')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               2 ('A')
             10 CALL_FUNCTION            2
             12 STORE_FAST               0 (A)
             14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

Disassembly of <code object A at 0x03A01910, file "<stdin>", line 2>:
...

Disassembly of <code object __init__ at 0x03A016B8, file "<stdin>", line 3>:
...

Disassembly of <code object a_method at 0x03A01640, file "<stdin>", line 5>:
...

So we can see that for the A class, it’s using the LOAD_BUILD_CLASS opcode to load up some sort of build class function, and then calling it with two arguments: a function made from A's code object, and the name 'A'.
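The exact opcodes around the call differ between CPython versions (the session above is from 3.8), so here is a small sketch that checks for the opcode programmatically instead of reading the listing by eye. It only assumes the standard dis module; the function f is just a stand-in for the example above.

```python
import dis

def f():
    class A:
        def a_method(self):
            pass

# dis.get_instructions() walks only f's own bytecode; it does not
# descend into the nested code objects for A or a_method.
opnames = [ins.opname for ins in dis.get_instructions(f)]

# The class statement inside f should show up as LOAD_BUILD_CLASS.
has_build_class = "LOAD_BUILD_CLASS" in opnames
print(has_build_class)
```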

Now let’s go to ceval.c and see what LOAD_BUILD_CLASS is doing exactly: https://github.com/python/cpython/blob/0b0d29fce568e61e0d7d9f4a362e6dbf1e7fb80a/Python/ceval.c#L2169-L2198

    bc = _PyDict_GetItemIdWithError(f->f_builtins, &PyId___build_class__);
    if (bc == NULL) {
        if (!_PyErr_Occurred(tstate)) {
            _PyErr_SetString(tstate, PyExc_NameError,
                                "__build_class__ not found");
        }
        goto error;
    }
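Since that lookup goes through the frame's builtins, it can be observed from Python by temporarily swapping out builtins.__build_class__. This is only a rough sketch for poking around, not something to leave in real code; tracing_build_class and calls are made-up names.

```python
import builtins

calls = []
original = builtins.__build_class__

def tracing_build_class(func, name, *bases, **kwds):
    # Record what the class statement passed in, then defer to the real builtin.
    calls.append((name, bases, kwds))
    return original(func, name, *bases, **kwds)

builtins.__build_class__ = tracing_build_class
try:
    class A:
        pass

    class B(A):
        pass
finally:
    # Always restore the real builtin.
    builtins.__build_class__ = original

print(calls)
```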

Alright so it loads up the __build_class__ builtin and then that is what gets used. Next let’s look at the implementation for __build_class__ in bltinmodule.c: https://github.com/python/cpython/blob/0b0d29fce568e61e0d7d9f4a362e6dbf1e7fb80a/Python/bltinmodule.c#L102

and here is where we find the bulk of the machinery for how classes get built and where your answer will actually lie. We see that when a metaclass isn’t explicitly provided and there are no bases, it uses PyType_Type as the metaclass (when there are bases, it starts from the type of the first base instead):

    if (meta == NULL) {
        /* if there are no bases, use type: */
        if (PyTuple_GET_SIZE(bases) == 0) {
            meta = (PyObject *) (&PyType_Type);
        }

and then what does it do with this meta object? Well after doing some checks for the __prepare__ mechanism, it ends up calling the meta object:

        PyObject *margs[3] = {name, bases, ns};
        cls = PyObject_VectorcallDict(meta, margs, 3, mkw);

and like you’ve theorized, this ends up going to PyType_Type's tp_call slot, which is filled with type_call. From there, type_call invokes the metaclass's tp_new and then tp_init on the result.
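To make the sequence above concrete, here is a rough pure-Python sketch of the same steps. It is a simplification under stated assumptions, not the real implementation: build_class_sketch is a made-up name, the class body is passed in as source text rather than as a function, and it skips things the C code handles, such as picking the most derived metaclass across several bases, __mro_entries__, extra keyword arguments and __classcell__.

```python
def build_class_sketch(body_source, name, bases=(), metaclass=None):
    # 1. Pick the metaclass: the explicit one, else the type of the
    #    first base, else plain type when there are no bases.
    if metaclass is not None:
        meta = metaclass
    elif bases:
        meta = type(bases[0])
    else:
        meta = type

    # 2. The __prepare__ mechanism: let the metaclass supply the namespace.
    prepare = getattr(meta, "__prepare__", None)
    ns = prepare(name, bases) if prepare is not None else {}

    # 3. Run the class body with that namespace as its locals.
    exec(body_source, {}, ns)

    # 4. Call the metaclass; for type this goes through type_call,
    #    which runs __new__ and then __init__.
    return meta(name, bases, ns)


A = build_class_sketch("var = 0\ndef greet(self):\n    return 'hi'\n", "A")
print(A.var, A().greet())
```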

AndySomogyi · March 2, 2020, 6:05pm

Thanks, big help.

I’m trying to figure out how to fill out the PyTypeObject for a statically defined C API type that has static methods and properties on the type. Seeing how the interpreter does it, I can probably figure it out looking at the type_call function.

Looks like I’ll need one PyTypeObject to define the main instance type, and another PyTypeObject to describe the type itself.

Basically, what I want is a type with static properties that users can subclass, with each subclass having its own copy of the static methods, i.e. something like this:

# base class, actually defined using the C API in a C library
class Foo:
    var = 0

class DerivedA(Foo):
    var = 1

class DerivedB(Foo):
    var = 2


>>> print(Foo.var)
0
>>> print(DerivedA.var)
1
>>> print(DerivedB.var)
2
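In pure Python terms, the two-PyTypeObject arrangement would correspond roughly to a metaclass plus a class that uses it. The sketch below is only an analogue under that assumption, not C API code; FooMeta and the _var storage attribute are made-up names. The property lives on the metaclass, so it is looked up on the class itself, and each subclass keeps its own value.

```python
class FooMeta(type):
    # A property defined on the metaclass behaves like a "static"
    # property of every class that uses this metaclass.
    @property
    def var(cls):
        return cls._var

    @var.setter
    def var(cls, value):
        # Stores on the specific class being assigned to, so
        # subclasses do not overwrite each other or the base.
        cls._var = value


class Foo(metaclass=FooMeta):
    _var = 0

class DerivedA(Foo):
    _var = 1

class DerivedB(Foo):
    _var = 2


DerivedA.var = 10
print(Foo.var, DerivedA.var, DerivedB.var)
```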