Heap type with base type: what about tp_dealloc?

NicolasT · September 9, 2024, 11:33pm

Hello,

In a PoC I’m working on, I’d like to use the Py_mod_create slot in a PyModuleDef to return a module of a custom type that subclasses PyModule_Type. An instance of this type can then carry extra data for use by Py_mod_exec, m_free and similar functions, whilst retaining the import machinery’s special-cased support for PyModule_Type modules.

I started implementing this type as a heap-allocated type (trying to target the limited API), using a static PyType_Spec. Since the object structure of the module type (PyModuleObject) isn’t public, I use a negative basicsize in the spec. Finally, I use PyType_FromSpecWithBases(&spec, (PyObject *)&PyModule_Type) to create the type object.

This works, but I’m puzzled about implementing the Py_tp_dealloc slot for my type, for two reasons, both related to the fact I need to (also) call PyModule_Type->tp_dealloc from my custom tp_dealloc function (unless I’m mistaken and this wouldn’t be required?!):

1. module_dealloc calls PyObject_GC_UnTrack. I didn’t set Py_TPFLAGS_HAVE_GC in my custom PyType_Spec, but maybe that’s required, given that the base type has it? If I need the flag, I should also call PyObject_GC_UnTrack before doing anything else in my tp_dealloc hook (including before calling PyModule_Type->tp_dealloc). Would it be an issue if the function gets called twice, then? The docs aren’t conclusive about this.
2. Given this is a heap type, I have to Py_DECREF the type of an instance in tp_dealloc. Currently PyModule_Type is not a heap type, so I should not expect it to do so, but in some later implementation it might be turned into a heap type, and then I should not decref the object’s type in tp_dealloc (otherwise the type gets decref’ed twice). What’s a common pattern to be “future-proof” here? Should I check the flags of PyModule_Type for the Py_TPFLAGS_HEAPTYPE flag, and if it is set, assume it’ll perform the decref for me?

Thanks in advance!

encukou · September 13, 2024, 1:04pm

If you only need extra data, you can set PyModuleDef.m_size; you shouldn’t need a module subclass.

> I didn’t set Py_TPFLAGS_HAVE_GC

Py_TPFLAGS_HAVE_GC is inherited from the base. A module subclass will have GC.

> I should also call PyObject_GC_UnTrack before doing anything else in my tp_dealloc hook

Currently, PyObject_GC_UnTrack is idempotent. You can call it, clear out stuff, and then call PyModule_Type->tp_dealloc.
I’m not sure how things should be, but I don’t think we can change this detail now, so we should probably document and guarantee it.

> Should I check the flags of PyModule_Type for the Py_TPFLAGS_HEAPTYPE flag, and if it is set, assume it’ll perform the decref for me?

Yes. Sorry for the inconvenience.

storchaka · September 13, 2024, 1:21pm

If your data includes references to other Python objects, you should also implement tp_traverse and can also implement tp_clear (the latter is optional if you are sure that this will not create reference cycles that can’t be broken at another link).

Consider also using m_free, m_traverse and m_clear slots instead of creating a new module subclass.

NicolasT · September 13, 2024, 6:34pm

Thanks! No worries about the “inconvenience”, I don’t mind, I just wanted to make sure I’m “doing it right”.

I can’t use m_size since

1. I need to run some code in the module’s tp_new, and some related code in tp_dealloc, and
2. What’s stored as extra data needs to be in place before the Py_mod_exec function(s) are called (the extra data contains something like a function pointer to “execute” the module, which I need to invoke in a (C) Py_mod_exec function). I guess I could do that after a call to PyModule_FromDefAndSpec in the create_module method of my loader, but that feels… not great. In a sense, this data isn’t related to the module (implementation) itself, but to the implementation of this “framework”.