PEP 442 introduced the tp_finalize callback to Python type definitions (as a one-to-one equivalent of the __del__ method of Python classes, as far as I understand it), and recommends using it for any non-trivial destruction.
If tp_finalize is set, the interpreter calls it once when finalizing an instance.
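For reference, a minimal sketch of what such a finalizer can look like, following the exception save/restore pattern recommended by PEP 442 (the name demo_finalize is just a placeholder):

```cpp
#include <Python.h>

/* Hypothetical tp_finalize implementation. Per PEP 442, any pending
 * exception is saved and restored around the finalization work. */
static void
demo_finalize(PyObject *self)
{
    PyObject *exc_type, *exc_value, *exc_tb;
    PyErr_Fetch(&exc_type, &exc_value, &exc_tb);

    /* ... release whatever non-trivial resources `self` owns ... */

    PyErr_Restore(exc_type, exc_value, exc_tb);
}

/* In the type definition: set .tp_finalize = demo_finalize (and, on older
 * CPython versions, the Py_TPFLAGS_HAVE_FINALIZE flag). */
```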
However, this does not seem to be true for C-defined static Python types that do not define a custom deallocator (either directly or inherited from their base). From my understanding:
Static types with no custom tp_dealloc will inherit the deallocator from the base Python type (PyBaseObject_Type), which is object_dealloc.
object_dealloc is extremely simple: it just calls the tp_free of the given object’s type.
Therefore, tp_finalize will never be called for these objects.
Types defined on the heap using PyType_FromSpec and similar will, by default, inherit the subtype_dealloc deallocator.
subtype_dealloc is much more complex and will call PyObject_CallFinalizerFromDealloc.
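Roughly, and leaving out the GC untracking, weakref clearing and other bookkeeping that the real CPython sources do, the two paths look like this (these are sketches, not the actual implementations):

```cpp
#include <Python.h>

/* What static types get when they leave tp_dealloc unset (object_dealloc):
 * tp_finalize is never consulted. */
static void
object_dealloc_sketch(PyObject *self)
{
    Py_TYPE(self)->tp_free(self);
}

/* What heap types created via PyType_FromSpec get (subtype_dealloc):
 * the finalizer runs before the memory is released. */
static void
subtype_dealloc_sketch(PyObject *self)
{
    if (PyObject_CallFinalizerFromDealloc(self) < 0) {
        return;  /* tp_finalize resurrected the object, do not free it */
    }
    /* ... GC/weakref/base-class handling elided ... */
    Py_TYPE(self)->tp_free(self);
}
```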
Questions
Assuming that I am understanding the current behavior of CPython correctly:
Is it expected that PyBaseObject_Type’s deallocator does not call tp_finalize?
If yes, what are the reasons for this exception?
And would it make sense then to expose subtype_dealloc as a generic dealloc callback for C-defined static types?
I don’t think this was entirely deliberate, but in general, if you have a trivial tp_dealloc, you’re unlikely to need a non-trivial tp_finalize. Or am I mistaken?
That said, I understand that in some cases you might want to just implement tp_finalize and let the default tp_dealloc call it for you.
In my use case, I am trying to move some very low-level structures in Blender to non-trivial C++, i.e. they will now require constructor and destructor calls. (WIP PR, Python-related changes are mainly in bpy_rna.cc, but this is a big and complex code area.)
To make this work with their Python binding (and also make the current code a bit more sensible), the idea was to use:
tp_alloc: undefined, so default Python allocator.
tp_new: defined to create the Python object (and other optimizations that do not matter here).
tp_init: defined to actually construct the Python object’s data (using placement new to initialize the wrapped C++ object).
tp_finalize: defined to ensure the wrapped C++ object is properly destructed.
tp_dealloc: undefined, so default Python destructor.
tp_free: undefined, so default Python deallocator.
With the current limitation regarding the default deallocator of statically defined Python types, tp_finalize is never called, so the C++ object’s destruction has to happen in a custom tp_dealloc, before manually calling tp_free.
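For concreteness, here is a minimal sketch of that slot layout, with a hypothetical Foo/FooObject standing in for the actual Blender structures (tp_new and the exception save/restore shown earlier are omitted for brevity):

```cpp
#include <Python.h>
#include <new>

struct Foo {
    /* stands in for the non-trivial C++ data */
};

struct FooObject {
    PyObject_HEAD
    Foo foo;  /* needs explicit construction and destruction */
};

/* tp_init: construct the wrapped C++ object in place. */
static int
foo_init(PyObject *self, PyObject * /*args*/, PyObject * /*kwds*/)
{
    new (&reinterpret_cast<FooObject *>(self)->foo) Foo();
    return 0;
}

/* tp_finalize: destroy the wrapped C++ object. With a static type and no
 * custom tp_dealloc, this is currently never reached. */
static void
foo_finalize(PyObject *self)
{
    reinterpret_cast<FooObject *>(self)->foo.~Foo();
}
```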
I wonder if this is something we touched when we made static types immortal? (@eric.snow might know.)
It does sound like a bug, though it may be one of those cases where fixing it causes more bugs than leaving it alone (i.e. it’s a lose-lose situation).
Is overriding tp_dealloc to call both tp_finalize and then tp_free out of the question? If it’s always your own finalize, then you can guarantee (manually) that you aren’t going to resurrect the object, which is the main complexity here.
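Concretely, that suggestion could look something like this (hypothetical name; PyObject_CallFinalizerFromDealloc takes care of calling tp_finalize at most once and of detecting resurrection):

```cpp
#include <Python.h>

/* Hand-written tp_dealloc for a static type that defines tp_finalize. */
static void
wrapper_dealloc(PyObject *self)
{
    if (PyObject_CallFinalizerFromDealloc(self) < 0) {
        return;  /* the finalizer resurrected the object, do not free it */
    }
    Py_TYPE(self)->tp_free(self);
}
```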
The current approach in the patch is to move what ‘should be in’ tp_finalize into tp_dealloc; this feels both simpler and safer, especially since these C++-defined static types can be used as bases of Python-defined types…
If your tp_finalize only deallocates C++ objects, then it should not be a problem to move the deallocations into tp_dealloc. The main reason for tp_finalize is if you want to do arbitrary Python things with self, for example call a Python method on it.
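In that case the whole thing collapses to something like this sketch (same hypothetical Foo/FooObject as above), with no tp_finalize slot at all:

```cpp
#include <Python.h>

struct Foo { /* wrapped C++ data */ };
struct FooObject { PyObject_HEAD Foo foo; };

/* tp_dealloc: plain C++ teardown, no arbitrary Python code involved. */
static void
foo_dealloc(PyObject *self)
{
    reinterpret_cast<FooObject *>(self)->foo.~Foo();
    Py_TYPE(self)->tp_free(self);
}
```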