Return child object when parent methods are called

I am subclassing the DataFrame class in PySpark and adding new methods to it.

It works fine as long as the new methods are called on an instance of my class. But the parent class is written in such a way that its methods always return an instance of the parent class. So, as soon as I call a parent-class method on a child object, a parent object comes back, which means my child-class methods are no longer available on the result.

Python builtins behave like this too, and it makes subclassing a real PITA.
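A quick way to see this with a builtin, as a minimal sketch (MyStr and shout are made-up names for illustration only):

```python
class MyStr(str):
    """Hypothetical str subclass with one extra method."""

    def shout(self):
        return self + "!"


s = MyStr("hello")
upper = s.upper()  # inherited from str

print(type(s).__name__)        # MyStr
print(type(upper).__name__)    # str, not MyStr
print(hasattr(upper, "shout"))  # False: the extra method is gone
```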

It is a poor design. Personally, I call it a bug, but others may disagree. You could try reporting it to the PySpark project and asking them to fix the problem. It usually happens because classes do something like this:

class SomeClass:
    def method(self, arg):
        # Return a new instance.
        # obj = SomeClass()  # No! Wrong! This is BAD! Don't do this!!! BAD BAD BAD!!!
        # Should be this:
        obj = type(self)()  # Yes! Subclass friendly!
        obj.something = arg
        return obj

As horrible, painful and silly as it seems, the only thing you can do is wrap every single method from the parent:

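from pyspark.sql import DataFrame

class MyDataFrame(DataFrame):
    # Do this for every method that returns a data frame.
    # Note: this assumes MyDataFrame.__init__ accepts a parent DataFrame;
    # the stock DataFrame constructor takes different arguments.
    def method(self, arg):
        return type(self)(super().method(arg))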
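If writing each wrapper by hand gets tedious, the same idea can be applied in a loop. The following is only a sketch under stated assumptions: Frame is a made-up stand-in for a parent class that hard-codes its own type (it is not PySpark's DataFrame), and rewrap, _wrap and convert are hypothetical helpers, not part of any library. It only covers plain methods defined directly on the parent, and you have to supply a convert function that knows how to build the child from a parent instance.

```python
import functools
import inspect


class Frame:
    """Hypothetical parent that always returns its own type."""

    def __init__(self, rows):
        self.rows = list(rows)

    def limit(self, n):
        return Frame(self.rows[:n])  # hard-coded parent type

    def count(self):
        return len(self.rows)


def _wrap(method, parent, convert):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        # Only re-wrap results that are parent instances but not already the child type.
        if isinstance(result, parent) and not isinstance(result, type(self)):
            return convert(type(self), result)
        return result
    return wrapper


def rewrap(parent, convert):
    """Class decorator: wrap the parent's public plain methods on the subclass."""
    def decorate(cls):
        for name, attr in vars(parent).items():
            # Skip private/dunder names, non-functions, and anything the subclass overrides.
            if name.startswith("_") or not inspect.isfunction(attr) or name in vars(cls):
                continue
            setattr(cls, name, _wrap(attr, parent, convert))
        return cls
    return decorate


@rewrap(Frame, convert=lambda cls, frame: cls(frame.rows))
class MyFrame(Frame):
    def first(self):
        return self.rows[0]


mf = MyFrame([1, 2, 3]).limit(2)
print(type(mf).__name__)  # MyFrame
print(mf.first())         # 1
```

For a real PySpark subclass, the convert function would have to pass whatever arguments the DataFrame constructor expects, so treat this as a pattern rather than drop-in code.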

Overriding every method is what I thought of too, but I hoped there might be a better solution! :laughing: