Should __post_init__ be exempt from the LSP?

NeilGirdhar · November 27, 2023, 5:29am

Should __post_init__ be exempt from the LSP? After all, it is never called polymorphically. It is only ever called from the exact same class in which it is defined, and only ever called in one way.

from dataclasses import dataclass, InitVar
from typing import Any
from typing_extensions import override

@dataclass
class Top:
    def __post_init__(self) -> None:
        pass


@dataclass
class C(Top):
    c: InitVar[int]

    @override
    def __post_init__(self, *, c: int, **kwargs: Any) -> None:  # LSP violation.
        super().__post_init__(**kwargs)


@dataclass
class D(C):
    d: InitVar[int]

    @override
    def __post_init__(self, *, d: int, **kwargs: Any) -> None:
        super().__post_init__(**kwargs)

davidfstr · November 28, 2023, 1:45pm

What does this mean? What is “the LSP”?

Jelle · November 28, 2023, 2:04pm

The LSP is the Liskov substitution principle. I think what Neil wants concretely is that type checkers do not check __post_init__ for compatibility with superclasses. For example, running mypy on his example currently produces:

main.py:16: error: Signature of "__post_init__" incompatible with supertype "dataclass"  [override]
main.py:16: note:      Superclass:
main.py:16: note:          def __post_init__(self: C, c: int) -> None
main.py:16: note:      Subclass:
main.py:16: note:          def __post_init__(self: C, *, c: int, **kwargs: Any) -> None
main.py:25: error: Signature of "__post_init__" incompatible with supertype "dataclass"  [override]
main.py:25: note:      Superclass:
main.py:25: note:          def __post_init__(self: D, c: int, d: int) -> None
main.py:25: note:      Subclass:
main.py:25: note:          def __post_init__(self: D, *, d: int, **kwargs: Any) -> None

Similarly, Pyright says

Method "__post_init__" overrides class "Top" in an incompatible manner
  Parameter "c" is missing in base  (reportIncompatibleMethodOverride)

mikeshardmind · November 29, 2023, 12:20am

I’m of two minds here, but the super().__post_init__ use in the example shows why this should be checked in at least some fashion.

The way I’m leaning:

I don’t think any methods should be exempt from LSP fully. Ignoring __init__ , __call__ etc only when overriding the implementation provided by object or type seems reasonable since those are implementation details that come as a consequence of the object model, not of intent. Ignoring them when overriding other bases seems more problematic and there are known issues with some type checkers and being unable to actually check type[T] use.

The other idea, which wasn't thought out enough before pressing post

However, the latter issue with type[T] where T is object actually leads me to an idea I think may be worth exploring more:

Modify object and type such that the methods intended to have their types modified in subclasses accept arbitrary args and kwargs. This should be fully backwards compatible, it’s only allowing new behavior, but it removes an aful lot of special casing and a type system hole. This could be done entirely within typecheckers and typeshed if it’s too disruptive to do it in the actual objects themselves, the actual runtime behavior could be raising a type error still, raising an error is always allowed

I haven’t explored this for consequences in depth yet, as type checkers special casing these methods means it isn’t as simple as just check what would happen if that were the typeshed definitions.

Edit: annnnd, nevermind, I realized I didn’t think that through enough almost right after posting, that doesn’t work because then all subclasses of everything in existence need to be typed to accept arbitrary everything and error on not getting what they expect

NeilGirdhar · November 29, 2023, 2:30am

For people who are watching this thread and may not realize the pros and cons of LSP, consider:

class C:
    @classmethod
    def f(cls) -> None:
        pass

class D(C):
    @classmethod
    def f(cls) -> None:
        pass

def f(x: C,
      y: type[C]) -> None:
    x.f()  # X
    y.f()  # Y
    C.f()  # Z

If f obeys LSP:

X, Y, and Z can be checked using C.f’s annotation, and
D.f must be compatible with C.f.

If f does not obey LSP:

Only Z can be checked using C.f’s annotation, but
D.f need not be compatible with C.f.

As far as I know, __init__ and __new__ are the only methods that do not obey LSP currently. This fits with how __init__ is used: child classes often override it in incompatible ways, e.g., adding a required parameter. This is why it’s also impossible to type check:

def f(t: type[C]) -> C:
    t(1, 2, 3)  # Cannot be type checked since t.__new__ and t.__init__ are unknown.

On the other hand, every other method benefits from the type checking for cases X and Y.

The idea here is that __post_init__ doesn’t benefit from X and Y since it is never called explicitly. And in general it is often overridden in incompatible ways—which correctly triggers type checkers that expect it to follow LSP.

As an aside I’m definitely avoiding any proposal to allow users to opt out of LSP for certain methods. LSP violations nearly always indicate a design problem. Such an escape hatch is a lot more likely to be misused.