Enforcing __init__ signature when implementing it as an abstractmethod

What I mean here is something like this, although really only one of the two decorators would be needed:

from typing import subclass_compatible, not_subclass_compatible

class Base:
    @subclass_compatible
    @classmethod
    def compat(cls) -> None:
        pass
    @not_subclass_compatible
    @classmethod
    def incompat(cls) -> None:
        pass

class Sub(Base):
    # type check error (not compatible with Base)
    @classmethod
    def compat(cls, a: str) -> None:
        pass
    # This is fine (does not need to be compatible)
    @classmethod
    def incompat(cls, a: str) -> None:
        pass

def func(typ: type[Base]):
    typ.compat() # fine
    typ.incompat() # type check error (incompatible method)

The decorators could be used on any classmethod, or on __new__ or __init__. A type checker could then distinguish which methods are supposed to be compatible across subclasses, which makes it possible for it to do two things:

  1. Enforce that the method is always compatible in subclasses.
  2. Assume that the method is safe to call when given a type[Base].

The current situation is that type checkers assume that all classmethods, plus __new__ and __init__, are subclass-compatible and safe to call, but then do not enforce that compatibility for __new__ and __init__, leaving a soundness gap. Having an explicit way to mark whether these methods are intended to be compatible would allow closing that gap.
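Concretely, the gap today looks like this (a minimal sketch, reusing the class names from above but without any decorators); current type checkers accept it, but it fails at runtime:

class Base:
    def __init__(self) -> None: ...

class Sub(Base):
    # Incompatible override: not flagged by current type checkers
    def __init__(self, a: str) -> None:
        self.a = a

def make(typ: type[Base]) -> Base:
    return typ()  # assumed safe, per the current spec

make(Sub)  # accepted statically; TypeError at runtime (missing argument 'a')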

There are situations besides __new__ and __init__ where I have classmethods that are not intended to be compatible, but type checkers do not understand this and complain. Currently the only option for this is hundreds of # type: ignore comments.

In this case, I suggest that you give your factory functions different names so that they don’t collide. Or, if you really need them to have the same name, type them as

from typing import Any, Self

class C:
    @classmethod
    def create(cls, **kwargs: Any) -> Self:
        ...

Hmm, we’ve obviously had very different experiences. I’ve also thought this more than a few times, but I think this exception is a bad idea.

First of all, giving people any escape hatch (the decorator you propose) for LSP would create problems where the escape hatch is abused, and with unexpected type results when it is used.

But more importantly, I haven’t seen a big need for this in practice. The only time it came up for me is with some class factories that I wanted to call create. I ended up just giving them different names. Seems a small price to pay in order to protect LSP.


As an aside, if I were just brainstorming on a more ideal solution—not something I’m proposing—it would be for your @factory decorator to exist, but for it to have more effects:

  • the decorated method is not checked for LSP,
  • the method is only available when it’s called with an explicit class (but this is difficult to enforce), and
  • the method isn’t inherited.

So, something like A.create(foo, bar), but for B < A, B.create doesn’t exist. You have to explicitly create that if you want it.

As far as the type checker is concerned, create may as well be a free function A__create. This eliminates the LSP problems.

But if we agree that this works, then I think you can see why I don’t think there’s a problem: you should have just created the free functions in the first place instead of tucking them in with the class.
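As a sketch of that framing (hypothetical names throughout), the factory is just a module-level function, so it is never inherited and never subject to LSP checks:

class A:
    def __init__(self, foo: int, bar: str) -> None:
        self.foo = foo
        self.bar = bar

# Module-level factory: what the imagined @factory classmethod reduces to.
def A__create(foo: int, bar: str) -> A:
    return A(foo, bar)

class B(A):
    pass

a = A__create(1, "x")
# There is deliberately no B__create unless you write one explicitly.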

Yep, to clarify, I was proposing that you’d have to be explicit about the parameters you expect to pass in:

def fn(typ: Callable[[int, int], T]) -> T:
    return typ(1, 2)

CallableAs would be nice sugar, as you suggest.

That’s a good example, thank you! In a world where type[T] is not callable, can you model this using a protocol? I.e. __add__ expects its self parameter to implement

from typing import Callable, Protocol, Self, TypeVar

T = TypeVar("T")

class PointConstructable(Protocol):
    def __class__(self) -> Callable[[T], Self]: ...

I don’t think we can add this as a constraint just for __add__, so it would percolate up and become an additional constraint on the Point class as a whole. And if we also enforce that protocol implementation for subclasses, that has the effect of prohibiting subclasses that e.g. override __init__ in an incompatible way. (Maybe that’s assuming too much that the type function delegates to a __class__ method?)

Even if we do all of that (and I don’t think we should, I think we should just stop having LSP exceptions), there would still be consequences for HKTs and generic type var bounds, unless we forbid the use of those with any type where __class__ isn’t CallableAs[Self], which we can’t do in a gradual type system, because Any exists.

And then, even if we make type[X] not imply the hypothetical CallableAs[X], we’re now taking validly typed code (anything that does type(self)(...)) and telling people to change the typing of it, rather than telling the people who have unsound code to fix theirs, all while adding more complexity and exceptions.

This is my understanding as well. The LSP says that you should be able to substitute an instance of a type with an instance of any subtype of that type, and have things work. It’s not (as far as I know) intended to say that the types are substitutable, just that instances are.

So in particular, the LSP doesn’t say that classmethods (or __init__ in particular) must have the same signatures. (I’m assuming that contortions like obj.__class__.some_classmethod are exceptions, in much the same way that explicitly accessing __mro__ would be…)

I don’t think that demanding that classmethods (and in particular __init__) have the same signature in subclasses as they do in superclasses is practical, to be honest. Enforcing that in type checkers won’t make the language any sounder, it will simply mean that people work around the problem, likely via # type: ignore. Which makes things worse, not better.

1 Like

Is it really that hard for people to add unused *args: object, **kwargs: object to an __init__ if they intend subclasses to extend the args/kwargs taken? Why would someone add a type ignore instead of doing this?
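A sketch of how I read that (hypothetical Node classes; exactly how a checker would judge such overrides is still open):

class Node:
    # The base author signals that subclasses may take extra constructor
    # arguments by accepting (and ignoring) catch-all parameters.
    def __init__(self, x: int, *args: object, **kwargs: object) -> None:
        self.x = x

class ScaledNode(Node):
    def __init__(self, x: int, *args: object, scale: float = 1.0,
                 **kwargs: object) -> None:
        super().__init__(x)
        self.scale = scale

def build(typ: type[Node]) -> Node:
    # Checked against Node's declared signature, so the extra keyword is
    # accepted statically; at runtime both Node and ScaledNode accept it.
    return typ(3, scale=2.0)

build(ScaledNode)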

Is it necessarily worse that someone adds a type ignore when they have something unsound whose root cause they can’t fix, so that at least there’s a signal that the unsoundness is known about? (Reminder: right now that same unsoundness exists without warning for everyone, and issues like this keep coming up where people want this checked, to the point that there are all sorts of suggestions about abusing Callable instead.)

And yet the exceptions prevent this, and the “solution” people are pushing is to make people write completely unobvious things to fix it, and to force people with well-typed sound code to change their typings/way of doing things, rather than acknowledge that the foundation is rotten, but fixable, and have the people with unsound code change their typings.

The LSP violations make more than just types unsafe (mentioned above with __replace__; the specific case causing that one to get excluded is dataclasses having a generated __replace__), and they also prevent valid code from being checked properly. It’s not only valid to use type(self)(...) in methods that are meant to “just work” in subclasses, it’s an idiomatic way to ensure that things like operators work while retaining subclasses. Real-world examples of this exist, and none of the “alternatives” people are suggesting treat this as valid.
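For reference, a sketch of that operator idiom (hypothetical Vector class, assuming Python 3.11+ for typing.Self):

from typing import Self

class Vector:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __add__(self, other: "Vector") -> Self:
        # type(self)(...) keeps the result in the caller's class, so the
        # operator "just works" for subclasses with a compatible __init__.
        return type(self)(self.x + other.x, self.y + other.y)

class NamedVector(Vector):
    pass

v = NamedVector(1.0, 2.0) + NamedVector(3.0, 4.0)
print(type(v).__name__)  # NamedVector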

The type system is also more brittle than it should be because of this; it is actively going to prevent at least two of the most requested additions to Python’s type system from having a well-defined meaning.

In my experience, uses of type(self)(...) or self.__class__(...), while indeed idiomatic at this point, are almost (or maybe even fully) exclusively what the new copy.replace(...) operation is meant to replace. So can __replace__ not be made sound instead, given that it has the inherently much laxer **changes signature?
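For anyone unfamiliar, a small sketch of the mechanism being discussed (assuming Python 3.13, where copy.replace() and the dataclass-generated __replace__ were added):

import copy
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(1, 2)
# copy.replace() delegates to the dataclass-generated Point.__replace__,
# whose parameters mirror those of the generated __init__.
q = copy.replace(p, y=5)
print(q)  # Point(x=1, y=5)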

No. Because dataclasses generate __replace__ methods that correspond (loosely, more so with the body annotations) to __init__, type checkers are already planning on excluding it too; see the mypy maintainer comment here: Dataclasses, `__replace__` and LSP violations · Issue #18216 · python/mypy · GitHub

It’s cascading into issues all over the place, rather than us just telling people how to safely enable subclasses with different parameters.

1 Like

Hmm, I don’t think that’s right. After all, subclasses can be passed to functions expecting a superclass:

class T:
    @classmethod
    def f(cls) -> None:
        print("okay")

class U(T):
    pass

class V:
    pass

def f(t: type[T]) -> None:
    t.f()

f(U)  # Okay.
f(V)  # Not okay.

Or, in other words, the rule you gave applies to types:

In this case the type is type[T], and T is an instance of it, and so is U, but V is not.

OP here, thanks to everyone for the discussion so far (and especially @ImogenBits and @mikeshardmind for suggesting work-arounds), this has been really informative to read as a long-time Python user but a relative novice to the nuances and history of the typing system.

I’m wondering if it would be feasible to add a type-checker-level feature to give users some assistance and options without necessarily reinterpreting or changing the typing spec one way or the other. I was thinking this could be done with a flag along the lines of # type: ignore, but in this case adding an additional check at the author’s explicit instruction rather than the spec’s. Repurposing the example from my first post, it would look something like the following:

from abc import ABC, abstractmethod

class AbstractA(ABC):
    #type: require_init x: int
    #type: require_init y: int
    
    @abstractmethod
    def __init__(self, x: int, y: int):
        pass
        
class RealA(AbstractA):
    def __init__(self, x: int):  ## Would have a type checker error
        self.x = x

inst = RealA(3)  ## Could have an error, maybe passthrough the error above?

I’m thinking here about the use case I mentioned above where AbstractA might come from a library and RealA is the current user’s code. Having the type checker at least say something would be a lot more helpful to that user than waiting for them to get an AttributeError at runtime and then needing to dig into code they didn’t write to track it down.

On the other hand, I can see this causing confusion since the user would technically be getting errors they shouldn’t be getting according to the spec and previous community decisions. Perhaps it could be just a warning instead with language that clearly indicates it results from the author’s affirmative choice when writing the abstract class. I’d have to think a little about what that text would be.

There were valid concerns mentioned above about # type: ignore, and I agree with them; I hate it every time I use it because it feels like duct-taping over something I should be fixing more robustly. In this case, however, what I’m suggesting doesn’t really mask ambiguity so much as it highlights that ambiguity, while giving the user an immediate option to improve the robustness of their code. And regardless of what is or isn’t decided in the future, deprecating a feature like this in type checkers themselves would be a lot easier than implementing spec-related decisions IMO.

  1. They won’t necessarily know that client code will want to subclass them, much less that the subclass will want to have a different signature. So all that will happen is that everyone will cargo-cult *args: object, **kwargs: object to every __init__.
  2. It’s incorrect - the class __init__ doesn’t take extra args, so saying that it does is misrepresenting the class signature.

My argument is that this isn’t unsound - you’re incorrectly applying the LSP to types rather than to type instances. Although I’ll admit I’m not a typing expert, so if the LSP is meant to apply to types as well as instances, then I’d appreciate links to documentation that explains how that works.

OK, but doesn’t that (merely) mean that a class which uses this idiom should declare (somehow) that it requires subclasses to leave the signature of __init__ unchanged? In the absence of type annotations, you’d have to document that restriction (as it’s not a requirement of the object model, but rather a requirement of your code), so I don’t see why, in the typing system, this shouldn’t require an explicit annotation.

Disclaimer: I don’t have a direct need for any of this, my interest is basically theoretical, as I’m not comfortable when the type system insists on principles that invalidate perfectly idiomatic Python code. I’m happy if the situation is merely “the type system can’t express that, so you’ll have to leave it untyped”, but not when it gets expressed as “your code is unsound” or similar, suggesting that there’s something wrong with the code as opposed to limitations in the type system.

3 Likes

It’s not incorrect to apply it here, because types are just objects in Python too. There’s specifically a carved-out set of exemptions from LSP in existing type checkers for “pragmatic” reasons, and those exemptions happen to prevent the checking some people want here; without them, this checking would just happen automatically. LSP as a broad principle says that if you swap a type for a subtype, all code that was valid before the swap remains valid afterward. It’s what allows things like:

class A:  ...
class B(A): ...

def foo(x: A) -> None:  ...
foo(B())  # the idea that subtypes are valid substitutes is what allows this

The actual specification points out that constructor calls for type[T] should be evaluated as if it has the signature of T’s constructor

When a value of type type[T] (where T is a concrete class or a type variable) is called, a type checker should evaluate the constructor call as if it is being made on the class T (or the class that represents the upper bound of type variable T). This means the type checker should use the __call__ method of T’s metaclass and the __new__ and __init__ methods of T to evaluate the constructor call.

https://typing.readthedocs.io/en/latest/spec/constructors.html#constructor-calls-for-type-t

but that this could be unsafe (continued quote from above):

It should be noted that such code could be unsafe because the type type[T] may represent subclasses of T, and those subclasses could redefine the __new__ and __init__ methods in a way that is incompatible with the base class. Likewise, the metaclass of T could redefine the __call__ method in a way that is incompatible with the base metaclass.

This is the only place in the specification where it says this could be redefined incompatibly, and nowhere else is it explained in any more detail. No current type checker checks this, and some, as a knock-on effect, are already planning not to check loosely related instance methods either.

That’s what the type of __init__ (technically, the combined effect of __init__, __new__, the metaclass __call__, etc.) is, though: an explicit statement of how I expect the type to be constructable. So type[T] means I’m expecting something considered to be compatible with the type object T.

Doesn’t this weaken safety here considerably?

If I do this and then mistype the name of an argument with a default value,

  • The type checker doesn’t complain
  • The runtime doesn’t complain
  • The class isn’t initialized the way I intend, causing an issue later on

So, the worst kind of bug.
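A minimal sketch of that failure mode, assuming the __init__ has grown the suggested **kwargs: object catch-all (hypothetical Widget class):

class Widget:
    def __init__(self, *, colour: str = "red", **kwargs: object) -> None:
        self.colour = colour

# Typo: 'color' instead of 'colour'. The argument is silently swallowed by
# **kwargs, so neither the type checker nor the runtime complains, and the
# widget quietly keeps its default colour.
w = Widget(color="blue")
print(w.colour)  # red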

2 Likes

But if B is a subclass of A then both A and B are of type type (ignoring metaclasses for the sake of simplicity). So there’s no LSP at play here, because there’s no subtyping relationship (of the types of the objects A and B).

In actual fact, what is going on here is that A.__init__ and B.__init__ are methods on two different objects of type type. The real oddity here (if “oddity” is the right word) is that those two methods are allowed to have different signatures. But that’s fundamental to how __init__ works - you wouldn’t expect set.__init__ and dict.__init__ to have the same signature, even though they are the same method on two instances of type type.

I was unable to find the definition of what type[T] actually is in the specification. Could you please provide a pointer? The section you linked to discusses one specific aspect of the behaviour of type[T] but doesn’t link back to where type[T] is defined in the first place. This is one of the most frustrating things about typing discussions for me - people casually refer to things like type[T] and assume the reader knows what that is, but it’s almost impossible to actually find the definition unless you’re an expert already. It’s not even possible to search, as looking for the term type in the typing spec is an exercise in futility.

The reason I want to find that definition is that the thing I don’t understand about your argument is why type[T] is different from T in the context of my argument above that T.__init__ is “obviously” allowed to have a different signature than a subtype - at least when trying to apply the LSP to types. You clearly have a different intuition, based on your knowledge of what type[T] means - but I can’t follow that intuition without a better understanding of type[T].

We are talking here, though, about swapping type[A] and type[B] rather than A and B. It isn’t always valid to swap a type for a subtype when the type being swapped is a parameter of a generic type, which is why we have the concept of variance. The normal rules for determining variance would say that type[T] is invariant in T.

The idea that type[T] is covariant in T, so that type[B] is a subtype of type[A], comes from an imagined version of LSP that sometimes works but never fully existed in reality. In practice some class methods are intended to be compatible across subclasses but some are not, with __new__ being the most notable exception.

Long before Python had type annotations I can remember discussing on some ancient version of these discussion groups that LSP does not apply to types and constructors. The specific example at the time was because some library broke when I called it with a namedtuple:

from collections import namedtuple

Point = namedtuple('Point', ('x', 'y'))

p = Point(1, 2)

def library_code(obj: object) -> object:
    if isinstance(obj, (list, tuple)):
        obj = type(obj)(obj)
    return obj

library_code(p)
# TypeError: Point.__new__()
# missing 1 required positional argument: 'y'

1 Like

It’s frustrating for those who have a strong theory background too. The “specification” shouldn’t be called that, as it does not contain all of the relevant information or the definitions required to implement a type checker, and the table of contents, index, and glossary are all useless for this as well.

You’d have to go to PEP 484; I can’t find it in the current specification.

Treating type[...] as invariant just breaks the ability of the type system to interact with things that are typed appropriately, but it doesn’t really matter:

  • It’s covariant currently, so unless that’s changed, the rules for that apply.
  • It has to be covariant, because Python exposes type and its use in code is idiomatic, so substitution has to include accessing the type object:
def ex(x: A):
    type(x)(...) 

It isn’t magic here. For subtyping to work at all, the things you can do with an instance of a type have to remain possible with instances of a subtype. type (the one-argument function call this time) is also typed as (T) -> type[T]. Making type invariant would mean either removing subtyping from the language, or making the type function do type erasure and saying this just isn’t supported by the type system; neither of those is a good outcome either.

I am going to be a bit more direct. The Python developer community are not going to change their __init__ signatures. So either:

  • The typing people of Python have to figure out a way to correctly describe __init__ and marry it with a partial LSP.
  • Or the typing people have to accept that the system will never be fully able to describe even the simplest of Python types.

Asking people to do something absurd like everyone adding *args: object, **kwargs: object to their __init__/__new__ functions will be catastrophic for the typing side of python:

  • IDEs like PyCharm will probably refuse to implement such checks in their built-in checker because it “breaks” too much code, and they will probably default to suppressing such checks from third-party checkers like mypy plugins.
  • Basically all configs for almost all projects will default to containing suppression for the relevant error codes.
  • Projects are not going to adopt typing if they get these errors by default and have to go out of their way to suppress them.

By arguing for this position (that __init__ violating LSP is a problem that should be solved by changing how __init__ methods are written), you are IMO going to hurt the progress of the typing project for Python.

Python is never going to be fully theoretically safe. If you want that, use (or make) a different language. Yes, many aspects can be improved by small code changes to conform to the stricter world that typing prescribes. This one specifically is IMO not going to happen.

3 Likes

Thanks. So from my reading of the PEP, type[C] is (in effect) a union of C and all its subclasses? If we think of it like that, then consider the following:

❯ bat -p .\tyex.py
class A:
    def fn(self, i1: int) -> int:
        return i1 + 3
class B:
    def fn(self, s1: str, s2: str) -> str:
        return s1 + s2

an_instance: A | B = A()

print(an_instance.fn(1))

❯ py .\tyex.py
4
❯ mypy .\tyex.py
tyex.py:10: error: Missing positional argument "s2" in call to "fn" of "B"  [call-arg]
tyex.py:10: error: Argument 1 to "fn" of "B" has incompatible type "int"; expected "str"  [arg-type]
Found 2 errors in 1 file (checked 1 source file)

That’s pretty bizarre behaviour. It’s very clear that an_instance has actual type A, and the method call is correctly typed. And both classes A and B are entirely correct in themselves. It’s only calling fn, which is defined differently in the two classes, on a variable declared as a union type containing both of those classes, that could be considered invalid, and even then the type checker (in this instance, at least) should be perfectly capable of inferring the correct runtime type and validating that the usage is correct. But regardless of all this, it’s not hard to reason about the typing of this code.

I’m willing to believe that simply defining type[C] as “a union of C and all its subclasses” is a little simplistic in practice (in the sense that I assume there are nuances beyond simply “it’s easier to type” where type[C] is more useful in practice than the corresponding union) but given that’s basically what the PEP says, I find it difficult to see why __init__ doesn’t correspond precisely to fn in the above. And yes, that includes the fact that it’s incorrect to call obj.fn if all you know is that obj is of type A | B (i.e., it’s invalid to call cls(...) if all you know is that cls is of type type[A]).

Going back to the fundamental question, the following is invalid code:

class A:
    def __init__(self, i1: int, i2: int):
        self.i1 = i1
        self.i2 = i2
class B(A):
    def __init__(self, i1: int):
        self.i1 = i1
        self.i2 = 12

def dodgy(obj: A):
    assume_its_an_a = type(obj)(1, 2)

dodgy(B(1))

It would be nice if type checkers could catch the error and report it, but the error is in the call in dodgy, not in the definitions of A and B, and that’s the important point here.

It would also be nice if the author of A had a way to say “subclasses must preserve the calling signature of __init__”, if that’s a requirement they want to impose on subclasses (maybe so they can write code like dodgy without it actually being dodgy :slightly_smiling_face:). But it has to be opt-in, as A’s author might not have any intention of writing such code (in which case, why constrain subclasses unnecessarily?)

While this is broadly true (and is basically what the LSP says) the critical point is how we define “things you can do”. Python’s introspection capabilities let you get at things that are far wider than any sensible interpretation of “things you can do with an instance of a type”. And IMO, one of those capabilities is the ability to use type() to get an object that is the type of the instance.

If someone wants to explicitly define what “things you can do” (in the context of Python) fall under this rule, then that would be fine[1]. But without such a precise definition, the LSP has to be just a design guideline, and not a hard and fast rule.


  1. although I suspect it would be more effort than it’s worth in practice ↩︎

1 Like

I agree with you to some extent that allowing use of type at runtime to get the type object isn’t something worth preserving, but the problem is that the shiny new alternative for the common case (copy.replace, added in 3.13, with the corresponding __replace__ dunder) is already also being excluded from LSP checks, despite being an instance method designed for this.

The only solution offered by a type checker maintainer is a manually maintained Callable[..., T], where ... isn’t a literal ellipsis but a stand-in for a parameter list that has to be manually kept in sync with the constructor.
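As I understand that suggestion, it looks roughly like this (hypothetical Box class), with the parameter list duplicated by hand from the constructor:

from typing import Callable

class Box:
    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

# The parameter list below is not a literal `...`; it has to be written out
# and kept manually in sync with Box.__init__ whenever that signature changes.
def doubled(make: Callable[[int, int], Box], width: int, height: int) -> Box:
    return make(width * 2, height * 2)

doubled(Box, 1, 2)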

It seems like everyone is saying the people who want this to be safe in their code are wrong for wanting that checked, whereas to me it seems like the people who are wrong are the ones expecting subclasses that violate LSP under the current working definitions to work with static type checking. It’s optional. If you don’t want to adhere to the conventions of static type systems, don’t use them, rather than making them less capable for the people who actually want them. This isn’t even possible to implement in a new type checker, because the specification doesn’t allow it, so it won’t be interoperable as-is with library use.

Not everything should be a subclass. If it isn’t going to be compatible, write a function that wraps the needed behavior instead of a subclass. Or, at the very least, if we’re saying that use of type at runtime is never going to be type safe, let’s actually commit to that: make type() at runtime erase type information and only be typed as returning type[object], and don’t also close off every reasonable alternative and present unusably bad workarounds.

OP proposes a decorator for __init__ to tell static type checkers that it should conform to LSP. This could easily be extended to other exceptions like __replace__ or __new__. This would be an opt-in to more safe behavior. Is there an issue with this proposal that I am missing?

As a future step, callability of type[...] could be removed for types where it isn’t guaranteed to be correct (i.e. types that are neither final nor have LSP guarantees for __new__ and/or __init__), but I think this is a common enough use case that we should have a simple solution first before throwing away this option.
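For what it’s worth, a sketch of the “guaranteed to be correct” case (hypothetical Config class): for a @final class, type[Config] can only ever be Config itself, so calling it through type[...] can’t be broken by substitution:

from typing import final

@final
class Config:
    def __init__(self, path: str) -> None:
        self.path = path

def load(cls: type[Config]) -> Config:
    # Config is @final, so no subclass can redefine __init__ incompatibly;
    # this constructor call cannot be invalidated by a subtype.
    return cls("settings.toml")

load(Config)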

OP proposes the decorator to be @abstractmethod. This would restrict its usage to ABCs or Protocols, but that is IMO a bit too limiting - maybe a new decorator should be added instead.

1 Like