True constructors

You don’t need the ABCMeta metaclass. That exists only for when you want to register virtual subclasses, which static type checkers don’t support anyway.

You never need it but it is already there:

In [39]: type(Hashable)
Out[39]: abc.ABCMeta

In [40]: type(Protocol)
Out[40]: typing._ProtocolMeta

In [41]: type(Protocol).__bases__
Out[41]: (abc.ABCMeta,)

If you subclass these you end up with ABCMeta whether you want it or not.
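
For instance (a minimal sketch, with made-up names), explicitly subclassing a Protocol quietly gives the subclass an ABCMeta-based metaclass:

import abc
from typing import Protocol

class Reader(Protocol):
    def read(self) -> bytes: ...

class FileReader(Reader):  # explicit subclassing, not just structural typing
    def read(self) -> bytes:
        return b""

# The metaclass comes along whether you wanted it or not:
print(isinstance(FileReader, abc.ABCMeta))  # True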

2 Likes

Maybe, or maybe I’ve learned that every aspect of something becomes a feature and will have people depending on it - aka every change breaks someone’s workflow. “Explicit promises” are by no means the entire surface of an object, and people will have expectations about the object that you didn’t document. So, if it’s a visible behaviour, it is effectively a promised property.

Remember how hash randomization in Python 2.7 was a major breaking change and had to be explicitly enabled with an env var? dict never documented anything about iteration order, and str never documented how it calculated hashes. But so much code depended on those behaviours that they were deemed worthy of backward-compatibility guarantees.

2 Likes

That’s not what the quoted text says.

I agree that your interpretation is more reasonable, but the quoted requirement is much more mathematical in its nature, and as such needs a more mathematical context to make any particular statements about it. I haven’t read Liskov’s paper, which is where this text apparently comes from, but I imagine it’s much more mathematically rigorous than the discussions here. Have you? Can you explain your position on a mathematical basis?

I’m happy to discuss the LSP as a principle that can be applied in a practical sense in programming languages, but I think that a good dose of “practicality beats purity” is needed, unless you’re working in a language (like Haskell) that has a strong mathematical basis. Python is not such a language.

In practical terms, it’s clearly possible to make a useful programming language that implements (multiple) inheritance but only partially respects the LSP. After all, Python exists :slightly_smiling_face: And while the LSP is a useful idea to consider when designing a language, it very definitely isn’t necessary for inheritance.

class A:
    pass

class B(A):
    pass

def breaks_substitutability(obj: A):
    assert obj.__class__ is A

Obviously, this is a silly example, but before you dismiss it, why is it silly? Is looking up __class__ somehow cheating? Is failing an assert not an example of “not being able to accept B < A”? Because it’s not at all clear to me why the breaks_substitutability function is exempt from the LSP - or if it’s not exempt, then what the LSP actually means for it.

Note - this is all somewhat off-topic, but still relevant because I don’t think your dogmatic insistence that the LSP is “a necessary aspect of inheritance” is helping your arguments, which seem to be aimed solely at people who accept your interpretation of the LSP.

6 Likes

LSP only has a useful meaning here within a type system. We could use the one that the Python typing specification provides; we could also use the implied duck-compatible APIs that predate it, with terms like “pathlike” and “iterator protocol” (structural typing, with or without using protocols, ABCs, or both to express it). I’m sure we could find any number of definitions, so it would be better to just look at this as trying to better describe subclassing and which kwargs are forwarded to the parent vs. newly introduced.

I think people are focusing too heavily on the use of the term and not on the shape of the problem being solved here. I think the proposal needs to address which behaviors you are trying to make better guarantees of via syntax, and which ones are out of scope.

3 Likes

Yes, I mean for your speed test, when comparing protocols with ordinary inheritance, the ordinary inheritance case should not have ABCMeta as the metaclass since that might unnecessarily slow things down.

I understand, but that is a language design issue. It’s not what LSP entails.

It’s not “cheating”. The reason this doesn’t work is simply that the base class does not promise that all of its descendants will have the same __class__ value.

When you declare __class__, its type is (by analogy) something like:

class A:
  __class__: str  # "some class": any value of a general type

not

class A:
  __class__: Literal['A']  # "exactly A": pinned to one specific value

Yes, exactly.

And one of the key problems that this proposal aims to address is that __init__ is a method that breaks LSP, and consequently super().__init__'s signature is practically impossible to know.
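
A minimal sketch of why (hypothetical classes): under multiple inheritance, what super() resolves to depends on the MRO of the eventual subclass, not on the class’s static bases:

class Base:
    def __init__(self) -> None: ...

class Left(Base):
    def __init__(self, x: int) -> None:
        super().__init__()  # the author of Left assumed Base.__init__ here

class Right(Base):
    def __init__(self, y: str) -> None:
        super().__init__()

class Child(Left, Right):
    def __init__(self) -> None:
        super().__init__(1)

# In Child's MRO, Left's super() is Right, so Left's super().__init__()
# calls Right.__init__ without the required y: TypeError at runtime.
Child()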

Therefore, the solution here is to make constructors simply not inherited. But doing that causes other problems, which then necessitate solutions other than calling super().

Ms. Liskov didn’t express it very well. What I think she was getting at is that you should be able to replace an object with an instance of a subclass without breaking anything. But what counts as “breaking” depends on what behaviours are important to your program. A behaviour change that “breaks LSP” in one context might not do so in another.

From that perspective, if LSP is broken, then there is a bug in your program somewhere. So whether LSP is a design principle or a necessity depends on whether you think that “programs should not have bugs” is a firm requirement or merely a design guideline.

1 Like

Would you say, then, that this means that LSP is a design principle and not a fundamental of inheritance?

All programs have bugs. I don’t know how you would make “code is bug-free” a requirement, ever.

2 Likes

The initial problem statement here contains good elements but places much too much emphasis on narrow use-cases and a specific design principle. The proposed solution would be harmful for the language, but there are underlying problems which can be addressed in other ways.

Much too much attention is being given in this entire discussion to the LSP. Design principles can guide our thinking. But non-adherence to a principle is not inherently a problem. See also: overzealous adherence to “DRY”, which often leads novice developers to write a helper function to avoid repeating a single line of code. Putting emphasis on LSP rather than on the ways that type checking isn’t meeting users’ needs is a mistake.

Here’s a significant problem, no theory or principles needed:
The tools we have for type annotating **kwargs for use in super() are insufficient. Type annotations and type checking should learn to play nicer with super().__init__(**kwargs). As it stands, I rarely find myself able to use **kwargs capturing in signatures anymore, as it’s too destructive to the IDE and type-checking workflows which expect to be able to statically validate function call parameters.
This makes type checking difficult to apply to some well-established patterns.
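
For concreteness, here is the sort of pattern I mean (hypothetical classes). The **kwargs: Any signature erases the real parameters, so checkers and IDEs can no longer validate call sites:

import typing

class A:
    def __init__(self, *, name: str, **kwargs: typing.Any) -> None:
        super().__init__(**kwargs)
        self.name = name

class B(A):
    def __init__(self, *, size: int, **kwargs: typing.Any) -> None:
        super().__init__(**kwargs)
        self.size = size

# Fine at runtime, but the checker only sees **kwargs: Any, so a
# misspelled keyword like B(size=1, nane="x") is not caught statically.
b = B(size=1, name="x")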

I would like to see type checking develop ways of lifting the signature from one callable or type into another. You can only do this today by rearranging your code to declare parameters using Unpack[TypedDict] – I appreciate the work which was done to create that mechanism, but I do not find it adequate.[1]
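
Roughly, the rearrangement looks like this (a sketch; Unpack with **kwargs is PEP 692, so it needs Python 3.12+ or typing_extensions):

import typing

class AKwargs(typing.TypedDict):
    name: str

class A:
    def __init__(self, **kwargs: typing.Unpack[AKwargs]) -> None:
        self.name = kwargs["name"]

class B(A):
    def __init__(self, size: int, **kwargs: typing.Unpack[AKwargs]) -> None:
        super().__init__(**kwargs)
        self.size = size

B(size=1, name="x")    # parameters are statically checked again
# B(size=1, nane="x")  # and a typo is now a static error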

If dataclasses and __post_init__ in particular don’t behave well with super() invocations, that sounds like a problem to be addressed in dataclasses. In general, special treatment of dataclasses gives them unwarranted primacy in the language – IMO dataclass_transform is a pragmatic and useful short-term fix, but ideally could be formalized into something which cleanly supports msgspec, attrs, pydantic, etc.

Aside on how to create alternate constructors today, with LSP enforcement.

If you want to have type checkers enforce signature matching between subclass and superclass initializers, it’s perfectly valid to define alternate constructors as classmethods, inherit them normally, and mypy and pyright will happily yell at you for breaking their rules.

By way of example, type checkers reject this:

import typing

class A:
    def __init__(self, **kwargs: typing.Any) -> None:
        super().__init__(**kwargs)
        self._slot: int = 0

    @classmethod
    def new(cls, value: int) -> typing.Self:
        obj = cls()
        obj._slot = value
        return obj

class B(A):
    def __init__(self, **kwargs: typing.Any) -> None:
        super().__init__(**kwargs)
        self._slot2: str = ""

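    # Incompatible override: B.new adds a required parameter to A.new's
    # signature, which is exactly what the checkers reject.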
    @classmethod
    def new(cls, value: int, othervalue: str) -> typing.Self:
        obj = cls()
        obj._slot = value
        obj._slot2 = othervalue
        return obj

So if all you want is “an alternate constructor declaration which type checkers do not special-case”, you already have the tools to go and build that. I don’t expect many people would want to write their programs like this, and when you realize that, I think it becomes clear that most developers would not want to write @constructor-designated methods either.


  1. Actually, I find the mechanism with TypedDict to be downright bad. But I don’t want to be unfair – clearly it meets some people’s needs, or it wouldn’t have been accepted.

5 Likes

I understand what you’re getting at, but the only reason LSP is even mentioned is to explain which signatures we know and which we don’t. LSP is the kind of rule that ensures that things are logically consistent; that rule is applied in many places and, as you say, not applied in others to serve user needs.

Now, the places in which it’s not applied (__init__, __post_init__, __replace__, etc.) cause various typing problems. They mean that various things cannot be checked, and code can violate invariants despite being type checked.

This is because readers of code have an internal model of what the annotations are promising. I’m not just being “zealous about LSP”. You, as a reader of code, can usually expect that if x is annotated as T, then x exposes T’s interface. Whenever anything violates LSP, then the reader’s expectation is broken. You’ll call x.f() and it will break at runtime despite the static type checker not complaining.

This is why LSP is such a fundamental rule. It’s already a fundamental rule in the programmer’s mental model.

Of course, rules can be broken for convenience. If we can find a way to write code just as conveniently without breaking LSP, that would be a huge win.

I agree that this would be helpful, and actually proposed lifting the superclass signature a couple years ago.

Unfortunately, this doesn’t help us with __init__ because we can’t know the signature of super().__init__. That’s the problem.

100% agree with you.

I tried proposing some fixes a few years ago, and it came up again.

Even if you could get the little that the above issues are asking for, there is no way to call super().__post_init__ unless you place strong restrictions on your entire inheritance hierarchy regarding what parameters __post_init__ can accept. There is no clean solution.
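
A minimal sketch of the restriction problem (hypothetical dataclasses): __post_init__ receives each class’s own InitVar pseudo-fields, so delegating upward requires every subclass to know and restate its parents’ InitVars:

import dataclasses

@dataclasses.dataclass
class A:
    x: int
    scale: dataclasses.InitVar[int] = 1

    def __post_init__(self, scale: int) -> None:
        self.x *= scale

@dataclasses.dataclass
class B(A):
    y: int = 0
    offset: dataclasses.InitVar[int] = 0

    def __post_init__(self, scale: int, offset: int) -> None:
        # B must restate A's InitVar (scale) just to delegate correctly,
        # and this only works if the whole hierarchy cooperates.
        super().__post_init__(scale)
        self.y += offset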

The reasons that this idea is built around dataclasses are that:

  • it ensures that we know all of the member variables of any class, which helps when combining initialized base classes into a derived class, and
  • it allows us, within reason, to statically verify that there are no collisions.

As I explained in the OP, that doesn’t work in general because you can’t know the superclass signature. There is no way to statically check super().__init__ calls.

Also, if you define constructors as ordinary classmethods, you run into problems with inheritance. Either you end up doing something like:

class C:
  @classmethod
  def f(cls, ...) -> Self:
    return cls(...)  # Broken for subclasses!

which is broken for subclasses D < C, since D.f will call D with arguments meant for C; or you treat the classmethod as essentially a static method:

class C:
  @classmethod
  def f(cls, ...) -> C:
    return C(...)

which works, but provides no help if you want to delegate to superclasses.

This idea makes delegation to super nicer. (Although, having thought about it a bit more, I think it could be made even nicer.)

If the reason it is brought up is to explain an underlying issue, I encourage you to instead talk about the underlying issue itself. Your initial presentation made it sound like “this loophole in a very important principle is allowed, and that’s a problem”. That leads to people having unproductive debates about the importance of the principle, rather than talking about whatever the actual problem is.

As you say, we can’t statically know the type of super() in a multiple inheritance scenario. That’s hard to deal with in general. In a closed system, where all relevant classes and subclasses are discoverable, it is possible to know what types (a union) are possible for super(). That may be an interesting and productive angle for type checkers to pursue. Maybe we could add a class decorator to typing to tell checkers that they may treat a class as though it never appears in an inheritance tree outside of the view of their analysis (i.e. designate private vs public classes).

3 Likes

Yes, that’s what happened. Hopefully, my last comment makes it more clear why violating LSP makes programming harder.

The problem with that approach is that the interface of a union is the subset that is supported by every member of the union. So your union approach would lead to many false positives: errors reported on code that is actually fine. Now, I realize given your last comment that just stating this rule as fact may cause an unproductive debate about why the interface of a union is what it is. I’d rather not dive into these type theory questions.
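
For illustration only (made-up classes), the rule in practice:

import typing

class P:
    def ping(self) -> None: ...

class Q:
    def ping(self) -> None: ...
    def quack(self) -> None: ...

def f(obj: typing.Union[P, Q]) -> None:
    obj.ping()   # OK: every member of the union has ping
    obj.quack()  # rejected: P has no quack, even if obj is really a Q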

Also, it’s rarely a “closed system”. Users are typically free to inherit from whatever they want, and in doing so, the base classes cannot know what their subclasses are. That means that the call to super would pass type checking when a library is released, but a user of the library would get type errors in the library when they inherit from classes within the library! It would be better if the library designer were guided by type errors to get the calling structure right.

I strongly contest the notion that closed systems are a rare case.

Firstly, many packages may have classes which are implementation details and not considered part of their public interfaces. Importing from pip._internal, for example, means you’ve violated the usage contract for the package.

Secondly, I would point at Python applications which are not published as packages. That means cloud deployed Django apps, pyinstaller builds of GUI apps, you name it. If it’s written in Python but not published for other code to import and reuse, the classes can’t be subclassed by someone else.

Sure, there are tons of cases this doesn’t cover. But there are a lot which it does. I wouldn’t even attempt to guess at which is more common. We have almost no insight into closed source usage.

2 Likes

Sure, I think I misunderstood what you meant by closed system. I think you can already mark the classes as final for some of the cases you mention, and then super().__init__ is known?
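
A sketch of what I mean (hypothetical classes): since a final class’s MRO can never be extended, super() inside it has exactly one possible target:

import typing

class Base:
    def __init__(self, *, name: str) -> None:
        self.name = name

@typing.final
class Config(Base):
    def __init__(self) -> None:
        # Config can never be subclassed, so this super() call
        # always resolves to Base.__init__.
        super().__init__(name="config")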

I think that a lot of the problems you are describing here stem from the assumption that arbitrary downstream subclassing must be supported.

I consider this to be in the same sort of realm as monkey-patching. Nothing stops someone downstream from monkey-patching your module but that does not in any way mean that you need to worry about it as a supported use. I don’t generally mark classes as final but at the same time it is rare that I would actually consider it reasonable that downstream code should subclass any of my classes. Essentially I consider that classes are implicitly final unless otherwise stated.

In the exceptional case that I might want to permit downstream subclassing there needs to be some design for how the subclassing is supposed to work that is both documented and tested. That design needs to include things like which methods or attributes a subclass should provide or is permitted to override and how exactly the superclass code promises to make use of them. An example could be a mixin or abstract base class where the contract is “implement method X and the mixin will provide methods Y and Z that call X”. It is only by following the given design that downstream code can reasonably be expected to subclass upstream classes.

The problems you are referring to seem to be related to trying to make a fully general subclass design so that any combination of classes can be inherited even using arbitrary multiple inheritance. I think that this is neither possible nor even desirable because useful subclassing needs to have some particular contract about how it is supposed to work.

I would say that the answer to your question about knowing the signature of super is that if you are subclassing a class that is designed to be subclassed then that design dictates how the subclass needs to be implemented. It is only by following the particular subclassing design that you can know how to call __init__ or even whether it is permitted for a subclass to override/implement __init__ at all.

4 Likes

If you can arbitrarily subclass something and the signature of __init__ is not explicitly constrained, then you can’t know how to call __init__. It doesn’t matter what the superclass design doc says, because anyone can subclass you and their __init__ may not match.

I don’t agree with this. In fact, the Python language goes out of its way to make its classes support subclassing. It is an object-oriented language after all. Subclassing is not some kind of monkey-patching. It’s not “extraordinary”. Subclassing is the idiomatic way to express is-a relationships. If you have an interface that accepts X, and someone wants to modify the behavior of X, but still pass the object to the interface, the idiomatic approach is subclassing. Anything else would be very strange, and would make code unnecessarily hard to read and hard to write.

If you want to block subclassing, then you should absolutely make your class final. But this should be reserved for when you absolutely want to ensure that behavior doesn’t change.

The multiple inheritance section is one tiny section of the proposal, and it’s only there to ensure that multiple inheritance continues to work. I don’t understand gravitating towards it, especially when you don’t seem to like multiple inheritance in contemporary Python. If you want to avoid it, you can continue to avoid it. No one is forcing you to do it.

1 Like

Neil, for the avoidance of doubt, please can you indulge me one second?

By “addressing warts” is the intention solely to give Python coders new tools that make it easier to design classes that would automatically satisfy LSP in __init__?

It’s not to go one further, and make that the only way subclassing can be done in the language? Deprecating the existing co-operative multiple inheritance mechanics would be such a large breaking change that it would justify Python 4.

1 Like

I find this a little bit strong for my taste, but thank you for bringing it up.

Subclassing may be explicitly supported, explicitly not supported (good example: jsonschema validators), or somewhere in-between.

The dialogue here seems to be one of extremes. One of Python’s virtues is its flexibility and ability to support multiple paradigms. We should therefore be flexible in how we think about its features.

3 Likes