This idea describes an idealized version of classes in Python. There are some benefits, but I recognize that there may not be enough justification for it. What I'm hoping to do is start a conversation about how we could make classes in Python better, so that one day we will have collected a motivating enough set of justifications to change things.
Background
There are a variety of warts that we hope to address.
Calling super in the initializer
In multiple inheritance, sending parameters to various base classes is brittle:
```python
from typing import Any

class Y:
    def __init__(self, y: int, **kwargs: Any):
        super().__init__(**kwargs)
        self.y = y

class Z:
    def __init__(self, z: int, **kwargs: Any):
        super().__init__(**kwargs)
        self.z = z

class X(Y, Z):
    def __init__(self, x: int, **kwargs: Any):
        super().__init__(**kwargs)  # Hopefully send y to Y and z to Z.
        self.x = x
```
This relies on:
- every initializer for every class delegating all unknown parameters to super, and
- that there are no collisions between parameter names in the inheritance chain.
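To make the brittleness concrete, here is a small runnable sketch. The classes `W`, `V`, and `U` and the helper `try_build` are hypothetical, added only for illustration. A single class that doesn't forward `**kwargs` works or fails purely depending on where it lands in the MRO:

```python
from typing import Any


class Y:
    def __init__(self, y: int, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.y = y


class Z:
    def __init__(self, z: int, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.z = z


class X(Y, Z):
    def __init__(self, x: int, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.x = x


# Hypothetical class that does not forward unknown keyword arguments.
class W:
    def __init__(self, w: int) -> None:
        self.w = w


# MRO is V, X, Y, Z, W, object: W happens to come last, so it only sees `w`.
class V(X, W):
    pass


# MRO is U, W, X, Y, Z, object: W comes first and receives every argument.
class U(W, X):
    pass


def try_build(cls: type) -> str:
    try:
        cls(x=1, y=2, z=3, w=4)
    except TypeError as error:
        return f"{cls.__name__}: TypeError: {error}"
    return f"{cls.__name__}: ok"


print(try_build(V))
print(try_build(U))
```

Nothing in `W` or `U` is wrong on its own; the failure only appears once the classes are combined.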
Initializers, class factories, and `__replace__` disobey LSP
Consider the classes `X`, `Y`, and `Z` above. `__init__` does not obey LSP: `X.__init__` requires an `x` that `Y.__init__` does not, so code that can construct a `Y` cannot necessarily construct an `X`. In many circumstances, we can't even know the signature of `__init__`. Even in the code above, `super().__init__` does not have a known signature, since you can inherit from `X` and insert all kinds of classes into the MRO.
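A minimal sketch of why `super().__init__` has no knowable signature: inside `Y.__init__`, `super()` means "the next class in `type(self).__mro__`", and a subclass can change what that is. `Base`, `Inserted`, and `Sub` are hypothetical classes written for illustration:

```python
class Base:
    def __init__(self) -> None:
        self.trace = ["Base"]


class Y(Base):
    def __init__(self) -> None:
        # super() resolves against type(self).__mro__, so its target (and its
        # signature) is only known once the final subclass exists.
        super().__init__()
        self.trace.append("Y")


class Inserted(Base):
    # Hypothetical class with an incompatible initializer signature.
    def __init__(self, extra: int) -> None:
        super().__init__()
        self.trace.append(f"Inserted({extra})")


# MRO is Sub, Y, Inserted, Base, object.
class Sub(Y, Inserted):
    pass


print(Y().trace)
print([cls.__name__ for cls in Sub.__mro__])
try:
    Sub()
except TypeError as error:
    # Y.__init__ called Inserted.__init__ without the required `extra`.
    print("TypeError:", error)
```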
This problem gets even worse for class factories. If you wanted to add a class factory like:
```python
from typing import Self

class Y:
    @classmethod
    def create(cls, y: int) -> Self:
        return cls(y)
```
Then `X < Y` cannot also define `create` with incompatible parameters.
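The same problem shows up at runtime. In this sketch, `make_default` is a hypothetical helper written against `Y`'s interface; passing it the subclass `X` fails because `X.create` needs an extra parameter. A type checker such as mypy also reports the override as incompatible, which is why it is silenced here:

```python
from __future__ import annotations


class Y:
    def __init__(self, y: int) -> None:
        self.y = y

    @classmethod
    def create(cls, y: int) -> Y:
        return cls(y)


class X(Y):
    def __init__(self, x: int, y: int) -> None:
        super().__init__(y)
        self.x = x

    @classmethod
    def create(cls, x: int, y: int) -> X:  # type: ignore[override]
        return cls(x, y)


def make_default(factory: type[Y]) -> Y:
    # Only relies on Y's factory, so any subclass "should" work here.
    return factory.create(y=0)


print(make_default(Y).y)
try:
    make_default(X)
except TypeError as error:
    print("TypeError:", error)
```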
This problem plagued `__replace__`, which was finally exempted from LSP.
Dataclasses do not easily support multiple inheritance
`__post_init__` is necessary if you want your dataclass to have member variables that are not passed to the constructor. But if you want to use `__post_init__` and have it support multiple inheritance, you need to call super. And you can't call super without first checking whether the superclass even defines `__post_init__`. Even if it does, there's no easy way to find out what parameters the superclass's `__post_init__` expects!
```python
from dataclasses import dataclass

@dataclass
class Y:
    y: int

    def __post_init__(self) -> None:
        if hasattr(super(), '__post_init__'):
            # Hopefully, there are no parameters on this!!
            super().__post_init__()  # pyright: ignore
        assert self.y > 0

@dataclass
class X(Y):
    x: int

    def __post_init__(self) -> None:
        if hasattr(super(), '__post_init__'):
            # Hopefully, there are no parameters on this!!
            super().__post_init__()  # pyright: ignore
        assert self.x > 0
```
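The last point is easy to hit with `InitVar`. Dataclasses pass every `InitVar`, including inherited ones, positionally to `__post_init__` in field order. A subclass's `__post_init__` therefore has to hard-code its base's `InitVar` list in order to forward it. Here is a small sketch with hypothetical fields `scale` and `offset`:

```python
from dataclasses import InitVar, dataclass


@dataclass
class Y:
    scale: InitVar[int]
    y: int = 0

    def __post_init__(self, scale: int) -> None:
        self.y *= scale


@dataclass
class X(Y):
    offset: InitVar[int] = 0

    # Receives (scale, offset): X must know Y's InitVars and their order.
    def __post_init__(self, scale: int, offset: int) -> None:
        super().__post_init__(scale)
        self.y += offset


x = X(scale=2, y=3, offset=1)
print(x)
```

If `Y` later gains another `InitVar`, every subclass's `__post_init__` silently receives a different argument list.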
Redundancy in dataclass definition
Consider this real code for a mixer:
```python
class Mixer2d(Module):
    input_size: InitVar[int]
    height: InitVar[int]
    width: InitVar[int]
    patch_size: InitVar[int]
    hidden_size: InitVar[int]
    mix_patch_size: InitVar[int]
    mix_hidden_size: InitVar[int]
    num_blocks: InitVar[int]
    t1: JaxArray
    conv_in: eqx.nn.Conv2d = eqx.field(init=False)
    conv_out: eqx.nn.ConvTranspose2d = eqx.field(init=False)
    blocks: list[MixerBlock] = eqx.field(init=False)
    norm: eqx.nn.LayerNorm = eqx.field(init=False)

    def __post_init__(self,  # noqa: PLR0917
            streams: Mapping[str, RngStream],
            input_size: int,
            height: int,
            width: int,
            patch_size: int,
            hidden_size: int,
            mix_patch_size: int,
            mix_hidden_size: int,
            num_blocks: int,
            ) -> None:
        ...  # body omitted
```
This illustrates three kinds of member variables: true members (such as `t1`); `InitVar`s, which are specified twice (in the class body and in `__post_init__`); and fields marked `init=False` to specify that they won't be constructor parameters. It would be a lot simpler to have only true members in the class body.
Proposal
Consider instead adding two special decorators, `@constructor` and `@transformer`, along with special language and typing support for such constructors and transformers. These constructs can only be used on a special form of dataclass, which we'll call a C-class.
All constructors are class methods that return `Self`. All transformers are regular methods that return `Self`. Otherwise, these work as usual: for a C-class `X`, `X.some_constructor(...)` calls the constructor and always builds an `X`. Similarly, if `x = X()`, then `x.some_transformer(...)` always builds an `X`. `init` is a special constructor that can be called as `X.init(...)` or just `X(...)`.
All C-classes are dataclasses. Therefore, `init` is automatically generated unless specified.
Constructors and transformers are never inherited. And calling `super().init` (or any other constructor or transformer) is not allowed.
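Part of this can be approximated in today's Python to get a feel for it. The sketch below is only a runtime emulation under our own assumptions. `constructor` and `transformer` are hypothetical descriptors, not the proposed language feature. A plain `@dataclass` stands in for a C-class. There is also no typing support and no ban on calling super. Accessing either kind of method through a subclass that didn't define it raises `AttributeError`, which mimics "never inherited":

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from types import MethodType
from typing import Any, Callable


class _NotInherited:
    """Hypothetical descriptor base: only usable on the class that defined it."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func

    def __set_name__(self, owner: type, name: str) -> None:
        self.owner = owner
        self.name = name

    def _check(self, cls: type) -> None:
        if cls is not self.owner:
            raise AttributeError(
                f"{cls.__name__} does not define {self.name!r} "
                "(constructors and transformers are not inherited)"
            )


class constructor(_NotInherited):
    def __get__(self, instance: object, owner: type | None = None) -> Any:
        cls = owner if owner is not None else type(instance)
        self._check(cls)
        return MethodType(self.func, cls)  # bound to the class, like a classmethod


class transformer(_NotInherited):
    def __get__(self, instance: object, owner: type | None = None) -> Any:
        cls = owner if owner is not None else type(instance)
        self._check(cls)
        return self.func if instance is None else MethodType(self.func, instance)


@dataclass
class Point:
    x: int
    y: int = 1

    @constructor
    def on_diagonal(cls, z: int) -> Point:
        return cls(z, z)

    @transformer
    def shifted(self, dx: int) -> Point:
        return replace(self, x=self.x + dx)


@dataclass
class TDPoint(Point):
    z: int = 0
```

Here `Point.on_diagonal(3)` and `Point(3, 3).shifted(2)` work. `TDPoint.on_diagonal` and `TDPoint(1, 2, 3).shifted`, by contrast, raise `AttributeError`, so `TDPoint` would have to define its own versions.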
You would only ever choose to define `__post_init__` if you left the `init` constructor unspecified (otherwise, you can do whatever you want in `init`). Therefore, `__post_init__` accepts whatever `init` accepts, and it cannot call super (since that has an unknown signature).
C-classes disallow `InitVar`, since the `init` constructor can be specified instead with whatever parameters you want. `InitVar` would also complicate inheritance too much, since you would be inclined to forward these values to super. Instead, we discuss inheritance below.
The field parameter `init=False` is likewise disallowed, since you can simply leave a field out of a constructor's parameters if you don't want that constructor to accept it.
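The closest approximation of this style in today's Python is a dataclass that holds only true members, plus a class-method factory. The factory takes what used to be `InitVar`s and computes what used to be `init=False` fields. The following is a hypothetical, much-simplified stand-in for the mixer above; `Mixer`, `create`, and the field names are illustrative only:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Mixer:
    # Only true members: no InitVar and no field(init=False).
    name: str
    blocks: list[int]

    @classmethod
    def create(cls, name: str, hidden_size: int, num_blocks: int) -> Mixer:
        # Former InitVars are ordinary parameters declared only once here,
        # and former init=False fields are computed before construction.
        return cls(name, [hidden_size] * num_blocks)


mixer = Mixer.create("mixer", hidden_size=8, num_blocks=3)
print(mixer)
```

Unlike a C-class constructor, though, `create` is still inherited and still constrained by LSP, which is the problem described in the background.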
Example
```python
@c_class
class Point:
    x: int
    y: int = field(default=1, converter=int)

    @constructor
    def init(cls, x: int, y: int = 1) -> Self:
        return cls.__new__(x, y)

    @constructor
    def on_diagonal(cls, z: int, /) -> Self:
        return cls.__new__(z, z)

@c_class
class TDPoint(Point):
    _: KW_ONLY
    z: int

    @constructor
    def init(cls, x: int, y: int = 1, *, z: int) -> Self:
        point = Point(x, y)
        return cls.__new__(***point, z=z)

    @constructor
    def on_diagonal(cls, z: int, /) -> Self:
        return cls.__new__(z, z, z=z)  # z is keyword-only (after KW_ONLY)

    @constructor
    def on_diagonal_alternative(cls, z: int, /) -> Self:
        point = Point.on_diagonal(z)
        return cls.__new__(***point, z=z)
```
Here, `Point.init` and `TDPoint.init` are provided just for illustration. They would be automatically generated had they not been defined.
Inheritance
Inheritance is illustrated in `on_diagonal_alternative`. `***` unpacks positional-only arguments as positional arguments, and the rest of the arguments as keyword arguments. The rules for collisions are the same as the rules for collisions of members with base classes. If a parameter is explicitly given, it overrides anything provided by a `***`-unpacking. This makes it easy to construct a class with many base classes: just `***` each of them, and fill in whatever you want to override and whatever is missing.
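Since `***` doesn't exist, here is a rough approximation with today's dataclasses to show the intended gluing. `build_from` is a hypothetical helper with several simplifications. It collects the init fields of each source object by name. Later sources overwrite earlier ones, whereas the proposal defers to the member-collision rules instead. Explicit keyword arguments override everything. Finally, everything is passed to the class by keyword rather than being split into positional and keyword arguments:

```python
from dataclasses import dataclass, fields
from typing import Any, TypeVar

T = TypeVar("T")


def build_from(cls: type[T], *sources: Any, **explicit: Any) -> T:
    """Rough stand-in for cls.__new__(***source1, ***source2, **explicit)."""
    values: dict[str, Any] = {}
    for source in sources:
        for f in fields(source):
            if f.init:
                values[f.name] = getattr(source, f.name)
    values.update(explicit)  # explicit arguments win over unpacked ones
    return cls(**values)


@dataclass
class Y:
    y: int


@dataclass
class Z:
    z: int


@dataclass
class X(Y, Z):
    x: int


print(build_from(X, Y(1), Z(2), x=3))
print(build_from(X, Y(1), Z(2), x=3, y=10))
```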
The default `init`
In the context of inheritance, the default generated `init` is as follows:
- Collect all of the parameters of all superclass `init` functions (collect all positional parameters first, then keyword-only parameters, etc.). This is your `init` parameter list. If there are any collisions, that's a definition error.
- In the body, call the `init` constructors of all your parent classes and produce an object for each.
- Glue all of these objects together, along with any parameters that are unique to this class, using the `***` operator.
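The three steps can be roughly prototyped on top of ordinary dataclasses. This is a sketch under our own assumptions. `collect_parameters` and `default_init` are hypothetical helpers, and everything is passed by keyword, so the positional-then-keyword-only ordering is ignored. Each parent's `init` is simply its dataclass `__init__`:

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any


def collect_parameters(cls: type) -> dict[str, type]:
    """Step 1: map each parent-initializer parameter to the parent that owns it."""
    owners: dict[str, type] = {}
    for base in cls.__bases__:
        if base is object:
            continue
        for name in inspect.signature(base).parameters:
            if name in owners:
                raise TypeError(
                    f"{cls.__name__}: parameter {name!r} is defined by both "
                    f"{owners[name].__name__} and {base.__name__}"
                )
            owners[name] = base
    return owners


def default_init(cls: type, **kwargs: Any) -> Any:
    owners = collect_parameters(cls)
    # Step 2: build one object per parent from the parameters it owns.
    parents = [
        base(**{n: v for n, v in kwargs.items() if owners.get(n) is base})
        for base in cls.__bases__
        if base is not object
    ]
    # Step 3: glue the parents' fields together with this class's own parameters.
    values = {f.name: getattr(p, f.name) for p in parents for f in fields(p) if f.init}
    values.update({n: v for n, v in kwargs.items() if n not in owners})
    return cls(**values)


@dataclass
class Y:
    y: int


@dataclass
class Z:
    z: int


@dataclass
class X(Y, Z):
    x: int


@dataclass
class A:
    a: int


@dataclass
class B:
    a: int


# Both parents want a parameter named `a`: a definition error in the proposal.
@dataclass
class C(A, B):
    pass


print(default_init(X, x=3, y=1, z=2))
try:
    collect_parameters(C)
except TypeError as error:
    print("TypeError:", error)
```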
To address the enormous computational cost of generating and running this code, CPython would accelerate the common cases (e.g., when parent classes also don't specify `init`).
Closing remarks
This is intended to be an outline for repairing some of the warts with classes in Python. It probably doesn't have enough benefits to pay for the cost of adding the various decorators and the `***` operator. But if people identify other warts with classes, there may eventually be enough to justify such changes.