A recent pyright bug report uncovered an ambiguity in the specification of dataclass_transform.
The behavior relates to libraries that provide dataclass-like behaviors through the use of a metaclass (like pydantic) rather than through a decorator (like attrs or the stdlib dataclasses module). The question is whether __init__ synthesis should be skipped when some intervening class in the MRO provides a custom __init__ method.
For decorator-based libraries, the answer is clear: it should follow the behavior of the stdlib dataclasses module. For libraries that use metaclasses, the behavior is less clear and is currently unspecified. Not surprisingly, this has led to divergent behaviors: pydantic’s runtime behavior differs from what mypy and pyright currently assume.
from pydantic import BaseModel

class A(BaseModel):
    x: int

class B(A):
    def __init__(self) -> None: ...

class C(B):
    y: int

# Which of the following is correct?
C()      # OK at runtime, error according to mypy and pyright
C(1)     # Runtime error, error according to mypy and pyright
C(1, 1)  # Runtime error, OK according to mypy and pyright
This also affects multiple inheritance use cases:
from pydantic import BaseModel

class A(BaseModel):
    x: int

class B:
    def __init__(self) -> None: ...

class C(B, A):
    y: int

C()      # OK at runtime, error according to mypy and pyright
C(1)     # Runtime error, error according to mypy and pyright
C(1, 1)  # Runtime error, OK according to mypy and pyright

# Swapping the base classes changes the behavior
class D(A, B):
    y: int

D()          # Runtime error, error according to mypy and pyright
D(x=1)       # Runtime error, error according to mypy and pyright
D(x=1, y=1)  # OK at runtime, OK according to mypy and pyright
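The reason the base-class order matters is ordinary attribute lookup along the MRO. Here is a minimal sketch of that, assuming a hypothetical plain class ModelBase as a stand-in for pydantic's BaseModel. It runs without pydantic installed and does not reproduce pydantic's validation, so it only illustrates which __init__ Python picks.

```python
class ModelBase:
    """Hypothetical stand-in for pydantic's BaseModel (no validation)."""

    def __init__(self, **data: object) -> None:
        self.__dict__.update(data)


class A(ModelBase):
    x: int


class B:
    def __init__(self) -> None: ...


class C(B, A):
    y: int


class D(A, B):
    y: int


# B precedes ModelBase in C's MRO, so C() resolves to B.__init__.
print([k.__name__ for k in C.__mro__])
print(C.__init__ is B.__init__)

# ModelBase precedes B in D's MRO, so D(...) resolves to ModelBase.__init__.
print([k.__name__ for k in D.__mro__])
print(D.__init__ is ModelBase.__init__)
```

With these stand-ins, C picks up B's zero-argument __init__, while D keeps the keyword-accepting __init__ inherited through A.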
The correct behavior for dataclass_transform is currently unspecified in this case. Perhaps we should clarify the typing spec so pydantic’s behavior and type checker assumptions are aligned. This would involve adding a new bullet to the Dataclass Semantics section that says:
- When dataclass_transform is applied to a decorator function, synthesis of an __init__ method is skipped if the class decorated with that function defines its own __init__ method. If dataclass_transform is applied to a metaclass, synthesis of an __init__ method is skipped if any class in the MRO prior to the base class constructed from that metaclass defines its own __init__ method. When dataclass_transform is applied directly to a base class, synthesis of an __init__ method is skipped if any class in the MRO prior to that base class provides its own __init__ method.
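To make the metaclass and base-class parts of that bullet concrete, here is a rough sketch of the check a type checker might perform. The function name init_synthesis_skipped and the stand-in class ModelBase are hypothetical, introduced only for illustration; this is not how mypy or pyright implement it, and it does not cover the decorator case.

```python
def init_synthesis_skipped(cls: type, transform_base: type) -> bool:
    """Return True if a class ahead of transform_base in cls's MRO defines __init__.

    transform_base is the class built from the dataclass_transform metaclass
    (or the base class that dataclass_transform was applied to directly).
    """
    mro = cls.__mro__
    if transform_base not in mro:
        raise ValueError(f"{transform_base.__name__} is not in the MRO of {cls.__name__}")
    # Only classes that come before transform_base in the MRO are considered.
    for klass in mro[: mro.index(transform_base)]:
        if "__init__" in vars(klass):
            return True
    return False


class ModelBase:
    """Hypothetical stand-in for a metaclass-built base such as pydantic's BaseModel."""

    def __init__(self, **data: object) -> None: ...


class A(ModelBase):
    x: int


class B(A):
    def __init__(self) -> None: ...


class C(B):
    y: int


class E(A):
    y: int


print(init_synthesis_skipped(C, ModelBase))  # B sits between C and ModelBase
print(init_synthesis_skipped(E, ModelBase))  # nothing ahead of ModelBase defines __init__
```

Under the proposed wording, C would use B's __init__ and get no synthesized one, while E would still get a synthesized __init__ covering x and y.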
The only downside I see to specifying this behavior is that other libraries (besides pydantic) that use metaclasses or base classes to introduce dataclass-like behaviors might differ from the specified behavior. But given the importance of pydantic in the Python ecosystem, there’s a good argument that we should standardize on its behavior in this case.
Thoughts?