My codebase included (over many modules) code like:
from dataclasses import dataclass

class C0:
    pass

@dataclass
class DC2:
    my_field: bool = True

class C1(C0, DC2):  # original state
    pass
And lo it was good:
print(len(C1.__dataclass_fields__)) # 1, life is good
But then I tried turning the non-dataclass parent into a dataclass:
@dataclass
class DC1:
    pass

class C2(DC1, DC2):  # messes up __dataclass_fields__
    pass
print(len(C2.__dataclass_fields__)) # 0 ??
Making the child a dataclass changes the state back to the expected one:
@dataclass
class DC3(DC1, DC2):  # named DC3 so it does not shadow the DC2 defined above
    pass

print(len(DC3.__dataclass_fields__))  # 1
If until now it just seemed suspicious, I'm pretty sure this next one qualifies as a bug (or at the very least a glaring footgun): changing the order of the C2 parents flips the state too!
class C3(DC2, DC1):
    pass
print(len(C3.__dataclass_fields__)) # 1 ?!?
Perhaps there's some documented warning like "don't mix dataclasses and non-dataclasses via inheritance"? A few places in the docs make me suspect otherwise, e.g.:
dataclasses.is_dataclass(obj)
Return True if its parameter is a dataclass (including subclasses of a dataclass)
Before I open a GitHub bug and perhaps suggest a PR to change this behavior, could you experts kindly lend an opinion - is any of this somehow intentional? Is there something else I'm missing?
This is not really fixable because of the way dataclasses are implemented. It's not really possible for them to implement behavior that happens on inheritance unless you manually call the decorator.
What you are seeing is straight inheritance of these fields without recomputation. So of course changing the order matters - this is true for all class attributes, including methods.
I don't see a reasonable fix that doesn't break some other use cases. (The closest I can think of is dataclasses also generating an __init_subclass__ method, but that seems tricky, especially when it comes to respecting user-defined versions of such methods.)
IMO linters and type checkers should warn about this edge case, but I don't think anything should change in Python itself.
Doesn't help. The examples already include other workarounds, but maybe the root problem can be addressed - as one can imagine, in real code it wasn't as direct.
As far as I can tell your examples show the behaviour I'd expect.
This seems to be more to do with forgetting to use the @dataclass decorator, as the dataclass attribute you're checking isn't defined on the child class, so it's looked up on the parents according to the MRO.
In your examples both DC1 and DC2 have __dataclass_fields__. DC1's is empty and DC2's has one entry. In C2's MRO, DC1 comes before DC2, so you get the attribute from DC1, which is empty, while in C3's MRO, DC2 comes first, so you get its attribute with 1 entry.
When you decorate the child class with @dataclass it does its own resolution, checking for fields from parent classes, and defines its own __dataclass_fields__, so it's not retrieved from the parents any more.
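To make the lookup described above concrete, here is a minimal sketch. The helper name `fields_owner` is made up for illustration (it is not part of the dataclasses module); it just walks the MRO and reports which class actually defines `__dataclass_fields__` in its own namespace.

```python
from dataclasses import dataclass, fields


@dataclass
class DC1:
    pass


@dataclass
class DC2:
    my_field: bool = True


class C2(DC1, DC2):  # decorator forgotten on the child
    pass


@dataclass
class DC3(DC1, DC2):  # decorated child recomputes its own fields
    pass


def fields_owner(cls):
    """Hypothetical helper: first class in the MRO that defines
    __dataclass_fields__ in its own namespace, or None."""
    for klass in cls.__mro__:
        if "__dataclass_fields__" in vars(klass):
            return klass
    return None


print("__dataclass_fields__" in vars(C2))  # False
print(fields_owner(C2).__name__)           # DC1
print(fields_owner(DC3).__name__)          # DC3
print([f.name for f in fields(DC3)])       # ['my_field']
```

So `C2.__dataclass_fields__` is simply DC1's (empty) dictionary found by ordinary attribute lookup, whereas DC3 owns a freshly computed one.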
You can't really prevent people from forgetting to use the @dataclass decorator just by virtue of it being a decorator. The class is first constructed as a non-dataclass and then the @dataclass decorator converts it, so a parent class can't check if a child is a dataclass because initially it won't be.
If you want to make sure that everything in the inheritance tree is a dataclass, your best bet is probably to wrap dataclass in an __init_subclass__ method (providing you don't use slots=True) along with a check that every class in the MRO (except object) is a dataclass, and then use that class as your base class instead of using @dataclass directly. You should be able to use typing.dataclass_transform with this to make static tools behave.
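A rough sketch of that idea, under some assumptions: the name `StrictDataclassBase` and the error messages are invented here, slots=True is simply rejected, and the `typing.dataclass_transform` decoration is left out to keep the example short.

```python
from dataclasses import dataclass, fields


class StrictDataclassBase:
    """Hypothetical base class: every subclass is turned into a dataclass,
    and mixing in a non-dataclass parent raises at class creation."""

    def __init_subclass__(cls, /, **kwargs):
        super().__init_subclass__()
        if kwargs.get("slots"):
            raise TypeError("slots=True is not supported by this base class")
        for klass in cls.__mro__[1:]:
            if klass is object or klass is StrictDataclassBase:
                continue
            # Only classes that were themselves processed by dataclass()
            # carry __dataclass_fields__ in their own namespace.
            if "__dataclass_fields__" not in vars(klass):
                raise TypeError(
                    f"{cls.__name__} inherits from non-dataclass {klass.__name__}"
                )
        dataclass(cls, **kwargs)


class Plain:
    pass


class Good(StrictDataclassBase):
    x: int = 0


class AlsoGood(Good):
    y: int = 1


print([f.name for f in fields(AlsoGood)])  # ['x', 'y']

try:
    class Bad(StrictDataclassBase, Plain):
        z: int = 2
except TypeError as exc:
    print(exc)  # Bad inherits from non-dataclass Plain
```

Whether raising is the right reaction (as opposed to warning) is a design choice; the check itself is the part taken from the suggestion above.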
The easier way to do this is to make a Dataclass class that calls dataclass on cls in __init_subclass__, decorate that with dataclass_transform (so that type checkers understand what it does), and inherit from that instead of decorating your classes with @dataclass. This will ensure that all child classes of a dataclass are automatically dataclasses, even if you forget.
Would it be feasible for a type checker to alert, say, when a non-dataclass has a dataclass on its MRO?
Are you asking whether it's possible to implement such a check statically? The answer is yes, a type checker or linter could implement this check statically. Or are you asking whether I think a type checker should implement such a check? The answer is "it depends". This isn't really a type checking issue, so type checker maintainers might be reluctant to add such a check. I'm also skeptical that this is a common source of bugs, so taking the time to add such a check (and translating error messages to other languages, etc.) would be difficult to justify without added evidence that this is a common problem.
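Until a static tool offers such a check, a runtime stand-in (for instance inside a test suite) is easy to sketch. This is not a static check and not anything pyright provides; `forgot_dataclass_decorator`, `find_undecorated` and the example classes are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, is_dataclass


def forgot_dataclass_decorator(cls):
    """True if cls inherits dataclass machinery from a parent
    but was never processed by dataclass() itself."""
    return is_dataclass(cls) and "__dataclass_fields__" not in vars(cls)


def find_undecorated(root):
    """Walk all (transitive) subclasses of root and collect the suspects."""
    found, seen = [], set()
    stack = list(root.__subclasses__())
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        if forgot_dataclass_decorator(cls):
            found.append(cls)
        stack.extend(cls.__subclasses__())
    return found


@dataclass
class Settings:
    debug: bool = False


@dataclass
class DbSettings(Settings):
    url: str = "sqlite://"


class CacheSettings(Settings):  # decorator forgotten
    ttl: int = 60


print([c.__name__ for c in find_undecorated(Settings)])  # ['CacheSettings']
```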
What you describe are the inner implementation details of dataclass, and how they cause the behavior observed (and thank you for that!). However, that's still a long way from "the behaviour one would expect": I doubt any user would expect that changing the order of dataclass parents would cause fields to be added or removed.
How would one go about collecting such evidence? Is there some process in place?
I can say that in our case this was the root cause of some long standing bugs.
Alternatively - would you consider accepting a PR for such an added rule?
This isn't really the right forum to be discussing pyright feature requests. Feel free to open a discussion thread or enhancement request in the pyright issue tracker.
dataclasses uses a decorator, so the generation of new methods is not inherited and needs to be applied on each class you wish to have the dataclass methods. This is the only thing that is specific to dataclasses.
The observed behaviour then follows from standard inheritance as it does for hand written classes.
class Base1:
    def __init__(self, arg1="arg1"):
        self.arg1 = arg1

    def __repr__(self):
        return f"{self.__class__.__name__}(arg1={self.arg1!r})"

class Base2:
    def __init__(self):
        pass

    def __repr__(self):
        return f"{self.__class__.__name__}()"

class Child1(Base1, Base2):
    pass

class Child2(Base2, Base1):
    pass

print(Child1())  # Child1(arg1='arg1')
print(Child2())  # Child2()
This is not to say that it's not easy to forget to place the @dataclass decorator, just that if it is missing then this behaviour is what I would expect.
If you want the dataclass features applied automatically to inheriting classes to avoid this possibility, then you need to make the base class perform the application. Note that you may want to force kw_only=True as the order of arguments will depend on the inheritance order.
from dataclasses import dataclass, field
from typing import dataclass_transform

@dataclass_transform(field_specifiers=(field,))
class DCBase:
    def __init_subclass__(cls, /, **kwargs):
        # optional: check for slots=True in kwargs and error
        dataclass(cls, **kwargs)

class DC1(DCBase):
    arg1: str = "arg1"

class DC2(DCBase):
    arg2: str = "arg2"

class IC1(DC1, DC2):
    pass

class IC2(DC2, DC1):
    pass

print(IC1())  # IC1(arg2='arg2', arg1='arg1')
print(IC2())  # IC2(arg1='arg1', arg2='arg2')
If this __init_subclass__ approach is something that is commonly useful then maybe it would be better if the dataclasses module provided this functionality directly.
One problem with adding an __init_subclass__ approach is that the method is called after the class has been created, so it's too late to add __slots__ by that point. The current decorator can "cheat" by creating an entirely new class that looks like the old one but with slots and returning it, but this doesn't work inside __init_subclass__ (and has some of its own issues anyway). This is fine in your own code as you can just choose to not support slots, but in the stdlib people would probably expect it to work, so this would probably need to be implemented with a metaclass.
In my own dataclass-like package[1] I created both a decorator and a metaclass/base class implementation and while there are some applications where the decorator is more appropriate, most of the time I find I end up using the base class. I would be curious to know if this held true in general in the case that both tools were available from dataclasses.
[1] It's intended to be more like a construction kit for building dataclass-like tools, but it also includes the "prefab" implementation I built with those tools.