In Python 3.12.5, this is a valid dataclass definition:
from dataclasses import dataclass

@dataclass
class Foo:
    a: float = 4.2
    b: int = 42
    c: tuple = (4.2, 42)
    d = 999
    e = "foo"
What might not be obvious to some is that in this example, d and e are class variables. (All instances of Foo will share the same values for d and e, and trying to do Foo(d=1) will result in an “unexpected keyword argument” error). Additionally, the usual issues with mutable class attributes arise if d or e are of a mutable type. My issues with this are:
The dataclasses documentation doesn’t mention anywhere (that I can see) that attributes without type hints become class attributes. → It is “mentioned” by way of inference, see top reply.
This is a recipe for bugs (oops, I forgot to add the type hint, but it still works? until it doesn’t). → This may be a candidate for a linter rule, see this comment.
This introduces another serious pitfall for learners, who need to be told about class attributes and about this subtle behaviour before being introduced to dataclasses. → Type annotations are “optional”, but they seem to really matter.
There is already a special way to set class variables in dataclass definitions. → This is intended to be optional, in the same way that type annotations are.
Because I’m obviously too late to protest this behaviour, I’d instead like to propose another keyword arg to the @dataclass decorator that would raise an error if any attributes are declared without a type hint. This would rely on whatever magic generates the __init__ being able to see these bare attribute declarations. I don’t know if that is currently possible, and if not, this proposal becomes more complicated and probably ends up being wishful thinking. Although, I would still maintain that the documentation should clearly mention this behaviour.
The documentation does define the behavior clearly:
@dataclass decorator examines the class to find fields. A field is defined as a class variable that has a type annotation.
So it can be inferred that what @dataclass doesn’t see as a field, i.e. a name without a type annotation, is left untouched and therefore becomes a regular class variable.
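To make that inference concrete, here is a minimal sketch using the Foo class from the original post. It assumes nothing beyond the standard library; the variable names field_names and init_error are only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Foo:
    a: float = 4.2
    b: int = 42
    c: tuple = (4.2, 42)
    d = 999      # no annotation: ignored by @dataclass, stays a class variable
    e = "foo"    # same here


# Only the annotated names are treated as fields.
field_names = [f.name for f in fields(Foo)]

# d lives on the class, not on the instance.
instance = Foo()
d_on_class = "d" in vars(Foo)
d_on_instance = "d" in vars(instance)

# The generated __init__ has no parameter called d.
try:
    Foo(d=1)
except TypeError as exc:
    init_error = str(exc)
```

Here field_names contains only a, b and c, and the Foo(d=1) call fails with the "unexpected keyword argument" TypeError described in the original post.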
A check for unannotated class variables sounds like something that could be suggested as a new checker for pylint.
Ah OK it is true that I misunderstood that all attributes of a dataclass are both class attributes and (maybe) instance attributes (if they are annotated). However, I still find the behaviour confusing and unclear. Is there a difference between un-annotated attributes and those declared as typing.ClassVar? I think given the hazard, this warrants some better signposting.
I don’t personally use pylint and would prefer a way to check for this at runtime.
Type annotation is optional in Python, so it’s fine to leave a class variable unannotated, but if you do want to type-annotate a class variable in a data class, it would be treated as a field if you annotate it directly with a type, so the workaround is to enclose the type in typing.ClassVar so @dataclass would leave it untouched while still giving type checkers a type hint to associate with the name.
I see, that does make sense. Maybe there is nothing that can be done here, so I will think about proposing some kind of warning in mypy for un-annotated attributes.
FWIW a runtime check should be fairly straightforward to implement yourself, either with your own decorator or with a common base class whose __init_subclass__ implements the checking logic. You just have to compare the keys of cls.__annotations__ with cls.__dict__, although you’d need to special case methods and internal attributes in order to avoid false positives.
Sounds doable. I guess my main gripe is that, although type annotations are optional, as @blhsing pointed out, in this particular case they sort of change the semantics of the attribute declaration. But now that I understand why it is the way it is, I will try to use checks or get mypy to help with avoiding this pitfall.
While I admit that the class variable issue can be surprising, I am wondering if you were really bitten by this or if you are overstating the matter. Consider your example class:
o1 = Foo()
o2 = Foo()
assert o1.d == o2.d # ok
o1.d = 42 # make an instance variable, hiding the class variable
assert o1.d == 42 # ok
assert o2.d == 999 # ok
Foo.d = 0
assert o2.d == 0 # also as expected
My case involved inheritance: I have a dataclass with all attributes annotated which I export as API, and then I expect users to subclass it and add more attributes. That’s when I encountered this, as I was trying to come up with the very simplest use case (user subclasses, adds attributes, but omits type hints). The fact that neither my linter nor the runtime complained about it until I tried to instantiate the subclass and override one of the new parameters in the constructor made me hesitant to offer a dataclass as API in this case. I think I’ll still go ahead with it, but I’ll need to make it clear to users that they MUST declare annotations.
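A stripped-down sketch of that inheritance scenario might look like this. Base and Child are hypothetical stand-ins for the exported API class and a user's subclass; the real classes are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class Base:           # hypothetical stand-in for the exported API class
    x: int = 0


@dataclass
class Child(Base):    # hypothetical user subclass
    y: int = 1        # annotated: becomes a field
    z = 2             # annotation omitted: plain class attribute, no error here


# Defining and instantiating the subclass works without complaint.
ok = Child(x=10, y=20)

# The problem only surfaces when z is passed to the constructor.
try:
    Child(z=3)
except TypeError as exc:
    late_error = str(exc)
```

Nothing fails until z is passed as a keyword argument, which matches the late discovery described above.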
Perhaps you are right. In fact there is already a rule (disabled by default) which at least handles the case where the class attribute is mutable: RUF012, so I’ll make a note to always enable that one.
Actually, it turns out that particular rule is already a can of worms, so I’m not rushing to suggest another similar one. It was originally restricted in scope to dataclasses! In the linked issue, similar concern is expressed, i.e. due to the fact that omitting type hints is dangerous in this case, it seems reasonable to want a warning; at the same time, type hints are “optional” and so it is considered bad form to have a linter fail if the hint is omitted.
OK I proposed this to ruff as well, considering that the RUF012 rule is not inherited from Pylint, the Ruff-only ruleset seems like the appropriate place for this, if any. Maybe the mypy issue was a bit premature, I didn’t mean to overburden mypy maintainers. I have cross-referenced them so both dev teams can coordinate which scope is more appropriate.