I’m not sure if this is a bug or intentional behavior, but I stubbed my toe on this in some production code and wanted to see if there is a better way to address it.
I have a dataclass with a field that should hold an instance of another class:
```python
@dataclass
class FirstClass:
    second_class: SecondClass = SecondClass()
```
I would expect every instance of FirstClass to have its own instance of SecondClass here. What actually happens, though, is that every instance of FirstClass shares the same instance of SecondClass.
The workaround I found was to create a `__post_init__()` method that creates the instance instead.
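That `__post_init__()` workaround might look something like this (a minimal sketch, assuming a trivial `SecondClass` and using `None` as a sentinel default):

```python
from dataclasses import dataclass

class SecondClass:
    pass

@dataclass
class FirstClass:
    # None acts as a sentinel; the real object is created per instance.
    second_class: SecondClass = None

    def __post_init__(self):
        if self.second_class is None:
            self.second_class = SecondClass()

a, b = FirstClass(), FirstClass()
assert a.second_class is not b.second_class  # each instance gets its own
```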
The class body is executed just once, so only a single instance of SecondClass is created. This single object is then assigned to the second_class field of every FirstClass object created.
To get a unique object per instance, you have to postpone the object creation until the moment each FirstClass object is being created. There are multiple ways:
```python
from dataclasses import dataclass, field

class SecondClass:
    pass

@dataclass
class FirstClass:
    second_class: SecondClass = SecondClass()
    second_class_per_object: SecondClass = field(default_factory=SecondClass)

objects = []
for i in range(3):
    obj = FirstClass()
    print(f'{id(obj.second_class) = }, {id(obj.second_class_per_object) = }')
    objects.append(obj)  # Keep the instances to prevent memory (and ID) reuse.
```
Note that this doesn’t only apply to custom classes, but to every mutable type: lists, dicts, etc.
Also: to help avoid these errors, “the dataclass() decorator will raise a ValueError if it detects an unhashable default parameter.”
Perhaps your custom class was hashable?
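That safeguard only catches *unhashable* defaults, and a plain custom class is hashable by default, so it slips through. A small sketch of both cases (class names here are made up for illustration):

```python
from dataclasses import dataclass

# An unhashable default such as a list is rejected up front:
try:
    @dataclass
    class WithList:
        items: list = []
except ValueError:
    print('rejected: mutable default')

# A plain custom class is hashable by default, so the same check
# does not fire, and the single default instance is silently shared:
class SecondClass:
    pass

@dataclass
class Sneaky:
    second_class: SecondClass = SecondClass()

a, b = Sneaky(), Sneaky()
assert a.second_class is b.second_class  # same shared object
```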
I have proposed that the dataclass decorator automatically add a default factory when the default you set is a callable. I think that would be nifty, since the field() call is so much busier. Yes, you could want the callable itself as a value, but I think that’s far less common, so you could use an explicit field() in that case.
However, backward-compatibility issues probably make this a no-go.