Dataclass single reference to created field

Hi,

I’m not sure if this is a bug or intentional behavior, but I stubbed my toe on this in some production code and wanted to see if there was a better way to address it.

I have a dataclass with a field that should be an instance of another class

@dataclass
class FirstClass:
    second_class: SecondClass = SecondClass()

I would expect every instance of FirstClass to have its own instance of SecondClass here; what actually happens, though, is that every instance of FirstClass shares the same instance of SecondClass

The workaround I found was to add a __post_init__() that creates the instance instead

@dataclass
class FirstClass:
    # second_class: SecondClass = SecondClass()
    second_class: Optional[SecondClass] = None

    def __post_init__(self):
        self.second_class = SecondClass()

Here is some simple repro code

from dataclasses import dataclass


class SecondClass:
    def __init__(self):
        self.some_field = []

@dataclass
class FirstClass:
    second_class: SecondClass = SecondClass()


def main():
    one = FirstClass()
    two = FirstClass()

    one.second_class.some_field.append("test")

    print(one.second_class is two.second_class)
    print(len(one.second_class.some_field))
    print(len(two.second_class.some_field))

if __name__ == '__main__':
    main()

It prints out

True
1
1
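
For comparison, here is a quick sketch of the same checks run against the __post_init__ workaround; it just combines the two snippets above, so nothing new is assumed:

from dataclasses import dataclass
from typing import Optional


class SecondClass:
    def __init__(self):
        self.some_field = []


@dataclass
class FirstClass:
    second_class: Optional[SecondClass] = None

    def __post_init__(self):
        self.second_class = SecondClass()


one = FirstClass()
two = FirstClass()

one.second_class.some_field.append("test")

print(one.second_class is two.second_class)  # False
print(len(one.second_class.some_field))      # 1
print(len(two.second_class.some_field))      # 0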

Is there an expected way to do what I’m trying to accomplish here? Or is what I fell into the ‘right’ way to do it?

Thanks!

The class body is executed just once, so only a single instance of SecondClass is ever created. That one object is then assigned as the default to the second_class field of every FirstClass object you create.

To have a unique object per instance, you have to postpone creating the object until the FirstClass object itself is created. There are multiple ways to do that:

from dataclasses import dataclass, field

class SecondClass:
    pass

@dataclass
class FirstClass:
    second_class: SecondClass = SecondClass()                                   # evaluated once, shared by every instance
    second_class_per_object: SecondClass = field(default_factory=SecondClass)   # called again for each new instance

objects = []
for i in range(3):
    obj = FirstClass()
    print(f'{id(obj.second_class) = },  {id(obj.second_class_per_object) = }')
    objects.append(obj)  # Keep the instances alive to prevent memory (and ID) reuse.

Output:

id(obj.second_class) = 139988826782112,  id(obj.second_class_per_object) = 139988826839920
id(obj.second_class) = 139988826782112,  id(obj.second_class_per_object) = 139988826834400
id(obj.second_class) = 139988826782112,  id(obj.second_class_per_object) = 139988826834592

Yup, using field(default_factory=...) is the standard way, and it's well documented:

https://docs.python.org/3/library/dataclasses.html#mutable-default-values

Note that this doesn't apply only to custom classes, but to every mutable type: lists, dicts, etc.
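
For instance, a minimal sketch with the built-in containers (the Config class and its field names are made up for illustration):

from dataclasses import dataclass, field


@dataclass
class Config:
    tags: list = field(default_factory=list)     # fresh list for each instance
    options: dict = field(default_factory=dict)  # fresh dict for each instance


a = Config()
b = Config()
a.tags.append("x")
print(a.tags, b.tags)  # ['x'] [] -- no sharing between instances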

Also, to help avoid these errors, the dataclass() decorator raises a ValueError if it detects an unhashable default parameter.

Perhaps your custom class was hashable?
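
Here's a rough sketch of that difference; the class names are invented for illustration, and SecondClass just stands in for a hashable custom class:

from dataclasses import dataclass


class SecondClass:
    pass  # no __eq__/__hash__ overrides, so instances stay hashable


# An unhashable default (list, dict, set, ...) is caught when the class is created:
try:
    @dataclass
    class Bad:
        items: list = []
except ValueError as exc:
    print(exc)  # e.g. "mutable default <class 'list'> for field items is not allowed: use default_factory"

# A hashable custom instance passes that check, so the shared default goes unnoticed:
@dataclass
class Sneaky:
    second_class: SecondClass = SecondClass()

print(Sneaky().second_class is Sneaky().second_class)  # True: the same object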

I have proposed that the dataclass decorator automatically add a default factory if you set a default to a Callable. I think that would be nifty, since the field() call is so much busier :slight_smile: Yes, you could want the Callable itself as the value, but I think that's far less common, so you could use a custom field in that case.

However, backward compatibility issues probably make this a no-go.
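
To make the comparison concrete, a sketch (the second class illustrates what such a proposal would mean; under current dataclasses that line just stores the class object as a shared default):

from dataclasses import dataclass, field


class SecondClass:
    pass


@dataclass
class Today:
    # what you currently have to write to get a per-instance default
    second_class: SecondClass = field(default_factory=SecondClass)


@dataclass
class UnderTheProposal:
    # hypothetical: a bare callable default would be treated as a factory and
    # called once per new instance; today this line just stores the class
    # object itself as a (shared) default value
    second_class: SecondClass = SecondClass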