In which case, what is wrong with using an __init__ method for that?
Taking the example from the PEP:
    def str_or_none(x: Any) -> str | None:
        return str(x) if x is not None else None

    @dataclasses.dataclass
    class InventoryItem:
        id: int = dataclasses.field(converter=int)
        skus: tuple[int, ...] = dataclasses.field(converter=tuple[int, ...])
        vendor: str | None = dataclasses.field(converter=str_or_none)
        names: tuple[str, ...] = dataclasses.field(
            converter=lambda names: tuple(map(str.lower, names))
        )
        stock_image_path: pathlib.PurePosixPath = dataclasses.field(
            converter=pathlib.PurePosixPath, default="assets/unknown.png"
        )
        shelves: tuple = dataclasses.field(
            converter=tuple, default_factory=list
        )
With __init__ that is:
    from __future__ import annotations  # Vendor below is a guessed, undefined type

    import dataclasses
    import pathlib
    from collections.abc import Iterable

    @dataclasses.dataclass
    class InventoryItem:
        id: int
        skus: tuple[int, ...]
        vendor: str | None
        names: tuple[str, ...]
        stock_image_path: pathlib.PurePosixPath
        shelves: tuple

        def __init__(
            self,
            id: int | str,
            skus: Iterable[int | str],
            vendor: Vendor | None,
            names: Iterable[str],
            stock_image_path: str | pathlib.PurePosixPath = "assets/unknown.png",
            shelves: Iterable = (),
        ):
            self.id = int(id)
            self.skus = tuple(map(int, skus))
            self.vendor = str(vendor) if vendor is not None else None
            self.names = tuple(map(str.lower, names))
            self.stock_image_path = pathlib.PurePosixPath(stock_image_path)
            self.shelves = tuple(shelves)
Some might consider this boilerplate, but I don’t, because nothing here is really redundant. The types for the fields are not redundant. The signature of __init__, with types and defaults for its parameters, is not redundant. The code in the body of the __init__ method is not redundant. The field names are repeated a few times, but no line of code here is redundant. If there were no converters then there would be redundancy, because the types in the signature of __init__ would be the same as the types of the fields and each line in the body of __init__ would just be self.x = x. Without converters the __init__ method looks like redundant boilerplate, but as soon as you want actual code in __init__ it is not boilerplate any more.
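To make the no-converter case concrete, here is a minimal sketch. The class names GeneratedPoint and HandWrittenPoint are hypothetical and not taken from the PEP. It relies on the fact that the dataclass decorator leaves an __init__ defined in the class body in place instead of generating one.

```python
import dataclasses
import inspect


@dataclasses.dataclass
class GeneratedPoint:
    # No converters: the decorator generates __init__(self, x, y=0).
    x: int
    y: int = 0


@dataclasses.dataclass
class HandWrittenPoint:
    x: int
    y: int = 0

    # The decorator keeps an __init__ defined in the class body.
    # Without converters it only repeats the field list.
    def __init__(self, x: int, y: int = 0):
        self.x = x
        self.y = y


generated = GeneratedPoint(1, y=2)
hand_written = HandWrittenPoint(1, y=2)

# Both constructors take the same parameters and store the same values.
generated_params = list(inspect.signature(GeneratedPoint).parameters)
hand_written_params = list(inspect.signature(HandWrittenPoint).parameters)
```

Here the hand-written method really is redundant: every line restates something the field declarations already say. That stops being true once a line does a conversion.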
The example with __init__ has a few more lines of code, but that comes from including types in the signature of __init__. It might seem like the parameter types for __init__ are redundant, but they are not. For example, the parameter of str_or_none might be typed as Any, but that does not necessarily mean that you would want to accept Any as an input for the vendor parameter of the InventoryItem constructor. I have guessed here that the type should be Vendor | None, but in the original code it is unclear what it is supposed to be.
I don’t think that trying to make something that should usually be code in an __init__ method look declarative makes anything easier to understand, or makes the code any easier to write. It is better to put the code in an __init__ method, all in one place, rather than writing auxiliary functions like str_or_none and noun-ifying simple code into “converters” and “default factories”. It is definitely easier to understand what the signature of __init__ is if you can see the __init__ method rather than scanning through default factories and converter functions. It is also easier to understand what actually executes in the constructor if you can see the body of the __init__ method. The fact that, behind the scenes, the dataclass decorator will go and textually build the code for this __init__ method is a clear sign that maybe what you should be doing is just writing an __init__ method.
What does not quite work with __init__ is frozen dataclasses. It does not seem to be possible to use either __init__ or __new__ with a frozen dataclass without using object.__setattr__, which is awkward. You can add an alternate classmethod constructor like InventoryItem.new(...), but then the conversion is not applied by the ordinary InventoryItem(...) syntax. Maybe there is a way to improve defining conversions or validation for frozen dataclasses.