Answering my own “why?”, I think I figured out the problem with just lifting the default arguments:
_NO_DEFAULT_VALUE = object()

def custom_field(*, default: Any = _NO_DEFAULT_VALUE) -> Any:
    ...

@my_dataclass
class Bar:
    a: str = custom_field()

b = Bar()  # This should be an error because `a` is not specified.
If the spec said “field specifiers must respect default arguments”, it would make it needlessly complex to specify a “default has not been provided” situation: for every combination of the other arguments, I would need to write an overload both with and without the `default` argument.
(This is unlike the `init` situation, where the semantic value of `init` must be one of `True` or `False` and there is no “user didn’t specify” situation, which makes it safe to lift the default value from the field specifier signature.)
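To make the overload doubling concrete, here is a rough sketch of what I would have to write under that hypothetical rule with just one other argument (`init`). The runtime return value is only a placeholder I made up so the snippet can be executed; the part that matters is the number of overloads.

```python
from typing import Any, Literal, overload

_NO_DEFAULT_VALUE = object()

# Under the hypothetical "respect default arguments" rule, the only way to say
# "no default provided" is an overload that omits `default` entirely. So every
# combination of the other arguments needs two overloads.
@overload
def custom_field(*, init: Literal[True] = True) -> Any: ...
@overload
def custom_field(*, init: Literal[True] = True, default: Any) -> Any: ...
@overload
def custom_field(*, init: Literal[False]) -> Any: ...
@overload
def custom_field(*, init: Literal[False], default: Any) -> Any: ...

def custom_field(*, init: bool = True, default: Any = _NO_DEFAULT_VALUE) -> Any:
    # Placeholder runtime result (an assumption for illustration only).
    return {"init": init, "has_default": default is not _NO_DEFAULT_VALUE}
```

One boolean argument already gives four overloads; each further argument with its own literal overloads doubles the count again.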
Even the spelling I came up with in OP would not fully solve this:
def custom_field(*, default: Literal[...] = ...):
because as far as the typechecker knows, maybe I just want the callsite to specify `custom_field(default=...)`.
I’ll think some more about this, but right now I don’t have any more ideas about how to piggyback the feature onto the `default` parameter.
So the only viable idea would be to introduce a new parameter.
Spec draft
`auto_default` is an optional bool parameter that indicates whether this field can automatically provide a default value. If unspecified, defaults to `False`.
If set to `True`, the dataclass will generate a default value for the field in case it is neither provided as an argument to `__init__`, nor specified via one of `default`, `default_factory`, `factory`.
Field specifier functions can use overloads that implicitly specify the value of `auto_default` using a literal bool value type (`Literal[False]` or `Literal[True]`).
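A minimal sketch of what such overloads could look like. Note that `auto_default` is only the parameter proposed in this draft, so no type checker gives it any meaning today; `field`, `required` and `FieldSpec` are hypothetical names I picked for the illustration.

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional, overload

@dataclass(frozen=True)
class FieldSpec:
    # Hypothetical runtime record of what the specifier was asked to do.
    number: int
    required: bool
    auto_default: bool

# Required fields: the overload implies auto_default=False.
@overload
def field(
    number: int, *, required: Literal[True], auto_default: Literal[False] = False
) -> Any: ...

# Non-required fields: the overload implies auto_default=True.
@overload
def field(
    number: int, *, required: Literal[False] = False, auto_default: Literal[True] = True
) -> Any: ...

def field(
    number: int, *, required: bool = False, auto_default: Optional[bool] = None
) -> Any:
    # At runtime, mirror what the overloads promise to the type checker.
    if auto_default is None:
        auto_default = not required
    return FieldSpec(number, required, auto_default)
```

The callsite never has to spell out `auto_default`; the literal type on the matching overload carries the value, the same way `init` can be implied by an overload.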
Motivation
There is currently no way to specify that a certain field of a dataclass will be filled in automatically if the user does not provide a value. The only workaround is to explicitly provide a default value or a default factory when specifying the field. That solution can be needlessly repetitive for certain kinds of DSLs.
Consider a protobuf DSL. The message structure looks like this:
message Foo {
    required string name = 1;
    optional uint32 value = 2 [default = 5];
    optional uint32 amount = 3;
    repeated uint32 array = 4;
}
The protobuf specification implies the following behavior:
class Foo(proto.Message):
    name: str = proto.required(1)
    value: int = proto.optional(2, default=5)
    amount: int | None = proto.optional(3, default=None)
    array: list[int] = proto.repeated(4, default_factory=list)
From the user’s point of view, the `default=None` and `default_factory=list` are completely superfluous: they are implied by the fact that the field is `optional` or `repeated`, respectively.
We would like the corresponding class to look like this:
class Foo(proto.Message):
    name: str = proto.required(1)
    value: int = proto.optional(2, default=5)
    amount: int | None = proto.optional(3)
    array: list[int] = proto.repeated(4)
Other work
I am one of at least two people who want this (given that this issue exists).
On the other hand, a feature like this is not discussed in the `dataclass_transform` PEP, which may indicate that the idea isn’t very popular in the wider community of `dataclass_transform` users.
Backwards compatibility
Existing implementations might already be using the name `auto_default`. A survey of the ecosystem would need to be done before selecting a name.
Alternatives
None come to mind right now.