@dataclass_transform: How to treat "kw_only" parameter defaults of field specifiers?

Hi everyone!

We’re implementing dataclass_transform support in PyCharm and I came across one puzzling bit in the spec that mypy and pyright seem to interpret differently. Sorry if this has been brought up already; I haven’t found a relevant discussion. Namely, it’s the behavior when a field specifier call has no explicit kw_only argument. The specification states:

If true, the field will be keyword-only. If false, it will not be keyword-only. If unspecified, the value of the kw_only parameter on the object decorated with dataclass_transform will be used, or if that is unspecified, the value of kw_only_default on dataclass_transform will be used.

So it seems that the order of precedence for treating a field as a keyword-only parameter of a dataclass constructor is the following (from highest priority to lowest):

  1. Explicit kw_only keyword argument in the corresponding field specifier call
  2. kw_only argument in the decorator application or the dataclass inheritance list
  3. kw_only_default keyword argument in the application of @dataclass_transform itself

However, where does the default value of the optional kw_only parameter of the field specifier fall here? Should it be taken into account at all?

For instance, take the following example:

from typing import Callable, dataclass_transform

def my_field(kw_only=False):
    ...


@dataclass_transform(field_specifiers=(my_field,))
def my_dataclass(**kwargs) -> Callable[[type], type]:
    ...



@my_dataclass(kw_only=True)
class Order:
    id: str = my_field()
    addr: list[str]


Order()  # pyright: (id: str, *, addr: list[str]), mypy: (*, id: str, addr: list[str])

Here, pyright honors the default value of my_field’s kw_only parameter, making id an ordinary parameter of the constructor even though @my_dataclass has kw_only=True. Mypy, by contrast, seems to ignore the default value in the field specifier and treats id as a keyword-only parameter, like the subsequent addr.

What would be the right policy here? In particular, how should “if unspecified” be interpreted: as the absence of a keyword argument in the call, or as the absence of the parameter in the field specifier definition?

It also raises the question of how type checkers should treat a mismatch between the values of the *_default arguments in a dataclass_transform application and the defaults of the corresponding parameters of a decorated function, or of __init_subclass__ or __new__ of a decorated class.

For instance here:

from typing import Callable, dataclass_transform, reveal_type


def my_field():
    ...


@dataclass_transform(kw_only_default=False, field_specifiers=(my_field,))
def my_dataclass(kw_only=True, **kwargs) -> Callable[[type], type]:
    ...


@my_dataclass()
class Order:
    id: str
    addr: list[str]

Order()  # pyright: (id: str, addr: list[str]), mypy: (id: str, addr: list[str])

It seems that both pyright and mypy ignore the default value of kw_only here, in accordance with the spec, but this might be confusing for someone reading the declaration of my_dataclass. I guess the answer will be that type checkers can report this if they deem it necessary; I just want to check whether there was some subtle reason for not doing that.

Interestingly, this first example seems to be a case where the actual behaviour of attrs and dataclasses differs.

import inspect
from dataclasses import dataclass, field
from attrs import define, field as attr_field


@dataclass(kw_only=True)
class DC:
    id: str = field(kw_only=False)
    addr: list[str]


@define(kw_only=True)
class Attr:
    id: str = attr_field(kw_only=False)
    addr: list[str]


print(inspect.signature(DC))  # (id: str, *, addr: list[str]) -> None
print(inspect.signature(Attr))  # (*, id: str, addr: list[str]) -> None

I was actually surprised by dataclasses’ behaviour as the documentation indicates that kw_only will mark all fields as kw_only if True.

The stdlib dataclass design allows individual field specifiers to override the normal behavior of the containing dataclass. For example, if the dataclass has kw_only=True behavior, an individual field can be marked as kw_only=False or vice versa. The field specifier always takes precedence. This is true regardless of whether the call to the field specifier function includes an explicit kw_only=True argument or the argument value is implicit because the field specifier call has a default argument value for the kw_only parameter.

With that context in mind, we can update your precedence list to include an item between 1 and 2 that says “Implicit kw_only provided by default argument value in field specifier call”.
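One way to read the extended precedence list is as a chain of fallbacks. The sketch below is only an illustration of that reading, not how any type checker is actually implemented: resolve_kw_only and its parameter names are hypothetical, and None is used as a stand-in for "unspecified".

```python
from typing import Optional


def resolve_kw_only(
    field_explicit: Optional[bool],       # kw_only=... passed in the field specifier call
    field_param_default: Optional[bool],  # default of the field specifier's kw_only parameter
    class_kw_only: Optional[bool],        # kw_only=... on the decorator or class definition
    kw_only_default: bool = False,        # kw_only_default on dataclass_transform
) -> bool:
    """Return the first specified value, walking from highest to lowest priority."""
    for candidate in (field_explicit, field_param_default, class_kw_only):
        if candidate is not None:
            return candidate
    return kw_only_default


# First example from this thread: my_field(kw_only=False) is called as my_field()
# inside a class decorated with @my_dataclass(kw_only=True).
print(resolve_kw_only(None, False, True))  # id: False, i.e. an ordinary parameter
print(resolve_kw_only(None, None, True))   # addr (no field specifier): True
```

Under this reading the two fields resolve the way pyright reports them: id stays positional and addr becomes keyword-only.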

Mypy doesn’t currently implement support for this part of the spec, which is why you’re seeing behavior that differs from pyright. This is reflected in the typing spec conformance results here. (Search for “dataclasses_transform_field” and hover over the mypy column for details.) You can find the source for this conformance test here.

If there are ways we can make the spec clearer in this regard, please feel free to propose an update to the wording.


For your second question, I think it’s fine for a type checker to report an inconsistency here, but it’s not mandated. I’ve never seen this condition come up in real code. Normally a dataclass-like function does one of the following:

  1. It supports only kw_only=True or kw_only=False and doesn’t provide a way to override this behavior in the function call or class. Pydantic’s BaseModel is an example of this, since it always assumes kw_only=True.
  2. It provides a default behavior with regard to kw_only and supports the ability to pass None for this parameter. This is how the define call in attrs works, for example. It’s also how the stdlib dataclass works.

I’ve never seen an example where a dataclass-like function accepts a kw_only parameter with a default argument other than None, like in your my_dataclass example above. But if you think that is something that might occur, you could check for it and warn the user of the contradiction.

Got it, thanks for such a detailed answer! I completely overlooked this comment about mypy in the conformance test results. Also, given the examples you provided of existing dataclass-like APIs, I guess practicality indeed beats purity when it comes to reporting a mismatch between the default of kw_only and the value of kw_only_default.

BTW, for completeness I probably should have also included an additional item on the precedence list: “4. False by default (as specified in the typing spec)”. I’ll think about how the wording in this part of the spec can be made a bit more straightforward to interpret.