Dataclass_transform: add inherit_defaults option

dataclass_transform (PEP 681) is a great extension that allows type checkers to support a myriad of dataclass-like structures in a consistent manner, without the need for plugins.

However, one topic that I could not find mentioned in either PEP 681 or PEP 557 (dataclasses) is how to deal with overridden field defaults.

Unfortunately, not all libraries behave the same way. On one hand, stdlib dataclasses and pydantic dataclasses inherit the default from the parent class. On the other hand, attrs (whether imported as attrs or attr) and pydantic models do not inherit the default from the parent.

This poses a challenge for type checkers, as they need to decide which behavior is correct, producing false positives and false negatives whenever a library with the other behavior is used. And in fact mypy and pyright have made different choices: pyright requires that an overriding field specify a default if the parent field had one, while mypy assumes the default is not inherited.

For the sake of brevity I will only present results for stdlib dataclasses and pydantic models, but I could post the extra examples if required.

The following dataclasses code inherits the default value from the parent and does not produce any errors at runtime.

import dataclasses

@dataclasses.dataclass(frozen=True)
class Base:
  x: int = 3

@dataclasses.dataclass(frozen=True)
class Child(Base):
  x: int

print(Base())      # Base(x=3)
print(Child(x=1))  # Child(x=1)
print(Child())     # Child(x=3): the default is inherited from Base

pyright generates this false positive:

test_dataclass.py:9:5 - error: "x" overrides a field of the same name but is missing a default value

mypy also generates a false positive:

test_dataclass.py:13: error: Missing positional argument "x" in call to "Child" [call-arg]

The following pydantic model code

import pydantic

class Base(pydantic.BaseModel, frozen=True):
  x: int = 3

class Child(Base, frozen=True):
  x: int

print(Base())
print(Child(x=1))
print(Child())  # raises a ValidationError: x is missing and no default is inherited

raises an error at runtime because of the missing x argument in the last call.

pyright generates a false positive about the overridden field and a false negative about the last call to Child:

test_pydantic_model.py:7:5 - error: "x" overrides a field of the same name but is missing a default value (reportGeneralTypeIssues)

while mypy correctly generates the error:

test_pydantic_model.py:10: error: Missing named argument "x" for "Child" [call-arg]

One alternative that supports both design choices is to extend PEP 681 with a new parameter called inherit_defaults, which, when set to True, would mean that defaults from parent classes are inherited. In that case type checkers would also need to check that the inherited default is compatible with the overriding type. For example, if the original field was x: int | None = None, overriding it in a subclass as just x: int should produce an error, as None is not compatible with int.
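
To make this concrete, here is a minimal sketch of the check (inherit_defaults is the hypothetical parameter proposed above; no type checker implements it today):

import dataclasses

@dataclasses.dataclass(frozen=True)
class Base:
  x: int | None = None

# Under the hypothetical inherit_defaults=True, the type checker knows
# Child.x inherits the default None, which is not compatible with the
# narrowed annotation int, so this override should be reported as an error.
@dataclasses.dataclass(frozen=True)
class Child(Base):
  x: int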

Another option would be to extend some other PEP (484 or 557) and mandate that type checkers require overriding fields to specify a default whenever the overridden field has one. This is what is currently implemented by pyright (motivated by this issue), which is equivalent neither to the hypothetical inherit_defaults=True nor to inherit_defaults=False. This take is currently the safest, as forcing the default to be restated makes the code behave the same across all the dataclasses alternatives. However, it is limiting for authors, as once a field has a default there is no way to remove it.
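
Under that rule, the earlier stdlib dataclasses example type checks only if the child restates a default, e.g.:

import dataclasses

@dataclasses.dataclass(frozen=True)
class Base:
  x: int = 3

@dataclasses.dataclass(frozen=True)
class Child(Base):
  # restating the default satisfies pyright's current rule, and behaves
  # the same under both inheriting and non-inheriting libraries
  x: int = 3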

What is the use case for this? On the face of it I’m sympathetic to pyright’s decision to make this an error; I’m not sure why you’d want this.

If we do want to support this sort of behavior, the default behavior of dataclass_transform should match that of dataclasses. We can optionally add a switch to cover the behavior of other libraries, but I’m not sure this pattern is common enough that it’s worth adding to the spec.


More generally, extending @dataclass_transform is a topic that has come up several times (e.g. support_replace), although I don’t know whether adding flags will scale well in the future.

One of the use cases we have is dealing with server-side defaults. In this case the request would allow an optional type, but the response would ensure the type is set.

Something like

class Request(pydantic.BaseModel, frozen=True):
  # field would have a server side default
  # either coming from the db or something that is computed asynchronously and
  # cannot be expressed as a default factory  
  field: str | None = None

  # Many other fields that are not optional
  ...

and the response would be

class Response(Request, frozen=True):
  # field is guaranteed to have a non optional value
  field: str

Notes:

  • defining the classes the other way around (Request inheriting from Response) does not work, as both mypy and pyright rightfully complain that “str | None” is not assignable to type “str”; see the sketch below.
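
For completeness, a sketch of that reversed definition, using the field type from the Request example above:

import pydantic

class Response(pydantic.BaseModel, frozen=True):
  field: str

class Request(Response, frozen=True):
  # rejected by both mypy and pyright:
  # "str | None" is not assignable to "str"
  field: str | None = None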

This seems like a way to make the issue of covariant types worse than it already is, and I’m very against that.

Using your definitions above, here’s where that becomes a problem:

def example(foo: type[Base]) -> Base:
    # assumes any subclass of Base can also be constructed with no arguments
    return foo()

example(Child)  # type checks, but foo() fails at runtime if Child.x has no default

I know there’s already problems with constructor safety, but there’s no reason to make that worse here.

There’s already a mechanism to handle defaults that require calling something: field with default_factory, and that is what should be used here.
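
For reference, a minimal sketch of that pattern; compute_default here is a hypothetical synchronous callable standing in for whatever produces the value:

import dataclasses

def compute_default() -> int:
    # hypothetical stand-in for whatever computes the value synchronously
    return 3

@dataclasses.dataclass(frozen=True)
class Base:
    x: int = dataclasses.field(default_factory=compute_default)

print(Base())     # Base(x=3): the factory runs at construction time
print(Base(x=1))  # Base(x=1)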

The issue you present already manifests with any class that overrides the constructor, so I don’t understand why it could be an argument to reject the proposal.

def example_obj(foo: type[object]) -> object:
    return foo()

def example_exc(foo: type[Exception]) -> Exception:
    return foo()

class MyException(Exception):
    def __init__(self, custom_message: str):
        super().__init__(custom_message)
        self.custom_message = custom_message

print("isinstance(MyException('x'), object):", isinstance(MyException("x"), object))
print("isinstance(MyException('x'), Exception):", isinstance(MyException("x"), Exception))

example_obj(MyException)  # type checks, but foo() fails at runtime
example_exc(MyException)  # type checks, but foo() fails at runtime

That code passes type checking with both pyright and mypy. However, as you point out, it fails at runtime with TypeError: MyException.__init__() missing 1 required positional argument: 'custom_message'

Yes, and I consider type checkers not erroring for that case to be a mistake. See where I said “I know there’s already problems with constructor safety, but there’s no reason to make that worse here.”

Your use case here doesn’t seem like it’s worth making dataclasses even more complicated and less type safe; it’s already covered by using default_factory instead.

As I mentioned in my reply, default_factory does not work when the value needs to be fetched asynchronously, for example from the database.

Also note that in the OP I mentioned that we could adopt the strategy taken by pyright, as I think it is important to clarify how this case should be treated so that type checkers are consistent.

Another option, which makes this work without the pitfalls of type inconsistencies, is a mechanism to inherit fields from another dataclass instead of subclassing it.

Something like

@dataclass
class Base:
  x: int | None = None
  ... # other fields

@dataclass(inherit_fields_from=Base)
class Child:
  x: int

In this way we could override certain fields, and we would not have any of the typing issues, as there would be no inheritance relationship between the two classes.
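
inherit_fields_from is hypothetical, but a rough runtime-only approximation (invisible to type checkers) is possible today with dataclasses.fields and make_dataclass, which may help picture the intended semantics:

import dataclasses

@dataclasses.dataclass
class Base:
  x: int | None = None
  y: str = "other field"

def inherit_fields_from(parent):
  # Hypothetical helper: build a new dataclass that copies the parent's
  # fields, letting annotations on the decorated class override them.
  # Note: fields without defaults must still come before fields with
  # defaults, so this sketch only handles the simple ordering case.
  def wrap(cls):
    own = list(cls.__annotations__.items())
    inherited = []
    for f in dataclasses.fields(parent):
      if f.name in cls.__annotations__:
        continue  # overridden in the child
      if f.default is not dataclasses.MISSING:
        inherited.append((f.name, f.type, dataclasses.field(default=f.default)))
      elif f.default_factory is not dataclasses.MISSING:
        inherited.append((f.name, f.type, dataclasses.field(default_factory=f.default_factory)))
      else:
        inherited.append((f.name, f.type))
    return dataclasses.make_dataclass(cls.__name__, own + inherited)
  return wrap

@inherit_fields_from(Base)
class Child:
  x: int

print(Child(x=1))               # Child(x=1, y='other field')
print(issubclass(Child, Base))  # False: no inheritance relationship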

I’m having trouble imagining the real-world use case from what you’ve described here. If I were fetching data asynchronously from the database to fill defaults before responding, I’d have the data before constructing a frozen dataclass that relies on it and could just pass it in. This sounds like there are other design issues being exposed here, and any number of those could be fixed instead of making dataclasses even less type-safe.

It’s possible to handle this with a sentinel value for “not provided by the user” that is distinct from None (which might actually be a providable value, but is also frequently omitted in JSON APIs), and by structuring the relationship differently.

I don’t use pydantic, preferring msgspec, but msgspec comes with its own sentinel marker value for the purpose of handling “there needs to be a value here, but we need to know the user didn’t provide it so that any response handles that case”.
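
For illustration, a minimal sketch of that pattern with msgspec’s UNSET sentinel (the field name follows the pydantic example below):

import msgspec

class CreateEntityRequest(msgspec.Struct, frozen=True):
    # UNSET is distinct from None: it lets the server tell "the user
    # omitted the field" apart from "the user explicitly sent null".
    optional_field: int | None | msgspec.UnsetType = msgspec.UNSET

req = msgspec.json.decode(b"{}", type=CreateEntityRequest)
print(req.optional_field is msgspec.UNSET)  # True: not provided by the user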

Note that the use case I presented was using pydantic model classes. The scenario I’m mentioning is as follows:

import uuid

import fastapi
import pydantic

app = fastapi.FastAPI()

class CreateEntityRequest(pydantic.BaseModel, frozen=True):
  optional_field: int | None = None
  ... # other fields

class EntityResponse(CreateEntityRequest, frozen=True):
  id: uuid.UUID
  optional_field: int

@app.post("/entity")
async def create_entity(body: CreateEntityRequest) -> EntityResponse:
  # result would include a concrete value for optional_field set by a server side db default
  result = await insert_entity_into_db(body) 
  return EntityResponse.model_validate(result)

@app.post("/entity_computed_default")
async def create_entity_computed_default(body: CreateEntityRequest, user_id: uuid.UUID) -> EntityResponse:
  if body.optional_field is None:
    body.optional_field = await get_optional_field_for_user(user_id)
  result = await insert_entity_into_db(body) 
  return EntityResponse.model_validate(result)

Also, I’d appreciate it if replies were less dismissive and didn’t blame this on design decisions, especially considering I already presented two different alternatives to deal with the type inconsistencies without making the code less type-safe.

The most recent code has an error: it modifies a model declared as frozen, so unless pydantic is also incorrectly handling frozen models, this can’t be your real code that works at runtime.

It would help to have the full picture here, including all type declarations, because you are giving information that appears to conflict, and this is making it hard to imagine why you need a different, incompatible default. Even your most recent example, read charitably with frozen=True removed so that it at least functions, doesn’t seem to benefit from changing or omitting the default value.