Better typing for copy/replace functions that operate on dataclasses

dataclasses.replace() (and related functions, like model_copy() in pydantic or evolve() in attrs) copies a dataclass instance and lets you override selected fields, but its current signature uses **changes: Any. As a result, static type checkers can’t catch mistakes (e.g. typos in field names) and language servers can’t help with refactoring. (Edit: This is technically incorrect, as e.g. mypy has specialized plugins that do catch those errors.)

One way to fix this is to introduce a type operator DataclassKwargs[T] that introspects a dataclass T and produces a TypedDict with one key per init=True field. All keys would be marked NotRequired, since when calling replace() you can provide any subset of fields:

from typing import Unpack, DataclassKwargs

def replace[T](obj: T, /, **changes: Unpack[DataclassKwargs[T]]) -> T: ...

With this signature, a type checker can verify that only valid field names and value types are supplied.


Alternatively, we could introduce AsTypedDict[T] that derives a TypedDict from a dataclass, and a generic OptionalFields[…] wrapper could mark all keys optional for “patch” objects. In that style replace() could be typed as:

from typing import Unpack, OptionalFields, AsTypedDict

def replace[T](obj: T, /, **changes: Unpack[OptionalFields[AsTypedDict[T]]]) -> T: ...

This is a bit more verbose, but AsTypedDict[T] and OptionalFields[T] could be used in other contexts.
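
To make the AsTypedDict/OptionalFields split concrete, here is a runtime-only approximation. The as_typed_dict helper is hypothetical; static type checkers cannot see through a dynamically built TypedDict, which is why a type operator is being proposed. Note that dataclasses.fields() yields the stored fields, so this mirrors the keys asdict() returns:

```python
import dataclasses
from typing import TypedDict


@dataclasses.dataclass
class Data:
    id: int
    name: str


def as_typed_dict(cls: type, *, total: bool = True) -> type:
    # Hypothetical helper: builds a TypedDict with one key per stored
    # dataclass field, using the functional TypedDict syntax.
    annotations = {f.name: f.type for f in dataclasses.fields(cls)}
    return TypedDict(f"{cls.__name__}Dict", annotations, total=total)


DataDict = as_typed_dict(Data)                # roughly AsTypedDict[Data]
DataPatch = as_typed_dict(Data, total=False)  # roughly OptionalFields[AsTypedDict[Data]]

print(sorted(DataDict.__required_keys__))   # ['id', 'name']
print(sorted(DataPatch.__optional_keys__))  # ['id', 'name']
```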

Related: PEP 692 follow-up: Unpacking compatibility with dataclass/others (python/typing issue #1495)


I have bumped up against wanting an AsTypedDict before, so I would appreciate it if this were added.

Based on the names, is the difference between DataclassKwargs[T] and AsTypedDict[T] that DataclassKwargs describes the arguments to __init__, whereas AsTypedDict type-hints the asdict() (or model_dump()) result?

I think there are actually even more potential differences between those.
Consider

from dataclasses import InitVar, asdict, dataclass, field


@dataclass
class Example:
    a: InitVar[str]  # Appears in __init__, not stored directly
    b: str
    c: str = field(init=False)  # Stored but not part of __init__

    def __post_init__(self, a: str) -> None:
        self.c = a + self.b


asdict(Example('x', 'y'))  # {'b': 'y', 'c': 'xy'}
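
The two key sets can be compared at runtime: inspect.signature() gives the __init__ parameters (what DataclassKwargs[T] would describe) and dataclasses.fields() gives the stored fields (the keys of asdict(), and of an AsTypedDict[T]). A sketch with the same Example class, with __post_init__ taking a: str to match the InitVar[str]:

```python
import inspect
from dataclasses import InitVar, asdict, dataclass, field, fields


@dataclass
class Example:
    a: InitVar[str]  # __init__ parameter only, never stored
    b: str
    c: str = field(init=False)  # stored, but not an __init__ parameter

    def __post_init__(self, a: str) -> None:
        self.c = a + self.b


# What a DataclassKwargs[Example] would be built from: the __init__ parameters.
init_keys = list(inspect.signature(Example).parameters)
# What an AsTypedDict[Example] would be built from: the stored fields.
stored_keys = [f.name for f in fields(Example)]

print(init_keys)                   # ['a', 'b']
print(stored_keys)                 # ['b', 'c']
print(asdict(Example("x", "y")))   # {'b': 'y', 'c': 'xy'}
```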

I’ve never needed AsTypedDict in a context where DataclassKwargs[T] and AsTypedDict[T] would be different. I don’t know which would be more useful. And I don’t have great intuition for what replace should do in this kind of situation, so it’s probably not extremely important.

But regardless, it’s something we/you should have an answer to.
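
For what it’s worth, the runtime already picks one answer: dataclasses.replace() works from the __init__ parameters, not the stored fields. InitVars without a default must be passed again, and init=False fields are rejected. So for a class like Example, a DataclassKwargs[T] used in replace() could not mark every key NotRequired. A sketch (the rejected helper is illustrative and catches both ValueError and TypeError rather than relying on the exact exception type):

```python
from dataclasses import InitVar, dataclass, field, replace


@dataclass
class Example:
    a: InitVar[str]
    b: str
    c: str = field(init=False)

    def __post_init__(self, a: str) -> None:
        self.c = a + self.b


e = Example("x", "y")

# The InitVar must be supplied again; c is recomputed by __post_init__.
print(replace(e, a="z"))  # Example(b='y', c='zy')


def rejected(**changes: str) -> bool:
    # True if replace() refuses these changes.
    try:
        replace(e, **changes)
    except (TypeError, ValueError):
        return True
    return False


print(rejected(b="w"))         # True: InitVar 'a' was not supplied
print(rejected(a="z", c="q"))  # True: c is declared with init=False
```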

I’m surprised by the dataclasses.replace example, because it seems to be handled correctly by mypy:

import dataclasses


@dataclasses.dataclass
class Data:
    id: int
    name: str


data = Data(id=1, name='first')
data = dataclasses.replace(data, name='second')
data = dataclasses.replace(data, name=2)  # mypy: Argument "name" to "replace" of "Data" has incompatible type "int"; expected "str"  [arg-type]

I have needed some kind of special DataclassKwargs type several times, but I don’t know how much can already be handled by current generics.

For example it’s “easy” to have a function that types its parameters to be the same as a dataclass:

from collections.abc import Callable


def _make_dataclass[T, **P](cls: Callable[P, T]) -> Callable[P, T]:
    def make(*args: P.args, **kwargs: P.kwargs) -> T:
        return cls(*args, **kwargs)
    return make

make_data = _make_dataclass(Data)

reveal_type(make_data)  # mypy: Revealed type is "def (id: builtins.int, name: builtins.str) -> module.Data"
data1 = make_data(id=1, name='first')
data2 = make_data(id=2, name=2)  # mypy: Argument "name" has incompatible type "int"; expected "str"
data3 = make_data(id=3, name='third', foo='spam')  # mypy: Unexpected keyword argument "foo"

It’s harder for the dataclasses.replace operation, but it seems doable once you rely on the cls.__replace__ method:

from collections.abc import Callable
from typing import Concatenate, Protocol


class _Dataclass[T, **P](Protocol):
    def __replace__(self, obj: T, *args: P.args, **kwargs: P.kwargs) -> T:
        ...


def _replace_dataclass[T, **P](cls: _Dataclass[T, P]) -> Callable[Concatenate[T, P], T]:
    def replace(obj: T, *args: P.args, **kwargs: P.kwargs) -> T:
        return cls.__replace__(obj, *args, **kwargs)
    return replace

replace_data = _replace_dataclass(Data)

reveal_type(replace_data)  # mypy: Revealed type is "def (module.Data, *, id: builtins.int =, name: builtins.str =) -> module.Data"
data = replace_data(data)
data = replace_data(data, name='second')
data = replace_data(data, name=2)  # mypy: Argument "name" has incompatible type "int"; expected "str"
data = replace_data(data, x=0)  # mypy: Unexpected keyword argument "x"

Interestingly, if you use the __replace__ method that is synthesized for dataclasses since Python 3.13, then pyright catches the error:

import dataclasses

@dataclasses.dataclass
class Data:
    id: int
    name: str

data = Data(id=1, name='first')
data = data.__replace__(name='second')
data = data.__replace__(name=2)  # Argument of type "Literal[2]" cannot be assigned to parameter "name" of type "str" in function "__replace__" > "Literal[2]" is not assignable to "str"  (reportArgumentType)

EDIT: Oh, sorry, I see @entwanne made basically the same point above.

Oh, you are right. Even though the signature of replace does not carry very tight typing information, there are mypy plugins that enable this feature for the common dataclass libraries. Pyright does not support plugins, so it cannot do the same.

Wouldn’t it make sense to add this to the spec so users can get a consistent experience?

There are several distinct variations on “record types” in common use: attrs, dataclasses, pydantic, pyrsistent, and (arguably, but I count it as one) TypedDict. Oh, and named tuples. Oh, oh, and you can define your own.

I think that being able to lift the (public?) API of a record into a callable type would be pretty useful. It does come up. But I’m -1 on a solution which doesn’t handle the majority of the flavors of records. And I can’t think up a solution which works for the broader category.

I think the idea that it be tied to init signature is intuitively correct, but wrong in many cases. I’ve had writable non-init parameters to dataclasses for sure.
We’d need to be able to define some mechanism by which we translate a type with attributes into a callable type. And you want to be able to explicitly exclude things if you’re trying to define a controlled API. To me it seems intractably hard to define a feature which is correct and useful.

Right, but only for __replace__, not replace, which is a bit odd.

__replace__ is called by copy.replace(), but it seems that typeshed currently defines the type of copy.replace() too broadly. There is a PR to fix it, but it seems stuck: “Make `copy.replace` more strongly typed” by decorator-factory (python/typeshed PR #14819).


Would this need to be limited to dataclasses?
Couldn’t we have CallableKwargs[T] instead of DataclassKwargs[T], and then it could be used with any Callable (including dataclasses)?

There is some extra magic for dataclasses with the field declaration annotations, but I guess that ultimately it boils down to a callable. Related ask: “Method signature forwarding for subclasses” (python/typing Discussion #1079).