`@dataclass_transform()` and `__replace__`

Viicos · October 24, 2024, 2:27pm

Currently, both mypy and pyright synthesize a __replace__ method for classes decorated by @dataclass_transform(). I believe this shouldn’t be the case: each library should be free to decide whether to implement the method. Maybe the following could be done:

If the class (or one of its bases) defines a __replace__ method and the dataclass semantics apply to this class (thanks to the decorator), then synthesize a __replace__ method following the same logic as stdlib dataclasses.

NeilGirdhar · October 25, 2024, 4:06pm

How would you set the type annotation for a user-defined method?

Viicos · October 25, 2024, 7:22pm

You can’t statically type it correctly as the signature depends on each subclass of your dataclass-like type.

In Pydantic, we defined it as def __replace__(self, **changes: Any) -> Self.

But this isn’t really relevant to the original issue, being that __replace__ is synthesized whenever @dataclass_transform() is applied, no matter if the dataclass-like type implements it.

mikeshardmind · October 25, 2024, 7:30pm

Reading the living specification:

Except where stated otherwise, classes impacted by dataclass_transform, either by inheriting from a class that is decorated with dataclass_transform or by being decorated with a function decorated with dataclass_transform, are assumed to behave like stdlib dataclass().

This includes, but is not limited to, the following semantics:

(and then gives some examples)

I don’t see an exclusion for __replace__ here (i.e. it isn’t otherwise stated that __replace__ is excluded), and I think the ability to replace is integral to using frozen dataclasses with predictable and ergonomic behavior. Does pydantic have a reason not to support __replace__?

Jelle · October 25, 2024, 7:44pm

We could add a parameter to @dataclass_transform to control this behavior, e.g. @dataclass_transform(supports_replace=False), to accommodate users who don’t want to support this feature. Though as a user I’d find it nice if pydantic did support copy.replace.

Viicos · October 25, 2024, 7:48pm

Thanks, I missed this part of the spec. We do implement __replace__. The reason I raised this discussion is a bit unrelated; we had the following report: “Subclassing Pydantic models with Python 3.13 and mypy” (pydantic/pydantic issue #10699 on GitHub). It gives an annoying and confusing error with mypy that is hard to suppress, because the method is synthesized and doesn’t appear anywhere in user code:

from typing import Literal

from pydantic import BaseModel

class MyBaseClass(BaseModel):
    request_type: str

class MyInheritedClass_A(MyBaseClass):
    request_type: Literal["Create"]

class MyInheritedClass_B(MyBaseClass):
    request_type: Literal["Delete"]

No error is raised by mypy regarding the request_type Literal annotations being incompatible with MyBaseClass, I assume for pragmatic reasons. However, you do get an LSP error regarding the __replace__ method.

As the spec clearly states that these classes are assumed to behave like stdlib dataclasses, I’m OK with keeping the current behavior. Maybe mypy could avoid raising an LSP error for such synthesized methods?
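To show where the error comes from, here is a hand-written approximation of the __replace__ methods a checker synthesizes for such a hierarchy. It uses plain classes instead of Pydantic models; Base, Create and the _UNSET sentinel are hypothetical names, and the real synthesized signatures may differ in detail.

```python
from typing import Any, Literal

_UNSET: Any = object()  # sentinel standing in for "argument not passed"


class Base:
    def __init__(self, request_type: str) -> None:
        self.request_type = request_type

    # Approximately what a checker synthesizes for the base class.
    def __replace__(self, *, request_type: str = _UNSET) -> "Base":
        value = self.request_type if request_type is _UNSET else request_type
        return type(self)(value)


class Create(Base):
    def __init__(self, request_type: Literal["Create"]) -> None:
        super().__init__(request_type)

    # The parameter is narrowed from str to Literal["Create"], so a checker
    # can report this override as incompatible with Base.__replace__.
    def __replace__(self, *, request_type: Literal["Create"] = _UNSET) -> "Create":
        value = self.request_type if request_type is _UNSET else request_type
        return type(self)(value)


base_view: Base = Create("Create")
# Accepted through Base's signature, yet it yields a Create holding "Delete".
broken = base_view.__replace__(request_type="Delete")
print(type(broken).__name__, broken.request_type)
```

Written out like this, the user would see the incompatible override in their own code; with a synthesized method there is no such line to look at.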

NeilGirdhar · October 25, 2024, 8:07pm

It sounds like mypy should instead raise an error for the incompatible derived class? Unless your base model is frozen?

Viicos · October 25, 2024, 8:21pm

Only if explicitly enabling the mutable-override error code, which isn’t common (it isn’t even enabled in strict mode).
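For reference, a sketch of the kind of override that the optional mutable-override error code targets, written with stdlib dataclasses rather than Pydantic models. The inline `# mypy:` comment is one way to enable the code for a single file; the class and function names are made up for illustration, and no checker output is shown because the message wording varies between versions.

```python
# mypy: enable-error-code="mutable-override"
from dataclasses import dataclass
from typing import Literal


@dataclass
class Base:
    request_type: str


@dataclass
class Create(Base):
    # Narrowing a writable attribute: flagged only when mutable-override is on.
    request_type: Literal["Create"]


def rename(obj: Base) -> None:
    # Valid against Base's annotation, but breaks Create's narrower one.
    obj.request_type = "Delete"


c = Create("Create")
rename(c)
print(type(c).__name__, c.request_type)
```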

mikeshardmind · October 25, 2024, 8:35pm

The error I’m seeing here is one I agree with, and it applies more broadly than that error code. As I understand it, the point of dataclass_transform is that you don’t have to write all of these things by hand, yet they are still part of the interface. That includes synthesized methods that become incompatible because of definitions that are only incompatible in the presence of that method.

NeilGirdhar · October 25, 2024, 8:37pm

Okay, I understand how you see things. This is a MyPy convenience that, for example, Pyright doesn’t share.

We agree that the way you’ve defined the subclasses is an LSP violation (even if MyPy lets you suppress that error). The type checkers are right that replace doesn’t work because of this error.

I think if we’re talking about fixing things, we should aim to fix the actual problem. I don’t think the right answer is to try to silence the error on replace. Instead, we should push to make it so that you can define the class without the LSP violation. For example, by defining that field to be frozen (and therefore probably removed from __replace__).

For now, since there’s no way to do that, have you considered either:

annotating the derived class’s request_type as str also, or
changing the request_type to be a method returning str in the base class and Literal[...] in the derived class?

I realize that’s not perfect, but the LSP violation is also not perfect even if you can somewhat suppress it.
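The second option can be sketched as follows, using stdlib dataclasses as a stand-in. Request, CreateRequest and the payload field are hypothetical names; note that in a real Pydantic model a method is not a field, so it would not be validated or serialized the way request_type is in the original report.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Request:
    payload: str  # hypothetical shared field

    def request_type(self) -> str:
        return "Unknown"


@dataclass
class CreateRequest(Request):
    # Narrowing a method's return type is a compatible override.
    def request_type(self) -> Literal["Create"]:
        return "Create"


print(Request("a").request_type(), CreateRequest("b").request_type())
```

Since request_type is no longer a field, it also drops out of the synthesized __replace__ signature, so the two classes no longer disagree there.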

mikeshardmind · October 25, 2024, 8:42pm

Realistically, if we’re looking at “today solutions”, this probably shouldn’t be using inheritance at all.

from typing import Literal

from pydantic import BaseModel

class MyNotInheritedClass_A(BaseModel):
    request_type: Literal["Create"]

class MyNotInheritedClass_B(BaseModel):
    request_type: Literal["Delete"]

class Unknown(BaseModel):  # this one's optional
    request_type: str

type Options = MyNotInheritedClass_A | MyNotInheritedClass_B | Unknown

Obviously, any shared behavior can be kept, but this field isn’t actually shared and shouldn’t be on the base.
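A short usage sketch of that union-based layout, written with stdlib dataclasses so it runs without Pydantic. CreateRequest, DeleteRequest and describe are hypothetical names, and the default values are only there to keep the example brief.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class CreateRequest:
    request_type: Literal["Create"] = "Create"


@dataclass(frozen=True)
class DeleteRequest:
    request_type: Literal["Delete"] = "Delete"


Request = Union[CreateRequest, DeleteRequest]


def describe(request: Request) -> str:
    # isinstance narrows the union, so each branch sees one concrete class.
    if isinstance(request, CreateRequest):
        return "creating"
    return "deleting"


print(describe(CreateRequest()), describe(DeleteRequest()))
```

With no shared base declaring request_type, neither class overrides anything, so there is no override for a checker to compare.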

Viicos · October 26, 2024, 9:05am

I’ll note that this isn’t my code, so I can’t really answer your question. This was raised by a user here, and I’m afraid we’ll get more reports from mypy users (this is a common pattern, as noted in the mypy docs) as more and more of them switch to Python 3.13.

In theory, it makes sense for mypy to report the LSP violation for the synthesized __replace__ method. However, I think it is debatable whether it should from a practical perspective: a good share of users are unaware of the Liskov substitution principle, and even more don’t know about the __replace__ protocol or the fact that mypy synthesizes the method for dataclass-like types. If mypy keeps the current behavior, I think it should emit a better error message, because 99% of users will have no idea what’s going on.

beauxq · October 26, 2024, 11:58am

I don’t see how copy.replace could be safe.

import copy
from dataclasses import dataclass
from typing import Literal, assert_type


@dataclass(frozen=True)
class B:
    x: str


@dataclass(frozen=True)
class C(B):
    x: Literal["c"]


def foo(b: B) -> None:
    b2 = copy.replace(b, x="um...")
    if isinstance(b2, C):
        assert_type(b2.x, Literal["c"])  # type checker says good
        assert b2.x == "c"  # runtime says bad


c = C("c")
foo(c)

I think it makes sense for mypy not to report problems with __replace__, because it can’t be safe anyway. Maybe if none of the members are read-only it’s OK to report it, but only along with mutable-override.
If a type checker wants to be really strict, it should warn on any usage of copy.replace.

mikeshardmind · October 26, 2024, 12:35pm

It might need a stub, but copy.replace could be safe. It just requires a protocol with a ParamSpec on __replace__ to describe it properly. Type checkers not doing anything there yet is a separate matter.

NeilGirdhar · October 26, 2024, 1:22pm

The problem isn’t with replace. The problem is with the class definitions, which are themselves an LSP violation. The replace method is just revealing that problem.

The only reason the replace method gives a surprising error is that you’ve turned off the more helpful LSP error on the class definition. Going down this rabbit hole of turning off errors is only going to cause even more confusion when users add other methods that assign to these incorrectly defined fields. Suppressing those errors in turn is, in my opinion, a bad solution.

I understand, but I suggest you simply steer them away from LSP violations. You can’t narrow a writeable field in a subclass. I think the method with a narrowed return value is the simplest solution.

MegaIng · October 26, 2024, 1:34pm

Are they? The classes are defined as frozen and therefore can’t be modified. Constructors are generally expected to violate LSP - otherwise no class could have any constructor that differs from object’s constructor.

__replace__ is such an alternate constructor and therefore doesn’t follow LSP - but I don’t know how type checkers should deal with this. Probably shut up, the same way they do for type(b)(x = "b") (using the same situation as in the last post).
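The constructor comparison can be sketched with the same B and C classes as in the earlier example. The rebuild function is a hypothetical name; the claim that checkers stay silent here follows the post above and has not been verified against every checker.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class B:
    x: str


@dataclass(frozen=True)
class C(B):
    x: Literal["c"]


def rebuild(b: B) -> B:
    # Checked against B's constructor even though type(b) may be a subclass.
    return type(b)(x="b")


result = rebuild(C("c"))
print(type(result).__name__, result.x)
```

At runtime this produces a C whose x is not "c", which is the same kind of hole that copy.replace opens.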

NeilGirdhar · October 27, 2024, 12:58pm

Ah! I asked about that in this thread and wasn’t sure. If they are frozen, you are right that there is nothing wrong with the classes.

Right, that’s interesting.

Incidentally, I’ve often considered proposing that alternate constructors (like class method factories) be able to be marked in such a way that they don’t obey LSP, but I haven’t because I thought that this would be misused by people who don’t want to fix their real LSP errors.

Maybe you’re right that LSP errors on replace’s definition should be suppressed. However, unlike ordinary constructors, it’s easy to call replace on a child class using the parent class’s replace method’s type annotation (and therefore invariants)—which is what the error is saying.

I know that it would probably be rejected, but an alternative would be to synthesize invariant verification assertions in the generated replace methods. Then there would be no worry about LSP violations since the subclass invariants would be checked. More costly though, and potentially difficult depending on the types specified.
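A rough sketch of what such invariant checks could look like, done here as a standalone helper rather than inside a generated method. checked_replace is a hypothetical name, and it only validates Literal annotations, which hints at the difficulty mentioned above for richer types.

```python
from dataclasses import dataclass, replace
from typing import Any, Literal, get_args, get_origin, get_type_hints


def checked_replace(obj: Any, **changes: Any) -> Any:
    """Hypothetical helper: validate Literal fields, then copy with changes."""
    hints = get_type_hints(type(obj))
    for name, value in changes.items():
        hint = hints.get(name)
        # Only Literal annotations are checked here; other types pass through.
        if get_origin(hint) is Literal and value not in get_args(hint):
            raise TypeError(f"{name}={value!r} is not allowed by {hint}")
    return replace(obj, **changes)


@dataclass(frozen=True)
class B:
    x: str


@dataclass(frozen=True)
class C(B):
    x: Literal["c"]


print(checked_replace(B("a"), x="b").x)
try:
    checked_replace(C("c"), x="um...")
except TypeError as exc:
    print("rejected:", exc)
```

The check runs on every call, which is the extra cost the post refers to.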

MegaIng · October 27, 2024, 1:08pm

At least the latest example is, but the previous examples from pydantic weren’t. You are also correct that this massively changes what is and isn’t safe.

mikeshardmind · October 27, 2024, 1:14pm

It’s not going to be safe to exclude fields just because they are frozen. replace exists so that copy.replace is an efficient copy with changes, and it is allowed to change frozen fields in the copy. Getting a copy with changes is going to be “part of the purpose” for people using frozen dataclasses for immutable data, so I don’t think we can exclude these fields from replace either.
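A small illustration of that point, using dataclasses.replace so it runs on versions before 3.13 as well. Settings and its fields are hypothetical names.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Settings:
    host: str
    port: int


original = Settings("localhost", 8000)
updated = replace(original, port=9000)  # copy.replace does the same on 3.13+

try:
    original.port = 9000  # direct assignment is rejected on a frozen instance
except FrozenInstanceError:
    print("original stays unchanged")

print(original, updated)
```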

mikeshardmind · October 27, 2024, 1:16pm

This could change in the future similarly to what’s been proposed to improve the usability of Hashable, where object is treated as the special case, rather than the method that is problematic on object being excluded from LSP
