Treatment of `Final` attributes in dataclass-likes

This isn’t about Final. It’s about what it means to assign a default value in a dataclass body: the value becomes the default for an instance variable.

class NonData:
    x: Any = 1  # classvar

@dataclass
class Data:
    x: Any = 1  # instancevar

The annotation isn’t what causes the behavior, and the language of Final in the spec says more than it should. It should just be left to the assignment semantics to determine if it’s a class or instance variable because that’s what actually determines it.

In case I missed something about any recent typing spec updates, when was this ever the case?

No type-checker I know of treats x here as a class variable (Pyright playground, mypy playground). Apart from Final (to my recent surprise) and dataclass-specific annotations (e.g. InitVar), there shouldn’t be any difference between dataclasses and normal classes in interpreting any kind of class-body annotation.

That seems like there’s an issue with type checkers and class variables masking other issues then. That isn’t an instance variable.

>>> class NonData:
...     x = 1
...     __slots__ = ("x",)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

There’s a definitive way to demonstrate this.
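A minimal sketch of the runtime side of this, assuming plain CPython with no type checker involved. The class-body assignment stores x on the class, an instance only gets its own x once something assigns to it, and the `__slots__` conflict can be reproduced with `type()`. The names `obj` and `Slotted` are only illustrative.

```python
class NonData:
    x: int = 1  # stored in the class namespace at runtime

obj = NonData()
in_class = "x" in vars(NonData)   # the class-body assignment lives on the class
in_instance = "x" in vars(obj)    # nothing has been stored on the instance yet

obj.x = 2                         # this creates a separate instance attribute
class_value_after = NonData.x     # the class-level value is left untouched

# Reproduce the __slots__ conflict without a class statement.
try:
    type("Slotted", (), {"x": 1, "__slots__": ("x",)})
    slots_error = None
except ValueError as exc:
    slots_error = str(exc)
```

This only shows what the interpreter does. It says nothing about how a type checker should classify x.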

Ah. It seems like this is not reconcilable with PEP 526 then:

class BasicStarship:
    captain: str = 'Picard'               # instance variable with default
    damage: int                           # instance variable without default
    stats: ClassVar[Dict[str, int]] = {}  # class variable

Anything non-annotated with ClassVar is to be treated as an instance variable, according to the spec - this has been the case since forever.

If we take that at face value, then there should be no issue with Final, except that Final’s specification conflicts with 526 and the actual runtime behavior in some way. Clearly Final is over-specified, leading to something incorrect. I think 526 also warrants revisiting. ClassVar is likely still necessary for some expressiveness in stubs, but type checkers should be allowed to see assignment at a class scope and treat it as a classvar.

Edit: I’ve split that off into a separate thread here. Could be somewhat relevant to why people have the perception they have on this, but I think it’s a separate consideration than handling Final here.

I just noticed - actually, Pyre is the only type-checker which fully implements the proposed behaviour. mypy doesn’t like this:

@dataclass
class A:
    a: Final[int]  # mypy: Final name must be initialised with a value

Implicitly delayed initialisation in the class constructor for a variable annotated with Final isn’t currently possible for dataclasses checked by mypy. Since mypy is my main type-checker, this is one of the reasons why I’ve never noticed anything different about dataclasses-specific behaviour with Final.

MyPy’s behaviour is correct here. PEP 591 states:

A final attribute declared in a class body without an initializer must be initialized in the __init__ method (except in stub files)

Here’s an example of a dataclass with a final instance attribute declared in a class body without an initialiser that is initialised in the __init__ method, which mypy is fine with:

@dataclass
class Item:
    x: Final[list[int]] = field(default_factory=list)

I agree with this interpretation, but Pyre isn’t conformant (and neither is Pyright), and if we read PEP 591 which also says this:

Type checkers should infer a final attribute that is initialized in a class body as being a class variable.

Then a Final initialised with a default (e.g. class A: var: Final = 0) can never mean an instance variable (and hence can never mean a dataclass field, because __init__s should never be defined for @dataclass), unless we special-case the spec for @dataclass.

Mypy’s behavior can be interpreted as spec compliant, but that’s not the same as being correct. This whole issue is that the spec as written may not be adequate for dataclasses and Final because Final specifies too much.

@dataclass
class A:
    a: Final[int]

This should be fine. The dataclass semantics, including explicit support for type checkers understanding dataclasses and dataclass-like things via dataclass_transform has this as equivalent to:

class A:
    def __init__(self, a: Final[int]):
        self.a: Final[int] = a

which is fine. Edit: Actually, mypy/pyright each complain about this and insist that this should be:

class A:
    def __init__(self, a: int):
        self.a: Final[int] = a

Either way, something in the spec isn’t composing well, and we should be looking at how to remedy this to get the most user-desirable outcome.
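For reference, a small sketch of what the runtime does with this class, independent of what any type checker reports. It only inspects the generated `__init__` and the field list. Final is not enforced at runtime, so this says nothing about which checker is right.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Final

@dataclass
class A:
    a: Final[int]  # no default: the generated __init__ requires it

field_names = [f.name for f in fields(A)]            # dataclass fields found on A
init_params = list(inspect.signature(A).parameters)  # parameters of the generated __init__
instance = A(3)                                      # the generated __init__ assigns self.a
has_class_attr = hasattr(A, "a")                     # no class-level value was ever set
```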

But in this example, MyPy considers a to be a dataclass field (following the runtime behavior). And that means MyPy knows that dataclasses will synthesize an __init__ method that unconditionally assigns to a. MyPy is fine with the exact desugared equivalent in a non-dataclass. So it seems to me this MyPy error should be considered a bug (in the dataclass case.) This behavior of MyPy would only be correct if MyPy did not treat this a as a dataclass field.

It certainly highlights the lack of clarity in the spec here, and the need for a clarification in some direction or another!

To be clear, I said “annotated assignments behave differently in dataclass bodies.” The emphasis is not on “the assignment” but on “annotated assignment” as a syntactic construct (literally the AnnAssign AST node.) And the way in which they (already!) behave differently is precisely in what type information can be inferred from the annotation on that annotated assignment. When we see x: int = field(...) in a dataclass body, we cannot infer that field(...) returns an int, although anywhere else we would consider that an obvious requirement for the code to type check. What’s more surprising, we can’t even infer that an x attribute will exist on the class at runtime at all! (The attribute will exist only on instances.)
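A short sketch of the `field(...)` point, assuming the standard-library `dataclasses` module. `Holder` and `marker` are hypothetical names used only for illustration.

```python
from dataclasses import Field, dataclass, field

# What the right-hand side of the annotated assignment evaluates to at runtime.
marker = field(default_factory=list)
marker_is_field = isinstance(marker, Field)  # a Field object, not a list

@dataclass
class Holder:
    x: list[int] = field(default_factory=list)

class_has_x = hasattr(Holder, "x")  # the decorator removes the class-level name
instance_x = Holder().x             # the attribute shows up on instances only
```

So the annotation describes the instance attribute, while the assigned expression has an unrelated runtime type.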

I think this is a strong argument; thank you for stating it so clearly. I agree that all else equal, it would be preferable if seeing x: Final[int] = 3 were a universal guarantee that the value of x can never be other than 3.

But given that annotations in dataclass bodies already have surprising implications, I am not so sure that achieving this goal is worth breaking currently-working code.

And of course, my whole purpose here is to clarify the spec, so that “someone who is familiar with the Final spec” would have clarity on what to expect.

This is possible. I don’t suspect that it’s super common, but it doesn’t have to be all that common to cause a lot of churn. At the moment, I don’t think either of us has hard data on its frequency. It’s a bit tricky to collect such data, since it’s a contextual usage and instances can’t be found by a line-at-a-time code search. I’ll see if I can construct a working search for instances of it. At the moment all I have is intuition from years of observing CPython development, which tells me that assertions that “nobody relies on this existing behavior, it will be painless to remove it” are very rarely borne out in practice.

You’ve mentioned this a couple times. I don’t think I quite understand what you mean. Under this proposal, I don’t think Final has any more responsibilities in a dataclass than it has in any place where it is used to indicate a final instance attribute. Consider this spec-compliant non-dataclass example of a final instance attribute:

class Item:
    x: Final[int]
    
    def __init__(self, x: int) -> None:
        self.x = x
        
class SubItem(Item):
    x = 4
    
def f() -> None:
    item = Item(3)
    item.x = 1

MyPy correctly errors on both the line x = 4 in SubItem, and on the line item.x = 1 in f. In other words, it is already correct for a final instance attribute to prevent both subclass override and instance assignment. So I don’t think I’m proposing more responsibilities for Final in a dataclass than it already has anywhere else it is used on an instance attribute.

That makes sense, thanks for clarifying your proposal.

@dataclass
class C:
    x: Final[int] = 3

A person familiar with Final spec should say:
x is a constant class variable with the value of 3.

A person familiar with @dataclass spec should say:
x is a dataclass field of type int and the default value of 3.

A person familiar with both…

class C:
    x: Final[int]

    def __init__(self, x: int) -> None:
        self.x = x

Should this imply x is not overridable in subclasses? What would an override even mean in this context? A subclass not being able to set x by itself, but having to call super().__init__(x=...)?

According to the spec, yes.

There must be exactly one assignment to a final name.

Additionally, a type checker should prevent final attributes from being overridden in a subclass:

Both mypy and pyright agree that both SubItem1 and SubItem2 are in error here:

class Item:
    x: Final[int]
    
    def __init__(self, x: int) -> None:
        self.x = x
        
class SubItem1(Item):
    def __init__(self, x: int) -> None:
        self.x = x
        
class SubItem2(Item):
    x = 4

(And that doesn’t change if we use self.x: Final[int] = x inside __init__, instead of the class-level annotation.)

I think “override” here means a class-level assignment, as in SubItem2 above. SubItem1 is wrong due to the “exactly one assignment” requirement. And yes, it should instead call super().__init__(...) or Item.__init__(...) to initialize x.
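A sketch of the subclass shape that suggestion implies. `SubItem3` and `label` are hypothetical names, and I have not verified the result against every type checker; the point is only that the subclass never assigns x itself.

```python
from typing import Final

class Item:
    x: Final[int]

    def __init__(self, x: int) -> None:
        self.x = x  # the single assignment to the final attribute

class SubItem3(Item):
    def __init__(self, x: int, label: str) -> None:
        super().__init__(x)  # delegate initialisation of x to Item
        self.label = label   # subclass-specific state only

sub = SubItem3(4, "spare")
```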

A person familiar with both would reasonably interpret that the order of operations is pretty clear here and can compare the runtime behavior.

The dataclass transform transforms the class level annotations into equivalent instance ones. That’s how decorators on classes work, they take something, and modify it. The behavior of that decorator is specified and very special cased already because otherwise:

@dataclass
class C:
    x: list[int] = field(default_factory=list)

would fail as a dataclass field isn’t consistent with list[int]

It’s very hard to take people seriously about this being unclear.

In the interest of actually getting this to a point of agreement, can we just say that the dataclass transform is assumed to take place before looking at any annotations, and that so long as an annotation refers to something which has a generated parameter in __init__, the annotations are considered as if they were in __init__? This practically matches what the desugared or hand-rolled equivalents would be.

@dataclass
class X:
    a: int = 1
    b: list[int] = field(default_factory=list)
    c: Final[int] = 1

# vs

class X:
    # This isn't quite what is generated for fields, but illustratively
    def __init__(self, a: int = 1, b: list[int] = MISSING, c: int = 1):
        self.a: int = a
        self.b: list[int] = b if b is not MISSING else list()
        self.c: Final[int] = c

I don’t really understand why you’ve insisted on a difference between class-level annotations and instance-level annotations, and saying that annotations in a dataclass brings things from class-level annotations to instance-level annotations, to me, doesn’t mean anything.

Instance-level annotations, if you mean instance variable annotations specified on self variables specifically in __init__,

  • Have never been specified by any spec as any different from non-ClassVar and non-Final annotations on a class body, which means they should be exactly equivalent;
  • Have been treated by mypy as a second-class citizen since forever, with no obvious complaints or requests to fix, indicating their lack of frequency out in public code;
  • Strongly couple the shape of a class to the implementation of its initialisation procedure (pyre playground), which also means it’s impossible to specify the equivalent in .pyi stubs;
  • Have no equivalent in __new__ (mypy, Pyright, Pyre), which is an absolutely valid way to define and initialise classes.

Your non-dataclass equivalent should be illustrated from just the class shape, as would be done in stubs (I’ve avoided using the actual values here, because of the complications of mutable defaults):

class X:
    a: int = ...
    b: list[int] = ...
    def __init__(self, a: int = ..., b: list[int] = ...) -> None: ...

In the example above, I haven’t included c, because that’s where the contention lies. In the following example, we’re trying to decide

@dataclass
class X:
    a: int = 1
    b: list[int] = field(default_factory=list)
    c: Final[int] = 1

should look like which of the following shapes:

  1. The proposal given in this thread:
    class X:
        a: int = ...
        b: list[int] = ...
        c: Final[int]
        def __init__(self, a: int = ..., b: list[int] = ..., c: int = 1) -> None: ...
    
  2. My interpretation:
    class X:
        a: int = ...
        b: list[int] = ...
        c: Final[int] = ...
        def __init__(self, a: int = ..., b: list[int] = ...) -> None: ...
    

Because that’s the behavior of dataclasses. dataclasses are a place where type checkers have to special case things already. That’s the tradeoff we have for a short form supported by type checkers. It isn’t even unique in nature, type checkers have special logic for Enums in this manner as well.

I think this is useful, but not quite right. Note that there is no X.a or X.b at runtime either when a dataclass is used (the assignment specifies only a field default value, not a class attribute of any kind), so the more accurate options we are choosing between are these:

  1. The proposal given in this thread:
class X:
    a: int
    b: list[int]
    c: Final[int]
    def __init__(self, a: int = ..., b: list[int] = ..., c: int = 1) -> None: ...
  2. Your preferred interpretation:
class X:
    a: int
    b: list[int]
    c: Final[int] = ...
    def __init__(self, a: int = ..., b: list[int] = ...) -> None: ...

I think we’ve both made an error here - I’m getting (Python 3.11) an attribute on X.a and no attribute on X.b at runtime.
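A quick way to check this at runtime. This is a sketch of the standard-library behaviour as I understand it: a plain default is left on the class, while a field with only a `default_factory` has its class-level name removed.

```python
from dataclasses import dataclass, field, fields
from typing import Final

@dataclass
class X:
    a: int = 1
    b: list[int] = field(default_factory=list)
    c: Final[int] = 1

# Which names survive as class attributes after the decorator has run.
class_attrs = {name: hasattr(X, name) for name in ("a", "b", "c")}

# Which names the runtime treats as dataclass fields.
field_names = [f.name for f in fields(X)]

# The generated __init__ accepts c like any other field.
overridden = X(c=5)
```

At runtime c is handled exactly like a, which is the behaviour the two proposed shapes are trying to describe.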
