Treatment of `Final` attributes in dataclass-likes

(redirection from my post at pyright)

According to the docs on type qualifiers (specifically the ImmutablePoint example), attributes annotated with Final within a class body are to be interpreted as:

  • class-level constants, if they have a value assigned,
  • instance-level constants, if the value is assigned in __init__.

Consider the following definition:

from dataclasses import dataclass, field
from typing import Final

@dataclass
class Example:
    foo: Final[int]
    bar: Final[int] = field()

Currently, a Final field that has a default value or is assigned via the field() function is treated as a class variable.

This is reflected by pyright:
Cannot assign member "foo" for type "Example"
  "foo" is declared as Final and cannot be reassigned
    Member "__set__" is unknown

Cannot assign member "bar" for type "Example"
  Member "bar" cannot be assigned through a class instance because it is a ClassVar
  "bar" is declared as Final and cannot be reassigned
    Member "__set__" is unknown

Since assignment to fields is used to give them a default value, or to customize them via the field() function, and Final fields within dataclass-likes become normal instance attributes (unlike ClassVars, which do not become fields), I believe it makes sense to treat Final attributes in dataclass-likes as instance-level constants, despite the assignment to them.

While I don’t think the motivating example is super strong (this seems like a better fit for simply freezing the dataclass), this makes sense to support: these annotations clearly refer to instance variables, given how dataclasses work.

I also don’t think there’s anything unclear currently; this should just work.

I’m okay with additional clarity being added for this, but I’m not sure in what way the existing specification and intent are unclear; this appears obvious from both the intent and the actual behavior. If we can’t assume that adding explicit support for dataclasses to typing means supporting their instance variables, then we face a Sisyphean task, because all sorts of other things are no longer clear enough either.


Accept Final as indicating ClassVar for dataclass · Issue #89547 · python/cpython · GitHub is a very relevant issue. I think @carljm’s comment here is a good overview, and that for dataclasses, Final with an assignment should be a special case that better reflects runtime behavior. At runtime today,

@dataclass
class Example:
    foo: Final[int]
    bar: Final[int] = field()

bar is already a normal dataclass field, not a class-level constant. So dataclasses are exceptional here compared to normal classes.
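The runtime claim above can be checked directly with the standard library. The following is a minimal sketch (the variable names and the values 1 and 2 are illustrative assumptions, not part of the original post). It only inspects what the dataclasses library generates and says nothing about what a type checker reports.

```python
from dataclasses import dataclass, field, fields
from typing import Final


@dataclass
class Example:
    foo: Final[int]
    bar: Final[int] = field()


# Both annotations produce ordinary dataclass fields, in declaration order.
field_names = [f.name for f in fields(Example)]

# The generated __init__ takes both values as per-instance arguments.
example = Example(1, 2)

# field() without a default leaves no class-level attribute behind.
bar_on_class = hasattr(Example, "bar")
```

If bar were handled as a ClassVar at runtime, it would not appear in fields(Example) and could not be passed to the constructor.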


I can draft a typing spec change PR for this. At the risk of repeating what I’ve already said on the GitHub issue linked above, here is a summary of the change I will propose, and the rationale for it. (I plan to consider this the pre-requisite discussion thread before filing a typing spec change PR.)

The proposed change to the spec (in the “Type qualifiers” chapter of the typing documentation) is to clarify the meaning of Final within the body of a dataclass (or a dataclass-like created via dataclass_transform). Just as the int annotation in x: int = field(...) applies to the type of the instance attribute, the Final in x: Final[int] = 3 or x: Final[int] = field(...) also applies to the instance attribute. It therefore does not imply ClassVar; it creates a dataclass field whose instance attribute x cannot be assigned to outside of __init__ (with a default value of 3, in the first case). This implies that Final[ClassVar[...]] should also be allowed in dataclass bodies, so that it remains possible to specify a final classvar on a dataclass.

I will also probably include an example of this in the Dataclasses section of the spec, in my PR.

The rationale for the spec clarification is as follows:

  1. The proposed behavior matches the existing runtime behavior of the dataclasses library, in which x: Final[int] = 3 creates a normal dataclass field with a default value of 3; it is not treated as a ClassVar. (If it were, that would necessarily imply that the dataclass would not have a field named x, since ClassVars are excluded from consideration as fields.) Changing this runtime behavior would be a serious backwards-compatibility break.

  2. Mypy (playground) and Pyre (playground) both already implement the proposed behavior (and AFAIK always have). Pyright (playground) currently seems a bit unclear about the status of x in this example. It accepts (in checking calls to the dataclass constructor, at least) that x is a field in the dataclass (which it should not be, if it were treated as a ClassVar), and throws an error only on the same line as Mypy and Pyre do (an attempt to re-assign the x attribute on an already-constructed dataclass), but the error message suggests that Pyright believes that x is a ClassVar.

  3. The proposed behavior is more consistent with the general treatment of type annotations in dataclass bodies, in which the annotation always applies to the type of the eventual instance attribute, not to the immediate assignment. (Otherwise x: int = field(...) could not work.) In other words, annotated assignments in a dataclass body form part of the mini-language for specifying the shape of instances of the dataclass; they are not treated in the same way as annotated assignments in a normal class body.

  4. The proposed behavior, in addition to being more consistent and more compatible with the de facto status quo, is also more flexible, because it permits a clear spelling for both final instance attributes and final classvars on dataclasses. The alternative interpretation suggested by the current spec would make it impossible to have a final instance attribute on a dataclass (barring a verbose workaround like hand-writing the __init__ method, which defeats the point of using a dataclass.)
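Points 1 and 4 of the rationale can be illustrated at runtime. This is a hedged sketch: the class name Settings and its attribute names are hypothetical, and it demonstrates only the behavior of the dataclasses library, not of any type checker.

```python
from dataclasses import dataclass, fields
from typing import ClassVar, Final


@dataclass
class Settings:
    x: Final[int] = 3         # becomes a field with default 3
    limit: ClassVar[int] = 5  # excluded from the fields


field_names = [f.name for f in fields(Settings)]
x_default = fields(Settings)[0].default

default_instance = Settings()
custom_instance = Settings(x=10)
```

Only x is a constructor parameter here; limit stays a plain class attribute because ClassVar annotations are skipped when fields are collected.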

Feedback welcome, before I actually draft the typing spec PR!


Originally it was simply a small repro to show pyright’s behavior. My actual use case mixes mutable and immutable fields, which frozen dataclasses cannot do. (I am also using attrs, but the API is the same in this context.)


Is this proposal just about making read-only instance attributes? Not preventing overriding or redeclaration of fields in subclasses which are also dataclasses? Or is it both of these?

Does this proposal also envision affecting dataclass-like decorators/metaclasses/subclasses defined by PEP 681 – Data Class Transforms?


For this use case, it might make more sense to re-use the ReadOnly qualifier from PEP 705 (which I believe was recently accepted).

@dataclass
class A:
    x: int  # mutable
    y: ReadOnly[int]  # immutable

All of the documented effects of Final should also apply to dataclasses, including the prohibition on subclass overrides.

Perhaps another way to clarify the proposal is this. The code

@dataclass
class Item:
    x: Final[int] = 3

should be understood by the type checker as if this had been written (this is simply expanding the dataclass transform, eliding the added methods other than __init__):

class Item:
    def __init__(self, x: int = 3) -> None:
        self.x: Final[int] = x

This also means that a subclass (dataclass or not) cannot override the x field, or define a manual __init__ method that assigns to self.x.
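One way to sanity-check the expansion above is to compare the constructor signatures of the two spellings. The sketch below uses the hypothetical names ItemDataclass, ItemManual, and init_shape. It compares only parameter names and defaults, since the annotation recorded on the generated parameter may differ between the two.

```python
import inspect
from dataclasses import dataclass
from typing import Final


@dataclass
class ItemDataclass:
    x: Final[int] = 3


class ItemManual:
    def __init__(self, x: int = 3) -> None:
        self.x: Final[int] = x


def init_shape(cls: type) -> list[tuple[str, object]]:
    """Return (name, default) for each constructor parameter of cls."""
    params = inspect.signature(cls).parameters.values()
    return [(p.name, p.default) for p in params]


dataclass_shape = init_shape(ItemDataclass)
manual_shape = init_shape(ItemManual)

# Final is not enforced at runtime; reassignment is only a type-checker error.
item = ItemManual()
item.x = 4  # a type checker is expected to reject this line
```

The reassignment at the end succeeds at runtime, which is why the restrictions discussed here are purely a matter of what type checkers are specified to report.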

Mypy, Pyre, and Pyright all already agree on this anyway (though Pyre fails to error on a subclass __init__ assigning to x, and Pyright emits some extra errors relating to it considering x a ClassVar on the base class), so I don’t think it’s a change to the spec, but it will still be good to also clarify this in the spec PR (and conformance tests). Thank you!

Yes, dataclass transforms should be consistent with dataclasses, so the proposed behavior of Final should apply to any PEP 681 dataclass-like. In the playground examples linked in my previous post, mypy and pyright already behave the same if I use a custom dataclass_transform instead of the dataclasses.dataclass decorator. Pyre doesn’t seem to support dataclass_transform.

I think expanding ReadOnly to non-structural types would be a major expansion that would definitely require a new PEP.


I agree with you that I’d rather use ReadOnly over Final, as the name conveys the intent better. But Final has already been given a meaning in this context, so I wouldn’t expect this (breaking) change to take precedence over correcting existing behavior.

As a side note, I think ReadOnly would have a slightly different meaning: a Final attribute cannot be reassigned at all (like a const), while a ReadOnly attribute could not be reassigned from outside the object’s body (but would remain mutable through its methods).

Thanks for clarifying the intention, especially with regards to this example:

class Item:
    def __init__(self, x: int = 3) -> None:
        self.x: Final[int] = x

This being said, I also think the re-use of Final as proposed here is too overloaded. The motivation to special-case Final in dataclasses isn’t too compelling on the basis of what it’s trying to prevent (instance.var = obj doesn’t work with type-checkers regardless of whether var: Final means var: ClassVar or the read-only instance variable proposal here).

IMO, implementing one of these (then deprecating and eventually removing the ability of Final to create dataclass fields) is a better solution:

Preventing Final from meaning both (1) read-only instance variables and (2) non-overridable dataclass fields in subclasses would allow the future ability to specify field defaults in subclasses without affecting read-only status.

Concept draft
import typing_extensions as _t
import dataclasses as _dc

@_dc.dataclass
class Data:
    a: _t.ReadOnly[int]  # Read-only dataclass field with no default
    b: _t.ReadOnly[int] = 0  # Read-only field with default
    c: int = 0  # Normal field
    d: _t.Final[int] = 0  # Non-overridable class variable
    e: _t.Final[_dc.Field[int]] = 0  # Non-overridable field, instance variable is **not** read-only
    f: _t.Final[_t.ReadOnly[_dc.Field[int]]] = 0  # Non-overridable field which is read-only in instances

@_dc.dataclass
class Sub(Data):
    b: _t.ReadOnly[int] = 1  # Default changed, instance variables are still read-only
    c: _t.ReadOnly[int] = 1  # Mutable override error

I agree that Final isn’t appropriate for what’s being proposed.

After ReadOnly gets extended out of TypedDict, it would cause confusion, since the main difference between Final and ReadOnly would be that Final is non-overridable, and ReadOnly is overridable.


Final seems appropriate to me.

dataclasses and their ilk are essentially syntactic sugar that turns class annotations into instance variables. The exact method of doing that varies from library to library (hence dataclass_transform, which just informs type checkers that a library is doing something like this). Final wasn’t specifically spelled out, but it really shouldn’t need to be if people just think it through. A class annotation in a dataclass produces a generated instance variable. Final is correct for an instance variable (see the equivalent __init__ examples above), so it should be correct here.

Class annotations in any class are supposed to be instance variables, unless ClassVar wraps the annotation; the only difference with @dataclass is that a constructor is also generated.

Given that all annotations on a class (dataclass-like or not) refer to instance variables by default, it’s not immediately obvious to me why Final[T] should mean something different in a class body if it’s dataclass-like.

The typing documentation states:

Type checkers should infer a final attribute that is initialized in a class body as being a class variable

Thus, currently a field with a default or with field() is treated as a final class variable.

Final and dataclasses were each clear on what the intent of their mechanisms is. Final was over-specified, sure. But I don’t see how the actual intent or meaning is unclear here. The spec for Final could be changed to say nothing about whether an attribute is a classvar, and let type checkers determine that from the actual behavior in the actual context, rather than adding rules to Final when the annotation isn’t about whether something is a class variable or an instance variable.

It’s not clear to me whether you’re talking about dataclasses here.

If yes: I wasn’t aware that the specification of Final said anything about dataclasses. Where is that?

If no: Final doesn’t have any behavior. It’s only an annotation. There wouldn’t be anything to figure out.

Neither a field with a default nor one made by field() is a final class variable.

I also think there’s some disagreement with what constitutes a class variable and instance variable here. They are not mutually exclusive; in the following examples in the classes A and B, do you consider attr to be an instance variable, class variable, both, or neither?

class A:
    _var: int

    @property
    def attr(self) -> int: return self._var
    @attr.setter
    def attr(self, val: int) -> None: self._var = val

def _b_setter(self: "B", val: int, /) -> None:
    self._var = val

class B:
    _var: int
    attr = property(lambda self: self._var, _b_setter)

My interpretation is that attr is a read-and-write instance variable of type int. property being used here is an implementation detail, but if you access attr directly from the class, it is also a class variable of type property.
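That dual view can be observed at runtime. Below is a minimal sketch that reuses class A from the example above; the value 5 and the variable names are arbitrary illustrations.

```python
class A:
    _var: int

    @property
    def attr(self) -> int:
        return self._var

    @attr.setter
    def attr(self, val: int) -> None:
        self._var = val


a = A()
a.attr = 5  # goes through the property setter and stores 5 in a._var

instance_value = a.attr  # instance access returns the stored int
class_value = A.attr     # class access returns the property object itself
```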


This is how I would interpret dataclasses.Field as well. Something like @dataclass could have been implemented using descriptors like property, and have assignments in the class body, but this is an implementation detail. var in this example has both a class variable API and an instance variable API:

Pyright Playground (pyright-play.net)

import typing as _t

class field[T]:
    def __init__(self, *, default: T, default_factory: type[T] = ...) -> None: ...
    @_t.overload
    def __get__(self, instance: None, owner: _t.Any, /) -> _t.Self: ...  # Or -> _t.Never if the variables are supposed to be hidden
    @_t.overload
    def __get__(self, instance: _t.Any, owner: type[_t.Any], /) -> T: ...
    def __set_name__(self, owner: type[_t.Any], name: str, /) -> None: ...

class A:
    var = field(default=0)

if _t.TYPE_CHECKING:
    reveal_type(A().var)  # `int`
    reveal_type(A.var)  # `field[int]` or `Never`

Writing var: int = ... in a dataclass, to me, means the same as var: ClassVar[Field[int]] = ... (at least in an API sense; not sure about runtime). There’s a dual identity as a class variable descriptor and an instance variable here, which matches the behaviour of field(...) at runtime. You can also forget about Field being also a class variable if you’re only interacting with instances.

Hence, the use of Final[int] meaning a dataclass field is rather confusing to me - IMO, Final[Field[int]] spells the intention more clearly if we wanted Final to mean non-overridable field in subclasses, and would match the behaviour of Final on normal classes.

Thanks for the feedback!

“Isn’t appropriate” in what sense? Final instance attributes are already well established by PEP 591 and in wide use; this is not a new use or a new interpretation of Final. This proposal only clarifies the meaning of a Final annotation in a dataclass, where both possible interpretations (final instance attribute, final classvar) already have well-defined semantics and established uses outside of dataclasses.

That is already true, and will remain true no matter how we resolve this particular ambiguity with dataclasses. Final instance attributes are not introduced by this proposal. They already exist, and they already have exactly this similarity to a hypothetical future ReadOnly-for-instance-attributes.

I’m sure this similarity would be an active point of discussion on any future proposal to extend the use of ReadOnly, but nothing about it is changed by the question at hand here.

I understand your point of view, and I agree that an internally-consistent argument can be made for it, in the abstract.

I can’t agree with your weighing of the tradeoffs, though. Deprecating and removing final dataclass fields (which currently work in the runtime and in all major typecheckers, and have done so since PEP 591, and have clear, unambiguous, and useful semantics matching the behavior of final instance attributes in any other class) is a very high price to pay. Any proposal that includes this deprecation must clearly demonstrate that it is far superior to the alternatives, in order to justify this cost to the ecosystem.

Your proposal appears to be that people who are today successfully using Final[int] should be asked to migrate to Final[ReadOnly[Field[int]]] to achieve the same thing. This is not even possible until after a new PEP to expand ReadOnly (which may or may not be accepted) and then either a second PEP or a typing spec change (plus implementations in all type-checkers) to promulgate a new interpretation of dataclasses.Field in dataclass annotations (and this new interpretation is itself backwards-incompatible, requiring its own deprecation process). Only after all of that is in place could we even begin the multi-year deprecation process of Final[int] creating a dataclass field.

And once we have gone through all of that: is it intuitive for an annotation of x: Field[int] = 0 to mean the same thing as x: int = 0, but x: Final[Field[int]] = 0 to mean something different from x: Final[int] = 0? To really be internally consistent, your proposal implies that the canonical annotation spelling for all dataclass fields should be x: Field[...]. (Effectively, you are proposing this as the mirror image of my proposal that explicit ClassVar should always be required to specify a non-field.)

Nothing in this proposal prevents that future ability. The only difference would be that Final[ClassVar[...]] would be required to spell a final classvar on a dataclass-like.

I think the concern about confusion over differing interpretations of Final in dataclass bodies is relevant, but over-stated. The behavior proposed here has already been the status quo ever since PEP 591, and we have not seen a wave of confusion resulting from it. The behavior is intuitive and useful in practice. (It is intuitive enough that multiple type-checkers implemented it that way, even in apparent contradiction to the text of PEP 591, without even raising it as a point for clarification.)

It is already clear as soon as you see x: int = field(...) in a dataclass body that annotated assignments behave differently in dataclass bodies than in other classes. Specifying that all annotations on dataclass fields that are not explicitly wrapped with ClassVar are taken to apply to the assignment to the instance attribute in the generated __init__ is a simple, consistent rule that simply codifies existing relied-on behavior.


If it’s obvious to most other users what Final means in a dataclass, then I’ll concede to being the outlier. In a class body, I’ve consistently interpreted it as non-overridable ClassVar everywhere, which is compliant with the current documentation. If Final meant anything else hidden away in a dataclass implementation, I would have interpreted it as a ClassVar anyway. The usage here was very surprising to me; if I didn’t pay attention, the fact that it’s an instance variable in dataclasses would have totally skipped my notice, because type-checkers force you to provide a default so you don’t need to pass it to the constructor. I think claiming that this interpretation is status-quo is a stretch, given how easy it is to miss.

About the tradeoff. If, in your experience, many lines of code have used it as a way of indicating a read-only variable, and not anything else, I can believe this, in which case the cost is to switch the name from Final to ReadOnly or some other name (if accepted in a PEP) (I’m not making any claims about whether the costs would be high or low). If you say that many lines of code have used this to indicate a field shouldn’t be overridden in subclasses, or both read-only variables and non-overriding in subclasses; well, I don’t think there’s any way of evaluating this, because Final apparently has too many responsibilities in dataclasses.

Not quite. I’m proposing that x: ClassVar[Field[int]] = 0 mean the same thing as x: int = 0, and that x: Final[Field[int]] = 0 mean the non-overridable version, which must be explicitly spelled as Final[Field[int]]. (Again, because of the current state of affairs, it’s hard to evaluate whether Final as used is supposed to mean read-only, non-overridable, or both. I’m guessing the non-overridable functionality is used less than the read-only functionality, so I’m suggesting the verbose spelling for the rarer usage, but I may be completely off.) This matches hand-crafted implementations of descriptors.

To me, x: Field[int] = 0 is a usage error, because instance variables shouldn’t be of type dataclasses.Field.

I think we’re all in agreement that @dataclass does some magic under the hood, with the potential to create certain inconsistencies. I’m just proposing that the attribute-access portion could be less magical; a lot of it is something that you can already do in normal classes with descriptors.

I’ve given my concerns, and I don’t think I have anything else objective to add - so it’s up to what other people think now.

It seems to me that you’re focusing on the perspective of writing code and using the language features.
The effect on reading and understanding I think should be the bigger concern.

The only place where I can see you mention anything close to reading the code turns out not to be relevant:

It is already clear as soon as you see x: int = field(…) in a dataclass body that annotated assignments behave differently in dataclass bodies than in other classes.

The behavior of the assignment is not the issue. The issue is in what type information I have about x. And that doesn’t look like it should be different from any other class.

If someone who is familiar with the Final spec sees:

class C:
    x: Final[int] = 3

That tells them x should only ever be 3, whether they get it from any instance, or from the class, or from any subclass.

To read this:

@dataclass
class C:
    x: Final[int] = 3

and have to understand that that means something totally different about what x could give them,
that just seems disastrous to me.

This has nothing to do with how “assignments behave”.
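For reference, the runtime difference between the two definitions of C above can be sketched as follows. The names PlainC and DataC are hypothetical stand-ins for those two classes; the sketch shows what the dataclasses library does today and takes no position on how the annotation should be read.

```python
from dataclasses import dataclass
from typing import Final


class PlainC:
    x: Final[int] = 3


@dataclass
class DataC:
    x: Final[int] = 3


# The plain class has no generated constructor parameter for x.
try:
    PlainC(x=10)
    plain_accepts_x = True
except TypeError:
    plain_accepts_x = False

# The dataclass accepts a per-instance value; the class attribute keeps the default.
custom = DataC(x=10)
values = (DataC.x, DataC().x, custom.x)
```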

The behavior proposed here has already been the status quo ever since PEP 591, and we have not seen a wave of confusion resulting from it.

I think the only reason we haven’t seen a wave of confusion is how little it’s used (which is also why it wouldn’t be a “very high price to pay”).
