Possible modification to ClassVar

Daverball · March 1, 2024, 1:44pm

You would probably need to specify some additional type consistency rules once/if we allow ReadOnly attributes, which would usually be safe to override with a subtype in subclasses, but not when going from 1/3 to 2, since the class and instance attributes would then no longer be consistent with one another, forcing type checkers to track them separately^[1].

Right now we only get away with simpler rules because attributes are usually invariant (or at least should be).

Another case to consider is consistency with property and other descriptors. I think we can use the same rules as for __slots__ there.

[1] Not that that is necessarily a bad thing, but it would be a larger change and require careful consideration.

mikeshardmind · March 1, 2024, 1:54pm

We already have read-only attributes through Final. I think that Final should continue to imply ClassVar at the class level (this is the current behavior), and that a Final class variable should imply no instance-variable use (since overwriting it on an instance would mean it is no longer Final in that scope). But I don’t think Final on an instance variable needs to, or should, imply either a Final class variable or the lack of a class variable.
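As a rough illustration of the two spellings of Final being contrasted here (the class and attribute names are hypothetical, and Final is not enforced at runtime, so the comments describe what a type checker following PEP 591 is expected to report):

```python
from typing import Final


class Config:
    # Initialized in the class body: type checkers treat this as a
    # class-level constant that must not be reassigned anywhere.
    LIMIT: Final = 10

    # Declared without a value: this is an instance-level Final and must be
    # assigned exactly once, in __init__.
    ident: Final[int]

    def __init__(self, ident: int) -> None:
        self.ident = ident


cfg = Config(3)

# Expected to be rejected by a type checker, although Python would run them:
# cfg.LIMIT = 11
# cfg.ident = 4
```

Only LIMIT exists on the class object; ident is created per instance.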

Daverball · March 1, 2024, 1:57pm

True, but ReadOnly can be overridden with a writeable attribute in a subclass, the same is not true for Final.

mikeshardmind · March 1, 2024, 2:00pm

That… seems like something that shouldn’t be allowed. I get that it’s only technically adding capability (start with an inability to override, gain the capability), but here the lack of capability seems to be part of the contract, and removing it likely violates some invariant assumptions.

Daverball · March 1, 2024, 2:03pm

No, this is safe in the way that ReadOnly is defined. I’ve raised an issue with its name for that very reason, but it was ultimately decided that the more strict meaning of ReadOnly is probably not that useful and as such should not preclude the name from being used in its more lax interpretation.

I previously suggested Readable in place of ReadOnly, but it was not well received, because from a type consumer’s perspective Readable and ReadOnly look the same, so Readable may be confusing in that context.
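ReadOnly for attributes was still a proposal at the time of this thread, so the sketch below uses a read-only property in a Protocol as a stand-in for the "Readable" reading. This is an assumption made for illustration, not the proposed ReadOnly syntax, and all names are hypothetical. It shows why the consumer cannot tell the two apart: it only ever reads.

```python
from typing import Protocol


class HasName(Protocol):
    # The consumer only promises to read `name`; it never writes to it.
    @property
    def name(self) -> str: ...


class Frozen:
    """Provides `name` as a read-only property."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> str:
        return self._name


class Mutable:
    """Provides `name` as an ordinary writable attribute."""

    def __init__(self, name: str) -> None:
        self.name = name


def greet(obj: HasName) -> str:
    return f"hello {obj.name}"


# Both classes satisfy the protocol, because both can be read from.
greetings = [greet(Frozen("a")), greet(Mutable("b"))]
```

A contract that needs the value to never change is a different promise, which is what Final expresses.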

mikeshardmind · March 1, 2024, 2:09pm

I guess it’s fine and things that have that invariant assumption should be typing it with Final, but I consider myself pretty well informed about the type semantics of python as well as theory outside of a specific language, and I had to think about that for more than a couple minutes to convince myself that it actually would be fine and consider the implications of tracking a classvar and instancevar separately with this. Without a specific construct like Final already existing though, there would be no way to actually type that important contract.

erictraut · March 1, 2024, 2:46pm

I gave this topic significant thought when developing pyright. After much experimentation and feedback from users, I came up with this formulation. It’s internally consistent — and consistent with the current typing spec, but it differs from mypy in a few ways. It seems to work well for the common use cases in Python.

Setting aside enums, namedtuples, dataclasses, and TypedDicts (which are all non-standard in some respects), a variable in “normal” Python classes falls into one of three categories:

1. A “pure” class variable that is not intended to be overwritten by an instance variable
2. A “pure” instance variable that has no class variable associated with it
3. A class variable that may be overwritten by an instance variable; the value at the class level acts as a default fallback

Pyright allows you to specify which of these three you intend. If you want a “pure” class variable, use ClassVar. In this case, pyright enforces that the variable cannot be set (overwritten) through an instance.

If you want a “pure” instance variable, do not declare the variable in the class body and simply set the value through a self.x = v statement. Alternatively, specify the variable in __slots__. In this case, pyright enforces that the variable cannot be set through the class.

If you want a hybrid, declare (and optionally set the value of) the variable in the class body.
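A minimal sketch of the three declarations side by side (the class and attribute names are hypothetical; the commented-out lines are the ones that, per the description above, pyright is expected to reject even though Python itself would run them):

```python
from typing import ClassVar


class Counter:
    # "Pure" class variable: ClassVar rules out assignment through an instance.
    created: ClassVar[int] = 0

    # Hybrid: declared in the class body; the class-level value is the default.
    label: str = "unnamed"

    def __init__(self, start: int) -> None:
        # "Pure" instance variable: never declared in the class body.
        self.value = start
        Counter.created += 1


a = Counter(1)
b = Counter(2)
b.label = "second"  # fine: a hybrid variable overridden on one instance

# a.created = 5      # assignment to a ClassVar through an instance
# Counter.value = 0  # assignment to a pure instance variable through the class
```

After this runs, a.label still reads the class default while b.label reads the instance value.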

I think this provides a natural way for developers to express their intent. Pyright’s rules differ slightly from the rules that @mikeshardmind proposes at the top of this thread. Based on my experience, the rules that I converged on with pyright are more consistent with developer expectations and common use cases.

mikeshardmind · March 1, 2024, 3:00pm

So, for context, this was split off specifically from a thread about dataclasses, and the basic formulation allowing static inference of any kind to determine instancevar or classvar was motivated by having a more general rule that doesn’t need to treat these as special cases on both sides of the specification.

@dataclass
class X:
    a: int = 1

In this case, the type system is aware of the intent of the dataclass mechanism, and should not treat this as a class var. This also neatly wraps the inconsistency raised in that issue of:

@dataclass
class X:
    a: Final[int] = 1

How the variable is treated can then defer to its meaning in the context of all of the static information, including the dataclass decorator and the fact that it creates an __init__ method.

I’m otherwise fine with pyright’s rules, but would like both the specification and pyright to be allowed to consider context that is statically available when appropriate.
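For reference, here is a small sketch of what the two dataclass snippets above do at runtime (the second class is renamed Y only so both can coexist in one file). The runtime dataclasses module excludes only ClassVar and InitVar annotations from the field list, so a Final annotation still produces an ordinary field:

```python
from dataclasses import dataclass, fields
from typing import Final


@dataclass
class X:
    a: int = 1


@dataclass
class Y:
    a: Final[int] = 1


x = X(a=2)

# The default stays on the class, while each instance gets its own value.
class_value, instance_value = X.a, x.a

# Final does not remove `a` from the generated __init__.
y_field_names = [f.name for f in fields(Y)]
y = Y(a=5)
```

Here class_value is 1 and instance_value is 2, which is the hybrid shape from the three categories, produced by the dataclass transform rather than by a hand-written __init__.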

carljm · March 1, 2024, 6:28pm

I think pyright’s behavior here is the right choice, and a good match for how Python code is written. I think it would be useful to codify this in the typing spec, which (as far as I can find) only clarifies the “pure class variable” case with ClassVar, and doesn’t offer clear guidance on pure instance variables vs class-and-instance variables.

Michael H:

This isn’t a guessing at inference situation, it’s at runtime a class variable if you do this.
>>> class NonData:
...     x = 1
...     __slots__ = ("x",)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

I don’t think it is correct to use this error message as evidence that x should be considered a “pure ClassVar”, because the runtime has always allowed @erictraut 's case 3: a class variable that may be overwritten by an instance variable, with the class value acting as fallback. So yes, x here is a class variable, because it is set on the class. And that doesn’t preclude it also being an instance variable.
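A short sketch of both behaviors, using hypothetical names: the fallback that the runtime has always allowed, and the conflict that arises only when a slot descriptor and a class-level value compete for the same name.

```python
class Hybrid:
    x = 1  # class variable that doubles as the default


h = Hybrid()
before = h.x   # read from the class
h.x = 2        # creates an instance variable that shadows the class value
shadowed = h.x
del h.x        # removing the instance variable exposes the fallback again
after = h.x


def define_conflicting_class() -> str:
    """Return the error raised when a slot name also has a class-level value."""
    try:
        class NonData:
            x = 1
            __slots__ = ("x",)
    except ValueError as exc:
        return str(exc)
    return "no error"


message = define_conflicting_class()
```

The ValueError is specific to __slots__; without it, the same name serves as both a class variable and an instance variable.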

mikeshardmind · March 1, 2024, 7:47pm

Please reference the rest of the discussion and the edits that were made hours prior to this response to the top of the thread because of how frequently people do not read the whole discussion.

The request was not to preclude instance-variable use unless static information actually precludes it, but for the specification to base the conditions for defaulting on “when type checkers cannot statically determine”. This was specifically meant to remove the inherent special-casing where instance variables and class variables interact with “special” exceptions such as dataclasses, and to base the rule on the broader scope of “what can be statically determined” instead of “only these two criteria explicitly laid out in the specification”.

carljm · March 1, 2024, 7:58pm

Sorry if my quote misrepresented you.

I think it will be clearer for the spec to be explicit about how classvar/instancevar should be handled in the general case, and then for each particular special case (e.g. dataclasses) to precisely specify how it departs from that general handling, rather than for the specification of the general case to be vague (using terms like “what can be statically determined”) in order to leave the door open for arbitrary special cases. Future special cases can always be explicit when they are added to the spec, too. (Of course, for discoverability it may make sense for the general case in the spec to cross-reference the special cases that have differing behavior.)

mikeshardmind · March 1, 2024, 8:13pm

There’s nothing vague about that to me. It is abstract but not at all vague. The point of such a statement is that anything actually special gets called out where it is special. As a pertinent example, that means spelling out that dataclass transforms turn class-level annotations (including what would otherwise appear to be class variable use) into a specific fuller class definition, including init methods, and that the annotations are intended to refer to the transformed state.

People are frequently getting lost in the details of special casing, and much of the time even if there’s a need for special casing, expressing it only where it exists makes figuring out how things piece together sensibly easier.

carljm · March 1, 2024, 8:14pm

Oh, if your proposal is basically that the (clearly and explicitly specified) general case should also call out the fact that special cases exist, with different handling, then I think we are in agreement.

mikeshardmind · March 1, 2024, 8:24pm

Not quite, but very close.

General rule that applies without special cases: spelled out and fully explored

General rule can maintain a list of special cases, but it should be clear which part of the equation is special, why, and which part should take precedence.

For example, a section on determining whether something is a class variable, an instance variable, or some hybrid should not say that Enum, dataclasses, and TypedDict are special-cased. It should instead say that these are constructs with special semantics for this, which take precedence when considering it. Each of those should have its semantics for this expanded on in its own section, making it clear that the special semantics belong not to the rule itself but to the other construct, and that the other construct could be changed without touching the general rule.