Possible modification to ClassVar

mikeshardmind · March 1, 2024, 3:50am

PEP 526 as well as the specifcation states that all annotations in a class scope should be treated as instance variables unless marked with ClassVar. Can this be modified to “When type checkers cannot statically determine if a class scoped annotation is an instance variable or classvariable, they should default to instance variable?”

This would change the typing behavior of:

class X:
    something: int = 1

to be able to determine that X.something is a classvariable. This is statically available information, but this may not always be available information statically, such as in stubs, ABCs, or protocols, so this is not a request to deprecate ClassVar

Edit: For anyone who isn’t reading the whole thread, this should require multiple changes to allow better inference without breaking people’s use cases.

Inferring class variables
Inferring instance variables
Having an ability to specify both class and instance variables
Having the ability to specify only a class or instance variable

Without static evidence that it must be only one or the other, the inference should continue allowing use as an instance variable. Presence of an identifier in slots should preclude classvar, presence in __init__ via self.identifier should result in instance variable, type checkers should understand both.

examples here

class Error:
    a: int = 1  # marks as a class var
    __slots__ = ("a",)  # marks as only an instance variable, 
   # error both statically and at runtime declaration

class X:
    a: int = 1  # detect as ClassVar, non exclusively, matching runtime behavior

class CounterToX:
    a: int  # detect as instancevar, non-exclusive
   # doesn't create a classvar at runtime.

class Y:
    def __init__(self):
        self.a: int = 2  # detect as instance var, non exclusively


class Z(Y, X): pass

Z.a  # class var
Z().a  # instance var

description of rules that would apply for consistency in subclasses here

Daverball · March 1, 2024, 9:55am

I think this would result in a ton of false positives for a pattern I see fairly commonly in code, where assignments like these are used as a simple way to offer a default value for an attribute which is expected to be overridden by subclasses with an instance variable.

Prior to annotations being a thing this was also often just used as a way to keep the docstring for the available instance attributes outside the __init__ implementation^[1].

These would now all need to be changed in some way, by either adding an __init__ that needs to be invoked through super() or implemented using a descriptor like property or removing the default value altogether and requiring subclasses to provide one.

I think it makes sense to make ClassVar a deliberate choice, rather than try to infer it, since it doesn’t just mark a variable as belonging to the class rather than the instance, it also means you aren’t allowed to override it with an instance attribute. Mixed instance/class attributes are fairly common.

although in this case you could just make the rule, that if an attribute is written to in __init__, then it isn’t implicitly inferred as ClassVar, even if it had a value assigned to it in the class scope ↩︎

srittau · March 1, 2024, 9:56am

The example you give is not necessarily a class var. I’ve seen code before that mixes class and instance variable behaviour like this to define a default value:

class Foo:
    bar = 3

    def baz(self):
        self.bar = 4

mikeshardmind · March 1, 2024, 9:57am

This isn’t a guessing at inference situation, it’s at runtime a class variable if you do this.

>>> class NonData:
...     x = 1
...     __slots__ = ("x",)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

Daverball · March 1, 2024, 9:59am

It is allowed in a subclass though, as well as with the regular __dict__ case (i.e. the more common case).

This one single case, that isn’t safe, is something a linter should be able to catch, since it really doesn’t have anything to do with whether or not x is inferred as a ClassVar.

mikeshardmind · March 1, 2024, 10:03am

Nothing about treating this as a class variable wouldn’t be okay with subclassing:

class X:
    a: int = 1  # class variable


class Y(X):
    a: int = 2  # class variable that is consistent with the parent class's type


X.a  # 1
Y.a  # 2

mikeshardmind · March 1, 2024, 10:05am

This has other effects in terms of other things not composing well and requiring imports for things that are statically knowable without them, see the other issue with dataclasses that links here

Daverball · March 1, 2024, 10:06am

Yes, but that’s not the case I’m talking about, I’m talking about this one:

class X:
    a: int = 1

class Y(X):
    def __init__(self) -> None:
        self.a = 5  # type error

Or this one (although this one could be statically detected and made an exception for):

class X:
    #: some documentation about attribute
    a: int = 1

    def __init__(self) -> None:
        self.a = 5

mikeshardmind · March 1, 2024, 10:16am

That should be erroring under the rules for class variables. Whether those rules should be looser than they are is a reasonable discussion to also have, but I don’t think it’s exactly compelling to have such rules, and then to not detect the situations there are meant to detect, especially at the cost of the type system being incorrect in modeling runtime, and not fully composing anymore due to the special casing.

just remove the = 1 if the intent is that this is an instance variable.

Daverball · March 1, 2024, 10:26am

Just add a ClassVar if the intent is that this is a class variable. We can both make that argument.

Ultimately this comes down to which pattern will cause more false positives/negatives down the road. I know it’s easier to forget to add a ClassVar than it is to remove an assignment, especially if the type checker yells at you, but the simple truth is that sometimes you want things to be both an instance and a class variable and there’s currently no way to express that, in fact the default is to treat it like that (at least in mypy), since mypy will not complain about X.a regardless of whether it’s marked with ClassVar or not or only appears inside __init__.

So if anything this is an argument for me to introduce an additional modifier InstanceVar which is only accessible on instances. Then you could make the rule that anything inside __slots__ and __init__ is an InstanceVar by default, unless it also appears inside its own class body without an explicit InstanceVar annotation or is a mixed attribute in a superclass.

mikeshardmind · March 1, 2024, 10:28am

Except that we don’t need to and shouldn’t as that’s just wrong. The assignment at a class scope causes it to be a class variable at runtime as shown above. The intent not being a class variable would be a programming error then.

Daverball · March 1, 2024, 10:31am

I disagree, the way the lack of a ClassVar modifier is interpreted currently (i.e. it’s either an instance or a class attribute) it’s just as correct, with ClassVar you’re being explicit about it actually being only a ClassVar and excluding the other possibility. I think this is much more in line with the gradual nature of Python’s type system and mirrors the fact that you’re allowed to override a class attribute with an instance attribute, except in that one case with __slots__.

mikeshardmind · March 1, 2024, 10:32am

This falls under the rules around class variables may be too strict (in your view, they shouldn’t disallow assigning to an instance) but doesn’t change that this should be a class variable.

mikeshardmind · March 1, 2024, 10:38am

Somewhere along the way, things went wrong.

Why would we rather people need to import and express something more strict than runtime instead fo allow it to be detected accurately inferred statically?

class X:
    a: int = 1  # detect as ClassVar due to assignment


class Y:
    def __init__(self):
        self.a: int = 2  # detect as instance var as normal, defined in `__init__`


class Z(Y, X): pass

Z.a  # class var
Z().a  # instance var

I’m 100% fine with the cases there for Z, it’s consistent with use in __init__ and at a class scope.

Edit: Maybe we do need an expressed ability to be able to do a: int and ClassVar[int] for stub use though/

Daverball · March 1, 2024, 10:42am

I’m not disagreeing with you that the current state is unsatisfying, it’s not expressive enough to accurately model the behavior of attributes^[1], but this needs to be fixed before we can change what the type checker should infer on its own.

this is even more true for classes implemented using the C-API, since they can have read-only attributes that aren’t properties ↩︎

mikeshardmind · March 1, 2024, 11:51am

Sorry, I think some of what I said could have been worded better throughout this discussion, and I’ve let my frustration with how there’s a web of interconnected issues here and my general frustration with how poorly things compose overall get in the way of just proposing a proper fix.

I think you’re right that this needs multiple changes to actually fix without causing people to have “Working” code no longer work (for the record, I think that class variable usage as shown in this thread is extremely fragile and that properly detecting the actual runtime behavior could help prevent fragile misuse; It’s been something I reject in code review when not used as an immutable constant, and usually prefer a class property or module scoped constant (depending on who the constant is for) for such constants due to the fragility)

Inferring class variables
Inferring instance variables
Having an ability to specify both class and instance variables
Having the ability to specify only a class or instance variable

Having inference do the right thing with all 4 of those, and lacking static evidence of it being 4 with 1, treat it as 3.

Daverball · March 1, 2024, 12:13pm

I would tend to agree with this as long as 1. and 2. can be overridden with 3. in subclasses and 3. being automatically inferred as well, i.e. you don’t need to add a qualifier in order to turn a 1. or 2. into a 3. in a subclass. With 4. being the only case where you always need to explicitly add a qualifier (outside of type stubs), although you could just make the decision to always infer 3. in type stubs unless a qualifier is used that would turn it into 4, then you remove the need to add an additional qualifier and 1/2/3 are strictly internal state that the type checker is tracking for you.

Automatically inferring 4 sounds like a mistake to me, since it would generate a lot of noise in a codebase that is transitioning from having no annotations to some annotations. You could make it an optional strictness flag however.

mikeshardmind · March 1, 2024, 12:16pm

I would only recommend automatically inferring 4 in the case of __slots__, and it being a case of 2+4, or any other similar case. I don’t think 1+4 can be inferred statically even if the behavior can be created with c-extensions, if it can, go ahead and infer it, but such inference should be limited to things where it is without question exclusively one.

Daverball · March 1, 2024, 12:20pm

I don’t think this is necessary, I would still just infer 2 in this case (or 3 if a parent class inferred it to 1 or 3) and special case a 2 that is inferred from __slots__ making any class level assignment in the same class body with the same name illegal.

Actually scratch that, mixing __slots__ specifically should probably be disallowed, since you don’t get the parent class attribute if you overwrite it in a subclass. So I guess I agree with you that __slots__ specifically should cause 2+4. But I wouldn’t do that for any other instance level attributes.

On second thought, I think inferring 2 is still fine, but it should be an error if an attribute that is mentioned in __slots__ ends up being inferred as 3 once the parent class and the rest of the class scope has been considered, since it can’t be 1 in that specific class. That way you can still make a subclass without slots (or different slots) that adds a class attribute that overrides the slot from the parent class, but the opposite case would generate an error (i.e. defining a class attribute in a parent or the same class)

mikeshardmind · March 1, 2024, 12:25pm

Hmm, Id go a slightly different route. Going back above to your thing about allowing subclasses to use it in a compatible manner and just model both the inference and the rules around the runtime and the need for consistency (LSP)

type checkers gain knowledge of how the language works, presence in slots prevents classvar use

If it’s currently only an instance var, but nothing precludes being a classvar, subclasses can add classvar cabaility.

If it’s a classvar, adding instance var capability is fine, but adding incompatible slots (removing classvar capability) is not (this isn’t overly strict, it will also error at runtime, so modeling the runtime statically is fine)

Inference applies to totality and consistency but isn’t binding as exclusive without expressing it as exclusive. When unclear which (ie. annotation, but no assignment at class scope), prefer instance variable intent, matching runtime outcome.