Differentiating between initialized and uninitialized assignments on class toplevel

In Python, attributes that are declared at class toplevel but are uninitialized there cannot be directly accessed from the class itself at runtime. For instance,

class Foo:
  x: int

Foo.x  # Runtime crash! type object 'Foo' has no attribute 'x'

This makes sense because at runtime, the bare x: int “declaration” on Foo toplevel is just an annotated assignment statement without a RHS, which semantically is a no-op. Nothing actually gets added to the class Foo. So attribute-accessing x from class Foo directly is almost always a runtime crash.

But if we try to assign an initial value to the attribute at class toplevel, x will be there:

class Bar:
  x: int = 42

Bar.x  # OK!
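
To make the runtime difference concrete, here is a minimal sketch that inspects where each form ends up. The class names Declared and Initialized are made up for illustration. A bare annotation is only recorded in __annotations__, while an annotated assignment also creates a class attribute.

```python
class Declared:
    x: int          # annotation only; nothing is bound on the class


class Initialized:
    x: int = 42     # annotation plus a binding in the class namespace


# Both forms record the annotation...
print("x" in Declared.__annotations__)     # True
print("x" in Initialized.__annotations__)  # True

# ...but only the initialized form creates an actual class attribute.
print("x" in vars(Declared))      # False
print("x" in vars(Initialized))   # True
print(hasattr(Declared, "x"))     # False
```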

Currently, neither mypy nor pyright checks for this kind of issue. But in pyrefly (and in ty, according to Alex), we are trying to emit type errors when we detect that the user accesses an uninitialized attribute directly on the class. Our experience has been that this check rarely misfires for classes defined in user-written .py source files, but it misfires every now and then for classes defined in stub files, since stub files don’t pay much attention to differentiating between the initialized and uninitialized cases. In particular, typeshed currently follows this convention, according to Sebastian:

So far in typeshed, we’ve treated x: int and x: int = ... not only as identical, but the latter form even as outdated.

As an example, builtins.object is declared to have 4 uninitialized class-level attributes, but at runtime all 4 of these attributes are always initialized on the object class. This means that every time someone accesses __doc__, __dict__, etc. on any class, Pyrefly is going to complain that these attributes are not initialized.
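
The runtime side of this claim can be checked directly. The sketch below only looks at __doc__ and __dict__, the two attributes named above, and makes no assumption about which other attributes the stub declares.

```python
# __doc__ and __dict__ are the two attributes named in the post above.
present = {name: hasattr(object, name) for name in ("__doc__", "__dict__")}
print(present)  # {'__doc__': True, '__dict__': True}


# The same attributes are therefore reachable through any user-defined class.
class C:
    pass


print(C.__doc__)  # None
```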

This typeshed convention considerably limits the usability of Pyrefly’s uninitialized-class-attribute check at the moment, so I’m inclined to change it. But before I make an attempt, I’d also like to see what folks think about this idea: is differentiating between initialized and uninitialized class-level attributes considered beneficial enough? What’s the chance of elevating the behavior/convention to the typing spec?

The reason pyright doesn’t enforce this (at least by default) is that x is a public member and can be assigned a value along any code path, including those that are not being analyzed.

# This is perfectly valid
Foo.x = 0

It’s also valid to delete the value along any code path.

del Foo.x

For these reasons, I was never comfortable flagging it as an error when accessing such an attribute. I’m guessing that the reasoning behind mypy’s behavior is the same. It’s interesting to hear that in your experience this rarely results in a false positive.
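
A small runnable sketch of that concern: whether Foo.x exists depends on which statements have executed so far, not on the class body alone. The helper class_has_x is a name invented for this illustration.

```python
class Foo:
    x: int  # declared but not bound in the class body


def class_has_x() -> bool:
    return hasattr(Foo, "x")


states = [class_has_x()]      # before any assignment
Foo.x = 0                     # some other code path binds it
states.append(class_has_x())
del Foo.x                     # ...and another code path removes it again
states.append(class_has_x())

print(states)  # [False, True, False]
```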

Pyright does offer a configuration setting reportUninitializedInstanceVariable. It’s opt-in (not enabled by default even in strict mode) because it demands a particular coding practice that some Python developers find too restrictive. It requires that all instance variables be initialized within the class body and/or the __init__ method. I think that’s equivalent to the check you’re proposing in pyrefly.

I don’t have a problem with standardizing x: int = ... in stubs to indicate that the attribute is expected to have a value whereas x: int indicates that it may not. This won’t be 100% dependable because, as I noted above, such a value can always be deleted after the fact using a del statement or call to __delattr__, but it will be correct most of the time.

A similar topic that could have a bearing on this one is how type checkers differentiate between “pure class variables”, “pure instance variables”, and “class + instance variables” (what pyright’s docs call “regular instance variables”). Mypy does not differentiate between “pure instance variables” and “class + instance variables”.

With pyright’s approach, a variable that is considered a “pure instance variable” cannot be accessed at all on the class object regardless of whether it has been initialized.

class Foo:
    def __init__(self) -> None:
        self.x: int = 0

# The following line will result in a runtime error
Foo.x  # Pyright error: cannot access x, mypy emits no error here

I mention this because there’s currently no way in a stub file to differentiate between “pure instance variables” and “class + instance variables”. Differentiating between these two may provide more benefit than differentiating between a “class + instance variable” that has an initial value and one that does not.
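
For readers less familiar with the three categories, here is a rough runtime illustration. The Widget class and its attribute names are hypothetical, and the category labels in the comments describe the intent a type checker would infer, not anything Python enforces at runtime.

```python
from typing import ClassVar


class Widget:
    count: ClassVar[int] = 0   # intended as a "pure class variable"
    color: str = "red"         # "class + instance variable"

    def __init__(self) -> None:
        self.size: int = 1     # "pure instance variable"


w = Widget()
w.color = "blue"  # shadows the class-level fallback on this instance only

print(Widget.color, w.color)    # red blue
print(hasattr(Widget, "size"))  # False
print(w.size)                   # 1
print("count" in vars(w))       # False
```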

Quick nitpick: object.__annotations__ isn’t a thing (or at least not on py313):

>>> object.__annotations__
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    object.__annotations__
AttributeError: type object 'object' has no attribute '__annotations__'

But in general I completely agree. It’s what I’ve been trying to do in scipy-stubs and in the stubs for NumPy. But because I’m not aware of any tools that are able to validate this, there’s a good chance that I didn’t catch all of them.

And for what it’s worth, “dotting the defaults” (in stubs) feels pretty natural to me. It’s what’s commonly done for function parameters with defaults [1]. So I think it makes sense to also enforce this for class-initialized attributes and module-level constants.


Also, I’m kinda curious why x: int is preferred over x: int = ... in typeshed.


  1. Note that it’s usually better (for introspection/documentation purposes) to use the actual default values rather than ... placeholders.

But it will be in 3.14, which is why it’s in the main branch of typeshed. I don’t know what I was thinking about there…

No, it won’t be. 3.14 doesn’t add an __annotations__ attribute to anything that didn’t have one before.

I don’t recall why object.__annotations__ is in typeshed; it probably shouldn’t be. On a quick search I couldn’t find a discussion of why we put it there.

Yeah, I was just looking at the git blame for that line and it’s 4 years old. I think I confused myself with PEP 749 and annotationlib.

I support getting better at differentiating between different kinds of instance and class variables, and your proposal seems like a step in the right direction.

Technically true, but code that deletes public attributes from objects must be quite rare and difficult to follow. I think it’s generally more user-friendly to disregard this possibility.

Because it’s shorter, and so far we assumed that it meant the same thing.

In a dataclass, annotated class variables declare instance variables (and it always initializes them, IIUC). For those, the class variables shouldn’t be referenced at all.
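
A quick runtime sketch of the dataclass case, using a made-up Point class. One nuance worth noting: with the standard library’s dataclass decorator, a simple default stays readable on the class at runtime, while fields with no default or with a default_factory do not exist on the class at all.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    x: int                                         # no default
    y: int = 0                                     # simple default
    tags: list[str] = field(default_factory=list)  # factory default


p = Point(1)
print(p.x, p.y, p.tags)        # 1 0 []

print(hasattr(Point, "x"))     # False
print(hasattr(Point, "y"))     # True
print(hasattr(Point, "tags"))  # False
```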

I don’t think this is equivalent. The check discussed here is to error on Foo.x (class access) when the body of Foo has x: int (but not when it has x: int = 1). This is not the same as requiring initialization of instance attributes; it has no bearing on anything that happens inside __init__ or any other method. Rather, it’s about distinguishing pure instance attributes from class+instance attributes, and considering x: int without binding in a class body to describe a pure instance attribute. So it’s closely related to this:

Ty (and it sounds like pyrefly also) agree with pyright here, but would like to extend this also to consider x here to be a “pure instance variable”, because it has no binding on the class:

class Foo:
    x: int

And the proposal in this thread is precisely to provide a way that we can differentiate a “pure instance variable” from a “class & instance variable” in a stub file: x: int vs x: int = ....

I think this is true, although given how rare it is to add or remove attributes on a class object (and how little people generally expect from a type checker when this happens), we can at least be fairly confident that there is an issue in the simple case where the unassigned attribute is never set or deleted on the class directly. In other cases, it’s reasonable for the type checker to say “I don’t know” and choose to either surface or suppress the issue depending on how strict it wants to be.

Yeah, sorry, I wasn’t clear about what my proposal was earlier. As Carl mentioned, the goal is not to reason about whether attributes themselves may be uninitialized or not. I agree that’s a more involved task that requires more user-visible restrictions. Instead, I was aiming at something much less ambitious: using Pyright’s terminology, to make it possible to differentiate “pure instance variable” vs. “class+instance variable” in stub files. With the proposal’s scope reduced, I’d be interested to hear your thoughts on the idea.

I welcome a standardized way to differentiate between a “pure instance variable” and a “class+instance variable” (in stubs and elsewhere), but the proposed mechanism is still not entirely clear to me.

Leaving aside stubs for a moment, let’s consider the following sample.

class A:
    x: int = 0
    y: int

    def __init__(self) -> None:
        self.z: int = 0

    @classmethod
    def init_me(cls) -> None:
        cls.y = 0

Which of the following subsequent statements would you expect to be flagged as a type error?

A.x = 1
A.y = 1
A.z = 1

Mypy is fine with all three of these statements. Pyright flags the third one as an error because it considers z a “pure instance variable” — one that cannot be accessed directly on the class object. Pyright allows the first two statements because it considers x and y to be “class+instance variables”.

If I understand your proposal correctly, the statement y: int within the class body would imply that y is a “pure instance variable”. That would mean A.y = 1 would be an error — as would cls.y = 0. I’m not convinced that’s the right behavior. It would definitely be a breaking change for pyright users.
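
For reference, this is what the sample class does at runtime, independent of what any type checker reports. The sketch copies the class from above and records which names are reachable on the class object before and after init_me() runs.

```python
class A:
    x: int = 0
    y: int

    def __init__(self) -> None:
        self.z: int = 0

    @classmethod
    def init_me(cls) -> None:
        cls.y = 0


before = {name: hasattr(A, name) for name in "xyz"}
A.init_me()
after = {name: hasattr(A, name) for name in "xyz"}

print(before)  # {'x': True, 'y': False, 'z': False}
print(after)   # {'x': True, 'y': True, 'z': False}
```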

Perhaps you’re proposing that these meanings apply only to classes defined in “.pyi” files but not in “.py” files? I don’t think we should special-case stubs in this manner. I think we can agree that ascribing different meaning in stubs versus non-stubs is confusing for users. I’d prefer to focus on solutions that work consistently regardless of where a class is defined.

Here’s another edge case that needs to be considered. What if we split the declaration and the initial assignment into separate lines? By your proposal, would x and y in the sample below be a “pure instance variable” or a “class+instance variable”?

class A:
    x: int
    x = 0

    y: int
    if get_condition():
        y = 0

print(A.x)  # Allowed?
print(A.y)  # Allowed?

I would prefer to see an error on the second and third. I would also be fine with not seeing an error on the second, but only because of the presence of the classmethod assignment to cls.y (that is, if that classmethod didn’t exist, I would want to see an error on A.y.)

What I’ve observed is that in almost all real-world cases, x: int in a class body (with no value assigned to x in the class body) is intended to describe an instance attribute, and A.x at runtime will in fact be an attribute error, so failing to flag it is very likely a false negative. Ty already implements this and it hasn’t come up as a significant source of false positives.

If there isn’t a value available yet in the class body, and it’s intended to be set in a classmethod (or __init_subclass__), annotating it as x: ClassVar[int] should cover many of those cases. If it really is supposed to be a class-and-instance attribute, with delayed setting of the class fallback (this doesn’t seem common) a default value can probably be set in the class body.
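
As a sketch of the ClassVar suggestion, assuming a hypothetical Plugin base class whose subclasses each get a value from __init_subclass__: the annotation declares the attribute as class-level even though the base class itself never binds it. How individual type checkers treat access on the base class is not something this example demonstrates.

```python
from typing import ClassVar


class Plugin:
    # Declared for type checkers; the value is supplied per subclass below.
    registry_name: ClassVar[str]

    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        cls.registry_name = cls.__name__.lower()


class CsvExporter(Plugin):
    pass


print(CsvExporter.registry_name)         # csvexporter
print(hasattr(Plugin, "registry_name"))  # False
```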

That said, I also wouldn’t be opposed to adding a scan of assignments in class methods, and considering that also sufficient to “declare” an attribute as accessible on the class.

I think the presence of any binding in the class body should be sufficient to make it clear that the name is accessible on the class. This is what we implement in ty.
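
The “any binding in the class body” rule can be approximated with a short ast-based scan. This is only an illustrative sketch, not how ty or any other checker is implemented: the classify function is a made-up name, it looks inside if statements only, and it ignores loops, with/try blocks, imports, and nested definitions. The sample source is the split-declaration example from earlier in the thread, with a declaration-only name z added for contrast.

```python
import ast

# Second sample from the thread, plus a hypothetical declaration-only name z.
SOURCE = """
class A:
    x: int
    x = 0

    y: int
    if get_condition():
        y = 0

    z: int
"""


def classify(source: str) -> dict[str, str]:
    """Label each annotated name in the first class body found in source."""
    class_def = next(
        node for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ClassDef)
    )
    declared: set[str] = set()
    bound: set[str] = set()

    def visit(statements: list[ast.stmt]) -> None:
        for stmt in statements:
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                declared.add(stmt.target.id)
                if stmt.value is not None:      # x: int = 0
                    bound.add(stmt.target.id)
            elif isinstance(stmt, ast.Assign):  # x = 0
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        bound.add(target.id)
            elif isinstance(stmt, ast.If):      # look inside both branches
                visit(stmt.body)
                visit(stmt.orelse)

    visit(class_def.body)
    return {
        name: "bound" if name in bound else "declared-only"
        for name in sorted(declared)
    }


print(classify(SOURCE))
# {'x': 'bound', 'y': 'bound', 'z': 'declared-only'}
```

The source is only parsed, never executed, so the undefined get_condition() call is harmless here.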

I agree that this rule should be consistent between stubs and non-stubs. But I do think it’s an advantage of this rule that it gives us a way to clarify instance vs class-and-instance, in a stub.

Thanks for the feedback! Those are indeed some great examples.

I am pretty much aligned with what Carl said. Basically,

  • In your first example, I think we can agree on what should happen for x and z. As for y, I’m also OK with leaving the decision to the type checkers. Pyre/Pyrefly generally dislikes code whose correctness depends on method invocation ordering (e.g. init_me() must be invoked before any other method that reads from A.y), so it would probably want to flag the issue there. But I can also see that this can be overly restrictive at times (especially for small projects), and there are plenty of reasons not to error.
  • In your second example, x would definitely be a class+instance variable. y may or may not be a pure instance variable, and again, whether to count it as pure instance or class+instance would be up to the type checkers and/or their configurations to decide. Realistically, I’d lean towards treating it as a class+instance variable for simplicity, and only try to be more conservative in some form of crazy strict mode.
  • Agree that it would be nice if stub and non-stub behaviors can be aligned on this.

That’s promising. If that’s the case, then it’s probably best if we formalize this in the spec and standardize the behavior across type checkers. @grievejia, perhaps you’d be willing to write a draft update to the spec?

You may recall that last year I proposed a number of typing spec priorities. You’ll see that “Instance and class variables” is on that list, and this is roughly what I had in mind when I wrote that slide. We may need to add a new “Methods and Attributes” chapter to the spec.

I think we could also benefit from standardizing the terminology here. I struggled to come up with good terminology in the pyright documentation and don’t really like what I came up with (“pure instance variables”, etc.). I’m hoping someone else can suggest something better.

There might be a reason I don’t see here, but I don’t think we actually need a short term for these concepts. Even in user documentation, the explanation of the difference can be introduced as an ordered list, and then examples can either refer to each by number to avoid significant repetition, or phrase the example around the distinction.

This makes it “attribute accessible only from the class”, “attribute accessible only from an instance of the class”, and “attribute accessible from either the class or an instance of it”. In places where the distinction between the three cases doesn’t matter, it can simply be omitted, and we have “attribute” again.

This framing will also help any attempt to improve generalized descriptor support in the future, as descriptor behavior can be implemented in such a way that these distinctions matter.
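
To illustrate the descriptor point, here is a minimal sketch of a descriptor that makes an attribute reachable only through instances, even though it has a binding in the class body. InstanceOnly is a hypothetical class written for this example and the returned value is arbitrary.

```python
class InstanceOnly:
    """Hypothetical descriptor that refuses access through the class."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:  # access was Foo.x rather than Foo().x
            raise AttributeError(f"{self.name!r} is only available on instances")
        return 42


class Foo:
    x = InstanceOnly()


print(Foo().x)            # 42
print(hasattr(Foo, "x"))  # False
```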

I’d be happy to take on the spec work. I may need to start after the PyCon trip though, so it probably won’t be ready for a few weeks.

It’s great to see that this topic is covered by your priority list, and I’m glad we can make further progress on it!
