I’ve written a first draft of a new chapter for the typing spec that focuses on type checker behaviors for enumerations.
The Enum class has many atypical behaviors, and type checkers need to include special-case logic to handle these behaviors. Not surprisingly, there are a number of differences between type checkers in how they handle enums. This is an attempt to define and standardize the desired behaviors.
If you are interested in reviewing the draft, you can find it in this PR. Minor feedback (typos, wording suggestions, etc.) can be added directly to the PR. For significant questions or areas of potential disagreement, please post to this thread so community members get better visibility and can engage in the discussion.
This looks like a good first draft to me and deals with everything I could think of. I only have a minor comment:
In the section about Member/Non-Member attributes I wonder if type checkers should be encouraged to be slightly more strict to catch likely bugs at the definition site, rather than only potentially catch them once you try to use non-members as members. This would only be a “may”, like some of the other optional features in the spec, but I think it’s at least worth recommending.
Specifically, assigning a callable to a name inside an enum, rather than using an explicit method definition, seems like it would almost always be a bug: it should either have been wrapped in enum.member or enum.nonmember, or been explicitly ignored as a “yes, I know this won’t be an enum member” in earlier Python versions.
I’ve made this mistake in the past and I never used the enum in a way where my error became apparent at runtime or using a type checker until much later in development, so catching this kind of error early seems valuable to me.
Within a type stub, members can be defined using the actual runtime values,
or a placeholder of ... can be used::
Currently, in typeshed, many enums are defined like this:
class Foo(Enum):
    ABC: str
    DEF: str
And while we should probably change many instances to use explicit values, sometimes explicitly annotating the type is necessary. For example, suppose we can’t use the actual value for whatever reason:
class Foo(Enum):
    ABC = ...
    DEF = ...
The type of Foo.ABC.value and Foo.DEF.value is undefined, as it depends on the actual values used.
Currently, in typeshed, many enums are defined like this
Yes, I think that’s problematic because type checkers currently cannot differentiate between members and non-member attributes for these stub definitions; they’re ambiguous. I’m hoping that we can get those fixed in typeshed.
There are only 13 such enums in all of typeshed, and all are in third-party stubs. They should be easy to fix.
The type of Foo.ABC.value and Foo.DEF.value is undefined, as it depends on the actual values used.
The value of value in this case would be Any (which is how it is defined in the Enum class) unless the stub provides an explicit type annotation for _value_, like this:
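As a rough sketch of the pattern being described (not necessarily the exact snippet intended), a stub could annotate `_value_` and keep the `...` placeholders. The `Foo` enum and the choice of `str` are assumptions carried over from the earlier example.

```python
from enum import Enum


class Foo(Enum):
    # Explicit annotation for the value type; per the discussion, this is
    # what would let a type checker treat Foo.ABC.value as str, not Any.
    _value_: str

    # Placeholder values, as permitted in a stub.
    ABC = ...
    DEF = ...
```

With this form, ABC and DEF are unambiguously members, and the value type comes from the `_value_` annotation rather than from the placeholder.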
Typeshed itself is easy to change, but there is likely a lot of third-party code that also relied on this pattern and is currently accepted by type checkers.
I think your draft is still the right choice for the spec, as the alternative is ambiguous, but type checkers that currently support other patterns will have to go through a careful deprecation procedure.
Here’s a quick experiment that attempts to quantify the extent of the problem. I modified pyright to remove its special-case handling of stubs and ran the change through mypy_primer. Here are the results. This generated a number of compatibility issues, but it appears they are mostly related to typeshed stubs.
As a follow-on experiment, I temporarily modified pyright’s copy of typeshed so it conforms with the proposed spec. Here are the results. Most of the issues from the previous experiment are now gone. The remaining issues are because someone copied and vendored the inspect.pyi typeshed stub.
I’m not sure what to take away from that experiment other than that there will be some compatibility issues as you’ve predicted.
I find the “non-member attributes” phrase very confusing. On the one hand, enum classes are made of attributes, some of which are members, and the others are not; on the other hand, the attributes that are not members are typically attributes of the members:
Planet.SATURN.mass
or (from the new chapter)
class Pet(Enum):
    genus: str
    species: str
I would say that genus and species are actually member-attributes, as they won’t be available directly from Pet (at least from what we see in the snippet).
I don’t know if there’s a good solution to this confusion.
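A small sketch of the distinction at runtime may help. It assumes the attributes are populated in `__init__`, which the snippet above does not show; the tuple values are made up for illustration.

```python
from enum import Enum


class Pet(Enum):
    # Annotations only: these declare attributes that live on each member.
    genus: str
    species: str

    CAT = ("Felis", "catus")
    DOG = ("Canis", "familiaris")

    def __init__(self, genus: str, species: str) -> None:
        # Each member gets its own genus and species.
        self.genus = genus
        self.species = species


member_names = [m.name for m in Pet]
cat_genus = Pet.CAT.genus
# The annotated names are not reachable from the class itself.
class_has_genus = hasattr(Pet, "genus")
```

In this sketch, genus and species are not enum members, yet they are only reachable through a member, which is the source of the naming confusion.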
The draft currently mandates that this is an error:
WrongName = Enum('Color', 'RED GREEN BLUE') # Type checker error
But mypy currently allows this, with reasonable behavior.
I agree that this code is confusing and poorly written, but it works at runtime, and I don’t see a strong case for why mypy should be changed to emit an error here. Can we make this error optional?
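For reference, a minimal sketch of what the mismatched name does at runtime: the class takes its name from the string argument, while the variable is just a binding to it.

```python
from enum import Enum

# The variable name and the name passed to Enum() differ on purpose.
WrongName = Enum('Color', 'RED GREEN BLUE')

class_name = WrongName.__name__
member_names = [m.name for m in WrongName]
red_repr = repr(WrongName.RED)
```

So the code runs, but reprs and error messages will refer to Color rather than WrongName.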
Another case that may need some explicit language is enums of unusual types. Usually enum values are ints or strs, but they can of course be of any type. Sebastian brought up the ssl.Purpose enum as an example. Do I understand correctly that in a stub, it should be specified as:
Mypy does enforce this for other functional forms that create classes or objects: TypeVars, ParamSpecs, TypeVarTuples, NewTypes, and NamedTuples. I suspect the fact that mypy doesn’t enforce it in the Enum case is just an oversight. For consistency, I think this should be mandated in the spec, as it is in those other cases.
Or it should be removed from the spec in those other cases… But I would lean toward leaving it in.
I’m in agreement with Jelle here. I don’t think we should lightly start mandating that type checkers emit errors where they didn’t before and where the warnings have limited usefulness for the users of type checkers. I don’t see any particular reason to change the spec with regard to typevarlikes, NewTypes and NamedTuples – we’ve always had those rules, and the rules there promote cleaner code. But demanding that mypy start rejecting code it previously accepted feels unnecessarily disruptive here, and I’m not sure what practical benefit it would have.
I’m fine with not mandating such things, but I want to be consistent. Is there a general rule we can establish that explains (with solid justification) why name consistency is enforced in some cases but not others?
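from collections import namedtuple
from enum import Enum
from typing import NamedTuple, NewType, ParamSpec, TypedDict, TypeVar, TypeVarTuple

# Mypy Error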
# Mandated in typing spec:
# > The argument to TypeVar() must be a string equal to the variable name
# > to which it is assigned.
T_Bad = TypeVar("T")
# Mypy Error
# Not explicitly mandated in typing spec, but implied
P_Bad = ParamSpec("P")
# Mypy Error
# Not explicitly mandated in typing spec, but implied
Ts_Bad = TypeVarTuple("Ts")
# Mypy Error
# Not currently mandated in typing spec
NT_Bad = NewType("NT", int)
# Mypy Error
# Not currently mandated in typing spec
NTp_Bad = NamedTuple("NTp", [("a", int)])
# Mypy Error
# Not currently mandated in typing spec
NTp2_Bad = namedtuple("NTp2", "a b c")
# No Mypy Error
# Not currently mandated in typing spec
E_Bad = Enum("E", "a b c")
# Mypy Error
# Not currently mandated in typing spec
TD_Bad = TypedDict("TD", {"a": int})
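As a side note, here is a short sketch of what a few of these forms do at runtime when the names disagree. It covers only a subset of the cases above; in each one the object's `__name__` comes from the string argument, not from the variable it is bound to.

```python
from collections import namedtuple
from enum import Enum
from typing import NewType, TypeVar

# None of these raise at runtime, despite the mismatched names.
T_Bad = TypeVar("T")
NT_Bad = NewType("NT", int)
NTp2_Bad = namedtuple("NTp2", "a b c")
E_Bad = Enum("E", "a b c")

# Collect the names the runtime objects actually carry.
runtime_names = [obj.__name__ for obj in (T_Bad, NT_Bad, NTp2_Bad, E_Bad)]
```

The runtime is equally permissive across these forms, so any difference in enforcement is a type checker policy choice.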
One clear pattern is that all the types you showed for which mypy produces an error are part of the typing stdlib module, whereas Enum is not, so legacy code might use a different name, and mandating this would produce errors for code that was previously perfectly fine. However, collections.namedtuple also produces an error, so that isn’t a hard rule. (And I can’t actually think of any other cases in the stdlib where this would be relevant.)
IMO, allowing the errors but not mandating them for non-TypeVar cases is a good idea.