Draft of typing spec chapter for enums

erictraut · January 22, 2024, 8:27pm

One other difference is that calls to TypeVar, ParamSpec and TypeVarTuple produce instances of those classes, but the other calls on this list produce class objects. They are effectively class factory calls.

Based on the feedback in this thread, I’d like to propose that the typing spec be amended as follows:

For TypeVar, ParamSpec, and TypeVarTuple, naming consistency is (remains) mandated. The typing spec already says this for TypeVar, but it should be made clear that this also applies to the other two forms.
For all of the other class factories on the list above, the typing spec should not mandate naming consistency but should state that type checkers MAY do so.

tmk · January 23, 2024, 1:09pm

I think a mismatch in NewType should also be an error. The whole point of NewType is its name, so a mismatch feels very strange here.

Explicit TypeAliasType also feels like a case where a mismatch should be an error.

mikeshardmind · January 23, 2024, 4:37pm

I don’t think the name should matter or be enforced by specification in most of these cases. (it matters for typevar, typevartuple, and paramspec for other reasons)

X = NewType("X", int)
Y = X
del X

There’s nothing wrong with this, in fact it can happen somewhat more naturally if a NewType is an exported symbol and imported as another name. The name passed there is important for constructing a sensible repr, but enforcing naming seems to be more appropriate as a “may” or for a linter.
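
As a small illustration of the point about the name only feeding the runtime metadata, the snippet above can be extended like this (the variable names `user_id` and `runtime_name` are hypothetical):

```python
from typing import NewType

X = NewType("X", int)
Y = X      # rebind under a different name, e.g. as an import alias would
del X

# The NewType still works through its new binding; at runtime it simply
# returns its argument unchanged.
user_id = Y(42)

# The string passed to NewType is what the object reports as its name,
# regardless of which variable currently refers to it.
runtime_name = Y.__name__
```

Here `runtime_name` is still "X" even though the only remaining binding is `Y`.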

rchen152 · January 24, 2024, 12:03am

I’m in favor of @erictraut’s latest proposal for when the spec should mandate name consistency (so, not for enums). Other than that, the draft looks pretty good to me. Having a clear way to distinguish between members and non-members is especially nice.

AlexWaygood · February 4, 2024, 7:00pm

I have some concerns about the draft chapter as it relates to stub files.

The draft chapter says:

All enum member objects have an attribute _value_ that contains the member’s
value. They also have a property value that returns the same value. Type
checkers may infer the type of a member’s value::
class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

reveal_type(Color.RED._value_)  # Revealed type is Literal[1] (or int or object or Any)
reveal_type(Color.RED.value)  # Revealed type is Literal[1] (or int or object or Any)

With regards to stub files, the draft chapter states:

If the literal values for enum members are not supplied, as they sometimes
are not within a type stub file, a type checker can use the type of the
_value_ attribute::
class ColumnType(Enum):
    _value_: int
    DORIC = ...
    IONIC = ...
    CORINTHIAN = ...
  
reveal_type(ColumnType.DORIC.value)  # Revealed type is int (or object or Any)

This doesn’t appear to account for the fact that it’s perfectly legal at runtime for enums to be heterogeneous in the types of their member values. While it’s less common to have heterogeneous enums than homogeneous enums, this isn’t a hypothetical concern: here’s an enum in the stdlib uuid module where different members have values of different types:

@_simple_enum(Enum)
class SafeUUID:
    safe = 0
    unsafe = -1
    unknown = None

For cases where enums have simple member values, like this one, we can simply include the values directly in a stub file, so type checkers should be able to figure out that SafeUUID.unsafe.value is of type Literal[-1] and that SafeUUID.unknown.value is of type None. But it’s not always possible to include enum values in stub files: in some cases, the value of the enum may be constructed using a more complex expression, and complex expressions have traditionally been banned in stub files (with a few very small exceptions), on the grounds that stub files are essentially declarative “data files” for the type checker.
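
The heterogeneity is easy to confirm at runtime. A short check, using only the public `uuid.SafeUUID` enum (the variable name `value_types` is just for illustration):

```python
from uuid import SafeUUID

# Map each member name to the runtime type of its value.
value_types = {member.name: type(member.value).__name__ for member in SafeUUID}
```

This yields `{"safe": "int", "unsafe": "int", "unknown": "NoneType"}`, so no single non-union type describes every member's value.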

You might argue that creating heterogeneous enums like SafeUUID is an antipattern, and that enums should generally have values with homogeneous types. I would tend to agree with you. However, I think it’s important that we should still have a way to express heterogeneous enums in stub files if that’s what the runtime is doing. The most important thing in a stub is for us to be accurate w.r.t. the runtime, even if the runtime is making use of an antipattern that we wouldn’t necessarily endorse when we were writing our own code.

I would also prefer it if the spec explicitly banned using = ... for enum members in a stub file if the enum in the stub file does not include an explicit _value_ annotation. In the absence of a _value_ annotation, it will be impossible for a type checker to infer the type of a member declared using = ... in a stub file. I would prefer for these cases to be explicitly rejected by the type checker, rather than the type checker inferring a type of Any or Unknown for the type of the value of that member.

erictraut · February 4, 2024, 8:27pm

The case you’re talking about involves the intersection of the following:

The enum is part of a library’s public interface described by a type stub
The enum uses heterogeneous value types
Some of the value types are complex (not simple literals)
These value types are meaningful for type checking purposes (as opposed to, for example, a plain object that is used to define a sentinel enum member)

My sense is that the combination of these circumstances is extremely rare. Based on my recent pass through the typeshed stubs, I can say with a high degree of certainty that there are no such examples in typeshed.

Are you aware of any real cases in other stubs you’ve run across? I haven’t ever seen any such case. That doesn’t mean an example doesn’t exist, but my sense is that it’s not something we should design for.

Also of note is the fact that most enums in typeshed (and other stubs in the wild) today have values defined as = ... which means they are evaluated as Any. This proposal gives us some tools to improve this situation in cases where the value of an enum member is meaningful for type checking purposes.

I’ll also point out that if one of these rare edge cases were identified, there are a couple of reasonable fallbacks permitted by the proposed mechanism:

The _value_ type can be annotated as a union (or a common supertype) of the heterogeneous types.
Complex member values can fall back on = ... and have a value type of Any. (Note that the type of the enum member itself is still defined, since it’s a Literal instance of the class; it’s only its value property that would have a type of Any in this case.)
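
A sketch of what the first fallback might look like, meant to be read as stub-file content. The enum `Status` and its members are hypothetical, and the inferred type mentioned in the comment is what the draft permits a type checker to report, not a guarantee about any particular tool:

```python
from enum import Enum
from typing import Optional

class Status(Enum):
    # Union covering every member's value type (hypothetical example).
    _value_: Optional[int]

    ACTIVE = ...
    INACTIVE = ...
    UNKNOWN = ...

# Under the draft, a type checker may infer Status.ACTIVE.value as
# Optional[int]; the annotation alone does not create a member.
```

The second fallback is the same shape with the `_value_` annotation omitted, in which case the value type of each `= ...` member is Any.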

Quoting AlexWaygood: “I would also prefer it if the spec explicitly banned using = ... for enum members in a stub file if the enum in the stub file does not include an explicit _value_ annotation.”

I’m opposed to this because it’s not in the spirit of gradual typing. I think it’s fine for the maintainers of typeshed to impose this rule if you want, but that’s more of a code conformance rule that would best be handled by a linter or custom tooling like stubtest.

As I mentioned above, most type stubs today use = ... for most of their enum values, and this works fine for most enum usage. For many enums, the value of the enum members is not meaningful and is never accessed by code that uses the enum. It doesn’t make sense in these cases to force stub authors to provide types for these values.

AlexWaygood · February 4, 2024, 10:56pm

Are you sure? I don’t think typeshed currently has any enums where the values are defined as = .... Until python/mypy pull request #16807 (“Add support for setting enum members to "..."”, by JelleZijlstra), a recent mypy PR that was made in response to CI errors on your typeshed PR (python/typeshed pull request #11299, “Changed enums to conform with proposed change to typing spec discusse…”), stubtest, a tool typeshed uses in CI, would emit an error if any stubs defined enum members using = ... when they were not literally set to ... at runtime. I believe your proposed spec represents quite a big change in the way typeshed has done things up till now (which, to be clear, is fine if it leads to improvements for our users).

erictraut · February 4, 2024, 11:14pm

I stand corrected on that point. Most of the enums in typeshed currently do not provide a literal type for enum member values, but they do provide the (non-literal) value type like int or str in most cases.

layday · April 15, 2024, 11:19am

The spec says:

Methods, callables, and descriptors (including properties) that are defined
in the class are not treated as enum members by the EnumType metaclass
and should likewise not be treated as enum members by a type checker […]

Nested classes should be added to this list for Python 3.13 and up, where making a nested class a member will require the use of the member decorator. Without it, nested classes have been emitting a deprecation warning since 3.11. This change only appears in the long changelog for 3.11 (without a GH link) and is easy to miss. The corresponding PR is python/cpython pull request #92366, “gh-78157: [Enum] nested classes will not be members in 3.13”, by ethanfurman.

Viicos · May 14, 2024, 5:18pm

I’m wondering if the spec should clarify how enum members are instantiated, especially with respect to the __new__ and __init__ methods. While the current draft covers having a custom __init__, __new__ is special-cased by the enum module. When EnumMeta.__new__ runs (to create a new enum class):

_find_new_ is called and responsible for finding a custom __new__ method, with a fallback to object.__new__.
If a custom one is defined, each enum member will be created by calling __new__(enum_class, *args), with args being a tuple of the defined member value [1]. Otherwise, because object.__new__ is used and accepts no arguments, __new__(enum_class) will be called.
After the explicit call to __new__, __init__ is unconditionally called with the args unpacked.
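
The steps above can be observed with a small experiment. This is only a sketch: the enum `Planet`, its fields, and the `calls` log are hypothetical names introduced for illustration.

```python
from enum import Enum

calls = []  # records which hooks ran, and with what arguments

class Planet(Enum):
    def __new__(cls, mass, radius):
        calls.append(("__new__", (mass, radius)))
        obj = object.__new__(cls)
        obj._value_ = mass  # choose the member's value explicitly
        return obj

    def __init__(self, mass, radius):
        calls.append(("__init__", (mass, radius)))
        self.radius = radius

    EARTH = (5, 6)

# Looking a member up by value returns the existing member and does not
# run the custom __new__ or __init__ again.
looked_up = Planet(5)
```

After the class body runs, `calls` holds one `__new__` entry followed by one `__init__` entry, both with the unpacked tuple `(5, 6)`, and the later lookup adds nothing to it.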

The most common use case I can think of where a custom __new__ is defined is when a different base class is used:

from enum import Enum

class StrEnum(str, Enum):
    A = "a"  # ok
    B = b"b", "utf-8"  # ok, see the typeshed definition of `str.__new__` (a)
    C = "too", "many", "arguments", "provided"  # runtime error: TypeError: str() takes at most 3 arguments (4 given)

(a): str.__new__ on typeshed.

Currently, both pyright and mypy are happy with the above code.
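
For reference, a runnable cut-down version of the first two members shows what the runtime does with the tuple form. The class name `Text` is hypothetical, and the member with too many arguments is left out so that the class can be created:

```python
from enum import Enum

class Text(str, Enum):
    A = "a"
    B = b"b", "utf-8"  # forwarded as str.__new__(cls, b"b", "utf-8")

decoded = Text.B.value
```

Here `decoded` is the string "b", because the tuple is unpacked into the arguments of `str.__new__`.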

Funnily enough, pyright will raise an error if we define a custom MyStr class instead (mypy still doesn’t).

It is also possible to define the __new__ method directly on the Enum class, in which case it will be used as well (although this is really not recommended, as it will overwrite Enum.__new__, which is used to instantiate enum members when doing a lookup like Color(3)):

from enum import Enum
from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

class E(Enum):
    def __new__(cls, arg: int) -> Self: ...

    A = "a", "b"  # Type error, too many arguments.

Currently supported by pyright, not by mypy.

I guess the behavior can be specified more easily since the chapter about constructors was recently added. Well, in fact, maybe not, especially because a custom __init__ is used to instantiate new members when the enum class is defined (as said above) but will not be used when calling MyEnum(...). This makes pyright a bit confused.

[1] That is, if you define your enum member(s) as MEMBER_NAME = 1, "a", then args=(1, "a").

erictraut · June 3, 2024, 5:57am

The TC has signed off on this addition, and it has been merged into the official typing spec.

Thanks to everyone who helped review the chapter and suggested improvements.