PEP 747: TypeExpr: Type Hint for a Type Expression

I’ve drafted a PR that:

  • Rephrases the rules for the | operator to just say that situations that previously resulted in a UnionType static type should now result in a TypeExpr[X | Y] static type, rather than attempting to directly specify the existing complex (and undocumented?) rules for the | operator.
  • Removes the ability to assign a TypeExpr[X | Y] to a UnionType, which implies a backward compatibility break for the rare functions that use UnionType in their signature. Migration guidance is added to §“Backward Compatibility”.

Feedback is welcome.

After the above items are ironed out, I’ll move on to altering the name of the concept that PEP 747 introduces, which I expect will be a monster of a diff.


I’ve drafted a replacement PR that takes a different approach:

  • No longer attempts to alter the rules for the | operator. Thus users must use the explicit TypeExpr(...) syntax in code using |, like the following:

    if isassignable(x, TypeExpr(int | str)): ...  # TypeExpr(...) required
    

Feedback is welcome.

Update on the overall status of PEP 747: Remaining tasks I see:

  • Address remaining major points of feedback. Achieve rough consensus:
    • A1. ✅ T1 | T2: Decide how/whether to recognize type expressions that use | in a value expression context
    • A2. ▶️ Finalize the name of the concept that the PEP introduces.
  • Draft a reference implementation for at least 2 static type checkers. Discover/resolve any specification issues.
    • B1. mypy - Probably drafted by me
    • B2. (?) pyright
  • C. Present PEP for formal review by Typing/Steering Councils

A new draft of the TypeForm (previously: TypeExpr) PEP is ready for review in this PR! This latest draft is brought to you by @erictraut, a new co-author for the PEP.

Major changes include:

  • Refining and clarifying the terminology and definitions
    • ex: TypeExpr → TypeForm (biggest rename)
  • Eliminating the detailed implicit evaluation rules for every kind of type expression.
    • Instead, anything that conforms to the typing spec’s definition of a “type expression” should be accepted when assigned to a variable or passed to a function parameter that is explicitly annotated as TypeForm.

Please leave general comments in this thread and more-specific comments in the PR itself.


The next version of pyright (1.1.381), likely released Tues 9/17, will include provisional support for TypeForm as described in this draft. The new functionality is off by default, but it can be enabled by setting “enableExperimentalFeatures” to true in the pyright configuration.

I’m planning to work on mypy support for TypeForm over the next two weeks.


Hello all! First-time commenter on a PEP here. Apologies if I’m missing context in my suggestion here. Please take this comment as eager customer feedback.

I’ve been playing around with the new TypeForm and found myself very surprised by this:

s: str = "5"
t: type[str] = type(s)  # ok
tf1: TypeForm[str] = str  # ok
tf2: TypeForm[str] = type(s)  # error
tf3: TypeForm[str] = t  # error

I think the spec as currently written does not agree with this behavior. In particular, the spec says:

When a type expression is evaluated at runtime, the resulting value is a type form object.

str is, by this definition, a “type form object” because it is the result of evaluating a type expression at run time. But the result of evaluating type(s) at run time is also str, and since it does not say anywhere in the spec that a “type form object” is defined by its provenance, it must be that the result of type(s) is also a “type form object.” Similarly, the definition of TypeForm itself is

TypeForm is a special form that, when used in a type expression, describes a set of type form objects.

Since type(s) produces a type form object, it is in the same set of type form objects as str and therefore should belong to TypeForm[str].

There is precedent for making provenance significant in type hints: a LiteralString is the same thing as a str at runtime, but is defined specifically by its provenance.

I have two comments:

  1. I think the spec needs to be clarified here. Probably the easiest way to do this is to provide at least one example of the behavior I have above and compare it to LiteralString. At the risk of opening a further can of worms, I would also suggest that LiteralType is perhaps a better name than TypeForm. It’s clearer that the object in question is a “type” (not a syntactic construct), and it clarifies that static provenance, and not just runtime value, is part of the definition of the type.
  2. I would suggest that the definition of “type form object” in this spec as it is (that is, defined only by the set of runtime values that can result from evaluating type expressions) is actually more useful. What do we lose by making type, UnionType, etc. assignable to TypeForm? Unless I’m missing some important details (and I probably am!), any Python program that will run on a TypeForm under the narrower literal definition will also run on the expanded definition, since at runtime, type(s) and str are indistinguishable.

As a concrete example of (2), for some time we have had in our codebase an alias AnyType = type | UnionType | NewType | GenericAlias | TypeVar | ... to stand in for TypeForm. It’s hard to make this list authoritative because there are a number of runtime objects like typing._LiteralGenericAlias that are private to the standard library and that type checkers are not fond of. I was very excited to see TypeForm appear, only to realize that it does not in fact replace this construct at all.

The motivation for limiting TypeForm to runtime type objects with a specific provenance is, as far as I understand it, borderline circular: since type(s) is not a legal type expression, its result cannot be a legal type form by definition. It’s not clear to me that the (very good) motivation behind LiteralString – to catch bugs that can arise from allowing non-programmer-specified strings – is present here. Similarly, if TypeForm objects were to be used in code that runs only during type-checking, then of course limiting it to static type expressions would be well-motivated.

I would greatly appreciate a little more motivation in the spec: what bad programs are ruled out by forbidding tf: TypeForm[str] = type("5")?


But type(s) is not a “type expression”, since function calls as a general rule are not valid in annotations. That is why the examples are failing.

Allowing type(s) would be an extra special case on top of the existing rules, and I am not sure if there is strong enough motivation to allow it.

@adampauls, thanks for the feedback on the PEP.

Almost every expression in Python can be evaluated as multiple types. For example, the expression [1] can be evaluated as list[int] or list[float] or list[str | int], etc. Each type checker has a default set of rules for inferring the type of an expression, but these can be overridden by providing additional context.

from typing import TypedDict

x1 = [1] # Type is list[int]
x2: list[float] = [1] # Type is list[float]

class MyDict(TypedDict):
    x: int

d1 = {"x": 1} # Type is dict[str, int]
d2: MyDict = {"x": 1} # Type is MyDict

The proposal for TypeForm follows this same pattern.

t1 = str # Type is type[str]
t2: TypeForm[str] = str  # Type is TypeForm[str]

s1 = int | str # Type is UnionType
s2: TypeForm[int | str] = int | str  # Type is TypeForm[int | str]

An earlier draft of the PEP specified that type[T] should be considered a subtype of TypeForm[T]. I removed this in later drafts because I found this special case to be poorly motivated. If type[T] is a subtype of TypeForm[T], then why aren’t UnionType, GenericAlias, TypeAliasType, etc. also special-cased as subtypes? As you’ve pointed out, this list of special cases would be lengthy, would involve many undocumented classes (e.g. _LiteralGenericAlias), and would grow over time as the type system was expanded. This doesn’t strike me as a sound formulation. I suppose we could add back only type[T] as a special case, but that doesn’t seem very principled. Why would type[T] alone deserve special-casing?

To be generally useful, this alternate approach would also require us to specify a bunch of type evaluation behaviors for type checkers that are not currently specified and have traditionally been beyond the purview of the typing spec. For example, what type would you expect to be revealed here? All major type checkers currently disagree with each other and with the runtime. And this is just one example of dozens that I could provide.

from typing import Annotated, TypeVar

T = TypeVar("T")
MyList = Annotated[list[T], ""]
reveal_type(MyList)

The current formulation in the PEP avoids the need to try to standardize these behaviors and builds upon well-specified evaluation rules that are in place for type expressions. It puts us on a firm footing with respect to type form evaluations.

If I understand your use case correctly, you want to write a function that accepts either a TypeForm type or a subclass of type. You can do that with the current draft of the PEP using a union: TypeForm | type. If you want to also accept a UnionType, you can add that explicitly to the union (TypeForm | type | UnionType). Does that satisfy the use case you have in mind?

Why would type[T] alone deserve special-casing?

I agree, that would be bad. I’m not saying it alone should deserve special-casing. I believe (but am definitely not sure!) that there is, in fact, an enumerable list of all runtime types that can result from evaluating type expressions (at run time, not by the type checker). It includes type, UnionType, TypeVar, Annotated, etc. Why is it a “special case” to say that type[T], UnionType, etc. are subtypes of TypeForm[T]? Aren’t they all, in fact, subsets of the possible type form objects for T and therefore subsets of TypeForm[T]?

[this list] would grow over time as the type system was expanded.

Well, exactly! If I want to handle the result of inspect.signature, as more and more (runtime) type expressions appear, I will have to add code to handle them. Am I wrong to expect that inspect.Signature.return_annotation will soon have a TypeForm as its output?


So far, TypeForm seems to cover most use cases in Pydantic (playground). Thanks to the people working on this PEP; this will be really useful for us in many places. It has come a long way, and having typing terms and concepts specified seems to have really helped.

I’ve tried finding edge cases and/or questions but couldn’t find any. Just one thing I was wondering: is a TypeForm itself a type form? i.e. is the following valid?

test: TypeForm[TypeForm[int]] = TypeForm[int]

pyright seems to accept it (playground).


Yes, that should be valid. TypeForm is a form that can be used in a type expression to represent a type, so it is itself a TypeForm. (Sort of like how type(type) is type.)


Yes, TypeForm can be used in a nested manner for the reason that Jelle provided. If you’re curious, this case is covered in pyright’s unit tests here.


The runtime type of [1] is list and of MyList is typing._AnnotatedAlias. Whatever type a type checker decides to assign to each of these expressions, isn’t it fair to say that, all else equal, the runtime type should be assignable to whatever the type checker returns? Of course there are cases where this isn’t true (LiteralString and NewType at least), but both of those have the explicit goal of catching particular programming errors. It’s just not obvious to me that ruling out type form objects like str that are produced dynamically is a clear motivation here.

Suppose before the existence of Sequence, someone wanted to propose a unifying type for [1, 2] and (1, 2). They call it Sequence, but list and tuple are not assignable to it: the type only applies to tuple and list literals. (I realize that “type form” is not as clearly misleading as Sequence; using “type form” to mean “type literal” is not that bad.) But even renaming the type to LiteralSequence, which would at least address any confusion about intent, would leave one wondering why you would limit the type to sequence literals.

I just realized that I didn’t remove this in the draft that I posted in the latest PR. An earlier draft of the PEP said that type[T] was considered a subtype of TypeForm[T], and I unintentionally left this in the latest draft. Pyright’s current provisional implementation (as of pyright 1.1.382) does not allow type[T] to be assigned to TypeForm[T].

As I explained above, I don’t think that special-casing type[T] here is well motivated, so I’m going to submit an update to the PR that eliminates this special case. @davidfstr, let me know if you have concerns about this and would like to argue for maintaining this special case.

Conceptually & ideally I think it would make the most sense for type[S] to be assignable to TypeForm[T] so long as S is assignable to T. However as Eric points out, you’d also need to extend that rule to cover many more cases:

Limiting the construction of a TypeForm to a literal type expression only seems a pragmatic way to sidestep needing to specify the additional cases while still allowing the main motivating uses for TypeForm. Therefore I’m in favor of having the limitation for now, in an effort to get a useful version of the PEP out the door.

If there is a strong desire down the road to make non-literal type expression objects assignable to TypeForm – such as type, UnionType, etc – I think that support could be added later without significant backward compatibility concerns.


Edit: In other news, I’ve started work on the mypy implementation of TypeForm. I’m hoping to have a PR posted within the upcoming week.


Thanks for all the work on the PEP! In general, the current draft looks very clear and useful.

I think this is a very strong argument, and we should avoid introducing types that don’t follow this rule.

Type assignability should always be determined by considering what set of possible runtime objects a type represents; anything else is a violation of the core concepts of the Python typing specification. The set of possible runtime objects represented by type[T] is a subset of the set of objects represented by TypeForm[T], therefore type[T] is a subtype of TypeForm[T] and should be assignable to it.

The other types for the runtime representation of typing special forms discussed above (UnionType, for example), are not generic types. Therefore UnionType represents the set of all possible objects whose __class__ is UnionType, and thus can only be assignable to TypeForm[object] or TypeForm[Any]. The same goes for the runtime classes representing all other typing special forms.

I don’t think the fact that there are a number of such classes (and there may be more in the future) is a reason to avoid accurately enumerating them in the PEP (and thus the future spec for TypeForm), or in type checkers. There aren’t so many that it would be difficult to enumerate them; the type spec already enumerates all possible type expression forms.


Trying to experiment with TypeForm using pyright’s enableExperimentalFeatures setting, I fairly quickly ran into a case with code like this:

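from __future__ import annotations  # annotations below mention Foo before it is bound

from typing_extensions import TypeForm  # assumes a release that ships TypeForm

class TypedObject[T]:
  @property
  def object_type(self) -> TypeForm[T]:
    ...

class Foo(TypedObject["Foo"]):  # quoted: Foo is not yet defined in its own base list
  @property
  def object_type(self) -> type[Foo]: # Error here
    ...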

This is simplified, but it’s unexpected not to be able to implement a subclass that returns a narrower type whose values are a subset of the normal TypeForm possibilities. The workaround for now is upgrading type[Foo] → TypeForm[Foo], although that’s a little off from what it actually returns and can interfere if it’s used elsewhere with more ordinary functions like isinstance that only handle the type case.

edit: Continuing this, I have a few broad functions that handle TypeForm. I have many other functions that don’t need that breadth and just use type. I’d expect every variable or argument of type[T] to be passable to the few places that do handle TypeForm[T]. Otherwise I either need to promote a lot of places to TypeForm and lie there, or just ignore a lot of “type is incompatible with TypeForm” messages.

edit 2: One other edge case. Is None compatible with TypeForm[T]? I’d lean yes, since None is strangely both a value and its own type. At the moment x: TypeForm[T] = None reports an error, but x: None is a valid type annotation.

edit 3: After updating the core places that do runtime type introspection to TypeForm[T], I see around 30-40 “false positives”, mostly caused by treating type[T] as not compatible with TypeForm[T]. Not ideal, but fewer than I expected given the codebase size is ~30k lines (although only some of it deals with types like this).


Carl’s point above, plus Mehdi quickly running into a case where type[T] and TypeForm[T] couldn’t mix, has made me reconsider allowing at least type[T] to be a subtype of TypeForm[T].

I’ll draft the mypy implementation initially to allow type[T] as a subtype of TypeForm[T] so that folks can experiment with that configuration.

Yes, since None is a valid type expression.

Edit: However I will note that an earlier version of the PEP required the type None to be phrased explicitly as TypeForm(None) because it is ambiguous with the value None. It may still be pragmatic (from the perspective of a type checker implementor) to require the explicit phrasing. I’ll investigate this question in the mypy implementation.


Regarding the mypy implementation progress, it is proceeding but taking a bit longer than I expected. I almost have TypeForm fully recognized in assignments, but still need to make them work in function parameters and return types.
