PEP 747: TypeExpr: Type Hint for a Type Expression

Daverball · June 24, 2024, 9:35am

I personally think it’s fairly intuitive what constitutes a type expression and what doesn’t. I think what confuses people the most is the expectation that there’s a simple rule like “If this special form is used, then it has to be an annotation expression”, but unfortunately it is not always that simple: the context of where/how the special form is used matters, and it takes time and effort to build an intuition for what is or isn’t a type expression.

A good example of this is Unpack which appears both in the type expression and annotation expression grammar. The special form itself does not make it a type or annotation expression, it’s the expression as a whole that needs to satisfy the grammar rule of either kind of expression.

TypeIs/TypeGuard are similar to Unpack, so they very well could’ve ended up in both grammars as well, but the main difference is that they’re also equivalent to bool, i.e. TypeIs[T] ~= bool rather than TypeIs[T] ~= T which I think is a pretty good rule of thumb of what is a type expression vs. an annotation expression. If you can peel off the outermost layer(s) of the expression without changing the set of values that expression represents, then it is an annotation expression and not a type expression e.g. Final[str] ~= str. The one exception to that rule currently is Annotated, but since it is often used for refinement types in runtime type libraries I think it is justifiable. It is kind of like the Any of annotation expressions, since we don’t know what kind of metadata is supplied, we have to assume that it could be type related, so it should be treated as a type expression, regardless of whether it actually is^[1].

It’s also important to recognize that variadic parameter and variadic keyword parameter annotations are a special case, since you’re technically not expressing just one type, but rather a whole sequence/mapping of types. So the remainder of the confusing parts of the grammar stem from scalar vs. sequence/mapping types, which need to first be unpacked into an appropriate container type in order to become a scalar type. In the case of *args and **kwargs that step is hidden because we decided that the annotation represents the type of the element rather than the whole container, because the type of the container is predetermined. So you end up with incomplete type expressions for the heterogeneous case.
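A minimal sketch of the *args/**kwargs point, assuming nothing beyond the standard library (the function name collect is made up for illustration): the annotations name the element types, while the containers are always a tuple and a dict.

```python
def collect(*args: int, **kwargs: str) -> tuple[type, type]:
    # The annotations above describe each element, not the containers.
    # The containers themselves are predetermined by the language.
    return type(args), type(kwargs)


containers = collect(1, 2, label="x")
annotations = collect.__annotations__

print(containers)           # the container classes actually received
print(annotations["args"])  # the element type written in the annotation
```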

One possible refinement of the expression grammar could be to explicitly disambiguate scalar and non-scalar type expressions into their own sections and stress the fact that when we say “type expression” we always mean a scalar type expression that represents a single type/set of values.

[1] Another good reason is that Annotated is treated transparently by some introspection helpers, i.e. it can be silently stripped out of the expression by things like typing.get_type_hints.
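The stripping behavior mentioned in the footnote can be seen with a small sketch. The function scale and its metadata string are made up for illustration; typing.get_type_hints and its include_extras parameter are the real standard-library API.

```python
from typing import Annotated, get_type_hints


def scale(factor: Annotated[float, "unit: ratio"]) -> float:
    return factor * 2


# By default the Annotated wrapper is peeled off, leaving the bare type.
plain = get_type_hints(scale)

# include_extras=True keeps the wrapper and its metadata.
with_extras = get_type_hints(scale, include_extras=True)

print(plain["factor"])
print(with_extras["factor"])
```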

erictraut · June 28, 2024, 12:57pm

@davidfstr, thanks for continuing to push this idea forward!

I’ve reviewed the latest draft in its entirety. I think it makes good progress in some areas, but it also (IMO) takes some steps backward in other areas.

Type expression terminology

The PEP continues to confuse the meaning of the term “type expression”. It sometimes uses the term properly to describe “an expression that ’spells’ a type”, but it often conflates this with “the type that is spelled by a type expression”. Using the same term for both concepts is causing confusion in the PEP and in the discussions about the PEP.

Let’s be clear about what an “expression” is. It’s a grammatical construct that represents a tree of operations and operands. When the Python runtime evaluates an expression, it produces a value. When a static type checker evaluates an expression, it produces a type that represents all possible values that can be produced by that expression at runtime. In neither case (runtime nor type checker) does the evaluation of an expression produce another expression. It would be confusing, for example, to say that the expression (x ** 2.0) * PI has the type “float expression”. Its type is simply “float”. Likewise, if we evaluate a “type expression”, we wouldn’t say that the evaluated type is a “type expression”. It evaluates to a type in the Python type system.

The proposed name TypeExpr further serves to confuse these two concepts, so I think that using TypeExpr here is a big step backward over the previous TypeForm. I’m not wedded to the term TypeForm, but I think it’s much better than TypeExpr. If people don’t like TypeForm, let’s look for some other option that better describes “a type in the Python type system”. I’ll throw out a few here: TypeObject or SpelledType. (Of these options, my personal preference is still TypeForm.)

I apologize if this comes across as bikeshedding, but terminology matters. And in this case, I think TypeExpr is not serving us well.

TypeExpr without brackets

The latest draft says:

It can also be written without brackets as just TypeExpr, in which case a type checker should apply its usual type inference mechanisms to determine the type of its argument, possibly Any.

This is inconsistent with all other type expressions that take type arguments, and it will not compose well with other type features. If a type argument is omitted from a type, type checkers should always assume Any. For comparison, the type type is interpreted as type[Any]. The type list is interpreted as list[Any]. Likewise, the type TypeExpr should be interpreted as TypeExpr[Any]. There should be no exception. There is no need to apply inference rules here.
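For comparison, a sketch of the existing convention for omitted type arguments (the function names are made up for illustration). The comments describe how a static type checker is expected to read the annotations; nothing here depends on the proposed TypeExpr.

```python
from typing import Any


def first(items: list) -> Any:
    # A bare "list" is read by type checkers as list[Any].
    return items[0]


def name_of(cls: type) -> str:
    # A bare "type" is read by type checkers as type[Any].
    return cls.__name__


print(first([1, "a"]))
print(name_of(int))
```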

Rules for unions

In the “Implicit TypeExpr Values” section, there’s a subsection that discusses the rules for unions. It says:

As a value expression, x | y has type TypeExpr[x | y] if x has type TypeExpr[t1] (or type[t1]) and y has type TypeExpr[t2] (or type[t2]).

This is problematic for classes whose metaclass overrides the __or__ or __ror__ method. In this case, the type of value expression x | y should honor the method on the metaclass rather than assume that the expression is intended as a type expression.

Current type checkers evaluate the value expression x | y as type UnionType (unless the metaclass overrides __or__ or __ror__). Some typeshed definitions have come to rely on this UnionType behavior. Most notably, the definitions of isinstance and issubclass use UnionType in their signatures. That’s because these two calls (unfortunately, IMO) accept value expressions of the form x | y. It’s not clear to me how the signatures for these two functions would change given the proposal in this PEP. I’m concerned that there will be no way to express these signatures accurately if we switch to the rules proposed in this draft PEP. More generally, I recommend looking at all the places where typeshed stubs currently use UnionType and asking whether the proposal in this PEP breaks those usages.

Rules for Annotation Expressions

The latest draft includes a section titled “Implicit Annotation Expression Values”. I strongly recommend deleting this section. The idea of representing runtime types of annotation expressions is not well motivated in this PEP. If someone were to ever propose adding such a construct to the type system, I would push back hard against it. These constructs do not spell types, so the result of evaluating one of these as a value expression will not follow normal type calculus rules. What would it mean to take the union of Final[str] and ClassVar[int]? What would it mean to take the intersection of Required[str] and Unpack[tuple[()]]? Is Final[str] a subtype of str? These are nonsensical questions. This construct doesn’t belong in the type system. Including this section in the PEP is confusing, problematic, and unnecessary.

I likewise recommend removing any mention of the “Annotation expression object” from the “Rationale” section.

Subtyping Rules

One of the rules in the “Subtyping” section uses the term “plain type”. I think I know what you mean by this, but it isn’t a term that is used anywhere else in the typing spec, so it could be misinterpreted. I recommend deleting the rule that uses this term. It’s unnecessary if TypeExpr (with no type arguments) is always interpreted as TypeExpr[Any], as it should be.

Literal[] TypeExprs

The “Literal[] TypeExprs” section begins: “To simplify static type checking…”. I don’t think that’s an accurate or well-reasoned justification for the rule that follows. The reason this shouldn’t be supported is that variables (dynamic values) are not allowed in type expressions. In your example, STRS_TYPE_NAME is a variable, and the fact that it appears in the expression means it is not a valid type expression.

davidfstr · June 28, 2024, 11:53pm

Type expression terminology

The most straightforward spelling for the concept, type or Type, is unfortunately already in use for spelling class objects which do not encompass all types.

Early on I proposed widening the existing definition of type to match any type and not just class objects but got pushback from various mypy folks, mostly on backward-compatibility grounds.

Code that manipulates TypeExpr objects at runtime is typically actually manipulating the syntactic elements of the expression, looking at the origin / args of the expression, etc. Therefore I think it makes sense to still call the concept an “expression”.

The name “type form” specifically is problematic which is why a new name was chosen.

TypeExpr without brackets

Final is allowed to infer its argument:

my_const: Final = 5
# Is a Final[int]

It seems to me that allowing similar inference for TypeExpr would provide good ergonomics:

typx: TypeExpr = int | str
# Is a TypeExpr[int | str]

It sounds like your main objection is that inferring the parameter would be difficult to implement. If so, I think the main place it seems to be helpful to infer the parameter is in a direct variable assignment like in the example above. Perhaps in that specific case it wouldn’t be too hard to infer?

FWIW, Any is explicitly allowed by the current wording.

Rules for unions

Interesting. Sounds like I’ll need to tweak the rules so that UnionType continues to be inferred for the cases that you mention. And then state explicitly that UnionType is treated as a subtype of TypeExpr.

Rules for Annotation Expressions

Ah. I originally added this section based on feedback from you. But I can easily delete the section. I’m fine leaving the type of annotation expressions undefined, as they currently are today.

Subtyping Rules

If I were to delete this rule then there would be no rules for how to treat TypeExpr[Any] vs. type[Any] because the “subtyping” relation (as PEP 483 defines it) does not apply to Any. Only “is-consistent-with” (recently rebranded to “assignability”) applies to Any. So I still think I need this rule.

I can rephrase to avoid the term “plain type” though.

Literal[] TypeExprs

I’ll rephrase to use this justification.

erictraut · June 29, 2024, 12:42am

Final is not a type. It is a type qualifier. It is not allowed within a type expression, and the subscript within a Final index expression isn’t a “type argument”.

The proposed TypeExpr is a type, and it takes a single type argument. It should follow the rules of other types. For all existing types that accept a type argument, Any is assumed if a type argument is omitted. It would require an extremely compelling argument to justify an exception to this rule. I don’t see any such justification.

Also keep in mind that TypeExpr (unlike Final) can be used any place that a type expression can be used, so it can appear in type alias definitions, parameter annotations, etc. It wouldn’t make sense to allow inference in these situations. This is what I mean by “it wouldn’t compose well with other type features”.

Daverball · June 29, 2024, 9:52am

I think another argument against it is that it would encourage people to use TypeExpr instead of TypeAlias/PEP 695 type alias expressions, which really seems like a bad idea.

I don’t really want to see TypeExpr for a static assignment, that’s not the purpose of it, TypeExpr is all about allowing dynamic runtime behavior. I think of TypeAlias = Foo as being more or less equivalent to Final[TypeExpr[Foo]] = Foo. An inference shortcut may give people the wrong idea and blurs the lines between type aliases and type expressions.

davidfstr · June 29, 2024, 1:59pm

Fair enough. A few commentators already seem to be unclear about the distinction between TypeExpr and TypeAlias so leaving in a speed bump that makes TypeExpr more difficult to use in a variable assignment could be beneficial to avoid misuse. As you mention, TypeExpr’s main benefits don’t come from using it in static assignments anyway.

I’ll alter a bare TypeExpr (with no type argument) to always be treated as TypeExpr[Any], and update any code examples that need to change.

davidfstr · July 10, 2024, 12:59pm

A new round of feedback has been integrated into the TypeExpr PEP via this diff and is ready for review.

Notably, new rules for inferring the type of union expressions like X | Y were added. These new rules for inferring Union use pretty much the same rules as regular value expressions (i.e. look for an __or__ method). Additionally, to ensure that a new TypeExpr[X | Y] value can be passed to a function that expects a UnionType (like isinstance) I’ve made the former a subtype of the latter.

@erictraut I’m particularly curious to hear if the new rules for unions make sense to you.

Other changes:

TypeExpr == TypeExpr[Any]
Rules for recognizing Annotation Expressions are deleted and left unspecified
TypeAlias is contrasted with TypeExpr

erictraut · July 10, 2024, 9:53pm

Thanks for the update, @davidfstr.

I’ve left a bunch of comments in the commit.

The latest draft proposes that TypeExpr is a subtype of UnionType in some cases and that UnionType is a subtype of TypeExpr in some cases. I don’t think that works. It breaks some fundamental set-theoretic rules about how types work. I presume that you were prompted to propose this awkward arrangement as a workaround to the UnionType issue that I mentioned in my previous round of feedback. While this is a creative attempt to address that problem, I don’t think it’s a viable solution.

I see two other solutions that I think are viable:

1. We change the PEP to indicate that value expressions containing a union (either an old-style typing.Union or a newer-style | operator) are not evaluated as a TypeExpr. This would require the use of an explicit TypeExpr constructor call for unions, as is required for other ambiguous forms.
2. We work to remove UnionType from all places where it is currently used in typeshed stubs and replace them with TypeExpr[Any]. There are currently five places where UnionType is used: isinstance, issubclass, type.__(r)or__, GenericAlias.__(r)or__, get_origin, and unittest.TestCase.assert(Not)IsInstance. Most of these are already special-cased by type checkers, so these type definitions don’t need to be precise. The exceptions are get_origin and the unittest methods. These would need to be updated to include TypeExpr[Any] along with UnionType.

Of these two options, I have a slight preference for 2, but I’d like to hear from maintainers of typeshed and other type checkers.

All of my other comments are related to phrases and terms that I think are ambiguous, undefined, or confusing in the latest draft.

I want to once again reiterate my concern about the term TypeExpr. I think it adds to the confusion of this feature rather than helping to clarify, as good terms should do. Do others share this concern?

carljm · July 10, 2024, 11:02pm

I do. I fully agree with the argument you spelled out earlier in this thread. An expression is a syntactic construct, not a runtime object or value. The very first sentence of the PEP already sounds confused to me when it mentions “type expression objects.”

I much prefer the name TypeForm. I don’t find any of the concerns that have been raised about TypeForm (including the existence of a company by that name) to be convincing reasons the name shouldn’t be used for this PEP. I don’t think anyone will confuse a Python static typing construct with a product that creates web forms, and I suspect that fairly quickly a search for “python typeform” will bring up the right thing.

mdrissi · July 10, 2024, 11:24pm

Since unions are one of the most common type forms I use in a runtime context, I’d strongly prefer solution 2. I view UnionType as mostly an internal annotation currently, and would rather update typeshed to use TypeForm where needed than say that I can’t pass an explicit Union[int, str] (or int | str) to a function that will handle runtime types.

On naming I’m happy with TypeForm. I also liked TypeAnnotation which is a bit verbose, but pretty explicit.

davidfstr · July 12, 2024, 12:28pm

Of the two alternative approaches you mention I personally prefer #2, for the reasons @mdrissi mentions. This approach I believe involves:

- Treating value expression X | Y as having the type TypeExpr[X | Y] rather than UnionType
  - This would allow X | Y to be used in contexts like isassignable(foo, X | Y) without requiring it to be explicitly wrapped in TypeExpr(...).
- Eliminating usage of UnionType entirely, notably from typeshed stubs, replacing with something else
  - TypeExpr[object | Any] I think may work to write “any union type”.
  - TypeExpr[Any] - a more permissive type - could also be substituted, which would probably be OK for typeshed (which is special-cased by typecheckers anyway to something narrower) but would be less ideal for (the rare) user code that only wants to accept union types.
- Deprecating types.UnionType itself and scheduling it for removal
  - The construct isinstance(foo, UnionType) could be made to raise a DeprecationWarning as per Python’s backward compatibility policy.

sigh OK. I’ll open the naming can of worms again. I’ll give my own thoughts probably late next week when I have more energy.

davidfstr · July 20, 2024, 1:46pm

Let’s talk about the name of the concept that PEP 747 is trying to introduce. Originally it was called “TypeForm” and later was renamed to “TypeExpr”. Commentators still have some concerns over using “TypeExpr” as the name, so I’m reopening the name for discussion.

A complaint against “TypeExpr” is that it describes not the concept itself it references (i.e. “any type”) but rather that it describes syntactically how that concept is spelled. Although it is useful to have a formal definition of the concept’s spelling, it may not be appropriate to focus on the spelling in the name itself.

Goals of the Name:

Align with the concept being named: any kind of type (spelled by a “type expression”); not just class object types
Distinguish from the similar but distinct concept: any kind of annotation (spelled by an “annotation expression”), which may or may not spell a type
Be approachable/understandable by users who are not Python typing experts
Be concise

Approach 1: Focus on what the thing is: (prefer this approach)

Candidates:

Type (name unavailable, taken by class object types)
AnyType (may be too similar to Any, a somewhat different concept)
GeneralType
GenericType (too similar to a different concept: “generics”)
ComplexType (misleading because can hold a simple class object type)

Approach 2: Focus on the spelling of the thing, and that it’s actually processed as an expression at runtime: (avoid this approach)

Candidates:

TypeExpr / TypeExpression (perfectly aligns with “type expression”, the formal spelling of the concept)
TypeAnnotation (easy to mistake for an “annotation expression”, especially for non-experts)
Annotation (nearly certain to confuse with “annotation expression”; hard to distinguish from Annotated)
TypeHint (easy to mistake for an “annotation expression”, especially for non-experts)
TypeForm (aligns with the concept of a “special form” which may be familiar to experts but not to regular users; not informative to emphasize that concept is implemented as a special form because most typing spellings are generally implemented that way; collides with the name of a popular survey product)

Recommendation:

Of the above candidates in approach #1, I like AnyType the best:
- It aligns well with the concept and is concise.
- Its main drawback is its similarity to Any, which is a slightly different concept.
I also think GeneralType is OK, although sounds a bit milquetoast.

I’m interested to hear other opinions. Are there any other name candidates you’d prefer? Why?

mikeshardmind · July 20, 2024, 3:14pm

TypeForm and TypeExpression remain the best options I’ve seen so far, and I still don’t consider TypeForm’s collision with the name of a company unrelated to the Python type system a reason to avoid it.

I don’t think any of the names in the first group are good options. Despite the claim that this group focuses on what the thing is, none of them accurately describes what it is.

AnyType isn’t accurate; not all types are valid for this
GeneralType, general how?
GenericType, that actually does conflict with preexisting concepts that people would find in a search
ComplexType, complexity is arbitrary and many complex types would not qualify as compatible.

By contrast, the things you’ve said are focusing on the spelling of the thing actually seem to better capture what it is too.

TypeExpression is fine, it’s a bit verbose, but this is literally what it is, so this should seem to belong in group 1

I agree with TypeAnnotation, Annotation, and TypeHint having drawbacks in confusability.

TypeForm aligns well for experts, who are arguably the audience for this feature. It’s less lengthy than TypeExpression, but also slightly less accurate.

xmw · July 20, 2024, 8:33pm

How about TypeValue to indicate “a type used in a value context”? “Value expression context” or “value context” seems to be how it’s most widely described in this thread.

I don’t think trying to link this construct to special forms makes it clearer. If I’m understanding correctly (and if TypeForm were the final name), TypeForm does not declare a special form; it is a special form.

rusty-snake · July 23, 2024, 5:36am

The way I read TypeExpr is that it’s a special kind of expression that evaluates to a type. Seems pretty natural to me. Is this interpretation inaccurate?

carljm · July 23, 2024, 5:59am

I still prefer TypeForm best of all the listed options.

I think the link to “special form” is an advantage, not a disadvantage. The word “form” in both cases has the same meaning, and the term applies well in both cases. A “special form” is just that: a special form that represents some special kinds of types at runtime (as opposed to the “common” form, a class object). A TypeForm is any of the forms of a type, both common forms and special forms. The link is apropos because “special form” is precisely the name we give to the runtime object resulting from evaluating a “special” type expression; these (as well as the “common form” of a type) are the kinds of runtime objects that can be passed where a TypeForm is expected.

I think focusing on names “for beginners” or “for experts” is over-thinking it. The concept described in this PEP is inherently an advanced concept. There is no name we could pick that will successfully teach the concept to beginners; they will learn it from documentation, and whatever name the documentation gives it will be the name that they learn for it. It is more important that we pick a name that is accurate and consistent with existing use of terminology, and TypeForm scores very well there, IMO.

Jelle · July 23, 2024, 9:43pm

Worth noting that the way we now define “special form” (Glossary — typing documentation), special forms would not be a subset of TypeExprs or TypeForms. For example, Final is a special form, but it is not itself a type, and would not be valid as a TypeExpr.

carljm · July 23, 2024, 10:22pm

Thanks, this is a wrinkle worth noting, but FWIW I don’t feel it makes a difference. Sure, we can have “forms” for annotations that are not types too, and “special forms” for those as well, but clearly TypeForm is the form of a type; it’s in the name. The terminology consistency/linkage that I’m suggesting doesn’t rely on “special form” being a subset of “TypeForm”, just that the term “form” plays the same role in both. And it does.

davidfstr · July 24, 2024, 1:01pm

It seems the opinion of the typing experts vocal in this thread is mainly in favor of the name TypeForm. To paraphrase:

A TypeForm is any of the forms of a type, both common forms (i.e. class objects) and special forms.

The reasoning cited is consistent with the reasoning I used originally when suggesting the name TypeForm back in 2020: showing an alignment with the existing term “special form”.
Typing experts reasonably argue that “type form” is what the concept is rather than merely how it is spelled, aligning with a preference toward naming the concept for what it is.
Several folks don’t seem to find the name collision with the Typeform survey product to be a problem in the long term.

TypeValue is an interesting name, well-contrasted with TypeAlias. However:

I don’t think TypeValue is sufficiently distinct from Type. Similarly if I had to choose between List and ListValue, I’d prefer List because it’s shorter and more direct. Sadly the name Type itself is taken.
The contrast with TypeAlias isn’t really that important, vs showing a contrast with Type. Additionally TypeAlias is soft-deprecated in favor of the type statement, so showing contrast with a less-and-less used construct will be increasingly less informative to users.

At runtime a variable of TypeForm type would indeed contain a “special form” object if it did not contain a class object. So it is a “form” in that sense. Picking a name that aligns with what the concept is seems desirable.

That’s accurate. However if it’s possible to use a term that better aligns with what the concept is rather than how it is spelled, such a term would be preferred.

Regarding what the concept is:

From the perspective of a user, the concept is a “type”.
From the perspective of a static type checker, the concept is a “type” (or a “type form”).
From the perspective of a runtime type checker the concept is a “type expression” (or a “type form”), whose parts are individually introspected and manipulated.

I prefer the user’s perspective over all others where I’m able to.

xmw · July 24, 2024, 1:58pm

Thanks for the feedback.

IMO, TypeValue not being distinct in comparison to the type statement is less of an issue than the type statement being spelt identically to builtins.type.

The purpose of the name TypeValue was not to be distinct from typing.Type or builtins.type, as one of the main benefits of this PEP is to be able to declare a dynamic type value broader than just concrete classes (superseding use of typing.Type[...] or builtins.type[...] in several contexts).

Whatever the name chosen, I wouldn’t mind if it was similar to other names which declare types (Type/type, TypeVar, TypeAlias, TypeAliasType, NewType, TypeVarTuple, TypedDict), and would prefer it to be distinct from things like access modifiers, type qualifiers, or other items with special meanings, like ClassVar, Unpack, or Concatenate. Luckily, most or all of the latter group don’t have the word “type” in them.