Basic terminology for types and type forms

Daverball · February 24, 2024, 3:31pm

Could you quote the statement? Even after rereading I can’t find any such statement. The only mention of TypedDict I can find is in the list of valid type forms, same with Protocol.

alicederyn · February 24, 2024, 3:33pm

This is the only definition given.

Daverball · February 24, 2024, 3:40pm

Alright, that’s fair. Then we’re back at that being an implementation detail. The spec should probably clarify that Protocol and TypedDict subclasses are exceptions to that rule, because they have special meanings and only partially behave like classes for convenience.

alicederyn · February 24, 2024, 3:41pm

Yeah, I was hoping there might be a more general rule than just listing those two exceptions, but I can’t think of a good one.

alicederyn · February 24, 2024, 3:47pm

Maybe “returns true from isinstance if and only if it is in the MRO of the object”? That actually carves out room for ABCs with custom isinstance checks too, like Iterator – is that a good thing?
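To make the Iterator case concrete, here is a minimal sketch using only the standard library. `Countdown` is a hypothetical class invented for illustration; it never inherits from `Iterator`, yet the ABC's structural check should still accept it, which is the gap between "passes isinstance" and "is in the MRO" being discussed.

```python
from collections.abc import Iterator


class Countdown:
    """Hypothetical class that defines the iterator methods but has no Iterator base."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


counter = Countdown(3)

# Nominal view: is Iterator anywhere in the object's MRO?
in_mro = Iterator in type(counter).__mro__

# Runtime view: Iterator's subclass hook only looks for __iter__ and __next__.
passes_isinstance = isinstance(counter, Iterator)
```

Under the proposed rule, `Iterator` would not count as a nominal class for `Countdown`, because the two views disagree.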

Daverball · February 24, 2024, 3:48pm

I mean, in the end Python is a dynamic language and there’s nothing stopping people from defining metaclasses that do funky things. zope.interface.Interface, for example, creates an instance of object when you subclass it, rather than an instance of type, but the result can still be subclassed to extend the interface with additional methods/attributes.

So it’ll be difficult to come up with a definition that’s absolutely rock-solid. So I think this definition is good enough, as long as it excludes structural types. Anything more rigorous will probably be too difficult for most people to understand.

alicederyn · February 24, 2024, 3:48pm

I think having a good concrete definition of a structural type is valuable? Knowing whether Iterator is a class or a structural type in this system, for instance.

Jelle · February 24, 2024, 3:53pm

Good point on the inconsistent definition of “class”. Maybe we should define “class” to include Protocols and TypedDicts, since they are classes at runtime, and introduce a new term, nominal class, for “normal” classes.

alicederyn · February 24, 2024, 3:55pm

How do we define “normal”?

Jelle · February 24, 2024, 3:55pm

It’s not a Protocol or TypedDict.

alicederyn · February 24, 2024, 3:56pm

What about Iterator? That’s a structural type too.

Jelle · February 24, 2024, 3:57pm

It’s a Protocol as far as type checkers are concerned: typeshed/stdlib/typing.pyi at 49b1a1a96a90946ff8885792eec30acb5cf39af0 · python/typeshed · GitHub.

alicederyn · February 24, 2024, 3:58pm

That feels like evidence that we need a concrete definition? Even if that’s just “here is a canonical list of things that are not nominal types”

alicederyn · February 24, 2024, 4:03pm

I would need to add “can be in the MRO of an object at runtime” to exclude TypedDict here. Equivalently, it must be possible for isinstance to return True
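A small sketch of why the MRO condition excludes TypedDict, assuming CPython's current runtime behaviour. `Movie` and `HasTitle` are hypothetical names used only for illustration.

```python
from typing import Protocol, TypedDict


class Movie(TypedDict):
    title: str


class HasTitle(Protocol):
    title: str


movie = Movie(title="Brazil")

# Both definitions produce class objects at runtime.
movie_is_class = isinstance(Movie, type)
proto_is_class = isinstance(HasTitle, type)

# Calling a TypedDict class builds a plain dict, so Movie is never in the MRO
# of the resulting object.
movie_runtime_type = type(movie)
movie_in_mro = Movie in type(movie).__mro__

# isinstance() against a TypedDict class is rejected rather than returning False.
try:
    isinstance(movie, Movie)
    isinstance_supported = True
except TypeError:
    isinstance_supported = False
```

So a TypedDict is a class at runtime, but no object can ever have it in its MRO, and `isinstance` can never return True for it.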

Jelle · February 24, 2024, 4:21pm

Under my definition, special forms are not a subset of type forms. They are names that have a special meaning to the type checker, e.g., TypedDict or Literal. You can think of them as the equivalent of keywords in the language grammar, except that they can be aliased (thanks Guido for this analogy). In other words, Literal is a special form, but Literal[1] is not; it is a type form, and more specifically a literal type. Any is both a special form and a type form.
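The distinction between the special form `Literal` and the type form `Literal[1]` can be observed at runtime. This is only an illustrative sketch: the alias `L` is arbitrary, and `get_origin`/`get_args` are the standard introspection helpers from `typing`.

```python
import typing
from typing import Literal, get_args, get_origin
from typing import Literal as L  # a special form can be aliased, unlike a keyword

one = L[1]  # subscripting the special form yields a literal type

same_as_unaliased = one == Literal[1]
same_as_qualified = one == typing.Literal[1]

# The subscripted result still points back at the special form it came from.
origin_is_special_form = get_origin(one) is Literal
literal_values = get_args(one)
```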

Jelle · February 24, 2024, 4:24pm

I’m talking here specifically about terms that are to be used in the typing spec. What you’re talking about should be discussed in the language spec instead, though obviously it’s useful to keep the terms in sync.

The typing spec should definitely be amended to clarify which forms are acceptable in what contexts, though.

Daverball · February 24, 2024, 4:38pm

Alright, that makes sense. I think what made the distinction confusing to me was the inconsistent use of the terminology. The original definition seems fine, but right after it you use the term like this:

For example, the type of a class attribute may be wrapped in the ClassVar[T] special form.

You call the entire expression a special form, even though only the ClassVar part of it is a special form according to your definition.

davidfstr · February 26, 2024, 2:31am

It would be clearest if the definition of “type form” used in the typing spec were aligned with the future use of the term in the TypeForm PEP. I expect that PEP to want to allow any kind of type expression, including ones that contain type qualifiers like ClassVar[] or Final[], to be assignable to a TypeForm-typed variable.

So @Jelle I wonder if you’d consider a term like unqualified type form to refer to your original definition that excludes ClassVar[foo], Final[foo], etc.

Then that would leave the more-general term type form to refer to any kind of type expression, which is valid in at least one of the following locations:

On the right-hand-side of a variable declaration: value: *form*
On the right-hand-side of a parameter declaration: def some_func(value: *form*):
As the return type of a function: def some_func() -> *form*:
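The three locations listed above can be sketched as follows. `Settings` and the concrete types are hypothetical; a class body stands in for the variable declaration so the annotations are easy to read back with `get_type_hints`.

```python
from typing import get_type_hints


class Settings:
    # Variable declaration: `value: <form>`
    value: int = 0


# Parameter declaration and return type: `def some_func(value: <form>) -> <form>:`
def some_func(value: str) -> bool:
    return bool(value)


variable_hints = get_type_hints(Settings)
function_hints = get_type_hints(some_func)
```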

Jelle · February 27, 2024, 2:53am

Thanks for the feedback so far. I am now thinking of proposing the following set of definitions:

A type expression is any expression that validly expresses a type. Type expressions are always acceptable in annotations and also in various other places, such as the first argument to cast().

An annotation expression is an expression that is acceptable to use in an annotation context (a function parameter annotation, function return annotation, or variable annotation). Generally, an annotation expression is a type expression, optionally surrounded by one or more type qualifiers or by Annotated. Each type qualifier is only valid in some contexts. Note that while annotation expressions are the only expressions valid as type annotations in the type system, the Python language itself makes no such restriction: any expression is allowed.
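A rough sketch of the two definitions, under the assumption that a type checker treats them as described above. `Counter` and `odd` are hypothetical names; the last part only shows that the language itself accepts any expression as an annotation, which a type checker would reject.

```python
from typing import ClassVar, cast, get_origin


class Counter:
    # Annotation expression: a type qualifier wrapped around the type expression `int`.
    instances: ClassVar[int] = 0


# Type expression outside an annotation: the first argument to cast().
# At runtime cast() simply returns its second argument unchanged.
raw: object = "42"
as_str = cast(str, raw)


# Not a valid annotation expression for the type system, but legal Python:
# the interpreter evaluates and stores whatever expression is written.
def odd(x: 1 + 2) -> None: ...


qualifier_origin = get_origin(Counter.__annotations__["instances"])
runtime_annotation = odd.__annotations__["x"]
```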

I think we can express this with the following BNF-ish grammar:

annotation-expression:
   | <Required> '[' annotation-expression ']'
   | <NotRequired> '[' annotation-expression ']'
   | <ReadOnly> '[' annotation-expression ']'
   | <ClassVar> '[' annotation-expression ']'
   | <Final> '[' annotation-expression ']'
   | <Annotated> '[' annotation-expression ',' expression (',' expression)* ']'
   | <TypeAlias> (only valid in variable annotations)
   | unpacked-type-expression (only valid for `*args` annotations)
   | <Unpack> '[' name ']' (where name refers to an in-scope TypedDict; only valid in `**kwargs` annotations)
   | string (contents must be parsable as Python code that is a valid annotation-expression)
   | name '.' 'args' (where name must be an in-scope ParamSpec; only valid in `*args` annotations)
   | name '.' 'kwargs' (where name must be an in-scope ParamSpec; only valid in `**kwargs` annotations)
   | type-expression

type-expression:
   | <Any>
   | <Self> (only valid in some contexts)
   | <LiteralString>
   | <NoReturn>
   | <Never>
   | <None>
   | name (where name must refer to a valid in-scope class, type alias, or TypeVar)
   | name '[' maybe-unpacked-type-expression (',' maybe-unpacked-type-expression)* ']'
   | <Literal> '[' expression (',' expression)* ']' (see documentation for Literal for restrictions)
   | type-expression '|' type-expression
   | <Optional> '[' type-expression ']'
   | <Union> '[' type-expression (',' type-expression)* ']'
   | <type> '[' <Any> ']'
   | <type> '[' name ']' (where name must refer to a valid in-scope class or TypeVar)
   | <Callable> '[' '...' ',' type-expression ']'
   | <Callable> '[' name ',' type-expression ']' (where name must be a valid in-scope ParamSpec)
   | <Callable> '[' <Concatenate> '[' (type-expression ',')+ (name | '...') ']' ',' type-expression ']' (where name must be a valid in-scope ParamSpec)
   | <Callable> '[' '[' maybe-unpacked-type-expression (',' maybe-unpacked-type-expression)* ']' ',' type-expression ']'
   | tuple-type-expression
   | <Annotated> '[' type-expression ',' expression (',' expression)* ']'
   | <TypeGuard> '[' type-expression ']' (only valid in some contexts)
   | string (contents must be parsable as Python code that is a valid type-expression)

maybe-unpacked-type-expression:
   | type-expression
   | unpacked-type-expression

unpacked-type-expression:
   | '*' unpackable-type-expression
   | <Unpack> '[' unpackable-type-expression ']'

unpackable-type-expression:
   | tuple-type-expression
   | name (where name must refer to an in-scope TypeVarTuple)

tuple-type-expression:
   | <tuple> '[' '(' ')' ']' (representing an empty tuple)
   | <tuple> '[' type-expression ',' '...' ']' (representing an arbitrary-length tuple)
   | <tuple> '[' maybe-unpacked-type-expression (',' maybe-unpacked-type-expression)* ']'

Notes:

The grammar assumes the code has already been parsed as Python code, and loosely follows the structure of the AST. Syntactic details like comments and whitespace are ignored.
<Name> refers to a special form. Most special forms must be imported from typing or typing_extensions, except for None, type, and tuple. The latter two have aliases in typing: typing.Type and typing.Tuple. Callable may be imported from either typing or collections.abc. Special forms may be aliased (e.g., from typing import Literal as L), and they may be referred to by a qualified name (e.g., typing.Literal).
Any leaf denoted as name may also be a qualified name (i.e., module '.' name).
Comments in parentheses denote additional restrictions not expressed in the grammar, or brief descriptions of the meaning of a construct.
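The note about aliases and import locations can be checked at runtime with `get_origin`. A minimal sketch using only the standard library:

```python
import collections.abc
import typing
from typing import get_origin

# typing.Type and typing.Tuple are aliases of the builtins.
type_alias_origin = get_origin(typing.Type[int])
tuple_alias_origin = get_origin(typing.Tuple[int, str])

# Callable imported from either module resolves to the same origin.
typing_callable_origin = get_origin(typing.Callable[[int], str])
abc_callable_origin = get_origin(collections.abc.Callable[[int], str])
```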

Discussion:

I switched to “type expression” instead of “type form” to align with Pyright’s usage and avoid interference with the potential TypeForm proposal.
I kept TypeGuard[...] as a type expression rather than an annotation expression because I think TypeGuard[T] isn’t really like a type qualifier such as Required[T]: it can be used inside a more complicated type expression (e.g., Callable[..., TypeGuard[T]]), and while Required[T] can be read as “type T, plus some non-type information”, that is not true for TypeGuard[T].

MegaIng · February 27, 2024, 3:06am

^[1] Typo fixed.

Also, imo this separation you added now is a good argument for this proposal of mine.

[1] In your second paragraph you have both “type expression” and “type form”. Based on the later comment about avoiding “type form”, is the second a typo?