Basic terminology for types and type forms

Jelle · February 24, 2024, 2:54am

I have noticed some confusing terminology for core concepts in the type system, so to make communication easier, I’d like to propose adding a few important terms to the spec. I would want to add some variation of these definitions to the “Definitions” section, and go through the rest of the spec and adjust wording where it makes sense. This shouldn’t lead to any changes in actual specified behavior, but it would put the spec on a firmer footing.

Below are definitions of the terms I’d like to add. The main innovation is to use the term type form for expressions that are valid in annotations.

A class is an instance of the builtin type type, often created through the class statement. We avoid using the term “type” for classes, because that term can have other meanings.
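As a rough illustration of this definition (the names Point and Dynamic are made up for the example), both the class statement and the three-argument call to type() produce objects that are instances of type:

```python
class Point:
    """An ordinary class created with the class statement."""


# The three-argument form of type() builds a class dynamically.
Dynamic = type("Dynamic", (), {})

print(isinstance(Point, type))    # True
print(isinstance(Dynamic, type))  # True
print(isinstance(Point(), type))  # False: an instance of a class is not a class
```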

A special form is an object that has a special meaning in the type system. Every special form is different, but many special forms are used with the syntax SpecialForm[T], where SpecialForm is the special form (often imported from typing) and T is a type form.

A type form is any expression that validly expresses a type. Type forms are always acceptable in annotations and also in various other places, such as the first argument to cast(). In some annotation contexts, special forms other than type forms are acceptable. For example, the type of a class attribute may be wrapped in the ClassVar[T] special form.

Valid type forms include:

The name of a class (representing instances of that class)
The name of a protocol
The name of a TypedDict
The name of a type alias
class[parameters], where class is a generic class or type alias and parameters is a comma-separated list where each entry is either a type form or an unpacked type form
None (representing None)
Literal[value] (representing literally that value; see the specification for Literal for what values are allowed)
LiteralString
A TypeVar
Any
Never or NoReturn
form | form, where form is any type form
Optional[form], where form is any type form
Union[parameters], where parameters is a nonempty comma-separated list of type forms
type[Any]
type[class], where class is any class
type[T], where T is a TypeVar
Callable[..., form], where form is any type form
Callable[P, form], where P is a ParamSpec and form is any type form
Callable[[parameters], form], where parameters is a (possibly empty) list of type forms or unpacked type forms, and form is any type form
A tuple type form (see below)
Annotated[form, metadata], where form is any type form and metadata is any expression
TypeGuard[form], where form is any type form (only valid in some contexts)
Self (only valid in some contexts)
A string, the contents of which (when enclosed in parentheses) can be parsed as a Python expression which evaluates to a valid type form
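
A minimal sketch of a few of these forms in use. The names Config and describe are hypothetical and only serve the illustration; the sketch covers a small subset of the list and is not meant to be exhaustive.

```python
from typing import Any, Callable, ClassVar, Literal, Optional, Union, cast


class Config:
    # ClassVar[...] wraps the type form int, but is not itself a type form.
    instances: ClassVar[int] = 0

    def __init__(self, mode: Literal["r", "w"], retries: Optional[int] = None) -> None:
        self.mode = mode
        self.retries = retries


def describe(value: Union[int, str], callback: Callable[[str], Any]) -> "list[str]":
    # The return annotation is a string whose contents parse as a type form.
    text = cast(str, str(value))  # the first argument to cast() is a type form
    callback(text)
    return [text]
```

Every annotation above other than the ClassVar one is a type form in the sense proposed here.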

An unpacked type form is a variant of a type form that is valid in some restricted contexts.
It is written as either *X or Unpack[X], where X may be:

A TypeVarTuple
A tuple type form

A tuple type form may be (in all cases, tuple can also be Tuple):

tuple[()] (an empty tuple)
tuple[T, ...], where T is a type form (an arbitrary-length tuple)
tuple[parameters], where parameters is a comma-separated list where each entry is either a type form or an unpacked type form
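
The three tuple shapes can be told apart at runtime with typing.get_origin and typing.get_args. This is only a sketch: the helper is_arbitrary_length and the alias names are assumptions made for the example, and unpacked forms are left out.

```python
from typing import get_args, get_origin

Empty = tuple[()]          # the empty tuple
IntRun = tuple[int, ...]   # arbitrary-length tuple of int
Pair = tuple[int, str]     # exactly two elements: an int, then a str


def is_arbitrary_length(form: object) -> bool:
    """Return True for forms shaped like tuple[T, ...]."""
    args = get_args(form)
    return get_origin(form) is tuple and len(args) == 2 and args[1] is Ellipsis


print(is_arbitrary_length(IntRun))  # True
print(is_arbitrary_length(Pair))    # False
```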

davidfstr · February 24, 2024, 3:27am

class

+1. This use of “class” is consistent with usage I’ve seen in PEPs and in discussions.

special form

+0. (There aren’t many cases where I foresee myself talking about a “special form” as a useful distinct concept.)

type form

+1. FWIW, this is roughly the same definition of “type form” used in the TypeForm proto-PEP. There, the full definition I used is:

The TypeForm proto-PEP's definition of a 'type form'

Values of type TypeForm

The type TypeForm has values corresponding to exactly those runtime objects that are valid on the right-hand-side of a variable declaration,

value: *form*

the right-hand-side of a parameter declaration,

def some_func(value: *form*):

or as the return type of a function:

def some_func() -> *form*:

Any runtime object that is valid in one of the above locations is a value of TypeForm.

Incomplete forms like a bare Optional or Union are not values of TypeForm.

Example of values include:

type objects like int, str, object, and FooClass
generic collections like List, List[int], Dict, or Dict[K, V]
callables like Callable, Callable[[Arg1Type, Arg2Type], ReturnType], Callable[..., ReturnType]
union forms like Optional[str], Union[int, str], or NoReturn
literal forms like Literal['r', 'rb', 'w', 'wb']
type variables like T or AnyStr
annotated types like Annotated[int, ValueRange(-10, 5)]
type aliases like Vector (where Vector = list[float])
the Any form
the Type and Type[C] forms
the TypeForm and TypeForm[T] forms

I’m not sure if the list is intended to be exhaustive, but every item you list I agree makes sense as a “type form”. In particular, I agree it should include:

mikeshardmind · February 24, 2024, 3:32am

I think the differentiation of type form and special form is slightly imprecise here, and I’d prefer to not put language around things like protocol that treat them as a “type form” rather than just a type as I think this could lead to a point of confusable terminology.

I think the more important distinction here is that the context in which these forms appear changes how we treat them, as a consequence of typing being implemented in Python with objects that have a runtime representation.

To that end, I think we need clear definitions for “type expression” and “value expression”, and an explanation of why certain forms have differing behavior as a type and as a value.

Jelle · February 24, 2024, 3:33am

A difference is that my definition of “type form” excludes forms that are only present as the outermost part of an annotation in specific contexts (e.g., Final, ClassVar, NotRequired, Required, ReadOnly). I think that makes the concept more useful because those qualifiers are not valid in many places where type forms are accepted. Whether TypeForm should accept those forms I am not sure.

mikeshardmind · February 24, 2024, 3:41am

Something I mentioned elsewhere as an off-handed comment but do think may be worth exploring, I think those should be considered type forms, and that the way to ensure runtime introspectability of them would be:

TypeForm[TF, *Parameters]

where TF must be the typeform itself, or Any to indicate handling any type form

This allows granularly accepting specific type forms, or saying your runtime function handles any of them.

An example of this that would handle a Union (At least if Union[*Ts] also becomes allowed)

@overload
def try_parse_as_value(typ: TypeForm[Union, *Ts], user_input: str) -> Union[*Ts]:
    ...
@overload
def try_parse_as_value(typ: type[T], user_input: str) -> T:
    ...

This direction would also make distinguishing between types as values and types as type expressions the more important distinction, so my musings on possible solutions for a few of these problems have probably shaped the language I would prefer.

davidfstr · February 24, 2024, 3:59am

Interesting: You’re intentionally excluding “type qualifiers”.

Yet I notice then that you include TypeGuard[T] in the definition of “type form”, even though it is only valid as a return-type annotation. If a “type form” is intended to be usable as-is in most contexts, I think that might exclude TypeGuard[T].

Michael H:

Something I mentioned elsewhere as an off-handed comment but do think may be worth exploring, I think those should be considered type forms, and that the way to ensure runtime introspectability of them would be:
TypeForm[TF, *Parameters]

The design of TypeForm - the runtime type of a type expression - I think is out of scope of this thread. I think Jelle is only looking to define “type form” for the typing spec, which I don’t expect to exactly equal whatever the future TypeForm PEP will specify. I’d suggest directing your comment here: TypeForm[T]: Spelling for regular types (int, str) & special forms (Union[int, str], Literal['foo'], etc) · Issue #9773 · python/mypy · GitHub

sirosen · February 24, 2024, 4:02am

Would there be any interest in adding type forms to the type system itself in the future? I think I’ve seen other languages call these “kinds” as in “kinds of types”.

It comes up naturally when doing runtime inspection of types. For example, imagine a function:

def is_optional(x) -> bool:
    """Is it a union with None?"""
    ...

What is the type of x? type | UnionType | ...?
As far as I know, it can’t be expressed today.

EDIT: Sorry, I see now that I didn’t follow this thread well, and that there’s such a proposal already in progress.

mikeshardmind · February 24, 2024, 4:10am

If the two aren’t equivalent or strongly related, I think it’s setting up for further confusable terminology. We already have a lot of overloaded terminology with different definitions in different contexts, I was trying to avoid creating another.

Jelle · February 24, 2024, 4:20am

TypeGuard is indeed at the boundary of my definition of “type form”. However, I feel it’s more comparable with Self, also only valid in specific contexts, but can be nested in e.g. list[Self], than like ClassVar, which is only valid at the top level of an annotation. For example, Callable[..., TypeGuard[str]] is a valid type.

I don’t use those terms here and they are not in the spec (“type expression” appears in one heading). What do you think they should mean? It seems pyright uses “type expression” in its error messages in a meaning that’s close to my “type form”.

I did have the TypeForm proposal in mind when I chose the word “type form”, though I think adding a term like it to our vocabulary is useful regardless of whether the TypeForm proposal goes forward. I’d be open to switching to “type expression” instead of “type form” for this concept if that reduces confusion.

I don’t think my definitions of “type form” and “special form” are especially close. The listing of protocols merely means that the name of a Protocol is a valid type form, which I hope is not controversial. I will edit the OP to clarify that I mean the name of a protocol or class.

mikeshardmind · February 24, 2024, 5:09am

Yeah, sorry I should have been more clear that I was introducing a competing set of terms that I felt address this difference in a way that is closer to the root of the difference.

from typing import Protocol

class X(Protocol):
    def foo(self) -> str: ...

class ConcreteX:
    def __init__(self, arg: str):
        self.arg: str = arg
    def foo(self):
        return self.arg

Each of these declares a type. One of these is usable at runtime to create instances, but as far as the type system is concerned these are both type declarations, one of a structural type, one of a nominal type.

Continuing the example:

def takesX(val: X) -> bytes:
    return val.foo().encode()


def takesOnlyConcreteX(val: ConcreteX) -> bytes:
    return val.foo().encode()

In these, the prior types we declared are used as “type expressions”, they express to the type system an expectation about the type of a value. Type checkers may emit errors when determining statically that a value would not be consistent with a type expression.

def register_serialization_hook(typ: type[T], hook: Callable[[T], bytes]):
    some_registry[typ] = hook


register_serialization_hook(ConcreteX, takesOnlyConcreteX)

ConcreteX here is a “value expression”, and is meant for runtime use, not for type system use. The type system is still interested in ensuring the value conforms to the “type expression” it corresponds to, but (at least currently) this is only possible for things that can be composed with type.

This is where all of the special forms in the type system have the potential to differ in their meaning between what the type system is concerned about and what runtime use is concerned about. As far as I can tell, the context of the form is the common factor that allows expressing the behavioral difference using the same language for all typing constructs, special or not.

Liz · February 24, 2024, 6:13am

I agree that the context is the common factor, but I don’t think what you’re saying directly competes without more substance. How would you go about using the definitions you have to differentiate the forms which don’t compose with type and explain this to people that weren’t already on the same page?

mikeshardmind · February 24, 2024, 6:52am

Somewhat glibly, I wouldn’t.

The confusion doesn’t come from those being any different in the type system.
None of these are fundamentally different from forms that do compose with type in any way other than one of them not existing yet, with successively higher-order relations being needed and expressed.

int as an annotation having 1 as a value which is consistent with it
type[int] as an annotation having int as a value which is consistent with it
TypeForm as an annotation having type[int] as a value which is consistent with it
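
The first two levels can be written down today. The sketch below leaves the third level unannotated, since TypeForm is still only a proposal; the variable names are made up for the example.

```python
from typing import get_args, get_origin

number: int = 1                # the value 1 is consistent with int
number_class: type[int] = int  # the class int is consistent with type[int]

# One level up there is no spelling yet, so the value type[int] is
# deliberately left without an annotation here.
form = type[int]

print(isinstance(number, number_class))  # True
print(get_origin(form) is type)          # True
print(get_args(form))                    # (<class 'int'>,)
```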

So I wouldn’t address the common confusion from trying to place these kinds of expressions about types into different buckets, and instead address the confusion by discussing the differing context we encounter type system constructs in and how to reason about their purpose in each.

This is far more future-proof and simpler to define overall, and should not require ongoing maintenance with each added type construct. There is a drawback to this in that it is more abstract and requires slightly better intuition about the relation between stating an expectation of a type and runtime values that conform to that expectation, but I believe it is the better way to express this and that we can teach this in an approachable manner.

Now I’m viewing the specific confusion this is meant to address from a lens shaped by recent discussions as well as the context in which @Jelle linked this thread to me, so if it is meant to address other forms of confusion beyond this, maybe there’s more to work out here.

In terms of avoiding further overloaded terms or confusable terms, I would prefer to leave defining terms like TypeForm/SpecialForm to the in progress(?) pep which will also add a corresponding runtime expression of that idea. There’s a clear need for such a form to express runtime use of type expressions that require introspecting things which are Python objects, but only exist to represent typing concepts, and putting that work first means we can end up with consistent definitions for that term. I don’t think we need that term or similarly named terms which could later be confused for similarity to be defined to start helping better distinguish the points of confusion around this.

pf_moore · February 24, 2024, 9:49am

It would be useful to also clarify where these forms can be used (in overview, at least).

For example, am I right in thinking that only a class is valid in runtime isinstance calls? And as a base class when declaring a subclass? More generally, is it correct to say that the runtime is unaware of type or special forms, and only deals in classes?

Daverball · February 24, 2024, 10:19am

isinstance is sort of a special case in its own right, since anything can implement __instancecheck__ and __subclasscheck__, which some special forms do, e.g. for Union they are implemented, since it’s essentially equivalent to isinstance/issubclass with a list of types, but the individual members of the Union may themselves not be valid as arguments for isinstance/issubclass, so unfortunately the answer is an unsatisfying “it depends”.

All you can really say is that for class these kinds of operations will work and for a type form they may work, but they also might not. You can think of type form as a superset of the term class which makes no restrictions about you being able to use them in places where a class will work.

mdrissi · February 24, 2024, 10:31am

Libraries like pydantic often support many type forms at runtime. Even the standard library’s dataclasses module has a small amount of TypeForm logic in treating ClassVar as special. Today, whether a function supports only classes or also type forms is mostly a matter of documentation, as there’s not yet a way to specify that in types. Most libraries that support type forms at runtime support only a subset, and the exact subset varies a lot by library. I have a library that has some special treatment and understanding of forms like Literal, Final, ClassVar, and Required, but it would struggle with, and does not understand how to deal with, TypeGuard, TypeVar, ParamSpec, and TypeVarTuple. Sometimes a specific TypeForm may not make much sense for that API, or maybe it does make sense and it’s just complex to support. Manipulating generics at runtime is tricky and I haven’t seen many libraries that do so.
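
The dataclasses case can be seen in a few lines. The class Point and its attributes are made up for the example.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Point:
    x: int
    y: int
    # dataclasses inspects the annotation and skips ClassVar attributes,
    # so this does not become a field or an __init__ parameter.
    dimensions: ClassVar[int] = 2


print([f.name for f in fields(Point)])  # ['x', 'y']
print(Point(1, 2))                      # Point(x=1, y=2)
```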

I’d prefer to not distinguish too much between TypeGuard, ClassVar, and Required. While some special forms do have restrictions on where they can be used, in practice runtime manipulation support varies a lot for each special form. I expect most libraries that handle special forms will need case-by-case logic for each one and will support the ones that are most useful for that specific library. Whether TypeGuard is more or less restricted really depends on the specific usage pattern. For a cattrs/pydantic-like library that mainly focuses on class attributes, TypeGuard is not a valid annotation for an attribute, while ClassVar is.

alicederyn · February 24, 2024, 1:45pm

I’m a little confused whether you consider a protocol/TypedDict to be a class. On the one hand it’s an instance of type, but on the other you have them listed as separate entries under valid type forms.

Daverball · February 24, 2024, 2:32pm

Protocol/TypedDict being a type at runtime is an implementation detail. While they allow some things that you can do with a regular type, there are many things that don’t work: e.g., you can’t use them in isinstance checks^[1] or, in the case of a Protocol, create an instance, and even in the case of TypedDict you are not creating an instance so much as just returning the dictionary that was passed into the constructor.

So I think it’s appropriate to put them in a different bucket than the class bucket. They’re not special forms, but they’re also not really classes. Type form should encompass everything that’s valid within an annotation, so I don’t think I would exclude type qualifiers from the type form term either, even if there’s stricter rules about where they’re allowed. Otherwise we put ourselves into a position where we need another umbrella term for qualifiers and then always use that in a union with type form if we want to talk about an annotation.

[1] You can, of course, with a Protocol if you use the typing.runtime_checkable decorator, but it’s not part of the type
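
These runtime behaviors can be checked directly. In this sketch the names HasFoo, CheckableHasFoo, Movie, Impl, and raises_type_error are all made up for the example.

```python
from typing import Protocol, TypedDict, runtime_checkable


class HasFoo(Protocol):
    def foo(self) -> str: ...


@runtime_checkable
class CheckableHasFoo(Protocol):
    def foo(self) -> str: ...


class Movie(TypedDict):
    title: str


class Impl:
    def foo(self) -> str:
        return "foo"


def raises_type_error(func, *args) -> bool:
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


print(isinstance(HasFoo, type), isinstance(Movie, type))  # True True
print(raises_type_error(isinstance, Impl(), HasFoo))      # True
print(isinstance(Impl(), CheckableHasFoo))                # True
print(raises_type_error(isinstance, {}, Movie))           # True
print(raises_type_error(HasFoo))                          # True
print(type(Movie(title="Alien")) is dict)                 # True
```

Both objects are instances of type, yet a plain Protocol rejects isinstance checks and instantiation, and calling a TypedDict yields an ordinary dict.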

alicederyn · February 24, 2024, 3:09pm

As long as it’s part of the definition, that’s fine. It wasn’t though. How would you change the definition given in the original post? I’ve tried a few versions in my head but they all seem deficient.

Daverball · February 24, 2024, 3:28pm

The definition in the post doesn’t mention whether TypedDict/Protocol is a special form, all it says is that it’s a type form, which includes special forms amongst other things.

You could certainly make the argument that TypedDict and Protocol could be interpreted as special forms, but the definition doesn’t actually explicitly state that.

I would split typing constructs into four categories:

nominal types or class
structural types (Protocol and TypedDict)
type qualifiers (Final, ClassVar, ReadOnly, Required, NotRequired)
special forms (everything else)

You could arguably add a fifth category for type modifiers, these are similar to type qualifiers, but we don’t have any of them yet, this would include things like Partial or Immutable which modifies the type that is wrapped, rather than the container (i.e. the owner of __annotations__).

Type form would include all four (or five) categories.

alicederyn · February 24, 2024, 3:29pm

The definition in the post says that both TypedDicts and protocols are classes.