TypeForm: Spelling for a type annotation object at runtime

davidfstr · April 20, 2024, 2:56am

Back in 2021 I started a conversation about creating a way to spell the type
of a type-annotation-object: TypeForm[]

>>> # A special form object
>>> Union[int, str]
typing.Union[int, str]

>>> # A variable holding that object. NOT a type alias.
>>> IntOrStrType: TypeForm = Union[int, str]

>>> # Call of a function which accepts a special form as an argument
>>> assert isassignable(1, IntOrStrType)

Now in 2024 there seems to be renewed interest, so I’ve gathered some more
input and have drafted a PEP that is ready for review.

Notable changes/additions since the 2021 version of this PEP

TypeForms are no longer specified to match any kind of “annotation form”.
Instead they only match those type-annotation-objects that spell a type.
- See §“Rejected Ideas > Accept arbitrary annotation-forms”
Both TypeGuard[] and TypeIs[] have been accepted, so interaction with
those types is described:
- See §“Interactions with TypeIs, TypeGuard, and type variables”
Interactions with Annotated[] forms are described:
- See §“Interactions with Annotated and type variables”
Stringified type-annotation-objects are themselves considered to be TypeForms:
- See §“Stringified TypeForms” and §“Forward References”
TypeForm[T] is defined to be covariant in its argument type rather than being invariant, which matches the behavior of Type[T].
- See §“Subtyping”
A survey of §“Common kinds of functions that would benefit from TypeForm” was added to the end of the §“Motivation” section.

mikeshardmind · April 20, 2024, 5:57am

Something which appears to be missing from the design, whether intentionally so or not, is the ability to switch on type forms or to express in the type system whether a function handles specific type forms, but not others, or handles some in a different manner from others.

Copying this over from the original github issue, there was a suggestion to be able to encode this information into the type system’s handling of TypeForm by having the first argument to TypeForm be the form, and the remainder the inner annotations

@overload
def try_parse_as_value(typ: TypeForm[Annotated, T, *Annotations], user_input: str) -> T:
    ...
@overload
def try_parse_as_value(typ: TypeForm[Union, *Ts], user_input: str) -> Union[*Ts]:
    ...
@overload
def try_parse_as_value(typ: type[T], user_input: str) -> T:
    ...

There’s not always going to be an equivalent way to do this with TypeForm[Form[...]], as best I can tell (due to restrictions on where parameterized generics are allowed).

davidfstr · April 21, 2024, 2:52pm

express in the type system whether a function handles specific type forms, but not others, or handles some in a different manner from others.

I expect that runtime functions which accept a TypeForm as input - many of which will attempt to support all possible forms - will nevertheless vary (from each other) in many small details RE what specific forms they recognize. [1]

I’m not sure that it’s desirable (or even possible) to write out the exact constraints on what kinds of forms a particular function will accept. (And I think that’s OK.) I’m planning to discuss & elaborate on this consideration (among others) for functions accepting TypeForms as input in a “How to Teach This” section in the next draft.

there was a suggestion to be able to encode this information into the type system’s handling of TypeForm by having the first argument to TypeForm be the form, and the remainder the inner annotations […]

Aye. For that particular example I have some specific thoughts I’d like to articulate carefully but I’ve run out of time this weekend. Let me get back to you in a few days.

[1] For example, a stringified type annotation object like "SomeTypeAlias" is difficult/impossible for a runtime type checker to support in the general case, assuming information about which module the string was defined in is absent.

mdrissi · April 22, 2024, 6:07am

Expanding on that: I have runtime typing functions for which the exact set of type annotations they work on is very hard to describe. They handle types that are registered for serialization/deserialization. This includes basic types plus common generics (list/set/dict/Mapping/Sequence/etc.). User-defined generic classes kind of work, but fail in some ways. Polymorphic classes (classes with subclasses) usually work, but struggle when generics are mixed in. String annotations usually work, but getting the right namespace to evaluate them in is sometimes tricky and fails. You can find bug reports against Python itself about handling forward references across different modules correctly. TypedDicts mostly work, but I haven’t looked at adding any logic for ReadOnly yet. Required/NotRequired mostly work, but there’s one open bug in typing-extensions where they misbehave when mixed with forward references.

Similarly, libraries like typeguard/cattrs/etc. tend to generally work but have strange exceptions/caveats where runtime analysis becomes hard and which they don’t handle. So I don’t think supporting TypeForm[int]/TypeForm[Foo] with a concrete type is a good idea, especially in the initial PEP.

IntOrStrType: TypeForm = Union[int, str]

This example I also find strange. Main use case I see for TypeForm is functions like,

def trycast[T](form: TypeForm[T], value: object) -> Optional[T]: ...

where TypeForm has a generic type variable with it. What does TypeForm without a type variable mean? TypeForm[Any]? But TypeForm[int]/TypeForm[Foo] we’ve excluded as supported.
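To make the trycast shape concrete, here is a minimal sketch of how such a function could dispatch on a few forms at runtime. It is not the trycast library's implementation: the names trycast_sketch and _matches are hypothetical, the form parameter is annotated as Any because TypeForm[T] is only proposed, and only a small illustrative subset of forms is handled.

```python
from typing import Annotated, Any, Optional, Union, get_args, get_origin


def _matches(form: Any, value: object) -> bool:
    """Check value against a small, illustrative subset of type forms."""
    if form is None or form is type(None):
        return value is None
    if form is Any or form is object:
        return True
    origin = get_origin(form)
    if origin is Annotated:
        # Ignore the metadata and check against the underlying type.
        return _matches(get_args(form)[0], value)
    if origin is Union:
        return any(_matches(arg, value) for arg in get_args(form))
    if origin is list:
        (item_form,) = get_args(form)
        return isinstance(value, list) and all(
            _matches(item_form, item) for item in value
        )
    if isinstance(form, type):
        return isinstance(value, form)
    # Anything else is outside this sketch's supported subset.
    raise TypeError(f"unsupported form: {form!r}")


def trycast_sketch(form: Any, value: object) -> Optional[Any]:
    """Hypothetical stand-in for trycast: return value if it matches, else None.

    Under the proposal the signature would read
    (form: TypeForm[T], value: object) -> Optional[T].
    """
    return value if _matches(form, value) else None
```

Note that returning None for a failed match is ambiguous when the form itself admits None; the sketch keeps the Optional[T] return shape from the post only for illustration.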

I would lean to only allow TypeForm[T] where T is TypeVar and fully exclude TypeForm by itself. So x: TypeForm = int would raise a type checker error like TypeForm is unsupported outside function signature without type variable. Similarly TypeForm[int] is not allowed. What function would you write that takes TypeForm vs object where distinction is important? TypeForm[T] is very useful because of T being usable elsewhere in signature, but TypeForm by itself lacks that. Notably most (all maybe) of your examples in Common kinds of functions that would benefit from TypeForm are use cases where TypeForm[T] is valuable not TypeForm without type variable.

Next topic is exactly which annotations are allowed and what they would mean. Let’s focus on trycast function, what does

x: object
P = ParamSpec('P')

class Foo:
  ...

trycast(Required[int], x)
trycast(P, x)
trycast(P.args, x)
trycast(ClassVar[int], x)
trycast(Final[int], x)
trycast('Foo', x)

mean? If you bind ClassVar[int] to T and it returns Optional[T], what does it mean to return a ClassVar? Similarly, for the ParamSpec case I have no clue how to interpret it, even though P.args is a valid annotation in some contexts. I think these examples lead to a necessary constraint: TypeForm[T] likely only works for annotations that would be valid as the annotation for an argument in a function signature. So Required/ReadOnly/Final/ClassVar/ParamSpec/etc. are all forbidden to bind to TypeForm[T].
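A runtime function can enforce part of this constraint itself. The following is a rough sketch, assuming a library wants to reject type qualifiers before doing anything else; is_annotation_only is a hypothetical helper, and only ClassVar and Final are covered here (Required, ReadOnly and ParamSpec would need similar checks on Python versions that provide them).

```python
from typing import Any, ClassVar, Final, get_origin

# Qualifiers that are valid only in specific declaration contexts,
# so they should not be treated as type forms.
_QUALIFIERS = (ClassVar, Final)


def is_annotation_only(form: Any) -> bool:
    """Return True for bare or subscripted ClassVar/Final (hypothetical helper)."""
    origin = get_origin(form)
    return any(form is q or origin is q for q in _QUALIFIERS)
```

A trycast-like function could call this first and raise a clear error instead of trying to interpret ClassVar[int] as a type.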

Required/ReadOnly could make sense in very specific contexts like,

class Foo(TypedDict, Generic[T]):
  x: T

Foo[Required[int]] # This could work and makes some sense.

but even there I struggle to see where TypeForm should be mentioned so I think safest choice to avoid strange behaviors/rules is only annotations that can always be placed as input/return argument annotation for a function signature are allowed.

Also I’ve focused on function signatures as that’s main place I see TypeForm[T] as valuable. I’m unsure on why/where we should allow

STR_TYPE: TypeForm[str] = str  # variable type

this case especially when it also does TypeForm[str] instead of TypeForm[T]. I lean that TypeForm should not be allowed for variable types or we need examples/library use cases where allowing it for variables is valuable.

edit: I realize reading further TypeForm Values section covers some of the Required/Final rules. My only disagreement is on variable declaration rule as I remain unclear on value/purpose of typeform next to variable. The parameter/return type rules I agree with fully.

edit 2: One comment I made on document is about handling of ForwardRefs. If you have code like,

trycast('list[int]', x)

then trycast must make a ForwardRef and convert it to a runtime type object. ForwardRefs have a private method _evaluate in typing.py. I’m unaware of any good public typing API to convert a ForwardRef object to a runtime one. get_type_hints will often do it for you, but that depends on where/how the annotation is given to you. So I think for TypeForm usage either the PEP acknowledges/advises that libraries re-implement their own ForwardRef → runtime value logic, or it would be helpful to publicly expose one or two existing typing.py functions. ForwardRef._evaluate is the one I consider most useful, although maybe a similar function in typing would be preferred.
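One workaround that stays on public API is to route the string through typing.get_type_hints using a throwaway holder object. This is only a sketch under the assumption that the caller can supply the right namespace; resolve_string_form is a hypothetical helper name, not something typing provides.

```python
from typing import Any, Dict, Optional, get_type_hints


def resolve_string_form(
    text: str,
    globalns: Dict[str, Any],
    localns: Optional[Dict[str, Any]] = None,
) -> Any:
    """Evaluate a stringified annotation via get_type_hints (hypothetical helper)."""

    def _holder() -> None:
        """Throwaway object whose only job is to carry one annotation."""

    _holder.__annotations__ = {"value": text}
    hints = get_type_hints(
        _holder, globalns=globalns, localns=localns, include_extras=True
    )
    return hints["value"]
```

The hard part the post describes remains: the function still needs to be told which globals/locals the string was written against.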

KotlinIsland · April 23, 2024, 11:57pm

Is None a valid value for a TypeForm? It’s a valid value to use in a type annotation.

davidfstr · May 1, 2024, 2:15pm

Thank you @erictraut and @Jelle for the feedback inside the PEP document! I will respond/integrate those comments over the next few days.

Michael H:

there was a suggestion to be able to encode this information into the type system’s handling of TypeForm by having the first argument to TypeForm be the form, and the remainder the inner annotations
@overload
def try_parse_as_value(typ: TypeForm[Annotated, T, *Annotations], user_input: str) -> T:
    ...
@overload
def try_parse_as_value(typ: TypeForm[Union, *Ts], user_input: str) -> Union[*Ts]:
    ...
@overload
def try_parse_as_value(typ: type[T], user_input: str) -> T:
    ...

Looking at this example, it appears to be trying to use the proposed TypeForm[OriginType, *ArgTypes] syntax to define the exact kinds of forms the function will accept, which I mentioned isn’t always desirable or even possible.

Considering alternative uses for the proposed TypeForm[OriginType, *ArgTypes] syntax, the following function observed in the wild could potentially use it, but there’s also a straightforward workaround.

svcs.svcs_from() function

* Typed lookup operations:
    * Takes a sequence of type-forms and returns a tuple of instances of those forms.
    * Pattern: `def get_instances(forms: *TypeForm[T]) -> Tuple[*T]`
    * Examples: svcs.svcs_from(...).get(...)
    * Workaround: Use overloads like:
        * @overload
          def get_instances(t1: TypeForm[T1]) -> Tuple[T1]: ...
          @overload
          def get_instances(t1: TypeForm[T1], t2: TypeForm[T2]) -> Tuple[T1, T2]: ...
          # (... repeat up to tuples of length 7 or so ...)

Lacking other compelling use cases, I’m leaning away from supporting syntax that can match the origin & arguments to a type expression.

Yes, exactly. This is consistent with how other generic type constructors are handled.

I believe for every syntactic position where a type variable (T) is allowed, a literal type has also been allowed as well. That makes sense to me because you can “substitute” a specific value into the location of a variable.

It’s not clear to me the value of specially banning the use of a literal type argument, such as disallowing TypeForm[int].

Mehdi Drissi:

Next topic is exactly which annotations are allowed and what they would mean. Let’s focus on trycast function, what does
x: object
P = ParamSpec('P')

class Foo:
  ...

trycast(Required[int], x)
trycast(P, x)
trycast(P.args, x)
trycast(ClassVar[int], x)
trycast(Final[int], x)
trycast('Foo', x)
mean?

Of the examples above, only trycast('Foo', x) would be permitted.

As indicated in §“TypeForm Values”:

A runtime object that is valid in only some but not all of the above locations, like Final[*form*] (valid only in a variable declaration) or TypeIs[*form*] (valid only in a return type), is considered to be an “annotation form” but not a “type form”.

We already agree that it’s valuable to allow a function parameter to be annotated as TypeForm[...]. A function parameter is a type of variable. It’s not clear to me why it would be useful to specially ban moving a TypeForm[...] into a local variable.

Yes. None is a valid type expression and therefore a valid value for a TypeForm.

I see I didn’t include None as an example in §“TypeForm Values”, so I will add it there to clarify this question.

davidfstr · May 4, 2024, 6:20pm

The second draft of the TypeForm PEP is ready for review.

Notable changes/additions since draft 1

Started using the terms “type expression” and “annotation expression” from the typing specification wherever possible/appropriate, rather than redefining equivalent terms inline.
- “Special form” is still used in a few parts of the PEP, but hopefully in a manner consistent with the definition from the typing specification.
Noted that a TypeForm can hold almost any kind of type expression, but has some exceptions.
Restricted TypeForm to NOT recognize runtime-only representations of type expressions (like ForwardRef(...), GenericAlias(...), and others) since such runtime-only representations are not API (at least at this time).
Tweaked the subtyping rules to hopefully now be sound
Added a new §“How to Teach This” section outlining considerations when defining functions that take a TypeForm as input
Altered the interpretation of a bare TypeForm (lacking an argument like T) so that type checkers may infer the missing argument, rather than always assuming it is Any. This allows reducing repetition in code like StrType: TypeForm[str] = str by rewriting as the shorter StrType: TypeForm = str.
Mentioned that a TypeForm[...] value is itself a kind of TypeForm

I also added one Open Issue: whether the name “TypeForm” is the best one to use, since the PEP’s current definition aligns less with the modern definition of “special form” (from the typing specification) and more with a “type expression”.

Let me know what you think.

insilications · May 6, 2024, 11:09am

I just want to thank you for your work on this. This new construct would be a really useful and powerful addition to the typing system.

mdrissi · May 7, 2024, 11:38pm

One related PEP that I think is useful to discuss/compare with is PEP 718 (subscriptable functions), as it can enable functionality similar to the examples here.

def trycast(form: TypeForm[T], value: object) -> Optional[T]: ...

trycast(Annotated[list[int], "..."], x)

vs

def trycast[form: T](value: object) -> Optional[T]: ...

trycast[Annotated[list[int], "..."]](x)

What are the pros of passing a typeform as a value to a function vs as a type argument to a generic subscriptable function? Are there use cases of typeform that cannot be written using the latter approach?

The main tradeoff I see is that this PEP feels easier to implement at runtime and is straightforward to backport in typing-extensions. PEP 718 requires a grammar change and runtime interpreter changes. To actually implement trycast, some runtime dunder/builtin is also necessary to retrieve the type argument. So implementation-wise, TypeForm feels simpler at runtime.

On other hand PEP 718 looks simpler type system wise. Generic classes already exist and have mostly defined subscripting rules/behaviors. Re-using those subscripting rules for functions would make smaller type system change and I’d guess would be simpler type checker change.

Jelle · May 8, 2024, 2:20am

It doesn’t; it’s just changes to builtin objects. But it’s still a major change to core functionality.

LtWorf · May 8, 2024, 11:14am

I hope this gets accepted, so it won’t be needed to manually annotate many calls to typedload in that repetitive way that is required now a: list[int] = load(data, list[int])

davidfstr · May 12, 2024, 9:49pm

Mehdi Drissi:

One related PEP [would allow writing:]
def trycast[form: T](value: object) -> Optional[T]: ...

trycast[Annotated[list[int], "..."]](x)
What’s pros of doing typeform as a value to function vs as a type argument to a generic subscriptable function? Are there use cases of typeform that can not be written using latter approach?

PEP 718 does not appear to allow writing def trycast[form: T](...), where the value of T is named form and can be accessed. I only see examples like def trycast[T](...) where T is unnamed.

It is necessary for an implementation of trycast to be able to actually access the object passed for T at runtime, which it cannot do if T isn’t given a name.

mdrissi · May 12, 2024, 10:03pm

PEP 718: subscriptable functions - #42 by Gobot1234 Fair as it’s not in pep at moment. The pep discussion includes comment by author on follow up draft pep to include runtime access of the generic argument. I think that pep is stronger if runtime access is included together but I understand the bigger pep gets the more work it is to implement in one step.

I’m mostly viewing it as in say few years from now, would type system both for static type checkers like mypy and runtime libraries like trycast find it preferable to have TypeForm or to have subscriptable functions with runtime access.

Edit: One note a name though is not really necessary. For generic classes there exists way today to access type arguments with dunders like __orig_class__ which works as type arguments are tuple and the only way to subscript today is positionally.
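For reference, this is roughly what the existing mechanism looks like for generic classes. The Caster class is a hypothetical example; __orig_class__ is an undocumented attribute that typing sets on instances created through a subscripted alias, and it is only available after __init__ has returned.

```python
from typing import Any, Generic, Optional, TypeVar, get_args

T = TypeVar("T")


class Caster(Generic[T]):
    """Hypothetical class that recovers its own type argument at runtime."""

    def target_form(self) -> Optional[Any]:
        # __orig_class__ is absent when the class was instantiated
        # without subscripting, e.g. Caster().
        alias = getattr(self, "__orig_class__", None)
        if alias is None:
            return None
        return get_args(alias)[0]
```

So Caster[list[int]]().target_form() gives back list[int], while a plain Caster() has no type argument to recover.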

Edit 2: It is also possible for 718 by itself to be enough: use the inspect part of the standard library to find the parent stack frame and extract the type argument back out. That will mostly work, although it’ll have interesting edge cases/corners. I’d prefer a more obvious supported way, but when evaluating forward refs across namespaces today I already sometimes use a similar trick of finding the parent stack frame and saving its globals/locals for later type evaluation.

Edit 3: A few too many edits. def trycast[form: T](...) is already legal syntax today. Although I’d guess it’d probably be written without : T part as that’s normally for bound. That syntax was added by type parameter syntax pep.

mikeshardmind · May 12, 2024, 11:02pm

I think you’re mistaken on this part. The libraries which actually need this right now have differing handling for Annotated from other types, and some of them only support a subset of types. This subset is expressible in the type system.

For instance, pydantic and msgspec both can take something which looks like Annotated[T, Callable[[T], bool]] and return T (no longer wrapped with other metadata, and no longer to be treated by runtime analysis as having those annotations attached). They each also have (de)serialization for only a subset of builtin types, based on reasonable support in a format (json, msgpack, etc.).

Additionally, this doesn’t address the other issue with TypeForm[Generic[GenericArgs]],

One such example of this is that typevariables can’t be bound to parameterized generics. Changing this can’t be done without opening the door to full on higher kinded types, and this flattened form is more akin to how ParamSpec works around a similar issue with higher ordered types not being fully expressible.

MegaIng · May 12, 2024, 11:33pm

I don’t think so, at least not without an absurd amount of effort. While it might be possible to correctly define the subset that is applicable at the top level, the fact that generic types can potentially be deeply nested and contain multiple type aliases makes the set of types supported hard to describe. For example, a possible restriction might be “only builtin types, but nested however you want” (this is somewhat close to what e.g. a json verifier (not deserializer) might accept). How would you express this?

And if your answer is “well, this case then can’t be properly described”, I would argue that these cases that can’t be properly described because of nesting are the vast majority. In fact, I would be surprised if you can find any real world usecases where the top level restrictions you can place with the syntax you are proposing (if I understand you correctly) do much of anything to reduce runtime errors.

(The point against adding syntax for HKTs I somewhat understand. However, I am personally just in favor of adding HKTs )

mdrissi · May 12, 2024, 11:50pm

This much is mostly expressible. Builtin types is finite list for given python version. Take a union of all the non generic types. Recursive type hints are valid so you can have,

BasicTypes = int | str | ...  # This may be very long
SupportedTypes = BasicTypes | list[BasicTypes] | dict[BasicTypes, BasicTypes] | ...
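The same "builtin types, nested however you want" restriction is easy to state as a runtime check, even where a static description gets unwieldy. The sketch below is illustrative only: is_supported_form and the two tuples of accepted types are assumptions chosen for the example, not any library's actual rules.

```python
from typing import Any, Union, get_args, get_origin

# Illustrative choices; a real library would pick its own sets.
_BASIC = (int, str, float, bool, type(None))
_CONTAINERS = (list, dict, set, tuple)


def is_supported_form(form: Any) -> bool:
    """Accept basic types and arbitrarily nested builtin containers/unions."""
    if form is None or any(form is basic for basic in _BASIC):
        return True
    origin = get_origin(form)
    if origin is Union or any(origin is c for c in _CONTAINERS):
        # The Ellipsis in tuple[int, ...] is not a type, so skip it.
        return all(
            arg is Ellipsis or is_supported_form(arg) for arg in get_args(form)
        )
    return False
```

Unparameterized containers such as a bare list are rejected here, which is one of the many small policy decisions that differ between libraries.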

I’d still agree to exclude it, as I think a good description for many libraries is even messier than that. The library I work with most that would use TypeForm has functionality similar to ABC.register that allows dynamically adding a type as supported. This is very useful for supporting types from other libraries (including even basic types from the standard library, from my perspective), but already goes beyond the type system today (and the general case of types registered dynamically at runtime seems impossible). cattrs is an available open-source library that similarly allows registration of types like this, and I don’t see any reasonable description for cattrs’ supported typeforms. The rules for which special forms are handled, generics, polymorphic types, and much more also sound difficult to describe. There was a while initially where the library I worked on had trouble dealing with child classes and maybe,

class A:
  ...

A here is supported, but

class A:
  ...

class B(A):
   ...

B was not supported which breaks a fairly fundamental typing rule already.

davidfstr · May 12, 2024, 11:55pm

FWIW, the proposed handling of TypeForm in combination with Annotated[...] is already defined to work that way, stripping out the metadata component:

count: int | str = -1
if ismatch(count, Annotated[int, ValueRange(1, float('inf'))]):
    assert_type(count, int)  # NOT: Annotated[int, ...]
else:
    assert_type(count, int | str)
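At runtime, the corresponding step is to peel the metadata off before inspecting the underlying type. A minimal sketch, using only typing.get_origin and typing.get_args; strip_annotated is a hypothetical helper name and it only handles Annotated at the top level of the form.

```python
from typing import Annotated, Any, get_args, get_origin


def strip_annotated(form: Any) -> Any:
    """Return the underlying type of a top-level Annotated[...] form."""
    while get_origin(form) is Annotated:
        # The first argument is the type; the rest is metadata.
        form = get_args(form)[0]
    return form
```

Annotated nested inside another form, such as list[Annotated[int, ...]], is left untouched by this helper and would need a recursive walk.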

Sorry I didn’t understand this bit at all:

TypeForm[Generic[GenericArgs]] isn’t valid syntax because Generic[...] isn’t a type expression (or an annotation expression for that matter).
RE “typevariables can’t be bound to parameterized generics” I’d have to see an example to understand what you mean. And how is this limitation on type variables related to TypeForm?

davidfstr · May 13, 2024, 12:03am

For anyone reading this thread, I’d like to highlight that some members of the runtime type checker community (in particular) have given feedback on draft 2 of this PEP on an older mypy GitHub thread starting after this message.

Although it may be worth reading what is on that other thread, I encourage folks to continue responding on this Discourse thread so that future PEP reviewers won’t have too many discussion venues to review.

mikeshardmind · May 13, 2024, 12:27am

Generic there wasn’t meant to be typing.Generic, just some Generic form which was valid within type form, I could have used a more careful term, but I thought the context would have made it obvious that I was not specifically speaking about typing.Generic, and only about something which had generic parameterization as a type form.

I can work up an example later, but this doesn’t cover all of the same cases which runtime users were currently (mis)using type for, and which those uses were told to wait for type form, partially because of…

The extra level of nesting introduced for these cases to be “valid” changes where the parameterization would need to exist and be supported in cases that were ruled in Is `Annotated` compatible with `type[T]`? as “misuse, and needing to wait on typeform”

I’m also in favor of adding HKTs, What I’ve said shouldn’t be read as not to add HKTs, but that if we don’t have them in full, we can still scope a set of rules here that allows a very narrow use that retains something users had a use case for, and arguably had those cases working in their type checkers already.

mdrissi · May 13, 2024, 12:28am

So I’m inclined to agree that TypeForms probably shouldn’t allow stringified annotations since they’re basically impossible to work with robustly at runtime.

I’m quoting from the issue, but I disagree here, especially for earlier Python versions like 3.8/3.9 but also in any case where from __future__ import annotations is used. It is fairly common in the library I work with for strings/ForwardRefs to be passed. It’s annoying to handle them but very possible in most situations. The core thing is eval (or a function that eventually calls eval) plus tracking the right globals/locals. I currently do this by having a decorator save the parent stack frame’s namespaces for later use as needed. This does not cover 100% of cases, but it mostly works and I’ve added a couple of tricks for harder cases.
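The eval-plus-namespaces idea can be sketched in a few lines. This is an assumption-laden illustration rather than the poster's actual code: resolve_in_caller is a hypothetical name, it evaluates immediately instead of saving namespaces in a decorator, and it relies on inspect.currentframe(), which may return None on interpreters without stack frame support.

```python
import inspect
from typing import Any


def resolve_in_caller(text: str) -> Any:
    """Evaluate a stringified annotation in the caller's namespaces."""
    frame = inspect.currentframe()
    if frame is None or frame.f_back is None:
        raise RuntimeError("stack frames are not available on this interpreter")
    caller = frame.f_back
    try:
        # eval() is only appropriate for trusted annotation strings.
        return eval(text, caller.f_globals, caller.f_locals)
    finally:
        # Drop frame references promptly to avoid reference cycles.
        del frame, caller
```

Because the lookup happens in the immediate caller, names defined in other modules or in frames further up the stack will not resolve, which matches the "mostly works" caveat above.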

Even with recent python versions, for recursive types, part of type generally comes as a forward reference/string.

edit: I also see reading more comments this point goes back/forth.