Idea: Simpler and More Expressive Type Annotations

Something that comes up occasionally is that type annotations are somewhat verbose and limited in functionality. For example, the recent discussion around an inline typed dict syntax is heavily constrained by the fact that raw dicts don’t work in annotations: they e.g. define the | operator differently than what you’d need, and inheritance through {"new_key": int, **OldDict} would invoke the usual iteration protocol on OldDict, which isn’t iterable. Similar issues exist for callable types and for concepts that currently aren’t in Python, like mapped or ternary types.

On the verbosity side, people who are new to typed Python code often expect to spell the tuple[int, str] type simply as (int, str) or lists as [int]. Using list/tuple/etc. constructors like that also leads to shorter and more readable annotations. Of course, this point is less critical since it’s “just” an arbitrary syntax choice, but going off of my experience there are a lot of people who are confused by the current syntax, especially when it comes to more complex types like numpy’s ndarray[tuple[Literal[1], Literal[2], Literal[3]], ...]. Verbosity and readability were also a big point of consideration in the discussions about encoding literal arithmetic for things like array dimensions earlier this year.

In general, the reason why we can’t assign arbitrary meaning to expressions in annotations is that the Python interpreter doesn’t differentiate between expressions occurring e.g. in the middle of a function body and ones in type annotations. That is, when it encounters a: (int, str) = (int, str) it treats both tuples as exactly the same even though we conceptually might think of the left as referring to the type of tuples with certain entries and the right as a particular tuple containing two class objects. In theory, type checkers could just start treating the left object as referring to that type, but this would conflict with runtime introspection of these annotations. If e.g. someone writes a: (int, str) | (str, int) to refer to the union of two tuple types, the annotation cannot be inspected at runtime since the tuple type does not implement the | operator.
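
Concretely, with today’s semantics that union annotation fails as soon as anything tries to evaluate it:

>>> (int, str) | (str, int)
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for |: 'tuple' and 'tuple'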

Jelle Zijlstra also gave a detailed rundown of some of these issues in an article.

My proposal is that instead of creating new syntax that gets around these issues, we set up a mechanism that lets us change the meaning of currently existing syntax when used in annotations. That is, if (int, str) is written in an annotation the generated bytecode won’t create an actual tuple object but the same instance of GenericAlias that writing tuple[int, str] currently does. This mechanism could, for example, then also emit bytecode that creates a TypedDict class when you write {"new_key": int, **OldDict} instead of creating a dict and trying to iterate over OldDict.

Implementation

Currently, defining a function

def my_func(a: int, b: [str]): ...

creates a function object whose __annotate__ looks (for the purposes of this idea) essentially like this:

def __annotate__(format, /):
    return {
        "a": int,
        "b": [str],
    }

I.e. the annotations are just collected into a dict, which means they are evaluated just like any other expression. My idea is to change these functions to the equivalent of this:

def __annotate__(format, annotations_as_types=False, /):
    return {
        "a": int,
        "b": list[str] if annotations_as_types else [str],
    }
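
A runtime-introspection consumer could then opt in explicitly. For illustration (a hypothetical call, since neither the flag nor the new codegen exists today; 1 is annotationlib.Format.VALUE):

def my_func(a: int, b: [str]): ...

# Hypothetical: request the type-flavored evaluation.
hints = my_func.__annotate__(1, True)
assert hints == {"a": int, "b": list[str]}

# Existing callers keep the current behavior and objects:
assert my_func.__annotate__(1) == {"a": int, "b": [str]}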

That is, the bytecode in annotation functions (and the similar methods on TypeAliasType objects) is modified to either create the usual objects the expressions evaluate to or type annotation specific objects instead. For most expressions, the created code would be identical, but for ones that have type-specific meaning, there would be two code paths generated. To keep things somewhat intuitive and easy to reason about, the specific bytecode should always create some typing object that encodes the syntactic construct and contains the objects its subexpressions evaluate to.

The exact list of these specific expressions isn’t the core idea I want to throw into the ring, but an initial list could be something like this:

Annotation Expression          Equivalent Expression
(expr1, expr2, …)              tuple[expr1, expr2, …]
[expr]                         list[expr]
{expr}                         set[expr]
{expr1: expr2}                 dict[expr1, expr2]
int, bool or None literal      Literal[expr]

None of these offer terribly exciting new capabilities, but they are reasonably straightforward to implement (except for tuples, which would require a small modification to the data stored in the AST) and should give enough of an idea whether this change is something the language even wants to begin with. If that is the case, people could then work on getting genuinely new functionality like typed dict literals into the type system using the new syntactic capabilities.

Potential Issues

Even though this repurposes existing syntax to create completely different objects, it is fully backwards compatible. Since the new codegen is only used if the optional argument to the annotation functions is set, any existing code will behave exactly the same. Further, even if libraries that do runtime introspection start setting the flag, any existing type annotations are evaluated exactly the same, since the only expressions whose meaning changes are ones that currently aren’t valid type annotations.

The only case where things break is when someone uses annotations for something other than types and these are then interpreted by other code that expects types. But such code is already broken, since the currently generated objects aren’t valid types. Going purely off of my personal experience, it’s also exceedingly uncommon to intentionally use annotations for things that aren’t types, and even rarer to do so while also using a library that introspects type annotations. But even in that case, it’s fairly easy to e.g. use a decorator that forces the annotation function’s flag argument to be false.
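
A minimal sketch of such a decorator, assuming the __annotate__ signature proposed above (the name untyped_annotations is made up):

import functools

def untyped_annotations(func):
    """Force func's annotations to always be evaluated as plain values."""
    original = func.__annotate__

    @functools.wraps(original)
    def annotate(format, annotations_as_types=False, /):
        # Ignore the caller's flag; always take the plain-value code path.
        return original(format, False)

    func.__annotate__ = annotate
    return func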

There is also the hiccup that we sometimes want to create type objects in places other than annotations, for example when using cast(SomeType, value). Of course, the compiler cannot differentiate this from any other function call, so it must generate the normal bytecode for it. One possible workaround is to first create a type alias (which can use the type-specific syntax) and then use that in the function call. We could also add a new function to the standard library that evaluates a string as though it were an annotation, returning a TypeAliasType-like object.
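
For illustration, the two workarounds might look like this (the tuple syntax in the alias body and the type_expr() helper are both hypothetical):

from typing import cast

some_value = (1, "two")

type Pair = (int, str)  # alias bodies are evaluated lazily via __evaluate__,
                        # so they could opt into the type-specific codegen
first = cast(Pair, some_value)

# Hypothetical stdlib helper that evaluates a string as an annotation:
second = cast(type_expr("(int, str)"), some_value)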

Closing Thoughts

I might be barking entirely up the wrong tree here, but the syntax limitations in Python’s type system, stemming from how it is handled at runtime, have been on my mind for a while now. I’ve seen a handful of ideas that got stuck not on their own merits but rather on technicalities of Python expression syntax and semantics, which seems unfortunate. And so far, most of those discussions had the tone that that’s just an unfortunate reality of how Python’s type system evolved. So my goal now is to show that we could reasonably change this and to start the discussion over whether that is actually something the community wants.

12 Likes

Nice writeup!

One issue is that sometimes type expressions occur in places that aren’t special to the interpreter (e.g., the arguments to cast() and assert_type(), type arguments to base classes). We’ve successfully replaced many of those, but there are still a few left.

My broader issue is that this approach requires us to hardcode support for specific patterns, so flexibility is still limited and a lot of complexity is pushed into the compiler.

As an alternative, I’d like to suggest something that exposes the AST plus some way to resolve names in the appropriate scope. One way this might look is a new Format.AST, in which __annotate__ returns a pair of the annotations’ AST and some special object that you can feed names to resolve them in the context of the original function. That way, runtime type checkers can use the raw AST to evaluate types in whatever way they prefer.
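
A rough sketch of how that could look (Format.AST and the resolver object are both hypothetical):

import annotationlib

def my_func(a: [str]): ...

# Hypothetical: return the raw annotation ASTs plus a scope-aware resolver.
trees, resolver = my_func.__annotate__(annotationlib.Format.AST)
# trees["a"] would be an ast.List node; resolver("str") would look the
# name up in the scope where my_func was defined.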

4 Likes

This could potentially be solved with subscriptable functions.
For example:

cast(Foo, obj)
assert_type(obj, Foo)

The above could be rewritten as:

cast[Foo](obj)
assert_type[Foo](obj)

That doesn’t make it special for the compiler, so this doesn’t solve the problem at all.

Regarding cast and similar functions: I think the best solution would be to add a function that evaluates a string using the same rules that we’d use for type annotations. So then you’d write cast(type_expr("SomeAnnotation"), value). Unfortunately, this would lead to yet another function that type checkers have to special case.

I’ve thought about exposing the AST too and it also has some other advantages. It would allow us to apply the transform depending on what symbols actually evaluate to, e.g. turning SomeEnum.Value into Literal[SomeEnum.Value] while leaving non-enum values alone. Also, it would actually guarantee that Annotated data is left as-is without relying on further magic.

But does that actually give us more flexibility? Presumably there would be a utility function in the types or annotation modules that takes the AST and symtable info and gives you the actual types. That function would have the same constraints on backwards compatibility and changes that the compiler itself has, right? We could work around that by putting the function in typing_extensions, but I think that there definitely should be some official way of producing the actual type objects. Otherwise there’d be a huge burden on any consumer of type annotations and we’d risk further fragmentation of the type system.
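
For concreteness, such a utility might look roughly like this (all names and the handled node types are purely illustrative):

import ast
from typing import Literal

def evaluate_type_ast(node, resolve):
    """Turn an annotation AST node into a typing object.
    `resolve` maps a name to its value in the original scope."""
    match node:
        case ast.Name(id=name):
            return resolve(name)
        case ast.Tuple(elts=elts):
            return tuple[*(evaluate_type_ast(e, resolve) for e in elts)]
        case ast.List(elts=[elt]):
            return list[evaluate_type_ast(elt, resolve)]
        case ast.Constant(value=value):
            return Literal[value]
        case _:
            raise TypeError(f"unsupported annotation: {ast.dump(node)}")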

1 Like

I see this more as a problem than a solution, as it implies having a syntax that is both context-dependent and redundant (given that list[str] and the old List[str] already exist):

list[int]  # always a types.GenericAlias
list()     # always a list
[int]      # a list with one item, but would be types.GenericAlias or list depending on context
list       # a type / GenericAlias?
List       # typing._SpecialGenericAlias (old syntax, also redundant now with list)

If it solved a bunch of problems, maybe I would like it, but is that the case? It barely saves 4-5 characters.

I do agree though that the usability of both ndarray types and Literals is a horrible pain, but I’m not sure how this would solve it. One of the big problems with ndarrays is shapes, which doesn’t look to be solved by this, and the fact that typically you don’t have literals but a wide variety of numpy-specific floating-point types that relate more or less to built-ins.

Also, parts of these pains are already addressed by aliases, so they are not so painful:

type OneTwoThree = tuple[Literal[1], Literal[2], Literal[3]]
type Ndarray = ndarray[OneTwoThree, ...]

We’d want a standard library implementation, but I think it’s still preferable if it’s done in Python code rather than baked into the interpreter. That way, new features could be backported in typing-extensions, and third-party libraries could also add their own extensions.

2 Likes

Hm, right. I immediately thought of cattrs.structure([], list[int]).

I agree that saving a couple of characters on a tuple or list type isn’t super important. This proposal is more about making future additions to the type system possible in the first place. For example, using the current syntax there isn’t really any good way to make inline typed dicts work with things like inheritance. Being able to just write {"some_key": str, **ExistingDict} would genuinely add new capabilities rather than just saving some keystrokes. Similarly, mapped types like {Key: NotRequired[Value] for Key, Value in ExistingDict} or conditional types like type FancyList[T] = list[int if issubclass(T, int) else str] cannot really be written any other way.

Also, while I totally see that it’s weird for the same syntax to produce different objects depending on context, I don’t think that’s as big of an issue as it first seems. When we see list[int] we generally never think about it as an instance of GenericAlias, we just think about the type it expresses. Most users also never actually look at the runtime objects their annotations generate; usually only libraries do that. I think that as long as the type-specific meaning of each piece of existing syntax stays close enough to the expected meaning, very few people will actually be surprised that it can generate different runtime objects.

4 Likes

As a huge fan of typing who finds it no longer reasonable to write/read Python code without type hints even for small projects, I 100% support this idea.

I personally believe that at this point in Python typing’s evolution it should have become fairly obvious that, from a design and UX standpoint, making type hints a DSL-like mini-language that is parsed differently by the interpreter is an absolute necessity. Not doing this from the start was, and still continues to be, a mistake driven IMO by people who are antagonistic towards typing in general. Continuing on the current path will necessitate more and more band-aids to prop up the current technically-not-a-DSL-but-more-alien-than-a-DSL-would-be approach and to add further features to typing.

Sticking to pure Python syntax severely limits what can be done with typing. Having type hints be parsed as normal Python objects benefits the tiny number of people doing introspection on type hints[1] while hurting the vast majority of people who write and read them. It is ironic that keeping standard syntax forced us to work with this parallel not-quite-DSL where, on the most basic level, everything’s alien to Python: tuple[a, b] instead of (a, b), dict[a, b] instead of {a: b} and so on, while an actual DSL with different semantics for many syntax elements would be more natural to Python UX-wise.

We went from e.g. List[Tuple[Union[str, int], Optional[int]]] to list[tuple[str | int, int | None]] and it’s still too unwieldy to use without moving it into a type alias [2]. Being able to write [(str | int, int | None)] should in my opinion be the non-negotiable end goal, and minor technical obstacles on that path should not be seen as total dealbreakers but as places where temporary imperfect solutions are tolerated, thus allowing for incremental progress. For example, in cases where one has to use type hints in a Python context, like in cast(), the old (current) syntax can continue to work until a more natural solution is found.

The main problem with this is IMO social: this is a long-term high-level design decision, and the way decision-making works here, one needs to have a fully-formed (basically perfect) implementation and substantial buy-in from the SC/TC to have any chance of getting through the discussion phase, and even then the chances for such a high-level change are slim. And for typing, such changes carry the added premium of opposition from a number of people who want to keep typing isolated from the rest of the language and who would be loath to increase the complexity of the interpreter for stuff they don’t use.


  1. and as the OP suggests this benefit can be preserved by going the DSL way ↩︎

  2. while a good practice, one shouldn’t be forced to do this every time by the unwieldiness of the current syntax ↩︎

5 Likes

I have a bit more of a nuanced view on this. While I agree that the expressiveness of Python’s type expressions is currently severely hampered by the lack of a proper DSL, I disagree that not being able to write [...] instead of list[...] is such a big issue. While these hints are very common, you still have to write a lot of hints that don’t use them, especially in function parameters, if you want to be a good Python citizen and support duck typing.

I fear that making these builtins even more alluring compared to duck-typed hints will cause people to write hostile APIs that only accept very narrow types, because it was easier to write that way. I’m sure that’s already an issue to a degree, but it would be further exacerbated by making the hostile type hint significantly shorter than the one you should be writing instead.

I would definitely like to see a DSL for more advanced typing features, to make them less cumbersome to write and easier to read and understand. Callable and signature types are very high on that list. I also think it’s holding back progress on features that have been requested many times like mapped types.

11 Likes

If there were a DSL, [int] could be the same as Sequence[int]. That would avoid an import, and in cases where you needed a list, you could still write list[int]. That could be confusing, though, and you’d still need to import and use Iterable if you wanted even more flexible typing. Simply having the most commonly used ABCs available as builtins could have more benefits in this regard and would certainly be easier to implement.

1 Like

I do concede that this is a much more far-reaching and consequential benefit of this idea than terseness and more natural and consistent syntax for builtins, so I agree it’s better to focus on that. I still think the alien nature of the status quo for builtins is a major symptom of the limitations that block those improvements, and while it would (should) go away with a new DSL it shouldn’t be the priority but a welcome byproduct.

(I've moved the rest of my post here as it's a minor point that I wanted to make and I don't want to derail this promising conversation any further)

My issue with such paternalistic arguments (“we can’t do X because people can’t be trusted not to abuse it”) is that they are by nature speculative and unprovable. Who are we talking about?

People who already use duck-typed hints and will stop doing so because one can now write [int] instead of list[int] - they do not exist. They already value Postel’s law over terseness.

People who don’t use them either don’t care about/don’t need their benefits or don’t know about them. The former would be unaffected by the change. The latter will mostly switch once they learn why and where it’s needed because e.g. Iterable[int] has practical advantages over both list[int] and [int].

So the paternalistic concern narrows down to… people who could potentially start using duck-typed hints but would be seduced by this change into sticking with restrictive types? IOW, those who can forgo 15% terseness for duck-typing benefits, but for whom 30% terseness is just too irresistible. I believe this is a tiny group that still falls into the category of ‘not caring about duck typing’, and their strange priorities shouldn’t block progress.

1 Like

Am I understanding correctly that A[(B, C)] would be distinct from A[B, C] in the DSL even if in ordinary Python they are equivalent?
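
In ordinary Python they are indeed the same today:

>>> tuple[(int, str)] == tuple[int, str]
True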

I was also thinking that, but on the other hand it’s not great if you have to explain to newcomers: “to annotate a tuple use (str, int), except in cast() where you have to use tuple[str, int]”.

I think I may have muddied the waters a bit in my outline. There are roughly three parts to this general idea:

  • “making types a DSL”, that is, decoupling the syntax we use for types from the usual runtime value restrictions.
  • simplifying builtin types like tuples, lists, etc.
  • adding more complex type expressions for e.g. inline typed dicts, mapped types, conditional types, etc.

What my proposal is primarily about is the first point and it’s mainly motivated by the more complex type annotations it would enable. While I personally do like the simpler builtin notation, I fully agree that it alone doesn’t justify this change. Rather, I see it as a nice benefit that we get for (almost) free if we decide to implement a system like this for the other reasons.

To give everyone a bit more of a concrete idea of what I’m talking about, here are some off-the-cuff examples that this system would enable (with the caveat that none of this is concrete syntax or semantics that I am directly proposing, just rough example ideas):

We could use the familiar dict syntax to specify that a value should look mostly like an existing dict with one additional key:

def some_func(val: {"some_key": int, **OtherDict}) -> None: ...

Or we could use a comprehension to copy a typed dict with some modifications:

type UserEdit = {Key: NotRequired[Val] for Key, Val in User}

With conditional types we could give the correct types to e.g. a function that unwraps arbitrary levels of nested lists:

type Flattened[T] = Flattened[T] if issubclass(T, Iterable) else T
def flatten_fully[T: Iterable](nested_iter: T) -> Iterable[Flattened[T]]:
    for elem in nested_iter:
        if isinstance(elem, Iterable):
            yield from flatten_fully(elem)
        else:
            yield elem

We could add type-level arithmetic and track things like matrix shapes during computations:

def matmul[M: Number, N: Number, K: Number, E: ElemType](
    first: ndarray[tuple[M, N], E],
    second: ndarray[tuple[N, K], E],
) -> ndarray[tuple[M, K], E]: ...

def concat[A: Number, B: Number, C: Number, E: ElemType](
    first: ndarray[tuple[A, B], E],
    second: ndarray[tuple[C, B], E],
) -> ndarray[tuple[A + C, B], E]: ...

5 Likes

@Jelle suggested a similar approach to expose the AST in Inlined typed dicts and typed dict comprehensions - #13 by Jelle, in this case by having <...> as an explicit marker that would return an object allowing AST introspection.

Here are some thoughts about this approach of exposing the AST:

In both cases, new syntax for the new DSL can’t be used because the AST is being exposed (and new syntax would require… new AST nodes).

Pros/cons of using the explicit <...> markers:

  • + Can be used everywhere, not only in type annotations (e.g. cast(<...>, var)).
  • - Renders quite nicely (imo) for inline dictionaries (and maybe tuples): <{k: int}>, <(int, str)>, as my proposal was only scoped to typed dicts; but it doesn’t fit well for general-purpose use [1].
  • Requires a syntax change.

Pros/cons of having the DSL “implicit”:

  • + No syntax change required.
  • - Cannot be used outside type annotations. There will be some confusion for users discovering that {K: int for K in ...} can be used in, say, a function parameter annotation, but not as an argument to Pydantic’s TypeAdapter or cattrs’s structure(). Apart from third-party runtime checkers, though, this would be more or less a non-issue. Let’s check the valid type expression locations:
    • cast() and assert_type() are fine; they are meant for static type checkers and the typ argument is discarded at runtime.
    • PEP 695 type var bounds/constraints/defaults and type aliases should be fine as well, as they all have an __evaluate__() implementation.
    • The type arguments of a generic class, TypedDict’s extra_items and the base type in the definition of a NewType are introspectable at runtime, and as such couldn’t make use of the new DSL, unless the annotation is stringified, a type_expr(typ: str) helper is used, or (maybe?) the <...> marker.

I personally support this AST approach and having it implicit. I also support @Daverball’s opinion that the new DSL would be for more advanced features. Imo the limitations of the last bullet points would be pretty uncommon, especially if the DSL is only for advanced typing features.


This new AST annotation format may be disruptive though. Until now, get_type_hints()/get_annotations(format=Format.VALUE) has been the recommended way to get annotations at runtime. Apart from the possible NameError, any runtime exception could happen with Format.VALUE/FORWARDREF [2]. It remains to be specified whether Format.AST only returns the AST representation (with the stdlib providing utilities to convert it to proper typing objects [3]), or takes care of converting to the actual typing objects directly.


  1. e.g. for a literal type using the | operator, it feels weird having to use <'string_lit' | 1 | True> ↩︎

  2. e.g. {'key': int} | int ↩︎

  3. e.g. from {'key':int} | int to SomeEquivalentTDClass | int ↩︎

4 Likes

That suggests a third approach, a mixed one where the DSL is implicit in type hints but explicit with markers outside of them. This would keep the benefit of not requiring noisy markers in the most common use case.

OTOH this has the same con as the implicit approach: is it even possible for a DSL to be backwards compatible without markers? Can the DSL theoretically be written in such a way as to support existing type hints (and without it becoming a burden from the start on the greenfield system)?

1 Like

I’m not sure what you mean by that. Do you mean that the new DSL (left unspecified for now), used in conjunction with the existing syntax, could lead to syntax ambiguities? If so, I think that would be prevented by the fact that there’s currently a grammar for annotation and type expressions, so presumably any possible ambiguity would be caught when updating the grammar to account for new syntax added to the DSL.

1 Like

IMO this is another good motivating example for macros.

If a macro version of this existed, a few things could be done:

  • This could be prototyped first without the need for core language changes, decoupling it from the slow release cycle of Python.
  • Type checkers might just need to be able to run macros - no need to actually understand the new syntax at the beginning, since the translation to familiar terms would be handled somewhere else.
  • This would be useful long-term for locations outside of type annotations.
  • Fewer assumptions need to be baked into the language. Instead, Python could expose an “annotation-transformation hook” that points to a macro that is applied. (This would have a performance cost, so it might not be worth it.)

3 Likes