Something that comes up occasionally is that type annotations are somewhat verbose and limited in functionality. For example, the recent discussion around an inline typed dict syntax is heavily constrained by the fact that raw dicts don’t work in annotations: they e.g. define the | operator differently from what you’d need, and inheritance through {"new_key": int, **OldDict} would invoke the usual mapping-unpacking protocol on OldDict, which isn’t iterable. Similar issues exist for callable types and for concepts that currently aren’t in Python, like mapped or ternary types.
On the verbosity side, people who are new to typed Python code often expect to spell the tuple[int, str] type simply as (int, str) and lists as [int]. Using list/tuple/etc. constructors like that also leads to shorter and more readable annotations. Of course, this point is less critical since it’s “just” an arbitrary syntax choice, but in my experience a lot of people are confused by the current syntax, especially when it comes to more complex types like numpy’s ndarray[tuple[Literal[1], Literal[2], Literal[3]], ...]. Verbosity and readability were also a big point of consideration in the discussions about encoding literal arithmetic for things like array dimensions earlier this year.
In general, the reason why we can’t assign arbitrary meaning to expressions in annotations is that the Python interpreter doesn’t differentiate between expressions occurring e.g. in the middle of a function body and ones in type annotations. That is, when it encounters a: (int, str) = (int, str) it treats both tuples as exactly the same even though we conceptually might think of the left as referring to the type of tuples with certain entries and the right as a particular tuple containing two class objects. In theory, type checkers could just start treating the left object as referring to that type, but this would conflict with runtime introspection of these annotations. If e.g. someone writes a: (int, str) | (str, int) to refer to the union of two tuple types, the annotation cannot be inspected at runtime since the tuple type does not implement the | operator.
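To make that failure concrete, here is a small, runnable demonstration (the annotation is quoted so the def statement itself doesn’t already raise):

```python
import typing

# The union of two hypothetical tuple types, quoted so it isn't
# evaluated at definition time:
def f(x: "(int, str) | (str, int)") -> None: ...

# Introspection evaluates the string and fails, because tuples
# don't implement the | operator:
typing.get_type_hints(f)
# TypeError: unsupported operand type(s) for |: 'tuple' and 'tuple'
```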
Jelle Zijlstra also gave a detailed rundown of some of these issues in an article.
My proposal is that instead of creating new syntax that gets around these issues, we set up a mechanism that lets us change the meaning of existing syntax when it is used in annotations. That is, if (int, str) is written in an annotation, the generated bytecode won’t create an actual tuple object but the same GenericAlias instance that writing tuple[int, str] currently creates. This mechanism could, for example, also emit bytecode that creates a TypedDict class when you write {"new_key": int, **OldDict}, instead of creating a dict and trying to iterate over OldDict.
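To illustrate the TypedDict case with today’s API: the annotation {"new_key": int, **OldDict} would conceptually compile to the same class you’d currently write by hand (the class names here are just for illustration):

```python
from typing import TypedDict

class OldDict(TypedDict):
    old_key: str

# What {"new_key": int, **OldDict} would create under the proposal,
# spelled with today's class-based syntax:
class NewDict(OldDict):
    new_key: int
```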
Implementation
Currently, defining a function
```python
def my_func(a: int, b: [str]): ...
```
creates a function object whose __annotate__ looks (for the purposes of this idea) essentially like this:
```python
def __annotate__(format, /):
    return {
        "a": int,
        "b": [str],
    }
```
I.e., the annotations are just collected into a dict, which means they get evaluated just like any other expression. My idea is to change these functions to the equivalent of this:
```python
def __annotate__(format, annotations_as_types=False, /):
    return {
        "a": int,
        "b": list[str] if annotations_as_types else [str],
    }
```
That is, the bytecode in annotation functions (and in the similar methods on TypeAliasType objects) is modified to create either the usual objects the expressions evaluate to or annotation-specific objects instead. For most expressions, the generated code would be identical, but for those with type-specific meaning, two code paths would be generated. To keep things somewhat intuitive and easy to reason about, the type-specific bytecode should always create some typing object that encodes the syntactic construct and contains the objects its subexpressions evaluate to.
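Callers doing runtime introspection would then opt in via the new flag. Hypothetically, using the sketch above (1 is the VALUE format from PEP 649):

```python
my_func.__annotate__(1)        # format=VALUE: {"a": int, "b": [str]}
my_func.__annotate__(1, True)  # opt in:       {"a": int, "b": list[str]}
```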
The exact list of these specific expressions isn’t the core idea I want to throw into the ring, but an initial list could be something like this:
| Annotation Expression | Equivalent Expression |
|---|---|
| (expr1, expr2, …) | tuple[expr1, expr2, …] |
| [expr] | list[expr] |
| {expr} | set[expr] |
| {expr1: expr2} | dict[expr1, expr2] |
| int, bool or None literal | Literal[expr] |
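To make the mapping concrete, here is a rough runtime emulation of the table, written as a post-hoc translation of the objects today’s codegen produces. The actual proposal would do this in the compiler instead, and a real implementation would need more care around edge cases; this is just to pin down the intended meaning:

```python
from typing import Literal

def as_type(obj):
    """Translate a naively evaluated annotation per the table above."""
    if obj is None or isinstance(obj, (bool, int)):  # int/bool/None literal
        return Literal[obj]
    if isinstance(obj, tuple):                       # (expr1, expr2, ...)
        return tuple[*map(as_type, obj)]
    if isinstance(obj, list) and len(obj) == 1:      # [expr]
        return list[as_type(obj[0])]
    if isinstance(obj, set) and len(obj) == 1:       # {expr}
        return set[as_type(next(iter(obj)))]
    if isinstance(obj, dict) and len(obj) == 1:      # {expr1: expr2}
        key, value = next(iter(obj.items()))
        return dict[as_type(key), as_type(value)]
    return obj                                       # everything else unchanged

assert as_type(((int, str), [bool])) == tuple[tuple[int, str], list[bool]]
assert as_type({str: [1]}) == dict[str, list[Literal[1]]]
```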
None of these offer terribly exciting new capabilities, but they are reasonably straightforward to implement (except for tuples, which would require a small modification to the data stored in the AST) and should give enough of an idea of whether this change is something the language even wants to begin with. If it is, people could then work on getting genuinely new functionality, like typed dict literals, into the type system using the new syntactic capabilities.
Potential Issues
Even though this repurposes existing syntax to create completely different objects, it is fully backwards compatible. Since the new codegen is only used if the optional argument to the annotation functions is set, any existing code will behave exactly the same. Further, even if libraries that do runtime introspection start setting the flag, any existing type annotations are evaluated exactly the same, since the only expressions whose meaning is changed are ones that currently aren’t valid type annotations.
The only case where things break is when someone uses annotations for something other than types and these are then interpreted by other code that expects types. But such code is already broken, since the currently generated objects aren’t valid types. Going purely off of my personal experience, it’s also exceedingly uncommon to intentionally use annotations for things that aren’t types, and even rarer to do so while also using a library that introspects type annotations. But even in that case, it’s fairly easy to e.g. use a decorator that forces the annotation function’s flag argument to be false, as sketched below.
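For illustration, such an opt-out could look roughly like this (hypothetical: values_only is a made-up name, and the two-argument __annotate__ signature is the one proposed above; PEP 649 makes __annotate__ a writable function attribute):

```python
def values_only(func):
    original = func.__annotate__

    def __annotate__(format, annotations_as_types=False, /):
        # Ignore the caller's flag and always evaluate the annotations
        # with their ordinary runtime meaning.
        return original(format, False)

    func.__annotate__ = __annotate__
    return func

@values_only
def my_func(a: int, b: [str]): ...  # b's annotation stays a plain list
```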
There also is the hiccup that we sometimes want to create type objects in places other than annotations, for example when using cast(SomeType, value). Of course, the compiler cannot differentiate this from any other function call, so it must generate the normal bytecode for it. One possible workaround is to first create a type alias (which can use the type-specific syntax) and then use that in the function call, as sketched below. We could also add a new function to the standard library that evaluates a string as though it were an annotation, returning a TypeAliasType-like object.
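The alias workaround would look something like this under the proposed semantics (hypothetical, not valid today):

```python
from typing import cast

# Hypothetical under the proposal: the alias body gets the type-specific
# codegen, so (int, str) here means tuple[int, str], not a 2-tuple of classes.
type Pair = (int, str)

def parse(raw: object) -> Pair:
    # cast() is an ordinary function call, so the annotation-specific
    # syntax isn't available inline; the alias bridges the gap.
    return cast(Pair, raw)
```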
Closing Thoughts
I might be barking entirely up the wrong tree here, but the syntax limitations that Python’s type system inherits from being handled at runtime have been on my mind for a while now. I’ve seen a handful of ideas that got stuck not on their own merits but on technicalities of Python’s expression syntax and semantics, which seems unfortunate. And so far, most of those discussions had the tone that this is just an unfortunate reality of how Python’s type system evolved. So my goal now is to show that we could reasonably change this, and to start the discussion over whether that is actually something the community wants.