Calling get_annotations(obj, format=Format.AST) gives a dict[str, ast.expr], but your create_annotate_from_asts function requires each expression to also have an associated Mapping, and it’s not clear where this is supposed to come from or what exactly it should be.
Presumably it’s related to the context in which names in the AST expression should be evaluated?
I’m also not sure how this is intended to be used if I need to add objects to the new annotations that were not obtained through get_annotations: things like the None return value for dataclasses’ __init__, or a dynamically created TypedDict for an as_dict method[1].
I’m actually also curious how memory usage would compare to capturing the original source text of the annotations. I assume that this wasn’t done originally for memory-use reasons, but it does have the unfortunate side effect that call_annotate_function(cls.__annotate__, format=Format.STRING) is significantly[2] slower than call_annotate_function(cls.__annotate__, format=Format.VALUE).
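A micro-benchmark along these lines illustrates the gap (a sketch, assuming Python 3.14+ for annotationlib; the example class and iteration count are made up, and actual numbers will vary):

```python
# Sketch of a micro-benchmark for the STRING vs VALUE gap described above.
# Requires Python 3.14+ (annotationlib); guarded so it is a no-op elsewhere.
import sys
import timeit

if sys.version_info >= (3, 14):
    from annotationlib import Format, call_annotate_function

    class C:
        x: list[int]
        y: dict[str, int]

    annotate = C.__annotate__
    for fmt in (Format.VALUE, Format.STRING):
        # STRING has to reconstruct source text instead of running the
        # precompiled bytecode, which is where the slowdown comes from.
        t = timeit.timeit(
            lambda: call_annotate_function(annotate, format=fmt),
            number=10_000,
        )
        print(f"{fmt.name}: {t:.4f}s")
```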
Speaking of STRING, I don’t like that this implementation means that the AST format can construct a more accurate STRING than the STRING format itself.
There appears to be a bug in the reference implementation with how the AST is constructed: op in a BinOp should be an instance of the operation, not the class itself. But with a little editing…
from annotationlib import get_annotations, Format
import ast
class Example:
    a: 1 | 2 | 3
a_anno = get_annotations(Example, format=Format.AST)['a']
# ast.unparse will fail as this is currently the class and not an instance
print(f"{a_anno.op = }")
# Editing in post
a_anno.op = ast.BitOr()
a_anno.left.op = ast.BitOr()
print(ast.unparse(a_anno))
print(get_annotations(Example, format=Format.STRING)['a'])
Output:
a_anno.op = <class 'ast.BitOr'>
1 | 2 | 3
3
In the current implementation, dynamically created TypedDict instances themselves can’t support this format. ↩︎
About 50x slower in a direct test on call_annotate_function, noticeable as about a 40% performance hit on constructing dataclass-like classes when using STRING instead of VALUE. ↩︎
It was my initial idea, yes: <...> would create an inline type expression, and this PEP only introduces inline TypedDicts for now (but to avoid “reserving” the <...> syntax only for TypedDicts, I proposed having potential extensions, e.g. <(...)> for tuples). Then @Jelle proposed in the thread that <...> would be an explicit way of exposing the AST.
I just read the draft PEP, thanks for the work put into it. Here is some feedback:
Although this is not the main goal of the PEP, I feel like (as mentioned a couple times in this thread) it would make more sense to have the more complex syntax (inline TDs, conditional type expressions) as motivating examples. Some of the current ones in the PEP can also lead to confusion (e.g. dict[int, str] → {str: int} will be too similar to the potential inline TD syntax – {'key': int}; getting rid of Literal for string keys clashes with string annotations).
I’m not too sure about having a separate Format.AST value. Up until now, users only had to call get_type_hints() without having to worry about the syntax being used. With the proposed changes, what would be the use case for calling get_type_hints() with Format.VALUE, given that it could potentially raise any kind of runtime error (e.g. if we assume bare literals are now allowed, 1 | 'some_string' would raise if Format.VALUE gets used)? I feel like the default get_type_hints() (and maybe get_annotations()) behavior should be to return the “constructed” annotations directly (e.g. 1 | 'some_string' gets converted to Literal[1, 'some_string']).
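For a concrete sense of why VALUE evaluation breaks under bare-literal syntax: evaluated as an ordinary Python expression, such a union already raises today (this is a plain-Python illustration, not the PEP’s machinery):

```python
# A hypothetical bare-literal annotation evaluated as a runtime VALUE:
# int has no __or__ implementation for str operands, so this raises.
try:
    1 | 'some_string'
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```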
With the previous point in mind, to be able to properly evaluate type annotations, get_type_hints() would unconditionally[1] need to convert the annotation to AST types, and then build the proper typing constructs from it. Memory usage was mentioned in some other feedback, but I’m also curious to know how performance would be affected.
Even for existing syntax (e.g. list[int]) that wouldn’t require the intermediary AST step. ↩︎
I’m still working on getting all the details right and rewriting the PEP, I’ll report back when I have more details ready. But I can share my current thoughts on the things you mentioned:
In my mind the more advanced future features also are the primary motivation of this change. I just avoided them since I didn’t want to throw in the cognitive load of entirely new features (and their motivation/use cases) with the syntax stuff of this proposal. But from all the feedback so far, that does seem to be the correct approach and I can totally get why. I’ll rewrite large parts of the motivation of the PEP to use better examples.
However, I disagree on the second point. We can’t (from how I understand the backwards compatibility guarantees) really change the behaviour of __annotate__, typing.get_type_hints or annotationlib.get_annotations. While 1 | 2 (or some fancy typed dict literal) isn’t currently legal from the typing spec’s POV, it is an entirely valid annotation in the Python runtime. So even if most people use annotations for type hints, we can’t just assume that every annotation is a legal type annotation and change the behaviour of all annotations.
That’s why I went to the workaround of the AST format, it lets users opt-in to typing specific syntax and then we can “break” non-typing annotations. I also really dislike my initial proposal of adding a weird keyword argument to typing.get_type_hints, but we can’t just change its behaviour either. We also cannot change annotationlib.get_annotations, both because of backwards compatibility and because we want to be able to backport annotations via typing_extensions. I would like to also really avoid a situation where we have three (or even more) different “please give me the stuff after the colons” functions.
My current thinking is that the best solution is to add a new typing.get_type_annotations (or whatever other new name, everyone please feel free to bikeshed here) helper that explicitly uses the AST format to create typing-specific annotations and to deprecate typing.get_type_hints. The intermediate period will be annoying since we have two very similar functions in the same module and one of them will be essentially useless. But we do have the advantage that most usages of these helpers occur in libraries rather than user code directly. The people that tend to write those libraries also tend to be more plugged in to new features and changes, so we have a greater chance of them seeing which function is the “correct” one.
Once the deprecation period is over, we’ll have a nice split of responsibilities: if you want to just get whatever arbitrary annotations exist in an object, you use annotationlib.get_annotations, but if you want type annotations specifically, you use typing.get_type_annotations (or the typing_extensions backport). This difference in use cases and behaviour will also be nicely reflected in their names and module placements.
That is exactly the behaviour I’m proposing: calling get_type_annotations(func, Format.VALUE) never actually calls func.__annotate__(Format.VALUE), but instead calls func.__annotate__(Format.AST) and resolves the result using typing-specific semantics. The updated PEP will explain this much better than it currently does.
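A minimal sketch of that resolution step, with a hypothetical helper name and plain compile/eval standing in for the real typing-specific semantics:

```python
import ast

def resolve_type_annotation(node: ast.expr, globalns: dict):
    # Hypothetical helper: a real implementation would walk the AST and
    # special-case new typing syntax (bare literals, inline TypedDicts, ...);
    # here we just compile and evaluate the expression as-is.
    code = compile(ast.Expression(body=node), "<annotation>", "eval")
    return eval(code, globalns)

# Stand-in for what func.__annotate__(Format.AST) would hand back:
node = ast.parse("list[int]", mode="eval").body
print(resolve_type_annotation(node, {"list": list, "int": int}))  # list[int]
```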
I haven’t done benchmarking yet, but I do expect the performance to be worse than with the current Format.VALUE. Unfortunately, that’s more or less guaranteed since you can’t really beat having everything precompiled into bytecode and evaluated by the interpreter itself. If performance ends up being an issue, we could also think about not using ast.AST objects directly in order to skip their construction and/or write the evaluating code in C, but those have their own issues too.
On the topic of memory usage, I suspect that the impact will be reasonably small. The AST is stored as tuples which almost exclusively contain statically allocated strings and integers, so it really shouldn’t be more than a handful of pointers per annotation. There’s also a lot of room to optimize this further; my current implementation just dumps the AST nodes, but many of their fields are redundant and/or represented inefficiently.
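To make the “tuples of strings and small ints” idea concrete, here is a toy flattening of an annotation AST into nested tuples (my guess at the shape being described, not the actual implementation):

```python
import ast

def to_tuples(node):
    # Recursively flatten an AST into nested tuples of
    # (node type name, *field values) -- mostly interned strings and
    # small ints, so the per-annotation overhead stays modest.
    if isinstance(node, ast.AST):
        return (type(node).__name__,
                *(to_tuples(getattr(node, f)) for f in node._fields))
    if isinstance(node, list):
        return tuple(to_tuples(item) for item in node)
    return node

tree = ast.parse("dict[str, int]", mode="eval").body
print(to_tuples(tree))
```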
It might make sense to have a separate third-party library that evaluates annotations in C (or Rust?) for speed, so we can keep the standard library implementation relatively simple but users like Pydantic that care about speed can use a faster version. Such a library could also allow backporting new syntax to earlier versions of CPython.
I wouldn’t be so sure. Memory usage was a major motivation for PEP 563 and 649: people were noticing significant memory usage from typing objects, and those weren’t huge either.