100% agree. I always have to alias Literal as sz for numpy.
I've been thinking more about this approach and working on a dummy implementation to see how things shake out. Overall I think the AST-based approach is really useful, but there are some annoying wrinkles to work out. Here's the overall idea I have in mind now:
There's a new value, AST, added to the Format enum, and the auto-generated __annotate__ functions now look like this:
```python
def __annotate__(format):
    if format == Format.VALUE:
        return {
            "a": int,
            "b": str | int,
        }
    if format == Format.AST:
        import ast
        return {
            "a": ast._build((26, "int", 1)),  # proposed private helper
            "b": ast._build(...),  # some more complicated tuple
        }
```
That is, if the AST format is requested, we build essentially the same dictionary, but instead of evaluating the values, we build each AST from a constant tuple that contains the necessary data.
Finally, we add two new helper functions to the typing module. The first calls an annotate function with the AST format, modifies the returned AST in some fashion to support future typing semantics, and then resolves that AST to a value, string, or ForwardRef. The second uses the same mechanism to evaluate a string using type expression semantics.
If users then want to get type annotations, they can use the first helper and get the same kind of results as with the annotationlib functions, but with typing-specific semantics applied. And if users want to use these semantics somewhere other than in an annotation or type alias, they can write eval_type("SomeTypeExpr"). In most cases users wouldn't even have to call the eval function explicitly, since library functions that expect type forms often already support stringified versions.
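As an illustration of the second helper, the sketch below parses a string, applies typing-specific rewrites to the AST, and only then evaluates it. The function name eval_type_expr and the two rules ([T] meaning list[T], and (A, B) meaning tuple[A, B]) are assumptions for illustration, borrowed from shorthands discussed in this thread rather than settled semantics.

```python
import ast


class TypingSemantics(ast.NodeTransformer):
    """Hypothetical rules: [T] -> list[T] and (A, B) -> tuple[A, B]."""

    def visit_Subscript(self, node):
        node.value = self.visit(node.value)
        if isinstance(node.slice, ast.Tuple):
            # In A[B, C] the tuple is the index list, not a tuple type,
            # so only its elements are rewritten.
            node.slice.elts = [self.visit(elt) for elt in node.slice.elts]
        else:
            node.slice = self.visit(node.slice)
        return node

    def visit_List(self, node):
        self.generic_visit(node)
        if len(node.elts) != 1:
            return node
        return ast.Subscript(
            value=ast.Name(id="list", ctx=ast.Load()),
            slice=node.elts[0],
            ctx=ast.Load(),
        )

    def visit_Tuple(self, node):
        self.generic_visit(node)
        return ast.Subscript(
            value=ast.Name(id="tuple", ctx=ast.Load()),
            slice=node,
            ctx=ast.Load(),
        )


def eval_type_expr(source, globalns=None):
    """Hypothetical helper: evaluate a string with type-expression semantics."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(TypingSemantics().visit(tree))
    code = compile(tree, "<type expression>", "eval")
    return eval(code, {} if globalns is None else globalns)
```

Note that tuple[(int, str)] is still read as tuple[int, str] here, because A[(B, C)] and A[B, C] produce the same AST, so the sketch cannot tell them apart.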
I don't think it's necessary to return a new kind of object that can look up names in the annotated object's scope, since we can get most of that info from the annotate function. The relevant global namespace is stored in its __globals__ dunder, and since the annotations themselves can't define any locals, the only relevant ones are cell variables, which are also stored in a dunder (__closure__). Of course, this has the weakness that we can't look up names from stringified annotations. I'm not sure there is a need to do that, though, since as far as I know the need for them is gone with delayed annotations. So if a user is having trouble with these name lookups, they should be able to just un-stringify the annotations and everything works.
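The name-resolution claim can be checked with ordinary function attributes. The sketch below, using a hypothetical helper annotate_namespace, rebuilds the namespace an annotate function sees from its __globals__ and from the cells in __closure__ (paired with the names in __code__.co_freevars), and then evaluates an AST against it.

```python
import ast


def annotate_namespace(annotate):
    """Hypothetical helper: the names an annotate function can see are its
    module globals plus its closure cells (annotations define no locals)."""
    namespace = dict(annotate.__globals__)
    cells = annotate.__closure__ or ()
    for name, cell in zip(annotate.__code__.co_freevars, cells):
        try:
            namespace[name] = cell.cell_contents
        except ValueError:  # the cell exists but has not been assigned yet
            pass
    return namespace


def make():
    class Local:
        pass

    def annotate(format):
        # Stand-in for a compiler-generated __annotate__ that refers to a local.
        return {"x": Local}

    return annotate, Local


annotate, Local = make()
tree = ast.parse("list[Local]", mode="eval")
value = eval(compile(tree, "<annotation>", "eval"), annotate_namespace(annotate))
```

A name that only appears inside a stringified annotation is never a free variable of the annotate function, so it has no cell to look up; that is the weakness described above.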
Since most of the negative feedback here has revolved around the specific new semantics I originally proposed, I'd now limit this to just the general ideas and mechanics of typing-specific semantics. There seems to be broad support for that idea even without an immediate payoff for builtin types or similar.
Being able to evaluate annotations in a typing-specific context could iron out some of the subtle differences that already exist depending on syntax.
For example, the union syntax and typing.Union don't produce the same objects at runtime when one of the items in the union is unknown, because annotationlib doesn't assume the | syntax implies a union. If these were evaluated with that assumption, it would be possible to get the actual union in both cases.
```pycon
>>> from annotationlib import get_annotations, Format
>>> from typing import Union
>>> from pprint import pp
>>>
>>> class Example:
...     syntax: str | unknown
...     subscript: Union[str, unknown]
...
>>> pp(get_annotations(Example, format=Format.FORWARDREF))
{'syntax': ForwardRef('__annotationlib_name_1__ | unknown', is_class=True, owner=<class '__main__.Example'>),
 'subscript': str | ForwardRef('unknown', is_class=True, owner=<class '__main__.Example'>)}
```
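One way typing-specific evaluation could close that gap is to rewrite the AST before evaluating it. This is a sketch under two assumed rules, not settled semantics: A | B is always treated as Union[A, B], and any name missing from both the namespace and builtins becomes a typing.ForwardRef. The helper names and the injected __Union/__ForwardRef globals are illustrative.

```python
import ast
import builtins
import typing


class UnionSemantics(ast.NodeTransformer):
    """Hypothetical rules: `A | B` means Union[A, B]; unknown names become ForwardRefs."""

    def __init__(self, namespace):
        self.namespace = namespace

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if not isinstance(node.op, ast.BitOr):
            return node
        return ast.Subscript(
            value=ast.Name(id="__Union", ctx=ast.Load()),
            slice=ast.Tuple(elts=[node.left, node.right], ctx=ast.Load()),
            ctx=ast.Load(),
        )

    def visit_Name(self, node):
        if node.id in self.namespace or hasattr(builtins, node.id):
            return node
        return ast.Call(
            func=ast.Name(id="__ForwardRef", ctx=ast.Load()),
            args=[ast.Constant(value=node.id)],
            keywords=[],
        )


def evaluate(source, namespace):
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(UnionSemantics(namespace).visit(tree))
    env = dict(namespace, __Union=typing.Union, __ForwardRef=typing.ForwardRef)
    return eval(compile(tree, "<annotation>", "eval"), env)


namespace = {"Union": typing.Union}
syntax = evaluate("str | unknown", namespace)
subscript = evaluate("Union[str, unknown]", namespace)
```

Both spellings now go through the same Union construction, so they end up with the same arguments.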
If extra formats were added to annotate functions, however, I'd like them to also solve the issue I brought up before regarding the need to create new __annotate__ functions[1].
[1] Like the one in dataclasses, where I accidentally broke certain dataclasses in 3.14.1.
I wasn't familiar with this issue before, so I hope I'm understanding it correctly now. It's that autogenerated functions like the dataclass dunders have problems with their __annotate__ functions, since they run into issues if the annotations they are trying to modify can't be evaluated properly. So, for example, if a dataclass looks like this:
```python
from dataclasses import dataclass

@dataclass
class Thingy:
    a: int
    b: WillNeverBeDefined
```
then __init__.__annotate__ can't just fall back to Thingy.__annotate__ and modify the returned dict, since b's annotation breaks things?
If my understanding is correct, the proposed AST format should already fix that. The autogenerated __annotate__ could call get_annotations(Thingy, Format.AST) (regardless of which format it was passed), exclude arguments and modify the dictionary as needed, and then resolve the AST into the requested format. To make that easier, it might be best to add a helper function to annotationlib. 90% of the functionality is already there, since the FORWARDREF and STRING formats already do essentially that, but with the extra work of first generating the AST using the fake globals trick.
It's a bit more like this:
```python
from dataclasses import dataclass, field

@dataclass
class Example:
    a: list[Example]
    b: NeverDefined = field(init=False)
```
Currently (in 3.14.2, now that the bug I'd unfortunately introduced has been resolved) you can't get the VALUE annotations for Example.__init__ because the annotation for b won't resolve, even though b is not in the annotations for __init__.
The current FORWARDREF format is no good for creating VALUE annotations because it attempts to resolve as far as it can: for example, a will be list[ForwardRef("Example", ...)] and not ForwardRef("list[Example]", ...). A list is an easy example, but the ForwardRef could be anywhere in an arbitrary container.
The goal is to be able to collect objects from get_annotations(cls, format=Format.EVALUATE_LATER) at the time the dataclass is constructed, and to use those to make a new __annotate__ function. The generated __annotate__ should not need to refer back to the annotations from the class as they will have already been gathered when the dataclass was constructed. You would collect these objects in a dict, as you would in 3.13 or earlier, but instead of attaching to __annotations__[1], you would attach make_annotate_function(annos) to __annotate__.
Edit: I'll also note that the logic for dataclasses is relatively simple; it's more annoying to do it for something like attrs, where the annotations also change due to the presence of converters.
[1] As attrs does, for example; dataclasses used to write them into the source code.
What is the AST format missing that EVALUATE_LATER would provide? We can resolve AST nodes to values, forward refs and strings. So why can you not build the annotate function from the dict of AST nodes? Or is this not about the underlying functionality and more about making sure the utility functions that do that are added to the annotationlib module?
It's largely about making sure the utility functions are there.
The requirement is only that the objects received are not evaluated and are able to be evaluated correctly later. The DEFERRED (or EVALUATE_LATER) format as proposed in the other post planned to use the unevaluated ForwardRef objects largely because they already exist inside call_annotate_function.
Format.AST may be fine for this; my point was more that the generated __annotate__ wouldn't be calling get_annotations itself. It would use a dictionary of annotations collected in this new format and then just call some evaluate method on each to return the annotations in the requested format when the annotate function is called.
The only other thing is that you might need to make an annotate function from objects that are already resolved. For example, for the __init__ function we add the {"return": None} annotation. I'm not sure how that would work with this AST format, as I haven't looked too closely yet. If that's not a problem, then fine.
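A rough sketch of the make_annotate_function pattern described above follows. The helper is hypothetical, the stored values are assumed to be either unevaluated ast.expr nodes (as the proposed AST format would provide) or already-resolved objects like the None return annotation, and plain strings stand in for the format values. How resolved objects should be returned when an AST is requested is the open question raised here, so the sketch only handles VALUE and STRING.

```python
import ast


def _to_string(obj):
    # Crude formatting for already-resolved objects, e.g. None -> "None".
    return getattr(obj, "__qualname__", None) or repr(obj)


def make_annotate_function(annos, globalns):
    """Hypothetical helper: build an __annotate__ from a dict whose values are
    either unevaluated ast.expr nodes or already-resolved objects."""

    def __annotate__(format):
        if format not in ("VALUE", "STRING"):
            raise NotImplementedError(format)
        result = {}
        for name, value in annos.items():
            if not isinstance(value, ast.expr):
                # Already resolved, such as the {"return": None} added to __init__.
                result[name] = _to_string(value) if format == "STRING" else value
            elif format == "STRING":
                result[name] = ast.unparse(value)
            else:
                code = compile(ast.Expression(body=value), "<annotation>", "eval")
                result[name] = eval(code, globalns)
        return result

    return __annotate__


class Example:  # stands in for the dataclass in the example above
    pass


# Collected when the class is processed: b (init=False) is already dropped,
# so its unresolvable NeverDefined annotation is never evaluated.
collected = {
    "a": ast.parse("list[Example]", mode="eval").body,
    "return": None,
}
init_annotate = make_annotate_function(collected, {"Example": Example})
```

The generated function never refers back to the class annotations, which matches the requirement that everything is gathered once, when the dataclass is constructed.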
Does someone with more experience know what my next step here should be? I'd like to keep working on this set of ideas, but I'm not sure what I should be doing next, or whether the community sentiment expressed here is even strong enough to keep this from being dead in the water already. I'm not very experienced with this process, so I'd love any guidance.
I think the next step is a PEP. I'd recommend starting on a draft, laying out the motivation and the precise spec. I'm happy to sponsor a PEP for this idea. If you're comfortable hacking on the interpreter, a draft implementation would also be helpful.
How do you plan on disambiguating an inline declaration of a constrained type variable from one that declares a type variable bound to a tuple?
That is, is:
def foo[A: (str, bytes)](x: A) -> A: ...
going to be interpreted as:
def foo[A: tuple[str, bytes]](x: A) -> A: ...
?
Shall we special-case the former so an outermost tuple is not transformed into a generic alias when in the context of a type variable declaration, and document that those who wish to bind a type variable to a tuple need to do it in the old-fashioned way?
I think we're more or less forced to have def f[T: (str, bytes)]... stay as a constraint for backwards compatibility reasons. This (along with the indexing issue) is definitely a problem with spelling tuple types like this. Since tuple bounds are fairly rare and it's not a huge problem to spell them using the current syntax, I think it's fine to just document the current behaviour as correct, like you said.
Awesome, thanks! I've already got a draft implementation on my GitHub. It's still pretty rough around the edges, but the core parts are working. I'll start working on a draft PEP then.
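For reference, the two readings correspond to different runtime objects. A short sketch using the classic TypeVar constructor (the names Constrained and Bounded are illustrative) shows the attributes that distinguish them:

```python
from typing import TypeVar

# def foo[A: (str, bytes)] declares a constrained type variable:
Constrained = TypeVar("Constrained", str, bytes)

# def foo[A: tuple[str, bytes]] declares one bound to a tuple type:
Bounded = TypeVar("Bounded", bound=tuple[str, bytes])
```

Spelling the bound as tuple[str, bytes] is the explicit form that would remain available if the outermost tuple in a type parameter declaration keeps its constraint meaning.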
How about throwing in a call syntax-based shortcut for Annotated as well, freeing us from one more import while allowing the actual type to stand out at the front of a type hint?
That is:
| Annotation Expression | Equivalent Expression |
|---|---|
| expr (expr1, expr2, …) | Annotated[expr, expr1, expr2, …] |
So that:
n: int (Gt(0), Lt(10))
would be equivalent to:
n: Annotated[int, Gt(0), Lt(10)]
We might be able to take advantage of keyword arguments too:
n: int (Gt=0, Lt=10)
although the above transformation allows only callables that take exactly one argument, so I'm not sure if it's worth pursuing.
I'm glad to hear you want to proceed further with this idea. As I previously experimented with a new syntax for inline type expressions, I'd be happy to collaborate and/or discuss on such a PEP.
If expr (expr1, expr2) is equivalent to Annotated[expr, expr1, expr2], wouldn't something like Gt(10) also be equivalent to Annotated[Gt, 10]?
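Under the AST approach this shorthand would be a rewrite rule rather than new grammar, since int (Gt(0), Lt(10)) already parses as an ordinary call. The sketch below assumes hypothetical Gt and Lt metadata classes and a hypothetical eval_annotation helper, and rewrites only the outermost call of an annotation.

```python
import ast
from dataclasses import dataclass
from typing import Annotated


@dataclass(frozen=True)
class Gt:  # hypothetical metadata type, as in the examples above
    value: int


@dataclass(frozen=True)
class Lt:
    value: int


def call_to_annotated(node):
    """Rewrite `expr(m1, m2, k=v)` into `Annotated[expr, m1, m2, k(v)]`.

    Only the outermost call is rewritten, so metadata such as Gt(0) stays a call.
    """
    if not isinstance(node, ast.Call):
        return node
    metadata = list(node.args)
    for kw in node.keywords:
        if kw.arg is None:
            raise ValueError("**kwargs cannot be mapped to metadata")
        # Gt=0 becomes Gt(0), which only works for one-argument callables.
        metadata.append(
            ast.Call(func=ast.Name(id=kw.arg, ctx=ast.Load()), args=[kw.value], keywords=[])
        )
    return ast.Subscript(
        value=ast.Name(id="Annotated", ctx=ast.Load()),
        slice=ast.Tuple(elts=[node.func, *metadata], ctx=ast.Load()),
        ctx=ast.Load(),
    )


def eval_annotation(source, globalns):
    tree = ast.parse(source, mode="eval")
    tree.body = call_to_annotated(tree.body)
    code = compile(ast.fix_missing_locations(tree), "<annotation>", "eval")
    return eval(code, dict(globalns))
```

With this rule a bare annotation such as Gt(10) would itself be read as Annotated[Gt, 10], so the rule alone cannot tell a type from a metadata call.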
I see a lot of problems with such syntax. I could however imagine this being added.
Since Gt is not a type to begin with, it would not be recognized as a spec-conforming type hint.
The suggested syntax of foo: [int, Doc("foo")] → foo: Annotated[int, Doc("foo")] in your link would be in direct conflict with the shorthand for typing a list as proposed in this thread.
On the other hand, in that same thread I did suggest the use of int @Doc('age') as a shorthand for Annotated[int, Doc('age')], which would not be in conflict with this proposal.
Anyway, I see the point of this proposal not being about the possible shorthands themselves but rather a mechanism to transform type hints before they are evaluated, so the suggestion above is off-topic and can be discussed separately.
Oh, I didn't like the answer; yes, I meant the a @b syntax.
However, since it's considered off-topic, I'd like to apologize.
As for using x(y) as shorthand for Annotated[x, y], that's what I meant with the section about calling. I'm sure there might be some fix sooner or later, but, e.g., str (Doc("Hi"), ...) would try to construct a string using Doc("Hi").__str__ (or repr), which I imagine would be hard to fix without a fairly big change in grammar, and/or hard for runtime checkers to interpret.
Still, the idea seems good, though it might also be worth considering alternatives for Annotated syntax, in the thread I linked.
I've written a first draft of the PEP (viewable here). It concerns itself only with the addition of the AST format and the necessary helper functions, leaving any new typing-specific semantics to future proposals. There are definitely some details in the API that I still need to figure out, but everything should be roughly in shape now.
Should I open a PR to the PEPs repo? I'm not sure what stage it should be in for that to happen. I'm assuming that once I've done that and the PEP "officially exists", I should also make a new Discourse thread?
@Viicos I've read through that thread and your PEP. From what I can tell, your focus was on the typed dict syntax rather than the <...> syntax to expose the annotation's AST. I've referenced that discussion in the motivation for my PEP. If you also worked on the implementation of the AST syntax, I'd be happy to hear your feedback and what you found out. My current implementation is certainly far from perfect! Feel free to message me privately if you don't want to clutter the discussion here.
This looks great, thanks! I think itās close to ready to be added to the PEPs repo.
However, here are some points that I think should be covered:
- Does the AST need to carry some extra information to enable resolution of names? If so, what does it look like? You previously argued that nothing extra is needed, but it would be nice to spell out why that is so. (Note that annotations can contain comprehensions, which introduce new names, so an AST evaluator will have to go beyond just knowing which names exist in context. Of course, comprehensions are not currently legal in type annotations, but there are plausible use cases for them.)
- I have mixed feelings about using get_type_hints() here. That function has a lot of questionable legacy behaviors, and currently it mostly adds confusion over the underlying annotationlib.get_annotations.
- You'll need to talk about how this works at runtime. PEP 649 made a point of ensuring the runtime overhead from keeping annotations (in terms of memory usage) is low. With the new structure, the amount of extra data kept in memory might become significant. We'll at least need benchmarks to know what the extra memory usage is going to be.
- There is a line about how there will be an exemption to the backward compatibility policy to allow get_type_hints() to stop raising SyntaxError on newly supported constructs. But I don't think there is generally BC on what errors are being raised; we're free to add support for new features. (In other words, it wasn't a compatibility break when ast.parse("type x = y") worked in 3.12 but not 3.11.) I'm also not sure SyntaxError is the right exception to use here: I'd rather leave it for errors in Python's syntax.
- We need to ensure the AST is unoptimized; with your 1 | 2 | 3 example, CPython optimizes out the | at compile time if you're in normal code. That's perhaps more of an implementation issue, but I think it's worth mentioning in the PEP.
- One of the suggested use cases is writing tuple types as (T1, T2). But this still runs into the problem that A[B, C] and A[(B, C)] have the same AST, but should have a different meaning as types.
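The ambiguity in the last point is easy to confirm with the standard ast module. In this small sketch the parser does not record the parentheses, so nothing in the tree distinguishes the two spellings:

```python
import ast

# Both spellings produce an identical Subscript node whose slice is a Tuple.
bare = ast.dump(ast.parse("A[B, C]", mode="eval"))
parenthesized = ast.dump(ast.parse("A[(B, C)]", mode="eval"))
```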
I looked very briefly at this. It was hard to understand. Probably having some code examples would help. The abstract was unclear:
This PEP proposes that a new value, Format.AST, is added to the annotationlib enum and associated protocol. It instructs annotation functions to not evaluate the annotation expressions directly, but rather to return the abstract syntax trees that define each of them. This lets runtime consumers of annotations observe their full definition, rather than just the object they evaluate to. Which, in turn, creates the possibility of simplifying many existing type annotations and using existing intuitive Python syntax in typing contexts.
There must be a more direct way of explaining what this means.