Idea: Simpler and More Expressive Type Annotations

100% agree. I always have to alias Literal as sz for numpy.

I’ve been thinking about this approach more and been working on a dummy implementation to see how things shake out. Overall I think the AST based approach is really useful, but there’s some annoying wrinkles to work out. Here’s the overall idea I have in mind now:

There’s a new value AST added to the Format enum. And the auto-generated __annotate__ functions now look like this:

def __annotate__(format):
    if format == Format.VALUE:
        return {
            "a": int,
            "b": str | int,
        }
    if format == Format.AST:
        import ast
        return {
            "a": ast._build((26, "int", 1)),
            "b": ast._build(...),  # some more complicated tuple
        }

That is, if the AST format is specified we build essentially the same tuple, but instead of actually evaluating the values we build the ast from a constant tuple that contains the necessary data.
And finally, we add two new helper functions to the typing module. The first calls an annotate function with the AST format, modifies the returned AST in some fashion to support future typing semantics, and then resolves that AST to a value/string/forwardref. The second uses the same mechanism to evaluate a string using type expression semantics.
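
To make the resolve step a bit more concrete, here is a minimal sketch using only the stdlib `ast` module. The `Format` enum below is a local stand-in (the `AST` member and all the values are assumptions for illustration), and `build_annotation_ast` / `resolve_annotation_ast` are hypothetical names, not an existing or proposed API. It also builds the node with `ast.parse` instead of from a constant tuple, purely to keep the example short.

```python
import ast
import enum


class Format(enum.IntEnum):
    # Local stand-in for annotationlib.Format; AST is the hypothetical new member.
    VALUE = 1
    STRING = 4
    AST = 5


def build_annotation_ast(source):
    """Parse one annotation expression into an unevaluated AST node."""
    return ast.parse(source, mode="eval").body


def resolve_annotation_ast(node, fmt, globalns=None):
    """Resolve an annotation AST into the requested format."""
    if fmt == Format.AST:
        return node
    if fmt == Format.STRING:
        return ast.unparse(node)
    if fmt == Format.VALUE:
        code = compile(ast.Expression(body=node), "<annotation>", "eval")
        return eval(code, dict(globalns or {}))
    raise NotImplementedError(fmt)


node = build_annotation_ast("dict[str, list[int]]")
print(resolve_annotation_ast(node, Format.STRING))
print(resolve_annotation_ast(node, Format.VALUE))
```

A typing-specific helper would slot a transformation of `node` in between parsing and resolving; that step is left out here.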

If users then want to get type annotations, they can use the first helper method and get the same kind of results as with the annotationlib functions, but the annotations can use typing-specific semantics. And if users want to use these semantics somewhere other than in an annotation or type alias, they can write eval_type("SomeTypeExpr"). In most cases users wouldn’t even have to explicitly call the eval function since library functions that expect type forms often already support stringified versions.

I don’t think that it’s necessary to return a new kind of object that can look up names in the annotated object’s scope since we can get (most) of that info from the annotate function. The relevant global namespace is stored in the dunder and since the annotations themselves can’t define any locals, the only relevant ones are cell vars, which also are stored in a dunder. Of course, this has the weakness that we can’t look up names from stringified annotations. I’m not sure if there is a need to do that though since AFAIK the need for them is gone with delayed annotations. So if a user is having trouble with these name lookups, they should be able to just un-stringify the annotations and everything works.
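
As a rough illustration of where those names live, the sketch below collects the globals and closure cells of an ordinary function, which stands in for a compiler-generated annotate function here. `visible_names` is a hypothetical helper name; the attributes it reads (`__globals__`, `__closure__`, `__code__.co_freevars`) are the standard function attributes.

```python
def visible_names(func):
    """Collect the names a function can see via its globals and closure cells."""
    names = dict(func.__globals__)
    cells = func.__closure__ or ()
    for name, cell in zip(func.__code__.co_freevars, cells):
        try:
            names[name] = cell.cell_contents
        except ValueError:
            # The cell exists but has not been assigned yet.
            pass
    return names


def outer():
    Local = int

    def annotate_like():
        # Stand-in for an annotate function that refers to an enclosing local.
        return {"x": Local}

    return annotate_like


fn = outer()
print(visible_names(fn)["Local"])
```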

Since most of the negative feedback here has revolved around the specific new semantics I originally proposed, I’d now limit this to just the general ideas and mechanics of typing-specific semantics. There seems to be broad support for that idea even without an immediate payoff for builtin types or similar.

Being able to evaluate annotations in a typing-specific context could iron out some of the subtle differences that already exist depending on syntax.

For example, the union syntax and typing.Union don’t produce the same objects at runtime when one of the items in the union is unknown and so annotationlib doesn’t assume the syntax implies a union. If these were evaluated with that assumption then it would be possible to get the actual union in both cases.

>>> from annotationlib import get_annotations, Format
>>> from typing import Union
>>> from pprint import pp
>>> 
>>> class Example:
...     syntax: str | unknown
...     subscript: Union[str, unknown]
...     
>>> pp(get_annotations(Example, format=Format.FORWARDREF))
{'syntax': ForwardRef('__annotationlib_name_1__ | unknown', is_class=True, owner=<class '__main__.Example'>),
 'subscript': str | ForwardRef('unknown', is_class=True, owner=<class '__main__.Example'>)}
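
A minimal sketch of what evaluating with that assumption could look like, working from the expression's AST: split a top-level `|` chain into its operands, evaluate each one separately, and fall back to a forward reference for the ones that fail. `union_members` and `eval_union` are hypothetical helper names, and this is only one way such an evaluator might behave.

```python
import ast
from typing import ForwardRef, Union, get_args


def union_members(source):
    """Return the source text of each operand of a top-level `|` chain."""
    def flatten(node):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
            return flatten(node.left) + flatten(node.right)
        return [ast.unparse(node)]

    return flatten(ast.parse(source, mode="eval").body)


def eval_union(source, namespace):
    """Evaluate each operand on its own; unknown names become ForwardRefs."""
    members = []
    for text in union_members(source):
        try:
            members.append(eval(text, dict(namespace)))
        except NameError:
            members.append(ForwardRef(text))
    return Union[tuple(members)]


result = eval_union("str | unknown", {})
print(union_members("list[int] | None | unknown"))
print(get_args(result))
```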

If extra formats were added to annotate I would however like if they could also solve the issue I brought up before with regards to the need to create new __annotate__ functions[1].


  1. like the one in dataclasses where I accidentally broke certain dataclasses in 3.14.1 :frowning:

I wasn’t familiar with this issue before so I hope I’m understanding it correctly now. It’s that autogenerated functions like the dataclass dunders have problems with their __annotate__ functions since they run into issues if the annotations that they are trying to modify can’t be evaluated properly. So for example, if a dataclass looks like this:

@dataclass
class Thingy:
    a: int
    b: WillNeverBeDefined

then __init__.__annotate__ can’t just fall back to Thingy.__annotate__ and modify the returned dict since b’s annotation breaks things?

If my understanding is correct, the proposed AST format should already fix that. The autogenerated __annotate__ could then call get_annotations(Thingy, Format.AST) (regardless of which format it got passed), proceed to exclude arguments and modify the dictionary as needed and then resolve the AST into the actually requested format. To make that easier it might be best to add a helper method to annotationlib. 90% of the functionality is already there since the forwardref and string formats already do essentially that but with extra work of having to first generate the AST using the fake globals trick.

It’s a bit more like this:

@dataclass
class Example:
    a: list[Example]
    b: NeverDefined = field(init=False)

Currently (in 3.14.2, now that the bug I’d unfortunately introduced has been resolved) you can’t get the VALUE annotations for Example.__init__ because the annotation for b won’t resolve, even though it is not in the annotations for __init__.

The current FORWARDREF format is no good for creating VALUE annotations because it attempts to resolve as far as it can, for example a will be list[ForwardRef("Example", ...)] and not ForwardRef("list[Example]", ...). List here is an easy example but the ForwardRef could be anywhere in an arbitrary container.

The goal is to be able to collect objects from get_annotations(cls, format=Format.EVALUATE_LATER) at the time the dataclass is constructed, and to use those to make a new __annotate__ function. The generated __annotate__ should not need to refer back to the annotations from the class as they will have already been gathered when the dataclass was constructed. You would collect these objects in a dict, as you would in 3.13 or earlier, but instead of attaching to __annotations__[1], you would attach make_annotate_function(annos) to __annotate__.
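
Roughly what I have in mind, as a sketch only: `make_annotate_function` does not exist, the `Format` enum is a local stand-in with made-up values, and the "evaluate later" objects are plain `ast` nodes here, whatever the final format ends up being.

```python
import ast
import enum


class Format(enum.IntEnum):
    # Local stand-in for annotationlib.Format.
    VALUE = 1
    STRING = 4


def make_annotate_function(annos, namespace):
    """Build an annotate function from a dict of unevaluated AST nodes."""
    def __annotate__(fmt):
        if fmt == Format.STRING:
            return {name: ast.unparse(node) for name, node in annos.items()}
        if fmt == Format.VALUE:
            result = {}
            for name, node in annos.items():
                code = compile(ast.Expression(body=node), "<annotation>", "eval")
                result[name] = eval(code, namespace)
            return result
        raise NotImplementedError(fmt)

    return __annotate__


# Collected once, when the class is processed; nothing is evaluated yet.
class_annos = {
    "a": ast.parse("list[Example]", mode="eval").body,
    "b": ast.parse("NeverDefined", mode="eval").body,
}
# __init__ only takes `a` (b has init=False) and gains a return annotation.
init_annos = {"a": class_annos["a"], "return": ast.parse("None", mode="eval").body}

namespace = {}
annotate_init = make_annotate_function(init_annos, namespace)


class Example:
    pass


namespace["Example"] = Example  # the name becomes available later
print(annotate_init(Format.VALUE))
print(annotate_init(Format.STRING))
```

Because `b` never makes it into `init_annos`, the undefined `NeverDefined` is never touched when the VALUE format is requested.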

Edit: I’ll also note that the logic for dataclasses is relatively simple, it’s more annoying to do it for something like attrs which also has the annotations change due to the presence of converters.


  1. as attrs does for example, dataclasses used to write them into the source code.

What is the AST format missing that EVALUATE_LATER would provide? We can resolve AST nodes to values, forward refs and strings. So why can you not build the annotate function from the dict of AST nodes? Or is this not about the underlying functionality and more about making sure the utility functions that do that are added to the annotationlib module?

It’s largely about making sure the utility functions are there.

The requirement is only that the objects received are not evaluated and are able to be evaluated correctly later. The DEFERRED (or EVALUATE_LATER) format as proposed in the other post planned to use the unevaluated ForwardRef objects largely because they already exist inside call_annotate_function.

Format.AST may be fine for this, my point was more that the generated __annotate__ wouldn’t be calling get_annotations itself. It would use a dictionary of annotations collected in this new format and then just call some evaluate method on each to return the annotations in the requested format when the annotate function is called.

The only other thing is that you might need to make an annotate function from objects that are already resolved. Such as with the __init__ function we add the {"return": None} annotation. I’m not sure how that would work with this AST format as I haven’t looked too closely yet. If that’s not a problem then fine.

Does someone with more experience know what my next step here should be? I’d like to keep working on this set of ideas, but I’m not sure what I should be doing next, or whether the community sentiment expressed here is even strong enough to keep this from being dead in the water already. I’m not very experienced with this process so I’d love any guidance.

I think the next step is a PEP. I’d recommend to start working on a draft, laying out the motivation and the precise spec. I’m happy to sponsor a PEP for this idea. If you’re comfortable hacking on the interpreter, a draft implementation would also be helpful.

How do you plan on disambiguating an inline declaration of a constrained type variable from one that declares a type variable bound to a tuple?

That is, is:

def foo[A: (str, bytes)](x: A) -> A: ...

going to be interpreted as:

def foo[A: tuple[str, bytes]](x: A) -> A: ...

?

Shall we special-case the former so an outermost tuple is not transformed into a generic alias when in the context of a type variable declaration, and document that those who wish to bind a type variable to a tuple need to do it in the old-fashioned way?

I think we’re more or less forced to have def f[T: (str, bytes)]... stay as a constraint for backwards compatibility reasons. This (and the indexing issue) definitely are problems with spelling tuple types like this. I think that since tuple bounds are fairly rare and it’s not a huge problem to spell them using the current syntax, it’s fine to just document the current behaviour as correct like you said.
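
For reference, the two meanings can be spelled apart with the pre-3.12 `TypeVar` call, which is presumably what "the current syntax" would fall back to for tuple bounds. This only uses the existing `typing.TypeVar` API:

```python
from typing import TypeVar

# Constrained type variable: A must be exactly str or exactly bytes.
A = TypeVar("A", str, bytes)

# Bound type variable: B must be a subtype of tuple[str, bytes].
B = TypeVar("B", bound=tuple[str, bytes])

print(A.__constraints__, A.__bound__)
print(B.__constraints__, B.__bound__)
```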

Awesome, thanks! I’ve already got a draft implementation on my github. It’s still pretty rough around the edges, but the core parts are working. I’ll start working on a draft PEP then.

How about throwing in a call syntax-based shortcut for Annotated as well, freeing us from one more import while allowing the actual type to stand out at the front of a type hint?

That is:

Annotation Expression      Equivalent Expression
expr (expr1, expr2, …)     Annotated[expr, expr1, expr2, …]

So that:

n: int (Gt(0), Lt(10))

would be equivalent to:

n: Annotated[int, Gt(0), Lt(10)]

We might be able to take advantage of keyword arguments too:

n: int (Gt=0, Lt=10)

although the above transformation allows only callables that take exactly one argument, so I’m not sure if it’s worth pursuing.
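
A rough sketch of how the positional form could be implemented as an AST rewrite before evaluation. `Gt` and `Lt` are small stand-in classes defined here for illustration, `eval_call_as_annotated` is a hypothetical helper name, and only an outermost call without keyword arguments is rewritten (Python 3.9+ AST layout assumed).

```python
import ast
from dataclasses import dataclass
from typing import Annotated, get_args


@dataclass(frozen=True)
class Gt:
    value: int


@dataclass(frozen=True)
class Lt:
    value: int


def eval_call_as_annotated(source, namespace):
    """Rewrite an outermost `expr(meta1, meta2)` into `Annotated[expr, meta1, meta2]`."""
    tree = ast.parse(source, mode="eval")
    node = tree.body
    if isinstance(node, ast.Call) and node.args and not node.keywords:
        tree.body = ast.Subscript(
            value=ast.Name(id="Annotated", ctx=ast.Load()),
            slice=ast.Tuple(elts=[node.func, *node.args], ctx=ast.Load()),
            ctx=ast.Load(),
        )
    ast.fix_missing_locations(tree)
    scope = {"Annotated": Annotated, **namespace}
    return eval(compile(tree, "<annotation>", "eval"), scope)


hint = eval_call_as_annotated("int (Gt(0), Lt(10))", {"Gt": Gt, "Lt": Lt})
print(get_args(hint))
```

Since the rewrite happens on the AST, `int` itself is never called; only the inner `Gt(0)` and `Lt(10)` calls run.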

I’m glad to hear you want to proceed further with this idea. As I previously experimented with a new syntax for inline type expressions, I’d be happy to collaborate on and/or discuss such a PEP.

When expr (expr1, expr2) is equal to Annotated[expr, expr1, expr2], would something like Gt(10) not be equal to Annotated[Gt, 10]?

I see a lot of problems with such syntax. I could however imagine this being added.

Since Gt is not a type to begin with, it would not be recognized as a spec-conforming type hint.

The suggested syntax of foo: [int, Doc("foo")] → foo: Annotated[int, Doc("foo")] in your link would be in direct conflict with the shorthand for typing a list as proposed in this thread.

On the other hand, in that same thread I did suggest the use of int @Doc('age') as a shorthand for Annotated[int, Doc('age')], which would not be in conflict with this proposal.

Anyway, I see the point of this proposal not being about the possible shorthands themselves but rather a mechanism to transform type hints before they are evaluated, so the suggestion above is off-topic and can be discussed separately.

Oh, I didn’t link the answer; yes, I meant the a @b syntax.

However with it being considered off-topic, I’d like to apologize.

As for it being possible for one to use x(y) as shorthand for Annotated[x, y], that’s rather what I meant with the section about calling. I’m sure there might be some fix sooner or later, but e.g. str (Doc("Hi"), ...) would try to construct a string using Doc("Hi").__str__ (or repr), which I imagine both being hard to fix without a somewhat big change in grammar, and/or hard to interpret by runtime checkers.

Still, the idea seems good, though it might also be worth considering alternatives for Annotated syntax, in the thread I linked.

I’ve written a first draft of the PEP (viewable here). It concerns itself only with the addition of the AST format and necessary helper functions, leaving any new typing specific semantics to future proposals. There’s definitely some details in the API that I still need to figure out, but everything should be in its rough shape now.

Should I open a PR to the PEP repo? I’m not sure what stage it should be in for that to happen. I’m assuming that once I’ve done that and the PEP “officially exists” I should also make a new discourse thread?

@Viicos I’ve read through that thread and your PEP. From what I can tell your focus was on the typed dict syntax rather than the <...> syntax to expose the annotation’s AST. I’ve referenced that discussion in the motivation for my PEP. If you also worked on the implementation of the AST syntax I’d be happy to hear your feedback and stuff you found out. My current implementation certainly is far from perfect! Feel free to message me privately if you don’t want to clutter the discussion here.

This looks great, thanks! I think it’s close to ready to be added to the PEPs repo.

However, here are some points that I think should be covered:

  • Does the AST need to carry some extra information to enable resolution of names? If so, what does it look like? You previously argued that nothing extra is needed, but it would be nice to spell out why that is so. (Note that annotations can contain comprehensions, which introduce new names, so an AST evaluator will have to go beyond just knowing which names exist in context. Of course, comprehensions are not currently legal in type annotations, but there are plausible use cases for them.)
  • I have mixed feelings about using get_type_hints() here. That function has a lot of questionable legacy behaviors and currently it mostly adds confusion over the underlying annotationlib.get_annotations.
  • You’ll need to talk about how this works at runtime. PEP 649 made a point of ensuring the runtime overhead from keeping annotations (in terms of memory usage) is low. With the new structure, the amount of extra data kept in memory might become significant. We’ll at least need benchmarks to know what the extra memory usage is going to be.
  • There is a line about how there will be an exemption to the backward compatibility policy to allow get_type_hints() to stop raising SyntaxError on newly supported constructs. But I don’t think there is generally BC on what errors are being raised; we’re free to add support for new features. (In other words, it wasn’t a compatibility break when ast.parse("type x = y") worked in 3.12 but not 3.11.) I’m also not sure SyntaxError is the right exception to use here: I’d rather leave it for errors in Python’s syntax.
  • We need to ensure the AST is unoptimized; with your 1 | 2 | 3 example, CPython optimizes out the | at compile time if you’re in normal code. That’s perhaps more of an implementation issue, but I think worth mentioning in the PEP.
  • One of the suggested use cases is writing tuple types as (T1, T2). But this still runs into the problem that A[B, C] and A[(B, C)] have the same AST, but should have a different meaning as types.
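
The last point is easy to check with the stdlib `ast` module (Python 3.9+ AST layout assumed); `Probe` is a throwaway class defined only for this demonstration:

```python
import ast

plain = ast.parse("A[B, C]", mode="eval").body
parenthesized = ast.parse("A[(B, C)]", mode="eval").body

# Both spellings produce a Subscript whose slice is a two-element Tuple.
same_structure = ast.dump(plain) == ast.dump(parenthesized)


class Probe:
    def __class_getitem__(cls, item):
        return item


# At runtime both spellings pass the same tuple to __class_getitem__.
same_runtime_key = Probe[int, str] == Probe[(int, str)] == (int, str)

print(same_structure, same_runtime_key)
```

So a transformation that turns tuple literals into tuple types cannot tell these two apart from the node structure alone.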

I looked very briefly at this. It was hard to understand. Probably having some code examples would help. The abstract was unclear:

This PEP proposes that a new value, Format.AST, is added to the annotationlib enum and associated protocol. It instructs annotation functions to not evaluate the annotation expressions directly, but rather to return the abstract syntax trees that define each of them. This lets runtime consumers of annotations observe their full definition, rather than just the object they evaluate to. Which, in turn, creates the possibility of simplifying many existing type annotations and using existing intuitive Python syntax in typing contexts.

There must be a more direct way of explaining what this means.
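
For what it is worth, here is the kind of small example that might make the "full definition versus evaluated object" distinction clearer. It uses only the stdlib `ast` module and the `1 | 2 | 3` expression mentioned earlier in the thread; it is a sketch of the idea, not of the proposed API.

```python
import ast

source = "1 | 2 | 3"

# Evaluating the expression collapses it into a single object.
value = eval(source)

# The AST keeps the structure: a chain of BinOp nodes with all three operands.
tree = ast.parse(source, mode="eval").body
operands = []
node = tree
while isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
    operands.append(node.right.value)
    node = node.left
operands.append(node.value)
operands.reverse()

print(value)
print(operands)
print(ast.unparse(tree))
```

The evaluated value no longer says which operands were written, while the AST still does.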