Idea: Simpler and More Expressive Type Annotations

Yes, that is an unfortunate consequence of stringified type forms. There is the hack of inspecting the call stack, but that shouldn’t really be a best practice. But I would like to also point out that this isn’t really an issue with the AST based type form approach itself, but rather with using stringified type forms. The issues you point out already exist. But yeah, they will become more widespread as we introduce more features that need stringified type forms.

To my knowledge there also isn’t really a great solution for this that doesn’t involve new syntax. The structure function simply cannot know the locals/globals of the calling function if they aren’t explicitly passed in. That doesn’t really depend on how we’re evaluating type forms or how we’re storing some kind of intermediate representation. We have to either ask the user to write structure(..., "SomeTypeForm", locals=locals(), globals=globals()), introduce an entirely new syntax feature like structure(..., type: SomeTypeForm) that does the namespace capturing for us, or use a workaround such as an intermediate type alias or introspecting stack frames.
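To make the namespace problem concrete, here is a minimal sketch of a hypothetical `resolve_type_form` helper (not part of any library discussed here). It evaluates a stringified type form with explicitly passed namespaces and, only if none are given, falls back to the caller's frame via `sys._getframe`, which is a CPython implementation detail.

```python
import sys
from typing import Any, Mapping, Optional


def resolve_type_form(
    form: str,
    globals: Optional[dict] = None,
    locals: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Hypothetical helper: evaluate a stringified type form.

    Explicit namespaces win. Without them, fall back to the immediate
    caller's frame, which relies on the CPython-specific sys._getframe.
    """
    if globals is None and locals is None:
        caller = sys._getframe(1)  # frame 0 is this function, frame 1 the caller
        globals, locals = caller.f_globals, caller.f_locals
    # An empty globals dict still gets the real builtins inserted by eval.
    return eval(form, globals if globals is not None else {}, locals)
```

The fallback only sees the immediate caller, so a library calling this on a user's behalf would also need to know how many frames up the user's code is.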

@DavidCEllis I think I’m missing how exactly your approach differs from the one discussed before. Yes, your implementation stores strings rather than ASTs and you’re using a new class for namespace lookups instead of just plain dicts, but how does that meaningfully change the behaviour in these regards? For the runtime evaluation that Tin outlines, how would your eval_type be passed the EvaluationContext argument that references the caller’s locals and globals, if the user isn’t passing them in?


The implementation stores strings because that’s the most complete, consistent structure you can currently get from the _Stringifier mechanism. The DeferredAnnotation object will actually accept strings, AST objects, ForwardRef or regular objects and (mostly) work.

Another goal is to be backportable. All of the changes I’ve made to annotationlib work just as well on top of 3.14[1].

EvaluationContext exists to capture the original globals/locals and other scopes[2] and keep them “live”. It handles merging the scopes together at the last moment in order to eval and also creates the _StringifierDict for ForwardRef evaluation if necessary.

So this works, for example:

from annotationlib import get_annotations, Format
def outer():
    def inner(a: list[unknown]) -> undefined:
        ...
    annos = get_annotations(inner, format=Format.DEFERRED)
    undefined = str
    return annos['a'], annos['return']

a_ann, return_ann = outer()

unknown = int

print(a_ann.evaluate())  # list[int]
print(return_ann.evaluate())  # <class 'str'>
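For readers without the fork, the core idea of keeping scopes "live" and merging them only at evaluation time can be sketched with the standard library alone. `DeferredSketch` is a hypothetical name, and plain dicts stand in for the real frame namespaces. This is not how `EvaluationContext` is implemented, only the same principle.

```python
import collections
from typing import Any, Mapping


class DeferredSketch:
    """Hypothetical: annotation source text plus live references to its scopes."""

    def __init__(self, source: str, *scopes: Mapping[str, Any]) -> None:
        self.source = source
        self.scopes = scopes  # innermost first; not copied, so later assignments show up

    def evaluate(self) -> Any:
        # Merge only now; ChainMap lets inner scopes shadow outer ones.
        namespace = collections.ChainMap(*self.scopes)
        # An empty globals dict makes eval insert the real builtins (e.g. `list`).
        return eval(self.source, {}, namespace)


module_scope: dict = {}
function_scope: dict = {}
a_ann = DeferredSketch("list[unknown]", function_scope, module_scope)
ret_ann = DeferredSketch("undefined", function_scope, module_scope)

function_scope["undefined"] = str  # assigned after capture, like `undefined = str`
module_scope["unknown"] = int      # like the module-level `unknown = int`

print(a_ann.evaluate())    # list[int]
print(ret_ann.evaluate())  # <class 'str'>
```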

Another part is that I don’t want any extra fallible annotation formats.

I did some work before in order to make sure STRING and FORWARDREF annotations don’t fail if there’s an annotate function that only supports Format.VALUE. Ideally, VALUE is the only format that should be allowed to fail.

The AST format in your reference appears to be dependent on __annotate__ supporting it directly. (Still crashes on loading the REPL with a debug build for me, and PyREPL also can’t open on a non-debug build).

DEFERRED will work even if the user has defined __annotations__, if they have used from __future__ import annotations or if they have only defined Format.VALUE in their annotate function[3].


With regard to Tin’s examples, yes, you still need a context. That’s a string issue though, and is no change from now. E.g., structure({}, MyClass) works but structure({}, "MyClass") does not.

Creating a TypeAlias first could be made to work, but using a plain string would require some other mechanism to capture the context, probably stack frames. I consider this separate from the annotation format though. It’s probably a bit of a side track; the goal is mostly to avoid using strings directly.


  1. Side note: I would love faster string annotations; they’re currently much slower than VALUE annotations. Completeness to support things like 1 | 2 | 3 would be a bonus. ↩︎

  2. if I’ve done things correctly ↩︎

  3. ok, this was unintentionally broken because I forgot .items() ↩︎

Just to spell this out, the following should then work, right?

type _Alias = MyClass[{int: float}]
structure(json.loads(b"{}"), _Alias)

A potential syntax for passing type annotations to functions (such that they can actually be used within the function body) could be this:

structure.[MyClass[{int: float}]](json.loads(b"{}"))

(note the . before the [)

I think this couldn’t be done with the existing __getitem__ syntax because, as you say, the interpreter doesn’t know when to give it the special treatment it needs.
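A quick way to see why plain subscription can't help: the subscript is an ordinary expression that is fully evaluated before `__class_getitem__` (or `__getitem__`) is called. `Probe` below is a hypothetical class that simply echoes what it receives.

```python
class Probe:
    """Hypothetical class that echoes whatever its subscript evaluated to."""

    def __class_getitem__(cls, item):
        return item


# 1 | 2 is evaluated first as integer bitwise OR, so Probe only ever sees 3.
print(Probe[1 | 2])  # 3

# An undefined name fails while the subscript is evaluated,
# before Probe gets any chance to defer it.
try:
    Probe[NotYetDefined]  # noqa: F821
except NameError as exc:
    print(exc)
```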


Yes. Whenever a type form appears in an annotation or a type alias, you would be able to use the new syntax without stringifying it. We can also recover the enclosing namespace for those, so even if you do stringify a type form, we can resolve it most of the time without issues.

The problems are limited to the specific case of inline type forms in normal expression contexts that are stringified and no information about globals/locals is passed. I.e. the following would not work:

structure(some_json, "{int: MyClass}")

While all of these would work:

type _Dummy = {int: MyClass}
structure(some_json, _Dummy)
type _Dummy = "{int: MyClass}"
structure(some_json, _Dummy)
structure(some_json, "{int: MyClass}", globals=globals(), locals=locals())

This is the same behaviour you get today if MyClass is a currently undefined forward reference. The first call not working isn’t because of the new syntax or AST based annotations, but because of string literals being used to spell type forms. The change would only increase the number of times we need to use such strings in expression contexts.
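For comparison, this is how today's `typing.get_type_hints` recovers the namespace for a string annotation: the string lives on a function, and the function's `__globals__` says where to look. A bare string passed as an argument has no such attachment. The names `handler` and `MyClass` are just for illustration.

```python
import typing


def handler(payload: "dict[int, MyClass]") -> None:
    ...


class MyClass:  # defined after the annotation was written
    pass


# get_type_hints uses handler.__globals__ to resolve "MyClass" lazily.
hints = typing.get_type_hints(handler)
print(hints["payload"])
```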


I’m fairly strongly against forcing tools to handle contexts and passing them annotations as strings directly. Things like structure should be given something that has already been evaluated (or that provides a standard method to evaluate).

The user would then do something like:

structure(some_json, as_type("{int: MyClass}"))

Where as_type is something like:

import sys
import typing
from annotationlib import DeferredAnnotation, EvaluationContext, Format

type_transformer = TypeTransformer()  # See earlier post

def as_type(
    t: str,
    context: EvaluationContext | None = None,
    format: Format = Format.VALUE,
    *,
    frame_level: int = 1
):
    extra = {"typing": typing}
    if context is None:
        # Capture the caller's namespaces (sys._getframe is CPython-specific)
        frame = sys._getframe(frame_level)
        context = EvaluationContext(globals=frame.f_globals, locals=frame.f_locals)

    anno = DeferredAnnotation(t, evaluation_context=context)

    return anno.transform(type_transformer).evaluate(format=format, extra_names=extra)

And hence this will (and in fact, already does) work:

from attrs import define
from cattrs import structure

@define
class MyClass:
    arg: str

struct = structure(
    {1: {"arg": "MyClass Argument"}},
    as_type("{int: MyClass}")
)

print(struct)

# {1: MyClass(arg='MyClass Argument')}
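The `TypeTransformer` used above comes from an earlier post and isn't shown here. Purely to illustrate what such a transformation does, here is a hypothetical, much smaller AST-to-type translation using only `ast` and `typing`. It maps `{K: V}` to `dict[K, V]`, unions of constants such as `1 | 2` to `Literal[...]`, other unions to `Union[...]`, and looks names up in an explicitly passed namespace. It handles nothing else.

```python
import ast
import builtins
import typing
from typing import Any, List, Mapping


def _or_operands(node: ast.expr) -> List[ast.expr]:
    # Flatten the left-nested BinOps of `a | b | c` into [a, b, c].
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return _or_operands(node.left) + _or_operands(node.right)
    return [node]


def to_type(node: ast.expr, namespace: Mapping[str, Any]) -> Any:
    """Hypothetical toy translation of a few type-form shapes into runtime types."""
    if isinstance(node, ast.Name):
        if node.id in namespace:
            return namespace[node.id]
        if hasattr(builtins, node.id):
            return getattr(builtins, node.id)
        raise NameError(node.id)
    if isinstance(node, ast.Constant):
        # Note: string constants become Literal values here, not forward references.
        return typing.Literal[node.value]
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        parts = _or_operands(node)
        if all(isinstance(p, ast.Constant) for p in parts):
            return typing.Literal[tuple(p.value for p in parts)]
        return typing.Union[tuple(to_type(p, namespace) for p in parts)]
    if isinstance(node, ast.Dict) and len(node.keys) == 1 and node.keys[0] is not None:
        return dict[to_type(node.keys[0], namespace), to_type(node.values[0], namespace)]
    if isinstance(node, ast.Subscript):
        origin = to_type(node.value, namespace)
        items = node.slice.elts if isinstance(node.slice, ast.Tuple) else [node.slice]
        args = tuple(to_type(item, namespace) for item in items)
        return origin[args if len(args) > 1 else args[0]]
    raise TypeError(f"unsupported type form: {ast.dump(node)}")


def parse_type_form(source: str, namespace: Mapping[str, Any]) -> Any:
    return to_type(ast.parse(source, mode="eval").body, namespace)
```

A real transformer would have to decide deliberately how string constants are treated; this sketch simply turns them into `Literal` values.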

I feel like we are discussing something not quite related to the initial proposal, and this is an issue that already exists as of today.

As @ImogenBits mentioned:

And I agree it isn’t best practice mainly because sys._getframe() is a CPython implementation detail not guaranteed to exist in other implementations.

In Pydantic, we already make use of sys._getframe() for our TypeAdapter, so that the following works:

MyInt = int
ta = TypeAdapter(list['MyInt'])

And I feel like third-party libraries should be the ones deciding whether they want to allow such capabilities. If the proposed as_type() utility were provided by the stdlib, it isn’t clear how it would behave on other Python implementations.

I also feel that these are uncommon cases, and consumers of TypeAdapter/structure et al. are responsible for providing types that can be evaluated in a sane way.


Now, putting aside how globals/locals are resolved, there is value in exposing a function to evaluate standalone type annotations (so that forward references can be resolved). See my proposal in Adding a `typing.evaluate_type()` function.

Yes, my view is that generally you would want to avoid having to construct annotations directly from strings. If you do, though, I’m not aware of a practical way to do so without either using sys._getframe or requiring users to always provide globals/locals.

The problem with trying to evaluate forward references from Format.FORWARDREF with something like evaluate_type in general is that they can be contained in arbitrary objects, or even potentially unintentionally converted to some other object.

One goal of the Deferred format is to allow you to evaluate annotations individually; it’s required for a fully working dataclass __init__.__annotate__. For example, your get_model_type_hints function would look more like this:

def get_model_type_hints(cls):
    hints = {}

    for base in reversed(cls.__mro__):
        anns = get_annotations(base, format=Format.DEFERRED)
        for name, type_ann in anns.items():
            try:
                evaluated = type_ann.evaluate(format=Format.VALUE)
                hints[name] = (evaluated, True)
            except NameError:
                forwardref = type_ann.evaluate(format=Format.FORWARDREF)
                hints[name] = (forwardref, False)
    return hints

This isn’t to say that something like a recursive forward ref evaluation tool wouldn’t be useful if something else gives you annotations in FORWARDREF format, just that if you’re in control of get_annotations it wouldn’t be your primary tool.

Edit note: VALUE annotations can also fail with other errors like AttributeError. My current demo of the deferred format also doesn’t support this; a final version would. Technically ForwardRef can’t evaluate it either, but if ForwardRef.evaluate(format=Format.FORWARDREF) fails, it just returns itself.
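A stdlib-only sketch of the same "evaluate each annotation on its own" idea, working from plain source strings (as you would get with `from __future__ import annotations`) rather than the proposed `Format.DEFERRED`. The function name and the use of `typing.ForwardRef` as the fallback are assumptions for illustration; it catches `AttributeError` as well, per the edit note.

```python
import typing
from typing import Any, Dict, Mapping, Tuple


def evaluate_individually(
    raw: Mapping[str, str], namespace: Mapping[str, Any]
) -> Dict[str, Tuple[Any, bool]]:
    """Hypothetical: evaluate each string annotation separately.

    Returns name -> (value, resolved). An entry that fails becomes a
    typing.ForwardRef without spoiling the other entries.
    """
    hints: Dict[str, Tuple[Any, bool]] = {}
    for name, source in raw.items():
        try:
            # Copy the namespace so eval's __builtins__ insertion doesn't leak back.
            hints[name] = (eval(source, dict(namespace)), True)
        except (NameError, AttributeError):
            hints[name] = (typing.ForwardRef(source), False)
    return hints
```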

If as_type was in the standard library, I could just call it myself with a larger frame_level when a string is encountered. Unfortunately I think stack crawling is very slow, isn’t it? And there’s no way to apply caching to the string only, since the same string in two different contexts could give different results.

Giving this a little more thought,

Out of curiosity, how do you see generic Pydantic classes working if this idea was accepted?

from typing import Literal
from pydantic import BaseModel

class MyClass[T](BaseModel):
    a: T

MyClass[Literal[1, 2]](**data)  # works today, but will be deprecated
MyClass[1 | 2](**data)  # doesn't work
'MyClass[1 | 2]'(**data)  # obviously doesn't work
MyClass['1 | 2'](**data)  # requires stack crawling, so will be slow and fragile

Giving this even more thought, this cannot work either, right?

class Base[T]:
    a: T


class Child(Base[1 | 2]):
    pass

You could, but I wouldn’t recommend it. I think if users are creating these, they should, in general, have the context attached before passing them to anything else. At the point of creation, the user presumably knows what the correct context is (or at least is in the correct context when they create it).

It may be nicer to try to figure it out, but I think that will eventually lead to more confusing problems and a more complex API.


Both examples do work if you use the intermediate type alias workaround:

class Base[T](BaseModel):
    a: T

type _Inner = 1 | 2

Base[_Inner](**data)

class Child(Base[_Inner]):
    pass

Yes, it’s annoying that we’d have to write things out like that. But is it really such a big problem that you would rather not have any of the potential new typing features that depend on AST based annotations? We also currently have the same problem with forward references, i.e. if you replace 1 | 2 with LaterDefined in any of these examples, it’ll work in exactly the same cases. Is that currently a big problem for cattrs or Pydantic?


I am going to reiterate something I said before: I think general macros à la PEP 638 are a better idea long term.

E.g. we could have a type!(<ast>) macro provided somewhere that can then be used anywhere:

class Child(Base[type!(1 | 2)]):
    pass

OldStyleAlias = type!(3 | 4)

def foo[T](a: T, b: T) -> type!((T, T)):
    ...

This doesn’t always look great - but it requires far less special casing of annotations and makes evolution of this aspect of the language somewhat independent of CPython releases. Additionally, it removes all additional runtime performance cost - the only cost would be on first import, afterwards it’s cached with the exact same strategies as current annotations.

Once a version of type! has been tested for a bit, the language can be extended to implicitly apply type! to annotations, giving us the same end state as more direct proposals but at a more controlled pace.


This could work, but I’m not sure there’s a macro proposal that seems likely to move forward currently. Given the need for type! in front of all of these expressions, it may be a performance improvement, but I’m not sure it’s a usability improvement over something that performs transformations at runtime.

Syntax or macros wouldn’t solve the issue of needing to evaluate annotations individually or at a later point.

I do worry that new forms of annotations need to be treated carefully to avoid breaking general use of get_annotations, but I think that area could be explored outside of core Python first - while a deferred, evaluable annotation format is still needed for other uses.


@ImogenBits I worry that it feels a bit like I’ve been working against you here, which isn’t what I’m trying to do.

My move away from supporting the AST format comes both from your comments that:

  1. The string and AST are largely equivalent from a data POV
  2. Constructing AST nodes is expensive

Given that for my purpose I don’t need AST nodes, I see no benefit from this: STRING in 3.14 is already 50x slower than VALUE. It seems like it would make more sense for the internal format to be more accurate strings, with the AST built on demand if needed? Potentially a different parser could be used if it would allow a distinction between Class[1, 2, 3] and Class[(1, 2, 3)][1].
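To illustrate why the two spellings are hard to tell apart: both the runtime and the standard `ast` module treat `Class[1, 2, 3]` and `Class[(1, 2, 3)]` identically, so only the original source text preserves the difference. `Probe` here is a hypothetical echo class.

```python
import ast


class Probe:
    """Hypothetical class that echoes its subscript."""

    def __class_getitem__(cls, item):
        return item


# At runtime, both spellings pass the very same tuple.
assert Probe[1, 2, 3] == Probe[(1, 2, 3)] == (1, 2, 3)

# The standard parser also yields the same tree (ignoring source positions).
bare = ast.dump(ast.parse("Class[1, 2, 3]", mode="eval"))
parenthesized = ast.dump(ast.parse("Class[(1, 2, 3)]", mode="eval"))
print(bare == parenthesized)  # True
```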

I’ve been trying to see if DeferredAnnotation built on these more complete strings[2] can cover your requirements while still covering mine?

Perhaps I haven’t been clear enough in how I need these to work, ideally this is how I’d want to use them in dataclasses and my own classbuilder:

  1. Switch dataclasses to use Format.DEFERRED instead of Format.FORWARDREF
  2. Replace type on dataclasses Field
    • Field.type would become a settable property backed by an internal _type, which can be a DeferredAnnotation; if _type is a DeferredAnnotation, reading Field.type would evaluate it as Format.FORWARDREF
    • This would resolve this issue
  3. Replace dataclasses’ _make_annotate_function with one using the helper from annotationlib
    • __init__.__annotate__ would then be make_annotate_function({f.name: f._type for f in fields(cls) if f.init} | {'return': None})[3]
    • This would resolve the issue of forward reference init=False fields in VALUE annotations
  4. Ideally not be significantly slower than using VALUE annotations.
    • classbuilder currently tries to use VALUE annotations first for performance and falls back to STRING annotations if these fail; the fallback results in a ~40% slowdown in class construction.
    • I’m not sure if this would be as prominent in dataclasses, they spend a lot of their time in exec while classbuilder defers the exec calls so getting the annotations is a larger portion of the construction time.
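A rough sketch of steps 2 and 3, using hypothetical stand-ins (`DeferredStub`, `FieldSketch`, `make_annotate_sketch`) rather than the proposed annotationlib APIs. The deferred object just wraps a source string and a live namespace. `FieldSketch.type` resolves it lazily and, loosely mimicking the FORWARDREF behaviour, returns a `typing.ForwardRef` for names that don't exist yet. The annotate function only supports VALUE, which is `1`, matching 3.14's `annotationlib.Format.VALUE`.

```python
import typing
from typing import Any, Callable, Dict, Mapping

VALUE = 1  # same value as annotationlib.Format.VALUE in Python 3.14


class DeferredStub:
    """Hypothetical deferred annotation: source text plus a live namespace."""

    def __init__(self, source: str, namespace: Mapping[str, Any]) -> None:
        self.source = source
        self.namespace = namespace

    def evaluate(self) -> Any:
        # Copy at evaluation time, so names defined after creation are visible.
        return eval(self.source, dict(self.namespace))


class FieldSketch:
    """Hypothetical Field whose public `type` is computed from a private `_type`."""

    def __init__(self, name: str, type_: Any) -> None:
        self.name = name
        self._type = type_

    @property
    def type(self) -> Any:
        if isinstance(self._type, DeferredStub):
            try:
                return self._type.evaluate()
            except NameError:
                # Loosely mimic FORWARDREF: hand back a ForwardRef instead of failing.
                return typing.ForwardRef(self._type.source)
        return self._type

    @type.setter
    def type(self, value: Any) -> None:
        self._type = value


def make_annotate_sketch(anns: Mapping[str, Any]) -> Callable[[int], Dict[str, Any]]:
    """Build an __annotate__-style function that evaluates deferred entries when called."""

    def annotate(format: int) -> Dict[str, Any]:
        if format != VALUE:
            raise NotImplementedError
        return {
            key: value.evaluate() if isinstance(value, DeferredStub) else value
            for key, value in anns.items()
        }

    return annotate
```

Because the annotate function receives the deferred objects themselves (the `_type` values), an `init=False` field or a forward reference only has to resolve when `__init__.__annotate__` is actually called.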

Perhaps that clears things up a little?


  1. Presuming that future STRING annotations distinguish between the two. ↩︎

  2. The internals would actually be private, so they can be constructed from other objects - but when retrieved from CPython generated annotate functions they would use strings. ↩︎

  3. With the correction for __qualname__ and the usual shenanigans for slotted class construction. ↩︎

I think we have completely different purposes behind our ideas here. This proposal is about enabling type annotations to sensibly contain things like int if SomeCond else str. Your concern is to help tools that programmatically synthesize __annotate__ functions. I’m not against that in any way, it’s just not what this proposal is about. I only included the make_annotate_function to reduce the burden of introducing a new format, not as a primary design concern.

But luckily, we can actually solve both problems with the same changes! I’ve moved some of the AST construction logic into C, and now the AST based formats are somewhere between as fast as the current STRING format and significantly (about 2-3 times) faster. Compared to VALUE, they’re only about 20 to 50 times slower. Constructing the Python AST nodes isn’t terribly fast, but luckily that wasn’t what was limiting things. The AST analyzing code is also still in Python, so I think the complexity and maintenance burden is fine. We can also get further speed increases should something like the faster AST nodes that Jelle mentioned ever land.

So if we use the AST approach, we get the potential for new typing features, more convenient synthesized annotate functions, and it’s gonna be faster than it currently is. Yes, specific use cases could have other solutions that are even faster for them, but I think one general-purpose solution that’s not really that slow is much better than several fragmented formats that try to target one specific purpose.


Yes, my primary concern is that adding another format should not make doing this harder, and preferably should make this easier. Some kind of individual evaluate and make_annotate_function that works with these separate objects is necessary for this and both of those need to support all possible formats.

Adding a new format without addressing this makes these problems more difficult.


The other thing I’m not getting is why the internal __annotate__ function needs to construct these AST objects. You need AST objects to perform transformations, but I don’t see why that means the __annotate__ functions themselves have to provide them, in the same way that __annotate__ currently doesn’t provide Format.FORWARDREF.

Is this faster, easier or better for memory than providing more complete/correct strings?

Yeah, those will be part of the updated PEP.

The exact memory impact really depends on the annotation, the longer it is the more the AST format is favoured. But both formats will be about on par with each other. Of course, just returning the string is faster than constructing the AST, but manually parsing the AST from the returned string about doubles the runtime of getting the annotations compared to storing the ASTs.

Since most people who use annotations want to evaluate them, we’d have to parse the strings into ASTs every time we want to evaluate the annotations. But even if you just want to synthesize a new annotate function, you’ll probably still want to do some kind of introspection or modification of them. E.g. dataclasses need to look into the annotations to see if they’re a ClassVar or similar. It also currently sets the __init__'s annotation as e.g. InitVar[int], even though it should probably be stripping the InitVar. I just don’t really see that many use cases where you don’t want to actually look at an annotation but only pass around opaque strings.

You could put that as a dataclasses bug I guess, that’s no change from 3.13 behaviour though.

My major use for this is in my classbuilder: it will identify ClassVar and KW_ONLY as strings, and these are the only annotation-specific behaviours it has. This is largely because I want to avoid evaluating annotations, especially as PEP-810 would mean that evaluating them could trigger imports, which I really want to avoid.
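A sketch of that kind of purely syntactic check, with hypothetical helper names: it recognises ClassVar, KW_ONLY and InitVar (bare or module-qualified) and rewrites InitVar[...] to its argument, all through `ast` without evaluating anything. It is easily fooled by aliases such as `from typing import ClassVar as CV`, which is the usual trade-off of not evaluating.

```python
import ast
from typing import Optional

_MARKERS = {"ClassVar", "KW_ONLY", "InitVar"}


def _marker_name(node: ast.expr) -> Optional[str]:
    # Unwrap X[...] to X, then accept both `Name` and `module.Name` spellings.
    if isinstance(node, ast.Subscript):
        node = node.value
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def classify(source: str) -> Optional[str]:
    """Return 'ClassVar', 'KW_ONLY' or 'InitVar' if the string spells one, else None."""
    name = _marker_name(ast.parse(source, mode="eval").body)
    return name if name in _MARKERS else None


def strip_initvar(source: str) -> str:
    """Rewrite e.g. 'InitVar[int]' to 'int' without evaluating anything."""
    node = ast.parse(source, mode="eval").body
    if isinstance(node, ast.Subscript) and _marker_name(node.value) == "InitVar":
        return ast.unparse(node.slice)  # ast.unparse needs Python 3.9+
    return source
```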

Currently all of the PEP-649 formats will force evaluation, as even Format.STRING has to check that __annotate__(Format.VALUE_WITH_FAKE_GLOBALS) doesn’t raise NotImplementedError before it knows whether it can call the annotate function in the fake globals context[1].
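A toy reproduction of the footnote's point, with hypothetical stand-ins: `Format` below is a local IntEnum with the same values as 3.14's `annotationlib.Format`, and `ToyStringifier` is a drastically simplified imitation of the real stringifier. Rebuilding the function with `types.FunctionType` and swapped globals mirrors, in spirit, how the fake-globals evaluation works.

```python
import builtins
import enum
import types


class Format(enum.IntEnum):
    """Local stand-in using the same values as annotationlib.Format in Python 3.14."""

    VALUE = 1
    VALUE_WITH_FAKE_GLOBALS = 2
    FORWARDREF = 3
    STRING = 4


class ToyStringifier:
    """Hypothetical, much-simplified imitation of a stringifier object."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __getattr__(self, name: str) -> "ToyStringifier":
        # Attribute access just builds more text, e.g. "Format.VALUE".
        return ToyStringifier(f"{self.text}.{name}")

    def __eq__(self, other: object) -> "ToyStringifier":  # type: ignore[override]
        # Comparison also builds text; the result is an ordinary object, so it is truthy.
        return ToyStringifier(f"{other!r} == {self.text}")


def annotate(format):
    # An annotate function that, on paper, only supports VALUE.
    if format == Format.VALUE:
        return {"x": "int"}
    raise NotImplementedError


# Re-run the same code object with fake globals where `Format` is a stringifier:
# `format == Format.VALUE` is now always truthy, so the guard never fires.
fake_annotate = types.FunctionType(
    annotate.__code__, {"__builtins__": builtins, "Format": ToyStringifier("Format")}
)
print(fake_annotate(Format.STRING))  # {'x': 'int'}
```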


I guess my view is that in the long term, people will have to agree on one format for how to write these annotations and the transformation would then be best done internally on compilation, rather than being something everyone has to deal with at runtime. Otherwise get_annotations becomes essentially useless when using these forms and end users will always have to perform AST conversions at runtime.

Somewhat to that end I’d prefer if the objects returned by get_annotations be opaque in that we can change the internal representation in the future without breaking things. It would also mean we could backport something that will mostly work on 3.14 (in the cases where the annotation can currently be reconstructed) and that will give a sensible output for __future__ annotations.

I’m not actually sure what the internal representation of the annotations is before you build the AST in C. Is it possible to return an object and defer the AST construction?
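One way to read that question, sketched with hypothetical names: hand out an object whose internal representation is private, store the cheap form (here a source string), and build the AST only when someone asks for it. `functools.cached_property` makes the parse happen at most once per object.

```python
import ast
from functools import cached_property


class OpaqueAnnotation:
    """Hypothetical opaque annotation: keeps source text, parses only on demand."""

    def __init__(self, source: str) -> None:
        self._source = source  # private; could later be swapped for another representation

    def as_string(self) -> str:
        return self._source

    @cached_property
    def tree(self) -> ast.expr:
        # Parsed on first access, then cached on the instance.
        return ast.parse(self._source, mode="eval").body

    def __repr__(self) -> str:
        return f"OpaqueAnnotation({self._source!r})"
```

Since callers only see `as_string()` and `tree`, the stored form could change (for example to something built in C) without breaking them, which is the point of keeping it opaque.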


Note: I still can’t get your fork to run cleanly.

With a debug CPython build I see: python: Python/compile.c:1496: _PyAST_Compile: Assertion `co || PyErr_Occurred()' failed. and the process aborts.

With a non-debug build pyrepl fails with: warning: can't use pyrepl: <built-in function compile> returned NULL without setting an exception


  1. Once inside the fake globals context, the name Format gets converted to a stringifier, so expressions like format == Format.VALUE always evaluate to True regardless of the value of format. ↩︎