Add a `Format.DEFERRED` option for PEP-649/749 annotations

Writing an __annotate__ function to handle annotations for generated objects[1] in 3.14+ is significantly more complicated than creating a dictionary for __annotations__ in earlier versions.

To simplify creating these functions, add a DEFERRED format for annotations that can be used to generate correct and complete __annotate__ functions from a previously gathered dictionary of these deferred annotations.

Background

In 3.13 and earlier it is possible to gather annotations into a dictionary and assign that dictionary to the __annotations__ attribute of a generated method in order to support runtime annotations for the method. For example, this is what attrs does for its generated __init__ functions. dataclasses similarly gathers annotations when the decorator is called, but uses them slightly differently and has its own issues.
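As a sketch of that pre-3.14 pattern (the names here are illustrative, not the actual attrs or dataclasses internals): generate the method with exec and attach an eagerly evaluated annotations dictionary.

```python
# Illustrative sketch of the pre-3.14 approach, not real attrs/dataclasses
# code: generate __init__ with exec and attach an eager __annotations__ dict.
def make_init(field_annotations):
    names = list(field_annotations)
    args = ", ".join(names)
    body = "\n".join(f"    self.{name} = {name}" for name in names) or "    pass"
    ns = {}
    exec(f"def __init__(self, {args}):\n{body}", ns)
    init = ns["__init__"]
    # Pre-PEP 649, a plain dict is all that's needed for runtime annotations.
    init.__annotations__ = dict(field_annotations)
    return init

class Point:
    pass

Point.__init__ = make_init({"x": int, "y": int})
```

The limitation is that this dictionary is built eagerly: if any annotation refers to a name that is not defined yet, building it raises a NameError at class-creation time, which is exactly what the deferred __annotate__ protocol avoids.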

Currently, in Python 3.14 with the new annotations implementation, when you retrieve annotations using annotationlib.get_annotations or annotationlib.call_annotate_function, you can choose from three evaluated or partially evaluated formats.

With Vector = list[float]:

  • Format.VALUE
    • Vector becomes list[float]
    • list[unknown] would cause the function to raise a NameError
  • Format.FORWARDREF
    • Vector becomes list[float]
    • list[unknown] becomes list[ForwardRef('unknown', ...)]
  • Format.STRING
    • Vector becomes "Vector"
    • list[unknown] becomes "list[unknown]"

While this may be all that is needed for many use cases, none of these is sufficient on its own to synthesize a correct and complete __annotate__ function for a generated method, since such a function must be able to produce all of the formats.

My current proposed solution for dataclasses in 3.14 requires essentially reimplementing the dataclasses field-gathering logic, which seems excessive for annotations, and it may fail VALUE annotations in cases where they could otherwise succeed (if there’s a forward reference in an init=False field, for example).

Previous attempts tried to use the FORWARDREF format that dataclasses currently gathers in the type attributes of fields, but as can be seen above, that approach loses names like Vector, and ForwardRef objects can be arbitrarily nested inside other objects, making them difficult to strip out later for VALUE annotations.

Proposal

Add a fourth[2] format for annotations Format.DEFERRED.

The requirement of this format is that all annotations returned must have an evaluate(format=...) method that can be used to retrieve all of the other annotation formats. For the core case, this would be the ForwardRef class, which already has this method.

So for the earlier example, this would be the result:

  • Format.DEFERRED
    • Vector becomes ForwardRef('Vector', ...)
    • list[unknown] becomes ForwardRef('list[unknown]')

As ForwardRef has an .evaluate(format=...) method, these can then be evaluated into any of the existing formats.

This makes it possible to gather annotations at the time of class decoration and use them to generate an annotate function that properly supports every annotation format.

Required Changes

All of these changes would be to annotationlib. A backport package could be made to support 3.14 if needed as long as the enum value is agreed upon. I don’t believe this requires any changes to the __annotate__ functions generated by CPython itself.

  • The Format enum gains a new value DEFERRED
    • This is needed so that user-generated __annotate__ functions can also return this format
  • call_annotate_function and get_annotations would need to be updated for this format
    • The logic for this already exists to support the FORWARDREF format in certain cases
  • ForwardRef.evaluate can support this format by returning the object itself
  • A helper to create objects with an evaluate method would be needed for cases where annotations have already been evaluated in some way
    • For example if one of the annotation sources used __future__ string annotations, or if other extra values need to be added to the annotations, such as the None return annotation for dataclasses’ __init__
  • A make_annotate_function helper function to convert these gathered deferred references into a new __annotate__ function that can be attached to the object that needs to be annotated.

make_annotate_function would be something like this:

def make_annotate_function(annotations):
    # pre-processing logic to make sure all annotations support `.evaluate()`
    ...
    def __annotate__(format, /):
        match format:
            case Format.VALUE | Format.FORWARDREF | Format.STRING:
                return {k: v.evaluate(format) for k, v in annotations.items()}
            case Format.DEFERRED:
                return annotations.copy()
            case _:
                raise NotImplementedError(format)
    return __annotate__
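To illustrate the contract this relies on (with stand-in classes, not the real annotationlib API), anything stored in the deferred dict just needs an evaluate(format) method. Here is a toy Deferred that resolves names from a captured namespace, plugged into a simplified copy of the function above with the pre-processing step omitted:

```python
# Toy illustration of the DEFERRED contract, not the real annotationlib API.
# The integer constants stand in for annotationlib.Format values.
VALUE, FORWARDREF, STRING, DEFERRED = 1, 2, 3, 4

class Deferred:
    """Stand-in for ForwardRef/DeferredAnnotation: holds an unevaluated expression."""
    def __init__(self, expr, namespace):
        self.expr = expr
        self.namespace = namespace

    def evaluate(self, format):
        if format == STRING:
            return self.expr
        # VALUE raises NameError for unknown names; a real FORWARDREF would
        # return a ForwardRef on failure instead, elided here for brevity.
        return eval(self.expr, {"__builtins__": {}}, self.namespace)

def make_annotate_function(annotations):
    # Simplified: assumes every value already supports .evaluate()
    def __annotate__(format, /):
        if format == DEFERRED:
            return annotations.copy()
        return {k: v.evaluate(format) for k, v in annotations.items()}
    return __annotate__

ns = {"Vector": list[float]}
annotate = make_annotate_function({"v": Deferred("Vector", ns)})
```

Note how the STRING result preserves the name Vector rather than its expansion, which is exactly what is lost when starting from FORWARDREF annotations.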

There will certainly be some fine details that still need to be worked out.


  1. such as for attrs’ and dataclasses’ generated __init__ methods

  2. I’m ignoring Format.VALUE_WITH_FAKE_GLOBALS as that’s not really a format, but an indicator that the fake globals environment can be used during evaluation


So this didn’t get much attention, but it’s still something that will ease some issues with the new annotations, and I’d like to have it both for dataclasses and for my own dataclass-like library.

I ended up implementing this as part of the other discussion on annotation transformations, but it really covers a separate issue. I also have a branch reworking dataclasses on top of this new format to demonstrate the problems it solves.

I hope that this demonstrates why this format is useful because I’d like to know if it’s worth following this up with a proposal or if there’s just no interest as it first appeared.


Just the annotationlib changes

The demo/reference implementation of the annotationlib changes with no dataclass modifications is currently here: GitHub - DavidCEllis/cpython at deferred-annotations

This adds 2 classes and one function to annotationlib, along with the new enum value for Format.DEFERRED. These changes could be made available in a backport to support 3.14 if necessary.

Calling get_annotations or call_annotate_function with Format.DEFERRED will always return a dict with DeferredAnnotation objects as values given a sensible, working annotate function or __annotations__. This is unlike Format.FORWARDREF which will attempt to resolve as much as possible.[1]

from pprint import pp
from annotationlib import get_annotations, Format

Vector = list[float]

class Example:
    a: int
    b: Vector
    c: undefined
    d: list[undefined]

forwardrefs = get_annotations(Example, format=Format.FORWARDREF)
deferred = get_annotations(Example, format=Format.DEFERRED)

print("ForwardRef:")
pp(forwardrefs)
print("\nDeferred:")
pp(deferred)

Output:

ForwardRef:
{'a': <class 'int'>,
 'b': list[float],
 'c': ForwardRef('undefined', is_class=True, owner=<class '__main__.Example'>),
 'd': list[ForwardRef('undefined', is_class=True, owner=<class '__main__.Example'>)]}

Deferred:
{'a': DeferredAnnotation('int'),
 'b': DeferredAnnotation('Vector'),
 'c': DeferredAnnotation('undefined'),
 'd': DeferredAnnotation('list[undefined]')}

DeferredAnnotation objects’ internals are private, partly because they can be several different types, but also to allow for a better representation in a future Python release if one is found. It is also possible to construct them from objects that have already been evaluated for make_annotate_function, in order to support cases such as make_dataclass where annotations need to be created from evaluated objects.

from annotationlib import (
    call_annotate_function, make_annotate_function, Format, ForwardRef
)

annotate = make_annotate_function({"a": str, "b": ForwardRef("Any", module="typing")})

print(call_annotate_function(annotate, format=Format.FORWARDREF))
print(call_annotate_function(annotate, format=Format.STRING))
print(call_annotate_function(annotate, format=Format.DEFERRED))

import typing
print(call_annotate_function(annotate, format=Format.VALUE))

Output:

{'a': <class 'str'>, 'b': ForwardRef('Any', module='typing')}
{'a': 'str', 'b': 'Any'}
{'a': DeferredAnnotation('str'), 'b': DeferredAnnotation('Any')}
{'a': <class 'str'>, 'b': typing.Any}

Dataclasses example

As an example of their utility, here is another branch that also changes dataclasses to use Format.DEFERRED: GitHub - DavidCEllis/cpython at deferred-annotations-dataclasses

This makes dataclasses use get_annotations(cls, format=Format.DEFERRED), replacing the use of Format.FORWARDREF. It also removes the two custom annotate functions that were created (one for __init__, one for make_dataclass), replacing them with new standard ones from annotationlib.make_annotate_function.

This fixes and improves a number of things:

  1. References to the class itself will resolve in field.type (issue). field.type is made into a settable property backed by an internal field._type, which may be a DeferredAnnotation.
from dataclasses import dataclass, fields

@dataclass
class Example:
    examples: list[Example]


for f in fields(Example):
    print(f"{f.name}: {f.type}")

3.14.2

examples: list[ForwardRef('Example', is_class=True, owner=<class '__main__.Example'>)]

deferred-annotations-dataclasses

examples: list[__main__.Example]
  2. The annotate function for __init__ no longer fails if there is an unresolvable non-init annotation.
import inspect
from dataclasses import dataclass, field

@dataclass
class Example:
    not_in_init: list[undefined] = field(init=False, default=None)
    in_init: int

print(inspect.signature(Example))

3.14.2

...
    not_in_init: list[undefined] = field(init=False, default=None)
                      ^^^^^^^^^
NameError: name 'undefined' is not defined

deferred-annotations-dataclasses

(in_init: int) -> None
  3. Annotations removed from the class that were present when __init__ was generated are kept in the generated __init__.__annotate__ function, and hence in the function signature.
from dataclasses import dataclass
import inspect

@dataclass
class C:
    "doc"  # prevent inspect.signature from running early
    x: int

C.__annotate__ = lambda _: {}

print(inspect.signature(C))

3.14.2

(x) -> None

deferred-annotations-dataclasses

(x: int) -> None
Note on a potential change to untyped `make_dataclass` behaviour

In the current deferred implementation, untyped fields made with make_dataclass no longer import typing when get_annotations is called. As such, get_annotations on such a generated class will fail with VALUE annotations if typing is not imported. This could be changed if desired, but I think it’s consistent with how PEP-649 annotations work in general.

Note: Don’t run this in the REPL, as the REPL imports typing itself.

from dataclasses import make_dataclass, fields
from annotationlib import get_annotations, Format

C = make_dataclass('C', ["x"])

print("ForwardRef")
print(get_annotations(C, format=Format.FORWARDREF))

x_field = fields(C)[0]
print(f"{x_field.type = }")

print("\nValue")
try:
    print(get_annotations(C))
except NameError as e:
    print(repr(e))
    print("Retry with typing imported")
    import typing
    print(get_annotations(C))
    print(f"{x_field.type = }")
else:
    print("With typing imported")
    import typing
    print(f"{x_field.type = }")

3.14.2 (note that even trying to use Format.FORWARDREF imports typing, as get_annotations will first try to get __annotations__, which uses VALUE annotations)

ForwardRef
{'x': typing.Any}
x_field.type = ForwardRef('Any', module='typing')

Value
{'x': typing.Any}
With typing imported
x_field.type = ForwardRef('Any', module='typing')

deferred-annotations-dataclasses

ForwardRef
{'x': ForwardRef('Any', module='typing')}
x_field.type = ForwardRef('Any', module='typing')

Value
NameError("name 'Any' is not defined")
Retry with typing imported
{'x': typing.Any}
x_field.type = typing.Any

  1. Note that these will not attempt to resolve stringified __future__ annotations, matching how get_annotations will also not attempt to resolve such annotations.


From my perspective, this seems valuable for solving a few dataclasses gotchas, and I like the concept of make_annotate_function, especially as annotate functions continue to evolve over time.

I also like the idea that this could open more ways to represent annotations, though we’ll still need AST-based annotations for unions of tuples or whatnot.

However, as much as the fixes are helpful, do you have any other concrete use cases for deferred annotations outside of dataclasses? If we’re adding this much complexity just to solve a few dataclasses issues, I’m not sure how much traction the proposal will get.

On further thought, my view is that requiring users to do AST transformations at runtime will just make performance worse and make handling annotations more difficult.

Any tool which works with annotations at runtime will essentially be forced to perform the transformations in order to work, at which point I think it’s better to do the transformations before they are baked into the __annotate__ functions. @MegaIng brought up a similar thought in that thread.

As such, adding additional ways to represent types is not a goal of this proposal. The goal is to add a format which can both be easily evaluated in all of the supported annotation formats and can be used to create new __annotate__ functions that work correctly and completely.


I disagree with the characterisation that this is adding “this much complexity”. Both the evaluation logic and the construction logic underlying DeferredAnnotation are already part of annotationlib.

Format.DEFERRED is built from the implementation of Format.STRING, just without the final step of string conversion. The evaluation logic is mostly extracted from ForwardRef and moved to EvaluationContext in order to be shared.

The change DEFERRED brings is that it makes it possible to use these internals to make creating new __annotate__ functions much easier. This moves the complexity out of dataclasses and into annotationlib, and makes it reusable for anything else that needs to construct an __annotate__ function.


Other examples of issues

You characterise these as “a few dataclass gotchas”, but note that the third dataclasses example is exactly what SQLAlchemy was doing when its use of dataclasses broke in 3.14.1: the logic I’d written for the __annotate__ function mistakenly assumed the annotations present when dataclass(...) was called would still be there. The patch in 3.14.2 just prevents this from breaking if the annotation isn’t there; it doesn’t make the annotation work correctly, as it does with DEFERRED.

Some other examples I’m aware of:

`attrs` leaks forward references in VALUE and STRING annotations

Currently attrs is getting PEP-649 annotations ‘wrong’, and it’s not possible to easily fix it, for the same reason dataclasses doesn’t have a proper fix.

from attrs import define
from annotationlib import get_annotations, Format

@define
class Example:
    examples: list[Example]

print(get_annotations(Example.__init__, format=Format.VALUE))
print(get_annotations(Example.__init__, format=Format.STRING))

Output

{'return': None, 'examples': list[ForwardRef('Example', is_class=True, owner=<class '__main__.Example'>)]}
{'return': 'None', 'examples': "list[ForwardRef('Example', is_class=True, owner=<class '__main__.Example'>)]"}

This is because attrs doesn’t write an __annotate__ function; it just puts the Format.FORWARDREF annotations into __annotations__. As such, it leaks ForwardRef objects into VALUE and STRING annotations.
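The leak mechanism can be seen with plain Python (FakeForwardRef below is a hypothetical stand-in for annotationlib.ForwardRef): once a placeholder object is baked into a plain __annotations__ dict, there is no annotate function left to swap it out per format.

```python
# Sketch of the leak with a stand-in placeholder class; annotationlib's real
# ForwardRef behaves the same way once stored in a plain __annotations__ dict.
class FakeForwardRef:
    def __init__(self, arg):
        self.arg = arg
    def __repr__(self):
        return f"ForwardRef({self.arg!r})"

def generated_init(self, examples):
    self.examples = examples

# What an attrs-style decorator effectively does today: store the FORWARDREF
# result directly, with no __annotate__ function to re-evaluate it later.
generated_init.__annotations__ = {
    "examples": [FakeForwardRef("Example")],
    "return": None,
}

# Every consumer now sees the placeholder, whichever format it asked for:
print(generated_init.__annotations__["examples"])  # [ForwardRef('Example')]
```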

Handling this with DEFERRED annotations is a much simpler implementation change than rewriting all of the attribute-gathering logic purely to attach to __init__. The logic would be more complicated for attrs than for dataclasses, as it also has to consider the annotations of any converter functions, and it would end up just as broken as the dataclasses implementation was.

This could broadly use the same fix as my dataclasses changes.

My own class building library uses STRING if VALUE fails to avoid leaking ForwardRef

My own class builder currently ‘cheats’ and uses VALUE annotations where possible; when those fail it uses STRING annotations, because FORWARDREF annotations lead to the issues attrs has.

from ducktools.classbuilder.prefab import Prefab
from annotationlib import get_annotations, Format

class Example(Prefab):
    examples: list[Example]

print(get_annotations(Example.__init__, format=Format.VALUE))
print(get_annotations(Example.__init__, format=Format.STRING))

Output

{'examples': 'list[Example]', 'return': None}
{'examples': 'list[Example]', 'return': 'None'}

Constructing a ‘correct’ __annotate__ function for this is about as complicated as it is for attrs, but slightly different: __prefab_post_init__ works as a partial __init__ function and changes its annotations.[1]

I’d probably replace the use of STRING annotations with DEFERRED annotations. VALUE annotations would likely still be used where they succeed for performance reasons.

Bonus digression on current annotations implementation details

In PEP-749, as part of the implementation of deferred annotations, the way to recognise whether an annotate function can be called in the ‘fake globals’ context is to call it with VALUE_WITH_FAKE_GLOBALS. If it is supported, this outputs the same annotations as VALUE.

Unfortunately, you can’t check this within a fake globals context as names like NotImplementedError and annotationlib.Format don’t exist in that context. Thus you have to call the annotate function in the standard globals space first to see if it raises NotImplementedError.
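That detection dance can be sketched as follows (the integer constants and function names here are stand-ins, mirroring annotationlib.Format rather than using it):

```python
# Sketch of the fake-globals support check described above; the integer
# constants stand in for annotationlib.Format values.
VALUE, VALUE_WITH_FAKE_GLOBALS = 1, 2

def supports_fake_globals(annotate):
    # Must be called with the *normal* globals, where NotImplementedError
    # actually exists; in the fake-globals context it would not resolve.
    try:
        annotate(VALUE_WITH_FAKE_GLOBALS)
    except NotImplementedError:
        return False
    return True

def manual_annotate(format, /):
    # A hand-written annotate that only implements VALUE.
    if format != VALUE:
        raise NotImplementedError(format)
    return {"x": int}

def compiler_style_annotate(format, /):
    # A compiler-generated annotate treats VALUE_WITH_FAKE_GLOBALS like VALUE.
    return {"x": int}
```

The cost is the extra up-front call: supports_fake_globals(manual_annotate) is False only after the function has already been invoked once in the real globals.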

As such, the other formats are by their nature always going to be slower than Format.VALUE in the cases where VALUE would succeed.

DEFERRED actually exploits this required call and uses successful results to pre-fill its cache for VALUE evaluations.

A Pydantic developer wants to be able to distinguish *which* annotations fail to evaluate

Another example: this helps to handle functions such as this one, based on an example given by @Viicos in this comment on an annotations issue.

This is given as roughly the logic that Pydantic has to use:

def get_model_type_hints(cls):
    hints: dict[str, tuple[Any, bool]] = {}

    for base in reversed(cls.__mro__):
        anns = get_annotations(base, format=Format.FORWARDREF)
        for name, type_ann in anns.items():
            try:
                evaluated = typing._eval_type(type_ann, globalns=..., localns=..., type_params=...)
                hints[name] = (evaluated, True)
            except Exception:
                hints[name] = (type_ann, False)

    return hints

First, this requires that the library handle the evaluation contexts itself, and second, it requires using the internal typing._eval_type, which special-cases traversal of some typing objects.

The current[2] DEFERRED equivalent of this would be:

def get_model_type_hints(cls):
    hints: dict[str, tuple[Any, bool]] = {}

    for base in reversed(cls.__mro__):
        anns = get_annotations(base, format=Format.DEFERRED)
        for name, type_ann in anns.items():
            value = type_ann.evaluate(format=Format.FORWARDREF)
            hints[name] = value, type_ann.is_evaluated
    return hints

This no longer requires handling evaluation contexts directly, and it doesn’t do any special-case traversal of annotations to uncover nested ForwardRef objects like _eval_type does.

I’d be interested if there are other examples, but my primary interaction with runtime annotations is through dataclasses, attrs and ducktools-classbuilder, all of which essentially have some variation on the same ‘constructing __annotate__ for a generated __init__’ problem, and all of which need their own bespoke solutions without something like DEFERRED.


  1. Yes, static tools don’t understand this correctly. The annotation for __init__ should be that of the argument to __prefab_post_init__, not the class annotation.

  2. I only just added the public .is_evaluated; previously you would have had to check internals. This indicates whether a VALUE annotation has been cached.


Thanks for the real-world examples! With that added context, it certainly feels like a lot more of an issue than just a few gotchas 🙂

Regarding the implementation: I noticed that your original proposal had DEFERRED return a ForwardRef. Is there a particular reason you switched to a new DeferredAnnotation class?


There were a few things that made me switch to DeferredAnnotation.

Initially I thought it would be possible to just take the STRING logic and use .transmogrify without evaluating, and hence use ForwardRef as the type, but it turns out some annotations can ‘escape’ the stringifiers. This came up while looking into the AST annotation format…

I just now realised this also currently makes them fail to evaluate under ForwardRef in the “right” circumstances. The example below uses the ‘b’ attribute to raise an AttributeError and force the backup evaluation logic.

import typing
from annotationlib import get_annotations, Format

class Example:
    a: [str, int]
    b: typing.attribute_error

print(get_annotations(Example, format=Format.FORWARDREF)['a'])

Output:

[ForwardRef('str', is_class=True, owner=<class '__main__.Example'>), ForwardRef('int', is_class=True, owner=<class '__main__.Example'>)]

(This is handled correctly by DEFERRED already)


For make_annotate_function and for the DEFERRED format it is useful to be able to construct them from existing objects to support things like DeferredAnnotation(None). DeferredAnnotation.evaluate handles how to evaluate the different internal representations rather than making the user do this.

There are also some slight differences in behaviour that are necessary for emulating the behaviour of get_annotations on a class/function. For example, DeferredAnnotation will not attempt to evaluate strings that do not have an associated context in order to match the behaviour of get_annotations on __future__ annotations.

ForwardRef("str").evaluate()  # <class 'str'>
DeferredAnnotation("str").evaluate()  # "str"

This is to support this behaviour:

from __future__ import annotations
from annotationlib import get_annotations, Format, make_annotate_function

class Example:
    a: list[str]

new_annotate = make_annotate_function(get_annotations(Example, format=Format.DEFERRED))

for f in Format:
    if f == Format.VALUE_WITH_FAKE_GLOBALS:
        continue
    assert get_annotations(Example, format=f) == new_annotate(f)