Two polls on how to revise PEP 649

Good point, thanks for bringing that up! Yeah, annotations inside a class definition that refer to the class itself are currently impossible to resolve to a real value in a class decorator, and it’s not even fixable.

For those of you unfamiliar with the problem, this:

@decorate
class Node:
    child: Node

is compiled into something akin to this pseudocode:

Node = decorate(__build_class__('Node', ...))

That is, all class decorators are called before the class name is bound.

PEP 649 ameliorates a lot of class definition ordering problems by delaying evaluation of the annotations from class definition time (just before the __build_class__ call) to some arbitrary future time when code asks for the annotations. But if that arbitrary future time is “while running a class decorator”, the name still isn’t bound, so the lookup to populate the annotation fails.

I remember discussions about trying to fix this. What if Python instead bound the class name before calling the decorator(s)? This wouldn’t work, because class decorators can return a different class, or even a different object altogether. And this isn’t exactly a rare technique. For example, dataclass does this when the class it’s decorating uses slots.

The closest you could get would be to bind the class name before the first decorator, then rebind it after every decorator. But this could still get you into trouble. If you decorate a class with two decorators, and the first class decorator examines the annotations, and the second class decorator returns a different class, the first class decorator may have kept a reference to the old class object. This is a danger today too, but it seems to me that binding the class name would invite more non-obvious corner case bugs.

(Perhaps we’d also need to forget any calculated annotations after every class decorator call, to drop any possible references to the old class, just in case. But this still wouldn’t solve every corner case.)

The good news is, we’ve solved the problem in a different way, with PEP 673, the “Self” type hint. Instead of referring to the bound name of the class inside of the class definition, best practice will be to use typing.Self. So my take is, PEP 649 doesn’t have to do anything to address this situation, as typing.Self will work perfectly with 649.

1 Like

The re-evaluation would have to be explicit. As you point out, caching results leads to its own problems.

I propose the API look something like this:

ann_values  = inspect.get_annotations(o)  # default value for 'format' is inspect.VALUES
ann_hybrid  = inspect.get_annotations(o, format=inspect.HYBRID)
ann_strings = inspect.get_annotations(o, format=inspect.STRINGS)

(typing.get_type_hints would also get the same format parameter, and would behave similarly to inspect.get_annotations when called with different format values. I’m guessing internally typing.get_type_hints will just call inspect.get_annotations to get the values.)

“hybrid” and “strings” modes would calculate a fresh annotations dict each time, and the calculated values wouldn’t be cached. (Apart from Python’s own caching: interned strings, small ints, the empty tuple, etc.) “values” mode would also return a different dict every time (that’s part of its API definition), but internally it would cache a copy of the annotations dict as o.__annotations__. So the values themselves wouldn’t change.

Unlike the other two formats, calculating an annotations dict with real values can fail if evaluating the annotations fails. For example, __co_annotations__ can raise a NameError because of an as-yet-undefined name. inspect.get_annotations doesn’t handle this error, so execution stops before the annotations dict is completed and cached.

One thing to consider: when calculating a “hybrid” dict, if o.__annotations__ is set, should it just return (a copy of) that? After all, o.__annotations__ can only be set once all values are calculable. Which means “hybrid” mode won’t have to create any ForwardRefs, they’ll all be real values. So at that point the “hybrid” annotations dict should be identical to the “values” annotations dict. I suspect the answer is, yes, it should. I can propose funny corner cases where the cached values become invalid, but I think caching the annotations is an important part of the API, so at this point I think having “hybrid” return the cached values is less surprising.
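Here is a toy model of the caching rules being proposed, including having “hybrid” return the cached values once they exist. All names (toy_get_annotations, the compute_* methods, __ann_cache__) are stand-ins, not the real API:

```python
def toy_get_annotations(o, format="values"):
    if format == "values":
        if not hasattr(o, "__ann_cache__"):
            # Evaluation may raise (e.g. NameError); then nothing is cached.
            o.__ann_cache__ = o.compute_values()
        return dict(o.__ann_cache__)    # fresh dict, stable cached values
    if format == "strings":
        return o.compute_strings()      # recomputed on every call
    if format == "hybrid":
        if hasattr(o, "__ann_cache__"):
            # Real values were already computed once, so reuse them.
            return dict(o.__ann_cache__)
        return o.compute_hybrid()       # otherwise recompute, with ForwardRefs
    raise ValueError(format)

class Demo:
    calls = 0
    def compute_values(self):
        self.calls += 1
        return {"x": int}
    def compute_strings(self):
        return {"x": "int"}
    def compute_hybrid(self):
        return {"x": "ForwardRef('int')"}

d = Demo()
before = toy_get_annotations(d, format="hybrid")  # no cache yet: hybrid result
a = toy_get_annotations(d)                        # computes and caches
b = toy_get_annotations(d)                        # cache hit: calls stays at 1
after = toy_get_annotations(d, format="hybrid")   # now returns cached values
```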

2 Likes

Yes, I think this is correct. The downside is that everyone who cares about the “current” annotations will have to re-evaluate. dataclasses won’t have to do this, because the top-level InitVar is good enough, and I’m comfortable saying it can’t be a forward reference. I’m not positive how pydantic works, but I assume they would want to re-evaluate.

1 Like

I think the combination of “using decorators that access annotations right after class definition” and “wanting to introspect real objects at some point” is a use case that really exists. We’ve seen with PEP 563 that even a significant number of dataclass users want and expect to introspect real objects out of the type field of their dataclass Field objects, and were upset when they turned into strings. @mdrissi suggests this use case also exists for pydantic users with recursive models, and I’ve also seen it in an internal framework at work.

I don’t think that typing.Self solves the problem of recursive references, it just helps with the one edge case of direct self-reference. How do you use typing.Self in this case?

from dataclasses import dataclass

@dataclass
class Parent:
    child: Child

@dataclass
class Child:
    parent: Parent

Definitely agree that general-purpose code to eagerly recursively reify any object possibly containing a ForwardRef anywhere inside it doesn’t belong in the stdlib, wouldn’t be 100% safe in all cases, and would probably never be written or needed. But it’s a lot easier to imagine someone writing some code that works for their specific case, probably not by eagerly recursively resolving everything, but rather by just getting the specific information they need and lazily reifying any ForwardRef they happen to run across while doing so.

“Just re-evaluate the annotation” is a good idea that I hadn’t thought of; in a lot of cases that definitely seems like the best approach. But it fails in the very same cases where just-eval-a-string fails because you don’t know the right module globals to eval: where the annotation value has been copied away from its original location to somewhere else. And the stdlib itself includes two examples of that: TypedDict copies annotations from “bases” (and then doesn’t even keep the base as an actual base), and dataclass also will sometimes take an annotation from base class and “reuse” it as part of the annotations for a generated __init__ method on a subclass. I suspect people in the wild will find more creative reasons to divorce introspected annotations from their original source.
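The TypedDict case is easy to observe: the subclass ends up holding annotations copied from the base, while the base itself doesn’t survive in the MRO:

```python
from typing import TypedDict, get_type_hints

class Base(TypedDict):
    x: int

class Sub(Base):
    y: str

# The base's annotations were copied into Sub...
print(get_type_hints(Sub))    # {'x': <class 'int'>, 'y': <class 'str'>}
# ...but Base isn't actually one of Sub's runtime bases:
print(Base in Sub.__mro__)    # False
```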

Is there a strong reason not to allow hybrid mode forward-ref/stringizer objects to also keep a reference to the original module globals, so people in these situations can have some limited hope of reconstructing things?

To work properly, it’d also have to keep a reference to any closures used inside:

import inspect
import typing

def fn():
    x = int
    def inner(a: undefined_type[x]):
        ...
    return inner

fn2 = fn()
hybrid_fn2 = inspect.get_annotations(fn2, format=inspect.HYBRID)
undefined_type = typing.List

At that moment, the annotation in hybrid_fn2 should look something like ForwardRef('undefined_type[x]'). How, specifically, do you propose to resolve the reference to x?

I guess if we are keeping around a globals dict, it’s not much of a stretch to keep both a globals and locals dict?

But I also think that the combination of “defined in a nested scope,” “uses non-global names in annotations,” “uses not-yet-defined global names in annotations,” and “I later need to introspect real objects” probably reaches the point of a vanishingly rare intersection, so I wouldn’t be too worried about it. The goal here isn’t perfect reproducibility in every possible scenario (your example of “object makes decisions on initialization based on type of nested object” is already a great example of why that’s impossible); the idea is just to avoid needlessly throwing away context that’s not hard to keep around and has a reasonable chance of being useful to someone.

An alternative, instead of preserving the source globals (and maybe locals), would be to keep a reference to the object the annotation was originally fetched from and the key in that object’s annotations dict that it came from. This would allow “re-fetch the annotation” in all cases, which as you observe is more reliable. The downside would be that the context is no longer sensible if a compound annotation ends up getting split apart (i.e. someone unwraps an InitVar and passes along the wrapped type which is a ForwardRef; re-evaluating the original annotation would “undo” the unwrapping.)

Sure, we’d probably need to keep around a locals dict too, for values defined in an enclosing class namespace. But in my example x isn’t looked up that way; it isn’t even looked up by name. It’s a closure, so it’s looked up by index into a tuple of “cell variables”.

Shooting from the hip, here’s one approach that I think would work. When creating the ForwardRefs, we keep a reference to the globals dict, the locals dict if needed, and if there’s a closure, a copy of the tuple of cell variables, and a copy of the names of the free variables. At the point that the user asks the ForwardRef to evaluate itself, we create a dict mapping the names of the free variables to their current values. If there are no locals defined, we use that dict as locals; if there are also locals defined, we create a new dict, update it from locals, and then update it from our closure dict, and use this new dict as locals. We then eval the string using those globals and locals.
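Here’s a toy sketch of that scheme (ToyForwardRef is a stand-in, not the real class): capture globals, optional locals, and a mapping of free-variable names to closure cells at creation time, then merge them into a namespace at evaluation time:

```python
class ToyForwardRef:
    def __init__(self, expr, globalns, localns=None, closure=None):
        self.expr = expr
        self.globalns = globalns
        self.localns = localns
        self.closure = closure  # mapping: free variable name -> cell

    def evaluate(self):
        ns = dict(self.localns or {})
        if self.closure:
            # Closure values shadow plain locals, mirroring real scoping.
            ns.update({name: cell.cell_contents
                       for name, cell in self.closure.items()})
        return eval(self.expr, self.globalns, ns)

def fn():
    x = int
    def uses_x():        # referencing x forces it into a closure cell
        return x
    # Map free variable names to their cells, as the scheme describes.
    cells = dict(zip(uses_x.__code__.co_freevars, uses_x.__closure__))
    return ToyForwardRef("list[x]", globals(), closure=cells)

print(fn().evaluate())   # list[int]
```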

Folks who are better at closures than I am: would that work? Is there a better way?

Carl: is it still worth it?

I do have pydantic like decorator that deals with this issue of wanting actual value, but having recursive annotations in an internal library. The core logic simplified is,

from functools import cached_property
from inspect import currentframe
from types import FrameType
from typing import Generic, TypeVar, get_type_hints

T = TypeVar("T")

class AnnotationParser(Generic[T]):
  def __init__(self, cls: type[T], declare_frame: FrameType):
    self.cls = cls
    self.declare_frame = declare_frame

  # This function is not called at the time AnnotationParser is made. It gets
  # triggered later, when needed, and by that time the forward references
  # should all be defined.
  @cached_property
  def annotations(self):
    globals = self.declare_frame.f_globals
    locals = self.declare_frame.f_locals
    annotations = get_type_hints(
        self.cls.__init__,
        globalns=globals,
        localns={**locals, **vars(self.cls)},
        include_extras=True,
    )
    ...

def my_dataclass(cls: type[T]) -> type[T]:
  current_frame = currentframe()
  caller_frame = current_frame.f_back
  parser = AnnotationParser(cls, caller_frame)
  ... # add parser to a global registry mapping from type -> parser.

This has been enough to handle self recursion/mutual recursion/most edge cases I’ve hit. There are a couple even trickier ones that this misses and I have some extra logic for, but saving both globals/locals gets pretty far.

The main edge case I remember this still missing and I had to write extra logic for was self recursive type alias,

# Is a type alias a type? No, but it is a type form and it's nice to be able to handle Unions/similar
# things too.
Foo: TypeAlias = int | Sequence["Foo"]
configurable(Foo)

I haven’t hit closure case and on my end I do have control over the types this decorator is applied to so I do sometimes just adjust code on my side if necessary.

1 Like

Ah, good point! I was thinking closures weren’t involved because the definition of inner is still happening in the local scope of fn – but of course that’s no longer true with PEP 649, when it comes to the annotations!

I think your idea would work, but to me it feels like at that point we are too close to reinventing “the function” in yet another guise; PEP 649 is nice because it just uses a function instead of doing that. (Which of course brings to mind an even crazier idea, which is that a ForwardRef could carry along with it an actual function to reify its value!)

But no, I don’t really think this is all worth it. To me, keeping a reference to the globals dict passes the 80/20 bar, keeping locals might also, anything beyond that probably isn’t necessary.

You’re right, typing.Self doesn’t help in this case. But also, this isn’t directly recursive. An annotation in Parent is referring to something that isn’t defined yet. From the perspective of “how do we solve this”, this feels more like a garden variety forward reference, than specifically the self-recursive annotation example from earlier. The wrinkle is that we can’t reorder the classes to get rid of the forward reference. But technologically, any solution that solved forward references would solve this problem.

Again, I proposed a language-level forward class declaration last year and it was not well received. So I fear use cases like this are stuck with the classic type hint solution for a forward reference: a string, or a ForwardRef instance.

1 Like

Is that going to help if Parent and Child are defined in different modules? That’s where this has bitten me before, with recursive importing.

Nothing here changes how imports work, so you still can’t have mutually recursive imports actually execute at runtime and be used at module level. But if you guard one of the imports with if TYPE_CHECKING: so it isn’t actually imported at runtime, and the references are used in annotations, and you use “hybrid mode” to resolve the annotations (as dataclass will do), then yes, PEP 649 will help. It will effectively let dataclass ignore the fact that one of the annotations is referencing an object that doesn’t exist in that module at runtime.
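A single-file sketch of the pattern (child_module and Child are hypothetical names; the module is never imported at runtime):

```python
from typing import TYPE_CHECKING
from dataclasses import dataclass, fields

if TYPE_CHECKING:
    # Hypothetical module: only type checkers see this import.
    from child_module import Child

@dataclass
class Parent:
    child: "Child"   # unresolved at runtime; dataclass doesn't need the object

# dataclass built the class without ever resolving 'Child':
print([f.name for f in fields(Parent)])   # ['child']
```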

Again my mind goes to boundary cases. Consider this extremely contrived example:

from __future__ import annotations
import inspect
from typing import *

x = int
def outer():
    x = str
    def inner(a: undefined[x]):
        pass
    return inner
inner = outer()
ann_hybrid = inspect.get_annotations(inner, format=inspect.HYBRID)
undefined = List

If we now do your proposed “evaluate the ForwardRefs”, and we don’t handle closures correctly, we’ll silently produce the wrong result. We’ll find the module-level definition of x and use that.

I concede that this scenario seems wildly contrived. I have no idea if it’s analogous to real-world code out there anywhere. Still: that result seems unacceptable to me. I’m a firm believer in doing things the “right way”, particularly when writing code for the Python standard library. It seems to me that either we should produce predictably correct results, or we shouldn’t add it to the library in the first place. Adding code to the library that can get it silently wrong strikes me as awful.


Oh! While writing this, I realized: I guess we already do exactly that! typing.get_type_hints will attempt to resolve ForwardRef objects back into real values, using the globalns and localns passed in, and with absolutely no support for closures. So let’s consider a slightly simpler example:

from __future__ import annotations
import typing

x = int
def outer():
    x = str
    def inner(a: x):
        pass
    return inner
inner = outer()

print(typing.get_type_hints(inner))

This produces {'a': <class 'int'>}, which is wrong: the string 'x' is evaluated against the module globals, finding the module-level x. If we comment out the from __future__ import annotations line, it produces {'a': <class 'str'>}, which is correct, because the annotation is then evaluated at definition time in outer’s scope.


I’ll make two proposals, and we can discuss which one we want to do (or neither).

First proposal: “Make ForwardRef Evaluate Itself”. In this, we add a reference to the appropriate globals and locals to the ForwardRef, which we populate when we create it. We change ForwardRef._evaluate so you don’t pass in the namespaces, which we can do as it’s an internal interface. This gives ForwardRef a much better chance to be evaluated correctly, as you suggest. But it still won’t handle closures.

Second proposal: “Add Closure Support To ForwardRef”. This does everything in “Make ForwardRef Evaluate Itself”, but it also adds references to the closure information when present. ForwardRef._evaluate uses that information when evaluating the string.

I don’t know how important any of this would be in a post-649 world where we had the three rendering formats for inspect.get_annotations and typing.get_type_hints. So I don’t have a strong preference overall. But if we do want one of these two proposals, my preference is for “Add Closure Support To ForwardRef”. The code I proposed to handle correctly evaluating closures seems awful and clunky, but I think it’ll work correctly, which is a big improvement over silently producing the wrong result. (And maybe someone else will suggest something more elegant!)

3 Likes

I’m happy with “Add Closure Support To ForwardRef”. I think it will rarely be needed, and I haven’t put in the time to fully convince myself that it will always do the right thing, but I think it will, and it will clearly do better than the existing hacks in typing for evaluating string annotations. There doesn’t seem to be a strong reason not to do it, when we know it will allow handling cases correctly that would otherwise give wrong results.

1 Like

Thanks, Carl, but hmm. I was hoping for more folks to weigh in. I know, how about two more polls!

Should PEP 649 do anything to make ForwardRef work better? (Remember, “Add Closure Support To ForwardRef” is a superset of “Make ForwardRef Evaluate Itself”.)

  • No, leave ForwardRef._evaluate like it is.
  • Yes, “Make ForwardRef Evaluate Itself”.
  • Yes, “Add Closure Support To ForwardRef”.

0 voters

Should PEP 649 change ForwardRef.evaluate to a public API?

  • No, ForwardRef._evaluate should stay private.
  • Yes, there should be a public method on ForwardRef objects that attempts to evaluate them.

0 voters

2 Likes

I’ve voted, but I’ll be honest, I don’t ever expect to use this API, so my interest is almost entirely theoretical. I voted for “Add closure support” mostly on the “if we’re going to do it, let’s do it right” basis. And I voted to keep _evaluate private mostly because I couldn’t remember the arguments about that question, and because it’s easier to make something public later than to make it private later.

Feel free to ignore my votes if uninformed opinions from people just watching the thread for interest are of no use to you :slightly_smiling_face:

No worries, Paul. After all, where would Python be without uninformed opinions swaying the design of the language! :wink:

3 Likes

It’s been several days since the last vote on the second poll came in, a couple days more than that since the last substantive discussion, and thirteen days since I started the topic. I think it’s about time we bring this discussion to a close. I’m gonna keep it all open for one more day, just to make it a nice round two weeks. So, tomorrow, I’m gonna declare this thread concluded, close the polls, and proceed with my revising. Last looks, everybody!

1 Like

I also voted to keep the implementation private, for ease of making changes in the future. Keeping it as an implementation detail is much better for that purpose.

But on the other hand… a public evaluate could help type checkers, for example, if I understand correctly.