Howdy howdy. Let’s talk about PEP 649, Deferred Evaluation Of Annotations Using Descriptors. I think the PEP has an excellent chance of being accepted. However, the PEP needs to be revised, and there are two particular points on which I can’t quite make up my mind. I figured it was best to raise the issues here and see what the community thinks. To that end, I’ve added two polls to the end of this message. Please vote in both!
Let’s start with the simpler question. Should 649 specify that the __co_annotations__ attribute is a supported public interface? Or should it specify that __co_annotations__ is an internal implementation detail? I can see both sides.
On the one hand, Python tends to make these things public. And __co_annotations__ is a reasonable API, one that users would have no trouble working with.
On the other hand, I can’t contrive a scenario in which anyone would want to write their own __co_annotations__ function. For that to happen, there would have to be users who:
• create their own Python code objects and functions by hand, not using the compiler,
• which they’d want to annotate,
• and they’d want to examine the annotations of those objects at runtime,
• and those annotations would have to be lazily evaluated.
I don’t think such a user exists. (But if you are such a user, please! speak up!)
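For concreteness, here’s a rough sketch of what hand-writing one might look like, assuming the PEP’s model of __co_annotations__ as a zero-argument callable that returns the annotations dict. (This is purely illustrative; the attribute wiring below is me setting a function attribute by hand, not the real interpreter machinery.)

```python
# Hypothetical sketch only: under PEP 649 the *interpreter* would consult
# __co_annotations__ lazily; here we just set the attribute manually to
# show the shape such a hand-written function would have.
def make_annotated_function():
    def fn(x):
        return x

    def co_annotations():
        # Runs only when annotations are first requested, so the names
        # used here needn't exist at definition time.
        return {"x": int, "return": int}

    fn.__co_annotations__ = co_annotations
    return fn

fn = make_annotated_function()
print(fn.__co_annotations__())  # {'x': <class 'int'>, 'return': <class 'int'>}
```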
Another consideration: if we declare __co_annotations__ is a private implementation detail today, that gives us the flexibility to change it as needed later, allowing us to more easily accommodate unforeseen future needs. And of course, if we declare it private now, we could simply change our minds and make it public in the future.
Here’s the second, more complicated question. In order to accommodate users who want annotations as strings, we’re going to modify inspect.get_annotations and typing.get_type_hints (which I will collectively call “library functions computing annotations”) to explicitly support that feature. By default, both functions will behave as they did in the initial 649 prototype: when the user asks for the annotations (or type hints) of a function that has a __co_annotations__ function, they’ll return the real values. We’ll add a new parameter to the library functions computing annotations that says “return the annotation values as strings”.
How will that work? Initially we planned to use Carl Meyer’s “Stringizer” proposal. In this approach, we’d rebind the __co_annotations__ code object to a custom empty __globals__ dict that supported calling a callback for every undefined symbol (e.g. collections.defaultdict.__missing__). That callback function would return a new “Stringizer” object for every missing symbol, which would roughly reconstruct the original string of the annotation. The library function computing annotations would then simply call this rebound annotations function normally, using the real Python interpreter. Our rebound function would compute and return the annotations dict, but all the values in the dict would be “Stringizer” objects. The library function would then walk through the resulting dict, turning these “Stringizer” values into real strings, and finally return the dict with those strings. The result would be very similar to the stringized annotations produced by PEP 563’s from __future__ import annotations.
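As a tiny sketch of the fake-globals half of this idea (the class names are mine, not the PEP’s):

```python
# A dict subclass whose __missing__ hook is consulted for absent keys --
# the same protocol collections.defaultdict uses. Every lookup of an
# undefined symbol yields a placeholder that remembers the symbol's name.
class Stringizer:
    def __init__(self, name):
        self.name = name

class FakeGlobals(dict):
    def __missing__(self, key):
        return Stringizer(key)

fake = FakeGlobals()
print(fake["mymodule"].name)  # "mymodule" -- the lookup never fails
```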
How does this “Stringizer” perform its magic? First, every “Stringizer” is created as a mock replacement for a symbol; we tell the object the name of the symbol it’s replacing. Then, the “Stringizer” implements every dunder method Python calls on an object when evaluating an expression, each one returning a new object that represents performing that operation on that name.
Here’s a high-level hand-wave-y example of how this works. Let’s say you have an annotation that reads mymodule.MyType[Int], and you want to back-compute the string based on the bytecode. When running the bytecode, Python would first ask the custom globals dict for the symbol 'mymodule'. The custom globals dict would return a “Stringizer” that knew its name was 'mymodule'. Then, Python would evaluate the '.MyType', which means it would next call the __getattribute__ method of our “Stringizer”. Our “Stringizer” would override that method and return a new second “Stringizer” object that knew its name was 'mymodule.MyType'. Third, Python would ask our fake globals for Int, which would return a third “Stringizer” that knew its name was 'Int'. Fourth, Python would evaluate the [Int], which would turn into a __getitem__ call on the second “Stringizer”, passing in the “Stringizer” named 'Int'. This would return a new fourth “Stringizer” that knew its name was 'mymodule.MyType[Int]'. Finally, Python would store this fourth “Stringizer” as the value in the annotations dict.
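The walkthrough above can be sketched in a few lines. (A toy, using __getattr__ rather than a full __getattribute__ override for simplicity; the real proposal implements many more dunders.)

```python
class Stringizer:
    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Called only for attributes we don't have, i.e. '.MyType'.
        return Stringizer(f"{self.name}.{attr}")

    def __getitem__(self, item):
        inner = item.name if isinstance(item, Stringizer) else repr(item)
        return Stringizer(f"{self.name}[{inner}]")

# Simulate evaluating the annotation mymodule.MyType[Int]:
mymodule, Int = Stringizer("mymodule"), Stringizer("Int")
result = mymodule.MyType[Int]
print(result.name)  # mymodule.MyType[Int]
```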
I’ve prototyped this, and I’m delighted to report that the “Stringizer” works fine. After all, all __co_annotations__ functions do is evaluate a bunch of expressions, then build a dict out of them and return it. And Python calls dunder methods to compute nearly everything in an expression, meaning the “Stringizer” approach really can handle almost everything.
But there’s an important exception, a part of the language where Python doesn’t call dunder methods when computing an expression: flow control. Although the Python interpreter does consult objects at certain points during flow control—asking an object for its true-ness or false-ness, or asking for an iterator over the object—the actual computation of the expression is done internally in the Python interpreter, out of the “Stringizer”'s control.
Examples of flow control used in expressions:
• Short-circuiting or
• Short-circuiting and
• Ternary conditional expressions (the if / else operator)
• Generator expressions
• List / dict / set comprehensions
• Iterable unpacking
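To see why these defeat the scheme, take short-circuiting: when evaluating a or b, the interpreter only asks a for its truthiness and then branches internally, so no dunder ever sees the expression as a whole. A quick demonstration (Probe is my own illustrative class):

```python
# For "a or b" the interpreter calls __bool__ on a, then branches
# internally -- there is no dunder corresponding to "or" itself
# (the | operator's __or__ is a different, non-short-circuiting thing).
class Probe:
    def __init__(self, name):
        self.name = name

    def __bool__(self):
        print(f"truthiness of {self.name} consulted")
        return True

a, b = Probe("a"), Probe("b")
result = a or b     # prints: truthiness of a consulted
print(result is a)  # True -- b untouched; "or" left no trace to stringize
```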
The good news was, nobody did this stuff in type hints. So the “Stringizer” approach seemed like it should work great for the Python static typing community in practice.
But that’s changed with the acceptance of PEP 646, “Variadic Generics”. This adds TypeVarTuple objects to Python, which are designed to be unpacked in type hints using iterable unpacking. At the bytecode level this turns into a tiny loop, iterating over the values yielded by the TypeVarTuple being unpacked. This is the one sticking point that prevents the “Stringizer” from being viable.
What should 649 do about this? From a high level, it seems like there are two ways we could go.
First, there’s an approach I call “Hard-Code The Stringizer”. This rests on two assumptions, which I think are true, and which I’m hoping the community will confirm (or, regrettably, debunk). First, the only people who care about turning their annotations back into strings are people using annotations for type hints. And second, unpacking a TypeVarTuple is the only type of iteration we’ll ever see in an annotation. If those are both true, we can simply hard-code the “Stringizer” to assume that one use case. Unpacked TypeVarTuple objects simply return an “unpacked” form of themselves, which really just prints an asterisk in front of its name in its repr. We code the “Stringizer” so it assumes any call to __iter__ is unpacking a TypeVarTuple, and returns an iterator that yields one of these objects. And if that’s good enough, we’re done, and the “Stringizer” becomes viable again.
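In code, the hard-coded assumption is just a few lines (a sketch with my own naming):

```python
# "Hard-Code The Stringizer": assume any iteration over a placeholder is
# a star-unpacked TypeVarTuple, and yield a single "unpacked" twin whose
# name carries the leading asterisk.
class Stringizer:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __iter__(self):
        yield Stringizer("*" + self.name)

Ts = Stringizer("Ts")
print([*Ts])  # [*Ts] -- exactly the text needed to rebuild tuple[*Ts]
```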
If “Hard-Code The Stringizer” won’t work, the next step is an approach I call “A Simple Custom Bytecode Interpreter”. And, yes, this involves writing a bytecode interpreter. Maybe that sounds awful—but I’ve prototyped it and it’s not really all that bad. The key insight is that __co_annotations__ functions generated by Python are simple: they evaluate a bunch of expressions, then build them into a dict, and finally just return that dict. Our custom bytecode interpreter would only have to support one statement, return; all the other work is done inside expressions. In practice it’s not that much more work than the “Stringizer” was in the first place; it does much the same thing, it just does its work based on bytecode dispatch rather than in dunder methods.
I prototyped this too. It works fine, and isn’t even that much longer than the “Stringizer” prototype. Performance is presumably much worse, but hopefully nobody is querying the stringized annotations of objects in performance-sensitive code.
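To give a flavor of the dispatch loop, here’s a drastically reduced toy that can run a function which only loads constants, builds a dict, and returns it. This is not the prototype from the PEP discussion; it’s my own sketch, and it handles a few opcode spellings because they vary slightly across CPython versions. Anything it doesn’t recognize fails loudly, which is the behavior described above.

```python
import dis

# Toy "simple custom bytecode interpreter": walk the instruction stream,
# maintain a value stack, and dispatch on opcode name. A real version
# would also mock unbound globals with placeholder objects.
def run_annotations_code(fn):
    stack = []
    for instr in dis.get_instructions(fn):
        op, arg = instr.opname, instr.argval
        if op in ("RESUME", "NOP"):
            pass
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op == "BUILD_CONST_KEY_MAP":   # CPython <= 3.13
            keys = stack.pop()
            values = stack[-arg:]
            del stack[-arg:]
            stack.append(dict(zip(keys, values)))
        elif op == "BUILD_MAP":             # interleaved key/value pairs
            items = stack[-2 * arg:] if arg else []
            if arg:
                del stack[-2 * arg:]
            stack.append(dict(zip(items[::2], items[1::2])))
        elif op == "RETURN_CONST":          # CPython >= 3.12
            return arg
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise NotImplementedError(op)   # fail noisily, as described

print(run_annotations_code(lambda: {"x": "int", "return": "str"}))
```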
But then again! Let’s quickly revisit my first question, about __co_annotations__ functions being public or private. If we allow users to write their own __co_annotations__ functions, they could potentially set __co_annotations__ to any Python function they like, and that function could use any Python statement they want. If the function they write uses bytecodes my simple custom bytecode interpreter doesn’t implement, it would simply fail, presumably noisily.
This suggests a possible third approach, “A Full Custom Bytecode Interpreter”, where our bytecode interpreter would implement every bytecode and actually do a better job of simulating running the __co_annotations__ function. But I assert this isn’t viable in practice. My first two proposals rely on the fact that __co_annotations__ functions don’t do any real work. They’re predictable and simple. So creating mock objects for every symbol works fine when computing the stringized annotations dict. But if you substitute your own arbitrary __co_annotations__ function, our hypothetical full custom bytecode interpreter wouldn’t know which values used by the function should be “Stringizer” objects and which have to be real objects. It just won’t work.
Final random notes:
- In addition to flow control, the “Stringizer” approach also can’t correctly handle an annotation that uses the walrus operator. However, the walrus operator has already been declared illegal in annotation expressions, so it’s not a concern.
- In the original PEP 649, once a __co_annotations__ function was called, it was discarded. This was pleasingly sanitary; either an object had an __annotations__ dict set, or a __co_annotations__ function, or neither—but never both at the same time. Now that we’re going to support this “Stringizer” or “custom bytecode interpreter” approach for back-computing strings, objects will retain their __co_annotations__ functions even after they’ve been called, in case the user later requests their stringized form. This could cause a weird situation: if the user modified the annotations dict, then asked for the stringized version, they might be surprised to see that the two no longer match. Bad news: this isn’t fixable. At best, we could notice that the annotations dict was modified and throw an error, but even that seems like it would be too expensive. I think the best option here is to simply document the behavior and categorize it as “consenting adults” stuff. If you’re the sort of person who modifies annotations dicts… good luck!
- If you’re thinking “o-ho! the custom bytecode interpreter approach is made way more complicated by the specializing adaptive interpreter work!”, happily no it isn’t. The specializing adaptive bytecode work is all done in secret by the Python interpreter, and you won’t see those funny specialized bytecodes if you examine the bytecode normally (co_code on the code object). So the “custom bytecode interpreter” doesn’t need to worry about them.
- The library functions computing annotations will also support an alternate “mixed” mode, in which the fake globals dictionary returns real values for names that are bound, and “Stringizer” objects for unbound names. This should work perfectly for use cases like dataclass, where all it cares about is whether or not an annotation is an instance of InitVar or ClassVar. In this case, unbound values, presumably from modules imported under if TYPE_CHECKING, will be Stringizers, but they’ll get wrapped with real InitVar and ClassVar objects. This allows dataclass to do real isinstance tests, rather than manually parsing PEP 563 stringized annotations. It’s great!
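A sketch of that mixed mode (again with my own naming; only the __missing__ hook matters):

```python
# "Mixed" mode: bound names resolve to their real values, and only
# unbound names become placeholder objects.
from dataclasses import InitVar

class Stringizer:
    def __init__(self, name):
        self.name = name

class MixedGlobals(dict):
    def __missing__(self, key):
        return Stringizer(key)

# InitVar is genuinely bound here; OnlyForTypeChecking is not.
g = MixedGlobals({"InitVar": InitVar})
print(g["InitVar"] is InitVar)                           # True -- real value
print(isinstance(g["OnlyForTypeChecking"], Stringizer))  # True -- placeholder
```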
Poll 1: should __co_annotations__ be public or private?
- __co_annotations__ should be an unsupported internal implementation detail, at least for now.
- __co_annotations__ should be a supported public API.
Poll 2: what approach should we use to stringize __co_annotations__ functions?
- “Hard-Code The Stringizer”
- “A Simple Custom Bytecode Interpreter”
- “A Full Custom Bytecode Interpreter”