A massive PEP 649 update, with some major course corrections

A little peek behind the curtain for you. This discussion was born out of a lot of thinking about 649, and a massive multi-week email thread among four people (but mostly volleys between Carl and myself). We proposed a lot of things, and opinions differed; some ideas were abandoned, others were merely tabled for later. Here’s the most interesting of those alternate ideas, boiled out of that discussion and presented here for your interest… in case you’re not already bored with reading about this!


In my previous big 649 thread, Petr Viktorin proposed we “just store the strings”, rather than reverse-engineer them by running __compute_annotations__ in a “fake globals” runtime environment. That idea has definite merit. I’m not averse to this approach, but tabled it for now for three reasons.

First, we already know we need to support HYBRID format, which itself requires all the same machinery we’d need to reverse-compute STRING format. So, in a way, by doing the work to implement HYBRID format–which we’re already committed to doing–we get STRING format for just a little more work. In contrast, “just store the strings” would require a lot of novel work: instrumenting the compiler to also store the source code for the annotations somewhere we can get at them at runtime, then modifying inspect.get_annotations to return that when asked for STRING format.
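To make the "reverse-compute" idea a bit more concrete, here's a rough sketch of the general trick: evaluate the compiled annotation against a globals dict that hands back stand-in objects, and let those objects record how they get used. This is not the PEP's actual implementation. The names `Stringizer`, `FakeGlobals`, and `stringize` are made up for illustration, and the sketch only copes with plain names, attribute access, and subscripts.

```python
class Stringizer:
    """Hypothetical stand-in that records its own usage as source text."""

    def __init__(self, text):
        self._text = text

    def __getattr__(self, name):
        # Only called for attributes that don't exist, e.g. typing.Union.
        return Stringizer(f"{self._text}.{name}")

    def __getitem__(self, key):
        # Subscripts like Union[int, str] arrive as a tuple of stand-ins.
        if isinstance(key, tuple):
            inner = ", ".join(str(item) for item in key)
        else:
            inner = str(key)
        return Stringizer(f"{self._text}[{inner}]")

    def __str__(self):
        return self._text


class FakeGlobals(dict):
    """Hypothetical globals dict: every unknown name becomes a Stringizer."""

    def __missing__(self, name):
        return Stringizer(name)


def stringize(expression):
    # Compile first, so that what actually runs is bytecode; at that point
    # the original source text is no longer consulted.
    code = compile(expression, "<annotation>", "eval")
    return str(eval(code, FakeGlobals()))


print(stringize("typing.Union[int, str, unloaded_module.MonkeyBusiness]"))
```

Note that nothing here needs `typing` or `unloaded_module` to be importable, which is the whole point: the names are never really looked up, only recorded.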

Second, there was some worry about the memory consumption of the annotation source code strings, particularly as they’d rarely get examined at runtime. Petr was aware of this, and as part of his initial proposal suggested adding a lazy-loading facility, though this proposal was only in the abstract. Adding a lazy-loading facility to modules is something we’ve been talking about for a while now, but no firm proposal exists yet, much less an implemented solution, in part because we haven’t really needed it. Adding such a facility would probably also have to be part of this approach, yet right now we don’t know how best to implement it. (I actually had a brainstorm about this a day or two back and posted a new topic outlining my idea. But so far that’s only an idea.)

Third, it’s not clear to me precisely what “just store the strings” means. The obvious meaning is, “store all the text of the annotation, starting immediately after the colon and continuing until immediately before either the comma or the closing parenthesis that ends the annotation”. But this seems a little strange once you mix in newlines and comments. Consider this code sample:

def foo(a: typing.Union[
    int, # obviously!
    str, # if you think about it, we need this too
    unloaded_module.MonkeyBusiness # surprised? don't be!
    ]): pass

The literal “just store the strings” annotation for this would be

" typing.Union[\n    int, # obviously!\n    str, # if you think about it, we need this too\n    unloaded_module.MonkeyBusiness # surprised? don't be!\n    ]"

What a mess! Is that really what we want? If so, then okay, this isn’t a real concern. But if we want to “clean it up” before returning it for STRING format, we’d have to figure out what “cleaning it up” meant–how far to take it. Stripping the string of leading and trailing whitespace makes sense. Strip comments? Probably. Convert newlines into spaces, and normalize non-quoted spaces into a single space? Not sure.
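For a feel of the two extremes, here's a small sketch using the standard `ast` module (this needs Python 3.9 or later for `ast.unparse`). It's only an illustration of "literal" versus "cleaned up", not a proposal for how the compiler would actually store anything. `ast.get_source_segment` returns the annotation's source text verbatim, comments and newlines included, while `ast.unparse` regenerates the expression from the AST, which drops comments and normalizes whitespace as a side effect.

```python
import ast

SOURCE = '''\
def foo(a: typing.Union[
    int, # obviously!
    str, # if you think about it, we need this too
    unloaded_module.MonkeyBusiness # surprised? don't be!
    ]): pass
'''

tree = ast.parse(SOURCE)
func = tree.body[0]
annotation = func.args.args[0].annotation

# Verbatim source text of the annotation expression. Unlike the
# "everything after the colon" reading, this starts at the first
# token, so there is no leading space.
raw = ast.get_source_segment(SOURCE, annotation)

# Source text regenerated from the AST: comments are gone and the
# whitespace is whatever the unparser chooses to emit.
normalized = ast.unparse(annotation)

print(repr(raw))
print(repr(normalized))
```

The second line printed is `'typing.Union[int, str, unloaded_module.MonkeyBusiness]'`. Whether STRING format should look like the first result, the second, or something in between is exactly the open question.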

I’m not claiming this third concern is a showstopper, just that it’s pretty up in the air at the moment.