PEP 709: one behavior change that was missed in the PEP

(Speaking for the SC, although the text is all mine and I apologise for not having the time to make it shorter.)

Given the newly uncovered change in semantics, the SC feels we need to unaccept the PEP and revert the implementation (at least for 3.12). It’s not so much about the likelihood of code breaking as it is about creating more confusion about scopes in an already muddled area (class bodies). The (realistically very small) performance win doesn’t seem worth it.

The SC actually discussed this at length, and apart from the decision to unaccept the PEP (for now) we haven’t really drawn any conclusions. We want to know what the community thinks, really. So here are some ideas and questions we discussed, for everyone here to consider as well:

Does the change of semantics (of comprehensions in class bodies) make sense? Unrolling comprehensions into for loops is easy to explain, but the way the loop variable is handled makes it more complicated than that. Class bodies are already subtly different, and we can’t reasonably change that (looking up names in the class body from methods would be dangerously confusing). Does it make more sense for comprehensions to behave like for loops, or like functions?
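To make the "like for loops, or like functions?" question concrete, here is a minimal sketch of the behaviour as of Python 3.11 (pre-PEP 709); the class names are mine:

```python
class WithLoop:
    base = [1, 2, 3]
    doubled = []
    for n in base:          # a for loop sees class-scope names...
        doubled.append(n * 2)
    # ...and its loop variable leaks into the class namespace:
    # WithLoop.n == 3

class WithComprehension:
    base = [1, 2, 3]
    # The leftmost iterable (`base`) is evaluated in the class scope,
    # but the rest of the comprehension runs in a hidden function
    # scope, so the loop variable does not leak:
    doubled = [n * 2 for n in base]
    # `n` is NOT defined in the class namespace here.
```

Both classes end up with `doubled == [2, 4, 6]`, but only `WithLoop` grows an `n` attribute.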

If the new semantics make sense, can we reasonably change the semantics without a transition period? That isn’t just about breaking existing code but also about how discoverable the error is if you run code that relies on the new behaviour in an older version of Python.

If we need a transition period, what would that look like? In this case, I think we could have the compiler detect the problematic cases (comprehensions in class bodies) and require a future import to enable the new semantics or emit a warning that the semantics will change. Considering how rare affected code appears to be, the future import and warning would probably be very rarely seen.

Alternatively, if the new semantics don’t make sense (or aren’t really desirable enough to warrant the change), what about just not doing the optimisation in that case? (That’s assuming it’s a detectable case.) That would mean comprehensions in class bodies would have more frames in a traceback than comprehensions elsewhere. Is that too confusing to users?

And if we’re talking about not performing the optimisation in class bodies, what about not performing the optimisation in other places where there are end-user noticeable effects, specifically when the loop variable clashes with a function-local variable? (That might make the implementation of this PEP easier, since it does not have to do any special handling of the loop variable at all.) This would exacerbate the traceback-difference question even more, since it would mean changes in a different part of the function (e.g. using the same loop variable for an unrelated loop) would change whether the comprehension is optimised and thus the traceback.

That does leave open the question of what ‘end-user noticeable effects’ should include, though. Clobbering variables would obviously be right out. The presence of frames in a traceback? What you see in a debugger? What locals() returns? The SC accepted that those things can change (by accepting PEP 709). Does it get more problematic if they only sometimes change, though?

In all of that, does it matter how big the impact of the optimisation is? PEP 709 has only a relatively small impact on real code. Would the answers to any of the above questions change if we’re talking about, say, a 10% overall performance benefit? And at what point do the optimisations become disruptive enough, especially if it isn’t clear when they are or aren’t applied, that we need a way to disable them altogether, for debugging if nothing else?

(FWIW, if the semantic differences can be resolved in the next couple of days, the SC is not opposed to considering an updated PEP for 3.12… but it’s getting really tight, and considering the relatively small size of the performance win it’s probably a better idea to take a few more weeks and land it early in 3.13 instead.)

1 Like

Thanks Thomas!

I’ll take some time to consider the scoping questions (and am interested to hear others’ thoughts as well.) I do want to ask some questions about the characterization of the performance benefits as “relatively small” – I don’t quite understand where that description comes from, or what it is being considered relative to?

The PEP results in an individual comprehension running up to 2x faster (2x is the best case scenario when the comprehension does very little work besides the iteration), and shows an 11% improvement on a real-world benchmark making heavy use of comprehensions. Like any optimization targeted at a specific language feature, the win is highly dependent on how much that feature is used in hot code paths, so it can range from zero up to better than 10% in realistic code.

In my experience with optimization work, I would call this a quite significant win, for a compiler optimization of a specific language feature. So I’d be glad for any clarification from the SC on what specific points of comparison factored into the “relatively small” description.

I’m not sure what the definition of “overall” is here. For a comprehension-heavy workload, the optimization is already in the 10% range. For a no-comprehensions workload, the optimization benefit is of course zero.

9 Likes

Note that we can’t detect all such cases statically, because class scopes are not statically determined, as new names could be injected by e.g. the metaclass. It’s no doubt extremely uncommon, but not impossible.
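A contrived illustration of such runtime-injected class-scope names (the metaclass and the `injected` name are made up for this example):

```python
class InjectingMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body executes in this namespace, so `injected`
        # exists at runtime even though no assignment to it appears
        # anywhere the compiler can see statically.
        return {"injected": [1, 2, 3]}

class C(metaclass=InjectingMeta):
    # The leftmost iterable is looked up in the class scope at
    # runtime; static analysis of the class body alone cannot know
    # that `injected` exists.
    doubled = [x * 2 for x in injected]
```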

What do you mean by “resolved” here? If we can preserve PEP 709 inlining while removing the behavior change, would that be acceptable?

2 Likes

I’m disappointed that this improvement has been delayed. I’m not at all sure I understand why the “confusion about scopes” is so bad. The scoping implications of PEP 695 seem far more confusing to me, but that PEP is not being unaccepted.


As community feedback, my personal view is that the change of semantics is perfectly fine, and actually reflects what I’d prefer to be the semantics. As far as behaving like for-loops or functions is concerned, I don’t tend to think in either of those terms (comprehensions are just comprehensions) but I think I’d say I prefer “like for-loops” in the sense of not creating a new scope. The extra scope always felt to me like a necessary evil, rather than an expected part of the semantics (in practice, it rarely made a difference, of course).

I’m basically fine with not needing a transition period. The review that’s just been done indicates that code where any of this makes a difference is vanishingly rare.

10 Likes

It’s primarily our interpretation of the part of PEP 709 that says “Other benchmarks in pyperformance (none of which use comprehensions heavily) don’t show any impact outside the noise.” “Overall” means “for programs in general”, or “for typical Python code”. The fact that the benefit only shows up in one macro benchmark is what makes us consider it, in general, a minor performance win.

1 Like

See my message from May 11 in this thread on scoping. I might add that Python’s scoping rules are already very complex to explain fully. Never mind that they are easy to explain in approximation – the same is true for all modern languages.

The question, “which X gets used in [X + i for i in …]” is easy to answer exactly in Python 3.0 through 3.11: pretend the comprehension is a function and determine which X gets used in that function; and the same rule applies to all comprehensions as well as to generator functions. Yes, there’s a wrinkle if the whole thing appears in a class scope, but the answer is still exactly “what do functions do”. That’s what PEP 709 breaks.
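A sketch of that rule in action (the class name is mine): applying "pretend the comprehension is a function" inside a class body means the free variable resolves past the class scope to the global, exactly as it would inside a `def`:

```python
X = "global"

class Demo:
    X = "class"
    # Rule: pretend the comprehension is a function. A function nested
    # in a class body skips the class scope when resolving free
    # variables, so `X` here resolves to the global, not Demo.X:
    values = [X + "-" + str(i) for i in range(2)]

# Under the 3.0-3.11 rule: Demo.values == ["global-0", "global-1"]
```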

PEP 709 adds at least two wrinkles here:
(a) distinguishing between generator expressions and list/set/dict comprehensions;
(b) doing away with the anonymous function altogether in the latter case.

Moreover it is not entirely clear whether the changes are mandatory (part of the language spec) or optional (leaving other implementations free not to bother with this optimization).

I agree that it’s a tough call, given the obvious perf benefits. But I do worry about making the scoping rules more complicated.

4 Likes

That would certainly resolve it, yes. I think the complexity of the solution also matters. (Hypothetical answers are always tricky ;P)

Perhaps this is very far-fetched and/or not feasible at this point, and it’s obviously a far-reaching change, but has there been prior discussion on “just” making functions see values from the enclosing class scope?

(I know a very skilled programmer whose first reaction to the word “Python” in a discussion about PLs was “Python has scoping rules that I don’t understand”.)

1 Like

As far as the evidence presented here goes, every case affected by this PEP already assumed the semantics this PEP introduces.

The existing scoping rule “comprehensions run as if they were functions” is perfectly clear, but so is “comprehensions run in the enclosing scope”. However, the existing rule is far trickier to apply, as shown by the examples in this thread. I would like to offer two more; imagine a learner trying to draw where exactly the boundary of the comprehension’s inner scope currently lies:

items = [1, 2, 3]

class A:
    items = [4, 5, 6]
    pairs = [(i, j) for i in items for j in items]

class B:
    items = [4, 5, 6]
    selected = [i for i in items if i in items]

So yeah, the change does make sense, I believe. Except there’s now a divergence in behaviour between comprehensions and generator expressions, which I personally would consider a reason to hold off on the change until generator expressions follow the new rules as well.
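For anyone working through those two puzzles, here is the pre-PEP 709 answer as a self-contained, runnable restatement: only the leftmost iterable expression is evaluated in the enclosing (class) scope; every other part runs in the implicit function scope, which skips the class scope entirely.

```python
items = [1, 2, 3]

class A:
    items = [4, 5, 6]
    pairs = [(i, j) for i in items for j in items]

class B:
    items = [4, 5, 6]
    selected = [i for i in items if i in items]

# In A, `for i in items` (the leftmost iterable) sees the class-scope
# [4, 5, 6], but `for j in items` re-resolves `items` inside the hidden
# function scope and finds the *global* [1, 2, 3]:
assert A.pairs == [(i, j) for i in [4, 5, 6] for j in [1, 2, 3]]

# In B, `i` iterates the class-scope [4, 5, 6], while the filter
# `if i in items` checks membership against the global [1, 2, 3],
# so nothing is selected:
assert B.selected == []
```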

2 Likes

That could lead to confusing bugs when people forget to use self. Consider a class like this:

class Cls:
    @property
    def some_property(self): ...

    def some_method(self):
        do_something_with(some_property) # oops forgot to use `self.some_property`

Currently that would fail with a NameError, but if functions could see names from their enclosing class scope, we’d suddenly pass the property object into do_something_with().

3 Likes

I see. I think rather that this is an indication that pyperformance is in many aspects not very representative of modern Python code. For many types of optimizations this doesn’t matter; for optimizations focused on certain language features, it does.

But of course it’s also possible that “most” Python code in the world is also not “modern Python code.” This is what makes optimization work for CPython itself so tricky: it’s very hard to know what constitutes a “typical” Python workload.

12 Likes

We were also discussing potential simplifications that would also remove the potentially confusing or inconsistent behavior cases:

i.e.: What if this optimization were only enabled for list comprehensions appearing within function bodies? (meaning not within class bodies or at global scope)
And: not doing it when a local variable exists within the function that matches a comprehension loop variable name. (That would simplify dealing with the possibility of an outer local existing?)

Meaning the existing status quo slower frame bytecode is generated in those detectable scenarios, but the faster no-frame bytecode is emitted otherwise. Could that wind up as a simpler implementation that still achieves the bulk of the performance win without introducing an observable difference in behavior aside from the additional frame not always being present if someone is poking into frames or examining tracebacks in detail?

I don’t expect application list comprehension execution time to be dominated by global scope or class scope list comprehensions. Most program logic happens within function bodies.

The initial version of PEP 709 did specify inlining of only comprehensions within function bodies, since those are likely to be the performance-sensitive ones. I added inlining of module- and class-level comprehensions later on, because it turned out to not be too hard to do, and it seemed like it offered better consistency of behavior for e.g. tracebacks and locals(). But it would be easy to go back to only inlining within functions, if preserving existing name-visibility rules in class-body edge cases is more important than consistency of tracebacks and locals().

I would not feel good about conditionally skipping inlining based on name collisions. It would certainly simplify the implementation, but I believe the implementation already does a good job of handling isolation, including in cases of name collisions, and AFAIK no scenarios have been raised where the PEP changes behavior due to a name bound both inside and outside the comprehension. I don’t believe that changing the name of a variable in a function should cause visible changes to the performance or behavior of a comprehension in that function.
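A quick check of that isolation (this behaves identically with and without PEP 709's inlining):

```python
def f():
    i = "outer"  # function-local that collides with the loop variable
    squares = [i * i for i in range(3)]
    # The comprehension's `i` is isolated: the function-local `i` is
    # untouched, before, during, and after the comprehension runs.
    return i, squares

# f() returns ("outer", [0, 1, 4]) on 3.11 and on 3.12 with PEP 709.
```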

7 Likes

Would eliminating inlining of module and class scope comprehensions (which would of course eliminate this change to class scope visibility rules) also count as “resolved”?

This one is not hypothetical in terms of the implementation; because this was the first version I implemented, I already know for sure that this would eliminate complexity from the current implementation, without adding any new complexity.

What it would add is observable inconsistency between function-scope and class/module-scope list/dict/set comprehensions in terms of the traceback and locals() behavior – but this inconsistency already exists with generator expressions.

1 Like

You mean, that already exists in your PEP 709 implementation, right? AFAIK there is no such inconsistency in the 3.11 (or current main) behavior.

FWIW, whatever my objections are about, they are not about the appearance or disappearance of traceback entries. Those are fungible in other situations.

1 Like

Right, I meant in the PEP 709 implementation. PEP 709 implementation was already merged into main last week, so the observable difference between tracebacks for genexps vs list/dict/set comprehensions is already there in main (but does not exist in 3.11.)

1 Like

I think a majority of the SC is fine with a discrepancy in when comprehensions get their own frame in tracebacks as a consequence of optimisations, so yes, I think that would resolve it for the SC. I will point out, though, that the SC still tries to go by consensus among core developers, which is why the short timeframe for 3.12 is a concern: it’s not a lot of time to discover actual consensus. (However, there’s still time after 3.12b1 to roll it back if it turns out we’re misjudging either the impact or the consensus among core developers.)

1 Like

I put up a PR that preserves inlining while restoring the old behavior for class scopes: gh-104374: Remove access to class scopes for inlined comprehensions by JelleZijlstra · Pull Request #104528 · python/cpython · GitHub. The change is reasonably simple, so hopefully this is enough to resolve the problem.

3 Likes

But obviously “seeing the class scope” would only apply while the class scope exists, and it ceases to exist once the class is defined (under PEP 649, and maybe PEP 695, it may live on, but hidden from ordinary runtime). So “functions being able to see the class scope” would only affect functions called while the class body is executing, which is the case for generator expressions and non-PEP 709 comprehensions. An ordinary instance method like the one in your example would not “see” a no-longer-existing scope after the class is created and instantiated, of course.

And, while I am here: maybe this PEP benefits Python more by un-weirding the way comprehensions currently work inside class bodies than by the performance gains per se.

I’ve been hit a couple of times by writing a comprehension that could not access the class scope, and the workaround, while easy to write for someone acquainted with the language, might be a blocker for a newcomer:

class A:
    x = [1, 2, 3]
    def doubler(x):
        return [i * 2 for i in x]
    x2 = doubler(x)
    del doubler

vs. post-PEP 709:

class A:
    x = [1, 2, 3]
    x2 = [i * 2 for i in x]

It is literally a no-brainer.

While accepting this would mean that generator expressions behave differently for now, my personal impression is that, inside class bodies, generator expressions are less used and less useful than comprehensions: there one generally wants to pre-compute some values that will be available as class attributes, and that is usually eager.

If this turns out to be so good, or the need to unify behaviors with generator expressions arises, then in a post-PEP 649 Python a generator expression could make use of the class-body closure, which will need to be kept around anyway, to get the same semantics.

2 Likes