Inlining comprehensions -- what are the compatibility constraints?

It’s also possible to do some kind of CALL_COMPREHENSION opcode that instead pulls from a tuple of function objects on the module or code or something like that, but at that point we’ve left the realm of only doing compiler work.

I think it’s somewhat surprising that [x for x in y] currently creates and calls a nested function.
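
For the curious, this is easy to see in the disassembly; the exact opcodes vary by CPython version, but on 3.11 it looks roughly like this:

import dis

# Disassembling a comprehension on CPython 3.11 shows a nested
# <listcomp> code object being wrapped in a function and called:
dis.dis("[x for x in y]")
# Expect output containing (pre-3.12; details vary by version):
#   LOAD_CONST     (<code object <listcomp> ...>)
#   MAKE_FUNCTION
#   LOAD_NAME      (y)
#   GET_ITER
#   ...followed by a call opcode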

I don’t think the nested function part is in and of itself critical.
But I do think that list comprehensions should run in their own scope,
as if each were a nested function. If there is some other way to get the
same effect without actually incurring the expense of creating a function
object and calling it, that would be grand.

Silly question: since every function object wraps a code object, could
you just eval the code object without worrying about constructing the
function wrapper?
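
As a sketch of the general idea (with the caveat that a real comprehension’s code object takes a hidden “.0” iterator argument, and may have free variables, so plain eval() can’t supply what it needs):

# Evaluating a code object directly, without a function wrapper:
code = compile("[x * 2 for x in data]", "<demo>", "eval")
print(eval(code, {"data": [1, 2, 3]}))  # [2, 4, 6]

eval() has no way to pass arguments to a code object, which is exactly what the hidden “.0” parameter would require.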

When comprehensions first came out, I don’t think that I would have
predicted one way or another whether the loop variables inside the
comprehension would leak out to the caller’s scope. And if I recall
correctly, in Python 2 they did leak.

But now that they don’t, I wouldn’t want it any other way. A
comprehension should run in its own scope, isolating its loop vars from
the caller. Changing that would probably break a lot more code than
changing the traceback.
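
A quick illustration of the isolation in question (under Python 2, the loop variable leaked out and rebound x):

x = "outer"
squares = [x * x for x in range(3)]
print(x)  # Python 3: "outer"; Python 2 printed 2, the last loop value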

Thoughts?

What if the hoisted function were attached to the current function, and only stored in globals if the comprehension ran at the top level of the module?

spam = [x for x in something]

def eggs():
    aardvark = [x for x in something]

The “spam” comprehension would be hoisted to globals, because there’s nowhere else for it to go. But the “aardvark” comprehension could be hoisted to a field on the eggs function object (say, an eggs.__inner__ attribute holding a tuple of such hoisted functions), or into eggs.__code__.co_consts.

Since most comprehensions are inside functions, that would minimise the pollution of globals.
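
Roughly, the idea modeled in pure Python; __inner__ is an invented name here, not a real CPython attribute:

something = range(5)

def eggs():
    # Instead of building a fresh function object on every call,
    # reuse the pre-built comprehension function hoisted onto eggs:
    return eggs.__inner__[0](iter(something))

# Built once at definition time, not once per call:
eggs.__inner__ = (lambda it: [x for x in it],)
print(eggs())  # [0, 1, 2, 3, 4]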

Yes, this is possible and discussed above.

For both “don’t bother with a function object” and “hoist/cache the function object somewhere,” the sticky bit is closures, which (as discussed above) are not uncommon in comprehensions. The closure is kept on the function object and should be different each time. Options here include adding an f_closure pointer to the frame, so we don’t need to access the closure off of f_func (also discussed above), or Max’s suggestion that the compiler detect whether the comprehension writes to the closure (which should be much rarer than merely having one): if it does, don’t optimize at all; if it doesn’t, convert the closed-over variables to parameters.
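
As a concrete illustration of the rare “writes to the closure” case (using PEP 572 assignment expressions, which bind in the containing scope):

def running_total(data):
    total = 0
    # The walrus binds "total" in the enclosing function, so the
    # comprehension's hidden function writes to its closure; under the
    # detect-writes approach, this case would simply not be optimized.
    return [total := total + x for x in data]

print(running_total([1, 2, 3]))  # [1, 3, 6]

A read-only closure like [x + n for x in data], by contrast, could have n converted into a parameter of the hoisted function.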

I’m slightly more inclined to add f_closure to frames if we don’t go for full inlining, because hoisting/caching function objects and having to detect writes to closures in the compiler sounds like it would add more complexity. But I think these questions are at a level of detail that’s best answered by writing the code (and testing the perf impact), which I’m hoping to find time for later this week.

Yep, I think my OP is already pretty clear that this will not change. When I mused about what best matches user expectations, I was talking about the other ephemera of the “nested function,” like the comprehension showing up as a separate frame in tracebacks. Maybe the expectation that only functions can create scopes is strong enough that if comprehensions create scopes, they should also carry all the trappings of a call to a nested function. Or maybe, if comprehensions acted as their own scope but didn’t show up in tracebacks, nobody would ever think twice about it, and it would actually be slightly more convenient.

Just noticed there is a long-standing (and long-closed as “too hard to implement”) issue about this debugging annoyance: list comprehensions don't see local variables in pdb in python3 · Issue #65360 · python/cpython · GitHub

Thanks for digging up that issue, I’ve reopened it.

I’ve opened two (mutually exclusive) PRs in this area for comparison. The first adds a new opcode dedicated to “calling” a comprehension in a streamlined fashion (without creating a new function object), and the second fully inlines list/dict/set comprehensions into their containing function, with added manipulation of the stack/locals to provide the same isolation of comprehension “locals” without requiring even a new frame.

The second PR changes observable behavior in the ways I mentioned in my OP: calling locals() inside a comprehension will also show variables from the outer scope, and tracebacks will not list a separate frame for comprehensions. There were zero changes required in the standard library and test suite to accommodate these behavior changes. (Some changes were required in disassembly test cases where comprehension bytecode was checked directly.)
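
For example (a sketch; the exact keys depend on the Python version):

def f():
    y = 1
    return [sorted(locals()) for x in (0,)][0]

print(f())
# 3.11 (separate comprehension frame): ['.0', 'x']
# with inlining (the second PR): outer names such as 'y' appear as well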

For a simple micro-benchmark, ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]', the first PR gives approximately a 25% speedup and the second approximately a 67% speedup.

Neither PR is currently able to break through the (significant) noise in the full pyperformance benchmark suite. Code inspection suggests that pyperformance currently lacks benchmarks that heavily exercise comprehensions, and in internal workloads with more recently authored code we’ve seen noticeable gains from comprehension inlining, so I think this is mostly a gap in pyperformance. I’m planning to add a pyperformance benchmark based on real-world code using comprehensions, which should give a somewhat more realistic picture than the super-micro-benchmark above.

Tentatively though, I think results so far suggest that full inlining may be worth the minor behavior changes. (Especially considering that it can also unlock some further optimizations.)

Thanks all for the helpful feedback! I’ve written a short PEP for this proposal; discussion thread at PEP 709: Inlined comprehensions

I’ve found that entering interactive mode is the only way to use comprehensions when debugging with pdb; then I exit and continue the pdb session.
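
For anyone hitting the same thing, the workaround looks roughly like this (variable names are just for illustration):

(Pdb) interact
*interactive*
>>> [n + offset for n in data]   # sees the frame's locals here
[10, 11, 12]
>>> exit()   # or Ctrl-D, to drop back to the pdb prompt
(Pdb) continue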

I had a thought this morning: I don’t think closure variables are a problem for a hoisted function, even if it writes to them. I think the compiler can still treat this the same as it does now and pass in a cell object, which the hoisted comprehension can read from or write to.
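
Modeled in pure Python (illustrative only, not the actual compiler transformation):

import types

cell = types.CellType(0)  # the cell a closed-over variable lives in

def hoisted_comp(it, cell):
    out = []
    for x in it:
        cell.cell_contents += x   # writes through the cell are visible
        out.append(x)             # to the enclosing scope
    return out

hoisted_comp(iter([1, 2, 3]), cell)
print(cell.cell_contents)  # 6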

After some offline discussion with Carl Meyer, I now think this is more in the realm of “gross hacks” and should not be too seriously considered.