Inlining comprehensions -- what are the compatibility constraints?

That sounds like a plan. The compatibility trade-offs concern me as a matter of principle — things get ever so much messier, debuggers and other tooling must learn about it, etc.

2 Likes

stacklevel. The result of warnings.warn(stacklevel=...) in a comprehension (directly or indirectly) will change if the code of the comprehension is inlined.

To me, the largest concern about this idea is that it will add a completely different path in the compiler, one that will be much more complex (and therefore less maintainable) than the code for generator expressions.

3 Likes

There’s a solution in the works for that specific warnings.warn() API flaw: migrate off of stacklevel=. Determining where in the call stack a warning should be attached by counting runtime stack frames above you was always a fragile hack: the count can change with language VM implementation details just as easily as with code refactoring happening elsewhere.

My PR #100840 adds a skip_file_prefixes= parameter as an alternative to stacklevel=.
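
A rough sketch of the intended usage (hypothetical library code; see the PR for the real details):

import os
import warnings

def old_api():
    # Instead of counting frames with stacklevel=, skip every frame whose
    # filename starts with this library's directory; the warning is then
    # attributed to the first caller outside the library, no matter how
    # many internal frames (comprehension or otherwise) are in between.
    warnings.warn(
        "old_api() is deprecated",
        DeprecationWarning,
        skip_file_prefixes=(os.path.dirname(__file__),),
    )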

How common do we expect code that relies on a specific stack depth to be? I don’t think that is a good assumption to rely on. I’d never have assumed a comprehension created a new frame myself; any reliance on that would have been an unfortunate consequence of observing that it happened to be implemented that way.

This is not important to me as worded (just the traceback question)… the larger question of whether removing an implicit frame counts as breakage remains a compatibility question, but I’m not intuitively sure it should. Code gets refactored all the time in ways that change its stack profile.

1 Like

This is true, but it seems unlikely to matter in practice. Typically stacklevel is used in, e.g., a deprecated API to ensure the warning correctly identifies the responsible caller whose code needs to be changed. In such a scenario this change would typically mean that the comprehension-containing function, on the line of the comprehension, is identified as responsible, rather than the comprehension itself. That seems… totally fine?

I can construct a scenario where it might cause wrong results, but it requires imagining a deprecated API that is always called through a comprehension in the same library, where the stacklevel is intended to bypass the comprehension (and probably another frame or several) to reach client code. I don’t think I’ve ever seen such a case, and it’s already inherently fragile to refactoring (and it raises the question of why the warning wouldn’t be issued at an API layer closer to the client code in the first place), but I suppose it could happen.
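
Concretely, the fragile pattern would look something like this (hypothetical names):

import warnings

def _old_impl():
    # stacklevel=3 skips _old_impl() and the <listcomp> frame, so today the
    # warning is attributed to helper().
    warnings.warn("use _new_impl() instead", DeprecationWarning, stacklevel=3)

def helper(items):
    return [_old_impl() for x in items]

# If the comprehension no longer gets its own frame, the same stacklevel=3
# points one level further out, at helper()'s caller.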

That’s a reasonable concern, thanks for the feedback! I suspect the implementation will not be quite as complex as you suggest, and it seems that we’ve recently become more open to accepting some complexity in the interest of performance, but I think this is a trade-off that has to be evaluated concretely, with actual code and performance results. So I appreciate all the feedback in this thread, and I think the next step is for me to come back with both of those.

3 Likes

Had a chat with Carl, and another approach is to hoist the comprehensions to top-level functions. This is nice because:

  • No more function object allocation on each comprehension
  • You can promote freevars to function parameters
  • Keeps frames the same, etc.

It does come with some caveats:

  • It pollutes the global namespace
  • Can’t optimize comprehensions that write to cellvars
  • Function calls are still expensive, unless some external force makes them cheaper (like a JIT)
1 Like

If I understand the situation correctly, I think you are saying that:

[func(x) for x in items]

would overwrite or create a global x. Is that correct?

If so, I think that’s much worse than merely changing the traceback slightly in the event of an exception.

How does the performance loss due to the cellvars compare to the performance gain from avoiding the comprehension function object?

Are the function calls more expensive than they currently are? If not, I think that’s a moot point.

1 Like

Ah, no. What I mean to say is that

def foo(x):
  return [item for item in x]

might create a new function called <_comprehension_hoisted_0> or similar that is called from foo, as if written as:

def <_comprehension_hoisted_0>(x):
  result = []
  for item in x:
    result.append(item)
  return result

def foo(x):
  return <_comprehension_hoisted_0>(x)

and then <_comprehension_hoisted_0> is in globals() of the module.

1 Like

Ah, that’s not as bad as I thought, but it’s still pretty invasive, especially for those who use lots of list comps, and even more so for anyone that processes globals() looking for functions.

If you have a bunch of list comps that get run once, their hoisted functions will live forever in globals.

2 Likes

I think the hoisting approach in some ways ends up pretty similar to the “just optimize out the new function object and don’t change anything else” approach (you’re still making a call but you don’t have to create a new function object each time), but I don’t really like it in terms of user expectations. I think it’s somewhat surprising that [x for x in y] currently creates and calls a nested function; it would be arguably less surprising if it didn’t, and it would be much more surprising if it not only created a function but also hoisted it to module global level under some auto-generated name. Since these are observable, not transparent optimizations, this is a relevant factor.

4 Likes

It’s also possible to do some kind of CALL_COMPREHENSION opcode that instead pulls from a tuple of function objects on the module or code or something like that, but at that point we’ve left the realm of only doing compiler work.

I think it’s somewhat surprising that [x for x in y] currently creates and calls a nested function;

I don’t think the nested function part is in and of itself critical. But I do think that list comprehensions should run in their own scope, as if each were a nested function. If there is some other way to get the same effect without actually incurring the expense of creating a function object and calling it, that would be grand.

Silly question: since every function object wraps a code object, could you just eval the code object without worrying about constructing the function wrapper?
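
For reference, here’s roughly what we’re talking about today (the comprehension compiles to a nested code object stored in the enclosing function’s constants):

def foo(items):
    return [x * 2 for x in items]

comp = next(
    c for c in foo.__code__.co_consts
    if hasattr(c, "co_name") and c.co_name == "<listcomp>"
)
print(comp.co_varnames)
# ('.0', 'x'): the iterator is passed in as the implicit '.0' argument, so
# a bare eval(comp) has nothing to bind it to, and a comprehension that
# closes over outer variables would also need its closure cells supplied.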

When comprehensions first came out, I don’t think that I would have predicted one way or another whether the loop variables inside the comprehension would leak out to the caller’s scope. And if I recall correctly, in Python 2 they did leak.
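
For example:

x = "outer"
squares = [x * x for x in range(5)]
print(x)  # Python 2 prints 4 (the loop variable leaked); Python 3 prints "outer"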

But now that they don’t, I wouldn’t want it any other way. A comprehension should run in its own scope, isolating its loop vars from the caller. Changing that would probably break a lot more code than changing the traceback.

Thoughts?

1 Like

What if the hoisted function was attached to the current function, and only stored in globals if the comprehension was run in the top level of the module?

spam = [x for x in something]

def eggs():
    aardvark = [x for x in something]

The “spam” comprehension would be hoisted to globals, because there’s nowhere else for it to go. But the “aardvark” comprehension could be hoisted to a field on the eggs function object: say, eggs.__inner__ could hold a tuple of such hoisted functions, or they could go in eggs.__code__.co_consts.

Since most comprehensions are inside functions, that would minimise the pollution of globals.

2 Likes

Yes, this is possible and discussed above.

For both “don’t bother with a function object” and “hoist/cache the function object somewhere,” the sticky bit is closures, which (as discussed above) are not uncommon in comprehensions. The closure is kept on the function object and needs to be different each time. Options here include adding an f_closure pointer to the frame, so we don’t need to access the closure off of f_func (this is discussed above), or Max’s suggestion that the compiler detect whether the comprehension writes to the closure (which should be much rarer than merely having one) and, if so, not optimize at all; if not, convert closed-over vars to parameters.
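
To make that last option concrete, a rough sketch (illustrative names only, not actual compiler output):

# A comprehension that only reads a closed-over variable:
def scale(items, factor):
    return [x * factor for x in items]  # 'factor' is a freevar today

# ...could be compiled roughly as if written, with the freevar promoted to
# an ordinary parameter:
def _comp_scale_0(iterable, factor):
    result = []
    for x in iterable:
        result.append(x * factor)
    return result

def scale(items, factor):
    return _comp_scale_0(items, factor)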

I’m slightly more inclined to add f_closure to frames if we don’t go for full inlining, because hoisting/caching function objects and having to detect writes to closures in the compiler sounds like it will add more complexity. But I think these questions are at a level of detail that’s best answered by writing the code (and testing the perf impact), which I’m hoping to find time for still this week.

Yep, I think my OP is already pretty clear that this will not change. When I mused about what best matches user expectations, I was talking about the other ephemera of “nested function,” like the comprehension showing up as a separate frame in tracebacks. Maybe the expectation that only functions can create scopes is strong enough that if comprehensions create scopes, they should also carry all the trappings of a call to a nested function. Or maybe if comprehensions did act as their own scope, but didn’t show up in tracebacks, nobody would ever think twice about it and it would actually be slightly more convenient.
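
For example, today an exception inside a comprehension produces an extra traceback frame, roughly:

def f():
    return [1 // x for x in (0,)]

f()
# Traceback (most recent call last):
#   File "demo.py", line 4, in <module>
#     f()
#   File "demo.py", line 2, in f
#     return [1 // x for x in (0,)]
#   File "demo.py", line 2, in <listcomp>
#     return [1 // x for x in (0,)]
# ZeroDivisionError: integer division or modulo by zero
#
# With inlining, the <listcomp> entry would disappear.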

Just noticed there is a long-standing (and long-closed as “too hard to implement”) issue about this debugging annoyance: list comprehensions don't see local variables in pdb in python3 · Issue #65360 · python/cpython · GitHub
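
A minimal reproduction of that issue, for reference:

def f():
    y = 2
    breakpoint()

f()
# (Pdb) y
# 2
# (Pdb) [y * x for x in range(3)]
# *** NameError: name 'y' is not defined
# The comprehension body runs in its own nested scope, which can't see the
# locals that pdb supplies when evaluating the expression.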

1 Like

Thanks for digging up that issue, I’ve reopened it.

2 Likes

I’ve opened two (mutually exclusive) PRs in this area for comparison. The first adds a new opcode dedicated to “calling” a comprehension in streamlined fashion (without creating a new function object), and the second fully inlines list/dict/set comprehensions into their containing function, with added manipulation of stack/locals to provide the same isolation of comprehension “locals” without requiring even a new frame.

The second PR changes observable behavior in the ways I mentioned in my OP: calling locals() inside a comprehension will also show variables from the outer scope, and tracebacks will not list a separate frame for comprehensions. There were zero changes required in the standard library and test suite to accommodate these behavior changes. (Some changes were required in disassembly test cases where comprehension bytecode was directly checked.)
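
To illustrate the locals() change with a small sketch (output is approximate):

def f(items):
    total = 0
    return [sorted(locals()) for x in items]

print(f([1]))
# Without inlining: roughly [['.0', 'x']]
# With the inlining PR: outer names such as 'items' and 'total' appear too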

For a simple micro-benchmark ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]', the first PR gives approximately a 25% speedup and the second gives a 67% speedup.

Neither PR is currently able to break through the (significant) noise in the full pyperformance benchmark suite. Code inspection suggests pyperformance is currently lacking in benchmarks that heavily exercise comprehensions, and in internal workloads with more recently-authored code we’ve seen noticeable gains from comprehension inlining, so I think this is mostly just a gap in pyperformance. I’m planning to add a benchmark to pyperformance based on real-world code using comprehensions that can give a somewhat more realistic idea than the super-micro-benchmark above.

Tentatively though, I think results so far suggest that full inlining may be worth the minor behavior changes. (Especially considering that it can also unlock some further optimizations.)

12 Likes

Thanks all for the helpful feedback! I’ve written a short PEP for this proposal; discussion thread at PEP 709: Inlined comprehensions

3 Likes

I’ve found that entering interactive mode is the only way to get comprehensions to work when debugging with pdb; I then exit interactive mode and continue the pdb session.

I had a thought this morning: I don’t think closure variables are a problem for a hoisted function, even one that writes to them. I think the compiler can still treat this the same as it does now and pass in a cell object, which the hoisted comprehension can read from or write to.

After some offline discussion with Carl Meyer, I now think this is more in the realm of “gross hacks” and should not be too seriously considered.