Generator expressions behavior with non-iterable objects

I’ve started some discussion related to this topic on Oct 2024 here:

There was a crash with these three lines of code:

g = (x for x in range(10))
g.gi_frame.f_locals['.0'] = range(20)
list(g)

There were various suggestions for fixing this crash.
Mark Shannon (@markshannon) comes with this solution (gh-125038: Crash after genexpr.gi_frame.f_locals manipulations is fixed by efimov-mikhail · Pull Request #125178 · python/cpython · GitHub),
(gh-125038: Crash after genexpr.gi_frame.f_locals manipulations is fixed by efimov-mikhail · Pull Request #125178 · python/cpython · GitHub):

Currently, the bytecode compiler assumes that local variables in generator expressions cannot be modified in.
This is no longer true, but it remains true that the evaluation stack cannot be modified.

The bytecode for (x for x in ...) starts with:

      0 RETURN_GENERATOR
      2 POP_TOP
      4 RESUME                   0
      6 LOAD_FAST                0 (.0)
>>    8 FOR_ITER                 6 (to 24)

Compare that with the code for for x in ....:

      0 RESUME                   0
      2 LOAD_FAST                0 (...)
      4 GET_ITER
>>    6 FOR_ITER                 4 (to 18)
      10 STORE_FAST               1 (x)

Note the additional GET_ITER as the bytecode compiler doesn’t know what ... is.

I think adding the GET_ITER before the FOR_ITER would gain the robustness required in this case, without complicating the iteration code.

We also want to avoid emitting GET_ITER twice. It is currently emitted in the scope surrounding the generator expression.
We should move it into the generator expression.

In other words, this function:

def f(s): return (x for x in s)

currently generates the following bytecode:

  1           RESUME                   0

  2           LOAD_CONST               1 (<code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST_TEMP           0 (s)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
       L2:     FOR_ITER                 6 (to L3)
               ...

I think it should generate:

  1           RESUME                   0

  2           LOAD_CONST               1 (<code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST_TEMP           0 (s)
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

This is a behavior change, but I think a correct one. If f(2) is called, we get a TypeError, but this seems wrong. The error should occur once the generator expression is iterated over.

One key thing that was used in solution for 3.13 is fact that GET_ITER is idempotent (gh-125038: Crash after genexpr.gi_frame.f_locals manipulations is fixed by efimov-mikhail · Pull Request #125178 · python/cpython · GitHub):

As @Jelle points out, GET_ITER is idempotent, so we can drop CHECK_ITER and use GET_ITER instead.

For 3.14, we can then drop the GET_ITER in the caller.

As we can see from this issue, someone can write a code which reacts on GET_ITER as it is NOT idempotent:

There were comment by Mark Shannon (see Regression of 3.13.1 with iterator creation being duplicated · Issue #127682 · python/cpython · GitHub):

The problem is specific to generator expressions which have acted in a subtly different way from generators that (almost) no one noticed.

For a generator, any iteration occurs once the generator is executed. For generator expressions creating an iterator from the iterable happened when the generator expression is created. However there was no check that the iteration variable held an iterator when the generator expression executed, which could lead to a crash in exceptional circumstances.

This can be seen by disassembling this function:

def f(seq): return (x for x in seq)

Up to 3.13

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
       L2:     FOR_ITER                 6 (to L3)       # This is unsafe if .0 contains a non-iterator
               ...

Now:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

What I would like:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

I think we should make generator expressions behave exactly like generators. It seems surprising that they would not.

And solution that he was provided:

Another related issues and PRs:

In fact, today we have slightly different behavior for 3.13, 3.14 and main/3.15 branches.
One can find an arguments for some ways from this comment and beyond:

We have to decide, which behavior is best for 3.15+.
Should we raise TypeError instantly for (x for x in None) or it’s better to raise TypeError only when this generator expression will be iterated over?

There was my another atempt to provide better solution here:

But there was not enough contention and that PR was closed by myself.

P.S. I’m not sure about topic organization and hope it’s not very hard for reading.
I’ll be happy to improve it and provide any needed information.

A little joke.

5 Likes

Looks like changing the behavior would require changing the language reference — and the current documented behavior actually feels more Pythonic, in line with the Zen’s “Explicit is better than implicit” and “Errors should never pass silently.”

Even though it’s not consistent with generator functions, it is consistent with list, set, and dict comprehensions like [x for x in None] and {x for x in None}, which are visually and semantically similar. Importantly, there are no other Python expressions that use the same comprehension-style syntax and defer evaluation of the iterable — this makes immediate error raising in generator expressions both expected and consistent, so probably restoring the original behaviour is the only way to avoid a semantic trap.

2 Likes

As reported by me in the issue tracker, the iterator is created immediately, but apparently only for checks, discarded and later created again and that one gets used.

I wouldn’t be against creating it sooner, I am against doing it twice. That was an incompatible change, I would contend a regression. And contractions don’t do it twice, so there is no harmony. Generator expression always had the iterator value stored as .0 at the time it was created and not the code to create it.

I do not understand the code well enough, but not re-using the iterator post the checks is wrong in my book.

3 Likes

As was pointed out in one of the github issues, errors here can pass silently. Only the leftmost “for” in a generator expression is eagerly checked for iter, making this a place where the behavior is inconsistent even within a single type of expression.

Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ((i,j) for i in range(3) for j in False)
<generator object <genexpr> at 0x000002C4B0B483C0>
>>> ((i,j) for i in False for j in range(n))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bool' object is not iterable
>>>

I agree with not creating an iterator just to check this, destroying it, and then creating another iterator. I can only see that as error prone (and it did create real issues in working code)

Personally, I’m fine with the behavior here changing to only create the underlying iterator when the generator expression is iterated so long as it happens not in a patch version and the behavior is documented. Such a change seems more inline with the expectation that generator expressions are lazily evaluated. (from what I saw, the issues changing this in a patch version caused were related to libraries that care about bytecode changes, such as numba)

2 Likes

I’m not sure that removing the exception is a good idea. Projects with poor test coverage could end up with serious but silent issues. Maybe we could roughly estimate how many projects rely on the current behavior using publicly available source code if we are eager to change this.

That’s a good point — despite it being mentioned in the language reference, this is inconsistent with comprehensions as well — so probably better to make it fully consistent with them by adding more exceptions, or go ahead with treating it as syntax sugar only.

1 Like