Allow comprehension syntax in loop header of for loop

mimre25 · April 16, 2023, 9:00am

Hi,
first time posting here, so please forgive me if I’m doing something wrong.

I have an idea for improving a common for loop pattern by allowing the same/a similar syntax as in generator or comprehension expressions.

Current situation:

I often find myself iterating over an iterable and then just skipping elements that don’t fulfill some requirements. To illustrate this, let’s consider a simple example in which we want to print only the even numbers out of a given list. There are multiple ways to do this right now.

Option 1:

def print_evens(x: list[int]) -> None:
    for i in x:
        if i % 2 != 0:
            continue
        print(i)

This has the conditional plus a continue which can hinder the reading flow to some degree and increase the cognitive complexity a bit.

Option 2:

def print_evens(x: list[int]) -> None:
    for i in (j for j in x if j % 2 == 0):
        print(i)

This uses a generator in the loop header, making the loop header a bit more complex to read.

Proposed Solution:

def print_evens(x: list[int]) -> None:
    for i in x if i % 2 == 0:
        print(i)

This would use a new syntax for the loop header that is exactly the same as we are used to from generator and comprehension expressions. This avoids adding the cognitive overhead of comprehending an additional generator construct or the extra conditional and the continue, at the cost of the same cognitive load any comprehension or generator statement already has.
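For reference, a minimal sketch (not from the original post) that checks two things: Options 1 and 2 select the same elements, and the proposed loop header is a syntax error in Python as it stands. The helper names evens_continue, evens_genexp and is_valid_syntax, and the sample list, are assumptions made for illustration; the functions return lists instead of printing so that the results can be compared.

```python
def evens_continue(x: list[int]) -> list[int]:
    """Option 1: skip unwanted elements with an inverted test and continue."""
    result = []
    for i in x:
        if i % 2 != 0:
            continue
        result.append(i)
    return result


def evens_genexp(x: list[int]) -> list[int]:
    """Option 2: filter with a generator expression in the loop header."""
    result = []
    for i in (j for j in x if j % 2 == 0):
        result.append(i)
    return result


def is_valid_syntax(source: str) -> bool:
    """Return True if the running interpreter can compile the source."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


sample = [1, 2, 3, 4, 5, 6]
# The proposed header, kept as a string because it cannot be written directly.
proposed = "for i in x if i % 2 == 0:\n    print(i)\n"

print(evens_continue(sample) == evens_genexp(sample))
print(is_valid_syntax(proposed))
```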

What do you think of this syntax addition?

abessman · April 16, 2023, 9:11am

Previous discussion on the same topic: A "for ... if" statement

For the given example, you could do

def print_evens(x: list[int]) -> None:
    print(*[i for i in x if i % 2 == 0], sep="\n")

mimre25 · April 16, 2023, 9:29am

I do understand that there are simpler ways to do this for the given example.

But let’s assume you have a list of objects that you want to loop over while skipping the work for elements that meet specific conditions.

I just used this example because it’s simple enough to be understandable in 5 lines of code instead of many more.

gkb · April 16, 2023, 10:00am

Why don’t you give a name to your generator expression?

def print_evens(x: list[int]) -> None:
    evens = (j for j in x if j % 2 == 0)
    for i in evens:
        print(i)

In my opinion, this makes the code nice and readable.
You could even add further filters:

def print_evens_lt_5(x: list[int]) -> None:
    evens = (j for j in x if j % 2 == 0)
    evens_lt_5 = (j for j in evens if j < 5)
    for i in evens_lt_5:
        print(i)

Note that the complexity of the filters is kept out of the loop.

All in all I think that the existing language features are already sufficient and it is not worth it to complicate the loop syntax.

kknechtel · April 16, 2023, 3:22pm

Because this isn’t DRY; the for _ in logic needs to be repeated even though there should conceptually only be one loop. Chaining generator expressions compounds the issue; while it’s nice sometimes to be able to split up the operations, the comprehension syntax allows multiple clauses for a reason, and the repeated j for j in is noisy.
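A small sketch of the contrast described here, assuming a sample list chosen purely for illustration: chaining two generator expressions repeats the "j for j in" part, while a single comprehension can carry both conditions as separate if clauses.

```python
x = [1, 2, 3, 4, 6, 8]

# Chained generator expressions: "j for j in" appears twice.
evens = (j for j in x if j % 2 == 0)
chained = list(j for j in evens if j < 5)

# One comprehension with two if clauses: a single "for ... in".
single = [j for j in x if j % 2 == 0 if j < 5]

print(chained == single)
```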

Dutcho · April 16, 2023, 5:41pm

Is this:

for i in filter(lambda n: n % 2 == 0 and n < 5, x): 
    print(i)

better/DRYer?

mimre25 · April 17, 2023, 3:37pm

This shows exactly the issue.

If you don’t want to repeat the for ... in part, you need to bend over backwards to make this happen.

Note that your example

for i in filter(lambda n: n % 2 == 0 and n < 5, x):
    print(i)

is just more verbose than:

for i in x if i % 2 == 0 and i < 5:
    print(i)

which would be possible with my proposed new syntax.

pf_moore · April 17, 2023, 4:00pm

So to be clear, you have listed two ways of doing this that are currently possible, and you are suggesting that new syntax is worth adding to provide a third way of doing the same thing? The benefits of such new syntax would have to be fairly significant to make this reasonable. Do you have any examples of real-world code that you can link to where the improvement is clearly visible?

To be honest, while it’s slightly verbose, I really don’t think that your Option 2 (a generator in the loop header) is that bad for simple cases. You say yourself that it’s merely “a bit more complex to read”. And for more complicated examples, I’d imagine that factoring out the condition into a separate (named!) function would be necessary for readability anyway.
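As an illustration of factoring the condition out into a named function, here is a minimal sketch. The predicate name is_small_even and the sample data are hypothetical; the condition is the one used earlier in the thread.

```python
def is_small_even(n: int) -> bool:
    """Named condition: even and below 5."""
    return n % 2 == 0 and n < 5


def print_small_evens(x: list[int]) -> None:
    # The loop header stays short; the detail lives in the predicate.
    for i in filter(is_small_even, x):
        print(i)


sample = [1, 2, 3, 4, 6, 8]
small_evens = list(filter(is_small_even, sample))
print_small_evens(sample)
```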

In practice, all your proposal does is save one line and one level of indentation:

for i in x if condition:
    body

replaces

for i in x:
    if condition:
        body

This is a simpler form of your option 1, which avoids the continue and the inverted logic, if you don’t like that.

I’m sorry, but I’m -1 on this proposal - it simply doesn’t seem to add enough value to be worth it for me. If we’d been designing a new language from scratch, then maybe a “generalised for loop” like this would be natural to consider. But for a language with as many users, and as much established documentation and tutorial material, as Python has, the disruption such a change would add doesn’t seem worth it.

kknechtel · April 17, 2023, 6:13pm

It does seem like a lot of work, but I don’t really understand what you mean about “disruption”. Would such an addition actually break something? Or are we just talking about the extra overhead of documentation, “how to teach this” in the PEP etc.? Because that seems pretty minimal to me, given that we’re talking about a syntax that just parallels the way that multiple clauses already work in comprehensions/generator expressions (alternately: that just telescopes lines of code that already work in the imperative approach, in much the way that elif combines else and a corresponding immediate, indented if).

ntessore · April 17, 2023, 6:41pm

I think what the proposal adds is consistency; the ergonomics are just a fringe benefit.

pf_moore · April 17, 2023, 7:22pm

I didn’t have anything specific in mind, but it’s a common consideration that proposals like this tend to ignore. Specific things that I’d consider under “disruption” include:

Projects needing to update their style guides to cover using the new feature.
Linters and auto-formatters having to be updated to deal with the new syntax.
IDEs and syntax highlighters needing to be updated.
Documentation and tutorials now being perceived as “out of date” because they don’t cover the new feature.
Answers to questions on sites like Stack Overflow being questioned because they didn’t use the new feature, or being superseded by answers that do. User confusion caused by the subsequent debates.
Code examples needing to be qualified with statements like “this only works in Python 3.13 and later, for older versions do the following…”
Projects getting well-intentioned PRs suggesting the use of the new feature, or PRs that simply use it without thinking of compatibility issues.
Users getting confused as to why there are multiple ways of doing the same thing.

To be clear, I don’t think any of these items are particularly significant. But that’s sort of the point - neither is the benefit of the proposal, so even a series of minor inconveniences like this are enough to make the proposal not worthwhile.

… and that’s my point about “if we were designing a new language from scratch”. A consistent, general approach would be much more attractive in a green field design. Trade-offs would be different, having a generalised loop may influence other design choices, etc. But none of that is true in a language with 25+ years of history behind it.

Dutcho · April 18, 2023, 8:41pm

<dream mode>
If that were the case (and purely hypothetical; for avoidance of doubt: this is NOT a proposal, just fond memories of Algol-68 [yes, I’m that old], which partly sprung from the same source as Python did), I’d rather have for i in count() while i * i < n: print(i) than for i in count() if i * i < n: print(i).
Its economy is 2 lines and 2 indents, but more importantly, it adds some expressions currently not directly possible.
</dream mode>
But then, none of this is really necessary, so let’s be happy with what we have.
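For comparison, the closest existing spelling of the "while" variant is probably itertools.takewhile. A minimal sketch, with n = 20 picked only for illustration. Note the difference from an "if" filter: takewhile stops at the first failing element, whereas filtering an infinite count() would keep skipping elements and never terminate.

```python
from itertools import count, takewhile

n = 20

# Stops at the first i with i * i >= n, so count() is never exhausted.
below_root = []
for i in takewhile(lambda i: i * i < n, count()):
    below_root.append(i)

print(below_root)
```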

storchaka · April 19, 2023, 7:12am

It is difficult for people to correctly tell the following examples apart if both are syntactically valid:

for i in x if i % 2 == 0 and i < 5:
    ...

for i in x if i % 2 == 0 else i < 5:
    ...

And it is difficult for a computer to parse. The parser needs to backtrack to the if after encountering else, and the code after the if can be arbitrarily complex.
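To see how the second form is read today, the standard ast module can be asked directly. This is only a sketch of the current behaviour, not of how a parser for the proposed syntax would work: the whole "x if ... else y" becomes the iterable of the loop.

```python
import ast

source = "for i in x if i % 2 == 0 else y:\n    pass\n"
loop = ast.parse(source).body[0]

# The loop iterates over a conditional expression, not over a filtered x.
print(type(loop).__name__)       # For
print(type(loop.iter).__name__)  # IfExp
```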

kknechtel · April 20, 2023, 7:48am

… But is that not already true of the existing list comprehension syntax?

Rosuav · April 20, 2023, 8:07am

It kinda is, but the decision was made in the other direction.

>>> i = 1
>>> [i*i for i in range(5) if i % 3 else range(10)]
  File "<stdin>", line 1
    [i*i for i in range(5) if i % 3 else range(10)]
                                    ^^^^
SyntaxError: invalid syntax
>>> [i*i for i in (range(5) if i % 3 else range(10))]
[0, 1, 4, 9, 16]

I don’t think this would need to be a blocker necessarily, but it does introduce the potential for confusion. That said, though, how many people are ACTUALLY going to write this in their code?

for i in range(5) if some_cond else range(10):
    print(i)

As long as a rule can be defined that makes it unambiguous (which should be fine, given that “else” on its own would be a syntactic issue), I’d be okay with style guides recommending against unparenthesized conditions in ‘for’ loops. That is to say, write this instead:

for i in (range(5) if some_cond else range(10)):
    print(i)
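The asymmetry can be checked without a REPL. A minimal sketch, with the helper name compiles being an assumption for illustration: the unparenthesized if..else is rejected inside a comprehension but accepted in a for statement, and the parenthesized form is accepted in both.

```python
def compiles(source: str) -> bool:
    """Return True if the running interpreter accepts the source."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


in_comprehension = "[i for i in range(5) if c else range(10)]\n"
in_comprehension_parens = "[i for i in (range(5) if c else range(10))]\n"
in_for = "for i in range(5) if c else range(10):\n    pass\n"
in_for_parens = "for i in (range(5) if c else range(10)):\n    pass\n"

for snippet in (in_comprehension, in_comprehension_parens, in_for, in_for_parens):
    print(compiles(snippet), snippet.splitlines()[0])
```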

mimre25 · April 23, 2023, 12:29pm

pf_moore wrote:

I didn’t have anything specific in mind, but it’s a common consideration that proposals like this tend to ignore. Specific things that I’d consider under “disruption” include:

Projects needing to update their style guides to cover using the new feature.
Linters and auto-formatters having to be updated to deal with the new syntax.
IDEs and syntax highlighters needing to be updated.
Documentation and tutorials now being perceived as “out of date” because they don’t cover the new feature.
Answers to questions on sites like Stack Overflow being questioned because they didn’t use the new feature, or being superseded by answers that do. User confusion caused by the subsequent debates.
Code examples needing to be qualified with statements like “this only works in Python 3.13 and later, for older versions do the following…”
Projects getting well-intentioned PRs suggesting the use of the new feature, or PRs that simply use it without thinking of compatibility issues.

To be clear, I don’t think any of these items are particularly significant. But that’s sort of the point - neither is the benefit of the proposal, so even a series of minor inconveniences like this are enough to make the proposal not worthwhile.

I have to agree that these are valid points. However, the added consistency across “all” “for … in” constructs has the benefit of avoiding confusion for newcomers. Further, I doubt that the overhead of updating linters, IDEs, and style guides is that dramatic, considering that the syntax is already known from comprehension expressions.

pf_moore wrote:

Users getting confused as to why there are multiple ways of doing the same thing.

There are already multiple ways to do “that” thing, and I’d argue that it’s even more confusing that some of them work in some places and in others don’t.

Furthermore, all the arguments above are valid for any new code construct/syntax addition, so people have to deal with it.
Maybe I’m underestimating the effort for all those aspects by a lot, but I don’t see why this would be a blocker for a change that adds consistency for the long run.

b11c · June 1, 2023, 2:37pm

Unaware of this thread I have started another one, and have reached this conclusion:

OK, reading through these threads I realise that the damage has already been done by allowing if expressions to have different syntax in comprehensions from the basic loops. So we can’t implement this idea without breaking backward compatibility, even taking into account that if..else is extremely rarely used in for loops.

mimre25 · June 3, 2023, 1:00pm

Thanks for copying over the conclusion.

I’m wondering where you see the backwards incompatibility. Could you shed some light on that, please?

b11c · June 6, 2023, 7:57am

Well, “backward incompatibility” is admittedly a bit of a strong phrase; technically speaking, we could have both syntaxes and they could work. Let’s look at two hypothetical examples:

for foo in foos if foo > 123:
    ...

for foo in foos if len(bars) > 123 else bars:
    ...

In the first example, “if foo” only works because foo is defined by the loop itself. If we try that in the second example (i.e. if we use if..else with the variable defined in the loop), we would get a syntax error.

That being said, from a human perspective, the two syntaxes are similar enough that I would expect them to lead to subtle issues (some of which could be caught by linters, but not all).
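To make the contrast concrete, here is a sketch of what is valid today. All function names and sample values are assumptions for illustration. The second example already compiles: its condition is evaluated once, to choose which iterable to loop over. The filter reading of the first example has to be spelled as a comprehension or generator expression. In today's Python, using the loop variable in the if..else condition fails at run time rather than at compile time, because the variable is not yet bound when the iterable is chosen.

```python
def iterate(foos, bars):
    """Valid today: the condition picks the iterable once, before looping."""
    seen = []
    for foo in foos if len(bars) > 123 else bars:
        seen.append(foo)
    return seen


def iterate_filtered(foos):
    """Today's spelling of the filter reading of the first example."""
    return [foo for foo in foos if foo > 123]


def loop_variable_in_condition(foos, bars):
    """The condition reads foo before the loop has bound it."""
    for foo in foos if foo > 123 else bars:
        pass


print(iterate([1, 200], ["a", "b"]))    # ['a', 'b']
print(iterate_filtered([1, 200, 300]))  # [200, 300]

error_name = None
try:
    loop_variable_in_condition([1, 200], ["a", "b"])
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    error_name = type(exc).__name__
print(error_name)
```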

mimre25 · June 6, 2023, 1:41pm

Ah I see what you mean.

I haven’t thought about this but that’s a really big concern.

Just to verify that I understand you completely:
The first example is equivalent to

for foo in (x for x in foos if x > 123):
    ...

and the second example is equivalent to:

tmp = foos if len(bars) > 123 else bars
for foo in tmp:
    ...

I can see how these things can be quite confusing, given that it took me a minute to realize the difference in your example. Without proper style guidelines there will probably be too many issues stemming from this.

What a shame