itertools.takewhile, but in a list comprehension

Would it make any sense to allow while in a list (or generator, more importantly) comprehension? For example, you can currently filter an entire list or generator like:

[item for item in iterable if item < 10]

But suppose I wanted to do the equivalent of itertools.takewhile, that is consume elements until some condition is met. I feel like it’s succinctly summed up by:

[item for item in iterable while item < 10]

without having to import this function. Given that this function exists already, this idea is probably already a low priority, but I’d be interested to see what people think.
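For reference, a minimal sketch of what the filter and the existing itertools.takewhile spelling each produce. The sample data is made up for illustration:

```python
from itertools import takewhile

iterable = [1, 5, 12, 3, 20, 7]  # sample data, chosen for illustration

# Filtering inspects every element and keeps the ones that pass.
filtered = [item for item in iterable if item < 10]

# takewhile stops at the first element that fails the test.
taken = list(takewhile(lambda item: item < 10, iterable))

print(filtered)  # [1, 5, 3, 7]
print(taken)     # [1, 5]
```

The proposed while clause would behave like the second form, not the first.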

2 Likes

I don’t think it’s worth the cognitive overhead. You would have to start reading every generator expression statement to see if someone wrote if or while. Right now you only have to care if something is for or if after that initial for.

I also don’t think this pattern comes up often enough to warrant the addition. Generator expressions are already a productivity optimization and don’t open up new doors of possibilities, so in this sort of instance I would say just construct the list manually.

5 Likes

I’ve certainly had use for this pattern recently, resorting to using a generator nested inside a function. The for-while comprehension made immediate sense to me when I read it above.
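Roughly the kind of workaround I mean, as a sketch only. The helper name items_while_below is hypothetical, not something from the standard library:

```python
from itertools import count

def items_while_below(iterable, limit):
    """Yield items until one reaches `limit`, then stop (hypothetical helper)."""
    for item in iterable:
        if item >= limit:
            break
        yield item

# Works on an infinite iterator because the loop stops at the first failure.
squares = (n * n for n in count())
result = list(items_while_below(squares, 50))
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49]
```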

2 Likes

Yeah, that’s fair. I’m biased about the cognitive overhead part, because of course what I wrote makes sense to me hah. But I can’t deny that it would only be convenient for a small percentage of folks.

I don’t mind the concept, but I don’t like how the exact suggestion breaks the duality between comprehensions and their unravelled form. The comprehensions are supposed to be like compacted for loops.

[item for item in iterable if item < 10]
# <==>
for item in iterable:
	if item < 10:
		yield item

But comprehensions don’t easily unravel anymore with the suggested change.

[item for item in iterable while item < 10]
# <==>
for item in iterable:
	while item < 10:  # ??
		yield item

In some world, even this would make more sense.

[item for item in iterable if item >= 10 break]
# <==>
for item in iterable:
	if item >= 10:
		break
	yield item

4 Likes

Call me a heretic if you will, but the cleanest way to maintain the duality would be to change for loops rather than comprehensions.

[item for item in iterable while item < 10]
# <==>
for item in iterable while item < 10:
	yield item

It’s a bit of an unusual use-case though, so I don’t expect anyone to like the idea. Still, it does let you maintain the duality…

1 Like

Would it make any sense to allow while in a list (or generator, more
importantly) comprehension? For example, you can currently filter an
entire list or generator like:

[item for item in iterable if item < 10]

But suppose I wanted to do the equivalent of itertools.takewhile, that is consume elements until some condition is met. I feel like it’s succinctly summed up by:

[item for item in iterable while item < 10]

without having to import this function. Given that this function exists already, this idea is probably already a low priority, but I'd be interested to see what people think.

I tried this, but it doesn’t work in current Python:

 >>> def until(x):
 ...   if x > 10: raise StopIteration
 ...   return True
 ...
 >>> [ a for a in range(20) if until(a) ]
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "<stdin>", line 1, in <listcomp>
   File "<stdin>", line 2, in until
 StopIteration
 >>>

Alas,
Cameron Simpson cs@cskk.id.au

It doesn’t, and variants that mess with the iteration in other ways (for example, raising StopIteration inside a generator expression) will end up raising RuntimeError instead. IMO this is a good thing; in an unrolled loop, a function call can’t interrupt the loop, and it would be very surprising if that happened in a comprehension:

def breaker():
    """Calling this function is like having a break statement"""

for a in range(20):
    if a > 10: breaker()

Would you like it if there were code that could be placed in breaker() that would behave as described? It’d be pretty confusing.
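To make the StopIteration vs RuntimeError distinction concrete, here is a small sketch, assuming Python 3.7 or later (where PEP 479 is always in effect). The outcome helper is hypothetical and only reports which exception type escapes:

```python
def until(x):
    # Same helper as in the session above.
    if x > 10:
        raise StopIteration
    return True

def outcome(build):
    """Run build() and report which exception type escapes, if any."""
    try:
        build()
    except Exception as exc:
        return type(exc).__name__
    return "no exception"

# A list comprehension lets the StopIteration escape unchanged.
list_comp = outcome(lambda: [a for a in range(20) if until(a)])

# Inside a generator expression, the same exception is converted.
gen_expr = outcome(lambda: list(a for a in range(20) if until(a)))

print(list_comp)  # StopIteration
print(gen_expr)   # RuntimeError
```

Either way, the exception never quietly ends the loop the way a break would.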

So there are basically two approaches available for interrupting a list comprehension part way: changing the iteration, and enabling the break statement in some way. Given the way that a comprehension is stacked, I’m not enthusiastic about the second option, but if someone comes up with a really good syntax for it, my opinion might change. Not holding my breath.

Changing the iteration is what we currently have with takewhile. That’s why I think it makes more sense, if syntax were to be added, to make it a variation of the for loop. Consider, also:

stuff = [a for a in things if cond(a) for _ in range(a)]
stuff = [a for a in things if cond(a) while looper(a)]

In the first example, we definitely expect that the condition is checked once, and the inner loop is entirely inside that check. So what would it mean to have a while loop inside that? I’m not sure it makes the right sort of sense. OTOH, if a for loop can have a condition attached to it, this would simply not be valid, as the only loop is the for.

stuff = []
for a in things while cond(a):
    stuff.append(a)
# or #
stuff = [a for a in things while cond(a)]
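To show why the combined if/while form is ambiguous, here is a sketch of two plausible readings written with today's takewhile. The data and the bodies of cond and looper are invented for the example:

```python
from itertools import takewhile

things = [1, 2, 15, 3, 4]  # sample data

def cond(a):    # hypothetical filter: keep even numbers
    return a % 2 == 0

def looper(a):  # hypothetical loop condition: keep going below 10
    return a < 10

# Reading 1: the condition belongs to the for loop; the filter runs inside it.
loop_first = [a for a in takewhile(looper, things) if cond(a)]

# Reading 2: the filter runs first; looper only sees items that passed it.
filter_first = list(takewhile(looper, (a for a in things if cond(a))))

print(loop_first)    # [2]
print(filter_first)  # [2, 4]
```

The two disagree as soon as an item fails both tests (15 here), which is the sort of case the syntax would have to pin down.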

REXX has this kind of concept. A REXX loop is always introduced with the DO keyword, and then it can have any number of clauses after that:

  • n
  • var = initial [to limit] [by increment] [for count]
  • while cond
  • until cond

Your classic “count by numbers” loop is do i = 1 to 10, equivalent to for i in range(1, 11) (REXX is double-inclusive with its range). Saying do i = 1 to 10 by 3 is like for i in range(1, 11, 3). And do i = 1 for 7 will do seven loop iterations and then stop, kinda like using islice on the iterator.

But the concept I want to focus on is do i = 1 to 10 while cond(). It’ll loop, just like any other, but check the condition each iteration. Once the function returns false, the loop will stop.
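For anyone who doesn't read REXX, a rough Python rendering of those loops. This is only a sketch: the cond function is hypothetical, and unlike the REXX cond() it takes the loop variable as an argument, because that is how takewhile passes it:

```python
from itertools import count, islice, takewhile

one_to_ten = list(range(1, 11))          # do i = 1 to 10
by_three = list(range(1, 11, 3))         # do i = 1 to 10 by 3
seven_times = list(islice(count(1), 7))  # do i = 1 for 7

def cond(i):  # hypothetical condition
    return i * i < 30

# do i = 1 to 10 while cond()
while_cond = list(takewhile(cond, range(1, 11)))

print(by_three)     # [1, 4, 7, 10]
print(seven_times)  # [1, 2, 3, 4, 5, 6, 7]
print(while_cond)   # [1, 2, 3, 4, 5]
```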

REXX doesn’t have the idea of “iterate over this collection”, so Python definitely wins there, but IMO it’s worth considering the possibility of adding conditions to the loop itself.

(For completeness: do 10 means “iterate ten times”, and is broadly equivalent to for _ in range(10); and do until cond() is slightly different from a negation of a while loop in that it’s checked at the end of the loop rather than the start - like C’s do-while loop. No direct Python equivalent. There’s also do forever which can’t be combined with other clauses, because a simple do ... end will run once - it’s REXX’s equivalent of an indented block in Python, or a pair of braces in C.)
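The usual Python idiom for an end-of-loop test is while True with a break as the last step. A minimal sketch, with an invented condition:

```python
attempts = []
n = 0
while True:
    n += 1
    attempts.append(n)  # the loop body always runs before the test
    if n * n >= 20:     # the "until" condition, checked at the end
        break

print(attempts)  # [1, 2, 3, 4, 5]
```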

Yes, absolutely it makes sense! Clojure has exactly that feature.

It’s a frequently asked question too, e.g. on Stack Overflow I’ve found at least four questions about it.

It’s been discussed various times on, e.g., the Python-Ideas mailing list. Here’s one example that I stumbled across while searching for an unrelated matter.

Some objections:

  • “Use takewhile, a while-loop, or a for-loop with a break.”

Sure, we can do that, but that reasoning equally applies to regular list comprehensions as well. We added comprehensions as a more readable, easier to use, alternative to imperative style for-loops and functional style map() and filter(). The same applies here: the only difference is that takewhile loops are a bit less common than filter.

  • “Comprehensions have a correspondence to nested for- and if-statements, and this would break the correspondence.”

No, it doesn’t break the correspondence, it merely modifies it to include a term that corresponds to something spelled differently.

A comprehension like [expr for x in seq if cond] maps neatly to nested for- and if-blocks:

accumulator = []
for x in seq:
    if cond:
        accumulator.append(expr)

except that the whole thing is buried inside a hidden function. Changing the if to a while would be just a small modification:

# [expr for x in seq while cond]
accumulator = []
for x in seq:
    # NOT "while cond"
    if not cond: break
    accumulator.append(expr)

We still have a correspondence, with a change of spelling of “while” to “if not … break”.

That’s okay. We also have for...else and while...else where the else clause has no connection to the if...else version, so having two meanings of “while” is no big deal.
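As a sanity check on that correspondence, here is a sketch that hand-expands a hypothetical [x * x for x in seq while x < 10] and compares it with the takewhile spelling available today. The function name and the data are made up:

```python
from itertools import takewhile

def while_comprehension(seq):
    """Hand-expanded form of the proposed [x * x for x in seq while x < 10]."""
    accumulator = []
    for x in seq:
        if not x < 10:  # "while cond" spelled as "if not cond: break"
            break
        accumulator.append(x * x)
    return accumulator

seq = [3, 7, 11, 2]
expanded = while_comprehension(seq)
with_takewhile = [x * x for x in takewhile(lambda x: x < 10, seq)]

print(expanded)        # [9, 49]
print(with_takewhile)  # [9, 49]
```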

3 Likes

Something similar was proposed some time ago in PEP 3142 – Add a “while” clause to generator expressions, and it was rejected by Guido:

I didn’t know there was a PEP for that. I hereby reject it. No point wasting more time on it.

oh cool, I didn’t realize this. and oops! sorry to rehash this again

Rejected PEPs can be revisited, and this one is quite old. It’s worth reading over the document to get an idea of the problems facing the proposal.

1 Like