Add optional `if break:` suite to `for/while`

dg-pb · January 14, 2025, 6:04pm

In your example above:

loop_i.broken # True
loop_j.broken # False
loop_k.broken # False

loop_i.exhausted # False
loop_j.exhausted # False
loop_k.exhausted # False

So exhausted/finished/ended is more general, while broken is just one special case of how a loop can fail to reach exhaustion.

Another possibility is: “loop did not exhaust because of return”

With “this”, one more case becomes possible: “loop did not exhaust because some outer loop was broken”.

While broken should in practice be sufficient most of the time (some case analysis would be useful here, of course), the combination of the two above should cover a large portion of the more complex cases as well.

hprodh · January 15, 2025, 9:22am

Yes, clear enough (though I think ended is shorter thus better).
→ I think we can assume there is no use in storing information about whether the loop was exited by returning or by raising an exception.

I actually had another use case for the running flag, but I failed to explain it clearly before: we might have a function that can be called from inside or outside the loop and that needs to know which is the case. broken and ended cannot convey this information, so running should be True inside the loop and False outside.

hprodh · January 17, 2025, 2:47pm

One last thing maybe worth thinking about :
As named loops are objects, we might have usage possibilities to set “callback functions” (or something like overriding internal methods). I can think about several callback placeholders, to which a function can be assigned, so that the function is executed on different conditions.
For example :
on_break, on_continue, on_end and on_except, called when the loop is broken, continued, exhausted, or raises an exception; on_next and on_next_end, called at the beginning and end of each iteration.

Questions remain about this: Is it worth it? Would it create footguns or debugging hell (recursion hell or similar)? And how would the syntax be defined? Maybe something like:

for i in range(N) as loop:
    loop.set_callbacks(on_next=func1, on_end=func2)
    ...

With a dedicated class, a one-liner could be (similarly to a case above):

for i in range(N) as (loop := ForLoop(on_next=func1, on_end=func2)):
    ...

But maybe, given the fact a user can already do something almost equivalent as in the following quote, this is not worth it.

dg-pb:

class LoopManager:
    def __init__(self, it):
        self.it = iter(it)
        self.interrupted = True

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.it)
        except StopIteration:
            self.interrupted = False
            raise

dg-pb · January 18, 2025, 12:02am

Functionality above is well contained in iterator space and there is no need to mix it with this. I.e. more_itertools.side_effect can already be used as one-liner for this:

for i in side_effect(func1, range(N), after=func2):
    ...

dg-pb · January 18, 2025, 12:51am

Or another way (one more line, but a bit more readable to me):

with IterManager(range(N), **callbacks_here) as it:
    it.add_more_callbacks_here(...)
    for i in it:
        pass
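
For illustration, a minimal sketch of what such an IterManager could look like in today's Python. The class body, the callback names on_next and on_end, and the add_callbacks method are assumptions made for this example; nothing like it exists in the standard library.

```python
class IterManager:
    """Hypothetical iterator wrapper that fires user callbacks."""

    def __init__(self, iterable, on_next=None, on_end=None):
        self._it = iter(iterable)
        self._callbacks = {"on_next": [], "on_end": []}
        self.add_callbacks(on_next=on_next, on_end=on_end)

    def add_callbacks(self, on_next=None, on_end=None):
        # Register further callbacks after construction.
        if on_next is not None:
            self._callbacks["on_next"].append(on_next)
        if on_end is not None:
            self._callbacks["on_end"].append(on_end)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = next(self._it)
        except StopIteration:
            # Underlying iterator is exhausted: fire on_end, then re-raise.
            for cb in self._callbacks["on_end"]:
                cb()
            raise
        for cb in self._callbacks["on_next"]:
            cb(item)
        return item


seen = []
with IterManager(range(3), on_next=seen.append) as it:
    it.add_callbacks(on_end=lambda: seen.append("end"))
    for i in it:
        pass
print(seen)  # [0, 1, 2, 'end']
```

Note that on_end only fires on exhaustion here; a break leaves it uncalled, which is exactly the distinction the thread is about.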

hprodh · January 22, 2025, 9:34am

→ Ok, that’s right, such complexities should remain on the user-side.

Also, I realized that running does not have a real use case (unless the loop is asynchronous, but that should be handled by dedicated methods).

So for now, we validated that named loops should provide the selective break, selective continue, and the public attributes broken, exhausted, n_iter.
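
As a rough illustration, two of these attributes can already be emulated with a small wrapper. The class name TrackedLoop is hypothetical; broken is left out because it cannot be set automatically without new syntax.

```python
class TrackedLoop:
    """Hypothetical wrapper exposing `exhausted` and `n_iter`."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.exhausted = False
        self.n_iter = 0  # number of items handed to the loop body

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = next(self._it)
        except StopIteration:
            self.exhausted = True
            raise
        self.n_iter += 1
        return item


for i in (loop := TrackedLoop(range(10))):
    if i == 3:
        break
print(loop.n_iter, loop.exhausted)  # 4 False

for i in (full := TrackedLoop(range(10))):
    pass
print(full.n_iter, full.exhausted)  # 10 True
```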

dg-pb · January 22, 2025, 11:49am

I think this can also be left out as it can be done independently:

it = more_itertools.countable(range(10))
for i in it:
    ...
print(it.items_seen)    # 10

hprodh · January 22, 2025, 12:35pm

Of course, n_iter is not mandatory, but it provides convenience.

lukjak · January 27, 2025, 7:07pm

Hi,

Let me play advocatus diaboli once more: is a limited try/except/else/finally (with the prospect of being married with match), with one less level of indentation and presumably better performance, really needed in the language?

I use for/else and while/else routinely, but I really cannot recall any situation where I would benefit from the ability to execute any common “breaks” code - maybe it’s just me and/or maybe the alternatives are unintrusive enough for me not to notice.

Also, all these if break, if_break or break look horrible in my eyes - for the reasons already stated above. interrupt is the nicest, there’s raise/except and it could be break/interrupt.

for x in y as z looks very smooth but x and z being different things would be confusing - as is used for aliasing, not creating a new value (and it reads so). Aren’t generally these loop control objects just more complicated than the problem they are trying to solve?

hprodh · January 29, 2025, 2:43pm

I myself almost never use breaks, some continues or return, and I obtain ‘early stopping’ by converting my for loops to while flag_run : .... I manage nested loops with nested flags.
I think I would do it differently if named loops with autoflags existed.
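
For readers unfamiliar with the pattern, a minimal sketch of nested early stopping with flags, as described above. The function name find_pair and the stop condition are invented for the example.

```python
def find_pair(grid, target):
    """Return (row, col) of the first cell equal to target, or None."""
    found = None
    run_outer = True
    r = 0
    while run_outer and r < len(grid):
        run_inner = True
        c = 0
        while run_inner and c < len(grid[r]):
            if grid[r][c] == target:
                found = (r, c)
                run_inner = False  # stop the inner loop
                run_outer = False  # and the outer one, without break
            c += 1
        r += 1
    return found


print(find_pair([[1, 2], [3, 4]], 3))  # (1, 0)
```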

True, some might read it as for x in ( y as z ), but what is meant above is (for x in y ) as z. I understand the inconsistency with other usages of as; maybe an alternative should be found, idk.

dg-pb · January 29, 2025, 3:07pm

I would love that. It would be more performant than any existing methods (more_itertools is extremely poor in this aspect). Also, it would be nice to be able to use it instead of enumerate(iterable) where index is needed for a small fraction of cycles.

Furthermore, if its type was exposed:

# types.py module
for _ in () as loop:
    pass
LoopType = type(loop)

# __main__
from collections import deque
from types import LoopType
it = LoopType(iterator)
deque(it, maxlen=0)
print(it.n_iter)    # Count of consumed items

This would end my never ending quest for obtaining properly efficient “iterator-element-counting” a.k.a. more_itertools.ilen. Ref: Challenge: Quest for counting iterator length at C speed
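
For context, a common pure-Python idiom for counting the items of an iterator while keeping the loop itself in C is to pair the iterable with itertools.count and drain the pairs into a zero-length deque. A sketch of that idiom follows; the name ilen follows more_itertools, but the body is written here for illustration and is not claimed to match the library's current source.

```python
from collections import deque
from itertools import count


def ilen(iterable):
    """Count the items of an iterable, consuming it in the process."""
    counter = count()
    # zip pulls from `iterable` first, so `counter` advances once per item
    # and is not advanced when `iterable` runs out.
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)


print(ilen(x for x in range(1000) if x % 3 == 0))  # 334
```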

hprodh · February 1, 2025, 1:57pm

I frequently have nested loops for outputs vs inputs parametric studies, requiring frequent refactoring (there are numerous ways of doing this, but generally the reordering of one loop requires a modification of at least three lines (init, gather, post)). I would like to be able to simplify the process, and probably overriding or adding a callback to some __iter__ method of the named loop would help… but I think that I am in a niche here and that it is highly complex to generalize that process to more ‘universal’ use cases… idk.
(Btw, my typical pattern below)


model = SomePseudoExperimentalNumericalModel(...)

data = AppenDict()  # suitably tailored class for data gathering
for na in range(Na):
    model.set(a=params_a[na])
    data_a = AppenDict()
    # alternative : data.new_level()
    for nb in range(Nb):
        model.set(b=params_b[nb])

        data_ab = model.compute()  # returns dict

        data_a.gather(data_ab)  # append dict values to lists within appendict
    data.gather(data_a)  # lists of lists (ndim=2) within appendict
    # alternative : data.end_level()
data = data.finalize()  # post-process

hprodh · February 4, 2025, 12:53pm

New considerations about named loops after more reflection :

The for x in y is already a shortcut for for x in iter(y): an iterator is created. Thus I think the named loop should yield ForLoopIterator instances, so that homogeneity of types is respected.

The tqdm module constitutes a good example of what overriding named loop class can offer.
While it is already possible to use simply for x in tqdm(list_x), the update of the tqdm message after each iteration requires three lines :

pbar = tqdm(list_x)
for x in pbar:
    ...
    pbar.set_postfix_str(f"processing {x=}")

The ForLoopIterator class would help reduce one line of this syntax, by making tqdm act as a “decorator” of the class :

for x in tqdm(list_x) as pbar:
    ...
    pbar.set_postfix_str(f"processing {x=}")

Also for cases where some behavior should be adjusted regarding the ‘loop nesting level’ (like my data gathering use case, possibly many others…), one additional public attribute might be of interest : parent_loop. This would allow user-made decorators to retrieve the full nesting scheme of the decorated loop.
(The loop can only have one ‘direct’ parent loop, but can have chained ones, retrievable with loop.parent_loop.parent_loop... This attribute should point to a ForLoopIterator instance, a WhileLoopIterator instance, or an instance of any class inheriting from one of them, so that every possible case of parent loop retrieval is covered.)
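
A rough sketch of how such a parent_loop attribute could be emulated today with an iterable wrapper and a shared stack of active loops. The NestedLoop class and its depth property are assumptions for illustration only. The sketch is not thread-safe, and after a break it relies on the generator being closed promptly (as CPython does) to pop the stack.

```python
class NestedLoop:
    """Hypothetical wrapper that records the enclosing NestedLoop."""

    _stack = []  # currently active loops, innermost last

    def __init__(self, iterable):
        self.iterable = iterable
        self.parent_loop = None

    def __iter__(self):
        # Runs when the for statement starts pulling items.
        stack = NestedLoop._stack
        self.parent_loop = stack[-1] if stack else None
        stack.append(self)
        try:
            yield from self.iterable
        finally:
            stack.remove(self)

    @property
    def depth(self):
        # Number of enclosing NestedLoop instances.
        d, loop = 0, self.parent_loop
        while loop is not None:
            d += 1
            loop = loop.parent_loop
        return d


levels = []
for a in (outer := NestedLoop("ab")):
    for b in (inner := NestedLoop(range(2))):
        levels.append((a, b, inner.depth, inner.parent_loop is outer))
print(levels[0])    # ('a', 0, 1, True)
print(outer.depth)  # 0
```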

pf_moore · February 4, 2025, 2:09pm

Why not just write it as

for x in (pbar := tqdm(list_x)):
    ...
    pbar.set_postfix_str(f"processing {x=}")

I’ve not been following the rest of the discussion, but this specific point is perfectly manageable with what we already have.

dg-pb · February 4, 2025, 2:38pm

I think for the purpose of clarity this can be broken into 3 steps:

1. break loop

So really this is the only absolutely necessary bit.
If break is called with loop, then it sets loop.broken=True (or maybe calls loop.__break__() or whatever). With this, one can already implement their own LoopManager with all of the features that have been mentioned in this thread, such as callbacks, counters, etc.:

class LoopManager:
    broken = False
    # Can implement whatever

for el in (loop := LoopManager(it, ...)):
    break loop
if loop.broken:
    do_stuff()

2. Implement a default LoopManager in CPython, which is an object optimized specifically for this purpose. 3 features:
a) broken attribute (necessary)
b) exhausted attribute (very useful)
c) Iteration counter (very convenient and the only performant implementation in the library)

for el in (loop := types.LoopManager(it)):
    break loop
if loop.broken:
    do_stuff()
print(loop.n_iter)    # 1
print(loop.exhausted) # False

3. Convenient syntax to automatically wrap the provided Iterable in types.LoopManager.

for i in range(10) loop:
    ...
for i in range(10) as loop:
    ...
for i in loop := range(10):
    ...

Not sure about the syntax, but if this is desirable, I think it is possible to come up with something reasonable.
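
Until something like this exists, the effect of break loop on an outer loop can be approximated with an exception and a context manager. The sketch below is only an emulation under stated assumptions: BreakLoop, the break_ method and the with wrapper are invented for the example, and it costs one extra indentation level per loop.

```python
class BreakLoop(Exception):
    """Raised to leave the loop owned by `loop` (hypothetical helper)."""

    def __init__(self, loop):
        super().__init__()
        self.loop = loop


class LoopManager:
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.broken = False
        self.exhausted = False
        self.n_iter = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = next(self._it)
        except StopIteration:
            self.exhausted = True
            raise
        self.n_iter += 1
        return item

    def break_(self):
        # Stand-in for the proposed `break loop` statement.
        self.broken = True
        raise BreakLoop(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Swallow only the BreakLoop aimed at this manager.
        return exc_type is BreakLoop and exc.loop is self


with LoopManager(range(3)) as outer:
    for i in outer:
        with LoopManager(range(3)) as inner:
            for j in inner:
                if (i, j) == (1, 1):
                    outer.break_()

print(outer.broken, outer.n_iter, outer.exhausted)  # True 2 False
print(inner.broken, inner.n_iter)                   # False 2
```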

peterc · February 4, 2025, 3:23pm

I think this one looks best, but I don’t know how you would actually pass a message to pbar when you break the loop. (I mean besides doing it manually, which is possible already.)
This also becomes rather orthogonal to the original if break.

hprodh · February 4, 2025, 6:37pm

The subject of named loops is orthogonal to the OP, but it covers both the issues originally addressed and ‘more cases’ (esp. nested loop management). The post of @dg-pb just above summarizes what we’ve been validating previously here as relevant to the ‘more cases’. Also, he forgot one possible instruction: continue loop_i.

Basically, if ForLoopIterator implements some empty __onbreak__ method, that issue is easily covered by overriding (possibly __oncontinue__ might also find some use elsewhere).

Also, tqdm is not informed internally about its current nesting level; it has to be passed as the position kwarg. It is probably already possible to have a factory with the __new__ method to track the nesting level of a loop… Yet I wonder whether a parent_loop attribute would provide worthwhile convenience. → I think this last point completes the investigation on the generalizability of named loops.