Breaking/continuing out of multiple loops

Not the first time? Then maybe you can show us another example.

Multi-level break is NOT A GOAL in itself. It is a means to an end.

I don’t, because “the loop flow problem” is not one of my projects. It’s an implementation detail. So “how would you have coded this?” is a much more appropriate question, and a reasonable response is “use nested generators” as Bryan proposed. (Although his example is far from complete; as stated, the code given shows “in very broad strokes” what could be done. You would probably end up with 3-4 different generator functions.)

OK? So then the first generator gets to interrogate the data structure built up by the filter generator, and can decide to skip calling the expensive process_game_stats when that’s appropriate. That’s one conditional with one (regular) continue.

def generate_stats(leagues: Iterable[League]) -> Iterator[GameStat]:
    for league in leagues:
        for player in league:
            for season in player.seasons:
                for game_stats in season:
                    if should_skip(league, player, season, game_stats):
                        continue
                    yield process_game_stats(game_stats)

def should_skip(league, player, season, game_stats) -> bool:
    # make a decision based on current inputs, and the previous
    # stats and data collected by earlier filter iterations

def filter_stats(stats):
    for stat in stats:
        # save whatever metadata you need about this to inform `generate_stats`
        if is_a_keeper:
            yield stat

So now generate_stats may have to “empty loop” over a bunch of cases where you skip, that’s not a problem for sure:

In [1]: %timeit for _ in range(100_000_000): pass
1.81 s ± 76.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Is this a little bit more complicated due to the “feedback” between the generators? Sure, but at least the points of contact between the parts are well-defined and amenable to being tested individually.

Here even is a complete demonstration example. [1] The code below generates between one and four million multiply-nested “gamestats”. Then it will chew through all of them but only call the expensive function on a fraction of them (so only take a fraction of the time it would take otherwise):

Processed 673 out of 2247765 game stats

You can tinker with the base sleep value to make it simulate more expensive process_game_stats or with the base probability to make it skip more or less of the game stats.

It’s just a toy that uses a current league id to progressively increase the probability that a future gamestats will be skipped. It’s up to you to implement the real business logic to decide which should actually be skipped. But the point is that there are clear places to put that logic, that aren’t smeared across seven levels of a 50 line loop. You can actually run mypy on this sort of code, and make sure the parts fit together like they are supposed to. In other words: Decoupling the different concerns can be done, and more importantly, should be done (for testing ease, maintainability, and sanity).

from __future__ import annotations

from dataclasses import dataclass
from random import randint, random
from time import sleep
from typing import Iterable, Iterator

# Controls (tinker with these)

_BASE_PROBABILITY = 0.001
_BASE_SLEEP = 0.001

# Types

@dataclass
class GameStats:
    football_bats: int

@dataclass
class Season:
    game_stats: list[GameStats]

@dataclass
class Player:
    seasons: list[Season]

@dataclass
class League:
    id: int
    players: list[Player]

ProcessedStats = tuple[League, Player, Season, GameStats]

# Fake data

data: list[League] = []

TOTAL_STATS = 0

for i in range(1, 11):  # leagues
    league = League(i, [])
    for j in range(200):  # players
        player = Player([])
        for k in range(randint(10, 20)):  # seasons
            season = Season([])
            for l in range(randint(50, 100)):  # game_stats
                season.game_stats.append(GameStats(randint(2, 10)))
                TOTAL_STATS += 1
            player.seasons.append(season)
        league.players.append(player)
    data.append(league)

# Processing

FILTER_DATA = _BASE_PROBABILITY

# simulate expensive function
def process_game_stats(
    league: League, player: Player, season: Season, game_stats: GameStats
) -> ProcessedStats:
    sleep(game_stats.football_bats * _BASE_SLEEP)
    return league, player, season, game_stats

def generate_stats(leagues: Iterable[League]) -> Iterator[ProcessedStats]:
    for league in leagues:
        for player in league.players:
            for season in player.seasons:
                for game_stats in season.game_stats:
                    if should_skip(league, player, season, game_stats):
                        continue
                    yield process_game_stats(league, player, season, game_stats)

def should_skip(
    league: League, player: Player, season: Season, game_stats: GameStats
) -> bool:
    # business logic that skips based on collected data goes here
    return random() > FILTER_DATA

def filter_stats(stats: Iterable[ProcessedStats]) -> Iterator[ProcessedStats]:
    # real data for the business logic is built up here
    global FILTER_DATA
    for stat in stats:
        FILTER_DATA = _BASE_PROBABILITY / stat[0].id
        yield stat

processed = filter_stats(generate_stats(data)) # lazy evaluation pipeline

final = []
for i, stat in enumerate(processed):
    if i % 10 == 0:
        print(f"{i} processed, current league {stat[0].id}, P={FILTER_DATA}")
    final.append(stat)
print(f"Processed {len(final)} out of {TOTAL_STATS} game stats")

  1. I guess I needed a distraction this afternoon. ↩︎

Actually it is a damn good reason. That’s why we don’t have goto.

Languages are not random collections of features. They are designed, or at least the good ones are, with a certain style of programming in mind, and that includes:

  • encouraging people to write in a certain style;

  • while also discouraging other styles.

Big monolithic blocks of code with many nested loops are worth discouraging. Only if such code is clearly better than the refactorings would we want to encourage that style by adding multilevel break.

It’s not enough for us to be neutral towards multilevel break to add it. If it had been put into the language 30 years ago, then even if we thought it was fairly pointless but not actively harmful, we would keep it.

But it wasn’t added back then. In order to justify adding it now, its not enough to say that it is harmless, or that it might occasionally be useful, or no worse than the refactored solutions. It has to be clearly better than the refactored solutions, and not just in made-up fake code, but in real code.

Other languages may have different values, and prefer to encourage and discourage different things. That’s fine too.

3 Likes

Alas, I fear that is the problem: your code is too complex, which is why you need ad hoc gotos to jump around the loops to make it work.

Of course these aren’t the bad sort of unrestricted goto and you aren’t jumping into a loop, only jumping out (break or continue), but chances are good that simplifying or reorganising your design will eliminate the need for this.

Haven’t we already covered this?

  1. Redesign your algorithm so that the loop flow problem goes away.

  2. Use exceptions.

  3. Use functions with return to perform a multilevel break.

  4. Use flags.

Are there any real-life examples of nice looking, not painfully complex code, which aren’t successfully dealt with by a combination of those four?

@bryevdv I appreciate all your effort but, unfortunately, your solution doesn’t work. All of the slow functions without exception happen before I know whether the player, table or whatever has enough data to be useful. They are, in fact, what determines this. Each player has 10-100 HTML pages so I would need to do os.listdir() about 100 thousand times and open, parse and process over 5-10 million HTML pages in total if I don’t skip a bunch of them. All I do after I know whether I want to keep the information or not is a little bit of extra processing and saving.

Do you have any other proposed solutions for my specific use case without trying to change the algorithm, which, unfortunately, really cannot be changed in this case?

Big monolithic blocks of code with many nested loops are worth discouraging.

I agree, which is why this notation is so much better than any of the existing workarounds. It adds zero lines of code and only a few characters and requires no restructuring whatsoever. Literally none of this is true about any of the alternative workarounds.

Are there any real-life examples of nice looking, not painfully complex code, which aren’t successfully dealt with by a combination of those four?

The code I shared is not painfully complex. It’s very, very simple code. I’m sure you understood my needs immediately, namely:

I have literally multiple millions of HTML pages that I need to parse and process. Depending on how much missing data there is at any given level, I need to skip the corresponding level the missing information is at: table, game, season, player, league. Can you accomplish EXACTLY what I am trying to accomplish with the existing workarounds in a more efficient way? If so, I am all ears.

I already used flags in my actual “solution” and it was just absolutely awful. Far more complicated, far more lines of code. I would love to see how you would accomplish the exact thing I’m trying to accomplish using one of the other workarounds you propose in fewer lines of code and or in a clearer manner.

If that is true, then it is literally nowhere evident in the sample you provided, and in fact I don’t see how this discussion is relevant to literally anything since: the only unknown thing in that code is process_game_stats so if that’s not what is slow, then your issues must be before the loop entirely (or else you have not given us a real representation of your actual problem).

Do you have any other proposed solutions for my specific use case without trying to change the algorithm, which, unfortunately, really cannot be changed in this case?

No, and I reject the claim that it cannot be improved.

If you think so, then I would love to see how you you would solve the EXACT problem I am describing:

You are looping through information at four different levels: league, player, season, game. Let’s say you have something like 40 leagues, each of which has between 800 and 7000 players. Each player has between a few hundred to around 1500 games. Each game has a few dozen tables spread out in several files.

You must extract all data, determine which of these leagues, players, etc. are usable based on missing information and save it all. To make it clear: if too many game stats are missing, you must skip the season. If too many season stats are missing, you must skip the player. If too many player stats are missing, you must skip the league.

How would you achieve this multi-level skipping? I already know flags is a far worse solution than the syntactic sugar I am proposing.

The code you showed doesn’t match the algorithm you described, so now I don’t know what I am being expected to do here.

1 Like

What you are really expected to do is to create a four-level (or more) nested loop where you can continue and/or break any loop from within any of its nested loops as needed. Once you do this, compare your solution with the proposed notation.

I simply gave you my own real-life example where the functionality that I describe above is actually useful in practice. Such cases might not be the most common in the world but this proves that they DO exist in practice. In fact, if they didn’t, no language would ever have implemented this functionality

And you’re back to assuming that multi-level break is the necessity, from which it logically follows that multi-level break is indeed the best solution.

I’m done arguing. You’re not listening.

1 Like

I wrote the code above with that use case explicitly in mind, and even left comments about where you need to supply your own business logic rules to apply filtering as the lazy pipeline incrementally progressed. Look, the actual chances of this feature ever making into Python are about zero. So, I assumed the conversation behind the conversation was really about ways to improve your code, given that you think the feature is what you need. But you seem personally invested in your current approach, and it is clear no amount of coaching or advice is going to sway you to look at better alternatives. Good luck!

2 Likes

This discussion has run its course. Continuing this is not a productive use of anyone’s time. Please be considerate of maintainer time and perspective when opening a discussion in the Ideas category.

7 Likes