Breaking/continuing out of multiple loops

Rosuav · August 29, 2022, 2:27pm

Good reason to do the refactor first, as its own commit, and (if necessary) backport that change to the other version, since it causes no harm.

Gouvernathor · August 29, 2022, 2:32pm

So, on the base version, you would have a single loop in a function by itself, for no other reason than resembling the updated version? I find that even worse, because it makes no sense on its own, so it impedes code readability on the base version. Yes, in my view it does cause harm.
What’s more, you may be doing a pull request from a fork of the repo, and not have access to the master branch, where the refactor of the base version would take place. In that context, even if you refactored in a single commit and did the rest in the subsequent commits, the diff of the whole PR is far less readable than using the multi-break.

Rosuav · August 29, 2022, 2:44pm

This is all incredibly theoretical on your part. Do you have any actual examples of real code that would be improved by this? It’s easy to invent theoretical objections to theoretical alternatives to theoretical problems.

Gouvernathor · August 29, 2022, 3:03pm

Any situation where the lines of a file are iterated through, and where the loop contains a break, is subject to what I’m describing. Typically when implementing a multi-file search, like most IDEs do, or when looking for files with trailing whitespace, whatever. I don’t find that an incredibly theoretical or uncanny situation, and it’s a very simple example.

I find your positions quite contradictory, because on the one hand you’re defending seemingly obvious and better solutions to a problem, and on the other hand you’re saying that problem never occurs outside of theoretical scenarios. So, which is it? Are the try/except way and the single-use function way good and widely used solutions, or is the problem non-existent?

Rosuav · August 29, 2022, 4:37pm

It’s not contradictory. The problem does not exist because there are MANY alternatives, including refactoring, exceptions, etc, etc, etc. You’re trying to say that we need multi-level break, but so far, you haven’t shown any examples that can’t be as well (or better) handled with other techniques.

So if you want to dispute my stance, show an example.

pf_moore · August 29, 2022, 6:11pm

I’m referring to reconsidering the logic - something that’s broader than any form of simplistic, “one size fits all” solution. I agree the try/except approach is not good. I disagree strongly with your idea that naming a block of code (i.e., “single use functions”) is unacceptable - naming a subexpression in a complex calculation is fine, why is naming a block of code in a complex loop any less acceptable? And yes, I’d often refactor by naming the inner loop. Or I might refactor by merging the 2 loops into one using a generator. Or something else. Basically I’d think about the code and what it was trying to do, and as I say, I’d nearly always find a better approach.
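As a rough illustration of the "merge the two loops into one using a generator" idea, here is a minimal sketch. The names `iter_cells` and `find_first_negative` and the grid-of-integers data are assumptions made up for this example, not code from the thread. Once the nesting lives inside the generator, the caller has a single loop and an ordinary `break` ends the whole search.

```python
from collections.abc import Iterable, Iterator


def iter_cells(grid: Iterable[Iterable[int]]) -> Iterator[tuple[int, int, int]]:
    """Flatten the row/column loops into one stream of (row, col, value)."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            yield r, c, value


def find_first_negative(grid: Iterable[Iterable[int]]):
    """Return (row, col) of the first negative value, or None."""
    found = None
    # Only one loop is visible here, so a plain `break` leaves the search.
    for r, c, value in iter_cells(grid):
        if value < 0:
            found = (r, c)
            break
    return found
```

This shape only helps when every level should stop together; it does not cover breaking out of some levels but not others.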

Think of it in terms of “having to break out of multiple loops is a code smell, indicating that you should re-think your approach”.

It does if you simply call the function stuff or something equally passive-aggressively unhelpful. But if you think about the logic, part of that is thinking about how to describe parts of the process, and that usually results in a good name.

Of course not all code warrants that much effort. A lot of my code is quick hacks. But even in that case, I can usually find a better way than “exit from 2 loops”.

I don’t think everyone piling in with their code samples is particularly helpful. The most recent example I had, though, was a “try to fetch a URL 10 times before giving up” loop, inside the body of a function. I wanted to break out if one of the tries returned a 304 Not Modified status. A double-break would have worked. But in reality, stopping and thinking for a moment and factoring out the inner loop into a fetch_url function was far better, named the operation in a way that was more readable, and made the outer loop shorter and hence more readable itself.
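A sketch of what such a `fetch_url` helper might look like, under several assumptions: the `fetch_once` callable (returning a status code and a body) is a hypothetical stand-in for whatever HTTP call is actually used, and returning `None` on 304 is just one possible convention. The point is only that `return` leaves the retry loop directly, and the caller's loop can then decide what to do.

```python
from collections.abc import Callable
from typing import Optional

NOT_MODIFIED = 304
OK = 200


def fetch_url(
    url: str,
    fetch_once: Callable[[str], tuple[int, bytes]],  # hypothetical: one HTTP attempt
    attempts: int = 10,
) -> Optional[bytes]:
    """Try up to `attempts` times; return the body, or None on 304.

    Raises RuntimeError if no attempt succeeds.
    """
    for _ in range(attempts):
        try:
            status, body = fetch_once(url)
        except OSError:
            continue  # transient failure: try again
        if status == NOT_MODIFIED:
            return None  # leaves the retry loop and the function at once
        if status == OK:
            return body
        # any other status: fall through and retry
    raise RuntimeError(f"giving up on {url} after {attempts} attempts")
```

The outer loop then reads as `body = fetch_url(url, fetch_once)` followed by a check for `None`, with no second loop level in sight.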

OK, so you’ve been lucky enough to not have had my experience. How many real life examples of code using labelled breaks have you seen (obviously they wouldn’t have been in Python, as there’s no labelled break in Python)? Were they all easy to read, and easy to maintain? How big (in terms of numbers of lines) were the loops? My experience suggests that the feature is not a particular problem for short loops (less than 10 lines, like most constructed examples) but scales badly to loops that are tens or hundreds of lines long^[1].

However, I’ll also note that this is irrelevant (see your comment above about getting back on topic). Python doesn’t have labelled breaks (or any kind of multiple breaks). The debate isn’t about how they might be problematic, it’s about what value they have that justifies adding them to the language. The onus is very much on you (and anyone else in favour of the proposal) to argue why they benefit users, in the first instance. If we get to the point where there’s a clear understanding of the benefits, then (and only then) is there a point to discussing whether the (perceived) problems are important enough to outweigh the benefits.

So please, can we stick to the point here. What are the benefits of this syntax? Clearly it lets you do something that at the moment can’t be done without some form of refactoring of the code. But how important is that? How much real world code (not theoretical examples, or code written to demonstrate a point) would be improved by this feature? Even if I concede (which I don’t - but as I say that’s not the point yet) that multi-level breaks are perfectly fine for readability and maintainability, you still need to demonstrate that there’s a need, and it’s not just a neat idea that no-one actually needs. Otherwise the status quo wins.

[1] Yes, I know loops that long are unmaintainable anyway. That’s why having some pressure to refactor them helps…

pf_moore · August 29, 2022, 6:18pm

There’s always fileinput for that. That’s a perfect example of factoring out the double loop into an iterator.
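For the "files with trailing whitespace" case, a minimal sketch with the standard library's `fileinput` could look like the following. The function name `files_with_trailing_whitespace` is an assumption for this example. `fileinput.input()` presents all lines of all files as one loop, and `nextfile()` skips the rest of the current file, which plays the role of "break the inner loop, continue the outer one".

```python
import fileinput


def files_with_trailing_whitespace(paths: list[str]) -> list[str]:
    """Return the paths containing at least one line that ends in spaces or tabs."""
    if not paths:
        return []  # with an empty list, fileinput would read from stdin instead
    hits = []
    with fileinput.input(files=paths) as lines:
        for line in lines:
            text = line.rstrip("\r\n")
            if text != text.rstrip(" \t"):
                hits.append(lines.filename())
                # Skip the remaining lines of this file; the loop goes on
                # with the first line of the next file.
                lines.nextfile()
    return hits
```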

Gouvernathor · August 29, 2022, 7:00pm

Sure, but the well-named function still moves a chunk of code from one place to another, which doesn’t help git diff simplicity or overall readability, regardless of its name.

Also, your solution of using a single generator instead of several loops can help in several examples, but not in ones where 1) you need to break some of the loops but not all of them, or 2) you need to do stuff inside the inner loops, yet outside the outer ones.

My point was not that it’s unacceptable, or a bad practice, rather that it being the only solution is a bad thing. If you want to define single-use functions, it’s good that the language allows you to do so (and it can include many more advantages listed above by someone else, such as unit tests), but it should not be the only solution.

Anyway, that was my point, but yes I hear your point about multiple solutions for multiple contexts.
I’m not sure I agree with that, but then again it’s not surprising, since I guess it’s based upon compiling a lot of examples you encountered (also called “experience”), so it would be hard for you to come up with a hard proof anyway.

Sure, makes sense.

Rosuav · August 29, 2022, 7:22pm

It’s not the ONLY solution. Multiple solutions have repeatedly been given. Are there situations where the current half-dozen solutions aren’t good enough, and that having “only” this handful of solutions is a problem? You have given no such indication.

PythonMillionaire · August 30, 2022, 3:46pm

But I wouldn’t consider the needing-to-break-from-nested-loops case as an example of this: we don’t choose to separate it from the rest of the code just to make it more understandable, we’re doing it because the syntax gives us no other choice.

Exactly!!

It’s not the ONLY solution. Multiple solutions have repeatedly been given. Are there situations where the current half-dozen solutions aren’t good enough, and that having “only” this handful of solutions is a problem? You have given no such indication.

They are all ways of getting around the problem, not actual solutions presented by the programming language, which should be the case. You either:

1. need to refactor it into a function, which you would have done already if you wanted to do that for reasons other than achieving this functionality. It also doesn’t provide a way to use multilevel continue and you would need one function for each loop level so that you can return at different points in the loop and not just the outermost like @steven.daprano’s workaround.
2. need to use things that weren’t meant to be used for this purpose such as a try and except block. If the block was already there, this means you will pollute it by adding an exception (or two for break and continue) that’s not really an exception and make your code harder to understand. If it wasn’t, you will push everything one indent forward and make your code harder to understand at a glance because it will seem like you are catching a real exception but aren’t. On top of it all, this workaround also suffers from the same problem as workaround 1, which is that you need one try-except block at each loop level in order to have full control over the flow.
3. need to use flag variables, which also add a bunch of unnecessary lines of code as well as useless variables
4. other horrible solutions that don’t allow for full loop control and/or continue and/or require a bunch of lines of code or refactoring or whatever else

All of these workarounds, which is what they actually are as opposed to actual solutions, have very clear drawbacks. They are forceful and impose redesigns, pollute your code and are not quick to implement or remove.
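For readers who want to compare the shapes being argued about, here is a toy sketch of the flag, exception and function variants side by side. The task (sum each group, but skip any group containing a negative value) and all names, including `SkipGroup`, are assumptions invented for this illustration. It does not settle which variant reads best.

```python
class SkipGroup(Exception):
    """Control-flow signal, not a real error (hypothetical name)."""


def totals_with_flag(groups):
    totals = []
    for group in groups:
        skip = False
        subtotal = 0
        for row in group:
            for value in row:
                if value < 0:
                    skip = True
                    break
                subtotal += value
            if skip:
                break  # the flag has to be re-checked at every level
        if skip:
            continue
        totals.append(subtotal)
    return totals


def totals_with_exception(groups):
    totals = []
    for group in groups:
        try:
            subtotal = 0
            for row in group:
                for value in row:
                    if value < 0:
                        raise SkipGroup
                    subtotal += value
        except SkipGroup:
            continue  # jump straight to the next group
        totals.append(subtotal)
    return totals


def group_total(group):
    """Return the group's sum, or None if any value is negative."""
    subtotal = 0
    for row in group:
        for value in row:
            if value < 0:
                return None  # leaves both inner loops at once
            subtotal += value
    return subtotal


def totals_with_function(groups):
    return [t for g in groups if (t := group_total(g)) is not None]
```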

Named and numbered break/continue have the following advantages:

Fine control. At any loop level, you can fully control the flow of any preceding loop
Allows for both break and continue
Extremely quick to implement and remove from code
Clear, unambiguous and concise. Does not increase the indent level
Is not a makeshift, jury-rigged workaround and does not force you to change your code in any way
Not ONE single line of code is added

The fact that multiple other languages have implemented this functionality proves beyond any doubt that there is demand for it. Even 15 years ago, when Guido originally rejected it, he said it had already been brought up several times. He argued:

… before you know it you have an incredible mess on your hands of unintelligible code.

I think it’s abundantly obvious that the alternatives are what actually lead to messy, unintelligible code. This can be clearly seen because:

They require many more characters and several more lines of code as opposed to a single integer or “as [loop_name]” added to a line that already existed
They raise the indent level, particularly if, for some reason, you need to control the loop flow at more than one loop level
They use Python functionality that wasn’t really meant to be used for such purposes
They force you to refactor your code in ways you didn’t want to
They invariably involve adding more variables, if clauses, etc. making your code more polluted and harder to read

I honestly can’t think of a single real advantage these solutions have over labeled or numbered breaks.

oscarbenjamin · August 30, 2022, 4:28pm

I don’t agree with the premise that “you would have [refactored] already” here. In large codebases with many contributors it can be common that over time the body of a loop will tend to grow as more and more gets added but no one chooses to refactor it until their hand is forced in some way. Perhaps the original decision not to have a single use function was not unreasonable at the time it was made. Later on though as more and more code is added it gets clearer and clearer that it should be refactored but at the same time the more it grows the harder it gets to actually do the refactoring. The reason I dislike the idea of labelled break is probably for precisely the opposite reason that you want to propose the idea: I don’t want to provide more opportunities to postpone what I would consider to be the necessary refactoring (that very often should have been done a long time ago).

I can’t actually picture in my mind real maintainable code where labelled break is significantly better than a reorganisation. When I try to imagine this being used in practice I just picture it being used to extend the kind of spaghetti code that I already wish people didn’t write in the first place. Perhaps I don’t have the imagination to see the kind of situation where this could be used in a good way but I can definitely imagine it being used in a bad way.

Of course if real life examples were provided then we could discuss the pros and cons in those cases without depending on my imagination.

Rosuav · August 30, 2022, 5:45pm

Hold on a moment. You’re starting from the assumption that “I need multi-level break” is the problem/goal. People don’t pay programmers to write multi-level breaks; they pay programmers to solve problems - not problems like “what is love?”, but practical problems. So start with actual code that needs to be written, and THEN figure out what the best way to write it is; don’t start with “we need multi-level break”, and then justify it by saying “if we need multi-level break, then multi-level break is the best way to write multi-level break”.

erlendaasland · August 30, 2022, 6:06pm

Learn proper refactoring and your future self (and everyone else who reads your code, including your coworkers who review it) will thank you. Do as Serhiy: create a function and use it only once.

(This is my first and only post in this thread; I’ll mute it right away.)

Recommended further reading:

Refactoring: Improving the Design of Existing Code (Fowler)
The Practice of Programming (Pike, Kernighan)

BTW, did I mention I’m -1 to this multilevel labelled break proposal?

steven.daprano · August 30, 2022, 6:15pm

No, they are solutions. You just don’t like the solutions.

And the drawbacks are not “very clear”. Python is a 30+ year old mature language, used by thousands or tens of thousands of programmers, with millions of lines of code. If the drawbacks were so obvious, we would have added this feature 20 years ago.

So if these drawbacks exist, they are minor, rare, a matter of opinion, or all three.

Good! Because if you find yourself writing a big nested loop where you need to jump out of multiple levels, you should redesign it.

That is also an argument for unrestricted goto with the ability to jump anywhere. “But someday I might need the fine control of being able to jump into the middle of a function!”

Um, okay, for some definition of “need”.

Let’s get back to this “fine control” to jump out of multiple loops. Do you have actual examples of real code that needs this “fine control”, or is this purely hypothetical?

I’ve written a lot of loops, and I’ve never wanted this. So let’s see some examples of real code that needs this feature.

And the fact that for every language which has included this feature, a dozen languages have not, proves “beyond any doubt” (your words) that the feature is a mistake, no matter how much demand there is. Right? No?

Neither am I that we should copy languages that have this feature. Languages have all sorts of unnecessary bloat and cruft.

Let’s see real-world examples, not made up hypotheticals.

Solutions that work now are better than solutions that rely on you waiting a year or three until you can use it.
Functions, flag variables and exceptions already exist. They need no extra documentation or testing, and no extra code in the interpreter to make it work. So we avoid all the maintenance headache from introducing a new syntactic feature.
Refactoring big, complex, monolithic blocks of code with multiple nested loops into functions will simplify the code and make it easier to understand and easier to maintain.

Do you still disagree? Then show us real-world examples, not made up hypotheticals.

PythonMillionaire · August 30, 2022, 7:23pm

I don’t want to provide more opportunities to postpone what I would consider to be the necessary refactoring

Please excuse me but “I don’t want this feature to be added because otherwise [I, people] won’t have an incentive not to be lazy anymore” is a pretty poor argument.

I can’t actually picture in my mind real maintainable code where labelled break is significantly better than a reorganisation.

They are not mutually exclusive. People can reorganize their code all they want and if they find they still need labeled breaks, they should be there for them.

As I explained, a reorganization does not solve all issues. You can’t have full loop control with both continue and break unless you split it in completely unnatural ways with lots of handle_this_loop and handle_that_loop functions. If you are doing all this just to get some pretty basic functionality, I think it’s pretty clear the language is lacking in some way.

Learn proper refactoring and your future self (and everyone else who reads your code, including your coworkers who review it) will thank you. Do as Serhiy: create a function and use it only once.

Let’s get back to this “fine control” to jump out of multiple loops. Do you have actual examples of real code that needs this “fine control”, or is this purely hypothetical?

So start with actual code that needs to be written, and THEN figure out what the best way to write it

Do you have actual examples of real code that needs this “fine control”, or is this purely hypothetical?

Here is a minimal example based on my real-life problem, which I solved using flags in actuality. How would you refactor the code below? Let’s compare and see which one is cleaner, easier to read and implement. I already know for a fact, beyond all doubt that this looks MUCH better than using flags, let’s see how your solution compares.

for league in all_basketball_leagues as league_loop:
    number_of_incomplete_players = 0
    for player in league as player_loop:
        for season in player.seasons as season_loop:
            number_of_incomplete_game_data = 0
            
            for game_stats in season:
                stats_successfully_processed = process_game_stats(game_stats)
                
                # key table not in game_stats or another problem
                if not stats_successfully_processed:
                    number_of_incomplete_game_data += 1

                    if number_of_incomplete_game_data > 5:
                        number_of_incomplete_players += 1
                        
                        if number_of_incomplete_players > 60:
                            # too much missing data, league unusable
                            continue league_loop

                        else:
                            # too much missing data, season unusable
                            continue season_loop
                         
                    # if key table missing, skip game   
                    else:
                        continue

                # no key tables missing, check game stats
                number_of_missing_values = 0
                for table in game_stats:
                    if not check_table_integrity(table):
                        number_of_missing_values += 1
                        
                        if number_of_missing_values > 5:
                            number_of_incomplete_game_data += 1
                            
                            if number_of_incomplete_game_data > 5:
                                continue season_loop

                        else:
                            continue
                            
                    # table is perfect
                    else:
                        do_stuff()

Rosuav · August 30, 2022, 9:04pm

Your logic is incredibly opaque here. Why does incomplete game data sometimes skip the entire league, but as long as you don’t get too much all at once, you’ll only skip individual seasons? This still feels extremely artificial, but I’ll give you the benefit of the doubt there; still, it does seem like there’s quite a few utterly unrelated things going on here (for instance, there are two places where you increment number_of_incomplete_game_data but they have different loop-skipping behaviour).

This has a strong smell of standard multi-level validity done weirdly with ad-hoc constraints. If this is the best example you can come up with, I would have to say “welp, sometimes code is messy, deal with it”. This code is BEGGING to be refactored in some way, to make the logic clearer; and only you, the one who designed its constraints, can figure out how.

bryevdv · August 30, 2022, 9:05pm

A 50-line loop body and eight levels of indentation (assuming this is inside a function) and this is the good version? Having a multi-break ~~won’t~~ didn’t fix that. All the complexity in that code stems from trying to do ad-hoc relational querying with imperative code, at the same time as pre- and post-processing. You should compute all the game stats up front unconditionally, filter out the ones you don’t want separately after the fact, and only then “do stuff”.

In very broad strokes, this is more like what I would expect that code to look like:

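from __future__ import annotations  # League and GameStat are placeholder type names

from collections.abc import Iterable, Iterator

def compute_all_stats(leagues: Iterable[League]) -> Iterator[GameStat]: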
    # this could probably be improved with itertools
    for league in leagues:
        for player in league:
            for season in player.seasons:
                for game_stats in season:
                    yield process_game_stats(game_stats)

def filter_stats(stats: Iterable[GameStat]) -> list[GameStat]:
    result = []
    # filter out ones you don't want here, maybe even use a real database
    # to help make real queries -- SQLite is built right in to Python
    return result

all_stats = compute_all_stats(all_basketball_leagues)

filtered_stats = filter_stats(all_stats)

for stat in filtered_stats: 
    do_stuff(stat)

It’s fewer lines, much less indentation, and congratulations, now you have some simpler, smaller functions that you can actually unit test in isolation.

PythonMillionaire · August 30, 2022, 9:57pm

Your logic is incredibly opaque here. Why does incomplete game data sometimes skip the entire league, but as long as you don’t get too much all at once, you’ll only skip individual seasons? This still feels extremely artificial, but I’ll give you the benefit of the doubt there; still, it does seem like there’s quite a few utterly unrelated things going on here (for instance, there are two places where you increment number_of_incomplete_game_data but they have different loop-skipping behaviour).

Like I very clearly said, it is based on my real-life example. I changed a lot of things because otherwise it would be too complex but in essence that’s what my problem is. If there is too much missing data of a certain type, I skip the table, player, season or league accordingly.

“welp, sometimes code is messy, deal with it”

If this were about a completely novel idea that had never been implemented in any language before, sure, I would agree that this could be such a case. However, labeled loops are nothing new. They have been implemented in several prominent, respected and successful programming languages without any adverse effects. If this is the case, it must be so for a reason. Maybe in the end it won’t be implemented in Python but the idea is very valid and solves a specific and real problem.

A 50-line loop body and eight levels of indentation (assuming this is inside a function) and this is the good version? Having a multi-break ~~won’t~~ didn’t fix that. All the complexity in that code stems from trying to do ad-hoc relational querying with imperative code, at the same time as pre- and post-processing. You should compute all the game stats up front unconditionally, filter out the ones you don’t want separately after the fact, and only then “do stuff”.
In very broad strokes, this is more like what I would expect that code to look like:

First of all, this is cheating as you did not at all replicate the behavior of the other script, you just changed the algorithm completely. More importantly, processing everything unconditionally is not even close to an acceptable solution in my use case. I am processing a lot of HTML in process_game_stats() and use a good deal of regex. To make matters worse, my files are in a very slow external hard drive in many different folders and so are glacially slow to scan. There are multiple dozen basketball leagues from different countries, some of which have over 5000 players, each of whom has played hundreds of games on average. Some, like LeBron James, have played over 1300. All in all, I am looking at literally millions of scraped HTML pages.

Again, like I said, even if I didn’t have a real use case, your proposed solution is also not a solution at all as you simply changed the whole algorithm in order to avoid replicating the behavior that I showed. How would you do the exact thing that I did, meaning control the flow of the program in that exact way using one of the proposed solutions in this thread or another of your own? In other words, how do you solve the loop flow problem without labeled loops?
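One possible shape for keeping the same early-exit behaviour with the "one function per level" approach suggested elsewhere in the thread is sketched below. It is heavily simplified and rests on assumptions: the per-table checks are omitted, a league is a plain list of players, a player a plain list of seasons, and the function names and threshold parameters are invented for this example. Each `return False` stands in for one of the labelled `continue` statements, and games after the cut-off are never processed.

```python
def season_is_usable(season, process_game, max_bad_games=5):
    """Process games in order; stop as soon as too many have failed."""
    bad_games = 0
    for game in season:
        if not process_game(game):
            bad_games += 1
            if bad_games > max_bad_games:
                return False  # stands in for `continue season_loop`
    return True


def league_is_usable(league, process_game, max_bad_seasons=60, max_bad_games=5):
    """Walk players and seasons; abandon the league when too many seasons fail."""
    bad_seasons = 0
    for player in league:
        for season in player:
            if not season_is_usable(season, process_game, max_bad_games):
                bad_seasons += 1
                if bad_seasons > max_bad_seasons:
                    return False  # stands in for `continue league_loop`
    return True


def usable_leagues(leagues, process_game):
    """Keep only the leagues that were not abandoned."""
    return [league for league in leagues if league_is_usable(league, process_game)]
```

Whether this reads better or worse than the labelled version is exactly what the thread disagrees about; the sketch only shows that the skipping itself does not require processing everything.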

bryevdv · August 30, 2022, 10:09pm

First of all, this is cheating as you did not at all replicate the behavior of the other script, you just changed the algorithm completely.

A suggestion to change how you are doing things was the point, yes.

More importantly, processing everything unconditionally is not even close to an acceptable solution in my use case

So if you look closely, you will see that the compute_all_stats is a generator, which means it evaluates lazily. The filter_stats kind of presumed that everything could be collected in one big pass. But it could be a generator too (it would need to keep around and build up an appropriate data structure for incremental filtering). The point is to tease things apart into more, smaller, discrete, testable chunks, which can certainly be done, even if you require lazy (generator) evaluation. I would not keep things in a single loop as you have, under any circumstances, because that tangled business logic is unmaintainable.
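To make the laziness point concrete, here is a small sketch. The names `compute_stats`, `first_stat_at_least` and the doubling `process` function are assumptions for illustration only. Creating the generator processes nothing, and when the consumer stops early, the remaining items are never touched.

```python
from collections.abc import Callable, Iterable, Iterator


def compute_stats(games: Iterable[int], process: Callable[[int], int]) -> Iterator[int]:
    """Yield processed stats one at a time; nothing runs until a consumer asks."""
    for game in games:
        yield process(game)


def first_stat_at_least(stats: Iterable[int], threshold: int):
    """Consume stats only until one reaches the threshold."""
    for stat in stats:
        if stat >= threshold:
            return stat
    return None


calls = []  # records which games were actually processed


def process(game: int) -> int:
    calls.append(game)
    return game * 2


stats = compute_stats(range(1_000_000), process)
processed_before_consuming = len(calls)  # the generator has not started yet
result = first_stat_at_least(stats, 6)   # stops after a handful of games
```

This only demonstrates that a generator pipeline does not force processing everything up front; how to express the per-season and per-league thresholds on top of it is a separate design question.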

You seem really obsessed with replicating control flow when all that really matters is that results match, and satisfy any other requirements (e.g. being able to stream process incrementally instead of one batch). To be honest, this whole thread seems like another example of the x-y problem:

https://xyproblem.info/

PythonMillionaire · August 30, 2022, 10:21pm

But it could be a generator too (it would need to keep around and build up an appropriate data structure for incremental filtering)

I don’t see how this would allow me to skip over a very significant portion of the data in the compute_all_stats loops, which is what I need. The whole thing will take days.

You seem really obsessed with replicating control flow when all that really matters is that results match, and satisfy any other requirements (e.g. being able to stream process incrementally instead of one batch). To be honest, this whole thread seems like another example of the x-y problem

This is strictly about flow control. This isn’t the first time I have wished for a way to continue or break outer loops and it bothered me having had to use workarounds. It always seemed to me that labeled or numbered breaks felt as pythonic as anything could be and so I thought I would suggest it and see what people would think.