Breaking/continuing out of multiple loops

ajoino · August 24, 2022, 4:55pm

Huh, I would have assumed most languages would be against such a feature, it makes the code spaghettified imo.

To me, using nested for loops for most things in Python is redundant: there usually exists an algorithm in itertools or a data type in collections that does what you want, and these are often (at least partially) implemented in C, which makes them faster as well. If you need algorithms not found there, scipy or scikit-learn likely has your back. If this feature were implemented, I would consider it an anti-pattern.
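
As a rough illustration of that point (a sketch with made-up data, not code from this thread), itertools.product can flatten two nested loops into one, so an ordinary single-level break leaves the whole search:

```python
from itertools import product

# Made-up data, purely for illustration.
rows = [1, 2, 3]
cols = [10, 20, 30]

found = None
# product() yields every (row, col) pair in the same order as two nested loops.
for row, col in product(rows, cols):
    if row * col == 40:
        found = (row, col)
        break  # one break ends the whole search
```

This only fits when the inner iterable does not depend on the outer loop variable; loops like "for moon in planet.moons" need a different approach.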

Rosuav · August 24, 2022, 10:48pm

Pike has labelled break too, although I don’t use it very often. Across all the repositories I searched, I found just two cases where I’ve used labelled break; one of them is a quick hack-job of an SSL socket server for testing purposes, and the other actually isn’t even breaking out of two loops, it’s breaking from a for loop and a switch block. So the Python equivalent would just be a match statement with a break in it.

I’m definitely -1 on a numbered break feature in Python. If anything, it should be a labelled break. But even there, the use-cases are extremely rare.

hunter86bg · August 25, 2022, 5:45pm

I’m quite new to programming and Python, but wouldn’t it make sense for each loop (from the initial example) to be in a separate function, with the functions calling each other when needed while storing the results in a global dictionary/list (or a mixture of them)?

Rosuav · August 26, 2022, 9:42am

Not everything should be its own function. Refactoring has readability costs, so it’s only worth doing when the cost of the current form is greater than the cost of having it in separate functions. A good rule of thumb is: if you can’t think of a good name for the function, don’t make it one.

Gouvernathor · August 28, 2022, 2:43am

I agree. I think the general good reason to make a function or method is for a) a public API question, b) to contain code that’s called at several different places, or c) in a functional programming context (such as passing it to sorted).
I don’t claim to be exhaustive, but in any case defining a function only to call it once is a bad design. Either in your code or, in this case, in the language.

For the feature itself, do we agree that it should only accept positive integer literals ? If we don’t go in the direction of named loops, that is.

storchaka · August 28, 2022, 4:14am

I disagree. If it makes your code clearer, it is a good reason.

Rosuav · August 28, 2022, 6:02am

I’d distinguish here between what is technically a function (what you need to pass to sorted, for example) and what is refactored into a function (something that has a name and an identity). They’ll often be the same, but a lambda function can often behave the same way that an inline bit of code does, and a list comprehension is technically wrapped up in a function too, but you almost never think about it that way unless you need to concern yourself with name leakage.
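
A small sketch of the name-leakage difference, using invented function and variable names. (As far as I know, CPython 3.12 and later inline comprehensions rather than compiling them as a nested function, but the scoping shown here is kept.)

```python
def comprehension_scope():
    squares = [loop_item * loop_item for loop_item in range(4)]
    try:
        loop_item  # not bound here: the comprehension kept the name to itself
        leaked = True
    except NameError:
        leaked = False
    return squares, leaked


def loop_scope():
    squares = []
    for loop_item in range(4):
        squares.append(loop_item * loop_item)
    # A plain for loop leaves its variable bound after the loop ends.
    return squares, loop_item
```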

A named function does serve your first two purposes (public API, called in several places). But a named function can also be used well as a mere refactor, where it’ll only ever be called from one place. My rule of thumb from above (whether you can think of a good name for it) applies here; if the function has an identity beyond “it’s the bit of that other function where I need to break more than once”, it’s reasonable to make it a function. For example, “find a matching user” where your definition of “matching” is a complex multi-level loop involving information about everything that user’s done - it might be so extremely specific that you never use it in any other location, but its job is clear, and it returns as soon as it finds a user to return.
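
A sketch of that kind of function, assuming an invented data layout (a list of dicts with "posts" and "tags" keys) and a deliberately simple definition of “matching”:

```python
def find_matching_user(users, wanted_tag):
    """Return the first user who has a post tagged wanted_tag, else None."""
    for user in users:
        for post in user["posts"]:
            for tag in post["tags"]:
                if tag == wanted_tag:
                    return user  # leaves all three loops at once
    return None


# Hypothetical sample data.
users = [
    {"name": "alice", "posts": [{"tags": ["pike"]}]},
    {"name": "bob", "posts": [{"tags": ["c"]}, {"tags": ["python", "loops"]}]},
]
```

The return plays the role a multi-level break would, and the function name says what the loops are for.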

storchaka · August 28, 2022, 6:21am

A named function is a solution if you cannot use goto and want to avoid using boolean flags or code duplication. It happens not only with exit from nested loops, but with exit from sequential loops, or even with nested if’s. For example:

if condition1:
    if condition2:
        x = some_expression
    else:
        x = long_expression
else:
    if condition3:
        x = other_expression
    else:
        x = long_expression

If you want to avoid repeating long_expression, you can refactor the code into a function:

def find():
    if condition1:
        if condition2:
            return some_expression
    else:
        if condition3:
            return other_expression
    return long_expression
x = find()

In my experience, such cases occur no less often than breaking out of multiple loops.

steven.daprano · August 27, 2022, 10:24am

main() says hello

A main function is not mandatory in Python, but it is mandatory in many other languages. I don’t think we can say that they are universally bad for requiring a main function.

Other examples of useful functions that are only called once may include setup(), or a cleanup() function called just prior to the application exiting.

One advantage of putting code into a function, even if it only gets called once, is that you can write unit tests for it. If the expression is complex enough that you cannot see it is correct just at a glance, that may be a good idea.

No, we do not agree.

It’s not clear that we need this feature at all, but if we do, there is no agreement in favour of integer counts versus named labels. I think that people could legitimately vote -1 on integer counts and +1 on named labels.

fungi · August 28, 2022, 4:38pm

I suppose it could be argued that once you start calling a function
in unit tests, it’s no longer being called only once (or only in one
place, at any rate).

Gouvernathor · August 28, 2022, 11:25pm

@steven.daprano I didn’t mean “are we ok to use integers and not to use named labels”, I meant “in the context of the integer version, do we agree that only literal integers should be allowed, aka we should not allow a variable containing an integer, or an expression resolving as an integer”.
I’m not sold either on label vs. integer, but I am of the opinion that the integer version would be “bad” if we allowed anything other than literal integers to be put after the break/continue keyword.

Same answer as Fungi for the unit tests : if you call the function both in your code and in unit tests, you’re not calling it from only one place. But I would accept the main() function as an exception to the clear-cut statement I made earlier - although sometimes an if __name__ == "__main__" suffices.

@storchaka In that particular case I would define long_expression in a variable before the ifs, and set x to that value in the ifs, so that the long expression is only written once.
But in the general case, where for example that expression contains function calls with side effects that you don’t want to trigger, you would have to find a workaround, and your function example is a solution (another would be to use a sentinel object and check for it after the ifs, but it wouldn’t be much simpler). And I would consider that a design issue with the if/else statement. Nobody has found a better way (afaik) to fix that flaw, but I consider it one nonetheless, and if someone came up with a solution, the very existence of your example would be one good reason to go towards that solution, and to change the if/else statement.
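
The sentinel variant could be sketched like this. The names _MISSING, calls and the condition values are invented for the example, and long_expression is written as a function with a visible side effect so that one can see it runs only on the fallback path:

```python
_MISSING = object()  # hypothetical sentinel, distinct from any real value
calls = []


def long_expression():
    calls.append("evaluated")  # side effect we do not want to trigger needlessly
    return "fallback"


# Made-up conditions for the example.
condition1, condition2, condition3 = True, False, True

x = _MISSING
if condition1:
    if condition2:
        x = "some value"
else:
    if condition3:
        x = "other value"
if x is _MISSING:
    # Written once, and evaluated only when no branch assigned x.
    x = long_expression()
```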

@Rosuav I disagree, but I don’t think there are many arguments to be made about it on my part : I guess we just have different coding philosophies. I would agree that defining something as a function can help make the code more understandable, e.g. for complicated math things, which I think goes your way and the way of @storchaka.
But I wouldn’t consider the needing-to-break-from-nested-loops case as an example of this : we don’t choose to separate it from the rest of the code just to make it more understandable, we’re doing it because the syntax gives us no other choice.

Rosuav · August 28, 2022, 11:42pm

That’s exactly my point though. If you’re ONLY doing it because syntax gives you no other choice, then there’s no good name for the function, and it doesn’t belong as a named function. Much more often, there is a good identity for such a function. You can give a name to a function that calculates “orbital_velocity_at_altitude(celestial_body, altitude)”, and you can just as viably give a name to a function that ascertains “has_user_liked_your_comment”, which might be a multiply-nested loop with an early abort. Do you have an actual real-world example where you needed to multi-break?

Gouvernathor · August 29, 2022, 12:00am

@Rosuav No. You could have a good name to give the function if you had to, yet believe that code understandability alone doesn’t make it necessary or relevant to separate the function from the rest. In such a situation, a function like select_which_moon_or_satellite_to_go_to would only be defined because the syntax gives no other choice. Even though we have a good name available for it.
And the more I think about it the more I come to disagree with the concession I made in my previous message : if you want to name a code snippet that you only use once, you should probably just put a comment before it. I’m not judging anyone’s practices, but I personally would probably never define a function just for that.

If you want more context for the example I imagined just then, say I want to select a moon from several star systems :

for system in systems:
    for planet in system:
        for moon in planet.moons:
            if moon.has_no_titanium:
                break 2 # I don't want to be in a system with a moon with no titanium
            if moon.has_atmosphere:
                break # I don't want to be in the same planetary system

steven.daprano · August 28, 2022, 12:09pm

@Gouvernathor

Discuss doesn’t recognise py as a language for code blocks, you have to use python. I think python3 might also work.

This discussion is causing unpleasant flashbacks to the “structured programming wars” of the 1970s. (I didn’t witness them myself, but as late as 1999 when everyone was frantically re-writing their Cobol programs, a friend of mine was being ordered by his boss not to use functions because GOTO is “more efficient”.)

If you were hoping to convince us that labelled break/continue is better than using functions, I think that you are having the opposite effect.

To me, your example seems so artificial, and implausible, as to be useless as a use-case for multilevel break. If you don’t have a more realistic and convincing use-case, that weakens the argument for multilevel break.

In your example, the first time you find a moon with an atmosphere, you exit the entire planetary system. Surely it would be more realistic to move on to the next moon? And if you find a moon with no titanium, you exit the entire solar system. If you are looking for titanium, why not just move on to the next moon? Why travel to the next system?

def get_titanium(systems):
    """Mine titanium from a moon in one of the given systems."""
    for system in systems:
        for planet in system:
            moon = find_suitable_moon(planet)
            if moon is not None:
                return mine_titanium(moon)

And lo, we have:

no problem with multilevel break;
a testable function with a self-documenting name;
we can factor the search algorithm into a separate function;
which allows us to test it, document it, and use a meaningful name;
and having done that, it awakes us to an even more powerful refactoring:

def get_titanium(systems):
    """Mine titanium from somewhere in one of the given systems."""
    for system in systems:
        body = find_suitable_body(system)
        # May return an uninhabitable planet, planetoid, moon, asteroid or comet.
        if body is not None:
            return mine_titanium(body)

I think that justifying this proposal is going to be hard. Break and continue are gotos, and as Uncle Bob explains in the link above, it is mathematically provable that we don’t need gotos.

To make this proposal convincing, we need a realistic example of an algorithm that uses it, and that example needs to be significantly more readable and maintainable than the refactorings into functions, or the use of try…except (also a localised goto).

If you intend to continue to push this idea, I strongly suggest you look at prior art: find languages which have added this capability, and see why they added it.

Gouvernathor · August 29, 2022, 12:02pm

Thanks for the formatting tip.

You’re changing my example, the implementation you’re showing doesn’t do at all what my example did.
I don’t care why you would change solar systems for such a reason, that’s just an example I invented in a few minutes. Regardless, the constraints I arbitrarily decided are that we don’t want to be in a solar system which has a moon without titanium, or in a planetary system which has an atmospheric moon.

The baseline is, I have three loops and I want to break the second or the first one, from the inner one. You can come up with other examples for this, such as parsing lines in files in a series of folders. Do you really consider that made-up context to be at all important ?
In such a loop structure, this extension proposal for loops is the only way, as far as I know, not to add more illegible indenting to the code (try/except) or to define single-use named functions, which require you to go back and forth to understand where control goes. It’s the most concise and, in my view, understandable and readable.

Break and continue are gotos

Sure, but they’re lighter and more readable than the try/except structure, which is a goto as well. And while functions add safeties making them not exactly gotos, the function structure causes control to go to another place in the code, then come back to where it was called. That back-and-forth is what makes it, in my view, code that’s hard to read and understand, since compared to that, my loops fit in a few lines and control never leaves that group of lines.
What’s more, afaik continue and break as they currently are, are already no less goto instructions than a raise in a try block. So, we would not be adding a goto feature to python, we would be changing how one works. If your point is that we don’t need gotos, I think that’s a non sequitur : you should then be arguing for the removal of break and continue. I don’t understand how only changing how they work would change anything about the goto problem as you’re describing it. We’re not even changing it that much : there’s no new place where you can go to. The beginning and end of loops, that was the case before and it’s still the case.
Really I think your goto argument is mostly moot due to the fact that break and continue are already part of the python syntax.

If you were hoping to convince us […] you are having the opposite effect.

My goal is not to persuade anyone, it’s that we can understand each other’s reasons in an honest manner. If I’m wrong, and I’m sufficiently clear that you can explain even more clearly why my reasons are bad, it would solve the issue and we would have fewer people proposing it again. I really doubt that will be the case of course, because I believe in what I’m saying, but to me it would be a good resolution.

pf_moore · August 29, 2022, 12:54pm

This question is ill-formed, in the sense that I have no opinion on what integers should be allowed, because I think the “numbered break” form is bad in all its forms. I’m not interested in debating nuances of how we would implement numbered break if we agreed it was a good idea - we don’t, so unless that changes, going any further is pointless.

Then you’re somewhat missing the point. Python doesn’t have multi-level breaks. The only way that will change is if you (or someone else) persuades the core developers that such a feature is useful. If you’re not interested in persuading anyone of that, then you’re wasting people’s time, just as much as if you were continuing to argue after it became clear that the consensus is “no, we don’t want this feature”.

If you’re simply interested in a theoretical discussion of the pros and cons of the feature, without any intention of trying to get it added to Python, then this isn’t the right forum (if you hover over the forum name, “Ideas”, the tooltip says “Would you like to change something in Python? This might be your feedback forum.”)

Gouvernathor · August 29, 2022, 1:02pm

You’re missing the point, again, yes I want the feature to be added, and if we all get convinced based on good reasons that it should be added, to me that’s the best possible outcome. What I meant when I replied to the question someone other than you asked, is that persuading people is a lesser goal for me than that of being honest, and hearing and understanding one another’s views and reasons.
That’s the same for the integer point : my question was addressed to people who think the integer option is a good one, so we can discuss and agree (between ourselves) on the details of that version. If you disagree in a broader manner, the question is not addressed to you.

Now, please, let’s get back to the point of why the extended break and continue are good or bad.

pf_moore · August 29, 2022, 1:14pm

Happily. What is the point?

I’m a strong -1 on any form of break <number>. I believe break <label> is too infrequently used to be worth the disruption of adding it. And in the (relatively rare) cases where I have seen code that could use a multi-level break, there has always been a refactoring using current language features that was more readable and maintainable than a multi-level break would have been.

I have used PL/SQL, one of the few languages I am aware of with multi-level breaks, and whenever I’ve seen it used, it was actively harmful to the maintainability and readability of the relevant code.

I hope you don’t think it’s unreasonable that I comment on statements you make, just because they were in response to someone else? This is a general discussion, not a private conversation. By posting here you take time from everyone who is reading the thread - not just selected people you choose to address. You should be prepared for those people to respond. Otherwise, you’re frankly not being very respectful of their time.

Rosuav · August 29, 2022, 1:19pm

Yes, the context is VERY important, and the fact that made-up context is all you can offer is, itself, telling.

I mentioned earlier that I had found just two examples where I’d used multi-level break/continue, across all my code. Here they both are:

github.com/Rosuav/Gypsum/blob/master/connection.pike#L243

          							conn["unknown_ansi_"+params[p]+"m"]=1;
          							say(conn->display,"%%%% %O produced unknown ANSI code \\e[%dm\n",conn->worldname,params[p]);
          						}
          					}
          					conn->curmsg[-1]=conn->curmsg[-1];
          					conn->curmsg+=({conn->curcolor=G->G->window->mkcolor(conn->fg+conn->bold,conn->bg),""});
          					break;
          					default: conn["unknown_ansi_"+ansi[i]]=1; break; //Ignore unknowns without error - log them for curiosity value though
          				}
          				ansi=ansi[i+1..];
          				break colorloop;
          			}
          			default: say(conn->display,"Unparseable ANSI sequence: %O\n",ansi[..i]); return;
          		}
          		conn->ansibuffer=ansi;
          	}) {/*werror("ERROR in ansiread: %s\n",describe_backtrace(ex));*/ return;} //This will (among other errors) catch the deliberate over-indexing, if we don't have enough data yet.
          	textread(conn,conn->ansibuffer,end_of_block); conn->ansibuffer="";
          }
          
          enum {IS=0x00,SEND=0x01,SE=0xF0,NOP=0xF1,GA=0xF9,SB,WILL,WONT,DO=0xFD,DONT,IAC=0xFF};
          enum {ECHO=1,SUPPRESSGA=3,TERMTYPE=24,NAWS=31,NEWENVIRON=39,MSSP=70,COMPRESS2=86,GMCP=201};

In the middle of a mess of organic growth (honestly, if I were still adding to that code, I’d probably end up refactoring it somewhere), one branch of a switch block needs to halt a loop. This isn’t what I’d call particularly good code, but even if it were, the Python equivalent wouldn’t use a switch statement, so there wouldn’t be two levels to break.

github.com/Rosuav/shed/blob/master/sslserver.pike#L52

          				write("SSL connection established.\n");
          			}
          		}
          		object readbuf = Stdio.Buffer();
          		sock->set_buffer_mode(readbuf);
          		sock->write("Hello!\n");
          		out: while (1)
          		{
          			while (string cmd = readbuf->match("%s%*[\r]\n"))
          			{
          				if (cmd == "quit") break out;
          				write("cmd: %O\n", cmd);
          				sock->write("Sure, whatever.\n");
          			}
          			readbuf->add(sock->read(1024, 1));
          		}
          		sock->write("Bye!\n");
          		write("Dropping connection %O\n", sock);
          		sock->close();
          	}
          }

Quick and dirty script. The core double-loop here has an outer “wait for data from socket and fill buffer” loop and an inner “split buffer into lines” loop. Upon receipt of a quit command, it needs to break out of both. But this is only a nested loop with this specific design; it could just as easily be written as a single loop, with “if the buffer doesn’t match this pattern, fetch more from the socket and continue” (a regular single-level continue statement), or as a producer-consumer, or any number of other ways. I actually have no idea why I happened to write it like this, beyond that it worked, and the purpose of this was to test something else entirely and I wanted to focus my time on what actually mattered.
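
A rough Python sketch of the single-loop shape described above. It is not a translation of the Pike script: the chunks argument is an invented stand-in for successive socket reads, and the "Sure, whatever." reply is replaced by collecting the commands in a list:

```python
def serve(chunks):
    """Handle newline-terminated commands until "quit" or end of input."""
    source = iter(chunks)  # stand-in for repeated sock.read() calls
    buffer = ""
    handled = []
    while True:
        line, newline, rest = buffer.partition("\n")
        if not newline:
            # No complete line buffered yet: fetch more data and try again.
            chunk = next(source, None)
            if chunk is None:
                break  # input exhausted, treated here as a closed connection
            buffer += chunk
            continue
        buffer = rest
        cmd = line.rstrip("\r")
        if cmd == "quit":
            break  # a plain single-level break is enough
        handled.append(cmd)
    return handled
```

With only one loop, both the "need more data" case and the "quit" case are ordinary continue and break statements.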

So even when the feature exists, it’s not something people use all that frequently. (In contrast, continue, which is used far less frequently than break, shows up hundreds of times in my shed repository alone.) To better explain why you think this is useful, we really need a good example.

Gouvernathor · August 29, 2022, 2:07pm

On that, if you’re referring to the try/except way or the single-use functions way, I think we just disagree on what’s readable and what’s not. If you’re not, please introduce that other workaround.

The try/except adds indenting, which in the middle of several concentric loops, doesn’t help readability. It also requires defining single-use exceptions, which pollute the namespace, and are not particularly clear to understand what their purpose is. It’s the same weird kind of non-error exception as StopIteration : it’s sensible, sure, but not simple to understand.
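
For reference, a sketch of what that try/except workaround looks like for the three-loop example. The data layout is simplified (each system is a list of planets, each planet a list of moon dicts), the names SkipSystem, make_moon and survey are invented, and the wrapping function exists only so the sketch can be run:

```python
class SkipSystem(Exception):
    """Single-use exception: abandon the current system."""


def make_moon(name, no_titanium=False, atmosphere=False):
    # Hypothetical helper building the simplified moon records.
    return {"name": name, "no_titanium": no_titanium, "atmosphere": atmosphere}


def survey(systems):
    visited = []
    for system in systems:
        try:
            for planet in system:
                for moon in planet:
                    if moon["no_titanium"]:
                        raise SkipSystem  # stands in for "break 2"
                    if moon["atmosphere"]:
                        break  # ordinary break: on to the next planet
                    visited.append(moon["name"])
        except SkipSystem:
            continue  # on to the next system
    return visited


systems = [
    [
        [make_moon("a"), make_moon("b", atmosphere=True), make_moon("c")],
        [make_moon("d", no_titanium=True), make_moon("e")],
        [make_moon("f")],
    ],
    [[make_moon("g")]],
]
```

The extra indentation level and the one-off exception class are both visible here.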

The single-use function moves execution to another part of the code, which makes it harder to follow than if execution were just kept in the loops’ block. And, coming to think of it, I think that you need to define several single-use functions to emulate my example where you need to break to two different loop levels in a three-loop structure.

I don’t see how these refactorings could be considered more readable and/or maintainable (more about that further down) than the multi-break. I also don’t see how labeled breaks (for example) can harm maintainability.

@Rosuav ok, I understand your argument. I think this is necessary to make reasonably readable code, even if the circumstances when it’s needed are rare. But for example if the feature induces a performance loss to compile any loop, the fact that it’s barely used and there are (dirty) workarounds is a very relevant objection. And of course the implementation cost is to be taken into account, too.
I’ll try to find real examples in code I worked on.

But I think there is another advantage for multi-breaks wrt maintainability which wasn’t considered thus far, and which may answer your question. Imagine in your code you have a simple loop, where you parse the lines of a file, and you break when some condition is verified on the considered line.
Now, say there’s a project update, some new feature, requiring you to parse several files, which turns your single loop into two loops, inside one another. You could simply add the for file in files as outerloop line and indent the existing loop, and turn the break into a break outerloop. The git diff would be very understandable, just 4 green spaces per line, one new line, and one altered break statement.
If you need to go back and forth between these two versions of the code, if you’re not sure, or if you’re doing it in some other branch and you need to maintain both in parallel (whatever) the codes are very similar, there’s only one edited line, one added line, and a bit more indenting.
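
For comparison, current Python can express the same two-loop version without a helper function by using the for/else idiom, at the cost of two extra lines. A sketch with invented data, where each "file" is just a list of lines:

```python
# Made-up stand-ins for the files and the break condition.
files = [["alpha", "beta"], ["gamma", "delta"], ["epsilon"]]

seen = []
match = None
for lines in files:
    for line in lines:
        seen.append(line)
        if line.startswith("g"):
            match = line
            break
    else:
        continue  # inner loop finished without break: next file
    break  # inner loop hit break: leave the outer loop too
```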

However, if you use the single-use function way, you need to go back and forth between a version with a simple for loop, and a version which calls a function defined in another part of the code, where the original loop was moved to. That function contains two loops, and the break has been turned into a return, which doesn’t help recognizing it’s the original code that has been moved (that adds more time when checking the commit to see if it adds a bug or contains a typo).
And, you need to figure out which variables you need in the loops, because you will have to pass them to the function, and what information you need to get out of the loop. That isn’t necessary in the multi-break version, so that means it’s easier to upgrade your code to the multi-break version, than to upgrade it to the single-use function version.
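
To make the comparison concrete, here is a sketch of the single-use function version being described. The name first_matching_line and its parameters are invented, and files is assumed to be an iterable of iterables of lines:

```python
def first_matching_line(files, is_match):
    """Return the first line in any file for which is_match(line) is true."""
    for lines in files:
        for line in lines:
            if is_match(line):
                return line  # this was a plain break in the single-file version
    return None
```

Everything the loops need (here the files and the condition) has to be passed in, and the result has to be passed back out, which is the extra work mentioned above.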