Good reason to do the refactor first, as its own commit, and (if necessary) backport that change to the other version, since it causes no harm.
So, on the base version, you would have a single loop in a function by itself, for no other reason than resembling the update version? I find that even worse, because it makes no sense on its own, so it impedes code readability on the base version. Yes, in my view it does cause harm.
What's more, you may be doing a pull request from a fork of the repo, and not have access to the master branch, where the refactor of the base version would take place. In that context, even if you refactored in a single commit and did the rest in the subsequent commits, the diff of the whole PR is far less readable than using the multi-break.
This is all incredibly theoretical on your part. Do you have any actual examples of real code that would be improved by this? It's easy to invent theoretical objections to theoretical alternatives to theoretical problems.
Any situation where the lines of a file are iterated through, and where the loop contains a break, is subject to what I'm describing. Typically when implementing a multi-file search, like most IDEs do, or when looking for files with trailing whitespace, whatever. I don't find that an incredibly theoretical or uncanny situation, and it's a very simple example.
I find your positions quite contradictory, because on the one hand you're defending seemingly obvious and better solutions to a problem, and on the other hand you're saying that problem never occurs outside of theoretical scenarios. So, which is it? Are the try/except way and the single-use function way good and widely used solutions, or is the problem non-existent?
It's not contradictory. The problem does not exist because there are MANY alternatives, including refactoring, exceptions, etc, etc, etc. You're trying to say that we need multi-level break, but so far, you haven't shown any examples that can't be as well (or better) handled with other techniques.
So if you want to dispute my stance, show an example.
I'm referring to reconsidering the logic - something that's broader than any form of simplistic, "one size fits all" solution. I agree the try/except approach is not good. I disagree strongly with your idea that naming a block of code (i.e., "single use functions") is unacceptable - naming a subexpression in a complex calculation is fine, why is naming a block of code in a complex loop any less acceptable? And yes, I'd often refactor by naming the inner loop. Or I might refactor by merging the 2 loops into one using a generator. Or something else. Basically I'd think about the code and what it was trying to do, and as I say, I'd nearly always find a better approach.
Think of it in terms of "having to break out of multiple loops is a code smell, indicating that you should re-think your approach".
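As a sketch of the "merge the 2 loops into one using a generator" idea mentioned above: the nested iteration moves into a small generator, so the caller only needs an ordinary single-level break. The names iter_cells and first_negative are hypothetical, chosen purely for illustration.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple


def iter_cells(grid: Iterable[Sequence[int]]) -> Iterator[Tuple[int, int, int]]:
    """Flatten the row/column loops into one stream of (row, col, value)."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            yield r, c, value


def first_negative(grid: Iterable[Sequence[int]]) -> Optional[Tuple[int, int]]:
    """Return the position of the first negative value, or None."""
    found = None
    for r, c, value in iter_cells(grid):
        if value < 0:
            found = (r, c)
            break  # a plain break: there is only one loop left to leave
    return found
```

This only covers the case where every loop level is abandoned at once; it does not help when some levels must keep running.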
It does if you simply call the function stuff or something equally passive-aggressively unhelpful. But if you think about the logic, part of that is thinking about how to describe parts of the process, and that usually results in a good name.
Of course not all code warrants that much effort. A lot of my code is quick hacks. But even in that case, I can usually find a better way than "exit from 2 loops".
I don't think everyone piling in with their code samples is particularly helpful. The most recent example I had, though, was a "try to fetch a URL 10 times before giving up" loop, inside the body of a function. I wanted to break out if one of the tries returned a 304 Not Modified status. A double-break would have worked. But in reality, stopping and thinking for a moment and factoring out the inner loop into a fetch_url function was far better, named the operation in a way that was more readable, and made the outer loop shorter and hence more readable itself.
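A minimal sketch of what such a refactoring might look like. It is not the original code: the `fetch` callable returning a `(status, body)` pair, the `tries` parameter, the treatment of 200 as success, and the outer refresh_all function are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Response = Tuple[int, Optional[bytes]]


def fetch_url(url: str, fetch: Callable[[str], Response], tries: int = 10) -> Response:
    """Try `url` up to `tries` times; stop early on 200 or 304.

    `fetch` is a hypothetical callable standing in for whatever HTTP
    client the surrounding code uses.
    """
    status, body = 0, None
    for _ in range(tries):
        status, body = fetch(url)
        if status in (200, 304):
            # `return` ends the retry loop; the caller's loop is untouched.
            return status, body
    return status, body  # result of the last failed attempt


def refresh_all(urls: Iterable[str], fetch: Callable[[str], Response]) -> Dict[str, Optional[bytes]]:
    """Outer loop: stays short because the retries live in fetch_url."""
    updated: Dict[str, Optional[bytes]] = {}
    for url in urls:
        status, body = fetch_url(url, fetch)
        if status == 304:
            continue  # not modified: nothing to store
        if status == 200:
            updated[url] = body
    return updated
```

Passing the fetcher in as a parameter is only a convenience here: it lets the retry logic be exercised with a fake fetcher, without touching the network.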
OK, so you've been lucky enough to not have had my experience. How many real life examples of code using labelled breaks have you seen (obviously they wouldn't have been in Python, as there's no labelled break in Python)? Were they all easy to read, and easy to maintain? How big (in terms of numbers of lines) were the loops? My experience suggests that the feature is not a particular problem for short loops (less than 10 lines, like most constructed examples) but scales badly to loops that are tens or hundreds of lines long[1].
However, I'll also note that this is irrelevant (see your comment above about getting back on topic). Python doesn't have labelled breaks (or any kind of multiple breaks). The debate isn't about how they might be problematic, it's about what value they have that justifies adding them to the language. The onus is very much on you (and anyone else in favour of the proposal) to argue why they benefit users, in the first instance. If we get to the point where there's a clear understanding of the benefits, then (and only then) is there a point to discussing whether the (perceived) problems are important enough to outweigh the benefits.
So please, can we stick to the point here. What are the benefits of this syntax? Clearly it lets you do something that at the moment can't be done without some form of refactoring of the code. But how important is that? How much real world code (not theoretical examples, or code written to demonstrate a point) would be improved by this feature? Even if I concede (which I don't - but as I say that's not the point yet) that multi-level breaks are perfectly fine for readability and maintainability, you still need to demonstrate that there's a need, and it's not just a neat idea that no-one actually needs. Otherwise the status quo wins.
[1] Yes, I know loops that long are unmaintainable anyway. That's why having some pressure to refactor them helps...
There's always fileinput for that. That's a perfect example of factoring out the double loop into an iterator.
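For instance, the "files with trailing whitespace" search mentioned earlier could be sketched with the standard library's fileinput module roughly as follows. The function name first_trailing_whitespace is hypothetical; fileinput.input(), filename() and filelineno() are the real module API.

```python
import fileinput
from typing import Optional, Sequence, Tuple


def first_trailing_whitespace(paths: Sequence[str]) -> Optional[Tuple[str, int]]:
    """Return (filename, line number) of the first line ending in whitespace."""
    if not paths:
        return None  # with no files given, fileinput would read stdin instead
    with fileinput.input(files=paths) as lines:
        for line in lines:
            text = line.rstrip("\r\n")
            if text != text.rstrip():
                # One loop covers every file, so a single exit is enough.
                return lines.filename(), lines.filelineno()
    return None
```

The per-file loop and the per-line loop are both hidden inside the iterator, which is why no double break is needed in the caller.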
Sure, but the well-named function still moves a chunk of code from one place to another, which doesn't help git diff simplicity or overall readability, regardless of its name.
Also, your solution of using a single generator instead of several loops can help in several examples, but not in ones where 1) you need to break some of the loops but not all of them, or 2) you need to do stuff inside the inner loops, yet outside the outer ones.
My point was not that it's unacceptable, or a bad practice, rather that it being the only solution is a bad thing. If you want to define single-use functions, it's good that the language allows you to do so (and it can bring many more advantages, listed above by someone else, such as unit tests), but it should not be the only solution.
Anyway, that was my point, but yes I hear your point about multiple solutions for multiple contexts.
I'm not sure I agree with that, but then again it's not surprising since I guess it's based upon compiling a lot of examples you encountered (also called "experience") so it would be hard for you to come up with a hard proof anyway.
Sure, makes sense.
It's not the ONLY solution. Multiple solutions have repeatedly been given. Are there situations where the current half-dozen solutions aren't good enough, and that having "only" this handful of solutions is a problem? You have given no such indication.
But I wouldn't consider the needing-to-break-from-nested-loops case as an example of this: we don't choose to separate it from the rest of the code just to make it more understandable, we're doing it because the syntax gives us no other choice.
Exactly!!
It's not the ONLY solution. Multiple solutions have repeatedly been given. Are there situations where the current half-dozen solutions aren't good enough, and that having "only" this handful of solutions is a problem? You have given no such indication.
They are all ways of getting around the problem, not actual solutions presented by the programming language, which should be the case. You either:
- need to refactor it into a function, which you would have done already if you wanted to do that for reasons other than achieving this functionality. It also doesn't provide a way to use multilevel continue and you would need one function for each loop level so that you can return at different points in the loop and not just the outermost like @steven.daprano's workaround.
- need to use things that weren't meant to be used for this purpose such as a try and except block. If the block was already there, this means you will pollute it by adding an exception (or two for break and continue) that's not really an exception and make your code harder to understand. If it wasn't, you will push everything one indent forward and make your code harder to understand at a glance because it will seem like you are catching a real exception but aren't. On top of it all, this workaround also suffers from the same problem as workaround 1, which is that you need one try-except block at each loop level in order to have full control over the flow.
- need to use flag variables, which also add a bunch of unnecessary lines of code as well as useless variables
- other horrible solutions that don't allow for full loop control and/or continue and/or require a bunch of lines of code or refactoring or whatever else
All of these workarounds, which is what they actually are as opposed to actual solutions, have very clear drawbacks. They are forceful and impose redesigns, pollute your code and are not quick to implement or remove.
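For reference, the three workarounds named in the list above might look like this on a deliberately tiny search problem. This is a sketch for comparison only; the function names and the _Found exception are hypothetical, made up for this illustration.

```python
def find_with_flag(grid, target):
    """Flag variable: the outer level re-checks the flag after the inner loop."""
    position = None
    done = False
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                position = (r, c)
                done = True
                break
        if done:
            break
    return position


class _Found(Exception):
    """Hypothetical control-flow exception carrying the result."""


def find_with_exception(grid, target):
    """try/except: raise from the inner loop, catch outside both loops."""
    try:
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                if value == target:
                    raise _Found((r, c))
    except _Found as found:
        return found.args[0]
    return None


def find_with_function(grid, target):
    """Single-use function: `return` leaves every loop at once."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return (r, c)
    return None
```

All three return the same result; they differ only in how much scaffolding surrounds the two loops, which is the point under dispute in this thread.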
Named and numbered break/continue have the following advantages:
- Fine control. At any loop level, you can fully control the flow of any preceding loop
- Allows for both break and continue
- Extremely quick to implement and remove from code
- Clear, unambiguous and concise. Does not increase the indent level
- Is not a makeshift, jury-rigged workaround and does not force you to change your code in any way
- Not ONE single line of code is added
The fact that multiple other languages have implemented this functionality proves beyond any doubt that there is demand for it. Even 15 years ago, when Guido originally rejected it, he said it had already been brought up several times. He argued:
... before you know it you have an incredible mess on your hands of unintelligible code.
I think it's abundantly obvious that the alternatives are what actually lead to messy, unintelligible code. This can be clearly seen because:
- They require many more characters and several more lines of code as opposed to a single integer or "as [loop_name]" added to a line that already existed
- They raise the indent level, particularly if, for some reason, you need to control the loop flow at more than one loop level
- They use Python functionality that wasn't really meant to be used for such purposes
- They force you to refactor your code in ways you didn't want to
- They invariably involve adding more variables, if clauses, etc. making your code more polluted and harder to read
I honestly can't think of a single real advantage these solutions have over labeled or numbered breaks.
I don't agree with the premise that "you would have [refactored] already" here. In large codebases with many contributors it can be common that over time the body of a loop will tend to grow as more and more gets added but no one chooses to refactor it until their hand is forced in some way. Perhaps the original decision not to have a single use function was not unreasonable at the time it was made. Later on though as more and more code is added it gets clearer and clearer that it should be refactored but at the same time the more it grows the harder it gets to actually do the refactoring. The reason I dislike the idea of labelled break is probably for precisely the opposite reason that you want to propose the idea: I don't want to provide more opportunities to postpone what I would consider to be the necessary refactoring (that very often should have been done a long time ago).
I can't actually picture in my mind real maintainable code where labelled break is significantly better than a reorganisation. When I try to imagine this being used in practice I just picture it being used to extend the kind of spaghetti code that I already wish people didn't write in the first place. Perhaps I don't have the imagination to see the kind of situation where this could be used in a good way but I can definitely imagine it being used in a bad way.
Of course if real life examples were provided then we could discuss the pros and cons in those cases without depending on my imagination.
Hold on a moment. You're starting from the assumption that "I need multi-level break" is the problem/goal. People don't pay programmers to write multi-level breaks; they pay programmers to solve problems - not problems like "what is love?", but practical problems. So start with actual code that needs to be written, and THEN figure out what the best way to write it is; don't start with "we need multi-level break", and then justify it by saying "if we need multi-level break, then multi-level break is the best way to write multi-level break".
Learn proper refactoring and your future self (and everyone else who reads your code, including your coworkers who review it) will thank you. Do as Serhiy: create a function and use it only once.
(This is my first and only post in this thread; I'll mute it right away.)
Recommended further reading:
- Refactoring: Improving the Design of Existing Code (Fowler)
- The Practice of Programming (Pike, Kernighan)
BTW, did I mention I'm -1 to this multilevel labelled break proposal?
No, they are solutions. You just don't like the solutions.
And the drawbacks are not "very clear". Python is a 30+ year old mature language, used by thousands or tens of thousands of programmers, with millions of lines of code. If the drawbacks were so obvious, we would have added this feature 20 years ago.
So if these drawbacks exist, they are minor, rare, a matter of opinion, or all three.
Good! Because if you find yourself writing a big nested loop where you need to jump out of multiple levels, you should redesign it.
That is also an argument for unrestricted goto with the ability to jump anywhere. "But someday I might need the fine control of being able to jump into the middle of a function!"
Um, okay, for some definition of "need".
Let's get back to this "fine control" to jump out of multiple loops. Do you have actual examples of real code that needs this "fine control", or is this purely hypothetical?
I've written a lot of loops, and I've never wanted this. So let's see some examples of real code that needs this feature.
And the fact that for every language which has included this feature, a dozen languages have not, proves "beyond any doubt" (your words) that the feature is a mistake, no matter how much demand there is. Right? No?
Neither am I convinced that we should copy languages that have this feature. Languages have all sorts of unnecessary bloat and cruft.
Let's see real-world examples, not made up hypotheticals.
- Solutions that work now are better than solutions that rely on you waiting a year or three until you can use it.
- Functions, flag variables and exceptions already exist. They need no extra documentation or testing, and no extra code in the interpreter to make it work. So we avoid all the maintenance headache from introducing a new syntactic feature.
- Refactoring big, complex, monolithic blocks of code with multiple nested loops into functions will simplify the code and make it easier to understand and easier to maintain.
Do you still disagree? Then show us real-world examples, not made up hypotheticals.
I don't want to provide more opportunities to postpone what I would consider to be the necessary refactoring
Please excuse me but "I don't want this feature to be added because otherwise [I, people] won't have an incentive not to be lazy anymore" is a pretty poor argument.
I can't actually picture in my mind real maintainable code where labelled break is significantly better than a reorganisation.
They are not mutually exclusive. People can reorganize their code all they want and if they find they still need labeled breaks, they should be there for them.
As I explained, a reorganization does not solve all issues. You can't have full loop control with both continue and break unless you split it in completely unnatural ways with lots of handle_this_loop and handle_that_loop functions. If you are doing all this just to get some pretty basic functionality, I think it's pretty clear the language is lacking in some way.
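To make the "one function per loop level" shape concrete, here is a deliberately simplified sketch in which an early return stands in for a labelled continue at each level. Everything in it is an assumption for illustration: games are plain integers with negative values meaning unusable data, and the names usable_games, usable_league and usable_data are hypothetical. Whether this reads as natural or unnatural is exactly what the thread disagrees about.

```python
from typing import List, Optional, Sequence

Season = Sequence[int]  # hypothetical: one int per game, negative = unusable
League = Sequence[Season]


def usable_games(season: Season, max_bad_games: int) -> Optional[List[int]]:
    """Return the season's good games, or None if too many games are bad.

    The early `return None` plays the role of `continue season_loop`:
    the remaining games of that season are never looked at.
    """
    good: List[int] = []
    bad = 0
    for game in season:
        if game < 0:
            bad += 1
            if bad > max_bad_games:
                return None
            continue  # skip just this game
        good.append(game)
    return good


def usable_league(league: League, max_bad_games: int,
                  max_bad_seasons: int) -> Optional[List[int]]:
    """Return the league's good games, or None if too many seasons are bad.

    Here the early `return None` plays the role of `continue league_loop`.
    """
    kept: List[int] = []
    bad_seasons = 0
    for season in league:
        games = usable_games(season, max_bad_games)
        if games is None:
            bad_seasons += 1
            if bad_seasons > max_bad_seasons:
                return None
            continue  # skip just this season
        kept.extend(games)
    return kept


def usable_data(leagues: Sequence[League], max_bad_games: int = 1,
                max_bad_seasons: int = 1) -> List[int]:
    """Collect good games from every league that is not discarded."""
    result: List[int] = []
    for league in leagues:
        kept = usable_league(league, max_bad_games, max_bad_seasons)
        if kept is not None:
            result.extend(kept)
    return result
```

Note one behavioural difference from acting on items as they are encountered: results are collected per level and dropped if that level is later discarded.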
Learn proper refactoring and your future self (and everyone else who reads your code, including your coworkers who review it) will thank you. Do as Serhiy: create a function and use it only once.
Let's get back to this "fine control" to jump out of multiple loops. Do you have actual examples of real code that needs this "fine control", or is this purely hypothetical?
So start with actual code that needs to be written, and THEN figure out what the best way to write it
Here is a minimal example based on my real-life problem, which I solved using flags in actuality. How would you refactor the code below? Let's compare and see which one is cleaner, easier to read and implement. I already know for a fact, beyond all doubt that this looks MUCH better than using flags, let's see how your solution compares.
```python
for league in all_basketball_leagues as league_loop:
    number_of_incomplete_players = 0
    for player in league as player_loop:
        for season in player.seasons as season_loop:
            number_of_incomplete_game_data = 0
            for game_stats in season:
                stats_successfully_processed = process_game_stats(game_stats)
                # key table not in game_stats or another problem
                if not stats_successfully_processed:
                    number_of_incomplete_game_data += 1
                    if number_of_incomplete_game_data > 5:
                        number_of_incomplete_players += 1
                        if number_of_incomplete_players > 60:
                            # too much missing data, league unusable
                            continue league_loop
                        else:
                            # too much missing data, season unusable
                            continue season_loop
                    # if key table missing, skip game
                    else:
                        continue
                # no key tables missing, check game stats
                number_of_missing_values = 0
                for table in game_stats:
                    if not check_table_integrity(table):
                        number_of_missing_values += 1
                        if number_of_missing_values > 5:
                            number_of_incomplete_game_data += 1
                            if number_of_incomplete_game_data > 5:
                                continue season_loop
                            else:
                                continue
                    # table is perfect
                    else:
                        do_stuff()
```
Your logic is incredibly opaque here. Why does incomplete game data sometimes skip the entire league, but as long as you don't get too much all at once, you'll only skip individual seasons? This still feels extremely artificial, but I'll give you the benefit of the doubt there; still, it does seem like there's quite a few utterly unrelated things going on here (for instance, there are two places where you increment number_of_incomplete_game_data but they have different loop-skipping behaviour).
This has a strong smell of standard multi-level validity done weirdly with ad-hoc constraints. If this is the best example you can come up with, I would have to say "welp, sometimes code is messy, deal with it". This code is BEGGING to be refactored in some way, to make the logic clearer; and only you, the one who designed its constraints, can figure out how.
A 50-line loop body and eight levels of indentation (assuming this is inside a function) and this is the good version? Having a multi-break didn't fix that. All the complexity in that code stems from trying to do ad-hoc relational querying with imperative code, at the same time as pre- and post-processing. You should compute all the game stats up front unconditionally, filter out the ones you don't want separately after the fact, and only then "do stuff".
In very broad strokes, this is more like what I would expect that code to look like:
```python
from __future__ import annotations  # League and GameStat are placeholder types

from collections.abc import Iterable, Iterator


def compute_all_stats(leagues: Iterable[League]) -> Iterator[GameStat]:
    # this could probably be improved with itertools
    for league in leagues:
        for player in league:
            for season in player.seasons:
                for game_stats in season:
                    yield process_game_stats(game_stats)


def filter_stats(stats: Iterable[GameStat]) -> list[GameStat]:
    result = []
    # filter out ones you don't want here, maybe even use a real database
    # to help make real queries -- SQLite is built right in to Python
    return result


all_stats = compute_all_stats(all_basketball_leagues)
filtered_stats = filter_stats(all_stats)
for stat in filtered_stats:
    do_stuff(stat)
```
It's fewer lines, much less indentation, and congratulations, now you have some simpler, smaller functions that you can actually unit test in isolation.
Your logic is incredibly opaque here. Why does incomplete game data sometimes skip the entire league, but as long as you don't get too much all at once, you'll only skip individual seasons? This still feels extremely artificial, but I'll give you the benefit of the doubt there; still, it does seem like there's quite a few utterly unrelated things going on here (for instance, there are two places where you increment number_of_incomplete_game_data but they have different loop-skipping behaviour).
Like I very clearly said, it is based on my real-life example. I changed a lot of things because otherwise it would be too complex but in essence that's what my problem is. If there is too much missing data of a certain type, I skip the table, player, season or league accordingly.
"welp, sometimes code is messy, deal with it"
If this were about a completely novel idea that had never been implemented in any language before, sure, I would agree that this could be such a case. However, labeled loops are nothing new. They have been implemented in several prominent, respected and successful programming languages without any adverse effects. If this is the case, it must be so for a reason. Maybe in the end it won't be implemented in Python but the idea is very valid and solves a specific and real problem.
A 50-line loop body and eight levels of indentation (assuming this is inside a function) and this is the good version? Having a multi-break didn't fix that. All the complexity in that code stems from trying to do ad-hoc relational querying with imperative code, at the same time as pre- and post-processing. You should compute all the game stats up front unconditionally, filter out the ones you don't want separately after the fact, and only then "do stuff".
In very broad strokes, this is more like what I would expect that code to look like:
First of all, this is cheating as you did not at all replicate the behavior of the other script, you just changed the algorithm completely. More importantly, processing everything unconditionally is not even close to an acceptable solution in my use case. I am processing a lot of HTML in process_game_stats() and use a good deal of regex. To make matters worse, my files are in a very slow external hard drive in many different folders and so are glacially slow to scan. There are multiple dozen basketball leagues from different countries, some of which have over 5000 players, each of whom has played hundreds of games on average. Some, like LeBron James, have played over 1300. All in all, I am looking at literally millions of scraped HTML pages.
Again, like I said, even if I didn't have a real use case, your proposed solution is also not a solution at all as you simply changed the whole algorithm in order to avoid replicating the behavior that I showed. How would you do the exact thing that I did, meaning control the flow of the program in that exact way using one of the proposed solutions in this thread or another of your own? In other words, how do you solve the loop flow problem without labeled loops?
First of all, this is cheating as you did not at all replicate the behavior of the other script, you just changed the algorithm completely.
A suggestion to change how you are doing things was the point, yes.
More importantly, processing everything unconditionally is not even close to an acceptable solution in my use case
So if you look closely, you will see that compute_all_stats is a generator, which means it evaluates lazily. The filter_stats function kind of presumed that everything could be collected in one big pass. But it could be a generator too (it would need to keep around and build up an appropriate data structure for incremental filtering). The point is to tease things apart into more, smaller, discrete, testable chunks, which can certainly be done, even if you require lazy (generator) evaluation. I would not keep things in a single loop as you have, under any circumstances, because that tangled business logic is unmaintainable.
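To illustrate the laziness point: a generator runs its body only as far as the consumer pulls values, so work on later items never happens if the consumer stops early. The names below (parsed, take_until_bad, expensive_parse) are hypothetical, and the sketch covers only stopping early in a flat stream, not the per-league and per-season skipping discussed in this thread.

```python
from typing import Callable, Iterable, Iterator, List


def parsed(items: Iterable[str], parse: Callable[[str], int]) -> Iterator[int]:
    """Lazily parse items; nothing runs until a value is requested."""
    for item in items:
        yield parse(item)


def take_until_bad(values: Iterable[int], max_bad: int) -> List[int]:
    """Collect non-negative values; stop once more than `max_bad` negatives are seen."""
    kept: List[int] = []
    bad = 0
    for value in values:
        if value < 0:
            bad += 1
            if bad > max_bad:
                break  # the generator is simply never advanced again
            continue
        kept.append(value)
    return kept


calls: List[str] = []


def expensive_parse(text: str) -> int:
    """Stand-in for slow work; records each call so laziness is visible."""
    calls.append(text)
    return int(text)


result = take_until_bad(
    parsed(["1", "-2", "3", "-4", "5", "6"], expensive_parse), max_bad=1
)
print(result, len(calls))  # prints: [1, 3] 4
```

The last two inputs are never parsed, because the consumer stopped asking after the second negative value.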
You seem really obsessed with replicating control flow when all that really matters is that results match, and satisfy any other requirements (e.g. being able to stream process incrementally instead of one batch). To be honest, this whole thread seems like another example of the x-y problem.
But it could be a generator too (it would need to keep around and build up an appropriate data structure for incremental filtering)
I don't see how this would allow me to skip over a very significant portion of the data in the compute_all_stats loops, which is what I need. The whole thing will take days.
You seem really obsessed with replicating control flow when all that really matters is that results match, and satisfy any other requirements (e.g. being able to stream process incrementally instead of one batch). To be honest, this whole thread seems like another example of the x-y problem
This is strictly about flow control. This isn't the first time I have wished for a way to continue or break outer loops and it bothered me having had to use workarounds. It always seemed to me that labeled or numbered breaks felt as pythonic as anything could be and so I thought I would suggest it and see what people would think.