Repair indentation

Juandev · January 4, 2024, 1:22pm

What are the approaches to repairing indentation? Let’s say I receive code from someone else: do I have to understand the code to be able to repair its indentation? How does the Python parser detect bad indentation?

elis.byberi · January 4, 2024, 1:25pm

Yes, you have to know the purpose of the code.

Stefan2 · January 4, 2024, 1:49pm

The correct way to repair their indentation is to tell them to repair it.

tjreedy · January 4, 2024, 5:03pm

Two kinds of ‘bad’ indentation:

1. Syntactically incorrect: a) an inconsistent number of spaces; b) a bad mix of tabs and spaces; c) a missing required indent. The parser can detect these, along with other syntax errors.
2. Legal but semantically incorrect. The parser cannot detect this; a reader might detect it from the logic, knowing the purpose of the code.
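
As a rough illustration of the first kind, the built-in compile() can be asked to report indentation problems without running anything. This is only a sketch: the helper name indentation_error and the sample strings are made up for this example, and the last sample shows that legal indentation passes regardless of what the author intended.

```python
from typing import Optional


def indentation_error(source: str) -> Optional[str]:
    """Return the name of the indentation exception, or None if the source compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None


# (a) dedent to a width that matches no enclosing block
inconsistent = "if True:\n        x = 1\n    y = 2\n"
# (b) a tab on one line, eight spaces on the next: ambiguous, so rejected
mixed = "if True:\n\tx = 1\n        y = 2\n"
# (c) block header with no indented body
missing = "if True:\nx = 1\n"
# legal indentation; the parser cannot tell whether y = 2 belongs in the block
legal = "if True:\n    x = 1\ny = 2\n"

for label, src in [("inconsistent", inconsistent), ("mixed", mixed),
                   ("missing", missing), ("legal", legal)]:
    print(label, indentation_error(src))
```

The first three samples are rejected and the last one is accepted, which is all the parser can say about it.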

kknechtel · January 4, 2024, 10:18pm

Yes. It’s possible to have the same lines of code, indent them differently, and have a different, legal program that does something else.

The parser can only detect indentation that is against the rules of the language. It detects the problem by… trying to parse the code, and then realizing that the indentation is against the rules. It’s pretty straightforward.
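
To make the "trying to parse" step a little more concrete, here is a small sketch using the standard tokenize module, which turns changes in leading whitespace into INDENT and DEDENT tokens. The helper name indent_tokens and the two sample strings are assumptions for illustration only.

```python
import io
import tokenize


def indent_tokens(source: str) -> list[str]:
    """List the INDENT/DEDENT tokens the tokenizer emits for the source."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            names.append(tokenize.tok_name[tok.type])
    return names


# One indented block: one INDENT going in, one DEDENT coming out.
good = "while x < 10:\n    x += 1\nprint(x)\n"
print(indent_tokens(good))

# The last line dedents to a width that was never opened, so it is rejected.
bad = "while x < 10:\n        x += 1\n    print(x)\n"
try:
    indent_tokens(bad)
except IndentationError as exc:
    print("rejected:", exc.msg)
```

Nothing here looks at what the code does; the check is purely about whether each line's leading whitespace lines up with a block that is currently open.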

sgrey · January 4, 2024, 10:42pm

It depends on what’s wrong with the indentation. If someone screwed up and just aligned all text to the left, then you have to read the code and indent it properly yourself. Or if someone broke loops and conditionals you would need to read the code most of the time.

If you are in a situation where you just have a different setup, like you get a file from someone who uses spaces but you use tabs, or someone added an incorrect whitespace character somewhere, you can try running the reindent script that should come with the Python installation.

Some IDEs and tools can also try to reindent the code for you, so maybe you can try that way. PyCharm should be able to, and VSCode might have a plugin for that or something built in. As long as you don’t have indentation that is totally incorrect semantically, it should work out.
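
For the tabs-versus-spaces case, the mechanical part of the repair can be sketched in a few lines. This is not the reindent script itself, just a minimal stand-in: the function name expand_leading_tabs is hypothetical, and the width of 4 is an assumption about what the original author's editor displayed.

```python
def expand_leading_tabs(source: str, width: int = 4) -> str:
    """Replace tabs in each line's leading whitespace with spaces.

    Assumes one tab was meant to reach the next multiple of `width` columns.
    """
    fixed = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        lead = line[: len(line) - len(body)]
        fixed.append(lead.expandtabs(width) + body)
    return "".join(fixed)


# One line indented with a tab, the next with four spaces: Python rejects this.
mixed = "if True:\n\tx = 1\n    y = 2\n"
repaired = expand_leading_tabs(mixed)

# Raises IndentationError if the repair did not produce compilable code.
compile(repaired, "<repaired>", "exec")
print(repaired)
```

A sketch like this also rewrites leading tabs inside multi-line string literals, and it only gives the right result if the assumed tab width matches what the author saw. The standard library's tabnanny module (python -m tabnanny) can flag ambiguous tab and space mixes before you attempt a conversion.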

fungi · January 4, 2024, 11:02pm

Let’s take a trivial example:

x = 0
while x < 10:
x += 1
return x

That’s (obviously) broken and unparseable. If you understand basic
algorithmic paradigms, then you can assume the author probably meant
this:

x = 0
while x < 10:
    x += 1
return x

That will parse successfully and return a value of 10. However, this
is also valid Python and will return a value of 1:

x = 0
while x < 10:
    x += 1
    return x

They differ only in which lines are indented and by how much, but
which way you “correct” the missing indentation will influence the
behavior of the resulting program.
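
To see the difference in behavior, the two repaired variants can be run side by side. This sketch wraps each one in a function so that the return statement is legal; the function names are made up for the example.

```python
def return_after_loop():
    x = 0
    while x < 10:
        x += 1
    return x  # runs once, after the loop has finished


def return_inside_loop():
    x = 0
    while x < 10:
        x += 1
        return x  # runs during the first iteration


print(return_after_loop())   # 10
print(return_inside_loop())  # 1
```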

I agree with the other responses: how exactly the indentation is
broken will have a significant bearing on whether it’s even possible
to know what the corrected form of your program should be.

hansgeunsmeyer · January 4, 2024, 11:25pm

The reader might also detect it – in rare cases – from the overall code style, for instance the way empty lines are distributed inside functions, or from violations of commonly used patterns or idioms.

I wonder how good current LLMs are at detecting semantically incorrect indentation if given reasonably long, otherwise well-written files as input.

I ran a quick and dirty test. I introduced one semantically incorrect indentation into one of my files, and yes, ChatGPT was able to pinpoint it immediately after I explicitly asked it “Consider the following code. Could there be any lines where semantically the indentation is wrong?” I gave it a 370-line Python script that also contained this code:

    def step(self, *moves: str | P, initial_state: list[str] | None = None) -> list[str]:
        """
        Apply a sequence of moves (permutations) to an initial state.

        If initial_state is None, the moves are applied to the identity permutation.
        Returns the resulting state.
        """
        state = initial_state or self.identity()
        for m in moves:
            move = self.allowed_moves.get(m, m)
        state = move(state)
        return state

Response (partial):

Syntactically, the code appears to be well-indented. However, there is a potential semantic issue in the step method on line 52. Specifically, the comment on line 51 suggests that the loop should be applied to each move, but the indentation of the loop body on line 53 might not align with the intended logic. … It seems like the loop body (state = move(state) ) might be outside the loop due to incorrect indentation.

So, the AIs are getting there. And they exploit our comments/docstrings!
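
For comparison, here is roughly what a purely syntactic tool can say about that method. The sketch below uses a simplified stand-in for step (the two source strings are assumptions, not the real file) and the standard ast module to count the statements inside each for loop.

```python
import ast

# Simplified stand-in for the mis-indented method: the update sits after the loop.
buggy = '''
def step(state, moves):
    for m in moves:
        move = m
    state = move(state)
    return state
'''

# Same lines, with the update moved back inside the loop.
fixed = '''
def step(state, moves):
    for m in moves:
        move = m
        state = move(state)
    return state
'''


def loop_body_sizes(source: str) -> list[int]:
    """Return the number of statements directly inside each for loop."""
    tree = ast.parse(source)
    return [len(node.body) for node in ast.walk(tree) if isinstance(node, ast.For)]


print(loop_body_sizes(buggy))  # [1]
print(loop_body_sizes(fixed))  # [2]
```

Both versions parse without complaint. The tree shows where each statement ended up, but deciding which placement was intended still takes a reader (or a model) that understands the docstring.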

The examples that @fungi gave are ambiguous and would seem to be much more difficult. In principle they are, of course, because there is no further context, but in practice they are less ambiguous than they may appear at first. For instance:

while x < 10:
    x += 1
    return x

is almost certainly not intended. If it is intended, then the code style is very bad. It’s so “bad” that ChatGPT, when asked to comment, automatically corrects this (without at first noticing that it corrects it!) to:

while x < 10:
    x += 1
return x