Trailing whitespaces in triple-quoted string literals

Maybe we should add a str.dedent() method which removes them :wink:

More often than not, people don’t want it. They write it wrong and it goes unnoticed until someone runs a new version and sees a weirdly formatted message.

Have you encountered patches and PRs with trailing whitespaces? I have reviewed and fixed a lot of them, and I have even made this mistake a few times myself. This happens all the time.

One annoying error: pydoc generated output with trailing spaces, and the test contained the expected output as a literal multiline string. Because some developers’ editors removed trailing whitespaces, this string kept breaking until someone replaced the trailing spaces with \x20. But when the output changed, the new output was copied into the tests, and the problem repeated. Finally, pydoc was changed to trim trailing whitespaces. If trailing whitespaces had been forbidden, the original author would have noticed and fixed this, and it would not have annoyed multiple people for years.
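For illustration, here is a minimal sketch of the \x20 workaround described above. The expected text is invented for this example and is not the actual pydoc test; the point is only that an escaped space survives editors that strip whitespace at the end of physical lines.

```python
# Hypothetical expected output: the second line must end in two spaces.
# Writing them as \x20 keeps them visible in the source and safe from
# editors that strip trailing whitespace on save.
expected = """\
 |  Methods defined here:
 |\x20\x20
 |  __init__()
"""

lines = expected.splitlines()
print(repr(lines[1]))  # the escapes become ordinary spaces at runtime
```

The fragility mentioned in the post remains: if fresh output is pasted over the literal, the escapes are lost again.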

This is why I wrote “probably”. In Java they decided to trim trailing whitespaces, and they already had the experience of Python and other programming languages to draw on. So such an option exists, and it may be better than the current Python behavior. I am not insisting that it should change; I am inviting a discussion.

To make it less error-prone. It comes at a price, and I’ve opened this thread to discuss whether it’s worth the price.

Is it benign? More often than not, it is a mistake by the developer. Even if a multiline string should contain trailing whitespaces, writing them literally causes problems, like in the example above. It is a bug magnet. If trailing whitespaces were added unintentionally, this is a mistake that is easy to make and difficult to notice. If they were added intentionally, it is fragile code that looks misleading.

Either you want an exact comparison, or you want a comparison that ignores trailing spaces.

If you want an exact comparison, then you need to be able to include the expected output, warts and all, and any kind of trimming is a bad idea because it interferes with your ability to express the precise text.

If you want a trailing-whitespace-ignoring comparison, then that’s up to the tool that’s doing the comparison.

Whichever your preference, a syntax warning doesn’t fix it.

Honestly, no, I’ve never encountered this issue. Maybe I’ve been lucky.

I have written a lot of code with embedded SQL statements using triple-quoted strings. Sometimes, I’ve needed a trailing space to separate two parts of a string:

SQL = """select x, y, z from table """
CONDITION = f"""where x == {val}"""

query = SQL
if val:
    query += CONDITION

That example doesn’t need triple-quoted strings, and could easily be written to not need the trailing space, but I no longer have access to the real code I’ve written in the past to confirm if I ever needed trailing whitespace like this in a more realistic example. But it’s certainly not impossible.

To be honest, I don’t really care that much. I’m happy with the status quo, and I would quite likely never hit the proposed SyntaxWarning. If I ever did, I’d be more likely to be mildly irritated and write my code differently to avoid the warning, than to be grateful that Python had saved me from an error. Maybe the trailing space would be needed, maybe it would have been a (harmless) mistake. The net effect would be that I’d be very slightly less happy with Python - not enough to make a difference, just a “papercut” type of annoyance.

TBH, I’m more frustrated that we’re bothering to have this discussion over something so minor. It seems like a waste of everyone’s time.

I won’t add anything more here. I assume that unless there’s a significant response in favour of the proposed change, it won’t happen (status quo wins) so there’s no need for me to make the case for not doing it. The burden is on you to demonstrate that it’s worth doing, and IMO you haven’t done that yet (and you probably won’t unless you get a reasonable level of community support).

Agreed. There seems to have been a lot of interest in changing triple-quoted strings triggered by the str.dedent proposal. I don’t know why - str.dedent seems like a reasonable and simple change, whereas changing triple-quoted strings seems like an over-reaction, and a violation of the “if it ain’t broke, don’t fix it” principle.

I did a scan of my own code. 582 occurrences of triple quoted strings. 249 of those contained trailing whitespace. As far as I can tell, not a single one of those 249 was a bug. There are a couple of cases where removing the trailing spaces would cause regression test failures. And at least one where it would introduce a bug. A very minor one, but still.
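A scan along these lines can be sketched with the standard `tokenize` module. This is not the script used for the numbers above; the function name and the exact definition of "trailing whitespace" (a space or tab directly before a line break inside the literal) are assumptions. In particular, it does not count indentation before the closing quotes, and on Python 3.12+ triple-quoted f-strings are tokenized differently and would be missed.

```python
import io
import tokenize


def strings_with_trailing_whitespace(source: str) -> list[int]:
    """Return the start lines of triple-quoted string literals that have
    a space or tab directly before a line break inside the literal."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Drop any string prefix (r, b, u, f) before checking the quotes.
        literal = tok.string.lstrip("rRbBuUfF")
        if not literal.startswith(('"""', "'''")):
            continue
        # Every piece except the last one is followed by a line break.
        for line in tok.string.split("\n")[:-1]:
            if line.endswith((" ", "\t")):
                hits.append(tok.start[0])
                break
    return hits


# The first literal has a space before its first line break; the second does not.
sample = 'a = """x \ny\n"""\nb = """p\nq\n"""\n'
print(strings_with_trailing_whitespace(sample))  # prints [1]
```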

Nobody talked about a comparison that ignores trailing spaces. How is it related to this?

I strongly disagree. Editors should not change source code unless directed to do so by the user.

Could you please show some examples? Examples like this:

"""
CLASSES
    builtins.object
        A
        B
        C
    
    class A(builtins.object)
     |  Hello and goodbye
     |  
     |  Methods defined here:
     |  
     |  __init__()
     |      Wow, I have no function!
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  __dict__%s
     |  
     |  __weakref__%s
    
    class B(builtins.object)
"""

Do you see trailing spaces here?

But your example does not have trailing spaces. It has a space followed by a closing quote, and that’s not what I was talking about.

I talked about trailing whitespaces of the physical line, like in the example in the previous message (Trailing whitespaces in triple-quoted string literals - #14 by storchaka). Do you see potential issues with this example?

Oh, so you mean the sequence space-newline in a triple-quoted string?

Then yes, I can easily imagine a case where I’d want that - ''.join(multi_line_str.splitlines()) to format a long string that’s split into multiple lines for readability, but which wants to be one line for processing.

It may not be common, nor the only (not even the best, maybe) way of writing that. But it’s legitimate and working code that would be a false positive for this warning.
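A small sketch of that pattern, with invented text. The trailing spaces are written as \x20 here only so they stay visible; in the scenario described they would be typed as literal spaces at the end of each line.

```python
# A long message split across lines for readability. Each line except the
# last ends in a space that separates the words once the lines are joined.
multi_line_str = """\
one long message\x20
split across lines\x20
for readability
"""

joined = "".join(multi_line_str.splitlines())
print(joined)    # one long message split across lines for readability

# If the trailing spaces were stripped, the words would run together.
stripped = "".join(line.rstrip() for line in multi_line_str.splitlines())
print(stripped)  # one long messagesplit across linesfor readability
```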

I could see them the same way I’d see trailing spaces in any other block of text, by highlighting the text and looking for highlighted spaces.

But more to the point, why would it matter that there were trailing spaces? Any possible problem that I can imagine being caused by those trailing spaces, I’d classify as a bug in the code consuming the block of text. After all, such code could just as easily be consuming text that came from some_path.read_text(), which could contain trailing whitespace on lines.

Somewhat germane to the discussion is that it is occasionally annoying that editors strip trailing whitespace. If it helps anyone here, the following is what I do for test assertions in order to retain multi-line strings: https://github.com/pypa/hatch/blob/9d62c6e34233e37d2932cbc21affe56dee86e814/tests/cli/config/test_set.py#L104

No.

No.

While I would love auto-dedenting in doc strings, normal triple-quoted strings should not be changed.

No. For example, I have a test which compares the output of a reStructuredText table (generated by a third-party library), with the expected result stored in a triple-quoted string; it has trailing spaces.

If they were stripped from the string, the actual result would no longer match the expected result.

It will happen less often in the CPython codebase now we’ve added the trailing-whitespace lint/fix.

No. That is a job for editors and linters.

Also, you cannot know in advance what users actually intend to have in their strings, so you can’t just issue a blanket rejection. This would almost certainly cause some pain for existing code in the wild that currently works just fine.

FWIW, trailing CR can be legitimate. We use them in templates when building separate sections that end up being concatenated. There are also use cases for strings with trailing whitespace when we later append additional text on the line.

Lastly, this all feels judgmental, paternalistic, and unnecessary. It seems like an invented problem, not an actual problem.

FWIW, Emacs has global-whitespace-mode which I always enable, and which highlights trailing whitespace. I have a personal convenience function that strips TWS from every line in a file, but it’s something I have to explicitly do in my editor, which feels like the right responsibility. And it does occasionally go wrong, such as cases where you are comparing big blocks of text and the trailing whitespace is significant [1].

ObWSJ: You can pry those ^L’s from my Emacs RSI’d fingers! :stuck_out_tongue_winking_eye:


  1. As I remember, in some Mailman and/or email package tests.

My vote is also “no”. I don’t think Python itself should emit a syntax warning, but I would expect linters or IDEs to warn users about this.

In your pydoc story, you seemed to be bothered that a test failed because of trailing whitespace differences. If the test was changed to compare in a way that ignores trailing spaces, then that would fix the test.

If you don’t want that, because you do care about what specific trailing whitespace there is, then you need your notation for the expected value to be capable of describing the trailing whitespace.
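The two options can be sketched as follows. The helper name `strip_trailing` and the sample text are made up for illustration; this is not how the pydoc tests are written.

```python
def strip_trailing(text: str) -> str:
    """Remove trailing whitespace from every line of text."""
    return "\n".join(line.rstrip() for line in text.splitlines())


# Hypothetical tool output: the third line consists of four spaces.
actual = "CLASSES\n    builtins.object\n    \n    class A\n"

# Expected text as it looks after an editor has stripped trailing spaces.
expected = """\
CLASSES
    builtins.object

    class A
"""

# An exact comparison fails because of the whitespace-only line...
assert actual != expected
# ...while a comparison that ignores trailing whitespace passes.
assert strip_trailing(actual) == strip_trailing(expected)
```

For the exact comparison, the expected value has to spell out the trailing spaces in a form that survives editing, for example with \x20 escapes.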

Oh, there’s nothing really interesting. The majority is last-line indentation. Most of the rest are just editing remnants of no consequence. Still, that’s 249 potential SyntaxWarnings to deal with.

:-1: from me, too. As a maintainer of a lot of legacy code, I just wouldn’t want to go through everything and decide where it’s wanted and where it’s just a minor cosmetic irritant for the ~2% of people who might actually notice.

About stripping common indentation in the lines of triple quoted blocks, now… :smirk: