Trailing whitespaces in triple-quoted string literals

Trailing whitespaces are invisible in printed code. Some specialized text editors and tools (like git diff in color mode) can highlight trailing whitespaces, but in general they are indistinguishable from empty space after the end of the line.

Some editors remove trailing whitespaces on load or save, or can be configured to do this. It is usually a good thing, unless it changes the semantics of the code.

In Python, trailing whitespaces are ignored except in one case: in triple-quoted string literals. There they are part of the string value, and removing them changes that value.
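A minimal sketch of the effect, assuming nothing beyond the standard library. The source text of each literal is assembled from pieces so that the trailing space stays visible here and cannot be stripped by an editor; the variable names are only illustrative.

```python
import ast

# Source text of two triple-quoted literals that differ only by one
# space before the line break inside the string.
source_with_space = '"""first' + " " + '\nsecond"""'
source_without_space = '"""first' + '\nsecond"""'

with_space = ast.literal_eval(source_with_space)
without_space = ast.literal_eval(source_without_space)

print(repr(with_space))     # 'first \nsecond'
print(repr(without_space))  # 'first\nsecond'
```

The two values differ, so an editor that strips the space silently changes the program.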

For these reasons, trailing spaces are removed in triple-quoted string literals in Java (where they are called text blocks).

Should Python emit a syntax warning for trailing whitespaces in triple-quoted string literals? The warning can be cancelled by adding \n\ at the end of the line.
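To illustrate the `\n\` spelling, here is a small sketch. The `\n` escape supplies the newline, and the backslash at the end of the physical line is a line continuation, so the space is no longer the last character on the line.

```python
# No physical line below ends in whitespace: the space is followed by "\n\".
text = """first \n\
second"""

print(repr(text))  # 'first \nsecond'
```

The value is the same as if the first line ended with a literal trailing space.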

A more general question: should Python emit a syntax warning for invisible, control or ambiguous characters in string literals? This includes CR, TAB and NBSP.

Well, do these trailing whitespaces hurt at all? Is there a security risk for example? If not, why raise a warning?

This should be left to lint utilities IMHO.
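For what such a lint check could look like, here is a rough sketch built on the standard `tokenize` module. The function name is made up for illustration, and it only looks at plain `STRING` tokens, so f-strings on Python 3.12 and later (which are tokenized differently) are not covered.

```python
import io
import tokenize


def lines_with_trailing_whitespace(source: str) -> list[int]:
    """Return line numbers where a string literal has whitespace before a newline."""
    hits = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.STRING:
            continue
        first_line = tok.start[0]
        physical_lines = tok.string.split("\n")
        # The last piece ends with the closing quotes, so it is skipped.
        for offset, line in enumerate(physical_lines[:-1]):
            if line != line.rstrip(" \t"):
                hits.append(first_line + offset)
    return hits


print(lines_with_trailing_whitespace('x = """a \nb"""\n'))  # [1]
```

Single-line literals never match, because only lines that end inside a string are examined.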

It is the same as with any other syntax warning. Is there a security risk in every case? There may be examples for some warnings under specific conditions, but not for all of them and not always. That is not the concern. The rationale is that such code is almost always an error, that there are almost no legitimate use cases for which the warning should be disabled (and the code can be rewritten in another way to avoid the warning if needed), and that such errors occur in code where linters are not used.

The fact is that Python now supports a feature that perhaps should not exist. What are the reasons to support trailing whitespaces in triple-quoted string literals? If it wasn’t supported, how would you argue for the need for such a feature?

This feels like we’re moving away from the “consenting adults” principle. If people want to write triple-quoted strings with trailing whitespace, why shouldn’t we let them? If they want to avoid doing so, linters exist that can let them know.

I don’t think it’s at all clear that this feature should not exist. At best you could say that if we were implementing it now, we might have made different choices.

Simply that there’s no reason to make it impossible to express certain strings in triple-quoted form. It may be uncommon usage, but that doesn’t mean it’s never of value.

I’d put the question the other way - given that it is supported, what’s the justification for potentially breaking existing code? Why should we make it potentially harder for people to upgrade Python, just to do something a linter can already do? People can choose what linter warnings to enable - they can’t choose to suppress this if it’s built into the interpreter (they can rewrite their code to avoid it, but that’s not the same).

Is it? I don’t think syntax warnings usually warn about benign cases like this. The documentation describes SyntaxWarning as “a syntax category for dubious syntactic features”. Examples:

  • invalid escape sequences in a regular expression
  • octal escapes with a value larger than 0o377
  • invalid escape sequences in string and bytes literals

Each of these examples points to a likely mistake by the developer. I do not see how trailing whitespace is a mistake TBH.

>>> b"\777"
<stdin>:1: SyntaxWarning: invalid octal escape sequence '\777'
b'\xff'

Maybe we should add a str.dedent() method which removes them :wink:

More often than not, people don’t want it. They write it wrong and it goes unnoticed until someone runs a new version and sees a weirdly formatted message.

Have you encountered patches and PRs with trailing whitespaces? I reviewed and fixed a lot of them, and I even made this mistake a few times myself. This happens all the time.

One annoying error: pydoc generated output with trailing spaces, and the test contained the expected output as a literal multiline string. Because some developers’ editors removed trailing whitespaces, this string kept breaking until someone replaced the trailing spaces with \x20. But when the output changed, the new output was copied into the tests, and the problem repeated. Finally, pydoc was changed to trim trailing whitespaces. If trailing whitespaces had been forbidden, the original author would have noticed and fixed this, and it would not have annoyed multiple people for years.
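A sketch of the \x20 workaround, using a shortened and purely illustrative fragment of pydoc-style output rather than the real test data. The spaces that would otherwise end the line are written as escapes, so an editor has nothing to strip.

```python
# The third line ends in two spaces, spelled as \x20 escapes.
expected = """\
class A(builtins.object)
 |  Hello and goodbye
 |\x20\x20
 |  Methods defined here:
"""

lines = expected.splitlines()
print(repr(lines[2]))  # ' |  '
```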

This is why I wrote “probably”. In Java they decided to trim trailing whitespaces, and they already had the experience of Python and other programming languages to draw on. So such an option exists, and it may be better than the current Python behavior. I am not insisting that it should change, I am inviting a discussion.

To make it less error-prone. It comes at a price, and I’ve opened a discussion to discuss whether it’s worth the price.

Is it benign? More often than not, it is a mistake by the developer. Even if a multiline string should contain trailing whitespaces, writing them literally causes problems, like in the example above. It is a bug magnet. If trailing whitespaces were added unintentionally, this is a mistake that is easy to make and difficult to notice. If they were added intentionally, it is fragile code that looks misleading.

Either you want an exact comparison, or you want a comparison that ignores trailing spaces.

If you want an exact comparison, then you need to be able to include the expected output, warts and all, and any kind of trimming is a bad idea, it interferes with your ability to express the precise text.

If you want a trailing-whitespace-ignoring comparison, then that’s up to the tool that’s doing the comparison.

Whichever your preference, a syntax warning doesn’t fix it.
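To make the two options concrete, here is a minimal sketch. The helper name and the sample strings are invented for illustration; the point is only that the whitespace-ignoring variant lives in the comparison code, not in the literal.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


actual = "col1  col2  \nrow   \n"
expected = "col1  col2\nrow\n"

# Exact comparison: the trailing spaces make the strings differ.
print(actual == expected)  # False

# Whitespace-ignoring comparison: normalize both sides first.
print(strip_trailing_whitespace(actual) == strip_trailing_whitespace(expected))  # True
```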

Honestly, no, I’ve never encountered this issue. Maybe I’ve been lucky.

I have written a lot of code with embedded SQL statements using triple-quoted strings. Sometimes, I’ve needed a trailing space to separate two parts of a string:

SQL = """select x, y, z from table """
CONDITION = f"""where x == {val}"""

query = SQL
if val:
    query += CONDITION

That example doesn’t need triple-quoted strings, and could easily be written to not need the trailing space, but I no longer have access to the real code I’ve written in the past to confirm if I ever needed trailing whitespace like this in a more realistic example. But it’s certainly not impossible.

To be honest, I don’t really care that much. I’m happy with the status quo, and I would quite likely never hit the proposed SyntaxWarning. If I ever did, I’d be more likely to be mildly irritated and write my code differently to avoid the warning, than to be grateful that Python had saved me from an error. Maybe the trailing space would be needed, maybe it would have been a (harmless) mistake. The net effect would be that I’d be very slightly less happy with Python - not enough to make a difference, just a “papercut” type of annoyance.

TBH, I’m more frustrated that we’re bothering to have this discussion over something so minor. It seems like a waste of everyone’s time.

I won’t add anything more here. I assume that unless there’s a significant response in favour of the proposed change, it won’t happen (status quo wins) so there’s no need for me to make the case for not doing it. The burden is on you to demonstrate that it’s worth doing, and IMO you haven’t done that yet (and you probably won’t unless you get a reasonable level of community support).

Agreed. There seems to have been a lot of interest in changing triple-quoted strings triggered by the str.dedent proposal. I don’t know why - str.dedent seems like a reasonable and simple change, whereas changing triple-quoted strings seems like an over-reaction, and a violation of the “if it ain’t broke, don’t fix it” principle.

I did a scan of my own code. 582 occurrences of triple quoted strings. 249 of those contained trailing whitespace. As far as I can tell, not a single one of those 249 was a bug. There are a couple of cases where removing the trailing spaces would cause regression test failures. And at least one where it would introduce a bug. A very minor one, but still.

Nobody talked about a comparison that ignores trailing spaces. How is it related to this?

I strongly disagree. Editors should not change source code unless directed to do so by the user.

Could you please show some examples? Examples like this:

"""
CLASSES
    builtins.object
        A
        B
        C
    
    class A(builtins.object)
     |  Hello and goodbye
     |  
     |  Methods defined here:
     |  
     |  __init__()
     |      Wow, I have no function!
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  __dict__%s
     |  
     |  __weakref__%s
    
    class B(builtins.object)
"""

Do you see trailing spaces here?

But your example does not have trailing spaces. It has a space followed by a closing quote; that’s not what I was talking about.

I talked about trailing whitespaces of the physical line, like in the example in the previous message (Trailing whitespaces in triple-quoted string literals - #14 by storchaka). Do you see potential issues with this example?

Oh, so you mean the sequence space-newline in a triple-quoted string?

Then yes, I can easily imagine a case where I’d want that - ''.join(multi_line_str.splitlines()) to format a long string that’s split into multiple lines for readability, but which wants to be one line for processing.

It may not be common, nor the only (not even the best, maybe) way of writing that. But it’s legitimate and working code that would be a false positive for this warning.
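As a sketch of that pattern: the string below is written with explicit \n escapes so the spaces before each line break stay visible, but it has the same value as a triple-quoted literal whose first two lines end in a space. The sample text is arbitrary.

```python
# Equivalent to a triple-quoted literal with a trailing space on lines 1 and 2.
multi_line_str = "The quick brown fox \njumps over \nthe lazy dog"

one_line = "".join(multi_line_str.splitlines())
print(one_line)  # The quick brown fox jumps over the lazy dog
```

Without the trailing spaces, the joined result would run the words together ("foxjumps").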

I could see them the same way I’d see trailing spaces in any other block of text, by highlighting the text and looking for highlighted spaces.

But more to the point, why would it matter that there were trailing spaces? Any possible problem that I can imagine being caused by those trailing spaces, I’d classify as a bug in the code consuming the block of text. After all, such code could just as easily be consuming text that came from some_path.read_text(), which could contain trailing whitespace on lines.

Somewhat germane to the discussion is that it is occasionally annoying that editors strip trailing whitespace. If it helps anyone here, the following is what I do for test assertions in order to retain multi-line strings: https://github.com/pypa/hatch/blob/9d62c6e34233e37d2932cbc21affe56dee86e814/tests/cli/config/test_set.py#L104

No.

No.

While I would love auto-dedenting in doc strings, normal triple-quoted strings should not be changed.

No. For example, I have a test which compares the output of a reStructuredText table (generated by a third-party library), with the expected result stored in a triple-quoted string; it has trailing spaces.

If they were stripped from the string, the actual result would no longer match the expected result.

It will happen less often in the CPython codebase now we’ve added the trailing-whitespace lint/fix.

No. That is a job for editors and linters.

Also, you cannot know in advance what users actually intend to have in their strings, so you can’t just issue a blanket rejection. This would almost certainly cause some pain for existing code in the wild that currently works just fine.

FWIW, trailing CR can be legitimate. We use them in templates when building separate sections that end up being concatenated. There are also use cases for strings with trailing whitespace when we later append additional text on the line.

Lastly, this all feels judgmental, paternalistic, and unnecessary. It seems like an invented problem, not an actual problem.
