PEP 822: Dedented Multiline String (d-string)

I don’t think that’s true.
The entire point of d-strings is that they make it easier to work with text blocks inside indented python code by changing the indentation logic, so it’s specifically, deliberately inconsistent with the other string types.

I think the important questions about the indentation are not whether d-strings are consistent, but:

  • How easy it is to read indented blocks inside already indented python code
  • How easy it is to write blocks of text in already indented python code
  • How easy it is to compose together these blocks of text

Your answers to these determine where and how you think d-strings should indent the text.

I think having to put a \ at the start of practically every d-string just invites bugs: there is a 1-in-N chance that someone forgets to add it and doesn’t notice because the code “looks right”. I expect N ≈ 50. Having to remember the \ makes d-strings harder to write, and having to notice its presence or absence makes them harder to read correctly.


Honestly, I almost never see a \ line break in Python code, especially “good” Python code. I personally find it extremely distasteful. The PEP 8 style guide also discourages its use, allowing it only as a “last resort”:

The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.

Backslashes may still be appropriate at times. For example, long, multiple with-statements could not use implicit continuation before Python 3.10, so backslashes were acceptable for that case:

So I think that for most Python programmers, using \ will feel very foreign.

If that is the case, I believe solutions that involve deliberately adding extra indentation will always be more natural, clear, and obvious than solutions that require using \ to remove indentation.

12 Likes

It’s completely unclear what you’re referring to. Please use the quote feature.

What you’re suggesting is to design this feature in complete isolation, ignoring all the ways this will cause permanent friction through the irregularity of the shiny new feature relative to the rest of the language. If Python did that, you’d get a patchwork of barely-fitting pieces, not a coherent language.

Irregularity is a permanent cost, because each inconsistent behaviour decreases the ability of our puny brains to keep all relevant context “in memory”. So it shouldn’t be paid lightly[1].

Well, you’re assuming that it’s a common case that people need to remove an initial newline. I question that it’s common, much less so ubiquitous that it outweighs the other costs.

Composition is another big one. One of the most fundamental strengths of Python, which new users learn basically on day 0, is how powerfully array indexing can be composed, e.g. a[1:N] + a[N:-1] == a[1:-1], without any special-casing around N. Messing with newlines breaks this kind of composability between different kinds of strings (and suddenly the string component that you turned into a d-string gets attached incorrectly to something else, in other places of the code far from the definition site).
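
As a minimal sketch of the slicing identity mentioned above (the sample list, the sample string, and the loop bounds are illustrative assumptions, not from the post), the split point can sit anywhere from 1 to len(a) - 1 without special-casing:

```python
a = list(range(10))

# For any split point n with 1 <= n <= len(a) - 1, the two halves
# recombine into the same slice, with no special case for n.
for n in range(1, len(a)):
    assert a[1:n] + a[n:-1] == a[1:-1]

# The same identity holds for strings, including multi-line ones.
s = "line 1\nline 2\n"
for n in range(1, len(s)):
    assert s[1:n] + s[n:-1] == s[1:-1]
```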

In contrast to the bug you describe (which only affects people who need to remove the initial newline), this will hit a far larger set of users – anyone concatenating multi-line strings – who simply “forgot” (due to the irregularity of the feature) to insert an extra newline when converting something to a d-string.

Sure. That still doesn’t make it a good enough reason to break the regularity. You only have to “pay” the \ if you feel even more strongly about an initial newline[2].


  1. that’s not to say irregularity is never justified, but the benefits need to offset the costs. ↩︎

  2. completely aside from the second variant I showed to avoid the newline that doesn’t need a \. ↩︎

2 Likes

You argue for a consistency between string types that makes the dedent feature difficult to use in common use cases.

The dedent feature should be designed to be usable for the 99.9% of use cases, and consistency is a secondary consideration.

Losing the leading \n seems obviously a very good idea.
Keeping the trailing \n is, I think, the most common use case.

I hope that needing to use \ can be avoided for every use case.
I don’t like to see a trailing \ in my code.

7 Likes

The entire point of d-strings is that they make it easier to work with
text blocks inside indented python code by changing the indentation
logic, so it’s specifically, deliberately inconsistent with the other
string types.

Rather than re-use triple quotes for this kind of string, I would rather
the syntax look completely different. Having the contents of the string
interpreted in radically different ways depending on the presence or
absence of a d prefix is rather too subtle for my taste.

A while ago I came up with this:

string s:
1 Like

In past threads, we also considered the idea of using from __future__ import dedent_string_literals instead of the d-prefix.
If the future import had been chosen there, I would have designed the rules to prioritize maximum compatibility with all multiline string literals.

However, people chose the d-prefix over future import. We do not need to replace all """...""" with d-strings. I designed the rules with priority given to usability in contexts such as textwrap.dedent("""...""").

Therefore, I prioritized eliminating the inconvenience that a \ is almost always required immediately after textwrap.dedent(""", over the inconvenience of needing to add a newline when rewriting """ as d""".

When making technical decisions, many objective facts are taken into consideration, but the final choice of priorities is always subjective. My subjective judgment may not align with yours; however, since I am the author of this PEP, it is ultimately written based on my own subjective decisions.

The presentation of new facts that support your position is welcome, but let us refrain from continuing subjective and abstract discussions about whether the initial newline should be preserved or removed.

9 Likes

I would weigh a self-consistent design (generally speaking, not talking about newlines here) higher than an informal vote, where most people go by what looks/feels better, mostly without considering the full set of consequences. It’s like designing the nuclear powerplant around the bikeshed proposal that people liked the most.

Sure it’s your PEP, you write it how you think it’s best. Despite my enthusiasm for most of the PEP, the inconsistencies turn me from +1 to -0.5.

1 Like

… and r strings are the only ones to interpret backslashes literally and, until t strings came along, b strings were the only strings to not produce str types and f strings were the only strings to be sensitive to the surrounding namespace. In all cases, it’s precisely their inconsistencies that made them useful.

I think I probably would have agreed with you had I not hit all of the issues that the PEP tries to solve. I want to write a multiline string so I intuitively reach for Python’s multiline string feature, using it as I’ve seen countless times in docstrings…

    return f"""
        line 1
        line 2
        {multiline_variable_bit}
        line 4
    """

but by the time I’ve adjusted for all the ways Python very literally interprets the spaces and newline characters between the quotes, it ends up looking something like:

    return """\
line 1
line 2
{}
line 4
""".format(multiline_variable_bit)

It’s consistent behaviour but it’s both surprising and unhelpful. The so-called consistency of regular multiline strings feels more like a pedantic hyper-literalism rather than a feature.
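
For comparison, here is a rough sketch of what the first attempt actually produces today and what the textwrap.dedent workaround looks like. The function names and sample values are made up for illustration; only textwrap.dedent itself is real.

```python
import textwrap


def naive() -> str:
    # Plain triple-quoted string: the leading newline and all source
    # indentation end up in the value.
    return """
        line 1
        line 2
    """


def dedented() -> str:
    # Today's workaround: a backslash to suppress the leading newline,
    # plus textwrap.dedent() to strip the common indentation.
    return textwrap.dedent("""\
        line 1
        line 2
    """)


def dedented_fstring(bit: str) -> str:
    # Interpolation happens before dedent() runs, so a multi-line value
    # with unindented lines leaves no common indentation to strip.
    return textwrap.dedent(f"""\
        line 1
        {bit}
        line 4
    """)


assert naive() == "\n        line 1\n        line 2\n    "
assert dedented() == "line 1\nline 2\n"
assert dedented_fstring("a\nb") == "        line 1\n        a\nb\n        line 4\n"
```

The last case is the reason the second snippet above falls back to flush-left text with .format().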

So yes, it’s inconsistent, we’re well aware that it’s inconsistent, but I’d still argue that it makes it a better, and IMO more intuitive, feature.

13 Likes

The way I see it: d-strings strip various forms of cosmetic whitespace added to make the string more readable in the context of the surrounding code.

They are also very useful for docstrings, so the d prefix is perfect. (Like r-strings are perfect for regular expressions.)

18 Likes

IMO you’re mixing a feature’s primary purpose with its secondary effects. In all cases, the primary purpose was important enough to open up some new aspect (but those PEPs still ensured the syntactic/semantic fallout is minimized).

It still sounds to me like people would want to change multiline string newline handling more generally – which I even tend to agree with – but because it’s hard to change any existing behaviour, it’ll now only be in d-strings, which I think is suboptimal.

Given the responses/hearts, it seems people are willing to pay that price, so I’ll stop making my case. As long as this is a conscious choice rather than confirmation bias due to enthusiasm, it’ll be fine.

2 Likes

I would rather have the leading escape in exchange for more consistent multiline string rules. But as it’s the job of the proposed d-string to manipulate whitespace, I think having different rules here is OK.

The thing which puzzles me is why the trailing newline shouldn’t – for the sake of internally consistent behavior – also be stripped off.

Why should I expect two newlines printed in this example?

print(
    d"""
      A text this short
      Is my full report
    """
)

I don’t quite understand the argument here. Removing the leading newline is intuitive, but the trailing one is not? My intuition is that either the first and last lines are semantic or they are not.

7 Likes

I got confused about what you were saying here at first. There are two lines, so I would expect the string to have two newlines, but what you actually mean is that print will append a newline, so you end up with an extra blank line, i.e. two newlines at the end of the output. The counterpoint would be why this should be missing a line ending:

with open('foo.txt', 'w') as f:
    f.write(
        d"""
          A text this short
          Is my full report
        """
    )

It seems intuitive to me that the two middle lines here should show how it would look in the file if I were to e.g. cat foo.txt. I think print is the oddball there for appending a newline. When working with a multiline string you generally want the string to have its own newlines. Otherwise it’s better to have a list of strings.
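
To make the difference concrete, here is a small sketch. The string value is what I assume the d-string in both examples would evaluate to under the current proposal (leading newline dropped, trailing newline kept); io.StringIO stands in for stdout and for the file.

```python
import io

# Assumed value of the d-string from the two examples above.
report = "A text this short\nIs my full report\n"

printed = io.StringIO()
print(report, file=printed)  # print() appends its own "\n"

written = io.StringIO()
written.write(report)  # write() adds nothing

# print() ends with a blank line; write() ends with exactly one newline.
assert printed.getvalue() == report + "\n"
assert written.getvalue() == report
```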

1 Like

Always stripping the trailing \n would mean that you do not get a double \n\n when printing a d'''string''' or inserting it into another multiline string.

If a trailing \n is required, you would need to leave an empty line before the closing quote.

2 Likes

The trailing newline question is admittedly use-case dependent.

  • When composing a large multi-line string by concatenating bits together, having the trailing newline is good.
        chunks = [d"""
            line 1
            line 2
        """]
        if something:
            chunks.append(d"""
                optional line 3
                optional line 4
            """)
        chunks.append(d"""
            line 5
            line 6
        """)
        return "".join(chunks)
    
  • When composing via f-strings, it gets in the way.
  • When writing a text file, the trailing newline is usually needed.
  • When writing test cases for something that generates multi-line strings, the trailing newline would ideally match whatever’s being tested. [1]
    def test():
        assert tabulate.tabulate([[1, 2], [3, 4]], ["loooooong", "short"]) == d"""
              loooooong    short
            -----------  -------
                      1        2
                      3        4
        """  # In this case, the trailing newline is not wanted
    
  • When composing multi-line console output or error messages, it may or may not get in the way depending on whether whatever’s putting the text on the screen appends a newline. It’s also possible that an extra blank line is desired after a large block of text so that a resultant double newline is accidentally a good thing[2].
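
A quick sketch of the first two bullets, using plain strings as stand-ins for what the d-strings would evaluate to with the trailing newline kept (an assumption about the proposal as discussed here; all names are illustrative):

```python
# Stand-ins for d-string values with the trailing newline kept.
chunk_a = "line 1\nline 2\n"
chunk_b = "line 5\nline 6\n"

# Concatenation: each chunk ends its own last line, so "".join() works.
joined = "".join([chunk_a, chunk_b])

# f-string composition: the template already has a newline after the
# placeholder, so the chunk's own trailing newline adds a blank line.
embedded = f"<pre>\n{chunk_a}\n</pre>"

# Removing the trailing newline first avoids the blank line.
trimmed = chunk_a.removesuffix("\n")  # str.removesuffix needs Python 3.9+
fixed = f"<pre>\n{trimmed}\n</pre>"

assert joined == "line 1\nline 2\nline 5\nline 6\n"
assert embedded == "<pre>\nline 1\nline 2\n\n</pre>"
assert fixed == "<pre>\nline 1\nline 2\n</pre>"
```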

  1. I do this a lot in Java. It’s so satisfying – not a feeling I’m used to experiencing whilst going anywhere near Java. ↩︎

  2. although relying on that behaviour may get confusing quickly ↩︎

6 Likes

Sorry about the ambiguity; yes, I meant the two trailing newlines.

I was mostly responding to the statement that the proposed behavior is “intuitive”. Right or wrong, I have a different intuitive expectation.

As I note above, I don’t think you’re wrong, but I definitely don’t think in terms of cat here. I think in terms of existing multiline Python strings. When I learn that the leading newline is not part of the content, that jumps out at me as a new and special rule. Then I start expecting that the trailing one isn’t either – to me, that would match doing special cleanup on the start of the string.

If the question is reframed away from “intuitive” and towards “what’s ergonomic”, I think the cases where a newline must be explicitly added are easier to handle comfortably than those where it must be explicitly removed. I’m thinking of print(), "\n".join(), inclusion in f-strings, and other scenarios like that.

3 Likes

I personally am more fond of the approach where the last newline is removed from the string as well, so that the syntax allows any string to be represented without having to resort to \ escapes, since their use has been discouraged by the commonly used style guides/formatters (and I don’t like them myself either :smile:). The other option is slicing with [:-1] , but that involves runtime overhead. I suppose Python could technically learn to reduce slice operations on constants to the new constant, though for f-strings, such optimisation could only be applied in some cases. It definitely seems like adding new optimisations would be a way bigger endeavour compared to just designing the syntax such that it doesn’t impose restrictions on what strings can be made with it, though.

What I’d really like to come back to, though, is how \ works for removing the trailing newline:

Is it just me, or does the behaviour here seem a bit surprising? \ means ignored end of line, but then doesn’t it somewhat suggest that it is equivalent to:

s = d"""
__Hello World!"""

…which is not valid syntax under the current proposal? I know that, as far as the tokeniser is concerned, we’re within a string, and so this is considered an escape sequence, not explicit line joining, but it does throw me off that it doesn’t work like a typical line continuation, where I can just join the lines back together and get the equivalent code.
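
For reference, this is how backslash-newline behaves in today’s triple-quoted strings: the two spellings below are equivalent, which is the expectation described above. The variable names are illustrative, and nothing here depends on the PEP.

```python
# Backslash-newline inside a regular string literal is dropped entirely.
continued = """Hello \
World!"""

# Joining the two source lines by hand gives an equivalent literal.
joined = """Hello World!"""

assert continued == joined
```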

This could technically be solved by the earlier suggestion of just allowing the closing """ anywhere, not just on indentation-only lines. However, I personally like having the closing """ on a separate line since it gives more control over how much of the indentation we want to remove (also brought up by Brénainn earlier). I do, however, think that the line continuation (\) should not be allowed in the line that precedes it due to what I’ve shown above as (imo?) an inconsistency with the way line continuation works today. I admit I don’t use that feature of the language at all, so please correct me if there’s actually a different example where you can’t just remove the line continuation and join the lines to get the equivalent code.

Anyway, this leaves me with suggesting this variant (which I’m pretty sure was suggested here, but not sure by whom) as one that I’d consider least restrictive out of the bunch:

assert "Hello World!" == d"""
__Hello \
__World!
__"""
assert "Hello\nWorld!" == d"""
__Hello
__World!
__"""
assert "Hello\nWorld!\n" == d"""
__Hello
__World!
__
__"""
assert "Hello\nWorld!\n" == d"""
__Hello
__World!

__"""
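
This variant can be roughly approximated in today’s Python. The sketch below is only an approximation under stated assumptions: strip_both is a hypothetical helper, not part of the PEP, and textwrap.dedent strips the common indentation rather than using the closing quotes as the reference. It also cannot reproduce the first assert, because a regular string literal processes the backslash-newline before any dedenting happens.

```python
import textwrap


def strip_both(raw: str) -> str:
    """Hypothetical helper: dedent, then drop one newline from each end."""
    text = textwrap.dedent(raw)
    return text.removeprefix("\n").removesuffix("\n")  # Python 3.9+


# No trailing newline unless an extra blank line is written.
two_lines = strip_both("""
    Hello
    World!
    """)

with_trailing_newline = strip_both("""
    Hello
    World!

    """)

assert two_lines == "Hello\nWorld!"
assert with_trailing_newline == "Hello\nWorld!\n"
```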

I strongly agree with Stephen’s sentiment that the “what’s ergonomic” question is the right question to ask here. The syntax should feel familiar, but it doesn’t need to (and based on the discussion so far, it seems to me that some inconsistency is unavoidable) feel entirely consistent with existing strings, and that’s fine.

5 Likes

Maybe we should make this look very different from existing triple-quoted strings by using something other than triple quotes.

s = <<<
    Multi-line indentation-respecting
    string literal contents
    goes here.
    >>>

That’s clearly something else, so it can have its own syntactic rules without causing confusion.

1 Like

There’s something that’s unclear to me in the PEP: what exactly is meant by left-aligned multiline string? Is it something like this?

def some_indented_code():
    my_string = """
<html>
    <p>
        I don’t start from the code’s indentation level
    </p>
</html>
    """.strip()

If not, then what would you call the above, and would it make sense to add it to the possible options in the Motivation section? It’s been my go-to for years for things like HTML or long SQL query literals.

d""" is already something different, just like r"" is something entirely different from "". We already have string modifiers that significantly change the output, from entirely different types (b"" gives bytes, t"" gives Template) to entirely different interpretations.

Here is a real path on my desktop which, when put in quotes as "C:\Users\user\Pictures\78981416.jpeg", is a SyntaxError, while the same path with r"C:\Users\user\Pictures\78981416.jpeg" is completely fine. We already live in a world where different string modifiers are worlds apart in their behavior, and it has proven to be quite intuitive.
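
A small sketch to check this without putting a broken literal in a file: the non-raw version is kept as source text and handed to compile(). The variable names are illustrative.

```python
# Source code for an assignment that uses a normal (non-raw) string.
# Inside it, \U starts a 32-bit Unicode escape, which "sers\use" is not.
normal_source = r'''path = "C:\Users\user\Pictures\78981416.jpeg"'''

try:
    compile(normal_source, "<example>", "exec")
    failed = False
except SyntaxError:
    failed = True

# The raw-string spelling keeps every backslash literally.
raw_path = r"C:\Users\user\Pictures\78981416.jpeg"

assert failed
assert raw_path.count("\\") == 4
```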

3 Likes

I do agree with this, and with that in mind I personally lean toward stripping newlines on both ends because:

  • It is a simpler rule to remember and explain.
  • Anecdotally, I find it simpler and more concise to add newlines than to remove them.
  • Getting in the way of composing with f-strings feels like a pretty big drawback to me, whereas wanting newlines for concatenation or writing text files has simpler[1] workarounds like "\n".join(chunks) and f.write("\n") or f.write(chunk + "\n").

  1. in my opinion ↩︎

3 Likes

I’d argue that this code works just as well with no trailing newline by just using return "\n".join(chunks)
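
A minimal check of that claim, with plain strings standing in for the d-string values under each rule (illustrative names; the values are assumptions about what each variant would produce):

```python
# Chunks as they would look if d-strings stripped the trailing newline.
stripped = ["line 1\nline 2", "line 5\nline 6"]

# Chunks as they would look if d-strings kept the trailing newline.
kept = ["line 1\nline 2\n", "line 5\nline 6\n"]

# The results differ only in the final newline of the whole text.
assert "\n".join(stripped) == "line 1\nline 2\nline 5\nline 6"
assert "\n".join(stripped) + "\n" == "".join(kept)
```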

If you need the extra newline though, then I’d argue that explicitly adding it (with an extra line) is more intuitive than explicitly removing it (with a backslash).

    chunks = [d"""
    line 1
        * line 2

    """]
    if something:
        chunks.append(d"""
        optional line 3
            * optional line 4
        same line preamble of line 5:
        """)
    chunks.append(d"""
    line 5
        * line 6

    """)
    return "".join(chunks)

(Granted, that could be because I spend most of my work life on Windows systems, where the last newline character in source code is actually displayed as an extra line in IDEs, so I’m used to seeing a visual line at the end.)

For another example that may go against expectations:

    letters = d"""
    a
    b
    c
    """.split("\n")

What’s len(letters)? My intuition tells me it should be 3, but with a trailing newline it would actually be 4. So I can’t just chuck it into a for l in letters loop, because I have to deal with the empty string in the last item first.
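
The count can be checked with a plain string standing in for the d-string value (assuming the trailing newline is kept); str.splitlines() is one existing way around the empty last item:

```python
# Stand-in for the d-string value with the trailing newline kept.
letters_text = "a\nb\nc\n"

# split("\n") treats the final newline as a separator before an empty item.
assert letters_text.split("\n") == ["a", "b", "c", ""]

# splitlines() treats newlines as line terminators instead.
assert letters_text.splitlines() == ["a", "b", "c"]
```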

5 Likes