PEP 822: Dedented Multiline String (d-string)

Happy New Year, and here is a new PEP.
After several years of discussion, I have written up the d-string PEP.

Old threads:

Abstract

This PEP proposes to add a feature that automatically removes indentation from multiline string literals.

Dedented multiline strings use a new prefix “d” (shorthand for “dedent”) before the opening quote of a multiline string literal.

Example (spaces are visualized as _):

    def hello_paragraph() -> str:
    ____return d"""
    ________<p>
    __________Hello, World!
    ________</p>
    ____"""

The closing triple quotes control how much indentation is removed. In the above example, the returned string will contain three lines:

  • "____<p>\n" (four leading spaces)
  • "______Hello, World!\n" (six leading spaces)
  • "____</p>\n" (four leading spaces)
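The closing-quote rule can be emulated in today's Python to see what a d-string would produce. The sketch below uses a hypothetical helper, `d_dedent`, which is not part of the PEP or the standard library. It takes the raw text between the triple quotes as an ordinary string and assumes: the text starts with a newline (which is dropped), the last line holds only whitespace and sets the indentation to remove, every other line starts with exactly that indentation, and a whitespace-only line shorter than the indentation becomes a blank line. Anything else is reported as `ValueError`, standing in for a `SyntaxError`. Escape processing is not emulated.

```python
def d_dedent(body: str) -> str:
    """Hypothetical emulation of d-string dedenting (not the PEP's code).

    `body` is the raw text between the opening and closing triple quotes.
    Backslash escapes are NOT processed here.
    """
    if not body.startswith("\n"):
        raise ValueError("a d-string must start with a newline after the opening quotes")
    lines = body[1:].split("\n")
    indent = lines.pop()  # whatever precedes the closing quotes on their line
    if indent.strip(" \t"):
        raise ValueError("the closing quotes must be on their own line")
    out = []
    for line in lines:
        if line.startswith(indent):
            out.append(line[len(indent):])
        elif not line.strip(" \t"):
            # Assumption: a short whitespace-only line becomes a blank line.
            out.append("")
        else:
            raise ValueError(f"line lacks the required indentation: {line!r}")
    # Every content line keeps its newline, including the last one.
    return "".join(line + "\n" for line in out)


# The PEP's example: closing quotes indented 4 spaces, content 8 or 10.
example = "\n        <p>\n          Hello, World!\n        </p>\n    "
print(repr(d_dedent(example)))  # '    <p>\n      Hello, World!\n    </p>\n'
```

Because the amount removed comes from the closing line rather than from the content, moving the closing quotes to the left keeps more leading whitespace in the result.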

Motivation

When writing multiline string literals within deeply indented Python code, users are faced with the following choices:

  • Accept that the content of the string literal will be left-aligned.
  • Use multiple single-line string literals concatenated together instead of a multiline string literal.
  • Use textwrap.dedent() to remove indentation.

All of these options have drawbacks in terms of code readability and maintainability.

  • Left-aligned multiline strings look awkward and tend to be avoided. In practice, many codebases, including Python's own test code, choose other approaches.
  • Concatenated single-line string literals are more verbose and harder to maintain.
  • textwrap.dedent() is implemented in Python, so it adds runtime overhead on every call, which makes it unsuitable for hot paths where performance is critical.
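For comparison, here is the textwrap.dedent() workaround as it is commonly written today (an illustration, not taken from the PEP). Note that textwrap.dedent() removes the longest common leading whitespace of all lines instead of taking the amount from the closing quotes, so the same layout as the d-string example puts `<p>` at column 0 rather than keeping four leading spaces.

```python
import textwrap


def hello_paragraph() -> str:
    # Status quo: the backslash after the opening quotes suppresses the
    # leading newline, and textwrap.dedent() runs on every call.
    return textwrap.dedent("""\
        <p>
          Hello, World!
        </p>
        """)


# The common margin is 8 spaces here, so all of it is removed.
print(repr(hello_paragraph()))  # '<p>\n  Hello, World!\n</p>\n'
```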

This PEP aims to provide a built-in syntax for dedented multiline strings that is both easy to read and write, while also being efficient at runtime.


I like the proposal; however, I have one question:

I was wondering why it cannot be combined with “b”. I think dedentation and the bytes data type are orthogonal to each other (from a design perspective, maybe not from the implementation perspective).


So to get:

"<p>\n"
"__Hello, World!\n"
"</p>\n"

I would need to write:

def hello_paragraph() -> str:
____return d"""
____<p>
______Hello, World!
____</p>
"""

?


Thanks for putting together the PEP!

This is one of the key advantages of a d-string over the status quo, so the specification should go into detail about exactly how it works.

Specifically, the question I raised in Pre-PEP: d-string / Dedented Multiline Strings with Optional Language Hinting - #98 by blhsing regarding trailing newlines in evaluations from {...} should be addressed.

Namely, given:

text = d'''
Hello
World!
'''
paragraph = df'''
<p>
__{text}
</p>
'''

Here text becomes 'Hello\nWorld!\n', with a trailing newline. Should paragraph preserve that trailing newline and become '<p>\n__Hello\n__World!\n\n</p>', or should the trailing newline be removed automatically to avoid a blank line in the output, giving the prettier '<p>\n__Hello\n__World!\n</p>'?

If the implicit behavior of automatically removing a trailing newline from a {...} evaluation spooks people, I suggested using a backslash to explicitly avoid an extra newline:

paragraph = df'''
<p>
__{text}\
</p>
'''

But then most df-strings will likely be riddled with ugly backslashes.

I’m personally more in favor of an implicit behavior of automatic removal of trailing newlines from {...} evaluations to keep the usage clean, but would not mind an explicit solution.


To remove all the leading spaces from <p> and </p> you should align the closing triple quotes with them:

def hello_paragraph() -> str:
____return d"""
____<p>
______Hello, World!
____</p>
____"""

Hmm, I did not support byte strings because t-strings and f-strings do not support them either.

However, I can’t think of a clear reason why d-strings couldn’t support byte strings. From an implementation perspective, it is easier for d-strings to support byte strings compared to t-strings and f-strings.

When writing C or HTML snippets, Unicode strings are usually used, but byte strings are also used in some cases.

I will take some time to consider whether to add support for byte strings.


I don't like the idea of changing f-string behavior. A d-string should only remove indentation, and it should do so before backslash escapes, f-string fields, and t-string fields are processed.

In other words, this assertion must pass for any value of text:

s1 = df"""
    foo
      {text}
    bar
    """
s2 = f"""\
foo
  {text}
bar
"""
assert s1 == s2

Instead, you can strip the trailing newline with regular string methods:

paragraph = df'''
<p>
__{str(text).rstrip('\n')}, or
__{str(text).removesuffix('\n')}
</p>
'''
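Assuming the s1 == s2 equivalence above holds, the df-string in the question behaves like the plain f-string below, which runs on today's Python (an illustration, not code from the PEP). It also shows that, under that rule, neither form re-indents the second line of the interpolated value, so "World!" stays at column 0.

```python
text = "Hello\nWorld!\n"

# Plain f-string equivalent of the df-string (the s2 form): the trailing
# newline of `text` produces a blank line before </p>.
kept = f"""\
<p>
  {text}
</p>
"""

# Removing the newline first avoids the blank line. The strip happens outside
# the replacement field because backslashes inside f-string expressions are
# only allowed from Python 3.12 (PEP 701) onward.
body = text.rstrip("\n")
stripped = f"""\
<p>
  {body}
</p>
"""

print(repr(kept))      # '<p>\n  Hello\nWorld!\n\n</p>\n'
print(repr(stripped))  # '<p>\n  Hello\nWorld!\n</p>\n'
```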

I like the way multi-line strings are dedented in C# (Raw string literals - """ - C# reference | Microsoft Learn). This proposal is similar, with the most significant difference (IMO) being that there is no way to avoid a trailing newline (C# doesn't include the newline before the closing quotes). I wish this were possible with Python's dedented multi-line strings, though it seems to be at odds with the assertion comparing dedented and non-dedented multi-line strings that you presented above. I tend to agree that this assertion should hold, but it is an annoying quirk that the string will always end with \n, requiring me to, e.g., call .rstrip() on it. Sadly, I can't think of anything that would keep the assertion true while allowing the string not to end with a trailing newline.


You can use a line continuation marker at the end to avoid a trailing newline:
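As an illustration only (the example that followed this reply was not preserved): under the s1 == s2 equivalence stated earlier, a d-string whose last content line ends with a backslash should behave like the plain triple-quoted string below, which runs on current Python. The backslash-newline is a line continuation, so the final newline disappears.

```python
# A backslash before the final newline is a line continuation, so the
# string ends with "bar" instead of "bar" followed by a newline.
s = """\
foo
bar\
"""
print(repr(s))  # 'foo\nbar'
```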


I explained in a previous thread why I think it's better to strip the newline immediately after the opening quotes but keep the newline at the end of the last line.


I prefer concatenating multiple single-line strings.

d-strings remind me of the <<EOF (heredoc) syntax in shell scripts.

But that approach is unfriendly to copy-and-paste, and requires an explicit trailing newline character in every single-line string.

And is the similarity to the heredoc syntax a good thing or a bad thing to you, and why?


I acknowledge that the concatenation approach is not ideal for copy-and-pasting.

However, regarding the comparison to heredoc, my primary concern is the visual inconsistency it introduces. Subjectively speaking, and if you will pardon the phrasing, I find that syntax somewhat unaesthetic.

Furthermore, I believe d-strings struggle to handle scenarios involving deep indentation. In such cases, block strings often push the content too far to the right, causing line length issues that I prefer to avoid.

In this context, the necessity of explicit newlines in concatenated strings becomes an advantage. It allows me to manually wrap long content to strictly adhere to line-length limits.

if config.is_valid:
    if user.is_authenticated:
        # In deep indentation, implicit concatenation gives me 
        # precise control over line breaks and length.
        query = (
            "SELECT id, username, email, created_at "
            "FROM users "
            "WHERE status = 'active' "
            "AND role = 'admin'"
        )

The whole point of a d-string (as the title indicates) is a multiline string. If your use case is a long single-line string, then by all means, multiple single-line string literals are a great tool for the job. That said, with line continuation, a d-string doesn't look too bad to me either:

if config.is_valid:
    if user.is_authenticated:
        query = d"""
        SELECT id, username, email, created_at \
        FROM users \
        WHERE status = 'active' \
        AND role = 'admin'
        """
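The two spellings of the query are not quite identical. Assuming the d-string above dedents to the column-0 string below (per the s1 == s2 rule), the backslashes join the lines, but the last content line still ends with a newline because the closing quotes sit on their own line. This sketch compares both in current Python.

```python
# Implicit concatenation: no trailing newline.
concatenated = (
    "SELECT id, username, email, created_at "
    "FROM users "
    "WHERE status = 'active' "
    "AND role = 'admin'"
)

# Column-0 equivalent of the d-string: each trailing backslash joins a line
# with the next, but the final content line keeps its newline.
continued = """\
SELECT id, username, email, created_at \
FROM users \
WHERE status = 'active' \
AND role = 'admin'
"""

print(continued == concatenated + "\n")  # True
```

For SQL the extra newline is harmless, but for other uses it may need a final backslash or an .rstrip().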

Fortunately, you always have the option of not using the new string syntax.

I hope you don't intend your post as an argument against moving forward with a feature that thousands of people have missed for decades, just because it is another option that doesn't fit your particular taste.


Thank you @methane for taking this on!

In this example:

    def hello_paragraph() -> str:
    ____return d"""
    ________<p>
    __________Hello, World!
    ________</p>
    ____"""

I would have expected this output:

<p>
__Hello, World!
</p>

without leading or trailing whitespace, so "<p>\n__Hello, World!\n</p>"


Inada already explained why there should be a trailing newline.

And since the indentation of the closing triple quotes is arguably the most visually intuitive way of specifying the level of dedentation, the above d-string should dedent by only 4 spaces instead of 8.

I do agree that formatting the content of the string with more indentation than the enclosing quotes follows the current styling recommendations better, though it’s probably a necessary tradeoff if we want the closing triple quotes to control the level of dedentation.

Love the PEP, I think it’s very well argued and thought-through!

I can understand how you arrive at this from the point of view of the algorithm, which determines the indentation of the closing triple quotes and subtracts it from all lines, but this restriction seems artificial to me.

Taking two of your relevant examples, I don't see how

s = d"""Hello
__World!
"""
print(repr(s))  # 'Hello\n__World!\n'

s = d"""\
__Hello
__World!
__"""
print(repr(s))  # 'Hello\nWorld!\n'

would have any ambiguity in their mechanics. The same goes for the following third example, which ties both cases together:

s = d"""Hello\
__World!\
__"""
print(repr(s))  # 'HelloWorld!'

The way I read this is that the line containing the opening triple quotes simply does not participate in the stripping of indentation. This is, IMO, still a rule that is easy to explain intuitively[1]. Like any feature, it has some potential for suboptimal use, e.g.:

s = d"""__Hello
______World!
____"""
print(repr(s))  # '__Hello\n__World!\n'

but I think that’s where we should rely on “consenting adults” being able to make the choices they prefer (and of course popular linters will come up with best practice rules anyway).

To summarise: I think it’s imperative to avoid ambiguity, but I also think it’d be better to avoid restrictions that aren’t strictly necessary for that.


  1. Not least because the position of the opening quotes will generally differ from that of the rest of the multi-line string anyway.
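To check that the proposed rule is unambiguous, it can be emulated in current Python. The helper below, `dedent_first_line_exempt`, is hypothetical: it is neither the PEP's algorithm nor a standard-library function. It takes the raw text between the triple quotes, keeps the opening-quote line verbatim, removes the closing line's indentation from every later line, turns shorter whitespace-only lines into blank lines (an assumption), and then processes only backslash-newline continuations.

```python
def dedent_first_line_exempt(body: str) -> str:
    """Hypothetical sketch of the "opening line does not participate" rule.

    `body` is the raw text between the opening and closing triple quotes.
    Only backslash-newline continuations are processed; other escapes are not.
    """
    first, *rest = body.split("\n")
    if not rest:
        raise ValueError("not a multiline string")
    indent = rest.pop()  # whatever precedes the closing quotes on their line
    if indent.strip(" \t"):
        raise ValueError("the closing quotes must be on their own line")
    middle = []
    for line in rest:
        if line.startswith(indent):
            middle.append(line[len(indent):])
        elif not line.strip(" \t"):
            middle.append("")  # assumption: short blank-ish lines become empty
        else:
            raise ValueError(f"line lacks the required indentation: {line!r}")
    # The opening line is kept as-is; the last content line keeps its newline.
    text = "\n".join([first, *middle]) + "\n"
    # Naive continuation handling: does not special-case an escaped backslash.
    return text.replace("\\\n", "")


print(repr(dedent_first_line_exempt("Hello\\\n  World!\\\n  ")))  # 'HelloWorld!'
```

All four examples above come out as the reply states when fed through this sketch.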


This could lead to surprising outcomes if text that appears to be part of a string is not actually part of it. In addition to being confusing, it could be abused to be misleading, possibly even with security risk if the situation was just right.

Can it be a SyntaxError instead?

Perhaps with an exception allowing empty lines to remain empty lines, and maybe also lines containing only indentation characters (spaces/tabs).

def examples():
____d"""
____this should
be an error
____"""

____d"""
____this would be ok:

____this too probably:
__
____"""

I intended to propose the same thing you are suggesting, but my English was inaccurate.
The sentence you pointed out was intended to mean that, when the indentation to be removed is 4 spaces, a line consisting of only 2 spaces does not cause a syntax error but becomes a blank line. However, it can be read as if any insufficiently indented line is allowed.

I will revise the expression in that paragraph to avoid misunderstanding.
