PEP 822: Dedented Multiline String (d-string)

Happy new year, and new PEP.
After several years of long discussion, I wrote up the d-string PEP.

Old threads:

Abstract

This PEP proposes to add a feature that automatically removes indentation from multiline string literals.

Dedented multiline strings use a new prefix “d” (shorthand for “dedent”) before the opening quote of a multiline string literal.

Example (spaces are visualized as _):

    def hello_paragraph() -> str:
    ____return d"""
    ________<p>
    __________Hello, World!
    ________</p>
    ____"""

The closing triple quotes control how much indentation is removed. In the above example, the returned string contains three lines:

  • "____<p>\n" (four leading spaces)
  • "______Hello, World!\n" (six leading spaces)
  • "____</p>\n" (four leading spaces)

Motivation

When writing multiline string literals within deeply indented Python code, users are faced with the following choices:

  • Accept that the content of the string literal will be left-aligned.
  • Use multiple single-line string literals concatenated together instead of a multiline string literal.
  • Use textwrap.dedent() to remove indentation.

All of these options have drawbacks in terms of code readability and maintainability.

  • Left-aligned multiline strings look awkward and tend to be avoided. In practice, many places including Python’s own test code choose other methods.
  • Concatenated single-line string literals are more verbose and harder to maintain.
  • textwrap.dedent() is implemented in Python, so it incurs some runtime overhead. It cannot be used in hot paths where performance is critical.

This PEP aims to provide a built-in syntax for dedented multiline strings that is both easy to read and write, while also being efficient at runtime.

Rationale

The main alternative to this idea is to implement textwrap.dedent() in C and provide it as a str.dedent() method. This idea reduces the runtime overhead of textwrap.dedent(). By making it a built-in method, it also allows for compile-time dedentation when called directly on string literals.

However, this approach has several drawbacks:

  • To support cases where users want to include some indentation in the string, the dedent() method would need to accept an argument specifying the amount of indentation to remove. This would be cumbersome and error-prone for users.
  • When continuation lines (lines that follow a line ending with a backslash) are used, they cannot be dedented.
  • f-strings may interpolate expressions that evaluate to multiline strings without indentation. In such cases, f-string + str.dedent() cannot dedent the whole string.
  • t-strings do not create str objects, so they cannot use the str.dedent() method. While adding a dedent() method to string.templatelib.Template is an option, it would lead to inconsistency since t-strings and f-strings are very similar but would have different behaviors regarding dedentation.

The str.dedent() method can still be useful for non-literal strings, so this PEP does not preclude that idea. However, for ease of use with multiline string literals, providing dedicated syntax is superior.
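
To illustrate the f-string drawback listed above, here is a minimal sketch that uses today's textwrap.dedent() as a stand-in for the hypothetical str.dedent(). The names `body` and `render` are invented for this example. Because dedenting runs after interpolation, a single unindented line in the interpolated value leaves no common margin to strip:

```python
import textwrap

# A multiline value that carries no indentation of its own (made up for the example).
body = "first line\nsecond line"


def render() -> str:
    # dedent() only sees the string after interpolation, so the
    # unindented "second line" reduces the common margin to zero.
    return textwrap.dedent(f"""\
        <p>
          {body}
        </p>
        """)


result = render()
print(repr(result))
# '        <p>\n          first line\nsecond line\n        </p>\n'
```

The eight-space margin survives in the output, which is the behavior the bullet above describes.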

Specification

Add a new string literal prefix “d” for dedented multiline strings. This prefix can be combined with “f”, “t”, and “r” prefixes.

This prefix is only for multiline string literals. So it can only be used with triple quotes (""" or '''). Using it with single or double quotes (" or ') is a syntax error.

The opening triple quotes must be followed by a newline character. This newline is not included in the resulting string.

The amount of indentation to be removed is determined by the whitespace (' ' or '\t') preceding the closing triple quotes. Mixing spaces and tabs in indentation raises a TabError, similar to Python’s own indentation rules.

The dedentation process removes the determined amount of leading whitespace from each line in the string. Lines that are shorter than the determined indentation become just an empty line (e.g. "\n"). Otherwise, if the line does not start with the determined indentation, Python raises an IndentationError.

Unless combined with the “r” prefix, backslash escapes are processed after indentation is removed. So you cannot use \t to create indentation. You can, however, use line continuation (a backslash at the end of a line), and indentation is still removed from the continued line.

Examples:

# Whitespace is shown as _ and TAB is shown as ---> for clarity.
# Error messages are just for explanation. Actual messages may differ.

s = d""  # SyntaxError: d-string must be a multiline string
s = d"""Hello"""  # SyntaxError: d-string must be a multiline string
s = d"""Hello
__World!
"""  # SyntaxError: d-string must start with a newline

s = d"""
__Hello
__World!"""  # SyntaxError: d-string must end with an indent-only line

s = d"""
__Hello
__World!
"""  # Zero indentation is removed because closing quotes are not indented.
print(repr(s))  # '__Hello\n__World!\n'

s = d"""
__Hello
__World!
_"""  # One space indentation is removed.
print(repr(s))  # '_Hello\n_World!\n'

s = d"""
__Hello
__World!
__"""  # Two spaces indentation are removed.
print(repr(s))  # 'Hello\nWorld!\n'

s = d"""
__Hello
__World!
___"""  # IndentationError: missing valid indentation

s = d"""
--->Hello
__World!
__"""  # IndentationError: missing valid indentation

s = d"""
--->--->__Hello
--->--->__World!
--->--->"""  # TAB is allowed as indentation.
             # Spaces are just in the string, not indentation to be removed.
print(repr(s))  # '__Hello\n__World!\n'

s = d"""
--->____Hello
--->____World!
--->__"""  # TabError: mixing spaces and tabs in indentation

s = d"""
__Hello \
__World!\
__"""  # line continuation works as usual
print(repr(s))  # 'Hello_World!'

s = d"""\
__Hello
__World
__"""  # SyntaxError: d-string must start with a newline.

s = dr"""
__Hello\
__World!\
__"""  # d-string can be combined with r-string.
print(repr(s))  # 'Hello\\\nWorld!\\\n'

s = df"""
____Hello, {"world".title()}!
____"""  # d-string can be combined with f-string and t-string too.
print(repr(s))  # 'Hello, World!\n'

s = dt"""
____Hello, {"world".title()}!
____"""
print(type(s))  # <class 'string.templatelib.Template'>
print(s.strings)  # ('Hello, ', '!\n')
print(s.values)  # ('World',)
print(s.interpolations)  # (Interpolation('World', '"world".title()', None, ''),)
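
The rules above can be emulated in current Python with a small function. This is only a sketch under stated assumptions, not the reference implementation: `dedent_body` is a hypothetical helper that takes the raw text between the triple quotes, does not process backslash escapes or interpolation, and assumes that only whitespace-only lines may be shorter than the indentation.

```python
def dedent_body(body: str) -> str:
    """Apply the proposed dedent rule to the raw text between the quotes.

    Hypothetical helper for illustration; escapes and interpolation
    are not handled.
    """
    if not body.startswith("\n"):
        raise SyntaxError("d-string must start with a newline")

    # Skip the newline after the opening quotes. The last line is the one
    # holding the closing quotes, and it defines the indentation to remove.
    lines = body[1:].split("\n")
    indent = lines.pop()
    if indent.strip(" \t"):
        raise SyntaxError("d-string must end with an indent-only line")
    if " " in indent and "\t" in indent:
        raise TabError("mixing spaces and tabs in indentation")

    result = []
    for line in lines:
        if line.startswith(indent):
            result.append(line[len(indent):])
        elif not line.strip(" \t") and len(line) < len(indent):
            # Assumption: a short whitespace-only line becomes an empty line.
            result.append("")
        else:
            raise IndentationError("missing valid indentation")
    return "".join(line + "\n" for line in result)


print(repr(dedent_body("\n  Hello\n  World!\n  ")))  # 'Hello\nWorld!\n'
print(repr(dedent_body("\n  Hello\n  World!\n ")))   # ' Hello\n World!\n'
```

The exception types mirror the ones named in the examples; the actual messages and edge cases may differ.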

How to Teach This

In the tutorial, we can introduce d-string with triple quote string literals. Additionally, we can add a note in the textwrap.dedent() documentation, providing a link to the d-string section in the language reference or the relevant part of the tutorial.

Other Languages having Similar Features

Java 15 introduced a feature called Text Blocks. Since Java had not used triple quotes before, it introduced triple quotes for multiline string literals with automatic indent removal.

Julia and Swift also support triple-quoted string literals that automatically remove indentation.

Java and Julia use the least-indented line to determine the amount of indentation to be removed. Swift uses the indentation of the closing triple quotes, similar to this PEP.

This PEP chose the Swift approach because it is simpler and easier to explain.
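
The difference between the two strategies can be sketched in current Python. Here textwrap.dedent() stands in for the least-indented-line idea (it strips the common leading whitespace), which only approximates the concept and is not a model of Java or Julia. The closing-quote strategy is emulated by hand with no validation, and `raw` is an invented sample equal to the body of the Abstract example:

```python
import textwrap

# Raw text between the quotes: content indented by 8 spaces,
# closing quotes indented by 4 spaces.
raw = "\n        <p>\n          Hello, World!\n        </p>\n    "

# Least-indented strategy: strip the margin common to all content lines.
by_min = textwrap.dedent(raw[1:])

# Closing-quote strategy: strip exactly the indentation of the last line.
lines = raw[1:].split("\n")
indent = lines.pop()
by_close = "".join(line.removeprefix(indent) + "\n" for line in lines)

print(repr(by_min))    # '<p>\n  Hello, World!\n</p>\n'
print(repr(by_close))  # '    <p>\n      Hello, World!\n    </p>\n'
```

With the closing-quote rule the author can keep four spaces of indentation in the result simply by moving the closing quotes.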

Reference Implementation

A CPython implementation of PEP 822 is available.

Rejected Ideas

str.dedent() method

As mentioned in the Rationale section, this PEP doesn’t reject the idea of a str.dedent() method. However, d-string is more suitable for multiline string literals because:

  • It works well with f-strings and t-strings.
  • It can specify the amount of indentation to be removed more easily.
  • It can dedent continuation lines.

triple-backquote

Using triple backquotes (“```”) for dedented multiline strings was considered as an alternative syntax. This notation is familiar to us from Markdown. While there were past concerns about certain keyboard layouts, nowadays many people are accustomed to typing this notation.

However, this notation conflicts when embedding Python code within Markdown or vice versa. Therefore, considering these drawbacks, increasing the variety of quote characters is not seen as a superior idea compared to adding a prefix to string literals.

__future__ import

Instead of adding a prefix to string literals, the idea of using a __future__ import to change the default behavior of multiline string literals was also considered. This could help simplify Python’s grammar in the future.

But rewriting all existing complex codebases to the new notation may not be straightforward. Until all multiline strings in that source code are rewritten to the new notation, automatic dedentation cannot be utilized.

Until all users can rewrite existing codebases to the new notation, two types of Python syntax will coexist indefinitely. Therefore, many people preferred the new string prefix over the __future__ import.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

Like the proposal, however I have one question:

I was wondering why it cannot be combined with “b”. I think dedentation and the bytes data type are orthogonal to each other (from a design perspective, maybe not from the implementation perspective).

So to get:

"<p>\n"
"__Hello, World!\n"
"</p>\n"

I would need to write:

def hello_paragraph() -> str:
____return d"""
____<p>
______Hello, World!
____</p>
"""

?

Thanks for putting together the PEP!

This is one of the key advantages of a d-string versus the status quo, so the specs should go into details of exactly how it works.

Specifically, the question I raised in Pre-PEP: d-string / Dedented Multiline Strings with Optional Language Hinting - #98 by blhsing regarding trailing newlines in evaluations from {...} should be addressed.

Namely, given:

text = d'''
Hello
World!
'''
paragraph = df'''
<p>
__{text}
</p>
'''

where text becomes 'Hello\nWorld!\n', with a trailing newline, should paragraph preserve the trailing newline to become '<p>\n__Hello\n__World!\n\n</p>', or should it automatically remove the trailing newline to avoid a blank line in the output, so it can become a prettier '<p>\n__Hello\n__World!\n</p>'?

If the implicit behavior of automatically removing a trailing newline from a {...} evaluation spooks people, I suggested using a backslash to explicitly avoid an extra newline:

paragraph = df'''
<p>
__{text}\
</p>
'''

But then it will likely make most df-strings ridden with ugly backslashes.

I’m personally more in favor of an implicit behavior of automatic removal of trailing newlines from {...} evaluations to keep the usage clean, but would not mind an explicit solution.

To remove all the leading spaces from <p> and </p> you should align the closing triple quotes with them:

def hello_paragraph() -> str:
____return d"""
____<p>
______Hello, World!
____</p>
____"""

Hmm, I did not add support for byte strings, in the same way that t-strings and f-strings do not support them.

However, I can’t think of a clear reason why d-strings couldn’t support byte strings. From an implementation perspective, it is easier for d-strings to support byte strings compared to t-strings and f-strings.

When writing C or HTML snippets, Unicode strings are usually used, but byte strings are also used in some cases.

I will take some time to consider whether to add support for byte strings.

I don’t like the idea of changing f-string behavior. A d-string should only remove indentation, and it should do so before backslash escapes, f-string, and t-string processing.

In other words, this assertion must pass for any input text:

s1 = df"""
    foo
      {text}
    bar
    """
s2 = f"""\
foo
  {text}
bar
"""
assert s1 == s2

Instead, you can strip trailing newline with regular Python methods:

paragraph = df'''
<p>
__{str(text).rstrip('\n')}, or
__{str(text).removesuffix('\n')}
</p>
'''
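
For reference, here is how this plays out with a plain f-string in current Python, written left-aligned like s2 above. It is only a sketch: the variable names are invented, and the suffix is removed outside the replacement field because a backslash inside an f-string expression is only accepted from Python 3.12 on.

```python
text = "Hello\nWorld!\n"

# The interpolated value is inserted verbatim, trailing newline included.
kept = f"""\
<p>
  {text}
</p>
"""

# Strip the trailing newline explicitly before interpolating.
trimmed = text.removesuffix("\n")
stripped = f"""\
<p>
  {trimmed}
</p>
"""

print(repr(kept))      # '<p>\n  Hello\nWorld!\n\n</p>\n'
print(repr(stripped))  # '<p>\n  Hello\nWorld!\n</p>\n'
```

Note that only the first line of the interpolated value picks up the two-space indentation; the following lines are inserted as they are.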

I like the way multi-line strings are dedented in C# ( Raw string literals - """ - C# reference | Microsoft Learn ). This proposal is similar, with the most significant difference (imo) being the inability to omit the trailing newline (C# doesn’t include the newline before the closing quotes). I wish this were possible with Python’s dedented multi-line strings, though it seems to be at odds with the assertion comparing dedented and non-dedented multi-line strings that you presented above. I tend to agree that it is important for this assertion to hold, but it does seem like an annoying quirk that the string will always end with \n, requiring me to e.g. use .rstrip() on it. Sadly, I can’t think of anything that would allow the assertion to hold and also allow the string to not end with a trailing newline.

You can use a line continuation marker at the end to avoid a trailing newline:
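
A minimal sketch of the mechanism with an ordinary triple-quoted string, since d-strings are not available in released Python. The PEP's own line-continuation example applies the same idea to a d-string:

```python
# The backslash before the closing quotes joins the last content line
# with the closing line, so no trailing newline is produced.
s = """\
Hello
World!\
"""
print(repr(s))  # 'Hello\nWorld!'
```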

I explained in a previous thread why I think it’s better to strip the newline immediately after the opening quotes but leave newline in the last line.

I prefer concatenating multiple single-line strings.

d-strings remind me of the <<EOF (heredoc) syntax in shell scripts.

But that approach is unfriendly to copy-and-paste, and requires an explicit trailing newline character in every single-line string.

And is the similarity to the heredoc syntax a good thing or a bad thing to you, and why?

I acknowledge that the concatenation approach is not ideal for copy-and-pasting.

However, regarding the comparison to heredoc, my primary concern is the visual inconsistency it introduces. Subjectively speaking—if you will pardon the phrasing—I find that syntax somewhat unaesthetic.

Furthermore, I believe d-strings struggle to handle scenarios involving deep indentation. In such cases, block strings often push the content too far to the right, causing line length issues that I prefer to avoid.

In this context, the necessity of explicit newlines in concatenated strings becomes an advantage. It allows me to manually wrap long content to strictly adhere to line-length limits.

if config.is_valid:
    if user.is_authenticated:
        # In deep indentation, implicit concatenation gives me 
        # precise control over line breaks and length.
        query = (
            "SELECT id, username, email, created_at "
            "FROM users "
            "WHERE status = 'active' "
            "AND role = 'admin'"
        )

The whole point of a d-string (as the title indicates) is a multiline string. If your use case is a long single-line string then yes by all means multiple single-line string literals is a great tool for the job, though with line continuation a d-string doesn’t look too bad to me either:

if config.is_valid:
    if user.is_authenticated:
        query = d"""
        SELECT id, username, email, created_at \
        FROM users \
        WHERE status = 'active' \
        AND role = 'admin'
        """

Fortunately, you always have the option of not using the new string syntax.

I hope you don’t intend, with your post, to argue against moving forward with a feature that thousands of people have missed for decades, just because it is another option that doesn’t fit your particular taste.

Thank you @methane for taking this on!

In this example:

def hello_paragraph() -> str:
    ____return d"""
    ________<p>
    __________Hello, World!
    ________</p>
    ____"""

I would have expected this output:

<p>
__Hello, World!
</p>

without leading or trailing whitespace, so "<p>\n__Hello, World!\n</p>"

Inada already explained why there should be a trailing newline.

And since the indentation of the closing triple quotes is arguably the most visually intuitive way of specifying the level of dedentation, the above d-string should dedent by only 4 spaces instead of 8.

I do agree that formatting the content of the string with more indentation than the enclosing quotes follows the current styling recommendations better, though it’s probably a necessary tradeoff if we want the closing triple quotes to control the level of dedentation.

Love the PEP, I think it’s very well argued and thought-through!

I can understand how you arrive at this from the POV of the algorithm that determines the indentation of the trailing triple-quote and deducts that from all the lines, but this restriction seems artificial to me.

Taking the two relevant examples of yours, I don’t see how

s = d"""Hello
__World!
"""
print(repr(s))  # 'Hello\n__World!\n'

s = d"""\
__Hello
__World
__"""
print(repr(s))  # 'Hello\nWorld\n'

would have any ambiguity in their mechanics. Likewise for the following third example that ties both cases together

s = d"""Hello\
__World!\
__"""
print(repr(s))  # 'HelloWorld!'

The way I read this is that the line containing the opening triple quotes simply does not participate in the stripping of indentation. This is IMO still a rule that’s very intuitively explainable[1]. As any feature, it has some potential for suboptimal use, e.g.

s = d"""__Hello
______World!
____"""
print(repr(s))  # '__Hello\n__World!\n'

but I think that’s where we should rely on “consenting adults” being able to make the choices they prefer (and of course popular linters will come up with best practice rules anyway).

To summarise: I think it’s imperative to avoid ambiguity, but I also think it’d be better to avoid restrictions that aren’t strictly necessary for that.


  1. not least because the position of the opening quotes will generally be different from the rest of the multi-line string anyway. ↩︎

This could lead to surprising outcomes if text that appears to be part of a string is not actually part of it. In addition to being confusing, it could be abused to be misleading, possibly even with security risk if the situation was just right.

Can it be a SyntaxError instead?

Perhaps with the exception of empty lines being allowed to still be empty lines. Maybe also allow lines with only indentation characters (spaces/tabs)

def examples():
____d"""
____this should
be an error
____"""

____d"""
____this would be ok:

____this too probably:
__
____"""

I intended to propose the same thing you are suggesting, but my English was inaccurate.
The pointed-out sentence was intended to mean that when the indentation to be deleted is 4 spaces, lines with only 2 spaces would not result in a syntax error but become blank lines. However, it can be read as if any short string is allowed.

I will revise the expression in that paragraph to avoid misunderstanding.
