Happy new year, and new PEP.
After several years of long discussion, I wrote up the d-string PEP.
Old threads:
- Str.dedent vs str.removeindent
- D-string vs str.dedent()
- Markdown code-block style syntax for string literals
- Pre-PEP: d-string / Dedented Multiline Strings with Optional Language Hinting
Abstract
This PEP proposes to add a feature that automatically removes indentation from multiline string literals.
Dedented multiline strings use a new prefix “d” (shorthand for “dedent”) before the opening quote of a multiline string literal.
Example (spaces are visualized as _):
def hello_paragraph() -> str:
____return d"""
________<p>
__________Hello, World!
________</p>
____"""
The closing triple quotes control how much indentation is removed. In the above example, the returned string contains three lines:
- "____<p>\n" (four leading spaces)
- "______Hello, World!\n" (six leading spaces)
- "____</p>\n" (four leading spaces)
Motivation
When writing multiline string literals within deeply indented Python code, users are faced with the following choices:
- Accept that the content of the string literal will be left-aligned.
- Use multiple single-line string literals concatenated together instead of a multiline string literal.
- Use textwrap.dedent() to remove indentation.
All of these options have drawbacks in terms of code readability and maintainability.
- Left-aligned multiline strings look awkward and tend to be avoided. In practice, much code, including Python’s own test suite, uses other methods.
- Concatenated single-line string literals are more verbose and harder to maintain.
- textwrap.dedent() is implemented in Python, so it incurs some runtime overhead. This makes it unsuitable for hot paths where performance is critical.
This PEP aims to provide a built-in syntax for dedented multiline strings that is both easy to read and write, while also being efficient at runtime.
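As a rough illustration, the three options available today can be written side by side. This is only a sketch: the function names and the `if True:` blocks (standing in for deeper nesting) are placeholders invented for this example.

```python
import textwrap


def left_aligned() -> str:
    if True:  # placeholder for deeper nesting
        # Option 1: the content breaks out of the surrounding indentation.
        return """\
<p>
  Hello, World!
</p>
"""


def concatenated() -> str:
    if True:
        # Option 2: one literal per line, each with an explicit "\n".
        return (
            "<p>\n"
            "  Hello, World!\n"
            "</p>\n"
        )


def dedented() -> str:
    if True:
        # Option 3: indentation is stripped at runtime on every call.
        return textwrap.dedent("""\
            <p>
              Hello, World!
            </p>
        """)


# All three spellings build the same string.
print(left_aligned() == concatenated() == dedented())  # True
```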
Rationale
The main alternative to this idea is to implement textwrap.dedent() in C and provide it as a str.dedent() method. This idea reduces the runtime overhead of textwrap.dedent(). By making it a built-in method, it also allows for compile-time dedentation when called directly on string literals.
However, this approach has several drawbacks:
- To support cases where users want to include some indentation in the string, the dedent() method would need to accept an argument specifying the amount of indentation to remove. This would be cumbersome and error-prone for users.
- When continuation lines (lines that follow a line ending with a backslash) are used, they cannot be dedented.
- f-strings may interpolate expressions whose values are multiline strings without indentation. In that case, f-string + str.dedent() cannot dedent the whole string.
- t-strings do not create str objects, so they cannot use the str.dedent() method. While adding a dedent() method to string.templatelib.Template is an option, it would lead to inconsistency since t-strings and f-strings are very similar but would have different behaviors regarding dedentation.
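The f-string drawback can be reproduced today with textwrap.dedent(), used here as a stand-in for the proposed str.dedent(). In this sketch, `items` and `render` are hypothetical names chosen for illustration.

```python
import textwrap

# Hypothetical multiline value whose lines carry no indentation.
items = "- apples\n- pears"


def render() -> str:
    # Interpolation happens first, so dedent() sees the unindented
    # "- pears" line and finds no common leading whitespace to remove.
    return textwrap.dedent(f"""\
        Shopping list:
        {items}
    """)


print(repr(render()))
# '        Shopping list:\n        - apples\n- pears\n'
```

Because dedentation runs on the already-formatted result, the indentation of the literal part survives. A syntax that removes indentation from the literal text before interpolation would not have this problem.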
The str.dedent() method can still be useful for non-literal strings, so this PEP does not preclude that idea. However, for ease of use with multiline string literals, providing dedicated syntax is superior.
Specification
Add a new string literal prefix “d” for dedented multiline strings. This prefix can be combined with “f”, “t”, and “r” prefixes.
This prefix is only for multiline string literals. So it can only be used with triple quotes (""" or '''). Using it with single or double quotes (" or ') is a syntax error.
The opening triple quotes must be followed by a newline character. This newline is not included in the resulting string.
The amount of indentation to be removed is determined by the whitespace (' ' or '\t') preceding the closing triple quotes. Mixing spaces and tabs in indentation raises a TabError, similar to Python’s own indentation rules.
The dedentation process removes the determined amount of leading whitespace from each line in the string. Lines that are shorter than the determined indentation become just an empty line (e.g. "\n"). Otherwise, if the line does not start with the determined indentation, Python raises an IndentationError.
Unless combined with the “r” prefix, backslash escapes are processed after removing indentation. So you cannot use the \t escape to create indentation. You can, however, use line continuation (a backslash at the end of a line), and indentation is still removed from the continued line.
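The dedentation rule described above can be emulated in current Python on the raw text between the triple quotes. This is a minimal sketch, not the reference implementation: the function name `dedent_by_closing_line` is hypothetical, backslash-escape processing is not modeled, and it assumes that only whitespace-only lines shorter than the indentation are turned into empty lines.

```python
def dedent_by_closing_line(body: str) -> str:
    """Emulate the proposed rule on the raw text between the triple quotes."""
    if not body.startswith("\n"):
        raise SyntaxError("d-string must start with a newline")
    lines = body[1:].split("\n")
    # Whatever precedes the closing quotes defines the indentation to remove.
    indent = lines.pop()
    if indent.strip(" \t"):
        raise SyntaxError("d-string must end with an indent-only line")
    if " " in indent and "\t" in indent:
        raise TabError("mixing spaces and tabs in indentation")

    out = []
    for line in lines:
        if line.startswith(indent):
            out.append(line[len(indent):])
        elif len(line) < len(indent) and not line.strip(" \t"):
            # Assumption: short whitespace-only lines become empty lines.
            out.append("")
        else:
            raise IndentationError("missing valid indentation")
    return "".join(line + "\n" for line in out)


print(repr(dedent_by_closing_line("\n  Hello\n  World!\n  ")))  # 'Hello\nWorld!\n'
print(repr(dedent_by_closing_line("\n  Hello\n  World!\n ")))   # ' Hello\n World!\n'
```

The exceptions raised here only mirror the error categories named in the text; actual messages may differ.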
Examples:
# Whitespace is shown as _ and TAB is shown as ---> for clarity.
# Error messages are just for explanation. Actual messages may differ.
s = d"" # SyntaxError: d-string must be a multiline string
s = d"""Hello""" # SyntaxError: d-string must be a multiline string
s = d"""Hello
__World!
""" # SyntaxError: d-string must start with a newline
s = d"""
__Hello
__World!""" # SyntaxError: d-string must end with an indent-only line
s = d"""
__Hello
__World!
""" # Zero indentation is removed because closing quotes are not indented.
print(repr(s)) # '__Hello\n__World!\n'
s = d"""
__Hello
__World!
_""" # One space indentation is removed.
print(repr(s)) # '_Hello\n_World!\n'
s = d"""
__Hello
__World!
__""" # Two spaces indentation are removed.
print(repr(s)) # 'Hello\nWorld!\n'
s = d"""
__Hello
__World!
___""" # IndentationError: missing valid indentation
s = d"""
--->Hello
__World!
__""" # IndentationError: missing valid indentation
s = d"""
--->--->__Hello
--->--->__World!
--->--->""" # TAB is allowed as indentation.
# Spaces are just in the string, not indentation to be removed.
print(repr(s)) # '__Hello\n__World!\n'
s = d"""
--->____Hello
--->____World!
--->__""" # TabError: mixing spaces and tabs in indentation
s = d"""
__Hello \
__World!\
__""" # line continuation works as usual
print(repr(s)) # 'Hello_World!'
s = d"""\
__Hello
__World
__""" # SyntaxError: d-string must start with a newline
s = dr"""
__Hello\
__World!\
__""" # d-string can be combined with r-string.
print(repr(s)) # 'Hello\\\nWorld!\\\n'
s = df"""
____Hello, {"world".title()}!
____""" # d-string can be combined with f-string and t-string too.
print(repr(s)) # 'Hello, World!\n'
s = dt"""
____Hello, {"world".title()}!
____"""
print(type(s)) # <class 'string.templatelib.Template'>
print(s.strings) # ('Hello, ', '!\n')
print(s.values) # ('World',)
print(s.interpolations) # (Interpolation('World', '"world".title()', None, ''),)
How to Teach This
In the tutorial, we can introduce d-string with triple quote string literals. Additionally, we can add a note in the textwrap.dedent() documentation, providing a link to the d-string section in the language reference or the relevant part of the tutorial.
Other Languages having Similar Features
Java 15 introduced a feature called Text Blocks. Since Java had not used triple quotes before, they introduced triple quotes for multiline string literals with automatic indent removal.
Julia and Swift also support triple-quoted string literals that automatically remove indentation.
Java and Julia use the least-indented line to determine the amount of indentation to be removed. Swift uses the indentation of the closing triple quotes, similar to this PEP.
This PEP chose the Swift approach because it is simpler and easier to explain.
Reference Implementation
A CPython implementation of PEP 822 is available.
Rejected Ideas
str.dedent() method
As mentioned in the Rationale section, this PEP doesn’t reject the idea of a str.dedent() method. However, d-string is more suitable for multiline string literals because:
- It works well with f-strings and t-strings.
- It can specify the amount of indentation to be removed more easily.
- It can dedent continuation lines.
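The last point can be observed with today's textwrap.dedent(), used here as a stand-in for a str.dedent() method. This is a small sketch; the function name `greeting` is a placeholder.

```python
import textwrap


def greeting() -> str:
    # The backslash joins the two source lines before dedent() runs, so the
    # indentation of the second line ends up in the middle of the string.
    return textwrap.dedent("""\
        Hello \
        World!
    """)


print(repr(greeting()))  # 'Hello         World!\n'
```

The nine spaces between the words are the one typed after "Hello" plus the eight spaces of source indentation on the continued line, which a runtime dedent can no longer tell apart from content.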
triple-backquote
Using triple backquotes (”```”) for dedented multiline strings was considered as an alternative syntax. This notation is familiar from Markdown. While there were past concerns about certain keyboard layouts, nowadays many people are accustomed to typing this notation.
However, this notation conflicts when embedding Python code within Markdown or vice versa. Therefore, considering these drawbacks, increasing the variety of quote characters is not seen as a superior idea compared to adding a prefix to string literals.
__future__ import
Instead of adding a prefix to string literals, the idea of using a __future__ import to change the default behavior of multiline string literals was also considered. This could help simplify Python’s grammar in the future.
But rewriting all existing complex codebases to the new notation may not be straightforward. Until all multiline strings in that source code are rewritten to the new notation, automatic dedentation cannot be utilized.
Until all users can rewrite existing codebases to the new notation, two types of Python syntax will coexist indefinitely. Therefore, many people preferred the new string prefix over the __future__ import.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.