PEP 822: Dedented Multiline String (d-string)

So, this means in practice that users would write:

def hello_paragraph() -> str:
    ____return (
            d"""
    ________<p>
    __________Hello, World!
    ________</p>
    ____    """.strip()
        )

If/when they want "<p>\n__Hello, World!\n</p>"

If that’s the intention, maybe it’s OK?

I kinda wonder if it’s getting almost as verbose as explicit textwrap.dedent()

Perhaps classification and statistics of existing uses for dedent in the wild could be used to justify the chosen dedent level and no-strip policy vs one-strip/all-strip.


Maybe it’s better to rewrite this to be more explicit:

  • dt, df, drf, drt are allowed (as well as equivalent permutations dt, td, df, fd, drf, dfr, rdf, rfd, fdr, frd, drt, dtr, rdt, rtd, tdr, trd)
  • dtf (and equivalent permutations) are not allowed, as tf is not allowed

And

“The d prefix can only be used with triple quoted string literals and templates, """ and ''', because it’s intended for multi-line strings.”

(The way it’s currently written, it’s a bit vague whether d"""a""" is actually allowed.)


This:

Mixing spaces and tabs in indentation raises a TabError, similar to Python’s own indentation rules.

contradicts:

s = d"""
--->____Hello
--->____World!
--->__"""  # TabError: mixing spaces and tabs in indentation

Under Python’s indentation rules, it’s perfectly fine to combine spaces and tabs. For example:

if True:
--->____print(1)
--->____print(2)
1
2

What’s not allowed is mixing them “in a way that makes the meaning dependent on the worth of a tab in spaces” (ref).
For example:

if True:
--->____print(1)
____--->print(2)
Traceback (most recent call last):
  ...
IndentationError: unindent does not match any outer indentation level

I suggest instead specifying that each line must

  • start with exactly the whitespace string that’s before the closing triple quote (which will be removed), or
  • contain only some beginning of that whitespace string (resulting in a blank line)
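
As a rough illustration, the rule suggested above could be sketched in plain Python as follows. The function name dedent_by_closing_prefix and the convention that raw is the text after the newline following the opening quotes are assumptions made for this example, not anything the PEP defines.

```python
def dedent_by_closing_prefix(raw: str) -> str:
    """Dedent using the whitespace before the closing quotes as the ruler.

    `raw` is assumed to be the literal text after the newline that follows
    the opening triple quote, up to (not including) the closing quotes.
    """
    *lines, prefix = raw.split("\n")
    if prefix.strip(" \t"):
        raise ValueError("closing quotes must be on a line of their own")
    out = []
    for line in lines:
        if line.startswith(prefix):
            # Exactly the closing-line whitespace: remove it.
            out.append(line[len(prefix):])
        elif prefix.startswith(line):
            # Only some beginning of that whitespace: becomes a blank line.
            out.append("")
        else:
            raise ValueError(f"inconsistent indentation: {line!r}")
    # Keep the newline before the closing line, as in the current draft.
    return "".join(line + "\n" for line in out)


# Tab followed by spaces is fine as long as every line uses the same prefix.
print(repr(dedent_by_closing_prefix("\t  Hello\n\t    World!\n\t\n\t  ")))
```

With this formulation no tab width is ever needed: a line either starts with the same characters as the closing line or it is rejected.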

No, you would write

def hello_paragraph() -> str:
    ____return d"""
    ________<p>
    __________Hello, World!
    ________</p>\
    ________"""

if they want "<p>\n__Hello, World!\n</p>"


It seems in Swift, the last \n is also stripped:

let singleLineString = "These are the same."
let multilineString = """
These are the same.
"""

and to get a line break there, you have to include an empty line:

let lineBreaks = """

This string starts with a line break.
It also ends with a line break.

"""

I get that in the Unix world everything should end with a newline, but I wonder whether it isn’t more common in the world of Python to have things not end with a newline. I don’t know the answer, but it might be a good idea to investigate this.


There is no right answer about that. It’s a matter of preference.

To my eye, the Swift example looks like it has an empty line at the end. I prefer the Julia approach.


To support multiline strings without a trailing newline and without using \, the Julia approach looks better.

The dedentation level is determined as the longest common starting sequence of spaces or tabs in all lines, excluding the line following the opening """ and lines containing only spaces or tabs (the line containing the closing """ is always included). Then for all lines, excluding the text following the opening """ , the common starting sequence is removed (including lines containing only spaces and tabs if they start with this sequence),

julia> """
        Hello,
        world."""
"Hello,\nworld."

Although the “longest common starting sequence of … excluding …” rule is not simple to explain,
it is almost the same as textwrap.dedent().
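
To make the comparison concrete, here is a small sketch that feeds the Julia example to textwrap.dedent(). The variable names are only for illustration, and stripping the first newline with removeprefix() is an assumption standing in for the "line following the opening quotes is excluded" part of the rule.

```python
import textwrap

# The Julia example from above, written as the raw contents of a Python string.
raw = "\n        Hello,\n        world."
julia_like = textwrap.dedent(raw).removeprefix("\n")

# Closing quotes on their own, less-indented line: textwrap.dedent() ignores
# whitespace-only lines when computing the margin and blanks them out.
raw_closing = "\n        Hello,\n    "
dedented_closing = textwrap.dedent(raw_closing).removeprefix("\n")

print(repr(julia_like))
print(repr(dedented_closing))
```

The second case is where the two rules differ: the quoted Julia rule always counts the line containing the closing quotes, while textwrap.dedent() treats it as a whitespace-only line and ignores it.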

If many people think that you should be able to write multiline strings without \, """[:-1], """.rstrip(), or """.removesuffix('\n'), let’s make the dedent width determination the same as Julia’s.


I personally think being able to control the remaining indentation by dedenting the trailing """ is more useful than being able to remove the trailing newline without a \.


It’s the shortest indent following a literal newline that has non-whitespace on it, i.e. what’s on the same line as """ doesn’t count. Or:

import re
# text holds the raw contents between the triple quotes
min(re.finditer(r"\n(?P<indent>[ \t]*)\S", text),
    key=lambda match: len(match["indent"]))

The Julia docs summarize it as

triple-quoted strings are also dedented to the level of the least-indented line.

Make it “dedented to the level of the least-indented line with non-whitespace characters” and I think that’s simple enough.
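
A minimal sketch of that one-line rule, assuming the indentation uses only spaces or only tabs so that counting characters is meaningful. The helper name dedent_to_least_indented is made up for this example.

```python
import re


def dedent_to_least_indented(text: str) -> str:
    """Strip the indent of the least-indented line that has non-whitespace."""
    # Indent of every line that follows a newline and has visible content.
    indents = [m["indent"] for m in re.finditer(r"\n(?P<indent>[ \t]*)\S", text)]
    if not indents:
        return text
    width = min(len(indent) for indent in indents)
    # Remove up to `width` leading blanks after each newline; shorter
    # whitespace-only lines simply lose whatever they have.
    return re.sub(r"\n[ \t]{0,%d}" % width, "\n", text)


sample = "\n        <p>\n          Hello\n\n        </p>\n    "
print(repr(dedent_to_least_indented(sample)))
```

Text on the same line as the opening quotes is never touched, because the pattern only looks at whitespace that follows a newline.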


This is a wonderful PEP, thank you for your hard work.

I have to say that I don’t find the trailing ‘\n’ intuitive at all, even after reading more of the posts in this thread (and having read the original thread a while ago).
Obviously this is steeped in personal taste, but I also think it will be easier to compose blocks of text together if the trailing newline is removed.

def hello_paragraph(name):
    """assuming no trailing"""
    return df"""
        <p>
            hello {name}!
        </p>
        """

hello_world = hello_paragraph('world')

body = df"""
<body>
    {hello_world.replace("\n", "\n    ")}
</body>
"""

assert body == "<body>\n    <p>\n        hello world!\n    </p>\n</body>"

print(body)

<body>
    <p>
        hello world!
    </p>
</body>

I think referring to the POSIX line definition is not really relevant to most string processing in Python: "a\nb".splitlines(), for example, gives a list of length 2, and the common "\n".join(...) idiom doesn’t add a trailing “\n”. It’s only at the edge of the program, when writing to a file, that POSIX lines are relevant.
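
For reference, the standard-library behaviour mentioned here can be checked directly. The variable names are only for illustration.

```python
# str.splitlines() treats "\n" as a terminator, so a trailing newline
# does not produce an extra empty element.
without_trailing = "a\nb".splitlines()
with_trailing = "a\nb\n".splitlines()

# str.split("\n") treats "\n" as a separator, so it does.
split_on_newline = "a\nb\n".split("\n")

# The join idiom puts newlines between items only, never at the end.
joined = "\n".join(["a", "b"])

print(without_trailing, with_trailing, split_on_newline, repr(joined))
```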

Even then, writing "a\nb\n" with the trailing newline, then opening the file in an editor (VS Code) shows up as something like

1  a
2  b
3

So to me, and I’m guessing to the majority of newer programmers, the “obvious” way to do this with d-strings is to wrap the whole thing with triple quotes: they “look” almost like braces.

d"""
a
b

"""

I think in general it will be easier to explicitly write trailing newlines when needed than to explicitly remove them when needed.


I think adding supporting format specs for indentation-composition could solve this: if there is a definite, privileged “one obvious way” of making, indenting, and composing text blocks together, that would settle any personal-preference issue.

Even/especially if all the format spec does is some trivial function like .strip().replace()
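
Something in that spirit can already be approximated today with __format__ on a str subclass. This is only a sketch: the Block class and the convention that the format spec is an indent width in spaces are hypothetical, and a regular f-string stands in for df"""...""".

```python
class Block(str):
    """Hypothetical text block whose format spec is an indent width."""

    def __format__(self, spec: str) -> str:
        if not spec:
            return str(self)
        pad = " " * int(spec)
        # The placeholder already sits at the target column, so only the
        # continuation lines need padding.
        return str(self).strip("\n").replace("\n", "\n" + pad)


hello_world_block = Block("<p>\n    hello world!\n</p>\n")
body = f"<body>\n    {hello_world_block:4}\n</body>\n"
print(body, end="")
```

Here the block may or may not carry its own trailing newline; the spec strips it, so composition works either way.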

hello_world_block = ...

# Exactly one of these is the correct way
body = df"""
<body>
{hello_world_block:indentation spec}
OR
{hello_world_block:indentation spec}/
OR
    {hello_world_block:indentation spec}
OR
    {hello_world_block:indentation spec}/
</body>
"""

Or maybe even just a few more examples in the PEP would work? IDK I can’t un-read it and it’s late


I mentioned in the earlier d-strings thread that I thought we should preserve the leading newline, for consistency with other string types.
However, I 100% understand the appeal of stripping it off.

If the leading newline is removed, does it make sense for the trailing one to be removed as well? I think there’s a flavor of consistency there, in that the leading and trailing lines are the delimiters but not part of the content.


I still think there are too many string prefixes, and prefer the str.dedent method idea, acknowledging its limitations. But if we’re doing d-strings, let’s make it as good as possible!


I already suggested a dedicated symbol in the previous thread: no indentation “spec” is necessary, because the reindentation can be inferred from the position of the braces:

body = df"""
<body>
    {->hello_world_block}
</body>
"""

The general opinion about this was that it probably belongs to another PEP.

I think the following consideration is key

But I’d want to make it symmetrical w.r.t. \. If s2 begins with """\ (modulo string qualifiers) then so should s1; both starting with bare """ should match just the same.

I believe the only way to make all of this internally consistent (especially with regular multi-line strings) is to not touch beginning or trailing newlines. Let’s please avoid magic pre-processing that user cannot opt out of, but do the “obvious” thing, with user freedom to opt into stripping (e.g. by adding a backslash or simply beginning with the body right after the opening """).

As I mentioned further up, this is easy to make consistent by excepting the line containing the opening """ from any dedenting. Julia also exempts this line

The trailing """ is slightly harder, because according to the current draft, it determines how much indentation is stripped. But if this is switched to the Julia methodology

then this issue resolves itself as well, and people can then – as for the opening quotes – either use

some_long_variable = d"""
____body\
____"""

or

some_long_variable = d"""
____body"""

to suppress the trailing newline.


I’m +1 on stripping the first and last newline.

I think seeing additional newlines is more obvious in showing intent that there should be more lines than the single backslash doing the opposite. (And the backslash placement is bad on my German keyboard, so the less I need them, the better)

I also think it’s easier to work with when copy pasting text in and out. If I take a paragraph of some Markdown file (e.g. documentation) and copy it into a python multiline string (e.g. docstring) or vice versa, I don’t want to add/remove backslashes every time. I can change indentation easily with (Shift) Tab, can’t say the same for the backslashes.


My 2 cents: I use codedent (JS), which takes a different approach: the end of the string doesn’t matter; only the very first non-empty line dictates the indentation, making it easier to read and reason about (imho).

That is:

def greetings(name):
    return df"""
      <p>
        Hello {name}!
      </p>
    """

greetings('d')

# \n<p>\n  Hello d!\n</p>\n

The ending indentation aligns well with the starting indentation of the string and for a PL where indentation is everything I think it’s a more elegant approach.

On top of that, newlines are preserved (if desired), but they can easily be stripped afterwards, so there is no surrounding \n, yet no data is lost.

The only thing that matters is, again, the very first line, which can have tabs and/or spaces up to the first non-whitespace character.

All other lines that match that ^\s{4} (4 spaces indentation per line) will be sanitized, those that don’t won’t.
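
A rough Python sketch of the first-line rule as described in this post. It is not codedent’s actual implementation; the function name is invented, and blanking whitespace-only lines (such as the one holding the closing quotes) is an assumption made so that the result matches the example above.

```python
def dedent_by_first_line(text: str) -> str:
    """Remove the first non-empty line's indent from every line that has it."""
    lines = text.split("\n")
    first = next((line for line in lines if line.strip()), "")
    indent = first[: len(first) - len(first.lstrip(" \t"))]
    out = []
    for line in lines:
        if line.startswith(indent):
            out.append(line[len(indent):])
        elif not line.strip():
            # Assumption: whitespace-only lines are emptied.
            out.append("")
        else:
            # Lines that do not carry the indent are left alone.
            out.append(line)
    return "\n".join(out)


raw = "\n      <p>\n        Hello d!\n      </p>\n    "
print(repr(dedent_by_first_line(raw)))
```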

Wouldn’t this approach be a simpler improvement over the current ending-ruler proposal?

Edit: moreover, all examples here are tiny chunks of code … I can see myself scrolling to the end of a potentially long query/HTML/Markdown content to eventually fix the issue in case the indentation is off, versus knowing at the very first line where indentation is meant to disappear (before every other line).

You don’t have to remove the trailing newline from Python or Markdown every time.
Most Python/Markdown snippets should have a trailing newline. You need to strip it only in rare cases.

For example, cpython/Lib/test/test_regrtest.py (main branch of python/cpython on GitHub) has 45 dedent() calls for Python snippets. None of them strips the last newline.


None of them strip the first newline either. So I think this test file might not be the best example.

(EDIT: and I don’t know if we need more examples, but I just noticed that the KDL config language strips the last newline in dedented strings.)

I was only arguing against “add/remove backslashes every time”. I never denied that there are use cases for removing the last newline.

Besides the backslash, there are other ways to remove the last newline, such as [:-1], .rstrip(), and .removesuffix('\n').
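
These alternatives are not interchangeable, which a quick check makes visible. The sample strings below are arbitrary.

```python
two_newlines = "body\n\n"
no_newline = "body"

# Slicing drops exactly one character, whatever it is.
sliced = two_newlines[:-1]
sliced_blindly = no_newline[:-1]

# rstrip() with no argument removes all trailing whitespace.
rstripped = two_newlines.rstrip()

# removesuffix() (Python 3.9+) removes at most one trailing "\n".
one_removed = two_newlines.removesuffix("\n")
untouched = no_newline.removesuffix("\n")

print(repr(sliced), repr(sliced_blindly), repr(rstripped),
      repr(one_removed), repr(untouched))
```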

And I am proposing to change the specification to borrow Julia’s rules in order to prevent d-strings from becoming more inconvenient than textwrap.dedent() in use cases where the last newline character needs to be removed.


The primary reason for not including the newline immediately after the opening triple quote in the string is intuitive clarity. As can be seen from the sample code of textwrap.dedent(), where a backslash is written immediately after the opening triple quote to remove the newline, it is very unnatural and difficult to understand when lines within the string to be dedented start after the opening triple quote rather than with an indent.

Instead of thinking that d-strings remove indentation and line breaks from triple-quoted strings, consider that d-strings treat the part after the indentation as the body of the string.
The following image visually explains why the newline immediately after the opening triple quote is ignored, while the newline before the closing triple quote is naturally and clearly included in the string.

Another reason to ignore the newline immediately after the opening triple quote is to leave room for adding something there in the future.
It may allow writing comments starting with # in the future. The previous thread about d-strings had proposed adding a language hint there.
However, to move the discussion forward, we propose the specification of this PEP with room to add those ideas in the future.


I think that’s pretty subjective. I certainly don’t see why dedenting (a horizontal operation) should be removing any newlines, including the initial one.

Beyond intuition however, I’m mostly concerned with consistency. We’re beginning to have quite a few string modifiers (r, f, d, t, b; as well as no modifier, of course). d would be the only one to strip newlines, and the downside of the confusion this will cause IMO far outweighs any other benefit.

This inconsistency is IMO not justified especially because anyone who cares about removing the initial newline can easily do so (under the “don’t strip newlines” model) with little effort

s = d"""\
____foo
______bar
____baz
____"""
# or
s = d"""foo
______bar
____baz
____"""

Besides consistency, we can make the same argument about someone who does want the initial newline, and who – under your model – would be forced to do

# d-strings strip initial newlines but we need one here
s = d"""
____
____foo
______bar
____baz
____"""

You can argue that this case is less common (I agree), but the wart this causes in the code[1] is IMO many times worse. It’s so distinct from regular multi-line string handling that it will almost always need a comment.

So in summary, stripping newlines creates multiple inconsistencies with existing string handling and user expectations, and I see no reason that’s close to compelling enough to give up that consistency.

Comments – like dedenting – should have nothing to do with newlines. Tying them to newlines would lead to yet more magical interactions that people have to spend time learning, rather than doing the obvious.

Don’t get me wrong, I can see the appeal of stripping the initial newline. But then this should be done as a separate PEP for all multi-line strings, not for a single modifier.

“Special cases aren’t special enough to break the rules.”


  1. and in the git history, when changing modifiers
