PEP 822: Dedented Multiline String (d-string)

So, this means in practice that users would write:

def hello_paragraph() -> str:
    ____return (
            d"""
    ________<p>
    __________Hello, World!
    ________</p>
    ____    """.strip()
        )

If/when they want "<p>\n__Hello, World!\n</p>"

If that’s the intention, maybe it’s OK?

I kinda wonder if it’s getting almost as verbose as explicit textwrap.dedent()
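For comparison, this is roughly how the same result is written today with `textwrap.dedent()`, so the two can be weighed side by side. It is a sketch only: the `____` markers from the example are written as plain spaces.

```python
import textwrap


def hello_paragraph() -> str:
    # Present-day spelling: dedent the literal, then strip the leading and
    # trailing newlines that the delimiter lines introduce.
    return textwrap.dedent(
        """
        <p>
          Hello, World!
        </p>
        """
    ).strip()


print(repr(hello_paragraph()))  # '<p>\n  Hello, World!\n</p>'
```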

Perhaps classification and statistics of existing uses for dedent in the wild could be used to justify the chosen dedent level and no-strip policy vs one-strip/all-strip.
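As a possible starting point for such a survey, here is a rough sketch that uses the standard `ast` module to count how `dedent(...)` results are post-processed: used bare, sliced, or followed by a method such as `.strip()`. `classify_dedent_calls` is a hypothetical helper, and matching on the name `dedent` alone is a simplifying assumption (it also counts unrelated functions with that name).

```python
import ast
from collections import Counter


def classify_dedent_calls(source: str) -> Counter:
    """Count what wraps each call named `dedent` (bare or `textwrap.dedent`)."""
    tree = ast.parse(source)
    # Map every node to its parent so we can see what consumes the call.
    parents = {}
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            parents[child] = node

    counts = Counter()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "dedent":
            continue
        parent = parents.get(node)
        if isinstance(parent, ast.Attribute):
            counts[f".{parent.attr}"] += 1  # e.g. dedent(...).strip()
        elif isinstance(parent, ast.Subscript) and parent.value is node:
            counts["slice"] += 1  # e.g. dedent(...)[1:]
        else:
            counts["bare"] += 1  # result used as-is
    return counts


sample = "from textwrap import dedent\na = dedent('x').strip()\nb = dedent('y')\nc = dedent('z')[1:]\n"
print(classify_dedent_calls(sample))
```

Running it over a corpus of real projects would be the actual survey; this only shows the classification step.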

Maybe it’s better to rewrite this to be more explicit:

  • dt, df, drf, drt are allowed (as well as equivalent permutations dt, td, df, fd, drf, dfr, rdf, rfd, fdr, frd, drt, dtr, rdt, rtd, tdr, trd)
  • dtf (and equivalent permutations) are not allowed, as tf is not allowed
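The permutation list above can be checked mechanically. A small sketch, assuming lowercase prefixes only (Python also accepts uppercase prefix letters, which this ignores) and leaving out plain `d` and `dr`, which the list does not mention:

```python
from itertools import permutations

# Letter combinations the post lists as valid together with "d".
ALLOWED = ["dt", "df", "drf", "drt"]


def spellings(base: str) -> list[str]:
    """All orderings of the letters in `base`."""
    return sorted("".join(p) for p in permutations(base))


def is_allowed(prefix: str) -> bool:
    """True if `prefix` is a reordering of one of the ALLOWED combinations."""
    return any(sorted(prefix) == sorted(base) for base in ALLOWED)


total = sum(len(spellings(base)) for base in ALLOWED)
print(total)  # 16
print(is_allowed("rfd"), is_allowed("dtf"))  # True False
```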

And

“The d prefix can only be used with triple quoted string literals and templates, """ and ''', because it’s intended for multi-line strings.”

(The way it’s currently written, it’s a bit vague whether d"""a""" is actually allowed.)


This:

Mixing spaces and tabs in indentation raises a TabError, similar to Python’s own indentation rules.

contradicts:

s = d"""
--->____Hello
--->____World!
--->__"""  # TabError: mixing spaces and tabs in indentation

Under Python’s indentation rules, it’s perfectly fine to combine spaces and tabs. For example:

if True:
--->____print(1)
--->____print(2)
1
2

What’s not allowed is mixing them “in a way that makes the meaning dependent on the worth of a tab in spaces” (Python Language Reference, Lexical analysis, “Indentation”).
For example:

if True:
--->____print(1)
____--->print(2)
Traceback (most recent call last):
  ...
IndentationError: unindent does not match any outer indentation level

I suggest instead specifying that each line must

  • start with exactly the whitespace string that’s before the closing triple quote (which will be removed), or
  • contain only some beginning of that whitespace string (resulting in a blank line)
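A minimal sketch of the rule suggested above, as a plain function over the raw text between the quotes. Assumptions: the newline right after the opening quotes is dropped (as in the current draft), the trailing newline is kept, and a mismatch raises `ValueError` where the real feature would presumably raise a compile-time error. `dedent_by_closing_indent` is a hypothetical name.

```python
def dedent_by_closing_indent(body: str) -> str:
    """Hypothetical model of the proposed rule.

    `body` is the raw text between the opening and closing triple quotes.
    """
    if not body.startswith("\n"):
        raise ValueError("expected a newline right after the opening quotes")
    *lines, indent = body[1:].split("\n")
    if indent.strip(" \t"):
        raise ValueError("only spaces/tabs may precede the closing quotes")
    out = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(indent):
            out.append(line[len(indent):])  # the exact indent string is removed
        elif indent.startswith(line):
            out.append("")  # only a beginning of the indent: blank line
        else:
            raise ValueError(f"line {number} does not start with {indent!r}")
    return "".join(line + "\n" for line in out)


# The tab/space example above: every line starts with "\t  ", so it is accepted.
print(repr(dedent_by_closing_indent("\n\t    Hello\n\t    World!\n\t  ")))
# '  Hello\n  World!\n'
```

Because the comparison is on the literal whitespace string, no tab width ever enters the picture.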

No, you would write

def hello_paragraph() -> str:
    ____return d"""
    ________<p>
    __________Hello, World!
    ________</p>\
    ________"""

if they want "<p>\n__Hello, World!\n</p>"


It seems in Swift, the last \n is also stripped:

let singleLineString = "These are the same."
let multilineString = """
These are the same.
"""

and to get a line break there, you have to include an empty line:

let lineBreaks = """

This string starts with a line break.
It also ends with a line break.

"""

I get that in the Unix world everything should end with a newline, but I wonder whether it isn’t more common in the Python world for things not to end with a newline. I don’t know the answer, but it might be worth investigating.


There is no right answer about that. It’s a matter of preference.

To my eye, the Swift example looks as if there is an empty line at the end. I prefer the Julia approach.


To support multiline strings without a trailing newline, and without using a backslash, the Julia approach looks better.

The dedentation level is determined as the longest common starting sequence of spaces or tabs in all lines, excluding the line following the opening """ and lines containing only spaces or tabs (the line containing the closing """ is always included). Then for all lines, excluding the text following the opening """, the common starting sequence is removed (including lines containing only spaces and tabs if they start with this sequence), e.g.:

julia> """
        Hello,
        world."""
"Hello,\nworld."
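A rough Python model of the quoted rule, handy for comparing edge cases with the PEP’s closing-quote rule. It is a sketch transcribed from the documented behaviour, not Julia’s implementation; `julia_dedent` is a hypothetical name and takes the raw text between the quotes.

```python
import os.path


def julia_dedent(body: str) -> str:
    """Model of the Julia dedent rule; `body` is the text between the quotes."""
    first, *rest = body.split("\n")
    if not rest:
        return body  # single-line literal: nothing to dedent
    # Leading spaces/tabs of each line after the opening-quote line.
    # Whitespace-only lines are skipped, except the last line, which holds
    # the closing quotes and is always included.
    indents = []
    for i, line in enumerate(rest):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if indent == line and i != len(rest) - 1:
            continue
        indents.append(indent)
    margin = os.path.commonprefix(indents)
    # Remove the margin from every line after the first, including
    # whitespace-only lines that happen to start with it.
    rest = [line[len(margin):] if line.startswith(margin) else line for line in rest]
    text = "\n".join([first, *rest])
    # A newline directly after the opening quotes is not part of the value.
    return text[1:] if first == "" else text


print(repr(julia_dedent("\n        Hello,\n        world.")))  # 'Hello,\nworld.'
```

Note that a dedented closing line still controls how much indentation survives, and placing the closing quotes on the last content line is what drops the trailing newline.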

Although the “longest common starting sequence of … excluding …” rule is not simple to explain, it is almost the same as textwrap.dedent().

If many people think that you should be able to write multiline strings without \, """[:-1], """.rstrip(), or """.removesuffix('\n'), let’s make the dedent width determination the same as Julia’s.
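The post-processing spellings listed above are not interchangeable. A quick comparison on ordinary strings (plain Python, no d-strings involved):

```python
s = "line one\nline two  \n"  # ends with two spaces, then a newline

print(repr(s[:-1]))                # 'line one\nline two  '
print(repr(s.rstrip()))            # 'line one\nline two'   (also drops the spaces)
print(repr(s.removesuffix("\n")))  # 'line one\nline two  '

t = "no trailing newline"
print(repr(t[:-1]))                # 'no trailing newlin'   (eats a real character)
print(repr(t.removesuffix("\n")))  # 'no trailing newline'  (no-op when absent)

# A backslash at the end of a line inside a triple-quoted literal
# suppresses that newline at compile time.
u = """\
first
second\
"""
print(repr(u))                     # 'first\nsecond'
```

Only `removesuffix('\n')` and the backslash remove exactly one newline and nothing else.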


I personally think being able to control the remaining indentation by dedenting the trailing """ is more useful than being able to remove the trailing newline without a \.


It’s the shortest indent following a literal newline that has non-whitespace on it, i.e. what’s on the same line as """ doesn’t count. Or:

import re
# `text` is the string literal's contents; whitespace-only lines never match.
min(re.finditer(r"\n(?P<indent>[ \t]*)\S", text),
    key=lambda match: len(match["indent"]))

The Julia docs summarize it as

triple-quoted strings are also dedented to the level of the least-indented line.

Make it “dedented to the level of the least-indented line with non-whitespace characters” and I think that’s simple enough.


This is a wonderful PEP, thank you for your hard work.

I have to say that I don’t find the trailing ‘\n’ intuitive at all, even after reading more of the posts in this thread (and having read the original thread a while ago).
Obviously this is steeped in personal taste, but I also think it will be easier to compose blocks of text together if the trailing newline is removed.

def hello_paragraph(name):
    """assuming no trailing"""
    return df"""
        <p>
            hello {name}!
        </p>
        """

hello_world = hello_paragraph('world')

body = df"""
<body>
    {hello_world.replace("\n", "\n    ")}
</body>
"""

assert body == "<body>\n    <p>\n        hello world!\n    </p>\n</body>"

print(body)

<body>
    <p>
        hello world!
    </p>
</body>

I think the POSIX line definition is not really relevant to most string processing in Python: "a\nb".splitlines(), for example, gives a list of length 2, and the common "\n".join(...) idiom doesn’t add a trailing “\n”. POSIX lines only become relevant at the edge of the program, when writing to a file.
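A quick check of the standard `str` behaviour this argument relies on:

```python
print("a\nb".splitlines())          # ['a', 'b']
print("a\nb\n".splitlines())        # ['a', 'b']   (a final newline adds no item)
print("a\nb\n".split("\n"))         # ['a', 'b', '']
print(repr("\n".join(["a", "b"])))  # 'a\nb'       (join adds no trailing newline)
```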

Even then, if you write "a\nb\n" with the trailing newline and open the file in an editor (VS Code), it shows up as something like

1  a
2  b
3

So to me, and I’m guessing to the majority of newer programmers, the “obvious” way to do this with d-strings is to wrap the whole thing in triple quotes: they “look” almost like braces.

d"""
a
b

"""

I think in general it will be easier to explicitly write trailing newlines when needed than having to explicitly remove it when needed.


I think adding supporting format specs for indentation composition could solve this: if there were a definite, privileged “One obvious way” of making, indenting, and composing text blocks, that would settle any personal-preference issue.

Even/especially if all the format spec does is some trivial function like .strip().replace()

hello_world_block = ...

# Exactly one of this is the correct way
body = df"""
<body>
{hello_world_block:indentation spec}
OR
{hello_world_block:indentation spec}/
OR
    {hello_world_block:indentation spec}
OR
    {hello_world_block:indentation spec}/
</body>
"""

Or maybe even just a few more examples in the PEP would work? IDK I can’t un-read it and it’s late
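One way to picture such a spec with today’s f-strings: a hypothetical `Block` subclass of `str` whose `__format__` treats the spec as an indentation width. The spec syntax here is made up for illustration; the post leaves it open.

```python
class Block(str):
    """Hypothetical text block whose format spec is an indentation width."""

    def __format__(self, spec: str) -> str:
        if not spec:
            return str(self)
        pad = " " * int(spec)
        # The first line is already positioned by the surrounding text,
        # so only the lines after each newline get the extra indentation.
        return str(self).replace("\n", "\n" + pad)


hello_world_block = Block("<p>\n    hello world!\n</p>")
body = f"<body>\n    {hello_world_block:4}\n</body>"
print(body)
```

This is the `.replace("\n", "\n    ")` trick from the earlier example, moved behind a format spec so the call site states the indentation once.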


I mentioned in the earlier d-strings thread that I thought we should preserve the leading newline, for consistency with other string types.
However, I 100% understand the appeal of stripping it off.

If the leading newline is removed, does it make sense for the trailing one to be removed as well? I think there’s a flavor of consistency there, in that the leading and trailing lines are the delimiters but not part of the content.
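A sketch of that symmetric reading, where both delimiter lines are dropped, built on `textwrap.dedent`. The function name and the exact rules are assumptions for illustration, not anything the PEP specifies.

```python
import textwrap


def delimited_dedent(body: str) -> str:
    """Hypothetical semantics: the opening- and closing-quote lines are
    delimiters, so neither contributes a newline to the value.

    `body` is the raw text between the triple quotes.
    """
    if body.startswith("\n"):
        body = body[1:]  # drop the newline after the opening quotes
    head, sep, last = body.rpartition("\n")
    if sep and not last.strip(" \t"):
        body = head  # drop the closing-quote line and the newline before it
    return textwrap.dedent(body)


print(repr(delimited_dedent("\n    <p>\n      Hi\n    </p>\n    ")))  # '<p>\n  Hi\n</p>'
print(repr(delimited_dedent("\n    a\n\n    ")))  # 'a\n'
```

Under this reading a trailing newline is requested with an extra blank line, as in the Swift example earlier in the thread.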


I still think there are too many string prefixes, and prefer the str.dedent method idea, acknowledging its limitations. But if we’re doing d-strings, let’s make it as good as possible!


I already suggested a dedicated symbol in the previous thread. No indentation “spec” is necessary; the reindentation can be inferred from the position of the braces:

body = df"""
<body>
    {->hello_world_block}
</body>
"""

The general opinion about this was that it probably belongs to another PEP.
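The `{->...}` syntax is only a proposal, but inferring indentation from the brace position can be approximated today with `str.format`-style templates. A rough sketch with a hypothetical `reindent_format` helper built on `string.Formatter().parse` (simple field names only; format specs and conversions are ignored):

```python
from string import Formatter


def reindent_format(template: str, **values: str) -> str:
    """Substitute `{name}` fields, indenting each value's continuation
    lines to the column where its opening brace sits."""
    out: list[str] = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        out.append(literal)
        if field is None:
            continue
        rendered = "".join(out)
        # Text between the last newline and the opening brace.
        prefix = rendered[rendered.rfind("\n") + 1:]
        # Keep tabs as tabs; any other character becomes a space.
        pad = "".join(ch if ch == "\t" else " " for ch in prefix)
        out.append(str(values[field]).replace("\n", "\n" + pad))
    return "".join(out)


block = "<p>\n    hello world!\n</p>"
print(reindent_format("<body>\n    {block}\n</body>", block=block))
```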

I think the following consideration is key:

But I’d want to make it symmetrical w.r.t. \. If s2 begins with """\ (modulo string qualifiers) then so should s1; both starting with bare """ should match just the same.

I believe the only way to make all of this internally consistent (especially with regular multi-line strings) is to not touch leading or trailing newlines. Let’s please avoid magic pre-processing that users cannot opt out of, and instead do the “obvious” thing, with users free to opt into stripping (e.g. by adding a backslash or simply beginning the body right after the opening """).

As I mentioned further up, this is easy to make consistent by exempting the line containing the opening """ from any dedenting. Julia also exempts this line.

The trailing """ is slightly harder, because according to the current draft, it determines how much indentation is stripped. But if this is switched to the Julia methodology

then this issue resolves itself as well, and people can then – as for the opening quotes – either use

some_long_variable = d"""
____body\
____"""

or

some_long_variable = d"""
____body"""

to suppress the trailing newline.


I’m +1 on stripping the first and last newline.

I think extra blank lines show the intent of having more lines more clearly than a single backslash shows the opposite intent. (And the backslash is awkward to reach on my German keyboard, so the less I need it, the better.)

I also think it’s easier to work with when copy-pasting text in and out. If I take a paragraph of some Markdown file (e.g. documentation) and copy it into a Python multiline string (e.g. a docstring) or vice versa, I don’t want to add or remove backslashes every time. I can change indentation easily with (Shift+)Tab, but I can’t say the same for backslashes.
