PEP 822: Dedented Multiline String (d-string)

In the column indicating the Trailing Marker, is \n""" a shorthand for the regex \s*\n"""? Or do those languages really require the closing """ to be at the start of a line?

Sorry, I meant \n\s*""" of course.

Swift, C# and PHP all allow the extra \s* in the trailing marker and use it for dedenting.

(And for completeness: Swift and PHP don’t allow extra \s* in the opening marker; C# does but appears to ignore it completely.)
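To make the two readings of the trailing marker concrete, here is a small sketch using Python's re module. The pattern names and sample strings are made up for illustration, and [ \t]* stands in for the \s* written above so that the pattern does not also swallow extra newlines.

```python
import re

# Strict reading: the closing quotes must sit at the very start of a line.
STRICT_END = re.compile(r'\n"""$')
# Lenient reading: horizontal whitespace may precede the closing quotes.
LENIENT_END = re.compile(r'\n[ \t]*"""$')

flush_left = 'body\n"""'
indented = 'body\n    """'

# Check both sample endings against both readings.
strict_hits = [bool(STRICT_END.search(s)) for s in (flush_left, indented)]
lenient_hits = [bool(LENIENT_END.search(s)) for s in (flush_left, indented)]
```

Under the strict reading only the flush-left ending matches; under the lenient reading both do, and the matched whitespace is what a language could then use as the dedent amount.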

A bash heredoc requires the end marker to be at the beginning of the line. Other languages with heredocs originally required the same, but when they were extended to support indented content, they also allowed the end marker to be indented.

However, I do not like the idea of including newline characters in the markers. I think it is more natural to ask: does the content start at the character right after the opening marker, or on the next line? Does the content run up to just before the end marker, or up to the previous line? And if the content runs up to the line before the end marker, the question is whether or not to remove the last newline.

All heredoc forms treat everything up to the line before the end marker as content. Only PHP removes the last newline; the other languages keep it.
Among the languages that use """, C# and Swift do not allow the end marker on a content line, so they are similar to heredocs. Both remove the last newline.

Julia and Java allow writing the end marker on a content line. Neither removes the last newline. This is the idea this PEP currently adopts.
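The two policies for the last newline can be emulated in today's Python with textwrap.dedent. This is only a sketch: dedent_body is a hypothetical helper, it always drops the newline after the opening quotes, and it does not try to reproduce every detail of the languages named above.

```python
import textwrap

def dedent_body(literal: str, strip_last_newline: bool) -> str:
    """Hypothetical helper: dedent a triple-quoted block under one of two policies."""
    body = textwrap.dedent(literal)
    # Drop the newline that directly follows the opening quotes.
    body = body.removeprefix("\n")
    if strip_last_newline:
        # Policy that removes the newline before the closing quotes.
        body = body.removesuffix("\n")
    return body

literal = """
    foo
      bar
    baz
    """

keep_last = dedent_body(literal, strip_last_newline=False)
strip_last = dedent_body(literal, strip_last_newline=True)
```

The only difference between the two results is the final "\n", which is exactly the point under discussion.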

Regarding the opening marker, it can be said that all of these languages treat the content as starting on the next line. Julia allows writing content immediately after the opening marker, but when the string starts with """\n, the content starts on the next line as well. Ignoring the dedent feature, this PEP’s d"""\n corresponds to Julia’s """\n, and Python’s existing """ is the same as Julia’s """content....

I understand that it’s implemented differently[1], that’s why I called it an implementation detail. I didn’t respond on this anymore because it’s pretty off-topic, but I’m coming back to it because it leads to an idea that might be interesting.

Just to wrap up the Scala side of things: in terms of actual use, you can consider """...""".stripMargin as a verbose postfix string modifier. That, together with the discussion about opening and closing markers, gave me an idea that I’m not super-duper in love with, but which may be worth exploring to see whether it can achieve a better trade-off globally.

The idea is that, if we want to introduce new behaviour around newline stripping, we might consider imbuing the closing quotes with a modifier.

In other words, we could do:

# modifier for opening quote --> keep existing behaviour of not stripping \n
s = d"""
____foo
______bar
____baz
____"""
assert s == "\nfoo\n  bar\nbaz\n"
# modifier for closing quote --> can have new behaviour, e.g. stripping \n
s = """
____foo
______bar
____baz
____"""d
assert s == "foo\n  bar\nbaz"

Obviously, it’d also be possible to introduce only the latter. I’ll be the first to admit that behavioural differences between d""" vs. """d would have to be taught (rather than be self-explanatory), but at least the syntactically new position would allow us to introduce new behaviour in a consistent manner.


  1. The lack of syntactic context also explains why Scala needs an explicit | token on each line, up to which it will dedent. ↩

Has anyone got an actual use case for keeping the leading \n? I can understand the contention about the trailing newline, but the conversation keeps circling back to the leading one, seemingly solely on the grounds of consistency for its own sake.

This is quite an odd take on the term “implementation detail”.

I would call it a “fundamental design decision”, not only of Python but of most (modern) programming languages[1]: if one writes "...".dedent(), the member function acts on the string object created by the string literal, not on the string literal itself. So it cannot depend on syntactic aspects of the literal; it must behave like a="..."; a.dedent(). Again, this is not an “implementation detail”; it is how member functions generally work.


  1. Of course, the syntax may vary. In C++, a std::string object is created with a literal suffix: "..."s ↩
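A quick way to see this in current Python, with textwrap.dedent standing in for a dedent method (the variable names are only for illustration): different spellings in the source produce the same string value, so anything that runs on the string object gets identical input and cannot tell them apart.

```python
import textwrap

# Three different spellings in source code...
spelled_plainly = "    foo\n    bar"
spelled_with_escapes = "\x20\x20\x20\x20foo\n\x20\x20\x20\x20bar"
spelled_multiline = """    foo
    bar"""

# ...evaluate to the same string value.
assert spelled_plainly == spelled_with_escapes == spelled_multiline

# A runtime function therefore returns one and the same result for all three.
results = {
    textwrap.dedent(s)
    for s in (spelled_plainly, spelled_with_escapes, spelled_multiline)
}
```

Only something that works at the literal level can treat these spellings differently.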

People (in Scala as in Python) want to keep their code indented without polluting their multiline strings with superfluous spaces. In Scala, the idiomatic way to do that is with a method, but, as I keep explaining, it’s beside the point that it’s a method. Once again, it’s about USAGE being equivalent to d-strings[1], not implementation.


  1. the method-ness is not relevant because it essentially always gets applied immediately at the definition site, like a d-string. ↩

I don’t think there is a use case for it. I think it is a question of consistency vs utility.

It’s fine for the PEP to decide that since nobody actually has use for the leading newline, it’s not preserved.

The conversation goes in circles when the justification is that this way is “intuitive” or (worse) “correct” and therefore doesn’t deserve to be documented. I just want to make sure the PEP acknowledges that there is a nontrivial decision to be made here – it’s not obvious even if the majority of folks have the same preference.[1]


  1. we might already be there? I haven’t read the latest changes yet. ↩

(Deleted) Probably Guido is right: this response is not as productive as it could be.

I have a feeling this discussion is going around in circles without reaching a conclusion or agreement. Maybe everyone would do well to self-censor their instinct to immediately reply when they see something that they don’t agree with?

The problem is that Discourse doesn’t have a downvote button, so you can’t just downvote and move on. (Sorry, this is really off-topic.)

I see nothing wrong with the occasional in-band post pointing out meta-issues like this. Not everything has to turn into an automated process.

I don’t think we are looping about whether to remove the newline after the opening quote.

Vetinari was the only one who proposed the idea of not removing the first newline, but he agreed that there was no merit to that idea. His real opinion was to allow writing content after the opening quote. (PEP 822: Dedented Multiline String (d-string) - #96 by h-vetinari)

And I responded to him that I would add it to the “Rejected Ideas” section.

The discussion about the opening quote is settled for now and is not being repeated.
The only remaining task for me is to update the PEP.

Vetinari does not agree that Scala’s stripMargin is a method of strings and unrelated to the behavior of quotes, but that topic does not affect this PEP, so no further discussion is necessary.

I don’t think there are any other topics to discuss or content to add to the PEP.

The “use case” framing is not very helpful when talking about consistency. Is it a “use case” to not have to audit your outputs for correct vertical spacing when switching from f""" to df"""? Is it a “use case” for users to have fewer special cases to learn and keep in mind?

Please don’t misrepresent my statements. Of course I understand it’s a method.

Again, there’s no need to misrepresent my position.

That aside, @gcewing and @Jost had voiced similar concerns.

In any case, I tire of the thinly-veiled hostility in this discussion. Good luck!

I apologize for my poor English. When I said “does not agree,” I meant of course “unrelated to the behavior of quotes,” not that you denied that stripMargin is a method.

Hmm. Did I misunderstand you?
I was just trying to summarize opinions as the PEP author, not to misrepresent your opinion.

You said that:

So I wrote

FYI, I have made a pull request to add “Allow content after opening quotes” to the rejected ideas.
I mentioned the “it looks symmetrical” argument in that section.

There’s a contingent of at least a few thread participants who seem to think this is at least an idea worth considering. I advocated for this in the Ideas thread. I have not pushed for it again because I consider that feedback to have been taken and rejected.

I would really like to see it listed in Rejected Ideas, with at least one sentence to explain the benefits which justify giving up consistency with other multiline strings. The current draft still says nothing on this topic, as far as I can see.

EDIT: On rereading it again, I think the PR above addresses this? It’s phrased very, very differently from how I think about the issue, but it touches on the same topic.

I wrote this in response to “the conversation keeps circling back to the leading one, seemingly solely on consistency for the sake of consistency grounds.”

So I didn’t mean that no one has supported this idea in the past, but that only Vetinari proposed the idea in this thread, and since he himself actually wants to allow not just newlines but characters as well, there are no longer people repeating this idea.

Upon rereading your comments from the previous thread, I did not think you wanted to keep the newline immediately after the opening quote. You were opposed to the introduction of d-strings itself.
That is related in the sense of not adding to the behaviors of quotes, but it is not the same idea, so the discussion is not going in circles there.

In the previous thread, you opposed increasing the complexity when concatenating some string literals. You thought str.dedent() was better than d-strings.

So this is not just a matter of the newline immediately after the opening quote, but the whole question of whether d-strings should dedent at the literal level in the first place.
So please read the “Motivation”, “Rationale”, and “Rejected Ideas / str.dedent() method” sections too. The reasons why changes to literals are necessary are written there.

The biggest problem is that str.dedent() does not work for t-strings. Even if we added a Template.dedent() method, f-string + str.dedent() and t-string + Template.dedent() would not behave the same way.
t-strings are intended to be used for writing SQL, HTML, etc., and those use cases often overlap with use cases that require dedent.
So I believe that providing consistent dedent functionality for t-strings and f-strings is worth the cost of complicating the string literal concatenation rules.
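One concrete way in which a runtime dedent differs from a literal-level one is that it runs after interpolation, so interpolated values take part in the indentation calculation. The sketch below uses textwrap.dedent as a stand-in for the proposed str.dedent(); the variable names and the SQL fragment are made up, and the thread does not spell out that this is the exact difference meant above.

```python
import textwrap

# Hypothetical multi-line value; its second line has no indentation.
fragment = "WHERE a = 1\nAND b = 2"

# The f-string is evaluated first, so dedent sees the interpolated text.
query = textwrap.dedent(f"""
    SELECT *
    FROM t
    {fragment}
""")

# "AND b = 2" starts at column 0, so the common margin is empty
# and no indentation is removed from any line.
```

A dedent applied to the literal itself would only look at the indentation written in the source, regardless of what the interpolated value contains.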

I thought the issues with string literal concatenation were naturally included in the complexity of adding string prefixes, so I didn’t think they needed to be mentioned specifically.
However, if you think that point should absolutely be included in the PEP, could you propose a sentence explaining the magnitude of the issue?

Thanks for going back to the old posts! I did indeed prefer str.dedent() (and still do, even with its disadvantages). There were also a few posts starting here in which I advocated for keeping the leading newline character for consistency with existing multiline string syntax.

The old thread was quite long, so I’m not surprised that this point may have gotten lost.

I think you’ve misread which point I want covered in Rejected Ideas. Let me just propose some text and you can take it or not (or modify), as you prefer:

Rejected Ideas

Preserving the Initial Newline

Existing multiline string literals do not require that the first character be \n. If there is an initial newline, it is preserved, and users rely on """\ to strip the leading newline as is often desired.
D-strings could implement the same behavior, keeping them consistent with other string literals.

However, consistent behavior with other multiline literal types would allow for string content to start immediately after d""", with no initial newline.
An explicit goal of this PEP is to reserve the initial line of the string for future use, as a potential space for markers for the language of the dedented text.

Furthermore, based on discussion, there are no significant use cases which actively benefit from having an initial newline in the resulting string content.

I have written that based on my best understanding of the argument against consistency with other multiline string literals. It doesn’t convince me, personally, but I think it’s a reasonable viewpoint.
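For reference, the existing behaviour that the proposed text describes can be checked in current Python. The variable names are only for illustration.

```python
# The newline after the opening quotes is part of the string.
kept = """
foo
"""

# A backslash right after the opening quotes continues the line,
# so no leading newline ends up in the string.
stripped = """\
foo
"""

# Content may also start immediately after the opening quotes.
inline = """foo
"""
```

Here kept starts with "\n", while stripped and inline hold the same value.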
