This is just a hack, but it serves a similar purpose.
PHP has heredoc syntax with an identifier such as END: <<<END. PhpStorm can recognize the identifier as a language marker, e.g. <<<JSON or <<<SQL.
C# doesn’t have such a feature. But since C# is a statically typed language, Visual C# can syntax-highlight the argument of new Regex().
Inline comments like /* lang=html */ are also used.
A similar syntax was proposed for C#, but static analysis was chosen instead.
Since Python doesn’t have inline comments, it would be interesting to allow something in the first line.
Another idea is to allow a comment in the first line instead of a language hint:
query = d"""# lang=SQL. This is comment
    SELECT
        id, name, age
    FROM
        user
    WHERE
        id=?
    """
For dedenting, I’ve long used textwrap.dedent or its lesser-known and possibly more convenient cousin inspect.cleandoc. I don’t find the extra function call to be that much trouble or distracting.
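As a minimal sketch of the difference between the two standard-library helpers mentioned above (the function name build_query and the SQL text are made up for illustration): textwrap.dedent only strips the common leading whitespace, while inspect.cleandoc also drops the blank lines at the start and end.

```python
import inspect
import textwrap


def build_query() -> str:
    # Hypothetical example: the literal is indented to match the code.
    return """
        SELECT id, name
        FROM user
          WHERE id = ?
    """


raw = build_query()

# dedent removes the common margin but keeps the leading and trailing newlines.
dedented = textwrap.dedent(raw)

# cleandoc removes the margin and also strips leading/trailing blank lines.
cleaned = inspect.cleandoc(raw)

print(repr(dedented))
print(repr(cleaned))
```

Relative indentation inside the text (the extra two spaces before WHERE) survives in both cases.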
There seem to be new proposals for string prefixes frequently these days. I’d like to propose a general rule of thumb:
If the behavior you want can be implemented with a function call, then it isn’t going to become a string prefix.
For example, r-, f-, and t- strings cannot be implemented as a function call with another kind of string as an argument. They require special processing before the string becomes a string object.
On the other hand, d-strings can be implemented as a function call. Therefore, I propose that we will continue to recommend the function call, and won’t add a new lexical rule to the language for d-strings.
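A small sketch of the distinction this rule of thumb draws, using only the standard library: escape processing happens before any function sees its argument, so a function cannot recover what an r-string preserves, whereas dedenting in this simple case only needs the finished string object.

```python
import textwrap

# Escape sequences are resolved by the compiler before any call runs,
# so a function receiving "a\nb" only ever sees a real newline.
cooked = "a\nb"
raw = r"a\nb"

print(len(cooked))  # 'a', newline, 'b'
print(len(raw))     # 'a', backslash, 'n', 'b'

# Dedenting operates on the finished string, so a plain call is enough here.
message = textwrap.dedent("""\
    first line
    second line
""")
print(repr(message))
```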
Julia has dedent included natively in multiline strings (see “Strings” in the Julia manual). The rules are not simple but they look practical (I have never used Julia myself).
In my code I use global variables, but usually the need doesn’t come up that often.
This proposal has two (basically unrelated) parts: dedenting and a mechanism for annotating the language in the string, for syntax highlighting. I believe that question is about the second part, not the first.
I would agree, except that it can’t really. If you do a multiline insertion into a big f"string" then it’ll butcher the indentation.
multiline_insertion = """\
foo
bar
"""
template = f"""
Some
long
template
{multiline_insertion}
blah
blah
blah
"""
No use of textwrap.dedent() is going to result in that string looking the way it’s supposed to. The best you can do is to write the {multiline_insertion} bit without indentation then dedent the whole thing afterwards (assuming the two literals are written at the same indentation level – it’s a lot muddier if one of those strings is inside an extra if block).
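A runnable sketch of that failure mode, under the assumption that the inserted text is unindented and the template is indented with the surrounding code (the names insertion and render are made up for illustration):

```python
import textwrap

# Hypothetical inserted text with no indentation of its own.
insertion = "foo\nbar\n"


def render() -> str:
    # The template is indented to match the surrounding code.
    template = f"""
        Some
        template
        {insertion}
        blah
    """
    # After interpolation the line "bar" starts at column 0, so the
    # common margin is empty and dedent has nothing to remove.
    return textwrap.dedent(template)


result = render()
print(result)
```

Only the first line of the insertion inherits the template's indentation; every later line keeps its own, which is what defeats the dedent afterwards.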
And using \ to get a long single line string is also impossible. Writing:
textwrap.dedent("""
    A long \
    piece of \
    text
""")
results in a string where the spaces aren’t removed (i.e. '\nA long     piece of     text\n').
The one thing that has come up repeatedly in past dedented string discussions is that the reason people want it is to save both runtime overhead and memory. It would be done at compilation time, so the resulting string stored in the bytecode and in memory would be smaller, with zero runtime cost. That saves either repeated pain or startup-time pain, and always saves memory. The measurable need for those savings affects huge codebases more than smaller projects.
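To make the bytecode point concrete, here is a rough sketch that inspects a compiled module. It relies on CPython code object attributes (co_consts, co_names), which are an implementation detail, and the source text is made up for illustration.

```python
source = '''
import textwrap

TEXT = textwrap.dedent("""
        hello
        world
""")
'''

# Compile only; nothing is executed here.
code = compile(source, "<example>", "exec")

# The undedented literal, margin included, is stored among the constants.
string_consts = [c for c in code.co_consts if isinstance(c, str)]
print(string_consts)

# The dedent call is still looked up by name and performed at run time.
print(code.co_names)
```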
I still agree with continuing to use the function call syntax as today’s recommendation. It clearly expresses intent. But it has never been possible to reliably optimize that call away at compile time due to Python being as dynamic as Python… Actual syntax is one way to finish that thought - for constants, a .dedent() method on a str is something that could also be optimized out. That doesn’t require syntax, just a method:
I programmed a bit in Scala a while ago, and one approach in that language which I liked a lot is to denote the “baseline” indentation of a string that is itself indented. It looks like this (docs):
val quote = """The essence of Scala:
               |Fusion of functional and object-oriented
               |programming in a typed setting.""".stripMargin
This produces the exact string
The essence of Scala:
Fusion of functional and object-oriented
programming in a typed setting.
This is particularly useful where the string you’d like to write within some indented code itself has some varying degrees of indentation. Without a baseline marker, it becomes very difficult to tell where the dedent ends and the indentation begins.
With |, there’s a clear visual marker that’s easy to digest visually, as well as easy to lint and/or syntax-highlight on.
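For readers who don't know Scala, a rough Python approximation of what stripMargin does might look like the following. The function strip_margin is hypothetical (nothing like it exists in the standard library), and it only handles spaces and tabs before the marker.

```python
import re

# Leading blanks followed by the margin character at the start of a line.
_MARGIN = re.compile(r"^[ \t]*\|", re.MULTILINE)


def strip_margin(text: str) -> str:
    """Remove leading whitespace up to and including '|' on every line.

    Lines that do not start with the marker are left unchanged.
    """
    return _MARGIN.sub("", text)


quote = strip_margin(
    """The essence of Scala:
       |Fusion of functional and object-oriented
       |programming in a typed setting."""
)
print(quote)
```

Because only the first | on a line is consumed, content that itself begins with a pipe can be written with a double pipe.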
I could imagine that d-strings could do the .stripMargin part by default, so modifying @methane’s example from above slightly, this could look like
if some_condition:
    html_parts += d"""\
        | <div>
        |   Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur [...]
        | </div>
        |"""
which would add the following string at the end of html_parts (the lines starting with # are for explanation only and are not part of the string):
# no line break before (see \)
<div>
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur [...]
</div>
# with line break at the end, because the final """ was on a new line, and no \ before
I like the “baseline” idea and in Python I guess the baseline can be made implicitly one level after the d-string start. Modern code editors show a hint of a line there.
if problem:
    raise ValueError(d"\
        Multi line message
        ... line 2 ...
        ")
It is similar to the way a longer line is usually wrapped now:
if problem:
    raise ValueError(
        "Some longer single line message")
This would make it possible to create text with some intended indentation (pun not planned).
It might make sense to simply allow the indentation before the closing quote to dictate the level of dedentation, so that the code above can be rewritten as:
if some_condition:
    html_parts += d"""\
        <div>
          Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur [...]
        </div>
      """ # leaves 2 characters of indentation when compared to the lines above
I fail to see the problem. The issue is whether raw strings end up completely disregarding the surrounding indentation, or rather that existing workarounds like dedent are cumbersome. Whether the actual text is indented 16 or 17 characters is negligible visually (and the | would be aligned correctly with other code on the same indentation level).
Indeed, but that cost is worth the gain IMO (aside from the very substantial likelihood that the Python REPL as well as major IDEs would learn to ignore the leading \s*\| when pasting, say, if all lines being pasted start with that pattern).
That’s a corner case that has some obvious solutions (e.g. don’t use d-strings), but even if we posit this as a required case, I fail to see the issue. In that case you’d just have a double pipe, where the first one would get stripped (and probably greyed out by syntax highlighting).
That’s fine for machines but terrible for humans. If you have more than a handful of lines, it’ll be very hard to keep track of where the closing quote is (perhaps even off-screen), leading to repeated yet avoidable mistakes. I like the |-approach because it avoids exactly this sort of papercut.
What I mean is that the quoted content itself often has nested indentation that follows the indentation of the surrounding Python code. The HTML example above uses a 2-character indentation so it isn’t actually representative of my preferred style.
The code I have in mind is more like (note the indentation now 4 characters per level even in the HTML):
first = True
if some_condition:
    if first:
        html_parts += "some header"
        first = False
    html_parts += d"""\
        <div>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </div>
    """
And with your proposed | baseline marker it would read:
first = True
if some_condition:
    if first:
        html_parts += "some header"
        first = False
    html_parts += d"""\
        | <div>
        |     Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        | </div>
        |"""
which makes it obvious that <div> is not indented evenly with the surrounding code such as first = False.
What makes it worse is that the IDE for Python code usually has a tab setting of 4 spaces, so if I press the tab key to further indent Lorem it would incorrectly insert 3 spaces instead of 4 because | already pushes the indentation by 1 character.
My points #2 and #3 are relatively minor and can be worked around like you suggested but there really is no way around the downside of my point #1.
I agree that my suggested workaround isn’t great either but I don’t see an elegant solution so far.
Trying to predict and solve all possible use cases instead of taking the practical simple approach means we’re unlikely to ever do anything.
Fancier pie in the sky ideas that could be done should not negate the value of doing the simple thing that works to achieve real savings via str.dedent today.
Aren’t two solutions to apparently (but not strictly) the same problem always in conflict? If """...""".dedent() were ever implemented, it would be even harder to get an option that is aware of f-strings, t-strings and trailing backslashes past the “why do we need yet another way of dedenting strings?” naysayers.
I would indent the | portion relative to html_parts.
if some_condition:
    if first:
        html_parts += "some header"
        first = False
    html_parts += d"""\
      | <div>
      |     Lorem ipsum dolor sit amet, consectetur adipiscing elit.
      | </div>
      |"""
so then the content of html_parts is aligned with first = False.
I haven’t looked at Java-flavoured IDEs in a while, but I’m willing to wager that this is a solved problem for Scala. That IDEs can provide some support for these common workflows doesn’t solve the issue across all editors, but it would still soften the blow substantially.
I don’t like Scala’s stripMargin.
It uses a | marker because stripMargin is a string method. It has some of the same limitations as str.dedent().
If we add a method to str, it should behave the same as textwrap.dedent(), not like Scala’s stripMargin.
Swift, C#, Julia, and Java all have multiline string literals with built-in dedenting. None of them uses a | marker.
Reading that JEP, I would be in favor of adopting those rules for all multiline strings, including f-strings and t-strings. If a d'' tag is needed to support a transition period, fine, but I would hope that it would be like a __future__ import and become unnecessary after a certain point.