Trimmed multiline string

Usually, if I have to write a multiline string, I have to do:

a = """Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 

Donec enim diam vulputate ut pharetra sit amet. 

Vitae tortor condimentum lacinia quis."""

I propose adding another multiline string type: t"""multiline string"""

With this type, the previous multiline string can be written as:

a = t"""
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 
    incididunt ut labore et dolore magna aliqua. 

    Donec enim diam vulputate ut pharetra sit amet. 

    Vitae tortor condimentum lacinia quis.
"""

A t"""multiline string""" should act as a normal multiline string, with these exceptions:

  1. one newline character at the start and at the end of the string will be removed, if present
  2. every line after the first one will be left trimmed (originally written as "right trimmed")
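
As a rough illustration, the two rules above can be emulated at run time with a small helper. This is only a sketch of one literal reading of the proposal: `trim_multiline` is a hypothetical name, and it strips all leading spaces and tabs from every line after the first, so any relative indentation is lost.

```python
def trim_multiline(s: str) -> str:
    """Emulate the proposed t-string rules on an ordinary string (hypothetical helper)."""
    first, *rest = s.split("\n")
    # Rule 2: left-trim every line after the first one.
    lines = [first] + [line.lstrip(" \t") for line in rest]
    result = "\n".join(lines)
    # Rule 1: remove one newline at the start and one at the end, if present.
    if result.startswith("\n"):
        result = result[1:]
    if result.endswith("\n"):
        result = result[:-1]
    return result


example = trim_multiline("""
    Lorem ipsum dolor sit amet.

    Vitae tortor condimentum lacinia quis.
""")
print(example)
```

With the proposed syntax the same result would presumably be produced at compile time, without the helper call.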

Do you mean left trimmed instead of right? With right trimming you’d remove the new line but keep the indentation.

This already exists. Just call inspect.cleandoc(a). It gives it the PEP 257 docstring treatment (for example, what the built-in help utility shows you).
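
For reference, a minimal example of what inspect.cleandoc() does to an indented triple-quoted string. The sample text is made up for illustration; note that, unlike stripping every line, it keeps indentation relative to the common margin.

```python
import inspect

a = """
    Lorem ipsum dolor sit amet.
      An indented continuation line.

    Vitae tortor condimentum lacinia quis.
"""

cleaned = inspect.cleandoc(a)
# Leading and trailing blank lines are removed, and the common
# 4-space margin is stripped from every line.
print(cleaned)
```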

The difference is similar to that between f-strings and str.format(). The latter exists and solves the problem, but incurs a runtime penalty and is arguably less readable. Of course, formatting is a much more common operation, and doesn’t have a feasible alternative other than f-strings, so this may not be useful enough to warrant a dedicated syntax. IMO it does, but that might not be the majority opinion.

There have been several discussions about this over the last few years. Please find and read the past discussions before opening a new one and repeating old arguments that have already been considered many times.

this may not be useful enough to warrant a dedicated syntax

Well, it can only be applied to all docstrings… :smiley:

There have been several discussions about this over the last few years

You’re right, but I read them and no one proposed this syntax.

@uranusjr: Anyway, since it has been proposed many times and a PEP was also opened about it, I think it is useful enough.

Yes, this has come up before. It is tracked as https://bugs.python.org/issue36906. I’m in favor of solving it at the language level.

inspect.cleandoc() and textwrap.dedent() are not solutions. They have a run-time cost and the process is forced to carry the burden of the extra padded string data around with it in memory.
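
A small sketch of both costs, assuming CPython, where string literals are stored in a function's `co_consts`. The function name `greeting` is made up for the example.

```python
import textwrap


def greeting():
    # dedent() runs on every call; the literal itself is stored padded.
    return textwrap.dedent("""
        hello
        world
    """)


# The code object still carries the original, indented string constant.
string_consts = [c for c in greeting.__code__.co_consts if isinstance(c, str)]
print(string_consts)

# The dedented result only exists after the run-time call.
print(repr(greeting()))
```

Note that textwrap.dedent() also leaves the leading and trailing newlines in place, so it is not an exact substitute for the behavior being asked for.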

I’d like it if all docstrings were treated this way by default (without a prefix) but I’m assuming there are objections to a context dependent semantic change. Practicality beats purity is my response even if I don’t expect to win that one.

Options for dedicated syntax (the actual desire):

  1. Add yet another letter prefix such as t or m or d or c (and argue about which letter).

  2. Reuse the backquote (backtick) character now that we have removed it as a token and feature in Python 3. We could introduce a ``` token that acts as an auto-cleandoc’d or dedent’d multi-line string. (the other letter prefixes would apply to it). I intentionally recommend not supporting a single backquote, only a new triple tick token. That way it avoids python 2 ` repr vs python 3 syntax concerns.

A drawback to all options: Yet more new syntax.

Drawback on #1: The letter prefix makes no sense on a single quote string, but we’ve not differentiated between single or triple quotes when it comes to prefixes in the past. It’d be a no-op on single line string literals unless we decided to break that rule.

Drawback on #2: A sad downside to ``` is that it is virtually impossible to represent within most markdown documentation rendering engines. And markdown is crazy-popular. They don’t have a way, or don’t have a consistent way, to escape a triple backquote in the middle of their own multi-line block. :confused:

>>> def castle_argh():
...    return ```multi-line
... 
...    dedented string.
...    ```
... 
>>> castle_argh()
'multi-line\n\ndedented string.\n'

That only renders here in Discourse today because I added the REPL prefixes to each line. I can’t write a cut-and-pasteable code snippet using those. <pre> tags appear to work here, but lose syntax highlighting. And there seem to be 183 markdown renderer implementations out there (GitHub itself applies multiple to the same files!), each of which behaves differently from the others and changes over time.

Upside to #2: People are already used to reading ``` as a multi-line block thanks to markdown! :slight_smile:

Example implementations in PRs along with a PEP laying this out would help.

Well, I’m not against ```, but it reminds me more of the start of a code block than of a text block.

As for the letter, I’m in favor of t for trim. And yes, I don’t like the fact that it is useful for multi-line strings only. Maybe the optimal solution is a different one.

Third option: automatically dedent all string literals. The overwhelming majority of multiline string literals should be dedented. It is easy to make this the default and add a way to prohibit dedenting.

This is a backward incompatible change, so we would need to introduce a future import for switching this behavior.

-1 to any backward incompatibility. Wasn’t Py2->Py3 enough?

Most Markdown implementations have two code block delimiters, triple backtick (`) and triple tilde (~), but for some reason most people seem to be unaware of the latter. You need no additional escaping if you wrap the code block with ~~~ instead of ``` :slightly_smiling_face:

Most. And users usually just want to copy and paste. Furthermore, in Markdown ``` means a code block, not a text block.

I challenge you to actually find a meaningful Markdown implementation in which tilde does not work :grimacing:

I challenge you to find a non-lazy programmer who does more than copy and paste and who knows about ~~~

Third option: automatically dedent all string literals. The overwhelming majority of multiline string literals should be dedented. It is easy to make this the default and add a way to prohibit dedenting.

Hmm. If you want to keep whitespace, just use a r"raw" string then?

I think it was intended for multiline strings only. But yes, I think you would have to write r"""blablabla""".

Mh, well, no. The original proposal by @storchaka was this one. I think that r makes much more sense…

Anyway, for what it’s worth, I remain against anything that breaks even 0.000000000000000000000000000000000000000000000000000000000000000001% of already written code.

Excuse me for the spam. @gpshead: do you think I could write a PEP about this, and that it would be worth it?

No, the r prefix affects the interpretation of a backslash.
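
A quick check of what the r prefix actually changes today: it controls how backslashes are read, and has no effect on whitespace or indentation. The variable names are only for illustration.

```python
plain = "\n"   # one character: a newline
raw = r"\n"    # two characters: a backslash and the letter n

print(len(plain), len(raw))

# Indentation inside the literal is kept either way.
indented_plain = """
    text"""
indented_raw = r"""
    text"""
print(indented_plain == indented_raw)
```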

I proposed adding a non-indented \<newline> if you want to keep the indentation:

a = """\
    dedented
"""
b = """\
\
    not dedented
"""

Another option is to always include the line containing the closing """:

a = """\
    dedented
    """
b = """\
    not dedented
"""

This corresponds to Julia's syntax, which looks well thought out.
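
One possible reading of that second option, sketched as a run-time helper on the already-parsed string value: the indentation of the final line (the whitespace before the closing quotes) takes part in computing the common margin, so a closing """ at column 0 disables dedenting. The name `dedent_with_closing_line` is hypothetical, and the exact rules a real implementation would follow are an assumption here.

```python
def dedent_with_closing_line(body: str) -> str:
    """Dedent using the closing-quote line as part of the margin (hypothetical helper)."""
    lines = body.split("\n")

    def indent(line: str) -> int:
        return len(line) - len(line.lstrip(" \t"))

    # Blank lines in the body are ignored, but the last line always counts,
    # even when it holds only the whitespace before the closing quotes.
    significant = [line for line in lines[:-1] if line.strip()] + [lines[-1]]
    margin = min(indent(line) for line in significant)
    return "\n".join(line[margin:] for line in lines)


a = dedent_with_closing_line("    dedented\n    ")    # closing quotes indented
b = dedent_with_closing_line("    not dedented\n")    # closing quotes at column 0
print(repr(a), repr(b))
```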