Trimmed multiline string

Marco_Sulla · November 7, 2019, 5:55pm

Terrible. It’s not simply ugly, it’s also counter-intuitive, since I expect everything inside a string to be part of the string, not a magic character that changes the whole string. Or do you think that the people who added prefix support to Python, where the prefix sits outside the string, like r"", were just less smart than the Julia devs?

storchaka · November 7, 2019, 7:25pm

The r prefix does not have any relation to dedenting the string literal.

Julia also has string prefixes.

Marco_Sulla · November 7, 2019, 9:06pm

Well, I suppose another prefix must be chosen, then. But I still hope you change your mind.

So why not use them in that case too? Julia does not seem to me to be a good example of a consistent specification.

gpshead · November 19, 2019, 6:28am

Sorry for the delay. Yes, I think it is worthwhile to make a PEP for this. Are you volunteering? Go for it!

h-vetinari · November 19, 2019, 8:11am

From the OP:

I don’t think this is the right approach. Docstrings, for example, contain internal indentation that often needs to be kept. A more reasonable approach, to me, seems to be to left-trim the first line and trim all following lines by the same amount.

FWIW, Scala has a different solution to this, where one can strip margins up to a control character (default ‘|’) from the left, e.g.

val s = """
          |Short explanation # first newline not trimmed by default
          |
          |Long explanation
          |
          |Parameters:
          |    param1: xxx 
          |    param2: yyy
          |""".stripMargin

results in the literal:


Short explanation # first newline not trimmed by default

Long explanation

Parameters:
    param1: xxx 
    param2: yyy

Of course, manually adding these control characters would be a pain as well, but for Scala there’s good IDE support, so they usually don’t have to be typed. For those without Scala, there’s a web-REPL here, and the docs here.
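
For comparison, a rough Python analogue of the margin-stripping idea could look like the sketch below. `strip_margin` is a hypothetical helper, not an existing Python function; it only assumes the behaviour described above, namely that leading blanks up to and including the margin character are dropped from each line.

```python
import re

def strip_margin(text, margin="|"):
    """Drop leading blanks plus the margin character from every line."""
    # Hypothetical helper: lines without the margin character are kept as-is.
    pattern = re.compile(r"^[ \t]*" + re.escape(margin), flags=re.MULTILINE)
    return pattern.sub("", text)

s = strip_margin("""
    |Short explanation
    |
    |Parameters:
    |    param1: xxx
    |""")
print(s)
```

The indentation after the margin character (in front of `param1`) is kept, which is the property docstrings need.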

Marco_Sulla · November 19, 2019, 9:46am

This is not a bad idea, but it means that the interpreter would have to check the indentation of the de-dented multi-line string too. I mean, something like this should raise a SyntaxError:

my_docs = d"""
   Marco
 Sulla
"""

I don’t know how much trouble this would cause for Python program startup performance and for programmer expectations.
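
To make the concern concrete, here is a minimal run-time sketch of the "trim every line by the first line's indentation" rule with the strict check. `dedent_by_first_line` is a hypothetical name, and raising SyntaxError from a function only imitates what the compiler would do for a real d"..." literal.

```python
def dedent_by_first_line(text):
    """Trim every line by the indentation of the first content line."""
    lines = text.splitlines()
    # Skip the empty line that directly follows the opening quotes.
    if lines and not lines[0].strip():
        lines = lines[1:]
    if not lines:
        return ""
    first = lines[0]
    indent = first[:len(first) - len(first.lstrip(" \t"))]
    result = []
    for line in lines:
        if not line.strip():
            result.append("")  # blank lines never cause an error
        elif line.startswith(indent):
            result.append(line[len(indent):])
        else:
            # Assumption: a real d"..." literal would fail at compile time.
            raise SyntaxError("line is indented less than the first line")
    return "\n".join(result)

print(dedent_by_first_line("\n    a\n      b\n"))
```

With the `Marco` / `Sulla` example above, the second line does not start with the three spaces of the first one, so this sketch raises the SyntaxError.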

h-vetinari · November 20, 2019, 12:13pm

I also think raising a SyntaxError is probably not a good idea (expectation-wise).

However, this could easily be adapted by left-trimming the minimal amount of whitespace across all lines in the multiline string. This is also what the most recent Java Enhancement Proposal plans for multiline strings:

It removes the same amount of white space from each line of content until at least one of the lines has a non-white space character in the leftmost position.

One would still have to come up with a way to deal with tabs vs. spaces within (strictly speaking) string literals, but I think that’s necessary anyway, because I don’t consider just stripping all whitespace on the left to be an option (because of the need to keep some indentation, e.g. within docstrings). A SyntaxError isn’t necessary in this case either; IMO it’s better to (e.g.) convert tabs to 4 spaces (and raise a warning).
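
For reference, the standard library's `textwrap.dedent` already applies a rule of this kind at run time: it removes the leading whitespace common to all lines, ignores blank lines when computing it, and treats tabs and spaces as different characters rather than converting one into the other. A short illustration (the sample text is made up):

```python
import textwrap

doc = """
    Short explanation

    Parameters:
        param1: xxx
"""

# Only the shared 4-space margin is removed; the extra indentation
# in front of "param1" survives.
dedented = textwrap.dedent(doc)
print(dedented)

# A line indented with spaces and a line indented with a tab share
# no common prefix, so nothing is removed here.
mixed = textwrap.dedent("    spaces\n\ttab\n")
print(repr(mixed))
```

The second call shows why the tabs-vs-spaces question matters: without a conversion or an error, mixed indentation silently results in no dedent at all.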

Marco_Sulla · November 20, 2019, 1:14pm

I don’t think it’s a good idea…

h-vetinari · November 20, 2019, 1:37pm

Fair enough. Explicit is better than implicit and all that. But then the next obvious choice is to raise a SyntaxError for mixed tabs and spaces. This is not unprecedented - f"..."-strings can and do raise them as well, so it can make sense to raise one if a d"..."-string cannot unambiguously perform the task it is asked to do by the code.

Adapted from the JEP, this could then be formulated as:

The d"..."-string removes the same amount of EXCLUSIVELY spaces or EXCLUSIVELY tabs from each line of content until at least one of the lines has a non-white space character in the leftmost position. Having mixed tabs/spaces across the leftmost whitespace of different lines raises a SyntaxError.
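
A run-time sketch of this rule might look as follows. `dedent_strict` is a hypothetical name, and the sketch reads "leftmost whitespace" as the full leading whitespace of every non-blank line, which is only one possible interpretation of the wording above.

```python
def dedent_strict(text):
    """Remove the common margin; reject mixed tabs/spaces in the margins."""
    lines = text.splitlines()
    content = [line for line in lines if line.strip()]
    margins = [line[:len(line) - len(line.lstrip(" \t"))] for line in content]
    # Assumption: all leading whitespace must consist of one character kind.
    kinds = set("".join(margins))
    if len(kinds) > 1:
        raise SyntaxError("mixed tabs and spaces in leading whitespace")
    # With a single kind of character, the common margin is the shortest one.
    width = min(map(len, margins), default=0)
    return "\n".join(line[width:] if line.strip() else "" for line in lines)

print(dedent_strict("    a\n      b\n"))
```

Because only one kind of whitespace character is allowed, the amount to remove is simply the shortest margin, so no character-by-character comparison is needed.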

Marco_Sulla · November 20, 2019, 9:50pm

Well, yes… to tell the truth, IMHO it’s the best solution. We also have to add that d"""[...]""" also removes the first empty line of the string and does a right trim.

The problem is that I don’t know how easy this will be for people… I mean, someone might exclaim “What the… I have to indent the documentation properly too?”

On the other hand, a multiline string can also be used to write code that could be evaled or written to a file. In that case a SyntaxError is useful, since the interpreter will report that the string declaration is wrong without running the code.

But you can also write non-python code in your multiline string… for example I created a code generator for Spring Boot using Python. In that case there’s no need to raise a SyntaxError.

I think the most practical solution is an algorithm like this one:

from io import StringIO
import re

example = """     
    a simple
 	example
               """

todedent = ""
skip_first_line = False
rex = re.compile(r"^(\s+)")

with StringIO(example.rstrip()) as stream:
    for i, line in enumerate(stream):
        if i == 0 and line.isspace():
            skip_first_line = True
            continue
        
        match = rex.match(line)
        
        if not match:
            todedent = ""
            break
        
        indent = match.group(1)
        
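        if todedent:
            # Count the matching leading characters. Slicing with the loop
            # index instead would drop one character when the indents are equal.
            common = 0
            for s1, s2 in zip(todedent, indent):
                if s1 != s2: break
                common += 1
            
            todedent = indent[:common]
            
            if not todedent: break
        else: todedent = indent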
    
    stream.seek(0)
    
    if skip_first_line: next(stream)
    
    if todedent:
        res = ""
        i_dedent = len(todedent)
        
        for line in stream: res += line[i_dedent:]
    else: res = "".join(stream.readlines())

The result is:

>>> res
'   a simple\n\texample'

Notice that the 1st line was empty apart from whitespace, the 2nd line started with 4 spaces and the 3rd line started with 1 space and 1 tab.

In short, the multiline string is dedented by the leading whitespace that all lines have in common (in the example, only 1 space). No SyntaxError, no tab conversion.
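
The same idea can be written more compactly with `os.path.commonprefix`, which works on any list of strings, not only paths. This is just a sketch: `dedent_common_prefix` is a hypothetical name, and it assumes the rules used above (right-trim the whole string, drop a whitespace-only first line, remove the leading whitespace shared by all remaining lines).

```python
import os.path
import re

def dedent_common_prefix(text):
    """Dedent by the leading whitespace shared by all lines."""
    lines = text.rstrip().splitlines(keepends=True)
    # Drop the first line if it holds nothing but whitespace.
    if lines and lines[0].isspace():
        lines = lines[1:]
    # Leading spaces/tabs of each line; a blank line yields "".
    indents = [re.match(r"[ \t]*", line).group(0) for line in lines]
    prefix = os.path.commonprefix(indents)
    return "".join(line[len(prefix):] for line in lines)

example = "     \n    a simple\n \texample\n               "
res = dedent_common_prefix(example)
print(repr(res))
```

As in the longer version, a blank line in the middle of the string shares no whitespace with the other lines, so it disables the dedent; whether that is desirable is an open design question.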