Pre-PEP: d-string / Dedented Multiline Strings with Optional Language Hinting

I agree. And I don’t see why anyone would want that leading whitespace in a docstring either.

def f():
  """
  Do something.
  """

is one of the many common styles for docstrings, and people don’t want it to have leading or trailing whitespace. It seems to me to be an accident of history that it does.

I think if you want leading or trailing newlines, you should add them:

"""

Has an extra leading newline
"""

and similarly:

"""
Has an extra trailing newline

"""

Forcing people to strip or use backslashes is really ugly IMO.

For the starting newline, I agree with you 100%.
On the other hand, I don’t want to remove the ending newline.

s = d"""
one
two
three
"""
assert s == "one\ntwo\nthree\n"

This string looks like three lines, and a line in POSIX includes its terminating newline. (ref)

rstrip or a backslash is ugly, but it would rarely be needed: only for creating an ugly multiline string.
Of course, this is my subjective opinion.

FWIW, C# and Swift remove the last newline. Java and Julia keep it.
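
As a rough illustration, both conventions can be emulated today with textwrap.dedent. The helper name d_like is hypothetical, and it only approximates the proposal because it post-processes an ordinary string instead of changing how the literal is parsed:

```python
import textwrap

def d_like(text: str) -> str:
    """Hypothetical helper approximating the d-string behaviour discussed here."""
    # Drop the newline that directly follows the opening quotes, if present.
    if text.startswith("\n"):
        text = text[1:]
    # Remove the common leading whitespace; the final newline is kept.
    return textwrap.dedent(text)

# Keeping the last newline (the convention attributed to Java and Julia above).
kept = d_like("""
    one
    two
    three
    """)

# Removing the last newline (the convention attributed to C# and Swift above).
stripped = kept.removesuffix("\n")  # str.removesuffix needs Python 3.9+
```

Under these assumptions, kept ends with a newline and stripped does not.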

While I generally agree with @methane’s proposal, I think sirosen had a valid point about r-strings (which I was originally not thinking about). Under the proposal, a trailing backslash can be used to remove a trailing newline:

s = d"""
one
two
three\
"""
assert s == "one\ntwo\nthree"

one would need other tricks, like [:-1] for r-strings. One could make an argument that automatic dedentation and raw are mutually exclusive, but I am not sure that the folks would be happy with that. (Mutually exclusive string prefixes would not be new: f and t are mutually exclusive, and u cannot be combined with any other prefix.)
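
The difference already shows up with today's string literals, without any d prefix. A small sketch (the variable names are only for illustration):

```python
# Regular literal: backslash-newline is a line continuation and vanishes.
cooked = """one
two
three\
"""

# Raw literal: the backslash and the newline both stay in the string.
raw_with_backslash = r"""one
two
three\
"""

# So a raw string needs a different trick, such as slicing off the newline.
raw_sliced = r"""one
two
three
"""[:-1]
```

Only the regular literal lets the backslash swallow the final newline, which is why a raw d-string would need something like [:-1].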

t-strings are non-string objects built from a convenient syntax that looks like a string literal. The u prefix does nothing and is legacy. So I don’t think those two are a good point of comparison.

I wouldn’t want this to be mutex with f and r prefixes. I can’t recall ever wanting dedented bytes, but it may be common in some domains. Allowing db makes sense, and I’m not sure I see the value in restricting it.

dt as a prefix might be useful. But nobody mentioned a Template.dedent() method earlier, so I assumed we’re not discussing support for this.


One higher level note.
All of the existing string prefixes do things which cannot be done by postprocessing a string. So they justify themselves in terms of necessity. d-strings justify text. Are there proposed rules which cannot be done as postprocessing?

I recall such things being proposed earlier, but I admit to having gotten confused about what the suggested dedent rules are.

The only thing I’m aware of is some of the weird interactions between f-strings and interpolated multi-line values. For example:

val = "abc\ndef"

final = f"""\
    xxx
    {val}
    yyy
"""

If you interpolate and then post-process, you get

final = """\
    xxx
    abc
def
    yyy
""".dedent()

That’s probably not the intended behaviour - what you’d probably want is to dedent before doing the interpolation. Although I’m not entirely sure that wouldn’t have edge cases as well.
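
Since str.dedent() does not exist today, the two orderings can be compared with textwrap.dedent instead. This is only a sketch of the ordering issue, not of the proposed d-string rules:

```python
import textwrap

val = "abc\ndef"

# Interpolate first, then dedent: the unindented "def" line leaves no
# common margin, so dedent changes nothing.
interpolate_then_dedent = textwrap.dedent(f"""\
    xxx
    {val}
    yyy
""")

# Dedent the template first, then fill in the value.
dedent_then_format = textwrap.dedent("""\
    xxx
    {val}
    yyy
""").format(val=val)
```

The first result keeps the four-space indentation on every line except "def"; the second has no indentation at all.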

But I’ll be honest, I have no idea how important this is in practice. Most of the cases I’ve wanted indented triple-quoted strings have been fixed strings, not f-strings, and when I’ve wanted f-strings I’m usually only interpolating single-line data. Whenever I want to insert multi-line data into multi-line templates, and I care about the layout of the final string, things are getting complex enough that I’d be reaching for a proper templating system like Jinja. So for me, a .dedent method hits the sweet spot between cost and benefit - d-strings are a lot more cost (both in terms of language complexity and maintainability) for a relatively small benefit.

I, for one, have the habit of doing from textwrap import dedent as D and then D("""\n ...\n...""") in code. And the only reason I don’t do it in every place I use multiline strings is that it feels too clumsy (the import and the extra parentheses).

AFAIC, this would be a very much welcome addition.

If it’s only the import and the parentheses that are the issue, how do you feel about

val = """
    some
    string
""".dedent()

which eliminates those two issues, but doesn’t introduce a new type of string, with new rules and edge cases?

I have answered this already several times:

s = """
    abc\
    def\nghi
    lmn
    """

would become "abcdef\nghi\nlmn\n" under the proposed rules (that is, when written with the d prefix), since dedenting would be pre-processing that happens before the handling of backslash escapes. This cannot be done with post-processing. Several comparable issues arise in the context of f-strings, as Paul pointed out:
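
For comparison, here is what post-processing the same text as an ordinary literal gives today, using textwrap.dedent as a stand-in for a .dedent() method. By the time dedent runs, the escapes have already been processed:

```python
import textwrap

# Ordinary literal: "\<newline>" and "\n" are handled before dedent sees anything.
s = """
    abc\
    def\nghi
    lmn
    """

post_processed = textwrap.dedent(s)
```

The line continuation has glued the indentation of the "def" line onto "abc", and the escaped newline has produced an unindented "ghi" line, so there is no common margin left to remove.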

import textwrap

a = "abc\ndef"
b = textwrap.dedent(f"""
    Hallo {a}!
    How are you?
    """)

does not dedent anything. t-strings do not create string objects at all, so current techniques are certainly not available.

Nobody proposed such a restriction.

Nobody proposed such a restriction. Quite the opposite, t-strings were a main motivation of the OP.

But please remember, the point I made was that I don’t think these are likely to come up in real world usage. In your own example, you had to use a value of “abc\ndef” to make your point, and that’s not realistic as a name (which is how your f-string uses it).

It would be better than nothing, I guess, but the d prefix as discussed in this thread would be more like a “dream come true”.

I had not brought up the topic earlier myself for fear of it being too easily dismissed. (Although I vaguely recall sending an email in support of this on the old mailing lists.)

But it is really a thing. Even in the small snippet in your post it would make a difference: it is a no-brainer to just add a d, versus spending anywhere from seconds to minutes on “how much effort and code noise is it worth just to remove this whitespace?”

The difference between f-string.dedent() (dedent after format) and t-string.dedent() (dedent before format) is another advantage of d-strings over .dedent().

I am not sure about the sweet spot. Both are OK with me.
But 70% of people prefer d-strings over .dedent(). (poll)

I could think of a CLI script along these lines:

def _cli_help():
    print(f"""
        myscript [-f] [-g] filename

        {translations['desc']}

        filename {translations['filename']}

        -f       {translations['option_f']}

        -g       {translations['option_g']}
        """.dedent())

If translations['desc'] is too long, there might be newlines in that string. Of course, one could add 8 spaces into that string, but what happens if the function _cli_help becomes part of a class? Then 4 more spaces need to be added to translations['desc'], or another trick has to be found.

More aspects

If the initial whitespace is wrong, the whole output is thrown off. That is not the case for the 9 additional spaces one might want to add after a newline in translations['filename'] in order to align a second description line:

filename The path of the file to be read, or
         alternatively a minus - to read from console. 

If they don’t match, this does not throw off the rest but only has a local effect:

filename The path of the file to be read, or
    alternatively a minus - to read from console. 

The important point here: the 9 spaces do not change if the code is rearranged. They would only change if the overall help string itself is modified. In the case of such a UI change, it would be clear that the translations need a look as well.

I am not sure that I have enough permission to edit the first message.
I don’t have a pencil icon to edit this post anymore.

In that case, you can use a regular multiline string and call .dedent().format(...).

def _cli_help():
    help = """\
        myscript [-f] [-g] filename

        {translations['desc']}

        filename {translations['filename']}

        -f       {translations['option_f']}

        -g       {translations['option_g']}
        """.dedent()
    print(help.format(translations=translations))

Or you can use a t-string instead of an f-string, together with the f() sample, if we add Template.dedent().

def _cli_help():
    print(f(t"""\
        myscript [-f] [-g] filename

        {translations['desc']}

        filename {translations['filename']}

        -f       {translations['option_f']}

        -g       {translations['option_g']}
        """.dedent()))

These samples demonstrate how a d-string is easier than a (str|Template).dedent() method.

This could be turned to

val = "abc\ndef"

final = """\
    xxx
    {val}
    yyy
""".dedent().format(val=val)

to avoid problems with too early formatting.

Somewhat related, tmpl.format() being a shorthand for something like tmpl.format(**(globals() | locals())) would be convenient. It would allow dropping val=val from the above example and in general avoid the need to update the arguments passed to .format() if the formatted string changes.

globals() | locals() is not the same as the usual name binding rules. This was one of the main reasons why f-strings were invented in the first place (PEP 498).
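
One case where the two differ is a variable from an enclosing function. A minimal sketch (the function and variable names are made up for illustration):

```python
def with_format():
    greeting = "abc"
    def inner():
        # inner() never names greeting itself, so it is not captured as a
        # closure variable and does not show up in inner's locals().
        return "{greeting}".format(**(globals() | locals()))
    return inner()

def with_fstring():
    greeting = "abc"
    def inner():
        # The f-string references greeting directly, so normal scoping applies.
        return f"{greeting}"
    return inner()

try:
    format_result = with_format()
except KeyError:
    format_result = "KeyError"

fstring_result = with_fstring()
```

Here the .format() version fails with a KeyError, while the f-string resolves the name through the enclosing scope.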

Yes, alternatives are possible and actually helpful for the description of what is going on (ordering of the actions that happen.) Just a small nitpick at your example: str.format is generally quite restrictive concerning allowed expressions and a bit odd in one aspect: In your example, it would need to be translations[desc] without the quotes.
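
The quoting difference can be checked directly. In this sketch the translations dictionary is invented sample data:

```python
translations = {"desc": "Reads a file.", "filename": "Path of the input file."}

# str.format takes everything inside [...] as the key itself: no quotes.
unquoted = "{translations[desc]}".format(translations=translations)

# With quotes, the quotes become part of the key, and the lookup fails.
try:
    quoted = "{translations['desc']}".format(translations=translations)
except KeyError as exc:
    quoted = f"KeyError: {exc}"
```

An f-string, by contrast, evaluates translations['desc'] as an ordinary Python expression, so there the quotes are required.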

str.format() does not work in a scalable way for reindenting multiline values.
There is another problem as well: you do not know where the line breaks within translations['desc'] should go before knowing at which indentation level it is inserted.

In the following example, the d constructor can easily be replaced by .dedent() (or .dedent().strip()), but replacing the reindenting symbol -> at the proper level with an external method would be much more difficult:

def translate_doc():
    return df"""
    myscript [-f] [-g] filename

    {->translations['desc']}

    ..Parameters:
        filename {->translations['filename']}
        -f       {->translations['option_f']}
        -g       {->translations['option_g']}
    """

This use case requires the reindenting symbol more than the dedenting constructor.
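
To make the reindenting idea concrete, here is a rough approximation with current tools. It is not the -> syntax: fill_aligned is a hypothetical helper that assumes plain {name} fields with at most one field per line:

```python
import re
import textwrap

def fill_aligned(template, values):
    """Dedent template, then fill {name} fields, aligning continuation lines.

    Hypothetical helper; assumes plain {name} fields, at most one per line.
    """
    def substitute(match):
        # Pad continuation lines to the column where the field starts.
        pad = " " * match.start()
        return str(values[match.group(1)]).replace("\n", "\n" + pad)

    lines = textwrap.dedent(template).split("\n")
    return "\n".join(re.sub(r"\{(\w+)\}", substitute, line) for line in lines)

help_text = fill_aligned(
    """\
    myscript filename

    filename {filename}
    """,
    {"filename": "The path of the file to be read, or\n"
                 "alternatively a minus - to read from console."},
)
```

Because the padding is computed from the dedented template, the translated value needs no hard-coded spaces and is unaffected when the surrounding code is moved to a different indentation level.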

You can emulate dedented f-strings using raw templates and a custom dedent function:

def convert(value, conversion):
    if conversion == "a":
        return ascii(value)
    elif conversion == "r":
        return repr(value)
    elif conversion == "s":
        return str(value)
    return value

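def raw_template_dedent(template):
    # Split every static part of the raw template into lines.
    lines_list = [string.split('\n') for string in template.strings]
    # Only lines with visible content (ignoring a trailing backslash) define the margin.
    non_blank_lines = [l for lines in lines_list for l in lines if (l2 := l.removesuffix("\\")) and not l2.isspace()]
    # The prefix shared by the smallest and largest line is shared by all lines.
    l1 = min(non_blank_lines, default='')
    l2 = max(non_blank_lines, default='')
    margin = 0
    for margin, c in enumerate(l1):
        if c != l2[margin] or c not in ' \t':
            break
    parts = []
    for item in template:
        if isinstance(item, str):
            # Cut the margin from each line; blank lines are left-stripped instead.
            lines = item.split('\n')
            item = '\n'.join([l[margin:] if (l2 := l.removesuffix("\\")) and not l2.isspace() else l.lstrip() for l in lines])
            # Process backslash escapes only now, after dedenting.
            # Caveat: this encode/decode round trip garbles non-ASCII text.
            parts.append(item.encode().decode('unicode_escape'))
        else:
            # Apply the conversion and format spec to the interpolated value.
            value = convert(item.value, item.conversion)
            value = format(value, item.format_spec)
            parts.append(value)
    return "".join(parts)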

Example:

val = "abc\ndef"
print(raw_template_dedent(rt"""\
    xxx
    {val}
    yyy\
"""))

Output:

xxx
abc
def
yyy

And if you alias raw_template_dedent as df, it’s only 4 characters longer.
Edit: fixed some edge cases.

While technically true, a function using dt-strings absolutely could implement something useful here. I wouldn’t try to have -> as part of the first proposal, I think adding more syntax constructs is not going to help.

I am not convinced that d-strings are a worthwhile syntactic addition, but I like the idea of adding t-string support to textwrap.dedent.

I just created a better-dedent library to play with the idea of using t-strings with dedent.
