Pre-PEP: d-string / Dedented Multiline Strings with Optional Language Hinting

To get more people to participate in the poll, I posted the URL on Reddit.

While reading comments on the post, I came up with an idea: add not only str.dedent() but also Template.dedent().
Template.dedent() calculates the common prefix from the whole template. It can be computed at compile time.

This idea fixes the issue that str.dedent() and textwrap.dedent() don't work for t-strings.

On the other hand, f-string.dedent() would behave differently from t-string.dedent().
This is a big downside of the Template.dedent() idea.
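A rough sketch of the idea, to make the difference concrete. `dedent_static_parts()` and `render()` are hypothetical helpers, not part of `string.templatelib`, and plain tuples stand in for a Template's `strings` and `values` so the sketch runs on versions without t-strings. The common prefix is computed from the literal parts only, whereas dedenting an f-string's result also sees the lines of the interpolated value.

```python
import textwrap

SENTINEL = "\x00"  # assumed never to occur in the literal parts


def dedent_static_parts(strings):
    """Hypothetical Template.dedent() sketch: compute the common
    indentation from the literal parts only, then strip it from them."""
    if any(SENTINEL in s for s in strings):
        raise ValueError("literal parts must not contain the sentinel")
    # Joining with a non-whitespace placeholder keeps each interpolation
    # point "non-blank" for textwrap.dedent() without looking at values.
    joined = SENTINEL.join(strings)
    return tuple(textwrap.dedent(joined).split(SENTINEL))


def render(strings, values):
    """Interleave literal parts with (already converted) values."""
    parts = [strings[0]]
    for value, literal in zip(values, strings[1:]):
        parts.append(str(value))
        parts.append(literal)
    return "".join(parts)


value = "first\nsecond"  # an interpolated value spanning two lines

# f-string route: interpolation happens first, so the unindented
# "second" line drags the common prefix down to "" and nothing is removed.
fstring_result = textwrap.dedent(f"\n    header\n    {value}\n")

# Template route: stand-ins for Template.strings / Template.values of
# t"\n    header\n    {value}\n".
strings = ("\n    header\n    ", "\n")
template_result = render(dedent_static_parts(strings), (value,))

print(repr(fstring_result))   # '\n    header\n    first\nsecond\n'
print(repr(template_result))  # '\nheader\nfirst\nsecond\n'
```

The sentinel trick is only a shortcut for the sketch; a real implementation would presumably work on the parsed literal parts directly.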


The edge case where some level of indentation should be left over might be worth discussing:

str0 = d"""
    This is a list of tasks generated by selective addition of strings :
    """
item1 = d"""
        - This is task 1, where the description is long enough to takes several
    lines
    """
item2 = d"""
        - This is task 2, should be indented but is not
    """

str0+'\n'+item1+'\n'+item2

Would return the (undesired) result:

"""
This is a list of tasks generated by selective addition of strings :
    - This is task 1, where the description is long enough to takes several
lines
- This is task 2, should be indented but is not
"""

But to my knowledge there is no convention for the indentation level within a triple-quoted string; maybe the proposal for d-strings should come with one?

Why? The last line matters too! (Under the Java rules, only the last line would actually matter.) In the item2 example above, the common amount of leading whitespace is 4, so this would be equivalent to "    - This is task 2, should be indented but is not\n". To suppress the final newline, one could then write

item2 = d"""
        - This is task 2, should be indented but is not\
    """

Yes, but the f and r prefixes affect how the string is constructed; after construction, there is no way to tell whether a string was built from an f-string or a plain literal. If d-strings work the same way, each string is dedented individually before concatenation, so item1 and item2 are dedented by 4 and 8 spaces respectively, which yields different indentation levels between them in the undesired result.
(The final newline is indeed omitted in my example, but it does not actually matter.)
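For reference, the undesired result above is exactly what you get when each literal is dedented on its own with the rule of `textwrap.dedent()`, which ignores whitespace-only lines, including the one holding the closing quotes. The standard library is used here only to emulate that per-string behaviour; it is not the proposed d-string rule.

```python
import textwrap

# Each literal is dedented individually, before concatenation.
# textwrap.dedent() ignores whitespace-only lines when computing the
# common prefix, so the closing-quote line has no influence here.
str0 = textwrap.dedent("""\
    This is a list of tasks generated by selective addition of strings :
    """)
item1 = textwrap.dedent("""\
        - This is task 1, where the description is long enough to takes several
    lines
    """)
item2 = textwrap.dedent("""\
        - This is task 2, should be indented but is not
    """)

# item1 loses 4 spaces (its "lines" line), item2 loses all 8.
result = str0 + item1 + item2
print(result)
```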

OK, I think you mean the reference indentation level is that of the closing triple quotes. That elegantly solves the problem, but it is not what the draft PEP states:

EDIT: Actually it does, since the leading whitespace of the last line is the smallest common one.
Sorry for the confusion; this could probably be made more obvious.

Well, given your snippet, it seems it very much does.
Indentation in Python is not just decoration that goes along with bracketing, you know: having all logically nested code inside indented blocks is what most people expect (not to mention the "subjective" prettiness, which translates into reduced cognitive friction when dealing with a code block).

The remaining indentation is specified by the closing """. For example,

item2 = d"""
        - This is task 2, should be indented but is not
    """
#   ^^^^

Result: "    - This is task 2, should be indented but is not\n"
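The closing-quote rule can be emulated in today's Python to check results like this one. `d_string()` is a hypothetical helper written for illustration: it takes the raw text between the quotes, drops the first newline, and strips exactly the whitespace before the closing quotes from every other line. Rejecting lines that are indented less than the closing quotes is an assumption, not something stated in the thread.

```python
def d_string(raw: str) -> str:
    """Hypothetical emulation of the proposed d-string rule.

    `raw` is the text between the opening and closing triple quotes.
    Assumed rule: the text starts with a newline (dropped), the last line
    holds only the whitespace before the closing quotes, and exactly that
    whitespace is stripped from every other line.
    """
    if not raw.startswith("\n"):
        raise ValueError("the opening quotes must be followed by a newline")
    *lines, closing = raw[1:].split("\n")
    if closing.strip(" \t"):
        raise ValueError("the closing quotes must be on their own line")
    out = []
    for line in lines:
        if not line.strip(" \t"):
            out.append("")  # whitespace-only lines become empty lines
        elif line.startswith(closing):
            out.append(line[len(closing):])
        else:
            raise ValueError(f"line indented less than the closing quotes: {line!r}")
    return "".join(line + "\n" for line in out)


item2 = d_string("""
        - This is task 2, should be indented but is not
    """)
print(repr(item2))  # '    - This is task 2, should be indented but is not\n'
```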

So if you want to keep a 4-space indent for the bullet list, the correct code is:

str0 = d"""
    This is a list of tasks generated by selective addition of strings :
    """
item1 = d"""
        - This is task 1, where the description is long enough to takes several
          lines
    """
item2 = d"""
        - This is task 2, should be indented but is not
    """

str0 + item1 + item2

And if you don't want to indent the bullet list:

str0 = d"""
    This is a list of tasks generated by selective addition of strings :
    """
item1 = d"""
    - This is task 1, where the description is long enough to takes several
      lines
    """
item2 = d"""
    - This is task 2, should be indented but is not
    """
str0 + item1 + item2
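Both variants can be checked with a similar hypothetical `d_string()` emulation (an illustrative helper assuming the closing-quote rule, not code from the proposal). Since every fragment already ends with a newline, plain `+` concatenation gives the expected lists:

```python
def d_string(raw: str) -> str:
    """Hypothetical d-string emulation (assumed rule): drop the first
    newline, then strip the closing-quote line's indentation."""
    *lines, closing = raw[1:].split("\n")
    out = []
    for line in lines:
        if not line.strip():
            out.append("\n")
        elif line.startswith(closing):
            out.append(line[len(closing):] + "\n")
        else:
            raise ValueError(f"under-indented line: {line!r}")
    return "".join(out)


header = d_string("""
    This is a list of tasks generated by selective addition of strings :
    """)

# Variant 1: the bullets keep 4 spaces because their closing quotes
# sit 4 columns to the left of the text.
indented = header + d_string("""
        - This is task 1, where the description is long enough to takes several
          lines
    """) + d_string("""
        - This is task 2, should be indented but is not
    """)

# Variant 2: text and closing quotes share a column, so nothing remains.
flush = header + d_string("""
    - This is task 1, where the description is long enough to takes several
      lines
    """) + d_string("""
    - This is task 2, should be indented but is not
    """)

print(indented)
print(flush)
```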

Current polling result:

  • Improve something or not: 42 vs 5. About 90% of people are +1 on improving multiline string usability!
  • Improve syntax or not: 33 vs 14. About 70% of people are +1 on improving the syntax.
    • But this result may not account for the Template.dedent() idea.
  • d-string vs __future__: 21 vs 12. About 63% of people are against __future__.
    • I still like the __future__ idea, but I have abandoned it.

The previous poll is still open, but I want to open the next poll:

Improve syntax vs add method?

  • d-string or triple-quote
  • str.dedent and Template.dedent

If we improve the syntax: d-prefix or triple backquote (```)?

  • d-string
  • triple backquote

Notes on the options:

    1. d-string
      • adds new combinations of string prefixes.
      • no new symbol in the syntax.
      • supports dedenting across line continuations.
      • the remaining indent can be specified by the closing quotes.
    2. triple backquote (```)
      • doesn't increase the number of string prefix combinations.
      • but uses a new symbol.
      • other characteristics are the same as for d-string.
    3. str.dedent() and Template.dedent()
      • no new syntax rule is needed at all.
      • cannot dedent across line continuations.
      • f-string.dedent() and t-string.dedent() look similar but behave differently.
        • f-string.dedent() behavior would be a pitfall.
      • f-string.dedent() cannot be done at compile time.
      • the remaining indent has to be specified as an argument (e.g. dedent(indent=4), dedent(indent=' '*4), dedent().indent(' '*4), etc.).

Triple backticks look neat, but they would unfortunately make it difficult to quote such code inside a Markdown code fence.

One possible approach is to allow both syntaxes, d-strings and triple-backquoted strings, to minimize the chance of content that can't be quoted without escaping characters.

On second thought, even if a triple-backquoted string literal doesn’t contain triple backquotes, it makes the code unable to be easily included in a code fence in a Markdown-enabled forum such as Discourse and GitHub issues. So scratch the hybrid approach.


FWIW, I only voted against __future__ because I could only vote once. Had I been able to vote for more than one, I’d have picked __future__ too.


I’ve sort of lost track of the details of each proposal, but I just want to say that I’m leery of adding something like this if it is too biased towards code formatted in particular ways. For instance, code that uses tabs should not be disadvantaged by any proposal.

Because of that I’m pretty hard against anything like a string prefix, as that would necessitate choosing a particular dedent algorithm for all such strings. A method would be better, as it could accept an argument for different dedent styles, which could potentially be extended in the future. But still I’m not sure if either is really needed.

Tabs have not been mentioned here, but I would say that any matching whitespace sequence is removed. If the sequences do not match, an error would be raised (with a prefix or a future import, this would actually happen at compile time, which would be preferable). This should follow the decisions that Python made many years ago for its code blocks.

d-strings are space/tab neutral. You can use TAB indentation both for the stripped part and for the remaining part of the indentation.

    # ---> represents TAB.
    text = d"""
------->   hello
------->   world
------->"""
    assert text == "   hello\n   world\n"

    text2 = d"""
        ------->hello
        ------->world
        """
    assert text2 == "\thello\n\tworld\n"
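The same two examples, written with explicit `\t` escapes and run through a hypothetical `d_string()` emulation. Raising `ValueError` when a line does not start with the exact whitespace sequence before the closing quotes is an assumption modeled on the matching rule suggested earlier in the thread; the draft may specify something else.

```python
def d_string(raw: str) -> str:
    """Hypothetical d-string emulation with a strict whitespace rule
    (an assumption): the exact whitespace before the closing quotes must
    prefix every non-blank line; no tab-width guessing is attempted."""
    *lines, closing = raw[1:].split("\n")
    out = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            out.append("\n")
        elif line.startswith(closing):
            out.append(line[len(closing):] + "\n")
        else:
            raise ValueError(f"line {number}: indentation does not match {closing!r}")
    return "".join(out)


# TABs written as \t so the example survives copy-and-paste.
text = d_string("\n\t   hello\n\t   world\n\t")
assert text == "   hello\n   world\n"

text2 = d_string("\n        \thello\n        \tworld\n        ")
assert text2 == "\thello\n\tworld\n"

try:
    d_string("\n\thello\n        world\n    ")  # tab vs. spaces
except ValueError as exc:
    print(exc)
```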

Would it be a good idea to suggest in the PEP and documentation that linters and users should, when possible, conventionally line up the first column of the first line of the string with the first column of the final line of the string?

    text = d"""
---------->   hello
---------->   world
---------->""" 

I don't necessarily think I would want to use that convention all the time, and it doesn't seem like a good use of anyone's time to invite a style argument in a PEP. black will surely develop a convention, and other formatters may develop options; those arguments can be had among the people affected by those tools.


Most likely, people would like to follow the indentation of the generated string and of the generating code at the same time. Flexibility is desirable in this case.

For example:
def write_func(option=1):
    func_def = d"""    
    def func(...):
        preinit()
    """
    loop_opening = d"""
        for i in range{N}:
            loopinit()
    """
    if option == 1:
        step_instructions = d"""
            do_stuff_v1(...)
    """
    elif option == 2:
        step_instructions = d"""
            do_stuff_v2(...)
    """
    return func_def + loop_opening + step_instructions 
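With `+`, each fragment carries its remaining indentation in the position of its closing quotes. Below is the example re-expressed with a hypothetical `d_string()` emulation; `func(...)` and `range{N}` are replaced by `func()` and `range(N)` (an assumption about the intent) so the generated source can be passed to `compile()` as a syntax check:

```python
def d_string(raw: str) -> str:
    """Hypothetical d-string emulation (assumed rule): drop the first
    newline, then strip the closing-quote line's indentation."""
    *lines, closing = raw[1:].split("\n")
    out = []
    for line in lines:
        if not line.strip():
            out.append("\n")
        elif line.startswith(closing):
            out.append(line[len(closing):] + "\n")
        else:
            raise ValueError(f"under-indented line: {line!r}")
    return "".join(out)


def write_func(option=1):
    func_def = d_string("""
    def func():
        preinit()
    """)
    loop_opening = d_string("""
        for i in range(N):
            loopinit()
    """)
    if option == 1:
        step_instructions = d_string("""
            do_stuff_v1()
    """)
    elif option == 2:
        step_instructions = d_string("""
            do_stuff_v2()
    """)
    else:
        raise ValueError(f"unknown option: {option!r}")
    return func_def + loop_opening + step_instructions


source = write_func(option=1)
print(source)
compile(source, "<generated>", "exec")  # raises SyntaxError if the nesting broke
```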

Instead of the + operator, I think the df prefix should be the recommended approach for concatenating strings in an indentation-aware manner, so that step_instructions doesn't have to know, when it is defined, how deeply it will be indented when used as a fragment of a larger string:

def write_func(option=1):
    func_def = d"""    
    def func(...):
        preinit()
    """
    loop_opening = d"""
    for i in range{N}:
        loopinit()
    """
    if option == 1:
        step_instructions = d"""
        do_stuff_v1(...)
        """
    elif option == 2:
        step_instructions = d"""
        do_stuff_v2(...)
        """
    return df"""
    {func_def}
        {loop_opening}
            {step_instructions}
    """

By the way, I think there needs to be some sort of a special rule for df-strings to ignore a newline character if it’s preceded by a {...} that evaluates to a string that ends with a newline, so that we don’t end up with blank lines (double newline characters) between {func_def} and {loop_opening}, and between {loop_opening} and {step_instructions} in the example above.

EDIT: One solution without involving a special rule may be to use line continuation instead:

    return df"""
    {func_def}\
        {loop_opening}\
            {step_instructions}\
    """

But this looks a little bit confusing to the reader because when people see a line continuation marker they usually think there isn’t going to be a newline there, when in this case there’s going to be one evaluated from {...}.
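Note that combining d and f alone would not re-indent the second and later lines of an interpolated fragment, because f-substitution inserts the value verbatim; the df example therefore also needs indentation-aware substitution and the newline rule discussed above. A hypothetical sketch of those two rules, applied to the already-dedented body with `re` and `textwrap` (only placeholders that sit alone on a line are handled):

```python
import re
import textwrap

# A placeholder that is the only thing on its line, e.g. "    {loop_opening}".
PLACEHOLDER_LINE = re.compile(r"^([ \t]*)\{(\w+)\}[ \t]*$", re.MULTILINE)


def indent_aware_format(template: str, **values: str) -> str:
    """Hypothetical sketch, not part of the proposal: a value standing alone
    on a line is indented to the placeholder's column, and its trailing
    newline is dropped so the line's own newline is the only one kept."""
    def substitute(match):
        indent, name = match.group(1), match.group(2)
        return textwrap.indent(values[name].rstrip("\n"), indent)
    return PLACEHOLDER_LINE.sub(substitute, template)


values = {
    "func_def": "def func():\n    preinit()\n",
    "loop_opening": "for i in range(N):\n    loopinit()\n",
    "step_instructions": "do_stuff_v1()\n",
}
# The df-string body after dedenting, written as an ordinary string.
template = "{func_def}\n    {loop_opening}\n        {step_instructions}\n"

plain = template.format(**values)  # plain substitution: misaligned, blank lines
source = indent_aware_format(template, **values)
print(source)
compile(source, "<generated>", "exec")  # raises SyntaxError if nesting is wrong
```

This is the kind of extra rule the next reply objects to; the sketch only shows what the df example would need in order to produce valid nested code.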

No, please, no more magic! d-strings should be strictly about dedenting. The advantage of a d-string, as opposed to a function, is that dedentation happens as the first step, i.e., before {}-substitution and before backslash escapes are handled.


I thought so too, but the df-string example from @blhsing is actually perfectly consistent with combining the d and f prefixes; on closer inspection it feels pretty natural.

Both + and df can coexist in a consistent manner anyway.

After looking at all the examples presented in this topic, I think the d prefix could automatically remove the first newline, i.e. the one right after the opening triple quotes, if any.
→ QUESTION: Would this make things easier for everyone?

The OP proposes that this always happens: the actual string only starts after the first newline following the opening quotes. While JEP 378 stipulates that there must not be anything (except whitespace, which is ignored) between the opening quotes and the first newline, the OP suggested that there could be an optional language hint there.
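A sketch of that opening-line rule with a hypothetical `split_opening_line()` helper. It assumes a hint, if present, is a single word (the OP's exact hint syntax may differ) and treats an empty first line as having no hint, which is the JEP 378 case:

```python
def split_opening_line(raw: str) -> tuple[str, str]:
    """Hypothetical sketch: `raw` is the text right after the opening triple
    quotes. Whatever precedes the first newline is an optional language hint
    (assumed here to be a single word); the string body starts after it."""
    first, newline, body = raw.partition("\n")
    if not newline:
        raise ValueError("the opening quotes must be followed by a newline")
    hint = first.strip(" \t")
    if len(hint.split()) > 1:
        raise ValueError(f"a language hint must be a single word: {hint!r}")
    return hint, body


print(split_opening_line("sql\n    SELECT 1\n    "))  # ('sql', '    SELECT 1\n    ')
print(split_opening_line("\n    hello\n    "))        # ('', '    hello\n    ')
```

The body returned here would then go through the dedent step; the hint itself never becomes part of the string value.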


Looking at the results so far, it's clear that d-strings, which add a new string prefix, are the most desired option.

I’d like to take the next (hopefully, final) vote. Do you think language hints are a good idea?

  • It can be used by editors to do syntax highlighting.
  • The syntax rules are a bit complicated.
  • It is possible to use comments instead, but Python does not have /* inline comment */, so you cannot write comments when passing multi-line strings as function arguments.
  • C# uses static analysis to achieve syntax highlighting of function argument strings from the function signature.
  • +1 for language hint
  • -1 for language hint