I agree. And I don’t see why anyone would want that leading whitespace in a docstring either.
def f():
    """
    Do something.
    """
is one of the many common styles for docstrings, and people don’t want it to have leading or trailing whitespace. It seems to me to be an accident of history that it does.
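For reference, here is a small sketch of how tooling copes with this style today, using inspect.cleandoc from the standard library. The function f is just the illustrative one from this post. I am deliberately not showing the raw f.__doc__, since as far as I know its exact value differs between Python versions.

```python
import inspect

def f():
    """
    Do something.
    """

# cleandoc removes the common indentation and the blank first/last lines,
# which is the form most people actually want from this docstring style.
cleaned = inspect.cleandoc(f.__doc__)
print(repr(cleaned))
```

This is post-processing at runtime, of course, which is exactly what a literal-level solution would make unnecessary.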
I think if you want leading or trailing newlines, you should add them:
"""
Has an extra leading newline
"""
and similarly:
"""
Has an extra trailing newline
"""
Forcing people to strip or use backslashes is really ugly IMO.
While I generally agree with @methane’s proposal, I think sirosen had a valid point about r-strings (which I was originally not thinking about). While under the proposal a trailing backslash can be used to remove a trailing newline
s = d"""
one
two
three\
"""
assert s == "one\ntwo\nthree"
one would need other tricks, like [:-1] for r-strings. One could make an argument that automatic dedentation and raw are mutually exclusive, but I am not sure that the folks would be happy with that. (Mutually exclusive string prefixes would not be new: f and t are mutually exclusive, and u cannot be combined with any other prefix.)
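To make the r-string point concrete, here is a minimal sketch with today's tools (textwrap.dedent standing in for the d prefix, which is an assumption of this example). The names tail and pattern are only illustrative.

```python
import textwrap

# In a raw literal, a backslash before the newline is NOT a line
# continuation: both the backslash and the newline stay in the string.
tail = r"""three\
"""

# So for raw text the leading and trailing newlines have to be removed
# some other way, for example by slicing after dedenting.
pattern = textwrap.dedent(r"""
    \d+
    \w+
    """)[1:-1]

print(repr(tail))
print(repr(pattern))
```

The slice works, but it is exactly the kind of extra noise the trailing-backslash rule avoids for non-raw strings.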
t-strings are non-string objects built from a convenient syntax that looks like a string literal. The u prefix does nothing and is legacy. So I don’t think those two are a good point of comparison.
I wouldn’t want this to be mutex with f and r prefixes. I can’t recall ever wanting dedented bytes, but it may be common in some domains. Allowing db makes sense, and I’m not sure I see the value in restricting it.
dt as a prefix might be useful. But nobody mentioned a Template.dedent() method earlier, so I assumed we’re not discussing support for this.
One higher level note.
All of the existing string prefixes do things which cannot be done by postprocessing a string. So they justify themselves in terms of necessity. d-strings justify text. Are there proposed rules which cannot be done as postprocessing?
I recall such things being proposed earlier, but I admit to having gotten confused about what the suggested dedent rules are.
The only thing I’m aware of is some of the weird interactions between f-strings and interpolated multi-line values. For example:
val = "abc\ndef"
final = f"""\
    xxx
    {val}
    yyy
    """
If you interpolate and then post-process, you get
final = """\
    xxx
    abc
def
    yyy
    """.dedent()
That’s probably not the intended behaviour - what you’d probably want is to dedent before doing the interpolation. Although I’m not entirely sure that wouldn’t have edge cases as well.
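A runnable sketch of that interaction, using textwrap.dedent in place of the hypothetical .dedent() method (that substitution is my assumption; the four-space indentation is also just for illustration):

```python
import textwrap

val = "abc\ndef"

# Interpolation happens first, so the second line of val ("def")
# lands at column 0 inside the otherwise indented text.
interpolated = f"""\
    xxx
    {val}
    yyy
    """

# dedent only removes whitespace common to all non-blank lines;
# because of "def" that common margin is empty, so nothing is removed.
result = textwrap.dedent(interpolated)
print(result)
```

The only change dedent makes here is normalising the whitespace-only last line; the indentation of the other lines is left untouched.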
But I’ll be honest, I have no idea how important this is in practice. Most of the cases I’ve wanted indented triple-quoted strings have been fixed strings, not f-strings, and when I’ve wanted f-strings I’m usually only interpolating single-line data. Whenever I want to insert multi-line data into multi-line templates, and I care about the layout of the final string, things are getting complex enough that I’d be reaching for a proper templating system like Jinja. So for me, a .dedent method hits the sweet spot between cost and benefit - d-strings are a lot more cost (both in terms of language complexity and maintainability) for a relatively small benefit.
I, for one, have the habit of doing from textwrap import dedent as D and then D("""\n …\n…""") in code. And the only reason I don’t do it in every place I use multiline strings is that it feels too clumsy (the import and the extra ()).
AFAIC, this would be a very much welcome addition.
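For anyone who has not seen that habit, a minimal sketch of it looks like this. The usage function and its help text are made up for the example.

```python
from textwrap import dedent as D

def usage():
    # The literal is indented to match the code; D() removes the
    # common margin, and the leading backslash drops the first newline.
    return D("""\
        usage: tool [options]
          -h  show help
        """)

print(usage())
```

Relative indentation inside the text (the two spaces before -h) survives, since only the common margin is removed.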
would become "abcdef\nghi\nlmn\n" under the proposed rules, since dedenting would be pre-processing before the handling of backslash escapes. This cannot be done with post-processing. Several comparable issues arise in the context of f-strings, as Paul pointed out:
import textwrap

a = "abc\ndef"
b = textwrap.dedent(f"""
    Hallo {a}!
    How are you?
    """)
does not dedent anything. t-strings do not create string objects at all, so current techniques are certainly not available.
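Here is a sketch of the backslash case, comparing post-processing with a rough emulation of "dedent first, then handle escapes". The emulation (a raw literal, textwrap.dedent, then the unicode_escape codec) is my own stand-in for the proposed d prefix, not its specification, and it is only safe for ASCII text.

```python
import textwrap

# Post-processing: the line continuation is resolved by the compiler
# before dedent ever runs, so the indentation of "def" ends up mid-line.
post = textwrap.dedent("""\
    abc\
    def
    ghi
    lmn
    """)

# Emulated pre-processing: keep the source text raw, dedent it,
# and only then interpret the backslash escapes.
raw_source = r"""
    abc\
    def
    ghi
    lmn
    """
dedented = textwrap.dedent(raw_source[1:])  # drop the newline after the opening quotes
pre = dedented.encode("ascii").decode("unicode_escape")

print(repr(post))
print(repr(pre))
```

The two results differ, which is the sense in which this rule cannot be reproduced by post-processing an ordinary string.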
Nobody proposed such a restriction. Quite the opposite, t-strings were a main motivation of the OP.
But please remember, the point I made was that I don’t think these are likely to come up in real world usage. In your own example, you had to use a value of “abc\ndef” to make your point, and that’s not realistic as a name (which is how your f-string uses it).
It would be better than nothing, I guess - but the d prefix as discussed in this thread would be more like a “dream come true” -
I myself had not brought up the topic earlier, being afraid of it being too easily dismissed. (Although I vaguely recall sending some email in support of this on the old mailing lists.)
But it is really a thing - even in this small snippet in your post, it would make a difference - it is, like, a no-brainer: just add a d, or spend from seconds to minutes on “how much effort and code noise is it worth just to remove this whitespace?”
If translations['desc'] is too long, there might be newlines in that string. Of course, one could add 8 spaces into that string, but what happens if the function _cli_help becomes part of a class? Then 4 more spaces need to be added to translations['desc'], or another trick has to be found.
More aspects
If the initial whitespace is screwed up, the whole output will be. This is not the same for the additional 9 spaces one might want to add after a newline in translations['filename'] in order to align a second description line:
filename The path of the file to be read, or
         alternatively a minus - to read from console.
If they don’t match, then this does not screw up the rest, but only has a local effect:
filename The path of the file to be read, or
alternatively a minus - to read from console.
The important point here: the 9 spaces do not change if the code is rearranged. They would only change if the overall help string itself is modified. In case of such a UI change, it would be clear that one should take a look at the translations as well.
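As a sketch of that alignment idea: the continuation indent can be derived from the label rather than stored in the translated string. The helper hang and the variables label and desc are hypothetical names for this example only.

```python
def hang(text, width):
    """Indent every line of text except the first by width spaces."""
    first, *rest = text.split("\n")
    return "\n".join([first, *(" " * width + line for line in rest)])

label = "filename "
desc = ("The path of the file to be read, or\n"
        "alternatively a minus - to read from console.")

# The hanging indent depends only on the help layout (the label width),
# not on how deeply the surrounding code happens to be nested.
line = label + hang(desc, len(label))
print(line)
```

This keeps the 9 spaces tied to the layout of the help text, which is the part that is independent of code rearrangement.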
val = "abc\ndef"
final = """\
    xxx
    {val}
    yyy
    """.dedent().format(val=val)
to avoid problems with too early formatting.
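The same ordering works today if textwrap.dedent is used in place of the hypothetical .dedent() method (that substitution is the only assumption in this sketch):

```python
import textwrap

val = "abc\ndef"

# Note: a plain string, not an f-string, so {val} is still a placeholder
# while dedent runs and every line shares the same margin.
template = """\
    xxx
    {val}
    yyy
    """

final = textwrap.dedent(template).format(val=val)
print(final)
```

Because the margin is removed before val is inserted, the embedded newline in val no longer interferes with the dedent.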
Somewhat related, tmpl.format() being a shorthand for something like tmpl.format(**(globals() | locals())) would be convenient. It would allow dropping val=val from the above example and in general avoid the need to update the arguments passed to .format() if the formatted string changes.
globals() | locals() is not the same as the usual name binding rules. This was one of the main reasons why f-strings were invented in the first place (PEP 498).
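One case where the two differ, as a minimal sketch: a variable from an enclosing function that the inner function mentions only inside the format string. The function and variable names are made up for the example.

```python
def outer():
    secret = "from outer"

    def via_fstring():
        # The compiler sees the name, so normal closure lookup applies.
        return f"{secret}"

    def via_format():
        # The name appears only inside a string, so it is not captured
        # as a closure variable and is absent from locals() here.
        try:
            return "{secret}".format(**(globals() | locals()))
        except KeyError as exc:
            return f"KeyError: {exc}"

    return via_fstring(), via_format()

print(outer())
```

So a .format() that silently used globals() | locals() would still behave differently from an f-string in nested functions.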
Yes, alternatives are possible and actually helpful for the description of what is going on (ordering of the actions that happen.) Just a small nitpick at your example: str.format is generally quite restrictive concerning allowed expressions and a bit odd in one aspect: In your example, it would need to be translations[desc] without the quotes.
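A quick sketch of that quoting oddity, with a made-up translations dict:

```python
translations = {"desc": "Read a file."}

# In str.format replacement fields the key is written without quotes.
ok = "{translations[desc]}".format(translations=translations)

# With quotes, the quote characters become part of the key being looked up.
try:
    "{translations['desc']}".format(translations=translations)
except KeyError as exc:
    failure = exc.args[0]

print(ok)
print(failure)
```

In an f-string it is the other way round: the quoted form is the valid one, because the field is a real Python expression there.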
str.format() does not work in a scalable way for reindenting multiline.
Another problem: you do not know where the line breaks are within translation['desc'] before knowing at which indentation level it is inserted.
In the following example, the d constructor can be replaced by .dedent() (or .dedent().strip()) easily, but replacing the reindenting symbol -> at the proper level with an external method would be much more difficult:
You can emulate dedented f-strings using raw templates and a custom dedent function:
def convert(value, conversion):
    if conversion == "a":
        return ascii(value)
    elif conversion == "r":
        return repr(value)
    elif conversion == "s":
        return str(value)
    return value

def raw_template_dedent(template):
    lines_list = [string.split('\n') for string in template.strings]
    non_blank_lines = [l for lines in lines_list for l in lines if (l2 := l.removesuffix("\\")) and not l2.isspace()]
    l1 = min(non_blank_lines, default='')
    l2 = max(non_blank_lines, default='')
    margin = 0
    for margin, c in enumerate(l1):
        if c != l2[margin] or c not in ' \t':
            break
    parts = []
    for item in template:
        if isinstance(item, str):
            lines = item.split('\n')
            item = '\n'.join([l[margin:] if (l2 := l.removesuffix("\\")) and not l2.isspace() else l.lstrip() for l in lines])
            parts.append(item.encode().decode('unicode_escape'))
        else:
            value = convert(item.value, item.conversion)
            value = format(value, item.format_spec)
            parts.append(value)
    return "".join(parts)
Example:
val = "abc\ndef"
print(raw_template_dedent(rt"""\
    xxx
    {val}
    yyy\
    """))
Output:
xxx
abc
def
yyy
And if you alias raw_template_dedent as df, it’s only 4 characters longer. Edit: fixed some edge cases.
While technically true, a function using dt-strings absolutely could implement something useful here. I wouldn’t try to have -> as part of the first proposal, I think adding more syntax constructs is not going to help.