I would only support language hints if we can come up with a way of avoiding writing our own standardised hint ↔ language mapping. The best [1] I can think of is to consider the names or mimetypes used by pygments.lexers.get_all_lexers() to be the canonical source of truth and then do nothing in Python itself to enforce it.
Right, I overlooked that. So the first newline is not a problem.
I overlooked something else here:
This problem goes further than the OP's case: the OP wants to include ‘external’ code snippets in the user's code in a clean way… This relates to indentation management in ‘metacoding’ (code that generates code) and can be thought of as an extension of the generalizability and scalability of the OP's proposal.
→ But it would not work as above “right now”, because only the first line of each interpolated string is re-indented.
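To make the limitation concrete, here is a small sketch using today's f-strings. The names `snippet`, `naive` and `fixed` are made up for illustration, and the explicit `textwrap.indent` call is just the workaround available now, not part of any proposal.

```python
import textwrap

snippet = "x = 1\ny = 2"  # hypothetical 'external' code to embed

# Naive interpolation: only the first line of `snippet` ends up at the
# indentation of the placeholder; the second line stays at column 0.
naive = f"def f():\n    {snippet}\n    return x + y\n"

# Workaround available today: indent the snippet explicitly.
fixed = "def f():\n" + textwrap.indent(snippet, "    ") + "\n    return x + y\n"

print(naive)
print(fixed)
```

Only `fixed` is valid Python source; `naive` has a body line at column 0 followed by an indented `return`.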
Another point: the following trick to skip the ending newlines has a minor scalability problem:
…if one of the strings does not have a final newline, it will mess everything up. This seems like a minor problem, but it can degrade scalability in more complex scenarios.
Yet I think that scalability can actually be managed (and without making d-strings a new type of string, which I think no one wants)… We could mark a multi-line reindentation construction by introducing a new symbol on f / t-strings, for example ->:
-The reindentation level is automatically inferred from the position of the opening curly brace.
-Additionally, the -> construction can also strip the last empty line, if any.
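A rough way to prototype these two rules without touching the grammar is an ordinary helper function. Everything here is an assumption for illustration: the function name `fill`, the `{->name}` marker spelling inside a plain template string, and the choice to pad with spaces only.

```python
import re

def fill(template: str, **values: str) -> str:
    """Replace each hypothetical `{->name}` marker, re-indenting multi-line values."""
    def replace(match: re.Match) -> str:
        # Column of the opening brace = distance from the start of its line.
        line_start = template.rfind("\n", 0, match.start()) + 1
        pad = " " * (match.start() - line_start)
        value = values[match.group(1)]
        if value.endswith("\n"):
            value = value[:-1]  # strip the last empty line, if any
        first, *rest = value.split("\n")
        # The first line already sits at the brace; pad the following lines.
        return "\n".join([first] + [pad + line if line else line for line in rest])
    return re.sub(r"\{->(\w+)\}", replace, template)

code = fill("def f():\n    {->body}\n    return x\n", body="x = 1\nx += 1\n")
print(code)
```

The helper infers the indentation from the brace position, so the caller never passes an indent width.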
But one can also argue that when d-strings are used, there are two possible reasons: the OP's case (where -> is not required) and the ‘metacoding’ case, where the user wants to construct code with multi-line reindentation… Thus I think it can be assumed that for every df / dt string, the behavior introduced with the -> symbol here is actually desired.
If this is true, then it might be worth questioning the value of this additional ‘magic’ construction behavior for df / dt strings (i.e.: included strings are automatically multi-line reindented and the last line is removed if empty).
I feel that the functionality is far too complex to be built into the core language.
Java, C#, Julia, Swift, and PHP don’t do such reindentation. (ref)
For such use cases you should use a dedicated templating library.
I think we could align with JEP 378 here and allow any whitespace between the opening quotes and the newline (\n). The whitespace and the first newline are ignored.
Agreed. I think this means that a SyntaxError should be generated if one of the lines has less leading whitespace than the line with the closing quotes.
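As a minimal sketch of how these rules could behave, the function below applies them to the raw text between the quotes. The name `dedent_by_closing_line` is hypothetical, `ValueError` stands in for the `SyntaxError` a real parser would raise, and keeping blank lines as empty lines is my assumption, not something settled in this thread.

```python
def dedent_by_closing_line(raw: str) -> str:
    """Dedent `raw` by the whitespace that precedes the closing quotes."""
    first, sep, rest = raw.partition("\n")
    if not sep or first.strip(" \t"):
        raise ValueError("only whitespace may follow the opening quotes")
    *lines, closing = rest.split("\n")
    if closing.strip(" \t"):
        raise ValueError("closing quotes must sit on a line of their own")
    out = []
    for line in lines:
        if not line.strip():
            out.append("")  # assumption: blank lines become empty lines
        elif line.startswith(closing):
            out.append(line[len(closing):])
        else:
            raise ValueError(f"indented less than the closing quotes: {line!r}")
    return "".join(line + "\n" for line in out)

# Trailing spaces after the opening quotes are ignored along with the newline.
print(repr(dedent_by_closing_line("  \n    abc\n      def\n    ")))
```

With this rule the dedent amount is read off a single line, the one holding the closing quotes, rather than computed as a minimum over all lines.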
Maybe also add a note that there is no dedent following an \n-escape, so
s = d'''
abc\nde\
f
ghi
'''
gives "abc\ndef\nghi\n". Both phenomena follow from a general rule: dedent happens before the handling of backslash-escapes.
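The ordering can be imitated today with a raw string, as a rough sketch. The four-space indentation is an assumption (the example as shown here carries no indentation), and only the two escapes used in the example are handled by hand.

```python
import textwrap

# Raw source text between the quotes, so no escape has been processed yet.
raw = r"""
    abc\nde\
    f
    ghi
"""

# Dedent first (after dropping the newline that follows the opening quotes)...
dedented = textwrap.dedent(raw[1:])
# ...then handle the line continuation and the \n escape.
dedent_first = dedented.replace("\\\n", "").replace("\\n", "\n")

# For contrast: a normal string processes escapes first, so "de" ends up at
# the start of a line with no indentation and dedent finds no common margin.
escapes_first = textwrap.dedent("""
    abc\nde\
    f
    ghi
"""[1:])

print(repr(dedent_first))
print(repr(escapes_first))
```

The first result matches the string given above; the second keeps all the indentation, which is what happens when `textwrap.dedent` is applied to an ordinary string containing these escapes.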
Isn’t this diverging a little too much from existing multiline string rules?
I’d expect d"""\ to be the way to strip a leading newline. Have I missed some nuances of the proposed dedent rules which make that impossible?
How will users control the presence or absence of a trailing newline?
For me, it’s just that I will almost certainly never want that leading newline, so I’d welcome the syntax in its default, plainest form doing what I almost certainly intend rather than what a strict literal extrapolation of non-d-string behaviour says it should do.
I know it’s a bit of an inconsistency but I’d prefer to wave the practicality beats purity card on that one.
So how do we include an initial newline if we want one?
I’m sure that this rule, and the one for trailing newlines:
are workable, but they don’t match the established behaviour for Python’s triple-quoted strings. I understand “practicality beats purity”, but it feels like having two very different sets of rules depending on the string prefix will be both hard to learn, and a bug magnet. We shouldn’t just be grabbing a Java proposal and bolting it onto Python without considering how well it fits with the rest of the language.
While I’m not a fan of the whole d-string idea, if we’re going to have it can we at least keep the leading and trailing newline rules consistent across all forms of triple-quoted strings in Python?
Allowing for text in the first line can lead to fragments of the string that are misaligned:
s = '''abc
def
'''
Here, a and d would appear in the final string "abc\ndef" as the first character in the corresponding line, but would be non-aligned in the source code. So, there are reasons for disallowing text in the initial line. But if text in the initial line is disallowed, then omitting the first newline also makes sense to me.
If you want to start with a newline, just add one:
s = d'''
abc
def
'''
For the end, I think there is no problem in allowing
s = d'''
abc
def'''
Only if the last line (containing def) would need some indentation, one would need another line to specify it.
To me, that simply seems wrong. The least amount of whitespace at the start of a line is none (before the “a”) so I’d expect "abc\n def\n " as the resulting string. Maybe that’s not very useful in practice, but I’d rather have consistent rules that I could reason about than a complex set of “do what I mean” heuristics…
Again, that seems like a weird rule. Exactly one newline, but any amount of non-newline whitespace, will be stripped at the start of the string? Or would any whitespace on that first (blank) line be included in the string? How would it affect the indenting rules (say, if it’s shorter than the common indent on the rest of the lines)?
I don’t really care about the answers to those questions. What matters to me is that it’s not self-evident from the basic rules of how dedenting will work, because it’s a special case.
IMO, the simple solution is to just not allow text on the first line and to have the closing quotes be on a line of their own as well. That results in consistent rules, and it’s obvious what is being dedented and by how much (the amount is defined by the closing quote). Yeah, this is different from other strings, but I don’t think it’s going to be more difficult to get used to than having edge cases that would for sure show up in “how well do you know Python” quizzes.
This is basically why I think we shouldn’t use d-strings but instead redesign the triple-quoted string rules and opt-in on a per-module basis with __future__.
I think it would be better to aim for a more satisfactory rule for triple-quoted strings that can be applied at compile time without special-casing d-strings or docstrings, and make a transition plan.
I know I was on the losing side of the __future__ vs d-string poll, but it seems like objecting to differing rules rejects the entire idea of d-strings and puts us back at square one.
That is the proposed rule: the string starts on the line after the line containing the opening quotes. Everything before that is removed. Allowing ignored whitespace between the opening quotes and the first newline would just be permissive behavior on the part of the parser. The alternative would be to raise a SyntaxError if there is any character between the opening quotes and the first newline.
[Edit: The current rules for docstrings do already special case the first line directly after the opening quotes: They consider the common initial whitespace of every non-empty line after the first line of the docstring. The proposal for d-strings would use less special casing.]
That would be no special case in the rules, only a consequence of the rules.
Consistency also has a practical benefit, which is that I don’t need to memorize a new and different rule for the leading newline. I’m not trying to create a meta-debate here, but I don’t think this is a strictly “practical” trade-off.
I agree with you that I’d almost never want the leading newline. But I’d like to keep the same syntax which I already use for strings fed to textwrap.dedent to express it.
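For readers who have not used it, this is the existing idiom being referred to, shown as a small sketch with made-up content: the backslash right after the opening quotes suppresses the leading newline, and `textwrap.dedent` removes the common margin.

```python
import textwrap

with_backslash = textwrap.dedent("""\
    abc
      def
    """)

without_backslash = textwrap.dedent("""
    abc
      def
    """)

print(repr(with_backslash))
print(repr(without_backslash))
```

The only difference between the two results is the leading newline, which is exactly the character under discussion.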
Similarly, as regards trailing newlines, adding a backslash escape in that context seems odd because it will mess with the typical rules for raw strings.
I’d like proponents of d-strings to think about what their first change in a shared codebase to use d-strings will look like to their reviewers, who never saw this discussion and expect multiline f-strings, raw strings, and unadorned Unicode strings to all act similarly.
Will your reviewer be able to read the change and reason correctly about the whitespace after being told “d-strings dedent”? If you make a mistake, will your reviewer be able to catch it?
I see each of these little inconsistencies as minor, but the less it acts like the rest of the language, the less we’ll be able to reason about it consistently and accurately as a whole community.
I think that this will break too much code when the new rules become the default eventually, so it will never happen. And having a __future__ import, where it is clear from the beginning that it never becomes the default, would be odd as well.
Transition plans would need to consider many cases. I have some code that generates HTML. If the grammar were changed such that any of that code was rejected by CPython, I would not be that happy. If you just mess with the amount of whitespace in the output, I would not care too much. But I am sure that there is also code around where messing with the whitespace is not acceptable.
Honestly, it took me a while to stop being surprised that:
foo = """
long
string
"""
has a leading newline. It makes sense when you think about it and acknowledge that, OK, yes, there is indeed a line break after the opening quotes, but it’s never what you want, so I wouldn’t expect it.
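A quick check of that behaviour in current Python, along with one way to drop the newline after the fact:

```python
foo = """
long
string
"""

print(repr(foo))               # the value starts with "\n"
print(repr(foo.lstrip("\n")))  # leading newline removed explicitly
```

The line break after the opening quotes is part of the string, so the value has three newline characters, not two.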
And FWIW, I didn’t see anyone tying themselves in knots when Java’s multi-line strings started appearing in code reviews.
In case it helps to have a scenario in mind, the presentation layer for a CLI application needs to be fairly precise about whitespace.
The introduction of multi-line strings was completely new to Java, so we should expect them to have an easier time adapting.
It’s a case of first-mover disadvantage. If we were designing Python multiline strings with the advantage of our current knowledge and experience, I’d be advocating for stripping the leading newline. I agree that the first time I saw the “extra” newline I was surprised. But I’d rather keep the current rule than add another variation on it.
Docstring processing is one existing variation already. If we want to consider that precedent, then I’d like the new rules to at least match docstring handling. My preference is otherwise, but my preference is also for str.dedent() over a string prefix, so I may just be a loud minority voice here.
If it gets the proposal over the line then I guess I don’t mind if the leading newline is preserved although I do think we’ll regret it after a few years of getting used to d-strings and observing that every one of them has to come with a boilerplate """.lstrip("\n") suffix.