PEP 701 – Syntactic formalization of f-strings

pablogsal · December 20, 2022, 10:08pm

Thanks, @pf_moore, for the suggestion. We will try to incorporate something along these lines if we finally decide to propose quote reuse.

Rosuav · December 20, 2022, 10:08pm

Yep, agreed on “confusing” (and that’s why I would try to avoid it where possible), but wanted to clarify whether it was truly ambiguous.

pf_moore · December 20, 2022, 10:45pm

Is the possibility of code generators worth mentioning here? I imagine it would be more complex to write a code generator that needed to track what types of quote have been used when generating f-strings.

Having said that, I can’t imagine why a code generator would be making heavy use of f-strings in the first place: if you’re already generating the code, why not generate the expression that the f-string is equivalent to? But I’ve never written a code generator, so my intuitions on what’s likely to be common are suspect, at best.
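To make the quote-tracking concern concrete, here is a minimal sketch of what a generator might have to do when the delimiter cannot be reused inside the expression. The helper name `wrap_in_fstring` is hypothetical and not taken from any real tool; it simply tries each delimiter and keeps the first one that does not occur in the expression source. It ignores other edge cases, such as expressions that begin with a brace.

```python
def wrap_in_fstring(expr_src: str) -> str:
    """Return source text for an f-string that interpolates expr_src.

    Hypothetical helper: tries each delimiter in turn and uses the first
    one that does not occur inside the expression source.
    """
    for quote in ('"', "'", '"""', "'''"):
        if quote not in expr_src:
            return "f" + quote + "{" + expr_src + "}" + quote
    raise ValueError("no unused quote style left for: " + expr_src)


d = {"k": "a"}
for expr in ("d['k']", 'd["k"]', """d["k"] + 'x'"""):
    generated = wrap_in_fstring(expr)
    # eval() is used here only to show that the generated source is valid.
    print(generated, "->", eval(generated))
```

If quote reuse were allowed, a generator could presumably drop the loop and always emit the same delimiter.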

robbo · December 21, 2022, 1:47am

I voted Yes because I think that any (unnecessary) restriction will sooner or later bite someone. But I agree that unnecessary nesting of the same kind of quote is poor style and should be discouraged. [Rob Cliffe]

malemburg · December 21, 2022, 10:15am

IMO f-strings should continue to look like regular Python strings to a Python programmer and not allow non-intuitive quoting just because the parser can handle this (since it knows that it’s actually parsing outside the string definition context).

It would be really strange to have the parser complain about:

s = "The Zen of Python emphasizes on "Simple is better than complex." Let's keep it that way."

while accepting:

from formatting import bold
s = f"The Zen of Python emphasizes on {bold("Simple is better than complex.")} Let's keep it that way."

Otherwise, a programmer can easily lose track of where the f-string starts and ends.
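For readers who want to check how their own interpreter treats the two snippets, the following sketch compiles shortened versions of them without executing anything, so the hypothetical `bold` function never has to exist. The result for the quote-reusing f-string depends on the interpreter: the grammar proposed in this PEP shipped in CPython 3.12, and earlier versions reject it.

```python
import sys


def compiles(source: str) -> bool:
    """Return True if source is a syntactically valid expression here."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Shortened versions of the two snippets, plus the alternating-quote form.
plain = '"The Zen says "Simple is better than complex." Keep it."'
reused = 'f"The Zen says {bold("Simple is better than complex.")} Keep it."'
alternated = "f\"The Zen says {bold('Simple is better than complex.')} Keep it.\""

print("interpreter:", sys.version_info[:2])
print("plain string, reused quotes: ", compiles(plain))
print("f-string, reused quotes:     ", compiles(reused))
print("f-string, alternating quotes:", compiles(alternated))
```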

I also don’t think that nesting f-strings should be allowed, but I guess that ship has already sailed.

Personally, I find the tendency of Python getting more and more complex in many niche areas unfortunate. Often these additions are not done willingly, but get added by accident (e.g. in the case of nested f-strings), but, of course, people find these “features” and start using them. And then we’re stuck with those “features” because of backwards compatibility concerns.

sinoroc · December 21, 2022, 10:56am

It feels to me like it is okay if the parser allows quote reuse, as long as we have the documentation, formatters, and linters to warn us against it.

monk-time · December 21, 2022, 11:39am

Reading this discussion, I find it difficult to understand why prior art (this feature as implemented in other languages) is given so little weight. Shouldn’t we strive for general compatibility so as to make it easier for users to learn and use the language? Instead of introducing another stumbling block where a programmer must go “Oh, this language has this common feature, so I assume it works the same as in all other languages… hey, what’s this error?”

It also feels to me (and forgive me if I’m wrong) that people reacting to unreadability of f"test {"-".join(words)} test" do so without taking syntax highlighting into account. In pure black - I agree, it’s unpleasant to parse, but with color it’s really no problem to see where pure string ends and embedded code begins.

h-vetinari · December 21, 2022, 12:20pm

Why? There’s a clear marker in the second one that says “stuff between {} is treated differently” (discourse even does the highlighting for that). f-strings have been around for almost 6 years now, and that {} does un-string-like things is the whole point, which people seem to have gotten with relative ease.

Several of the examples against quote reuse have minimal code being executed to interpolate the string. Once that part becomes longer, like

f"Please, have some {assorted_snacks["for_guests"].serve()}, they're delicious!"

it becomes much less ambiguous (YMMV) which parts are string and what parts are string interpolation.

For example, not being able to simply use some_dict["key"] within the interpolation part, in a way that’s consistent with the (usually black-enforced) style of double quotes in the rest of the code base, is really annoying, and a constant source of stylistic and mental friction.

My POV is that it’s completely fine if someone doesn’t like nested quotes of the same kind. But like simplicity, another oft-repeated koan is “we’re all consenting adults here”, and – as the poll shows – there’s a sizeable portion of people who want the quote reuse to be possible. It would be easier to argue the case against enabling that if the costs were borne by the implementation, but even that’s not the case - the implementation would get simpler^[1], the language more regular (for those who care about it), and people could choose whether to embrace or forbid its use in their codebase.

At least, I think that personal style choices are not a sufficiently strong argument for forbidding quote reuse.

[1] Speaking about CPython here, not other software like editors.

malemburg · December 21, 2022, 1:04pm

I’m pretty sure that black can be fixed to allow single quotes in {} parts of f-strings to continue to allow use of string literals - perhaps even enforce this.

This code doesn’t look much different, but it clearly matches expectations when writing Python string literals:

f"Please, have some {assorted_snacks['for_guests'].serve()}, they're delicious!"

I frankly don’t understand why people would want to disrupt Python’s quote handling on purpose. If a programming language requires code highlighting to be fully understood by humans, then something is wrong, and Python has a long history of having a nice human-friendly syntax.

For more complex templating logic, it’s much better to go with e.g. Jinja2. f-strings are simply a nice compact way of getting access to variables. The fact that you can use arbitrary expressions is one of the “features” I talked about in my previous post. This is mostly due to keeping the implementation simple and to avoid limiting ways of accessing variables, but not really intended for arbitrary code execution. Of course, people still use the feature that way and that’s unfortunate, IMO.

ajoino · December 21, 2022, 3:41pm

I don’t think it disrupts the quote handling really. The first time I tried to subscript a key in an f-string I was surprised that I had to change from double to single quotes. So in that case the current quote handling was disruptive. It felt like an arbitrary restriction.
Furthermore, I don’t really see the readability issues that are brought up here, maybe because I started using Python after f-strings were introduced and because I’ve always used syntax highlighting tools.

I agree with the people saying that the reuse of quotes in f-strings should not be syntactically incorrect but instead be discouraged in the docs.

stoneleaf · December 21, 2022, 7:01pm

If all the languages are “compatible”, what makes them different?

The reason I fell in love with Python was its readability:

lack of boilerplate
significant whitespace
use of keywords

As for prior art, that goes both ways: Python finally got an assignment expression, and it does not look like C’s – why? Because C’s looked the same as equality testing and was therefore easy to use incorrectly.

monk-time · December 21, 2022, 7:32pm

I don’t think you’re asking this question in good faith. Languages differ by a multitude of things, such as supported paradigms and features, how they jell with each other, which aspect gets the most attention and ease of use and so on. Gotchas in basic features, however, are not one of them. I don’t think anyone would disagree that the mutable defaults gotcha is a regrettable part of Python that might be too late to fix. Banning reuse of quotes is just that, a gotcha.

This comparison is not apt here, C predates Python by 20 years, plus 30 more until this feature got introduced in Python. That’s long enough to call it “a reimagining”, not simply a reimplementation. Whereas four modern languages that the PEP is referring to have implemented string interpolation within a few years from Python at most. JavaScript got it the same year as Python IIRC, 2016. Swift was first released in 2014.

achhina · December 21, 2022, 8:10pm

As for reusing the opening quotes, I wouldn’t recommend it and would likely try to have a formatter rule to avoid even having such discussions in reviews. However, I personally do not have an issue with parsing this example f"These are the things: {", ".join(things)}", and even less so with syntax highlighting.

f"These are the things: {", ".join(things)}"

It does make me wonder how much of this is an issue with familiarity; has this been a point of contention or a pain point in other communities that allow quote reusing, like the aforementioned Ruby/JS communities?

In terms of the benefits, in addition to reducing the complexity of the implementation, I think this increased freedom would be helpful for ad hoc testing/experimenting. On numerous occasions, I have gotten SyntaxError when absent-mindedly reusing opening quotes in a REPL session. While not particularly inconvenient, this PEP would have been nice to have in those situations.

Overall, considering all those things I’m +1 for this PEP.

godlygeek · December 21, 2022, 8:24pm

Note that any IDE, code review tool, or text editor that supports syntax highlighting of POSIX sh code already needs to handle nested string delimiters. For example, I’ve got this little shell script that I can use from within WSL to let Windows open a file as though I had double clicked it:

#!/bin/sh
set -eu
filename="$1"
cd "$(dirname "$filename")"
explorer.exe "$(basename "$filename")"

Unlike in Python, you don’t have the choice of using different quotes in shell; those nested quotes are necessary for correctness and cannot be avoided.

Note that this isn’t an argument for making Python’s syntax more sh-like, only an argument that nested quotes are already the norm in a major enough and old enough programming language that nearly every general purpose text editor will have a way to handle highlighting of nested quotes already.

guido · December 21, 2022, 9:28pm

I think the first example from the PEP using ", ".join(things) is not great to explain why we’d want this feature. The second example is better:

f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"

(though maybe it tries a little too hard to be realistic by using $(srcdir) which looks like another form of interpolation but is just a literal string).

I like the proposed feature, because it makes it possible to copy any valid expression from another part of the code into an interpolation without having to worry about string quotes.

holdenweb · December 21, 2022, 9:37pm

It’s a putative language development. Editor vendors are used to having to track those - you might almost say it comes with the territory.

envp · December 21, 2022, 9:45pm

In my lay opinion, once a feature exists there will be people who use it in all manners of clever ways.

I voted no, for the following reason:

Harder for beginners to parse at a glance. This feature absolutely requires some kind of syntax highlighting / code-formatting to understand what is happening in the code. This is a drastic increase in complexity.

guido · December 21, 2022, 10:18pm

That same argument has been used against any new feature. The mere possibility of abuse should not be enough to reject a new feature (TBH, there are better reasons to reject most feature proposals). Only the overwhelming likelihood that most uses would be (“clever”) abuse should count against a feature.

Beginners aren’t expected to know the full grammar of the language, and let’s not forget that most f-strings don’t contain any string literals inside interpolations. I expect that to remain so – most of the time there’s just no reason to put anything beyond a mere variable name or a very simple expression.

It has been explained over and over now that that’s par for the course for editor maintainers.

Very few, of course, but that’s not even close to being a litmus test for a new feature.

A better way to evaluate proposed new features (especially new syntactic features) is if it composes cleanly with other syntax. Sometimes people seem to use the term “referential transparency”; an example is that if we have f(x+1), assuming a is a brand new variable, it should behave the same as a = x+1; f(a). And vice versa.

So if we have

def py2c(source):
    prefix = source.removesuffix(".py")
    return f"{prefix}.c"

It would be expected that if we replace the variable prefix with its definition, the answer should be the same:

def py2c(source):
    return f"{source.removesuffix(".py")}.c"
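The substitution can be checked mechanically. In this sketch the inlined variant is kept as source text and passed to `exec`, purely so that the file still parses on interpreters that reject quote reuse (anything before CPython 3.12); the names `py2c_named`, `py2c_inlined`, and `INLINED` are illustrative.

```python
import sys


def py2c_named(source: str) -> str:
    """Variant that binds the intermediate value to a name first."""
    prefix = source.removesuffix(".py")
    return f"{prefix}.c"


# The inlined variant is stored as text so older interpreters can still
# load this file; it only compiles where quote reuse is accepted.
INLINED = '''
def py2c_inlined(source):
    return f"{source.removesuffix(".py")}.c"
'''

namespace = {}
try:
    exec(INLINED, namespace)
except SyntaxError:
    print("inlined form rejected on", sys.version_info[:2])
else:
    same = namespace["py2c_inlined"]("spam.py") == py2c_named("spam.py")
    print("both forms agree:", same)
```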

ucodery · December 21, 2022, 11:34pm

First thanks for all the work so far on this PEP, I’ve been looking forward to this grammar for some time!

The grammar for fstring_replacement_field accepts a NAME after the optional !, but anything other than s, r, or a produces a SyntaxError, which makes it seem that this should be formally specified in the grammar. Or is this somehow up to the different implementations to define?

FSTRING_START: This token includes f-string character (f/F) and the open quote(s).

There is also the case of fr strings, which I’ll note the reference implementation handles just fine. In this case, I assume that the r/R character is also part of FSTRING_START. What about the reverse; is r/R captured in this token for rf strings? I know the tokens are stated to not be binding definitions, but part of this PEP is increased clarity.

Relatedly, the tokenize module in the reference implementation seems to have not caught up with the PEP yet. I assume that is because it is the Python implementation and not the private C one. I was curious if the tokens like FSTRING_START would be the type, exact_type, or disappear into STRING before it got to the user.

echo 'f"These are the things: {", ".join(things)}"'|./python.exe -m tokenize -e
1,0-1,26:           STRING         'f"These are the things: {"'
1,26-1,27:          COMMA          ','
1,28-1,44:          STRING         '".join(things)}"'
1,44-1,45:          NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

Just a quick note on the nested quotes. I believe @pf_moore mentioned code generators, which I probably know equally little about, but round-tripping from text → code → text is very hard with nested f-strings as it stands. Printing out strings always uses single quotes, unless the value contains single quotes but no double quotes, in which case it uses double quotes. It will never use triple quotes as delimiters.
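The quoting rule described here is the one `repr()` applies to `str` values, and it is easy to observe directly. The sample strings below are made up for illustration.

```python
# Each sample exercises one branch of the delimiter choice made by repr().
samples = [
    "plain",           # no quotes in the value
    "it's",            # single quote only
    'say "hi"',        # double quotes only
    "it's \"both\"",   # both kinds of quote
]

for value in samples:
    print(repr(value))
```

Only the second sample comes back delimited by double quotes; the last one keeps single quotes and escapes the embedded apostrophe with a backslash.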

pablogsal · December 22, 2022, 1:30am

That is checked when the AST node is constructed. We do it this way because our parser handles it much better: we don’t want to make single letters keywords or soft keywords. We will mention this in the PEP.
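From the user’s side, the effect of that check can be seen with the `ast` module. This sketch only shows the observable result: the three accepted conversions are stored as integer character codes on the `FormattedValue` node, and any other name raises `SyntaxError`. It does not show at which internal stage the error is raised. The helper `conversion_of` is an illustrative name.

```python
import ast


def conversion_of(source: str) -> int:
    """Return the conversion code on the first FormattedValue node."""
    tree = ast.parse(source, mode="eval")
    node = next(n for n in ast.walk(tree) if isinstance(n, ast.FormattedValue))
    return node.conversion


for flag in ("s", "r", "a"):
    code = conversion_of('f"{x!' + flag + '}"')
    print(flag, "->", code, repr(chr(code)))

# -1 is how the AST records "no conversion requested".
print("none ->", conversion_of('f"{x}"'))

try:
    ast.parse('f"{x!z}"', mode="eval")
except SyntaxError as exc:
    print("rejected:", exc.msg)
```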

Relatedly, the tokenizer module in the reference implementation seems to have not caught up with the PEP yet

Yep, we are not planning on doing all the work until the PEP is accepted (if it is accepted) and we know exactly the set of constraints.

There is also the case of fr strings, which I’ll note the reference implementation handles just fine. In this case, I assume that the r/R character is also part of FSTRING_START. What about the reverse; is r/R captured in this token for rf strings? I know the tokens are stated to not be binding definitions, but part of this PEP is increased clarity.

The PEP leaves the specifics of the token contents undefined and up to the implementation. I am not sure what we will decide for CPython if the PEP is accepted at the end but I can tell you that currently, we capture both. For instance, for fr'hello {1+2} bye':

[TokenInfo(type=61 (FSTRING_START), string="fr'", start=(1, 0), end=(1, 3), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=62 (FSTRING_MIDDLE), string='hello ', start=(1, 3), end=(1, 9), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 10), end=(1, 11), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=14 (PLUS), string='+', start=(1, 11), end=(1, 12), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='2', start=(1, 12), end=(1, 13), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=26 (RBRACE), string='}', start=(1, 13), end=(1, 14), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=63 (FSTRING_END), string=' bye', start=(1, 14), end=(1, 18), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 19), end=(1, 19), line="fr'hello {1+2} bye'\n")]

and for rf'hello {1+2} bye':

[TokenInfo(type=61 (FSTRING_START), string="rf'", start=(1, 0), end=(1, 3), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=62 (FSTRING_MIDDLE), string='hello ', start=(1, 3), end=(1, 9), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 10), end=(1, 11), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=14 (PLUS), string='+', start=(1, 11), end=(1, 12), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='2', start=(1, 12), end=(1, 13), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=26 (RBRACE), string='}', start=(1, 13), end=(1, 14), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=63 (FSTRING_END), string=' bye', start=(1, 14), end=(1, 18), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 19), end=(1, 19), line="rf'hello {1+2} bye'\n")]
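A comparable dump can be produced with the pure-Python `tokenize` module. What it prints depends on the interpreter: versions that predate the new tokenizer report the whole literal as a single STRING token, while CPython 3.12 and later split it into FSTRING_START, FSTRING_MIDDLE, and FSTRING_END. The helper `token_summary` is an illustrative name, and the exact token boundaries may differ from the work-in-progress output shown above.

```python
import io
import tokenize


def token_summary(source: str):
    """Return (token name, token text) pairs for the given source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for prefix in ("fr", "rf"):
    print(prefix + " string:")
    for name, text in token_summary(prefix + "'hello {1+2} bye'\n"):
        print("   ", name, repr(text))
```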