PEP 701 – Syntactic formalization of f-strings

steven.daprano · December 19, 2022, 11:43pm

You describe f-strings as “literals” in the PEP. Please don’t do that. F-strings can contain executable code and can have side-effects. They are code, not string literals.

We don’t talk about “def literals” and “lambda literals”. The documentation doesn’t even refer to list or dict literals (rather, “displays”). We shouldn’t abuse the word literal to describe f-strings either.

The PEP says:

(emphasis added) and then give three examples:

f"These are the things: {", ".join(things)}"

f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"

f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"

The first two might be perfectly understandable to the parser, but as a human reader, they make it more complicated and error-prone to work out which quotes delimit the f-string and which do not.

Especially when the f-string may be concatenated with other strings, or embedded in larger expressions. Even more especially when used with long line lengths.

Taken in isolation, the human reader can just look at the start of the line and see the f" token, and then skip to the end of the line to find the matching close quote. But in real code, where f-strings are often embedded in complex expressions and long lines, it is not that easy. You have to manually parse the f-string, counting quotes and braces, and work out which ones are paired with which other ones.

You know. The sort of thing computers are excellent at but people suck at.

I consider the first two examples terrible code which should be discouraged and the fact that your PEP allows it is a point against it, not in favour.

Especially since we can get the same effect by just changing one of the pairs of quotes to '. So in this regard, the PEP doesn’t even add functionality. It just encourages people to write code which is harder to read and more error prone.
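A minimal sketch of the quote alternation described here: the PEP's first two examples rewritten with single quotes inside the replacement fields. The sample values for `things` and `source` are made up for illustration, and `str.removesuffix` needs Python 3.9 or later.

```python
# Hypothetical sample data; not taken from the PEP.
things = ["spam", "eggs", "ham"]
source = "module.py"

# Inner quotes switched to ' so they cannot be confused with the
# f-string's own " delimiters (valid both before and after PEP 701).
listing = f"These are the things: {', '.join(things)}"
rule = f"{source.removesuffix('.py')}.c: $(srcdir)/{source}"

print(listing)
print(rule)
```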

In Python 1.x and 2.x, we had the backtick for evaluating the repr of objects. It could be nested exactly as you have here:

# Python 2.7

>>> `(`123`, 456)`

"('123', 456)"

One of the reasons we got rid of it was because it was too hard for the human reader to read nested backtick expressions.
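For readers who never used Python 2: the backticks were shorthand for `repr()`, so the nested expression above corresponds roughly to the following Python 3 spelling. This is a sketch for comparison only.

```python
# Python 3 spelling of the Python 2 expression `(`123`, 456)`:
# each backtick pair becomes an explicit repr() call.
inner = repr(123)            # the string '123'
result = repr((inner, 456))  # repr of the tuple ('123', 456)
print(result)
```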

Also keep in mind that many editors will have very simple-minded colourizers. Code inside f-strings may not be colourized, or if it is, the colourizer surely won’t include a full blown Python parser. So it will likely colour the first example as a comma-separated pair of strings:

f"These are the things: {" COMMA ".join(things)}"

and the second looks like you are calling a .py method on a string literal, followed by a quote instead of the opening paren. Many editors will colour this as a syntax error, or at the very least, incorrectly.

f"{source.removesuffix("  .py  ")}.c: $(srcdir)/{source}"
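To make the colourizer concern concrete, here is a rough sketch of the quote-to-quote matching a simple highlighter might do. The regex is an assumption for illustration and is not taken from any particular editor.

```python
import re

# Naive rule: a string is everything from one double quote to the next.
NAIVE_STRING = re.compile(r'"[^"]*"')

# The PEP's first example, held as plain text so it is never parsed as code.
line = 'f"These are the things: {", ".join(things)}"'

pieces = NAIVE_STRING.findall(line)
for piece in pieces:
    print(piece)
```

Under this rule the line falls apart into two separate strings with a bare comma between them, which is the misreading sketched above.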

pablogsal · December 19, 2022, 11:52pm

Thanks a lot for the feedback @steven.daprano and thanks for taking the time to add some context to the explanations!

We will correct this in the document soon.

Steven D'Aprano:

The first two might be perfectly understandable to the parser, but as a human reader, they make it more complicated and error-prone to work out which quotes delimit the f-string and which do not.

Especially when the f-string may be concatenated with other strings, or embedded in larger expressions. Even more especially when used with long line lengths.

Taken in isolation, the human reader can just look at the start of the line and see the f" token, and then skip to the end of the line to find the matching close quote. But in real code, where f-strings are often embedded in complex expressions and long lines, it is not that easy. You have to manually parse the f-string, counting quotes and braces, and work out which ones are paired with which other ones.

You know. The sort of thing computers are excellent at but people suck at.

I consider the first two examples terrible code which should be discouraged and the fact that your PEP allows it is a point against it, not in favour.

Especially since we can get the same effect by just changing one of the pairs of quotes to '. So in this regard, the PEP doesn’t even add functionality. It just encourages people to write code which is harder to read and more error prone.

If I understand correctly your concern is that the nesting of the same kind of quote will make it harder for humans to parse and will lead to code that may be harder to read because the quote character is the same on start and end (as opposed to other delimiters). Would that summarise your concerns correctly?

Yes, this is being discussed currently in this thread and we will update the PEP after discussing it.

Also notice that, as I indicated before, any editor that supports other popular languages, including Ruby and JavaScript, will have the same problem, as arbitrary nesting with quote reuse is allowed in those languages.

pablogsal · December 19, 2022, 11:54pm

As I assume this is going to be a controversial point of the proposal I am making a poll.

Question: Do you think that f-strings should allow quote reuse within the expression part? (as in f" something { my_dict["key"] } something else ")

Yes
No


Please take the time to read the PEP first and then the discussion so far regarding some of the arguments in favour and some of the objections.
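For anyone who wants to see how their own interpreter treats the poll's example, here is a small probe. It is only a sketch: the helper name `accepts_quote_reuse` is made up, and the snippet is kept as a plain string so the script itself parses on any Python 3 version. Quote reuse is accepted from Python 3.12, where PEP 701 landed, and rejected with a SyntaxError on earlier versions.

```python
import sys

# The poll's example as source text, not as live code.
SNIPPET = 'f" something { my_dict["key"] } something else "'


def accepts_quote_reuse(source: str) -> bool:
    """Return True if the running interpreter can compile `source`."""
    try:
        compile(source, "<f-string probe>", "eval")
    except SyntaxError:
        return False
    return True


print(sys.version_info[:2], accepts_quote_reuse(SNIPPET))
```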

Rosuav · December 19, 2022, 11:58pm

Be sure to put in a docs patch then.

A formatted string literal or f-string is a string literal that is prefixed with ‘f’ or ‘F’.
2. Lexical analysis — Python 3.12.1 documentation

Formatted string literals (also called f-strings for short)
7. Input and Output — Python 3.12.1 documentation

The original PEP refers to them as literals too.

In Python source code, an f-string is a literal string, prefixed with ‘f’, which contains expressions inside braces.
https://peps.python.org/pep-0498/

I don’t think it’s worth quibbling over this distinction. We refer to 3+4j as a complex literal, and it’s quite happily parsed by ast.literal_eval - and literal_eval’s docstring says that the expression “may only consist of literal structures”, and then includes list/dict/set in that.

There is a difference between what the tokenizer calls a literal and what a programmer calls a literal. PEP 701 is no further at fault than large slabs of other formal Python documentation.
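A quick way to see where `ast.literal_eval` draws the line, as a sketch using only the standard library: it accepts the complex "literal" and displays built from literals, but rejects an f-string that has a replacement field.

```python
import ast

# 3+4j is an addition in the grammar, yet literal_eval accepts it.
z = ast.literal_eval("3+4j")

# Displays built only from literals are accepted as well.
nested = ast.literal_eval("[1, {'a': 2}, {3, 4}]")

# An f-string with a replacement field is rejected.
try:
    ast.literal_eval('f"{1}"')
    fstring_accepted = True
except ValueError:
    fstring_accepted = False

print(z, nested, fstring_accepted)
```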

brettcannon · December 20, 2022, 12:39am

As one of those “editor people”, my answer for VS Code is, “don’t worry about us”. It’s our job to make this work, not for the language to restrict itself just to conform to some technical limitations an editor may have for some grammar. Python did quite well for a long time only worrying if you could type code out in Notepad, so I don’t think we should start changing that policy now.

steven.daprano · December 20, 2022, 1:08am

And for those editor people who don’t have the resources of the world’s largest IT company behind them, paying them to make it work?

tjreedy · December 20, 2022, 1:14am

For recognition of an entire f-string, the issue is reuse of opening quote. IDLE currently properly recognizes the following:

f"""Things: {
a + f"{b + ', '.join(things) # Nested f-string.
}" suffix} ending"""

This currently compiles as “SyntaxError: incomplete input”.

gh-73473 requests that IDLE do replacement field syntax highlighting with brace matching and auto-completion. No one has submitted a PR yet. I have no plans to do so. A problem for key-by-key response is the need to be both within a string (to recognize the end of the string) and not within a string (to do the other stuff).

Would you like us to reflect this in the PEP?

If quote reuse stays, please do so. The SC should know that approving this as is will break a few things. Perhaps add at least “Lexing all possible f-strings with regular expressions will no longer be possible. In particular, reuse of opening quotes in replacement fields will be a problem.”

I would initially document that IDLE cannot properly handle f-strings that reuse the opening quote in replacement fields, and that highlighting would not be the only thing affected. (To expand the latter, I will have to examine the uses of hyperparser to see the consequence of stopping scanning prematurely. Or clone, build python, and run IDLE in your alternate repository.) I would add that IDLE users should not use code reformatters that change the quotes used for f-strings or any quotes within f-strings.

if this point proves to be too controversial, we are happy to consider dropping it

I’d like that possibility kept alive.

Continuing other comments:

Specification - please note that FSTRING-X tokens are defined below. I went looking for them in the PEP the text directed me to.

Consequences: expression comments require additional lines even to close the replacement field.

Rejected ideas, 1: I think you mean wrap in parentheses, as in the example? Would f'{(a != b)}' be an example for ‘!’?

Rosuav · December 20, 2022, 3:21am

I personally use SciTE, the reference editor for Scintilla which is also used in Notepad++ and others. It has syntax highlighting support for a number of obscure languages. I would say that, once one open source project has good syntax highlighting support for nested f-strings, others will be able to imitate. One of the huge advantages to a programmer’s editor is that it’s incredibly easy to dogfood, so pain points are that much more likely to be dealt with.

(Case in point: I would never have expected my editor to recognize #define directives in C programs and then intelligently parse #if/#else directives to show which branch is active and which isn’t; but it does. And it’s very handy.)

pf_moore · December 20, 2022, 10:28am

I voted yes, as I have definitely hit the “you cannot use the same type of quote in the expression part” issue myself. And I think consistency in allowing whatever can be in an expression is easier to explain and understand.

Having said that, I’d prefer it if the PEP warned against over-use of this feature, as it has the possibility of being difficult to read. In particular, I’m concerned about the possibility that formatters like black could try to enforce a particular style here, and I don’t think that would be a good idea (there are too many variables involved in deciding what’s “readable” in this context).

mauve · December 20, 2022, 10:48am

Sure, corporate entities can easily rise to the challenge.

I’m more worried about the long tail of less active libraries and apps that will just have slightly broken Python support forever because the amount of refactor needed to lex Python code accurately is prohibitive. It’s not quite Python’s problem but it’s also not the fault of Python users when they write valid code and encounter buggy highlighting.

Let me add some research though:

Pygments (Python library) already supports nested f-strings with the same quote type
highlight.js (which Discourse uses) can already lex nested Python f-strings with the same quote type
prism.js (another top Google hit for JavaScript) states that it can’t lex even current f-strings

I didn’t expect a 2/3 pass rate, so that’s quite promising.

Code I tried in case people want to reproduce

I tested Pygments in a notebook with

from IPython.display import HTML  # IPython.display is the documented public import path
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

code = """
f"a{str(f"foo{int}")}b" + 1
"""
HTML(highlight(code, PythonLexer(), HtmlFormatter()))

I tested highlight.js with

<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.7.0/build/styles/default.min.css">
<script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.7.0/build/highlight.min.js"></script>
<script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.7.0/build/languages/python.min.js"></script>
<script>hljs.highlightAll();</script>
<pre><code class="language-python">f"a{str(f"foo{int}")}b" + 1</code></pre>

And to demonstrate Discourse rendering it:

f"a{str(f"foo{int}")}b" + 1

pf_moore · December 20, 2022, 11:01am

I will note that even with Discourse highlighting it (and lol, it looks like it’s not highlighted in quotes) I still find it hard to interpret. I would hope that even if it’s allowed (and I’m in favour of that, if only for the purposes of making the rules more consistent), it will be discouraged, and remain a rare case in real-world code.

steven.daprano · December 20, 2022, 11:10am

Yes, thank you.

This situation would be equivalent to the Python 2 backtick repr syntax, except that the backtick syntax was never widely used, and I expect that f-strings will be used very frequently.

Likewise in mathematics, where we use |x| for absolute value. It’s fine when there’s only one pair, but |x|y-2|z| is ambiguous.

It would be worth checking with people in the Ruby and JS communities to see just how well their editors support arbitrary nesting, and if they don’t, how much of a problem it is in practice.

barry · December 20, 2022, 7:35pm

I appreciate the goals of the PEP, and support more consistency in the various parsers, both for CPython and across alternative implementations. I do think it will break editors and other syntax highlighting tools, but hopefully they will eventually catch up.

The biggest problem for me is the reuse of opening quote feature. I get why you want to lift this restriction, and I get that you want to enable arbitrary nesting of quotes, but I think this is helping the machine more than the human. Even the simple example from the PEP is challenging for me to parse.

>>> f"These are the things: {", ".join(things)}"

I read code left to right, so as soon as I hit that second quote, I start scratching my head and asking “what’s going on here?”

Okay, I can eventually work it out, but it won’t remain obvious. It’s worse for some of the more complex examples, and I suspect it’ll be one of those things that trip up every reader of code like this, bringing their comprehension of the intent to a screeching halt. If I had to review such code, I wouldn’t let it in. It’s like everything is now an f-nail, and JWZ’s famous quote about regexps comes to mind.

Are these concerns enough to add artificial constraints on the implementation? Maybe, since the PEP already carves out at least one restriction. But maybe for consistency, the answer should be to let people write terrible, unreadable code!

I voted “no” since for me, the concerns outweigh the benefits in this case. Heavily nested or reused quote characters is a code smell to me.

Rosuav · December 20, 2022, 7:44pm

I agree; it’s a code smell. As such, style guides and code review should frown upon it. But it will make things easier if the language permits it.

brettcannon · December 20, 2022, 8:04pm

I have no idea, hence why I said, “my answer for VS Code”, not “for all editor developers worldwide”.

pablogsal · December 20, 2022, 9:58pm

Thanks a lot, @tjreedy for taking the time to give all this context about IDLE and how this would affect it. Your experience here is very important to us.

I’d like that possibility kept alive.

We will reflect this in the PEP so that it is taken very seriously when making decisions, and I can promise that we will discuss this issue among the authors to see what we can do after we gather some more feedback from other places.

Specification - please note that FSTRING-X tokens are defined below. I went looking for them in the PEP the text directed me to.

Thanks for the correction, we will fix it!

Consequences: expression comments require additional lines even to close the replacement field.

We will incorporate this into the PEP.

Rejected ideas, 1: I think you mean wrap in parentheses, as in the example? Would f'{(a != b)}' be an example for ‘!’?

I will try to rephrase that. Answering your question (if I understand it correctly), f'{(a != b)}' works currently without parens and that will keep working:

>>> a = b = 1
>>> f"{a != b}"
'False'
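A slightly longer sketch of the same point, showing how `!=` inside a replacement field differs from a lone `!`, which introduces a conversion (`!r`, `!s` or `!a`). The variable names are only for illustration.

```python
a = b = 1

comparison = f"{a != b}"   # '!=' is parsed as the inequality operator
conversion = f"{a!r}"      # a lone '!' starts a conversion, here repr()
both = f"{(a != b)!r}"     # parentheses keep the two uses visually apart

print(comparison, conversion, both)
```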

Thanks a lot for this feedback! We will get to fix it as soon as possible!

pablogsal · December 20, 2022, 10:02pm

Hmm, this is a good point @steven.daprano! Thanks a lot for raising this with us. At the very least we will reflect this in the PEP so it will be taken as an important aspect of the balance.

I am not sure what we will finally propose (quote reuse or not) as we want to gather some feedback first, but we will discuss this aspect among authors again. I think this is an important point but we need to think about it carefully because it is less “consistent” and it raises the maintenance cost quite a lot.

In any case, thanks for pointing it out and for the examples!

Rosuav · December 20, 2022, 10:03pm

If I’m understanding correctly, the only way for the initial quote to be contained within an f-string is if it is also inside an (unbalanced) open brace, right? So it shouldn’t be ambiguous.
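The rule described here can be sketched as a tiny recursive scanner: a quote ends the f-string only when it is seen outside every replacement field, and a quote seen inside a field opens a nested string. This is an illustrative simplification, not how CPython's tokenizer is written. The function names are made up, and it ignores backslash escapes, triple quotes, other prefixes and quotes inside format specs.

```python
def scan_string(src: str, i: int) -> int:
    """Return the index just past the string that starts at src[i].

    src[i] is either a quote or an f/F prefix followed by a quote.
    """
    is_fstring = src[i] in "fF"
    if is_fstring:
        i += 1
    quote = src[i]
    i += 1
    while i < len(src):
        if src[i] == quote:
            return i + 1                    # closing quote outside any field
        if is_fstring and src[i] == "{":
            if src.startswith("{{", i):     # escaped literal brace
                i += 2
            else:
                i = scan_field(src, i + 1)  # skip the whole replacement field
        else:
            i += 1
    raise ValueError("unterminated string")


def scan_field(src: str, i: int) -> int:
    """Return the index just past the '}' that closes a replacement field."""
    depth = 0  # braces opened inside the field, e.g. dict displays
    while i < len(src):
        ch = src[i]
        prev_is_name = src[i - 1].isalnum() or src[i - 1] == "_"
        starts_fstring = (
            ch in "fF" and src[i + 1:i + 2] in ('"', "'") and not prev_is_name
        )
        if ch in "\"'" or starts_fstring:
            i = scan_string(src, i)         # a quote here opens a nested string
        elif ch == "{":
            depth += 1
            i += 1
        elif ch == "}":
            if depth == 0:
                return i + 1
            depth -= 1
            i += 1
        else:
            i += 1
    raise ValueError("unterminated replacement field")


src = 'f"These are the things: {", ".join(things)}" + "tail"'
end = scan_string(src, 0)
print(src[:end])

nested = 'f"{f"{f"infinite"}"}"'
print(scan_string(nested, 0) == len(nested))
```

The scanner never has to guess, which is the sense in which the syntax is unambiguous. It does have to carry nesting state along the way, and that is exactly the bookkeeping a human reader would need to do by eye.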

pablogsal · December 20, 2022, 10:05pm

Ignoring the consistency arguments, limiting quote reuse raises the complexity quite a lot. This is because, when parsing the expression part, the parser now needs to be aware that it is parsing an expression inside an f-string with a given quote, and that becomes even more tricky when f-strings are nested with different quotes.

This doesn’t mean that this invalidates the “code smell” argument by any means: I just want to give some context on the maintenance point. I totally understand why you voted “no” (and that’s why I really want feedback from everyone).

I think this is an important point that I guess is not going to have an easy answer. But I promise that we will do our best to reflect all of these considerations in the PEP when a decision is taken and to take all of these into account for our final proposal.

pablogsal · December 20, 2022, 10:07pm

Yeah, it is not technically ambiguous in the grammatical sense, but I was interpreting “ambiguous” as “ambiguous for the human eye” as a proxy for “visually confusing” or similar. If I misunderstood that I apologize.