PEP 701 – Syntactic formalization of f-strings

pablogsal · December 19, 2022, 5:53pm

Hi

I am very excited to share with you a PEP that @isidentical, @lys.nikolaou and I have been working on recently: PEP 701 – Syntactic formalization of f-strings. We believe this will be a great improvement in both the maintainability of CPython and the usability of f-strings.

We look forward to hearing what you think about this and to getting your feedback!

Thanks a lot, everyone, for your time!

TLDR

The PEP proposes a formalized grammar for f-strings in Python by adding f-strings directly into the Grammar instead of using a two-pass hand-written parser.
This would lift some existing restrictions for f-strings that (we believe) will improve the user experience with f-strings.
Other benefits include:
- Reduced maintenance costs for f-string parsing code as well as improved usability for users and library developers.
- Better error messages involving f-strings by leveraging the PEG parser machinery.
- The proposed changes would improve overall consistency of the language and provide a way for alternative implementations to accurately implement f-strings.
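As a rough illustration of the kind of restriction involved, the sketch below probes whether the running interpreter accepts two constructs that come up later in this thread: reusing the enclosing quote character inside a replacement field, and a backslash inside the expression part. The helper name `accepts` and the two probe strings are made up for this example. The result depends on the Python version; the assumption is that interpreters implementing the PEP (3.12 and later) accept both and older ones reject both.

```python
import sys

def accepts(source: str) -> bool:
    """Return True if this interpreter can compile `source` as an expression."""
    try:
        compile(source, "<probe>", "eval")
    except SyntaxError:
        return False
    return True

# The probes live in ordinary strings so this file parses on any version.
NESTED_QUOTES = 'f"{ d["key"] }"'            # reuses the enclosing double quote
BACKSLASH = r'''f"{'\n'.join(items)}"'''     # backslash inside the expression part

print(sys.version_info[:2], accepts(NESTED_QUOTES), accepts(BACKSLASH))
```

Only compilation is tested, so the names `d` and `items` never need to exist.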

Link to the PEP

(I am not pasting the PEP contents here to avoid having a diverging version if the PEP is updated)

pablogsal · December 19, 2022, 6:13pm

For further context you can read the coverage of my talk about this project at the 2022 Python Language Summit:

charliermarsh · December 19, 2022, 7:00pm

Thank you for this – I’m a big fan of the formalization proposed herein!

One clarification (and apologies for any ignorance – I’ve read through the PEP and skimmed the reference implementation, but I’m more familiar with the RustPython parser than the CPython parser right now): how, if at all, will this impact the AST representation? And will there be any formalization of the rules by which f-strings are “resolved” to AST nodes?

I’m referring, e.g., to the logic that I think lives in _PyPegen_concatenate_strings, to combine adjacent constant strings in the JoinedStr, along with the rules for resolving implicit string concatenations containing f-strings and “normal” strings. This is another area in which we’ve implemented custom logic for RustPython based on observing CPython’s behavior. I’m assuming that it’s an entirely separate topic and one that will not be impacted by this PEP as written, which is fine, but I wanted to confirm my understanding and flag it as a pain point with similar characteristics to the f-string parsing itself.
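For readers unfamiliar with the behavior in question, here is a small sketch that inspects the result of implicit concatenation through the public `ast` module. The helper `joined_parts` is hypothetical and only summarizes what CPython is observed to produce; it is not a specification of the rules.

```python
import ast

def joined_parts(source: str) -> list:
    """Summarize the pieces of the JoinedStr node that `source` parses to."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.JoinedStr):
        raise TypeError(f"expected an f-string, got {type(node).__name__}")
    parts = []
    for value in node.values:
        if isinstance(value, ast.Constant):
            parts.append(value.value)            # literal text, already merged
        else:
            parts.append(type(value).__name__)   # e.g. 'FormattedValue'
    return parts

# Adjacent plain strings are folded into one Constant next to the field.
print(joined_parts('"a" "b" f"{x}" "c"'))
```

On CPython this is expected to show the two leading literals merged into a single constant, followed by the formatted value and the trailing literal.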

pablogsal · December 19, 2022, 7:06pm

Thanks for the feedback @charliermarsh!

Nope, the AST will be the same. In fact, that’s how we are checking the implementation: by comparing the ASTs of most PyPI packages.
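A minimal sketch of what such a comparison could look like, assuming one runs the same script under both parser builds and diffs the results. The function `ast_fingerprint` is a hypothetical name for this example and is not the authors' actual tooling.

```python
import ast
import hashlib

def ast_fingerprint(source: str) -> str:
    """Hash a position-free dump of the AST, for comparison across parser builds."""
    tree = ast.parse(source)
    # Line and column attributes are left out so only the tree shape is compared.
    dump = ast.dump(tree, include_attributes=False)
    return hashlib.sha256(dump.encode("utf-8")).hexdigest()

# Two layouts of the same program should give identical fingerprints.
a = ast_fingerprint('name = "x"\nprint(f"hello {name!r:>10}")\n')
b = ast_fingerprint('name = "x"\nprint(  f"hello {name!r:>10}"  )\n')
print(a == b)
```

Writing one fingerprint per file to disk under each interpreter would then reduce the check to a plain text diff.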

Noted. I promise to discuss it with the co-authors but for this PEP we don’t envision changing any of those semantics, as our main target is maintainability for CPython and user experience for end users (not tool authors ATM).

In any case, we can chat offline about this if you want to elaborate. You can write to pablogsal (at) python (dot) org if you want

mauve · December 19, 2022, 8:05pm

Can the new syntax be correctly syntax-highlighted in IDEs, code review tools, etc.?

pablogsal · December 19, 2022, 8:15pm

There isn’t any reason why this would not be possible. This change is not different from any other syntax change in Python: tools need to update to handle the new syntax and old syntax will work the same way.

IDEs and other review tools need to parse the code, and any parser will be able to handle this new syntax because it is equivalent to how parsers need to handle nested parentheses. There is no hard requirement for a PEG parser.
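One way for a tool author to see how the running interpreter tokenizes an f-string is the standard `tokenize` module. The sketch below assumes that interpreters implementing the PEP (3.12 and later) split the literal into `FSTRING_START`, `FSTRING_MIDDLE` and `FSTRING_END` tokens with ordinary tokens in between, while older versions emit a single `STRING` token. The helper `token_names` is made up for this example.

```python
import io
import tokenize

def token_names(source: str) -> list:
    """Return the token type names the stdlib tokenizer produces for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]

names = token_names('f"a{x}b"\n')
print(names)
```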

ericvsmith · December 19, 2022, 8:29pm

As you know, I’m in favor of this PEP.

I would like the PEP to note that the existing restrictions on nested quotes (and probably other things) were a deliberate design decision in order to make it easy for editors, syntax highlighters and the like to “support” f-strings. All they really had to do is add “f” to “r”, “b”, “u” and they could at least get past the f-string.

Along the same lines, I think it would also be worth polling some notable editors and see if this change will cause them any headaches, and record that info in the PEP.

Thanks for working on this. I look forward to backslashes in expressions!

pablogsal · December 19, 2022, 8:36pm

Most editors that support other languages need to deal with that anyway. For instance, in Ruby you can do:

>>> puts "#{ "1 + 2" }"
1 + 2

or even

>>> puts "#{ "#{" #{1 + 2} "}" }"
 3

The change does mean that tools and editors need to do some work to parse the new syntax, but it is something they already have to deal with in other languages, and it is not impossible to implement.

Along the same lines, I think it would also be worth polling some notable editors and see if this change will cause them any headaches, and record that info in the PEP.

In any case, this is a good idea: we will try to reach out to some IDE authors and other tool maintainers to gather feedback, as per your proposal.

smontanaro · December 19, 2022, 8:52pm

Not sure where to make comments. Is there something on GitHub? At any rate… Item 3 in the Motivation section states:

Comments are forbidden even in multi-line f-strings:
>>> f'''A complex trick: {
... bag['bag']  # recursive bags!
... }'''
SyntaxError: f-string expression part cannot include '#'

I don’t know if that’s expected to change, but I will point out that while you can insert “comments” in regular strings, they are just part of the string:

>>> '''A complex trick: {
...     bag['bag']  # recursive bags!
... }'''
"A complex trick: {\n    bag['bag']  # recursive bags!\n}"

Do you intend to support actual comments in f-strings or just allow # ... to be embedded within the string? I’d be disappointed to find the semantics between f-strings and regular strings had diverged (though, of course they have already).

pablogsal · December 19, 2022, 9:14pm

Thanks for the feedback @smontanaro ! Unfortunately I am afraid I don’t follow your concern.

I will point out that while you can insert “comments” in regular strings, they are just part of the string:

Here the ‘#’ is not a comment because it is a character of the string itself, as in the example that you mentioned:

>>> '''A complex trick: {
...     bag['bag']  # recursive bags!
... }'''
"A complex trick: {\n    bag['bag']  # recursive bags!\n}"

Notice that in f-strings, the expression part is not part of the string itself: it is code that gets evaluated. Currently, if you try to use a comment there you get this:

>>> f"""A complex trick: {
... bag['bag']  # recursive bags!
... }"""
  File "<stdin>", line 3
    }"""
        ^
SyntaxError: f-string expression part cannot include '#'

What we are proposing is that this error disappears and anything after the # comment character is ignored, like a comment in any other expression (only when the comment is in the expression part).

Notice this does not change semantics between f-strings and strings because you can still put the # character in the string part of the f-string. That is:

>>> f"""A complex trick: # look a comment here {
... 1+1
... } # look, another comment"""
'A complex trick: # look a comment here 2 # look, another comment'

The comments that we want to allow are only on the expression part.
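A small sketch of the distinction, written so that it runs on any Python 3 version. The f-string with a comment in its expression part is kept in a plain string and evaluated, because interpreters that predate the PEP reject it at compile time; the assumption is that 3.12 and later accept it. The names `WITH_COMMENT` and `try_eval` are made up for this example.

```python
import sys

# '#' in the literal part is ordinary text on every Python version.
literal = f"A complex trick: # not a comment {1 + 1} # still text"
print(literal)

# '#' inside the replacement field: a real comment under the proposal.
WITH_COMMENT = 'f"""{\n1 + 1  # a real comment\n}"""'

def try_eval(source: str) -> str:
    """Evaluate `source`, or describe the SyntaxError it raises."""
    try:
        return eval(source)
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"

print(sys.version_info[:2], try_eval(WITH_COMMENT))
```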

pf_moore · December 19, 2022, 9:15pm

Isn’t the difference here that the # ... is inside a {...} pair? I would certainly expect that outside of {...}, # ... would be treated just as normal text (i.e., as part of the string).

Edit: I see that @pablogsal answered this more fully while I was typing…

pablogsal · December 19, 2022, 9:16pm

Exactly. Nothing will change in the string parts of f-strings.

tjreedy · December 19, 2022, 9:27pm

Rationale: while allowing backslash escapes is no problem (point 2), as they must already be recognized to detect strings, allowing un-escaped quotes and infinite nesting (points 1 and 3) is, certainly for regex-based syntax highlighting, which IDLE uses. Regexes are known to not handle any nesting easily and indefinite nesting not at all. Detecting identifiers, strings, and comments in code does not require dealing with nested parentheses at all. IDLE has a ‘hyperparser’, used for various purposes, that detects the same elements, plus line ends, scanning backwards. I suspect that it would also not work for f-strings with the changes proposed.

Less important comments:

Motivation point 4: the linked Wikipedia section says nothing about escape sequences.

In the next sentence, does ‘regular grammar’ have the technical meaning of the restricted grammar corresponding to ‘regular expressions’ or the colloquial meaning of ‘normal’ (which is different now than when 536 was written)?

One limitation that this will not fix, and I presume cannot fix, is that set expressions (displays and comprehensions) must still be surrounded by spaces to avoid ‘{{’ and ‘}}’.

pablogsal · December 19, 2022, 9:36pm

allowing un-escaped quotes and infinite nesting (points 1 and 3) is, certainly for regex-based syntax highlighting, which IDLE uses. Regexes are known to not handle any nesting easily and indefinite nesting not at all. Detecting identifiers, strings, and comments in code does not require dealing with nested parentheses at all. IDLE has a ‘hyperparser’, used for various purposes, that detects the same elements, plus line ends, scanning backwards. I suspect that it would also not work for f-strings with the changes proposed.

This is correct: it means that it will no longer be possible to lex strings using regular expressions. After this PEP (at least in its current form), string quotes in f-strings must be dealt with the same way parentheses-highlighting is dealt with.
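To make the parenthesis analogy concrete, here is a deliberately simplified sketch of a scanner that finds the end of an f-string by descending into replacement fields, next to a naive regex that stops at the first matching quote. Everything here is hypothetical illustration code: it ignores triple quotes, combined prefixes such as `rf`, and format specs, and it is not how CPython or any editor actually does it.

```python
import re

QUOTES = "'\""

def scan_string(src: str, i: int) -> int:
    """Return the index just past the string literal that starts at src[i]."""
    is_fstring = src[i] in "fF"
    if is_fstring:
        i += 1
    quote = src[i]
    i += 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2                      # skip an escaped character
        elif ch == quote:
            return i + 1                # closing quote of this literal
        elif is_fstring and src.startswith("{{", i):
            i += 2                      # escaped literal brace
        elif is_fstring and ch == "{":
            i = scan_field(src, i + 1)  # descend into the replacement field
        else:
            i += 1
    raise ValueError("unterminated string literal")

def scan_field(src: str, i: int) -> int:
    """Return the index just past the '}' closing the field opened before src[i]."""
    depth = 1
    while i < len(src):
        ch = src[i]
        if ch in QUOTES or (ch in "fF" and i + 1 < len(src) and src[i + 1] in QUOTES):
            i = scan_string(src, i)     # a nested literal may reuse the outer quote
        elif ch in "{[(":
            depth += 1
            i += 1
        elif ch in "}])":
            depth -= 1
            i += 1
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated replacement field")

NAIVE = re.compile(r'f"[^"]*"')         # what a simple regex-based lexer might use
src = 'f"{ f"{ "x" }" }" + y'
print(src[:scan_string(src, 0)])        # the whole nested literal
print(NAIVE.match(src).group())         # cut short at the first inner quote
```

The two functions call each other, which is exactly the unbounded recursion a single regular expression cannot express.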

@tjreedy Would you like us to reflect this in the PEP?

Notice that forcing the parser to reject reused quotes would complicate the lexer and the parser considerably: the expression part could no longer be parsed in the regular way, because the parser would need to know that it is inside an in-flight f-string in order to reject repeated quotes. This was easy to do with the two-pass method because the main parser is unaware of what’s inside the f-string, but that won’t be true any more once we move this to the main parser.

In any case, if this point proves to be too controversial, we are happy to consider dropping it.

Motivation point 4: the linked Wikipedia section says nothing about escape sequences.

Oh, thanks for pointing that out. Something went wrong there in the writing. We will fix it.

[Edit] Corrected it here

In the next sentence, does ‘regular grammar’ have the technical meaning of the restricted grammar corresponding to ‘regular expressions’ or the colloquial meaning of ‘normal’ (which is different now than when 536 was written)?

Here “regular grammar” means adding f-strings to the main (PEG) parser as part of the formal grammar (the one in Grammar/python.gram).

Correct, there is no way to disambiguate that.
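A quick sketch of the ambiguity being discussed. This behaves the same on current and older Python 3 versions, and the variable names are only for illustration.

```python
x = 5

# With spaces, the inner braces form a set display inside the replacement field.
with_spaces = f"{ {x} }"

# Without spaces, '{{' and '}}' are escaped literal braces and x is never evaluated.
without_spaces = f"{{x}}"

print(with_spaces)
print(without_spaces)
```

The first line prints the set containing 5, the second prints the literal text with the name `x` in braces.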

pablogsal · December 19, 2022, 9:45pm

Another option here is to restrict arbitrary nesting to a specific depth. I think that will allow regex-based lexers to work at the price of having some complex regular expression.

@tjreedy do you think that would be a good compromise? If not, what can we do to make it easier to support in IDLE?
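To give a feel for the bounded-depth option, the sketch below builds a regular expression that accepts brace nesting only up to a fixed depth. It is a hypothetical illustration on bare braces, not a full f-string lexer, and the helper name `nested_braces` is made up for this example. The pattern is produced by wrapping the previous level once per allowed depth, which is where the growing complexity comes from.

```python
import re

def nested_braces(max_depth: int) -> re.Pattern:
    """Build a regex matching text whose braces nest at most `max_depth` deep."""
    pattern = r"[^{}]*"                      # depth 0: no braces at all
    for _ in range(max_depth):
        # One more level: plain characters, or a braced group of the level below.
        pattern = r"(?:[^{}]|\{" + pattern + r"\})*"
    return re.compile(pattern)

text = "a{b{c}}"
print(bool(nested_braces(2).fullmatch(text)))   # two levels are enough for this text
print(bool(nested_braces(1).fullmatch(text)))   # one level is not
print(len(nested_braces(5).pattern))            # pattern length grows with the depth
```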

ofek · December 19, 2022, 9:56pm

Are there any performance implications?

pablogsal · December 19, 2022, 10:02pm

Nothing that we can think of. Parsing may be anecdotally faster due to the lack of a second phase for f-string parsing.

smontanaro · December 19, 2022, 10:31pm

Got it. Thanks.

barry · December 19, 2022, 11:25pm

This might cause problems for Emacs modes for Python.

pablogsal · December 19, 2022, 11:42pm

Surely Emacs supports JavaScript and Ruby, and both languages have interpolated strings with the same constraints as this PEP, so I think that a solution to this problem already exists there. I am not sure how that works, though, as I am not familiar with Emacs. Maybe someone more versed in how this is supported can help us understand how it is being handled.