PEP 701 – Syntactic formalization of f-strings

pablogsal · December 22, 2022, 1:30am

That is checked when the AST node is constructed. This is just because our parser handles this much better this way because we don’t want to make single letters keywords or soft keywords. We will mention this in the PEP.

Relatedly, the tokenizer module in the reference implementation seems to have not caught up with the PEP yet

Yep, we are not planning of doing all the work until the PEP is accepted (in case is accepted) and we know exactly the set of constraints.

There is also the case of fr strings, which I’ll note the reference implementation handles just fine. In this case, I assume that the r /R character is also part of FSTRING_START . What about the reversed; is r /R captured in this token for rf strings? I know the tokens are stated to not be binding definitions, but part of this PEP is increased clarity.

The PEP leaves the specifics of the token contents undefined and up to the implementation. I am not sure what we will decide for CPython if the PEP is accepted at the end but I can tell you that currently, we capture both. For instance, for fr'hello {1+2} bye':

[TokenInfo(type=61 (FSTRING_START), string="fr'", start=(1, 0), end=(1, 3), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=62 (FSTRING_MIDDLE), string='hello ', start=(1, 3), end=(1, 9), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 10), end=(1, 11), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=14 (PLUS), string='+', start=(1, 11), end=(1, 12), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='2', start=(1, 12), end=(1, 13), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=26 (RBRACE), string='}', start=(1, 13), end=(1, 14), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=63 (FSTRING_END), string=' bye', start=(1, 14), end=(1, 18), line="fr'hello {1+2} bye'\n"),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 19), end=(1, 19), line="fr'hello {1+2} bye'\n")]

and for "fr'hello {1+2} bye'":

[TokenInfo(type=61 (FSTRING_START), string="rf'", start=(1, 0), end=(1, 3), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=62 (FSTRING_MIDDLE), string='hello ', start=(1, 3), end=(1, 9), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 10), end=(1, 11), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=14 (PLUS), string='+', start=(1, 11), end=(1, 12), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=2 (NUMBER), string='2', start=(1, 12), end=(1, 13), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=26 (RBRACE), string='}', start=(1, 13), end=(1, 14), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=63 (FSTRING_END), string=' bye', start=(1, 14), end=(1, 18), line="rf'hello {1+2} bye'\n"),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 19), end=(1, 19), line="rf'hello {1+2} bye'\n")]