That is checked when the AST node is constructed. The only reason is that our parser handles it much better this way, since we don’t want to make single letters keywords or soft keywords. We will mention this in the PEP.
Relatedly, the tokenizer module in the reference implementation seems not to have caught up with the PEP yet.
Yep, we are not planning to do all the work until the PEP is accepted (if it is accepted) and we know the exact set of constraints.
There is also the case of fr strings, which I’ll note the reference implementation handles just fine. In this case, I assume that the r/R character is also part of FSTRING_START. What about the reverse: is r/R captured in this token for rf strings? I know the tokens are stated to not be binding definitions, but part of this PEP is increased clarity.
The PEP leaves the specifics of the token contents undefined and up to the implementation. I am not sure what we will decide for CPython if the PEP is accepted in the end, but I can tell you that currently we capture both. For instance, for fr'hello {1+2} bye':
[TokenInfo(type=61 (FSTRING_START), string="fr'", start=(1, 0), end=(1, 3), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=62 (FSTRING_MIDDLE), string='hello ', start=(1, 3), end=(1, 9), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=2 (NUMBER), string='1', start=(1, 10), end=(1, 11), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=14 (PLUS), string='+', start=(1, 11), end=(1, 12), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=2 (NUMBER), string='2', start=(1, 12), end=(1, 13), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=26 (RBRACE), string='}', start=(1, 13), end=(1, 14), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=63 (FSTRING_END), string=' bye', start=(1, 14), end=(1, 18), line="fr'hello {1+2} bye'\n"),
TokenInfo(type=4 (NEWLINE), string='', start=(1, 19), end=(1, 19), line="fr'hello {1+2} bye'\n")]
and for rf'hello {1+2} bye':
[TokenInfo(type=61 (FSTRING_START), string="rf'", start=(1, 0), end=(1, 3), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=62 (FSTRING_MIDDLE), string='hello ', start=(1, 3), end=(1, 9), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=2 (NUMBER), string='1', start=(1, 10), end=(1, 11), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=14 (PLUS), string='+', start=(1, 11), end=(1, 12), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=2 (NUMBER), string='2', start=(1, 12), end=(1, 13), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=26 (RBRACE), string='}', start=(1, 13), end=(1, 14), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=63 (FSTRING_END), string=' bye', start=(1, 14), end=(1, 18), line="rf'hello {1+2} bye'\n"),
TokenInfo(type=4 (NEWLINE), string='', start=(1, 19), end=(1, 19), line="rf'hello {1+2} bye'\n")]
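For anyone who wants to check this locally, here is a minimal sketch using the standard tokenize module. The helper name token_summary is only illustrative. It assumes an interpreter whose tokenize module emits the f-string tokens shown above; on older interpreters the whole literal comes back as a single STRING token, and since token contents are left to the implementation, the exact strings may differ from the listings.

```python
import io
import tokenize


def token_summary(source):
    """Return (token name, token string) pairs for a source snippet.

    Hypothetical helper for illustration; it is not part of the PEP
    or of the reference implementation.
    """
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Compare both prefix orders. The first token of each run shows
# whether the r/R character is captured together with the f.
for prefix in ("fr", "rf"):
    source = prefix + "'hello {1+2} bye'\n"
    print(source.strip())
    for name, string in token_summary(source):
        print("   ", name, repr(string))
```

In either case the first token should begin with the full prefix, which is an easy way to see that both fr and rf are captured.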