An example where you disagree with treating the tokenizer mode after ':' the same as the mode for f-strings, or where you disagree with the proposal in general?
Maybe the PEP should have a section that explains in detail what will change?
We can add that; we already have something similar in “consequences of the grammar”. I can adapt it so it is clearer.
but e.g. a regular expression showing what the lexer accepts for each token
I don’t think a regular expression is possible (or at least not straightforward enough to help as a clarification), because the cut points depend on the level of parentheses and brackets that some of the characters are inside, as well as on other state the lexer needs to keep track of.
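A tiny sketch can illustrate the point (the function name and output shape here are invented for illustration, not part of the PEP): even just finding the top-level cut points between literal text and replacement fields requires tracking brace depth, which a single regular expression cannot carry. This sketch deliberately ignores {{ escapes, quotes inside expressions, and format specs:

```python
def split_fstring_body(body):
    """Split an f-string body into literal and expression chunks.

    Toy illustration only: a regex cannot find the *outer* closing brace,
    because that depends on the current nesting depth.
    """
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "{":
            if depth == 0:
                parts.append(("literal", body[start:i]))
                start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                parts.append(("expression", body[start:i]))
                start = i + 1
    parts.append(("literal", body[start:]))
    return parts
```

Note how a nested brace pair, e.g. a dict display inside the replacement field, is handled correctly only because of the depth counter.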
Yes, I would like that very much.
Ok, we will incorporate a refined version of that description into the document.
It would also be nice if we had an executable version of that specification in the form of a Python class into which one can feed examples.
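As a rough starting point, such an executable specification could look something like the following sketch. The class and method names are invented here, and it only handles unnested f'...' literals with plain replacement fields (no escapes, nested quotes, or format specs); the real tokenizer would also tokenize the expression part further instead of emitting a single placeholder chunk:

```python
class FStringLexer:
    """Toy executable spec: feed it an f'...' literal, get token pairs back."""

    def feed(self, source):
        # Only the shape f'...' is handled in this sketch.
        assert source.startswith("f'") and source.endswith("'")
        body = source[2:-1]
        tokens = [("FSTRING_START", "f'")]
        depth = 0   # brace nesting depth: the state a regex cannot carry
        start = 0   # start of the current literal or expression chunk
        for i, ch in enumerate(body):
            if ch == "{":
                if depth == 0:
                    if i > start:
                        tokens.append(("FSTRING_MIDDLE", body[start:i]))
                    tokens.append(("LBRACE", "{"))
                    start = i + 1
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # The real tokenizer would recurse into normal
                    # tokenization of the expression here.
                    tokens.append(("EXPRESSION", body[start:i]))
                    tokens.append(("RBRACE", "}"))
                    start = i + 1
        if body[start:]:
            tokens.append(("FSTRING_MIDDLE", body[start:]))
        tokens.append(("FSTRING_END", "'"))
        return tokens
```

For example, FStringLexer().feed("f' foo {1 + 1} bar'") yields FSTRING_START, FSTRING_MIDDLE, LBRACE, the expression chunk, RBRACE, a trailing FSTRING_MIDDLE, and FSTRING_END, mirroring the shape of the C tokenizer output shown below.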
If you don’t mind that the token contents are asymmetric, you can already play with it by getting the tokens from the C tokenizer using the private interface we use for testing (you need to do this from the implementation branch):
>>> import pprint
>>> import tokenize
>>> pprint.pprint(list(tokenize._generate_tokens_from_c_tokenizer("f' foo { f' {1 + 1:.10f} ' } bar'")))
[TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 0), end=(1, 2), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=62 (FSTRING_MIDDLE), string=' foo ', start=(1, 2), end=(1, 7), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=25 (LBRACE), string='{', start=(1, 7), end=(1, 8), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 9), end=(1, 11), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=62 (FSTRING_MIDDLE), string=' ', start=(1, 11), end=(1, 12), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=25 (LBRACE), string='{', start=(1, 12), end=(1, 13), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 13), end=(1, 14), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=14 (PLUS), string='+', start=(1, 15), end=(1, 16), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 17), end=(1, 18), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=11 (COLON), string=':', start=(1, 18), end=(1, 19), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=62 (FSTRING_MIDDLE), string='.10f', start=(1, 19), end=(1, 23), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=26 (RBRACE), string='}', start=(1, 23), end=(1, 24), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=63 (FSTRING_END), string=' ', start=(1, 24), end=(1, 25), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=26 (RBRACE), string='}', start=(1, 27), end=(1, 28), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=63 (FSTRING_END), string=' bar', start=(1, 28), end=(1, 32), line="f' foo { f' {1 + 1:.10f} ' } bar'\n"),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 33), end=(1, 33), line="f' foo { f' {1 + 1:.10f} ' } bar'\n")]
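For contrast, here is a hedged illustration of the status quo this work changes: the pure-Python tokenize module has historically returned an entire f-string as a single STRING token, while Pythons that include the new tokenization emit FSTRING_START and friends instead (hence the version check in this example):

```python
import io
import sys
import tokenize

# Status-quo illustration (not part of the proposal itself): historically
# tokenize.generate_tokens() returned a whole f-string as one STRING token;
# Pythons with the new f-string tokenization emit FSTRING_START instead.
src = "f'one {1 + 1} two'\n"
names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
print(names[0])
```

On older Pythons the first token name printed is STRING; with the new tokenization it is FSTRING_START, followed by the FSTRING_MIDDLE/OP/FSTRING_END sequence analogous to the C tokenizer output above.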
I will discuss with @isidentical and @lys.nikolaou about the changes in tokenize.py
to see if we can add this to the PEP and the proposed implementation.