How are f-strings lexed & parsed in CPython?

This might be a question for Core Development, but I figured I would start with Help first. I have a semi-working tokenizer and, thanks to LibCST, a nearly rock-solid parser implemented.

In Parser/tokenizer.c, f-strings are wrapped up as opaque `f"Hello {place}"` STRING tokens that are parsed somewhere else.
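This single-opaque-token behaviour is easy to confirm from the pure-Python `tokenize` module (a quick check of my own, not CPython internals; note that on 3.12+, where f-string tokenization later changed, the token stream looks different):

```python
import io
import sys
import tokenize

src = 'f"Hello {place}"\n'
names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
print(names)

if sys.version_info < (3, 12):
    # The whole f-string arrives as one opaque STRING token; the
    # {place} field is dissected later, after tokenization.
    assert "STRING" in names
else:
    # Newer CPythons split the f-string inside the tokenizer itself.
    assert "FSTRING_START" in names
```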

Somewhere after that middleman parser, f-strings go through the PEG parser via the `strings` rule in Grammar/python.gram (cpython/python.gram at main · python/cpython · GitHub), but that rule isn't prepared to handle `f"Hello {place}"`.

RustPython steps character by character over an f-string to create a mini-AST, which makes me think CPython does the same, but I haven't been able to find where.
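For what it's worth, that character-by-character pass is simple enough to sketch. Here is a toy illustration of my own (not CPython's or RustPython's actual logic) that splits an f-string body into literal runs and replacement fields, handling `{{`/`}}` escapes and nested braces but ignoring quotes, format specs, conversions, and error handling:

```python
def split_fstring(body: str):
    """Split an f-string body into ('literal', text) and ('field', expr)
    parts by walking it one character at a time. Illustrative sketch only."""
    parts, literal, i = [], [], 0
    while i < len(body):
        if body[i:i + 2] == "{{":          # escaped opening brace
            literal.append("{"); i += 2
        elif body[i:i + 2] == "}}":        # escaped closing brace
            literal.append("}"); i += 2
        elif body[i] == "{":
            if literal:                    # flush pending literal text
                parts.append(("literal", "".join(literal))); literal = []
            depth, j = 1, i + 1
            while j < len(body) and depth: # find the matching close brace
                depth += {"{": 1, "}": -1}.get(body[j], 0)
                j += 1
            parts.append(("field", body[i + 1:j - 1]))
            i = j
        else:
            literal.append(body[i]); i += 1
    if literal:
        parts.append(("literal", "".join(literal)))
    return parts


print(split_fstring("Hello {place}!"))
# → [('literal', 'Hello '), ('field', 'place'), ('literal', '!')]
```

A real implementation would then hand each `field` expression back to the expression parser, which is essentially the mini-AST trick.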

For the curious, my code base is on GitHub under user devdave, project name rython4. The antispam filter won't let me put more than two links in a post, so that's why there isn't a direct one.

NOTE: the Python Software Foundation license text file hasn't been copied into the repository yet. Regardless, because I am almost directly translating from CPython's code base to Rust, it's a PSF-licensed project.

https://docs.python.org/3.6/reference/lexical_analysis.html#formatted-string-literals
Maybe this is what you are looking for.

I have the language reference open and pinned in my browser, and unfortunately the lexical rules for f-strings are only a little helpful. The valuable discovery would be finding the C code that processes `f"string"` into something the PEG parser can consume.

Possibly in Parser/string_parser.c.
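One way to see the structure that stage has to produce, without reading any C, is to inspect the AST an f-string compiles to: a `JoinedStr` node whose children are `Constant` (literal text) and `FormattedValue` (the replacement fields). A quick check using only the standard `ast` module:

```python
import ast

tree = ast.parse('f"Hello {place}"', mode="eval")
joined = tree.body
assert isinstance(joined, ast.JoinedStr)

# The literal text and the {place} replacement field become separate children.
kinds = [type(node).__name__ for node in joined.values]
print(kinds)  # → ['Constant', 'FormattedValue']
```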

I'm only partially joking when I wonder if perhaps I should pretend f-strings don't exist.

In May, Pablo Galindo proposed moving f-strings into the grammar; I don't know what the progress on this is.