PEP 750: Tag Strings For Writing Domain-Specific Languages

pablogsal · August 10, 2024, 5:52pm

Unfortunately, it’s not that easy now because of the tag: the lexer uses it to enter f-string mode (now tag string mode), and it’s included in the FSTRING_START token. Using dotted_name won’t do because the lexer doesn’t know what dotted_name is; therefore, it cannot know whether it needs to enter tag string mode or normal string mode. The parser cannot drive the lexer, so the lexer must do the lexing on its own without any grammatical information (the same way the parser can be driven by a bunch of tokens alone, without the lexer). Anything that couples both pieces will be a nightmare.
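To make the point above concrete, here is a small sketch using the standard `tokenize` module. The helper name `token_summary` and the sample inputs are only illustrative. On CPython 3.12 and later I would expect the f-string to start with an FSTRING_START token that carries the prefix, while older versions emit a single STRING token; in both cases the prefix is part of the first token. The dotted case is split into separate tokens before the string is ever seen, which is why the lexer cannot treat a dotted name as a prefix.

```python
import io
import token
import tokenize


def token_summary(source):
    """Return (token name, token text) pairs for a snippet of source."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# A recognised prefix: the prefix and the opening quote end up in the first token.
fstring_tokens = token_summary('f"hi {x}"\n')

# A dotted name before a string: the lexer has already emitted NAME, OP, NAME
# by the time it reaches the quote, so the string is tokenized as a plain one.
dotted_tokens = token_summary('html.div"x"\n')

for name, text in fstring_tokens + dotted_tokens:
    print(f"{name:15} {text!r}")
```

The exact token names printed for the f-string depend on the Python version running the snippet.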

There are hacky ways around it. I suggested one way to make that work: when the lexer detects the start of a string, it asks “What’s the last token I emitted?”. If it is NAME or ) or one of the other tokens that are currently illegal in that position, it emits tag-string tokens. But @lys.nikolaou noticed that the way he implemented this idea was backwards incompatible: it was too big a change, since it broke all tokenization code that tokenizes STRING tokens. Maybe there are better ways around it, or ways to make this approach backwards compatible, but we are certainly stretching the lexer a bit, so maintenance may be a concern.
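A toy version of the "what's the last token I emitted" idea could look like the sketch below. This is not CPython's lexer and not the implementation mentioned above: `TOKEN_RE`, `toy_lex`, and the `TAG_STRING` token kind are all hypothetical, and the rule is reduced to "the previous token is a NAME or a closing parenthesis". It also ignores keywords such as `return`, which would show up as NAME tokens too.

```python
import re

# Hypothetical, deliberately tiny token grammar for this sketch only.
TOKEN_RE = re.compile(r"""
    (?P<NAME>[A-Za-z_]\w*)
  | (?P<STRING>"[^"\n]*")
  | (?P<OP>[().=,])
  | (?P<WS>\s+)
""", re.VERBOSE)


def toy_lex(source):
    """Tokenize source, retagging strings based on the previously emitted token."""
    tokens = []
    last = None  # (kind, text) of the last token emitted, if any
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "WS":
            continue  # whitespace is skipped and does not count as "last token"
        # The heuristic: a string right after NAME or ")" becomes a tag string.
        if kind == "STRING" and last is not None:
            if last[0] == "NAME" or last[1] == ")":
                kind = "TAG_STRING"
        last = (kind, text)
        tokens.append(last)
    return tokens


print(toy_lex('html.div"x"'))
print(toy_lex('x = "abc"'))
```

The backwards-compatibility concern is visible even in the toy: any input where a string follows a NAME used to produce a STRING token and now produces something else, so existing code that looks for STRING tokens sees different output.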