Creating and compiling an artificial f-string AST

Gouvernathor · September 30, 2023, 9:36pm

My input is a string containing an evaluable expression, optionally an = sign, optionally a !s, !a or !r conversion flag, and a format spec (the part which follows : in f-string syntax).
I want to create an f-string AST node/tree which would compile to the same bytecode that the original f-string would.

The first problem is that it’s very hard to understand which AST node type goes where. In particular, parsing the string containing the expression and putting the result directly in a FormattedValue node doesn’t work. Also, the format information is itself a JoinedStr node, the same type as the all-encompassing node…
If someone has a clear explanation of how it works, I’d welcome it.

Second, the equal sign seems to be swallowed by the AST internally: the AST generated from an f-string containing = looks no different, through any documented attribute access, from the AST of an f-string without the =. It seems to rely on an undocumented part of the node which saves it, without exposing it or allowing it to be manually emulated (or added to or removed from AST nodes created from a parse). Can someone confirm that?

Lastly, there seemed to be an additional issue with filenames and line numbers, but I think it stems from my first point: a malformed AST tree, which fix_missing_locations then fails to handle.

hansgeunsmeyer · September 30, 2023, 11:18pm

The shortest explanation is probably: the AST tree needs to conform to the Python grammar (see the Python 3.11.5 documentation: “10. Full Grammar specification” and “ast: Abstract Syntax Trees”, and for formatted strings the FormattedValue entry).

If you don’t have much experience with that kind of grammar, then I would advise reading up a bit on compilers. A pretty nice intro may be the documentation of the parser-compiler-builder tool SLY (Sly Lex Yacc). (You can even use that to write your own Python parser in Python… Or at least write a compiler for part of the language.)

Even if you have experience and know your way around BNF grammars, it’s still pretty hard, since the grammar is pretty complex. So, I would start from the other side. Take a valid Python expression, run ast.parse (in different modes) and then run ast.dump, then compare that to the grammar. I think this is the fastest way to get a feel for how those trees are (or need to be) constructed.
(Or perhaps this is already the way you’re exploring this?)
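
As a small sketch of what "different modes" changes (the f-string used here is only an illustrative input): in 'exec' mode the f-string is wrapped in a Module and an Expr statement, while in 'eval' mode it sits directly under an Expression node.

```python
import ast

src = 'f"{a}"'

exec_tree = ast.parse(src, mode='exec')   # Module -> Expr statement -> JoinedStr
eval_tree = ast.parse(src, mode='eval')   # Expression -> JoinedStr

# Print the chain of node types from the root down to the f-string node.
print(type(exec_tree).__name__,
      type(exec_tree.body[0]).__name__,
      type(exec_tree.body[0].value).__name__)
print(type(eval_tree).__name__,
      type(eval_tree.body).__name__)
```

Since eval() needs a code object compiled from an Expression, the 'eval' dump is the one to imitate when building a tree by hand for a single f-string.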

As to the = sign: the AST trees of f"{a}" and f"{a = }" look pretty different to me, but I’m not sure if you’re referring to those expressions or to something else.

import ast
def d(s, mode='exec'):
    tree = ast.parse(s, mode=mode)
    print(ast.dump(tree, indent=2))

d('f"{a}"', mode='eval')
Expression(
  body=JoinedStr(
    values=[
      FormattedValue(
        value=Name(id='a', ctx=Load()),
        conversion=-1)]))

d('f"{a = }"', mode='eval')
Expression(
  body=JoinedStr(
    values=[
      Constant(value='a = '),
      FormattedValue(
        value=Name(id='a', ctx=Load()),
        conversion=114)]))  #  114 == ord('r'): !r repr formatting seems to be the default here
                            # it kind of makes sense that this is the default
                            # not sure if this detail is fully documented
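
On the format information being a JoinedStr too: a small sketch of where the conversion flag and the format spec land in the tree. The names a and width are only illustrative.

```python
import ast

# Parse an f-string with both a conversion flag and a nested format spec.
tree = ast.parse('f"{a!r:>{width}}"', mode='eval')
fv = tree.body.values[0]   # the single FormattedValue inside the outer JoinedStr

print(fv.conversion)                   # 114, i.e. ord('r')
print(type(fv.format_spec).__name__)   # JoinedStr
print(ast.dump(fv.format_spec, indent=2))
```

The format spec is a JoinedStr because it can itself contain replacement fields such as {width}. A plain spec like :>10 becomes a JoinedStr holding a single Constant.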

So, assuming you want to simulate an expression like f"{a = }", this is possible with this kind of AST tree.

>>> from ast import *
>>> expr = Expression(
  body=JoinedStr(
    values=[
      Constant(value='a = '),
      FormattedValue(
        value=Name(id='a', ctx=Load()),
        conversion=114)]))  # but -1 also works here (str and repr of 3.14 coincide)
>>> expr = fix_missing_locations(expr)
>>> code = compile(expr, ' ', mode='eval')
>>> a = 3.14
>>> eval(code)
'a = 3.14'
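
For the original use case (an expression string plus optional conversion, format spec and = prefix), the pieces could be assembled roughly as follows. This is only a sketch: build_fstring and its parameters are hypothetical names, the format spec is treated as literal text, and the = text is passed in explicitly. One likely reason that putting the parsed expression straight into FormattedValue fails is that ast.parse returns a wrapper node, so the sketch unwraps it with .body.

```python
import ast

def build_fstring(expr_src, conversion=None, format_spec=None, debug_prefix=None):
    """Hypothetical helper: build a compilable Expression for one f-string field."""
    # ast.parse(..., mode='eval') returns an Expression wrapper; FormattedValue
    # needs the inner expression node, hence .body.
    value = ast.parse(expr_src, mode='eval').body

    # Mirror the documented "=" rule: repr() unless a conversion or format spec is given.
    if debug_prefix is not None and conversion is None and format_spec is None:
        conversion = 'r'

    spec_node = None
    if format_spec is not None:
        # The format spec is itself a JoinedStr; here it holds only literal text.
        spec_node = ast.JoinedStr(values=[ast.Constant(value=format_spec)])

    field = ast.FormattedValue(
        value=value,
        conversion=ord(conversion) if conversion else -1,
        format_spec=spec_node,
    )
    parts = [field]
    if debug_prefix is not None:
        # The "=" text is just a literal string placed before the field.
        parts.insert(0, ast.Constant(value=debug_prefix))

    tree = ast.Expression(body=ast.JoinedStr(values=parts))
    return ast.fix_missing_locations(tree)

# Equivalent of f"{x * 2:>6}"
code = compile(build_fstring('x * 2', format_spec='>6'), '<fstring>', 'eval')
print(repr(eval(code, {'x': 21})))

# Equivalent of f"{a = }"
code = compile(build_fstring('a', debug_prefix='a = '), '<fstring>', 'eval')
print(eval(code, {'a': 'hi'}))
```

A format spec that contains nested {…} fields would need its own FormattedValue nodes inside the inner JoinedStr; the sketch does not handle that case.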

If you change conversion to 115 (the ‘!s’ conversion), then you do see the expected differences in the way strings are formatted (after assigning some string to a).
You can use this to show that Python parses f"a = {a!r}" and f"{a = }" in exactly the same way.
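
Both claims can be checked with a short script. This is a sketch: the value 'hi' and the '<fstring>' filename are arbitrary choices.

```python
import ast

def dump(src):
    # ast.dump omits line/column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(src, mode='eval'))

# Both spellings should produce the same tree.
same_tree = dump('f"a = {a!r}"') == dump('f"{a = }"')
print(same_tree)

# Effect of each conversion code on a string value.
results = {}
for conv in (-1, ord('s'), ord('r'), ord('a')):
    tree = ast.Expression(body=ast.JoinedStr(values=[
        ast.Constant(value='a = '),
        ast.FormattedValue(value=ast.Name(id='a', ctx=ast.Load()), conversion=conv),
    ]))
    code = compile(ast.fix_missing_locations(tree), '<fstring>', 'eval')
    results[conv] = eval(code, {'a': 'hi'})
    print(conv, results[conv])
```

With a string value, -1 and 115 give the bare text, while 114 and 97 add the quotes that repr() and ascii() produce.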

Gouvernathor · October 1, 2023, 1:58am

Thanks! Another instance of rubberducking at its finest, I guess.
And I made my thing work, finally!

The fact that = defaults to repr is documented: both in the main doc and in the What’s New entry it links to, which talks about the “representation” of the expression.
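
The documented rule also says the repr default only applies when no format spec is given. A quick sketch to see both cases in the tree and in the output (the value 'hi' is arbitrary):

```python
import ast

# Last value of the outer JoinedStr is the FormattedValue; the "a = " text precedes it.
plain = ast.parse('f"{a = }"', mode='eval').body.values[-1]
with_spec = ast.parse('f"{a = :>8}"', mode='eval').body.values[-1]

print(plain.conversion)       # 114: repr is applied
print(with_spec.conversion)   # -1: no conversion once a format spec is present

a = 'hi'
print(f"{a = }")       # quotes from repr()
print(f"{a = :>8}")    # no quotes, right-aligned in 8 columns
```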

hansgeunsmeyer · October 1, 2023, 2:46pm

Indeed, you might think that’s why I called my function ‘d’ with the ‘d’ of ‘dump’, ‘debug’, and
(I actually didn’t know the word “rubberducking” so had to look it up on Wikipedia!)
