Creating and compiling an artificial f-string AST

My input is a string containing an evaluable expression, optionally an = sign, optionally a !s, !a or !r conversion flag, and a format specification (the part that follows : in f-string syntax).
I want to create an f-string AST node/tree which would compile to the same bytecode that the original f-string would.

The first problem is that it’s very hard to understand which AST node type goes where. In particular, parsing the string containing the expression and putting the result directly into a FormattedValue node doesn’t work. Also, the format specification is itself a JoinedStr node, the same type as the all-encompassing node…
If someone has a clear explanation of how it works :pray:

The second problem is that the equal sign seems to be swallowed by the AST internally: the AST generated from an f-string containing = looks no different, through any documented attribute access, from the AST of an f-string without the =. It seems to rely on an undocumented part of the node which stores it, without exposing it or allowing it to be emulated manually (or added to or removed from AST nodes created by a parse). Can someone confirm that?

Lastly, there seemed to be an additional issue with filenames and line numbers, but I think it stems from my first point: I had a malformed AST that fix_missing_locations then failed to process.

The shortest explanation is probably: The AST tree needs to conform to the Python grammar (see Python docs: 10. Full Grammar specification — Python 3.11.5 documentation and ast — Abstract Syntax Trees — Python 3.11.5 documentation and for formatted strings: FormattedValue)

If you don’t have much experience with this kind of grammar, then I would advise reading up a bit on compilers. A pretty nice intro may be the description in the documentation of the parser-compiler-builder tool SLY (Sly Lex Yacc). (You can even use that to write your own Python parser in Python… Or at least write a compiler for part of the language.)

Even if you have experience and know your way around BNF grammars, it’s still pretty hard, since the grammar is pretty complex. So I would start from the other side: take a valid Python expression, run ast.parse (in different modes), run ast.dump on the result, and compare that to the grammar. I think this is the fastest way to get a feel for how those trees are (or need to be) constructed.
(Or perhaps this is already the way you’re exploring this?)

As for the = sign: the ASTs of f"{a}" and f"{a = }" look pretty different to me, but I’m not sure if you’re referring to those expressions or to something else.

import ast
def d(s, mode='exec'):
    tree = ast.parse(s, mode=mode)
    print(ast.dump(tree, indent=2))
d('f"{a}"', mode='eval')
Expression(
  body=JoinedStr(
    values=[
      FormattedValue(
        value=Name(id='a', ctx=Load()),
        conversion=-1)]))

d('f"{a = }"', mode='eval')
Expression(
  body=JoinedStr(
    values=[
      Constant(value='a = '),
      FormattedValue(
        value=Name(id='a', ctx=Load()),
        conversion=114)]))  #  !r  repr formatting seems to be default here
                            # it kind of makes sense that this is the default
                            # not sure if this detail is fully documented
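The same inspection approach also shows where a conversion flag and a format specification end up. This is only a small sketch; the example f-string f"{a!r:>10}" is my own choice and not taken from the original question. It suggests that the conversion is stored as the character code of the flag, and that the format specification is a nested JoinedStr hanging off the FormattedValue node.

```python
import ast

# Parse an f-string that has both a conversion flag and a format spec.
tree = ast.parse('f"{a!r:>10}"', mode="eval")
field = tree.body.values[0]          # the single FormattedValue node

print(type(field).__name__)          # node type of the replacement field
print(field.conversion, ord("r"))    # conversion is the ord() of the flag
print(type(field.format_spec).__name__)
print(field.format_spec.values[0].value)  # the literal text after ':'
```

So the nested JoinedStr is just the container for the pieces of the format spec, which can themselves contain replacement fields such as {width}.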

So, assuming you want to simulate an expression like f"{a = }", this is possible with this kind of AST.

>>> from ast import *
>>> expr = Expression(
  body=JoinedStr(
    values=[
      Constant(value='a = '),
      FormattedValue(
        value=Name(id='a', ctx=Load()),
        conversion=114)]))  # but -1 also works here
>>> expr = fix_missing_locations(expr)
>>> code = compile(expr, ' ', mode='eval')
>>> a = 3.14
>>> eval(code)
'a = 3.14'
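Generalising the session above to the inputs described in the question could look roughly like this. It is a sketch under assumptions, not a definitive implementation: build_fstring and its parameters are hypothetical names of mine, the debug text is simply the expression text followed by "=", and I am assuming the documented rule that = falls back to repr only when neither a conversion nor a format spec is given.

```python
import ast

def build_fstring(expr_src, debug=False, conversion=None, format_spec=None):
    """Hypothetical helper: compile a one-field f-string from its parts."""
    # The expression must be parsed in 'eval' mode; its .body is the node
    # that goes into FormattedValue.value (not the Expression wrapper).
    value = ast.parse(expr_src.strip(), mode="eval").body

    if conversion is None:
        # Assumption: '=' defaults to repr unless a format spec is present.
        conv = ord("r") if debug and format_spec is None else -1
    else:
        conv = ord(conversion)  # 's', 'r' or 'a'

    spec = None
    if format_spec is not None:
        # The format spec is itself a JoinedStr wrapping a string Constant.
        spec = ast.JoinedStr(values=[ast.Constant(value=format_spec)])

    parts = []
    if debug:
        # The '=' is not a flag on the node: it becomes literal text.
        parts.append(ast.Constant(value=expr_src + "="))
    parts.append(ast.FormattedValue(value=value, conversion=conv, format_spec=spec))

    tree = ast.Expression(body=ast.JoinedStr(values=parts))
    ast.fix_missing_locations(tree)
    return compile(tree, "<fstring>", "eval")

print(eval(build_fstring("a", debug=True), {"a": "x"}))           # a='x'
print(eval(build_fstring("a", format_spec=">8.2f"), {"a": 3.14}))
```

Real f-strings keep the whitespace around = verbatim, so a caller who wants 'a = ' would need to pass that spacing in; the helper here only covers the simple case.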

If you change conversion to 115 (‘!s’ conversion) then you do see expected diffs in the way strings are formatted (after assigning some string to a).
You can use this to show that Python parses f"a = {a!r}" and f"{a = }" in exactly the same way.
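One way to check that claim is to compare the dumps of the two parsed trees. A minimal sketch (the variable names are mine); ast.dump leaves out line and column attributes by default, so only the structure is compared:

```python
import ast

# The '=' form and its hand-written equivalent.
debug_form = ast.parse('f"{a = }"', mode="eval")
explicit_form = ast.parse('f"a = {a!r}"', mode="eval")

# Compare structure only (positions are excluded from the dump by default).
same = ast.dump(debug_form) == ast.dump(explicit_form)
print(same)
```

This also suggests that the = is not hidden in some undocumented part of the node: it is turned into an ordinary Constant plus a conversion on the FormattedValue.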

Thanks! Another instance of rubberducking at its finest, I guess :sweat_smile:
And I made my thing work, finally!

The fact that = defaults to repr is documented, both in the main docs and in the What’s New entry they link to, which talks about the “representation” of the expression.


Indeed, you might think that’s why I called my function ‘d’ with the ‘d’ of ‘dump’, ‘debug’, and :duck: :rofl:
(I actually didn’t know the word “rubberducking” so had to look it up on Wikipedia!)