Compiling/evaling arbitrary AST trees

The top of the ast module documentation says: An abstract syntax tree can be generated by passing ast.PyCF_ONLY_AST as a flag to the compile() built-in function, or using the parse() helper provided in this module. The result will be a tree of objects whose classes all inherit from ast.AST. An abstract syntax tree can be compiled into a Python code object using the built-in compile() function.
So, ast.parse(source, mode="eval") returns an ast.Expression object, which can be passed to compile, and the resulting code object can in turn be passed to eval.
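To make that round trip concrete, here is a minimal sketch; the source string and the "<ast>" filename are placeholders I picked for illustration, nothing more.

```python
import ast

# Parse in "eval" mode so the root node is an ast.Expression.
tree = ast.parse("1 + 2 * 3", mode="eval")

# compile() accepts the tree directly; the mode must match the root node type.
code = compile(tree, "<ast>", "eval")

# The resulting code object can be handed to eval().
result = eval(code)
print(type(tree).__name__, result)
```

Nodes that come out of ast.parse already carry their location attributes, which is why this path needs no extra fixing.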

The issue is that creating an Expression node directly, using the Expression class and the other AST node classes, doesn’t work when the result is passed to the compile function. The documentation warns about that, but passing the result of ast.fix_missing_locations(exp), with exp being the Expression node, doesn’t work either. In either case, compile complains that the expr lacks the field “lineno”.
The problem is that even Expression nodes returned by ast.parse do not have a publicly accessible attribute called “lineno”. Since fix_missing_locations works by copying the location information of the base node, it can’t work if I can’t give that base node a location. increment_lineno and copy_location have a similar issue: both require a base node that already has a location.

How can I work around that? How can I evaluate an arbitrary tree of AST nodes that I built entirely by myself from the public classes?

I’ve been playing with some code that translates a sort of Lisp to an AST and then generates Python, a bit like hylang. I can send you one of my early attempts (~400 lines). It runs with Python 3.10 and uses pyparsing and pprint, cos I’m lazy. The code is a mess, but it does work (mostly) and should be simple enough to understand.
John

You must be reading my code. :wink:

I noticed this only a few days ago myself, having made a Constant, which is an ast.expr subclass, and found it not to have the attributes claimed in the documentation for lineno, col_offset, end_lineno, and end_col_offset.

I think you only have to make something up at the top level, then call fix_missing_locations:

>>> c = ast.Expr(
...         body=ast.BinOp(
...             left=ast.Name(id='x', ctx=ast.Load()),
...             op=ast.Add(),
...             right=ast.Name(id='y', ctx=ast.Load())))
... 
>>> print(ast.dump(c, indent=4))
Expr()
>>> e = ast.Expression(c.body)
>>> 
>>> compile(e, '', 'eval')
Traceback (most recent call last):
  File "<pyshell#93>", line 1, in <module>
    compile(e, '', 'eval')
TypeError: required field "lineno" missing from expr
>>> e.lineno, e.col_offset = 42, 99
>>> compile(ast.fix_missing_locations(e), '', 'eval')
<code object <module> at 0x0000023F87471FE0, file "", line 1>

I was surprised it didn’t say line 42.

FWIW it seems reasonable to me to have an AST not derived from source text, making these attributes meaningless, and unreasonable of helpers to reject said tree, unless you’re asking for the source, as in get_source_segment.

I think it is an issue when it comes to laypersons understanding how AST objects work: if you’re supposed to add the attributes manually and expect it to work (in other words, if assigning lineno and col_offset by hand, as in your example, is actually supported behavior), then the same attributes should be readable on valid, parsed AST objects. Otherwise you come to think (as I did) that this information is managed internally, behind closed doors, and is only writable by the fix_missing_locations, copy_location and increment_lineno functions.

If that is not actually supported behavior, then there should be a supported way for running arbitrary AST trees.

Actually, I see I messed this up slightly. Setting those attributes is allowed because the nodes are ordinary objects with an instance dictionary, so any attribute can be added to them, but those values have no effect, being unexpected in Expression.

A better answer is:

>>> b = ast.BinOp(
...         left=ast.Name(id='x', ctx=ast.Load()),
...         op=ast.Add(),
...         right=ast.Name(id='y', ctx=ast.Load()))
>>> ast.fix_missing_locations(b)
<ast.BinOp object at 0x000001B31F45B280>
>>> compile(ast.Expression(b), '', 'eval')
<code object <module> at 0x000001B31F431B00, file "", line 1>

You can set:

>>> b.lineno, b.end_lineno, b.col_offset, b.end_col_offset = 42, 45, 8, 99

or something before the call to fix_missing_locations and the line numbers in the code object will reflect that 42. You seem to have to set all of them, if you set any, or defaults kick in that make invalid combinations.
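For reference, here is the same idea as a small self-contained script rather than a REPL session. It is only a sketch: the "<hand-built>" filename and the values given to x and y are placeholders of mine. It also calls fix_missing_locations on the Expression root instead of on the BinOp, which, as far as I can tell from the helper’s source, works too, because the helper skips nodes that do not declare location attributes and still descends into their children.

```python
import ast

# Build the expression `x + y` by hand, with no source text involved.
binop = ast.BinOp(
    left=ast.Name(id="x", ctx=ast.Load()),
    op=ast.Add(),
    right=ast.Name(id="y", ctx=ast.Load()),
)
tree = ast.Expression(body=binop)

# Expression declares no location attributes; the helper leaves it alone and
# fills in default positions (line 1, column 0) on the nodes below it.
ast.fix_missing_locations(tree)

code = compile(tree, "<hand-built>", "eval")
result = eval(code, {"x": 5, "y": 7})  # placeholder values for x and y
print(result)
```

Passing the names through an explicit globals dictionary keeps the example independent of whatever happens to be defined in the session.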

Can anyone tell me what I’m missing?

>>> a = ast.JoinedStr([ast.FormattedValue(ast.parse("sin(x)"))])
>>> ast.fix_missing_locations(a)
<ast.JoinedStr object at 0x0000024E31DA9570>
>>> eval(compile(ast.Expression(a), '', 'eval'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: required field "lineno" missing from expr
>>> b = ast.BinOp(
... left=ast.Name(id='x', ctx=ast.Load()),
... op=ast.Add(),
... right=ast.Name(id='y', ctx=ast.Load()))
>>> ast.fix_missing_locations(b)
<ast.BinOp object at 0x0000024E31DAB970>
>>> eval(compile(ast.Expression(b), '', 'eval'))
12

I have been staring at this a bit, and don’t have an answer, but will just tell you what I found. I hope I’m not totally barking up the wrong tree here, but I trust I’m asking the right follow-up questions.

First off, the error message coming out of the stdlib is not very helpful and kind of annoying, since it doesn’t give a hint about which expression is missing the “lineno” field. It would really be nice if that was improved somehow. I’d file a bug/enhancement request about that.

I went in and made a local override of the fix_missing_locations function, modifying it:

from ast import iter_child_nodes  # needed because this override lives outside the ast module

def fix_missing_locations(node):
    """
    When you compile a node tree with compile(), the compiler expects lineno and
    col_offset attributes for every node that supports them.  This is rather
    tedious to fill in for generated nodes, so this helper adds these attributes
    recursively where not already set, by setting them to the values of the
    parent node.  It works recursively starting at *node*.
    """
    def _fix(node, lineno, col_offset, end_lineno, end_col_offset):
        # if 'lineno' in node._attributes:  << I commented this out
        ## so you get a  hard, probably incorrect addition of lineno
        ## even if lineno is not needed or would not be expected
        ## (I'm not at all familiar with this code, I'm merely tinkering with
        ## it in order to get better debug info)
        if not hasattr(node, 'lineno'):
            node.lineno = lineno
        else:
            lineno = node.lineno
        if 'end_lineno' in node._attributes:
            if getattr(node, 'end_lineno', None) is None:
                node.end_lineno = end_lineno
            else:
                end_lineno = node.end_lineno
        # if 'col_offset' in node._attributes:  # similarly commented out
        if not hasattr(node, 'col_offset'):
            node.col_offset = col_offset
        else:
            col_offset = node.col_offset
        if 'end_col_offset' in node._attributes:
            if getattr(node, 'end_col_offset', None) is None:
                node.end_col_offset = end_col_offset
            else:
                end_col_offset = node.end_col_offset
        for child in iter_child_nodes(node):
            _fix(child, lineno, col_offset, end_lineno, end_col_offset)
    _fix(node, 1, 0, 1, 0)
    return node

Then I called your code, but with the modified fix_missing_locations function.

>>> a = ast.JoinedStr([ast.FormattedValue(ast.parse("sin(x)"))])
>>> a = fix_missing_locations(a)
>>> compile(ast.Expression(a), '', 'eval')

The error that compile then gives is no longer that “lineno” is missing - since I really plugged that in everywhere now – but:

TypeError: expected some sort of expr, but got <ast.Module object at 0x10364f190>

(Aside: Much better error message since it points to the error node.)

So, I wonder if the way you build up the code object in this case

Expression(
  body=JoinedStr(
    values=FormattedValue(
      value=Module(
        body=[
          Expr(
            value=Call(
              func=Name(id='sin', ctx=Load()),
              args=[
                Name(id='x', ctx=Load())],
              keywords=[]))],
        type_ignores=[]))))

is perhaps not a valid, compilable tree?
(Looking at the grammar, I’d think it’s not, since FormattedValue can not be applied to a Module,
it needs an ast.expr - See: ast — Abstract Syntax Trees — Python 3.12.0 documentation)

What is the Python equivalent of the code you’re trying to build?

I think this part of the doc is relevant (both for the error message and for the actual error):

All possible attributes must be present and have valid values when compiling an AST with compile().

(From: ast — Abstract Syntax Trees — Python 3.12.0 documentation under _fields)
As I understand it, my hack in fix_missing_locations then lets invalid trees also pass through, and compile then errors on it.
If so, the original error message (is correct but) only indicates very implicitly “… as through a glass darkly” why the AST tree is ungrammatical/uncompilable :confused:

So, if I see this correctly, then FormattedValue needs an ast.expr, but it gets a Module (if parse mode is the default, or an Expression, if parse mode is ‘eval’ - neither of these can be used). See:

FormattedValue.mro()  # or consult the grammar 
Expression.mro()
Module.mro()
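A quick way to run that check, as a sketch (the variable names are mine): neither root type returned by ast.parse derives from ast.expr, but the body of an 'eval'-mode parse does.

```python
import ast

module_tree = ast.parse("sin(x)")                    # default mode: ast.Module
expression_tree = ast.parse("sin(x)", mode="eval")   # ast.Expression

# Both root types derive from ast.mod, not from ast.expr.
module_is_expr = ast.expr in ast.Module.mro()
expression_is_expr = ast.expr in ast.Expression.mro()

# FormattedValue is itself an expr, and so is the Call node inside the Expression.
formatted_is_expr = ast.expr in ast.FormattedValue.mro()
inner = expression_tree.body
inner_is_expr = isinstance(inner, ast.expr)

print(module_is_expr, expression_is_expr, formatted_is_expr, inner_is_expr)
```

So the node to hand to FormattedValue would be the .body of an 'eval'-mode parse, not the parse result itself.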

So, assuming everything I said so far is correct, then the AST tree that @Gouvernathor built up from scratch is ungrammatical.
If you compare it to the parse tree you get out of ast.parse("f'{sin(x)}'"), you will also see that it is different from the tree built up above.

In general then, for valid trees, after applying ast.fix_missing_locations, compile should work (also assuming the right mode is used), and you don’t need any work-arounds. The compile function itself is the best check to see if a tree you built up is OK. This still leads back to the error message (about lineno) which can really be misleading in this case - since the lineno’s are not the real issue.
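To illustrate, here is how I would expect a grammatical version of that f-string tree to look. This is a sketch under my own assumptions, not necessarily what @Gouvernathor intended: the inner node is taken from the .body of an 'eval'-mode parse, conversion=-1 is the documented value for "no conversion", and sin and x are supplied through a namespace I made up for the example.

```python
import ast
import math

# Take the Call node out of the Expression wrapper; FormattedValue needs an expr.
inner = ast.parse("sin(x)", mode="eval").body

fstring = ast.JoinedStr(values=[
    # conversion=-1 means no !s / !r / !a conversion; no format spec either.
    ast.FormattedValue(value=inner, conversion=-1, format_spec=None),
])

# Fill in locations for the hand-built nodes, then compile and evaluate.
tree = ast.fix_missing_locations(ast.Expression(body=fstring))
code = compile(tree, "<hand-built>", "eval")
result = eval(code, {"sin": math.sin, "x": 0.0})  # assumed bindings
print(result)
```

With a tree shaped like this, the stock fix_missing_locations is enough and no modified helper is needed.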

I hope someone will correct me if I’m wrong here.

I think you’re right on both points: the tree I was making is incorrect tree-wise, and the error message was unhelpful at best, inaccurate at worst.

I’m still working on it.