Limiting the depth of ASTs to improve reliability and security

Very deep ASTs can cause two problems:

  1. Stack overflow
  2. DoS attacks through ast.literal_eval() (and maybe other places)

We should limit the depth of ASTs to about 50 (maybe 80) with a couple of exceptions:

  1. The number of elifs in an if statement should be unlimited, by converting the current tree structure into a list (the current nesting is sketched below).
  2. The depth of try-finally within finally blocks would be limited to 6 (maybe 8), due to exponential blowup.
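
To illustrate the first exception: each elif currently becomes a new If node nested inside its parent’s orelse, so a long elif chain makes the tree deeper rather than wider. A quick, purely illustrative check:

    import ast

    # 1 `if` plus 49 `elif`s: each branch adds one more nested If node.
    src = "if c0:\n    pass\n" + "".join(f"elif c{i}:\n    pass\n" for i in range(1, 50))
    node, depth = ast.parse(src).body[0], 1
    while node.orelse and isinstance(node.orelse[0], ast.If):
        node, depth = node.orelse[0], depth + 1
    print(depth)  # 50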

To me, this post raises lots of questions – I feel that I’m missing some research.
Regarding stack overflow – is sys.getrecursionlimit not enough for AST nodes? Where does the number 50 (or 80) come from?
Should the limit be configurable, e.g. to better support auto-generated code?
Should ast.literal_eval have a separate limit, since that’s used for untrusted input?
Where do finally blocks, in particular, cause exponential blowup?

Here is some background: Ideas for making ast.literal_eval() usable · Issue #83340 · python/cpython · GitHub

sys.getrecursionlimit() is not sufficient. The default depth of 1000 will blow the stack on WASI.

Where does 50 or 80 come from? It is low enough to be safe and high enough not to prevent us compiling any real code.

50 or 80 looks too low to me. Even a simple sum a + b + c + d + ... adds a level for every addition. In generated code you can get very long expressions.
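
A quick way to see how fast it grows (illustrative helper, not part of the ast module):

    import ast

    def depth(node):
        # 1 + maximum depth of any child node
        return 1 + max(map(depth, ast.iter_child_nodes(node)), default=0)

    # One extra "+ term" adds one more level of BinOp nesting.
    print(depth(ast.parse("a + b", mode="eval")))                 # 4
    print(depth(ast.parse(" + ".join(["x"] * 60), mode="eval")))  # 62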

Unfortunately there is no way to determine the safe recursion limit, so we just use sys.getrecursionlimit()/3 (the coefficient is arbitrary, and just copied from one site to another). It at least allows the user to control it somewhere. It is okay if the limit depends on the platform; on WASI we could use /5 or /10, or set some hard limit if one can be determined there.

Unfortunately there is no way to determine the safe recursion limit

Which is why we shouldn’t be using a counter at all to protect the stack, but that’s a separate issue.

In generated code you can get very long expressions.

Do you have examples? I can see code generators generating long sequences of statements, but it is not clear to me why they would need to generate especially deep expressions.

I do not have this code now, but over 20 years ago I wrote a program with expressions containing several tens of additions and subtractions (the expressions were generated in Maple V and copy-pasted into the Pascal or C++ code). I do not remember whether there were fewer or more than 100 of them. It was perhaps not the best way to calculate these things, but anyway.

That sounds like a platform-specific issue. WASI’s stack is limited, and the default is 600 rather than 1000.
@tiran & @brettcannon, do you know about this issue?

That’s quite a strong claim to make. I don’t have the time right now to look for counterexamples, so I hope you’re right!
Just to be sure: did you consider code generated with syntactic macros (e.g. Hy)?

Counter-examples, or any actual data, would be great. The numbers 50 and 80 are off the top of my head.

It is not just WASI, it is all too easy to blow the stack on Windows, or using Clang debug builds, …

Yep. Another example: when running on threads created by C++ code with an explicitly small stack, as most of our serving stacks at work do.

I’m not saying that limiting the depth of ASTs is the right solution, but it seems important to be aware that we’ve got a recursive algorithm with resource needs that platforms cannot guarantee the C stack will provide. It suggests we should not use the C stack at all for AST walking. That’s probably a large code change.

Alternatively: We could spawn a thread of our own with a C stack that is explicitly large enough for our recursion limit’s needs to run all AST code in. (uh, that is if all WASI platforms even have threads, which I doubt…)

But not outrageously large. The parser itself is a highly recursive function but it is all generated code so we could design a dedicated DSL which we generate instead, and have an interpreter for that DSL that uses a heap-allocated stack of data items, instead of relying on the C stack for recursion.

However that doesn’t solve code that wants to traverse ASTs. There’s always an equivalent non-recursive algorithm using an explicit stack, but for handwritten traversal code you’re probably right that it would be a large code change. (I haven’t looked for the specific case of literal_eval.)
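
For plain traversal the explicit-stack version is straightforward – ast.walk in the stdlib already works this way, using a deque instead of recursion. A minimal sketch of the same idea:

    import ast

    def iter_nodes(tree):
        # Walk an AST without recursion, using a heap-allocated list as the stack.
        stack = [tree]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(ast.iter_child_nodes(node))

    # A 200-term sum nests ~200 BinOp levels, but the walk needs no call-stack depth.
    deep = ast.parse(" + ".join(["x"] * 200), mode="eval")
    print(sum(1 for _ in iter_nodes(deep)))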

The code for ast.literal_eval() does recursively walk the AST but does so using pure Python which doesn’t create any issues with the C stack. Pretty much the entire problem for literal_eval happens when the AST is created, not when it is walked. Here’s the part of the code that can easily be made to fail:

    if isinstance(node_or_string, str):
        node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
    if isinstance(node_or_string, Expression):
        node_or_string = node_or_string.body

I think your suggested improvement to the parser should automatically benefit literal_eval() and solve our core issue. Apologies if you already knew this.
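
To make the failure point concrete (illustrative only – the exact exception, or whether it is a hard crash, depends on the Python version and platform):

    import ast

    # Any failure here happens inside parse(), before literal_eval()'s
    # pure-Python walk of the resulting tree ever runs.
    nested = "[" * 100_000 + "]" * 100_000   # a deeply nested (but valid) literal
    try:
        ast.literal_eval(nested)
    except (RecursionError, MemoryError, SyntaxError) as exc:
        print("failed while building the AST:", type(exc).__name__)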

I think there was another discussion about this elsewhere (maybe Discord) where it was pointed out that the problem is not the parser itself (which implements stack overflow checks) but the C code that translates the original (C-only) AST to Python objects. The latter C code is a simple recursive traversal whose stack requirements may be different from the parser that built the AST.

Initially a check for the AST depth was only in a single place – in symtable.c. But later we needed to copy it to more and more places:

  • In the AST optimizer, which is called before creating the symtable.
  • In the code that converts an AST from C to Python – mainly for literal_eval().
  • In the AST validator, called after converting an AST from Python to C.

It was added incrementally, as we discovered more ways to crash Python using a deep AST. I think that it can now be simplified so that the AST depth is only checked when the internal representation of the AST is created – either from a CST, or from the Python representation.

I also considered the idea of creating a general tool which walks an AST in C without using the C stack and calls callbacks for every node. It would work for the converter to Python, the validator, the optimizer and the symtable creator, but not for the code generator.

Assuming by “code generator” you mean the 3rd bullet (the AST validator), perhaps it would still be useful to have this for the other use cases? For the code generator we could adopt a separate solution using a heap-allocated stack; it would have to be traversing the input. This would nicely catch loops in the input AST (I don’t know what currently catches those, I tried to play around and got “TypeError: required field “lineno” missing from expr”).

No, it is the code in compile.c which generates bytecode from an AST. Currently, this code does not even contain any recursion depth checks; it relies on the checks in symtable.c. All the other code can be rewritten to use a heap-allocated stack: it does not depend on the order in which children are visited, and the code executed for every node before or after visiting its children is not recursive. But the code in compile.c is recursive by its nature: it visits children in a specific order, sometimes more than once, and uses the intermediate results of visiting children. It could be made to use a heap-allocated stack only by rewriting it as a state machine, similar to the one in the regular expression engine. That would make the code much more complicated and less maintainable, and perhaps slower.

So it is almost inevitable that the compiler has some limit, unless we rewrite it in another programming language that supports an unlimited call stack. It should not be in the language specification; it is a limitation of the implementation.
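
For a sense of what that rewrite would look like, here is a toy sketch in pure Python (handling only BinOp/Constant trees; all names here are made up for illustration) of the two-phase, explicit-stack style it would require:

    import ast
    import operator

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    ENTER, EXIT = 0, 1

    def eval_binops(tree):
        # Every BinOp is pushed twice: ENTER schedules its children,
        # EXIT combines their already-computed results.
        work = [(ENTER, tree.body)]   # tree is an ast.Expression
        results = []                  # heap-allocated "value stack"
        while work:
            phase, node = work.pop()
            if isinstance(node, ast.Constant):
                results.append(node.value)
            elif phase == ENTER:
                work.append((EXIT, node))
                work.append((ENTER, node.right))
                work.append((ENTER, node.left))
            else:  # EXIT: both children's results are on the value stack
                right = results.pop()
                left = results.pop()
                results.append(OPS[type(node.op)](left, right))
        return results.pop()

    print(eval_binops(ast.parse("1 + 2 + 3 - 4", mode="eval")))  # -> 2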

This seems to be a problem solvable by tail recursion optimization, if not in the general case then at least for simple chains of additions. I know tail recursion elimination has been a touchy subject before, but I’d argue that an expression like this doesn’t “obviously look like” something a developer would want to have a recursive stack trace for; it looks more iterative. The “natural” implementation (for some definition of programmer) seems like it would be reduce-addition-from-left, rather than create-recursive-stack-of-addition-frames-and-resolve-from-inside-to-outside.

That’s a runtime concept, but we’re talking about the AST for such expressions. At runtime (as you can verify using the dis module) an expression of the form a + b + c + d + ... only needs a stack depth of 2 – it’s basically “load a, load b, add, load c, add, load d, add, …”.
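
For example, the compiled code object reports the same tiny stack requirement no matter how long the chain gets:

>>> compile("a + b + c + d + e", "<expr>", "eval").co_stacksize
2
>>> compile(" + ".join(["x"] * 100), "<expr>", "eval").co_stacksize
2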

In the AST, however, such expressions lead to a stack of nodes – it’s like an extremely unbalanced binary tree:

>>> import ast
>>> t = ast.parse("a + b + c + d + e", mode="eval")
>>> print(ast.dump(t, indent=4))
Expression(
    body=BinOp(
        left=BinOp(
            left=BinOp(
                left=BinOp(
                    left=Name(id='a', ctx=Load()),
                    op=Add(),
                    right=Name(id='b', ctx=Load())),
                op=Add(),
                right=Name(id='c', ctx=Load())),
            op=Add(),
            right=Name(id='d', ctx=Load())),
        op=Add(),
        right=Name(id='e', ctx=Load())))
>>> 

We could solve this (at least for such simple cases) by changing the grammar to somehow accept a series of alternating operands and operators, storing them in a list (our PEG parser generator is capable of doing that rather easily), and then having the code generation (and other analysis phases, if needed) use a simple stack-based operator precedence “parser” on that list. This would require a refactoring of the symbol table and the code generator, though, and I haven’t thought about how much effort this would be.
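
As a rough illustration of that idea (everything below is hypothetical – neither the flat form nor the names exist anywhere): given a flat, alternating list of operands and operators, a small two-stack precedence loop can resolve it without ever building a deep tree:

    # Hypothetical flat form: [operand, op, operand, op, ...] instead of nested BinOps.
    PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
    APPLY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
             "*": lambda a, b: a * b, "/": lambda a, b: a / b}

    def eval_flat(items):
        operands, operators = [], []    # two explicit stacks, no recursion

        def reduce_once():
            right, left = operands.pop(), operands.pop()
            operands.append(APPLY[operators.pop()](left, right))

        for item in items:
            if isinstance(item, str):   # an operator token
                while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[item]:
                    reduce_once()
                operators.append(item)
            else:                       # an operand
                operands.append(item)
        while operators:
            reduce_once()
        return operands.pop()

    print(eval_flat([1, "+", 2, "*", 3, "-", 4]))  # -> 3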

And of course this would not provide protection against generated attacks on the AST tree depth like "f("*1000 + "0" + ")"*1000.
