It’s election year! Here’s a controversial trial balloon.
There are a number of places in Python’s parser that really aren’t LL(1). E.g. I wish we could have a set of rules describing simple statements as follows (simplified):
```
simple_statement: expression | assignment
assignment: target '=' expression
target: atom trailer*
atom: NAME | literal | '(' expression ')'
trailer: '[' expression ']' | '(' expression (',' expression)* ')' | '.' NAME
expression: ...
```
Unfortunately the FIRST sets of `target` and `expression` overlap, so the rule for `simple_statement` is invalid in an LL(1) grammar – we have to replace it with

```
simple_statement: [expression '='] expression
```
and then in a second pass of the parser we analyze the expression to the left of the ‘=’ to ensure that it doesn’t contain illegal things like function calls or operators. (To see for yourself, observe the difference between these:
```
>>> f() = 1
  File "<stdin>", line 1
SyntaxError: can't assign to function call
>>> f/ = 1
  File "<stdin>", line 1
    f/ = 1
       ^
SyntaxError: invalid syntax
>>>
```

)
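The overlap can even be demonstrated mechanically. Here's a minimal FIRST-set sketch (my toy code, not CPython's pgen) over a stripped-down version of the grammar above; it ignores epsilon productions and trailers, which don't matter for this conflict:

```python
# Toy grammar: each nonterminal maps to a list of alternatives,
# each alternative is a list of symbols. Terminals are quoted tokens
# or token names like NAME/NUMBER.
GRAMMAR = {
    "simple_statement": [["expression"], ["assignment"]],
    "assignment": [["target", "'='", "expression"]],
    "target": [["atom"]],           # trailers omitted for brevity
    "atom": [["NAME"], ["NUMBER"]], # literals reduced to NUMBER
    "expression": [["atom"]],       # operators omitted for brevity
}

def first(symbol, grammar):
    """FIRST set: a terminal is its own FIRST set; a nonterminal
    unions the FIRST sets of the leading symbol of each alternative."""
    if symbol not in grammar:
        return {symbol}
    out = set()
    for alt in grammar[symbol]:
        out |= first(alt[0], grammar)
    return out

overlap = first("expression", GRAMMAR) & first("assignment", GRAMMAR)
print(overlap)  # non-empty overlap -> the alternatives aren't LL(1)-distinguishable
```

Both alternatives of `simple_statement` can start with NAME (or a literal), so one token of lookahead can't pick between them.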
There are a number of other examples too (e.g. keyword arguments).
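The keyword-argument case works the same way (a sketch of mine, not from the post above): the grammar has to accept a full expression before the '=', and only a later check rejects keywords that aren't plain names, which is why the error sounds semantic rather than syntactic:

```python
# Sketch: the LL(1) grammar parses `f(<expression>=<expression>)`, so a
# non-NAME keyword is only rejected by a later check, not by the parser
# proper. The exact message varies by Python version.
try:
    compile("f(x+1=1)", "<example>", "eval")
except SyntaxError as err:
    print(err.msg)
```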
So I propose to look for a better parsing technique (maybe LALR(1) or LR(k)) that can handle such grammars directly.
Thoughts? Suggestions? Anyone with understanding of current compiler-compiler technology interested in working on this? I think the output of the compiler (an AST) could remain the same, so the API of the `ast` module should not have to change, but the `parser` module (already not that loved) would probably have to be replaced or just killed off.
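To be concrete about the surface that would stay stable, this is just what `ast.parse` already gives you today (a minimal sketch; a new parser would be expected to produce the same node types):

```python
import ast

# Parse an assignment and inspect the resulting AST via the public API.
# These node classes (Module, Assign, Name) are what any replacement
# parser would still need to emit.
tree = ast.parse("target = value")
stmt = tree.body[0]
print(type(stmt).__name__)  # Assign
```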