It’s an election year! Here’s a controversial trial balloon.
There are a number of places in Python’s parser that really aren’t LL(1). For example, I wish we could have a set of rules describing simple statements as follows (simplified):
simple_statement : expression | assignment
assignment: target '=' expression
target: atom trailer*
atom: NAME | literal | '(' expression ')'
trailer: '[' expression ']' | '(' expression (',' expression)* ')' | '.' NAME
expression: ...
Unfortunately the FIRST sets for target and expression overlap, so the rule for simple_statement is invalid – we have to replace it with
simple_statement: [expression '='] expression
and then in a second pass of the parser we analyze the expression to the left of the ‘=’ to ensure that it doesn’t contain illegal things like function calls or operators. To see for yourself, observe the difference between these:
>>> f() = 1
File "<stdin>", line 1
SyntaxError: can't assign to function call
>>> f/ = 1
File "<stdin>", line 1
f/ = 1
^
SyntaxError: invalid syntax
>>>
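The second-pass idea can be sketched with the ast module: parse the left-hand side as an ordinary expression, then reject node types that can’t serve as assignment targets. This is a toy illustration of the technique, not CPython’s actual code, and the list of assignable nodes is simplified:

```python
import ast

# Node types that are legal assignment targets (simplified list).
ASSIGNABLE = (ast.Name, ast.Attribute, ast.Subscript, ast.Tuple, ast.List)

def check_target(source):
    """Parse `source` as an expression, then verify it could be a target."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ASSIGNABLE):
        raise SyntaxError(f"can't assign to {type(node).__name__}")

check_target("x[0]")        # fine: a subscript is a valid target
try:
    check_target("f()")     # a Call parses fine as an expression...
except SyntaxError as e:
    print(e)                # ...but is rejected by the second pass
```

Note that `f/ = 1` never gets this far: it fails in the parser proper, which is why its error message is the generic “invalid syntax”.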
There are a number of other examples too (e.g. keyword arguments).
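The keyword-argument case is analogous: the grammar accepts an arbitrary expression before the ‘=’ inside a call, and a later check rejects anything that isn’t a plain name. A quick way to observe this (the exact error message varies across CPython versions, so only the ok/error distinction is shown):

```python
def try_compile(src):
    """Compile a source snippet and report success or the SyntaxError message."""
    try:
        compile(src, "<test>", "eval")
        return "ok"
    except SyntaxError as e:
        return f"SyntaxError: {e.msg}"

print(try_compile("f(a=1)"))    # ok: keyword name is a plain NAME
print(try_compile("f(a.b=1)"))  # parses as the same shape, rejected later
```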
So I propose to look for a better parsing technique (maybe LALR(1) or LR(k)) that can handle such grammars directly.
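For a flavor of what handling the rule directly looks like, here is a toy backtracking (PEG-style) recognizer for a trimmed-down version of the grammar above. The helper names and the cut-down rules are mine, purely for illustration; a real parser generator would do far more:

```python
import re

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match(self, pattern):
        """Consume and return a token matching `pattern`, or None."""
        m = re.match(r"\s*(" + pattern + ")", self.text[self.pos:])
        if m:
            self.pos += m.end()
            return m.group(1)
        return None

    def name(self):
        return self.match(r"[A-Za-z_]\w*")

    def target(self):
        # target: NAME ('.' NAME)*   -- trimmed-down version of the rule
        if self.name() is None:
            return False
        while self.match(r"\.") and self.name():
            pass
        return True

    def simple_statement(self):
        # simple_statement: target '=' expression | expression
        pos = self.pos
        if self.target() and self.match(r"="):
            return "assignment"
        self.pos = pos           # backtrack: it wasn't an assignment
        if self.name():          # stand-in for a full expression rule
            return "expression"
        return None

print(Parser("a.b = c").simple_statement())  # assignment
print(Parser("a.b").simple_statement())      # expression
```

The overlap between target and expression stops being a problem because the parser simply tries the assignment alternative first and rewinds on failure – no separate validation pass needed.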
Thoughts? Suggestions? Anyone with an understanding of current compiler-compiler technology interested in working on this? I think the output of the compiler (an AST) could remain the same, so the API of the ast module should not have to change, but the parser module (already not that loved) would probably have to be replaced or just killed off.