Lossless Syntax Trees

(Nicholas Chammas) #1

Continuing the discussion from Switch Python's parsing tech to something more powerful than LL(1):

Just an aside that may be interesting to folks: I read about something called “Lossless Syntax Trees” on the Oil project blog last year:

[A Lossless Syntax Tree is] a representation for source code that can be converted back to the original text. In Oil, I implement it with a combination of:

  1. An ordered ASDL tree structure, and
  2. An array of line spans. Concatenating these spans gives you back the original source file.

It’s not an AST because it preserves details of the syntax that are irrelevant to interpretation and compilation.
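
To make the quoted idea concrete, here is a rough Python sketch of the "tree plus span array" combination. It is not how Oil does it: the toy token grammar, the `Leaf` and `Tree` classes, and the `parse` function are all made up for illustration, the spans are per token rather than per line, and the "tree" is just a flat ordered list of leaves with no ASDL involved. The only property it tries to show is that whitespace and comments stay in the span array, so joining the spans reproduces the input exactly.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy grammar. Every character of the input falls into exactly
# one span, including whitespace and comments that an AST would throw away.
TOKEN_RE = re.compile(
    r"(?P<space>\s+)|(?P<comment>#[^\n]*)|(?P<number>\d+)"
    r"|(?P<name>[A-Za-z_]\w*)|(?P<op>.)",
    re.DOTALL,
)

@dataclass
class Leaf:
    kind: str     # "number", "name", or "op"
    span_id: int  # index into Tree.spans, so the exact source text is recoverable

@dataclass
class Tree:
    spans: List[str]    # every piece of the source, in order
    leaves: List[Leaf]  # only the pieces that matter for interpretation

    def to_source(self) -> str:
        # Concatenating the spans gives back the original text.
        return "".join(self.spans)

def parse(source: str) -> Tree:
    spans: List[str] = []
    leaves: List[Leaf] = []
    for match in TOKEN_RE.finditer(source):
        spans.append(match.group())
        kind = match.lastgroup
        if kind not in ("space", "comment"):
            # Trivia stays in the span array but is skipped by the tree.
            leaves.append(Leaf(kind, len(spans) - 1))
    return Tree(spans, leaves)

source = "x = 1 +  2  # add\n"
tree = parse(source)
print(tree.to_source() == source)
print([leaf.kind for leaf in tree.leaves])
```

Because each leaf carries a `span_id`, anything that walks the tree can point back at the exact piece of source text it came from, odd spacing and comments included.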

I don’t know much about parsing. Is this idea relevant to those who are looking to improve Python’s syntax error messages?

Here are some related issues on bugs.python.org, by the way: