Parse Python code

Hi,
I’m looking for an easy way to parse Python code.

class A:
    def get(self):
        for i in "bl":
            print(i)

class B:
    def get(self):
        for i in "bl":
            print(i)

What I want in the end is to identify each for loop and each get method.
The only difference between the two for loops and the two get methods is that they belong to different classes. How can I tell which class each one belongs to?

Thanks

You can access Python’s own parser output with the ast module; ast.parse() returns a tree of node objects representing different grammar components, and then you could use a NodeVisitor subclass to help track the relationships between elements.

Take a look at the Green Tree Snakes documentation for more information, or look at various examples in various Stack Overflow answers I have written over time.
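
As a minimal sketch of the NodeVisitor approach described above (the OwnerTracker class and its attribute names are made up for this illustration, and the sample source mirrors the question with self added to the methods), a visitor can keep a stack of the classes it is currently inside and record the owner of every method and for loop:

```python
import ast

SOURCE = '''
class A:
    def get(self):
        for i in "bl":
            print(i)

class B:
    def get(self):
        for i in "bl":
            print(i)
'''


class OwnerTracker(ast.NodeVisitor):
    """Record (owning class, kind, name, line) for each method and for loop."""

    def __init__(self):
        self.class_stack = []  # names of the classes currently being visited
        self.found = []

    def _owner(self):
        # Innermost enclosing class, or None at module level.
        return self.class_stack[-1] if self.class_stack else None

    def visit_ClassDef(self, node):
        self.class_stack.append(node.name)
        self.generic_visit(node)  # descend into the class body
        self.class_stack.pop()

    def visit_FunctionDef(self, node):
        self.found.append((self._owner(), "def", node.name, node.lineno))
        self.generic_visit(node)

    def visit_For(self, node):
        self.found.append((self._owner(), "for", None, node.lineno))
        self.generic_visit(node)


tracker = OwnerTracker()
tracker.visit(ast.parse(SOURCE))
for entry in tracker.found:
    print(entry)
```

Each recorded tuple names the enclosing class, so the two get methods and the two for loops can be told apart even though their bodies are identical.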

Thank you for your answer.
I don’t think ast is the right tool, since I need to keep the line numbers of my code, which is not possible with ast.

Your only option at the moment is lib2to3, but be warned that doing transformations with it is not trivial.

ast expression and statement nodes do have a line number and column offset; I don’t know if that’s enough for your needs.

AST nodes record the line numbers and start column of the source element they were generated from. AST is exactly the thing you want.
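
To illustrate the point about positions, here is a small sketch (the sample source and the positions name are only for this example) that reads the lineno and col_offset attributes from class, function, and for-loop nodes:

```python
import ast

SOURCE = '''class A:
    def get(self):
        for i in "bl":
            print(i)
'''

# lineno is 1-based; col_offset is the 0-based start column of the node.
positions = [
    (type(node).__name__, node.lineno, node.col_offset)
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.For))
]

for kind, line, col in positions:
    print(kind, line, col)
```

Note that not every node type carries a position (ast.Module, for example, does not), which is why code walking the whole tree usually checks for the attribute first.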

lib2to3 is built on top of the AST (with comments retained). If you must have that level of detail retained you would be better off using typed-ast though. Not that you need that to determine what class a method belongs to.

Oh, indeed, I missed the line numbers.
That’s clearly what I need.
Thanks.

I forgot to say that I have to use the old Python 3.3, but the inspected code could be Python 3.6 or 3.7, so ast will fail on newer syntax (async…).

That’s quite a limitation. Why is that? Python 3.3 is no longer supported even, the last regular release was over 5 years ago, and the last security fix release dates from September 2017. You’d be better off with a locally compiled Python release.

Sublime Text runs under Python 3.3, so writing a plugin is limited to Python 3.3.

We really should get them to upgrade their embedded Python version.

Other than that you can always use a child process from your plugin. The black code formatter requires Python 3.6 and is run as a long-running server by the sublime plugin.

I misunderstood how much typed-ast preserves; it records the type-specific information from the PEP-documented comment syntax, so doesn’t preserve all comment content. Sorry about that.

Isn’t lib2to3 built on top of a CST generated by pgen2? The AST does keep line and column offset information, but it doesn’t keep ‘unnecessary’ information like whitespace.

That was my understanding, too. The trees that lib2to3 uses are much closer to the concrete parse trees produced by Python’s parser module than to Python’s AST.

Just some feedback, to say thanks for the help and for future readers.

Since I wanted to parse Python code from an older Python version and still get line numbers, I ended up running ast in a separate interpreter:

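import json
import subprocess

def get_index_with_interpreter(view, body, encoding):
    """Extract the line number of each ast node using the specified interpreter."""

    # `body` is the source as bytes; {!r} embeds it as a bytes literal in the
    # one-off script that the child interpreter runs.
    cmd = """import ast;b={!r};print([
        node.lineno
        for node in ast.walk(ast.parse(b.decode(encoding={!r})))
        if hasattr(node, "lineno")
    ])""".format(
        body, encoding
    )

    # `python` (path to the newer interpreter) and FoldingError are defined
    # elsewhere in the plugin.
    proc = subprocess.Popen([python, "-c", cmd], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode == 0:
        # The printed list of integers is also valid JSON.
        return json.loads(out.decode())
    else:
        raise FoldingError(err.decode())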

Maybe not the nicest solution, but it does the job.
