Parse Python code

(Jgirardet) #1

I’m looking for an easy way to parse Python code.

class A:
    def get(self):
        for i in "bl":
            pass

class B:
    def get(self):
        for i in "bl":
            pass

What I want in the end is to identify each for loop or get method.
The only difference between the two for loops and the two get methods is that they belong to different classes. How can I tell which class each one belongs to?


(Martijn Pieters) #2

You can access Python’s own parser output with the ast module; ast.parse() returns a tree of node objects representing different grammar components, and then you could use a NodeVisitor subclass to help track the relationships between elements.
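A minimal sketch of that NodeVisitor idea, applied to the example from the question. The names ClassTracker and find_loops_and_getters are made up for illustration, and the bodies of the for loops are filled in with pass so the sample parses. The visitor keeps a stack of enclosing class names and records it whenever it meets a get method or a for loop.

```python
import ast

SOURCE = '''
class A:
    def get(self):
        for i in "bl":
            pass

class B:
    def get(self):
        for i in "bl":
            pass
'''


class ClassTracker(ast.NodeVisitor):
    """Record (owning class, kind, line number) for get methods and for loops."""

    def __init__(self):
        self.class_stack = []  # names of the classes currently being visited
        self.found = []

    def _owner(self):
        # Dotted name of the enclosing classes, or None at module level.
        return ".".join(self.class_stack) or None

    def visit_ClassDef(self, node):
        self.class_stack.append(node.name)
        self.generic_visit(node)  # descend into the class body
        self.class_stack.pop()

    def visit_FunctionDef(self, node):
        if node.name == "get":
            self.found.append((self._owner(), "def get", node.lineno))
        self.generic_visit(node)

    def visit_For(self, node):
        self.found.append((self._owner(), "for", node.lineno))
        self.generic_visit(node)


def find_loops_and_getters(source):
    tracker = ClassTracker()
    tracker.visit(ast.parse(source))
    return tracker.found


results = find_loops_and_getters(SOURCE)
for owner, kind, lineno in results:
    print(owner, kind, lineno)
```

Each visit_* method has to call generic_visit() itself, otherwise the visitor stops descending at that node and the nested for loops are never reached.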

Take a look at the Green Tree Snakes documentation for more information, or look at various examples in various Stack Overflow answers I have written over time.

(Jgirardet) #3

Thank you for your answer.
I think ast isn’t the right tool, since I need to keep the line numbers, which is not possible with ast.

(Bernat Gabor) #4

Your only option at the moment is lib2to3, but be warned that doing transformations with it is not trivial.

(Mark Dickinson) #5

ast expression and statement nodes do have a line number and column offset; I don’t know if that’s enough for your needs.

(Martijn Pieters) #6

AST nodes record the line numbers and start column of the source element they were generated from. AST is exactly the thing you want.
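For example, a quick way to see those positions, using a cut-down version of the code from the question (the variable names here are only for illustration):

```python
import ast

source = '''\
class A:
    def get(self):
        for i in "bl":
            pass
'''

# lineno is 1-based, col_offset is 0-based.
positions = [
    (type(node).__name__, node.lineno, node.col_offset)
    for node in ast.walk(ast.parse(source))
    if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.For))
]

for name, lineno, col in positions:
    print(name, lineno, col)
```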

(Martijn Pieters) #7

lib2to3 is built on top of the AST (with comments retained). If you must have that level of detail retained, you would be better off using typed-ast, though. Not that you need that to determine what class a method belongs to.

(Jgirardet) #8

Oh, indeed, I missed the line numbers.
It’s clearly what I need.

(Jgirardet) #9

I forgot to say that I have to use the old Python 3.3, but the inspected code could be Python 3.6 or 3.7, so ast will fail on newer syntax (async…).

(Martijn Pieters) #10

That’s quite a limitation. Why is that? Python 3.3 is no longer even supported; the last regular release was over 5 years ago, and the last security fix release dates from September 2017. You’d be better off with a locally compiled Python release.

(Jgirardet) #11

Sublime Text runs on Python 3.3, so writing a plugin is limited to Python 3.3.

(Martijn Pieters) #12

We really should get them to upgrade their embedded Python version.

Other than that, you can always use a child process from your plugin. The black code formatter requires Python 3.6 and is run as a long-running server by the Sublime plugin.
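A rough sketch of the child-process idea, in its simplest one-shot form rather than a long-running server: the plugin hands the source to a newer interpreter on stdin and gets the node positions back as JSON. parse_with and CHILD_SCRIPT are made-up names, and the interpreter path is an assumption; in a real plugin it would be a configured path to a Python 3.6+ binary, while the demo below just reuses sys.executable. The parent side sticks to Popen.communicate(), which is available on Python 3.3.

```python
import json
import subprocess
import sys

# Runs inside the newer interpreter: read source from stdin, print JSON.
CHILD_SCRIPT = r'''
import ast, json, sys
tree = ast.parse(sys.stdin.buffer.read())
wanted = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef, ast.For)
json.dump(
    [
        {"type": type(n).__name__, "lineno": n.lineno, "col": n.col_offset}
        for n in ast.walk(tree)
        if isinstance(n, wanted)
    ],
    sys.stdout,
)
'''


def parse_with(interpreter, source):
    """Parse source with another Python interpreter and return node info."""
    proc = subprocess.Popen(
        [interpreter, "-c", CHILD_SCRIPT],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(source.encode("utf-8"))
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", "replace"))
    return json.loads(out.decode("utf-8"))


# Assumption for the demo only: the current interpreter stands in for the
# newer Python that a plugin would point at.
sample = "class A:\n    async def get(self):\n        for i in 'bl':\n            pass\n"
nodes = parse_with(sys.executable, sample)
for node in nodes:
    print(node["type"], node["lineno"], node["col"])
```

Starting a new process for every parse is slow; a long-running server avoids paying the interpreter start-up cost each time.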

(Martijn Pieters) #13

I misunderstood how much typed-ast preserves; it records the type-specific information from the PEP-documented comment syntax, so it doesn’t preserve all comment content. Sorry about that.

(Batuhan) #14

Isn’t lib2to3 built on top of a CST generated by pgen2? The AST does keep line and column offset information, but it doesn’t keep ‘unnecessary info’ like whitespace.

(Mark Dickinson) #15

That was my understanding, too. The trees that lib2to3 uses are much closer to the concrete parse trees produced by Python’s parser module than to Python’s AST.
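A small illustration of the AST side of that difference, using only the ast module (the two sample strings are made up): the same loop written with different spacing and a comment gives an identical tree, and only the recorded positions differ.

```python
import ast

compact = 'for i in "bl":\n    pass\n'
spaced = 'for   i   in   "bl" :   # loop over letters\n        pass\n'

# ast.dump() leaves out line/column attributes by default, so this compares
# structure only; the extra whitespace and the comment leave no trace.
same_tree = ast.dump(ast.parse(compact)) == ast.dump(ast.parse(spaced))

# The positions still reflect the original layout.
compact_col = ast.parse(compact).body[0].target.col_offset
spaced_col = ast.parse(spaced).body[0].target.col_offset

print(same_tree, compact_col, spaced_col)
```

So the AST is enough to locate a node in the source, but not to reproduce the source text exactly; for that, a concrete tree that keeps every token is needed.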
