Order of Node traversal for declarations

I want to know when python encounters function or class or variable declaration, in which order it adds the name into the current scope. Does it first adds the name in the scope with some empty value in the scope dictionary and then traverses the body collecting the meta-data and later adding the meta_data corrosponding to the key in the scope dictionary or whether it first traverses the body first and later add the name into the scope with the meta_data collected from body traversal. I tried writing a recursive function and see what it is resolving to so I believe the first process is correct one but still needs to confirm my understanding.

Is this the the explanation you are looking for 4. Execution model — Python 3.11.3 documentation ?

1 Like

Not really @barry-scott . Actually I want to know from the AST perspective, when python encounters func_decl or class_decl nodes, does it first add the name with empty meta_data value in the scope or first traverses the body and then adds the name in the scope.

You understand, I hope, that the AST is consumed by the compiler, not at run-time? In

when python encounters function or class or variable declaration

you sound slightly as if you are thinking of Python executing your program, encountering a definition for the first time, and having to compile it. Function (or class) definition is an executable statement, but at run-time Python is executing code produced by the compiler, not examining the AST.

The AST is the output of the parser, and is a structure built in memory from reading the module source.

The compiler can then pass over this structure as many times and in whatever way it likes, but I believe it makes two passes.

In the first pass, the compiler notices what names are used in each scope, and where they are defined. (Is a name defined there, or in an outer or module-level scope? If defined there, it might later discover that it is referenced from a nested scope. Assignment is definition of a name, but so also is a function definition.) This scope information is collected in a symbol table for each scope.

In the second pass, the compiler generates a code object from each scope. Python is designed so the code can be translated without a lot of type information, but it does need the symbol table from the first pass for that scope to generate the right kind of get or set operation (local, cell, or global).

Now, at run-time, when the interpreter reaches the def, it finds instructions that build a function object and push it to the stack. The code object for the body of that function is just one of the constants embedded in the code object it is currently executing, referenced by instructions. The new function object (a new one every time you execute the def) is pushed onto the stack. Finally, the top of stack is stored in the variable that has the name of the function.

Here is an example of the generated code corresponding to the heading def f1(a, b, u=1)::

  7           2 LOAD_CONST               8 ((1,))
              4 LOAD_CONST               1 (<code object f1 at 0x000002AE9C8997A0, file "...\function_closure.py", line 7>)
              6 MAKE_FUNCTION            1 (defaults)
              8 STORE_NAME               0 (f1)

dis.dis() is a good tool for exploring the code generated from various source fragments.

1 Like

Thanks @jeff5 for the response. Actually I am talking about the compile time stage only. When AST traversal reaches a function declaration, it needs to add the name and meta_data of the function to the current scope. My question is whether it is done before traversing the body of the function or after that. All of this I am asking for compile time name resolution.

Ok, cool. I think the answer may be as simple as observing that function definition is in the end just assignment, so from the perspective of symbol table processing, the name that represents it is handled the same as any other name.

The two phases I am thinking of are identifiable here:

However, the top of the file only gives me two marks out of five :slight_smile: for remembering the phases:

This file and the related symtable.c are the ones to look at for detail. There’s a bit of a guide to the action at:

With a bit of searching, we can find where a function definition is processed for symbols:

You will see that the name is entered into the symbol table for the current scope first. The choice of DEF_LOCAL is provisional, pending discovery of a nested reference. The next bit processes the function declaration. Later, we enter the nested scope, but remember we are only collecting names and their uses in this phase, not generating code.

1 Like

Thanks @jeff5 for such a detailed answer. This is the perfect explanation and answer I was looking for my question. I will look into these links more. Thanks again for helping :slight_smile:

Hey @jeff5 I really want to understand what the second pass of AST at compile time is doing ? is it just compiling the code or doing anything more which alters the scope table. Is it wrong to say that first pass fills up the scope table and second pass compiles the AST into bytecode ?

I think that’s right. It’s a pass over the whole compilation unit (module) so it has processed all the scopes before it begins to generate code from the AST. I reproduced the scopes processing once for a project, so I feel I’m on solid ground there.

The assemble phase is new to me, but it seems to consist of optimisations carried out as each block is completed, and while the code is still in the compiler object. Only at the end of assemble do we see an actual code object.

1 Like

Thanks @jeff5 for the clarification. Now I think I have a decent mental model for name resolution in python.