How does scoping for class building vs methods work?

I’m referring to this behavior:

a = 1
class A:
    a = 2
    def m():
        assert a == 1
    m()
    assert a == 2

I’ve been working on this in RustPython, but I can’t for the life of me figure out where in CPython this happens. I think it might be related to the code object flag CO_NEWLOCALS, but that flag might only control whether PyEval_EvalCodeEx creates a new namespace for the frame based on the one passed to it, or just uses the passed one directly.


Documentation

This behavior is documented in https://docs.python.org/3.3/reference/executionmodel.html#naming-and-binding

“The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods” + “When a name is used in a code block, it is resolved using the nearest enclosing scope.”

Since the binding of a in the class body does not extend into the method m, the nearest enclosing scope that binds a is the global one (a = 1), and that is the value m ends up using.
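A minimal runnable sketch of the quoted rule. The attribute names seen_in_method and seen_in_class_body are illustrative additions, not part of the code in the question:

```python
a = 1  # module-level (global) binding


class A:
    a = 2  # class-level binding, visible only in the class body itself

    def m():
        # The class block's `a` does not extend into this function,
        # so the name resolves to the enclosing (here: global) binding.
        return a

    seen_in_method = m()    # called while the class body is still executing
    seen_in_class_body = a  # the class body itself sees its own binding
```

Here A.seen_in_method should hold the outer value and A.seen_in_class_body the class-level one.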

Implementation

Scoping and all the related issues are handled during compilation, in symtable.c.

You’ll see that for classes, it does not perform the newbound |= local step that it would normally do (in the if (ste->ste_type != ClassBlock) block). As a result, the a = 2 binding in the class definition is not visible when the symbols inside the m() method are analyzed. And since a isn’t bound there, it gets marked as GLOBAL_IMPLICIT.

You can verify this by disassembling the relevant bytecode:

>>> import dis
>>> code = compile("""
... a = 'Outer'
... class A:
...     a = 'Inner'
...     def m():
...         print(a)
...     m()
...     print(a)""", '<stdin>', 'exec')
>>> dis.dis(code)
  1           0 LOAD_CONST               0 ('Outer')
              2 STORE_NAME               0 (a)

  2           4 LOAD_BUILD_CLASS
              6 LOAD_CONST               1 (<code object A at ...)
              8 LOAD_CONST               2 ('A')
             10 MAKE_FUNCTION            0
             12 LOAD_CONST               2 ('A')
             14 CALL_FUNCTION            2
             16 STORE_NAME               1 (A)
             18 LOAD_CONST               3 (None)
             20 RETURN_VALUE

Disassembly of <code object A at 0x015AF568, file "<stdin>", line 2>:
  2           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('A')
              6 STORE_NAME               2 (__qualname__)

  3           8 LOAD_CONST               1 ('Inner')
             10 STORE_NAME               3 (a)

  4          12 LOAD_CONST               2 (<code object m ...)
             14 LOAD_CONST               3 ('A.m')
             16 MAKE_FUNCTION            0
             18 STORE_NAME               4 (m)

  6          20 LOAD_NAME                4 (m)
             22 CALL_FUNCTION            0
             24 POP_TOP

  7          26 LOAD_NAME                5 (print)
             28 LOAD_NAME                3 (a)
             30 CALL_FUNCTION            1
             32 POP_TOP
             34 LOAD_CONST               4 (None)
             36 RETURN_VALUE

Disassembly of <code object m at 0x015AF3C8, file "<stdin>", line 4>:
  5           0 LOAD_GLOBAL              0 (print)
              2 LOAD_GLOBAL              1 (a)
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

Notice how the disassembly for A uses a LOAD_NAME instruction to retrieve a, whereas the function m uses LOAD_GLOBAL to access a.
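If you would rather check this programmatically than read the listing, something along these lines should work. It is only a sketch: find_code and load_ops are hypothetical helpers written for this post, and exact opcode names can vary between CPython versions.

```python
import dis
import types

SOURCE = """\
a = 'Outer'
class A:
    a = 'Inner'
    def m():
        return a
    seen = a
"""


def find_code(parent, name):
    """Return the nested code object called `name` from parent.co_consts."""
    for const in parent.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)


def load_ops(code, name):
    """Sorted names of the LOAD_* opcodes that `code` uses to read `name`."""
    return sorted(
        {
            ins.opname
            for ins in dis.get_instructions(code)
            if ins.opname.startswith("LOAD") and ins.argval == name
        }
    )


module_code = compile(SOURCE, "<example>", "exec")
class_code = find_code(module_code, "A")
method_code = find_code(class_code, "m")

class_ops = load_ops(class_code, "a")    # how the class body reads `a`
method_ops = load_ops(method_code, "a")  # how the method reads `a`
```

On the CPython versions I know of, class_ops comes out as LOAD_NAME and method_ops as LOAD_GLOBAL, matching the disassembly above.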


The behavior is documented under Naming and binding:

The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods

Scoping rules are resolved at compile time, not at runtime, so there are no code object flags for it. Specifically, this is a function of the symbol table, which is where the compiler looks for names in enclosing scopes when a name is not bound in the current function.

You can access Python’s symbol tables with the symtable module:

>>> from symtable import symtable
>>> tab = symtable("""\
... a = 1
... class A:
...     a = 2
...     def m():
...         assert a == 1
...     m()
...     assert a == 2
... """, "<stdin>", "exec")
>>> tab
<SymbolTable for top in <stdin>>
>>> tab.get_type()
'module'
>>> tab.lookup("a")
<symbol 'a'>
>>> tab.lookup("A").is_namespace()  # class A has a namespace
True
>>> A_tab = tab.lookup("A").get_namespace()
>>> A_tab.get_type()
'class'
>>> A_tab.lookup("m").is_namespace()  # so does the A.m function
True
>>> A_m_tab = A_tab.lookup("m").get_namespace()
>>> A_m_tab.get_type()
'function'

You can use the SymbolTable.lookup() method to find where names come from when looked up in different scopes. This returns a Symbol instance which you can use to figure out where a name came from, or to find nested scopes (with the Symbol.get_namespace() method I used above).

For a closure to be created, a child scope must reference a variable that is bound in an enclosing function scope (that is, a non-local name sourced from a parent scope other than the module-level, global scope). Such references are marked as free, because this needs to be recorded in the code object for that scope. You’ll note that the name a in the method body is not marked as free, even though it is not a local variable:

>>> A_m_tab.lookup("a").is_free()
False
>>> A_m_tab.lookup("a").is_local()
False

If it is not a free variable and it is not a local, it must be a global; the A.a name is not available to the nested scope of the m function. And indeed, "a" in that scope is a global instead:

>>> A_m_tab.lookup("a").is_global()
True

In fact, the namespace of m is marked as not nested; free variables can only exist in nested scopes:

>>> A_m_tab.is_nested()
False
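For contrast, here is a small sketch comparing the same m nested in a class versus nested in a function. The function variant (outer) and the inner_table helper are my own additions for illustration:

```python
from symtable import symtable

CLASS_SOURCE = """\
a = 1
class A:
    a = 2
    def m():
        return a
"""

FUNCTION_SOURCE = """\
a = 1
def outer():
    a = 2
    def m():
        return a
"""


def inner_table(source, outer_name):
    """Return the symbol table of `m` nested inside `outer_name`."""
    top = symtable(source, "<example>", "exec")
    outer = top.lookup(outer_name).get_namespace()
    return outer.lookup("m").get_namespace()


in_class = inner_table(CLASS_SOURCE, "A")
in_function = inner_table(FUNCTION_SOURCE, "outer")

# (is `a` free in m?, is m's scope nested?)
summary = {
    "class": (in_class.lookup("a").is_free(), in_class.is_nested()),
    "function": (in_function.lookup("a").is_free(), in_function.is_nested()),
}
```

Only the function-in-function case should report a as free and the scope as nested; in the class case a stays a global.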

The symtable module simply uses the same PySymtable_BuildObject() function that the Python compiler uses to give you access to this information. The normal compilation steps are:

  • Parse source code into an AST (using PyParser_ASTFromFile() or PyParser_ASTFromString()), then call PyAST_CompileObject() with the result.
  • PyAST_CompileObject() uses PySymtable_BuildObject() to produce a symbol table from the AST.
  • PyAST_CompileObject() then uses the AST and symbol table to produce bytecode, grouped into code objects per namespace, by calling the compile_mod() function.
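The same steps can be approximated from Python. This is a rough analogue using the ast and symtable modules and the compile() builtin, not the C calls listed above; note that symtable.symtable() takes source text, so it re-parses rather than reusing the tree:

```python
import ast
import symtable

SOURCE = """\
a = 1
class A:
    a = 2
    def m():
        return a
    result = m()
"""

# Step 1: parse the source into an AST.
tree = ast.parse(SOURCE, filename="<example>", mode="exec")

# Step 2: build the symbol table. compile() does this internally as well;
# it is done separately here only so the table can be inspected.
table = symtable.symtable(SOURCE, "<example>", "exec")

# Step 3: compile the AST into code objects, one per namespace.
code = compile(tree, "<example>", "exec")

# Run the result in a fresh namespace to observe the scoping decision.
namespace = {}
exec(code, namespace)
```

After this, namespace["A"].result holds whatever m() saw, which is the module-level a.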

It is PySymtable_BuildObject() that determines scopes; it walks the AST that the parsing stage has produced and calls various functions in a visitor pattern to output the symbol table. For classes, the symtable_visit_stmt() function calls symtable_enter_block() with the _Py_block_ty block parameter set to ClassBlock, which informs how it records names.

If you want to track how the symbol table treats the class scope, you could start by searching through symtable.c for ste->ste_type == ClassBlock tests, and focus on the analyze_block() function. The symtable.c source code has helpful comments such as:

/* Analyze raw symbol information to determine scope of each name.

   The next several functions are helpers for symtable_analyze(),
   which determines whether a name is local, global, or free.  In addition,
   it determines which local variables are cell variables; they provide
   bindings that are used for free variables in enclosed blocks.

   There are also two kinds of global variables, implicit and explicit.  An
   explicit global is declared with the global statement.  An implicit
   global is a free variable for which the compiler has found no binding
   in an enclosing function scope.  The implicit global is either a global
   or a builtin.  Python's module and class blocks use the xxx_NAME opcodes
   to handle these names to implement slightly odd semantics.  In such a
   block, the name is treated as global until it is assigned to; then it
   is treated as a local.

   The symbol table requires two passes to determine the scope of each name.
   The first pass collects raw facts from the AST via the symtable_visit_*
   functions: the name is a parameter here, the name is used but not defined
   here, etc.  The second pass analyzes these facts during a pass over the
   PySTEntryObjects created during pass 1.

   When a function is entered during the second pass, the parent passes
   the set of all name bindings visible to its children.  These bindings
   are used to determine if non-local variables are free or implicit globals.
   Names which are explicitly declared nonlocal must exist in this set of
   visible names - if they do not, a syntax error is raised. After doing
   the local analysis, it analyzes each of its child blocks using an
   updated set of name bindings.

   The children update the free variable set.  If a local variable is added to
   the free variable set by the child, the variable is marked as a cell.  The
   function object being defined must provide runtime storage for the variable
   that may outlive the function's frame.  Cell variables are removed from the
   free set before the analyze function returns to its parent.

   During analysis, the names are:
      symbols: dict mapping from symbol names to flag values (including offset scope values)
      scopes: dict mapping from symbol names to scope values (no offset)
      local: set of all symbol names local to the current scope
      bound: set of all symbol names local to a containing function scope
      free: set of all symbol names referenced but not bound in child scopes
      global: set of all symbol names explicitly declared as global
*/

where we learn that names that are not bound in a scope start out as implicit globals, and

    /* Allocate new global and bound variable dictionaries.  These
       dictionaries hold the names visible in nested blocks.  For
       ClassBlocks, the bound and global names are initialized
       before analyzing names, because class bindings aren't
       visible in methods.  For other blocks, they are initialized
       after names are analyzed.
     */

and

    /* Class namespace has no effect on names visible in
       nested functions, so populate the global and bound
       sets to be passed to child blocks before analyzing
       this one.
     */

These tell us that any additions to the global and bound sets made in the class body are not passed on to child scopes; they are never used to help determine the scope of names in child scopes.
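To make that concrete, here is a deliberately simplified toy model of the idea in Python. It is not CPython's algorithm: the Scope class, analyze() and build() are hypothetical names, and cells, explicit global/nonlocal declarations and builtins are all ignored. It only models which bindings a parent passes down to its children:

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical stand-in for a symbol table entry (not a CPython type)."""
    name: str
    kind: str  # "module", "class" or "function"
    assigned: set = field(default_factory=set)
    used: set = field(default_factory=set)
    children: list = field(default_factory=list)


def analyze(scope, bound=frozenset(), result=None):
    """Classify every name in `scope` and, recursively, in its children."""
    if result is None:
        result = {}
    for name in scope.assigned | scope.used:
        if name in scope.assigned:
            result[scope.name, name] = "local"
        elif name in bound:
            result[scope.name, name] = "free"
        else:
            result[scope.name, name] = "implicit global"
    # Only function scopes add their own bindings to what children can see;
    # class (and module) bindings are deliberately left out.
    visible = bound | scope.assigned if scope.kind == "function" else bound
    for child in scope.children:
        analyze(child, visible, result)
    return result


def build(outer_kind):
    """Model: module binds a; outer (class or function) binds a; m uses a."""
    m = Scope("m", "function", used={"a"})
    outer = Scope("outer", outer_kind, assigned={"a", "m"}, children=[m])
    module = Scope("top", "module", assigned={"a", "outer"}, children=[outer])
    return analyze(module)


via_class = build("class")
via_function = build("function")
```

With a class as the outer scope, a inside m is classified as an implicit global; swap the class for a function and the same name becomes free.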

And finally:

    /* Check if any local variables must be converted to cell variables */
    if (ste->ste_type == FunctionBlock && !analyze_cells(scopes, newfree))
        goto error;
    else if (ste->ste_type == ClassBlock && !drop_class_free(ste, newfree))
        goto error;

So A.a in the class scope is not passed on to the recursive analyze_block() calls, and names in class scopes can’t become closure cells. The code above distinguishes function scopes from class scopes, and only in function scopes are applicable free variables from child scopes marked as cells. The net effect: A.m can’t find A.a because the analysis excludes locals in class bodies, so the a reference in A.m remains an implicit global, and lookups find the global a instead of A.a.
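You can observe the outcome on the resulting function objects. A small sketch (the outer function is an addition for comparison; neither m is ever called, so no global a is needed):

```python
class A:
    a = 2

    def m():
        return a  # compiled as a global lookup; never called here


def outer():
    a = 2

    def m():
        return a  # compiled as a closure (free variable) lookup

    return m


method_in_class = A.m
function_in_function = outer()

# No free variables and no closure cells for the function defined in a class.
class_freevars = method_in_class.__code__.co_freevars
class_closure = method_in_class.__closure__

# The function defined in a function records `a` as free and holds a cell.
nested_freevars = function_in_function.__code__.co_freevars
nested_cell_value = function_in_function.__closure__[0].cell_contents
```

The function defined in the class body has an empty co_freevars and no __closure__, while the one defined inside outer() carries a cell for a.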


On a side note, while on the subject: you can reference a global to assign a class attribute with the same name, but you can’t do the same with a closure:

>>> a = 42  # global name a
>>> class A:
...     # A.a is set from the global name a
...     a = a
...
>>> A.a
42
>>> del a  # no more global a
>>> def nested_A():
...     # local name a, could be a closure for nested functions and classes
...     a = 42
...     class A:
...         # you'd expect the nested_A local name a to be referenced
...         a = a
...     return A
...
>>> nested_A()  # but no, a global is expected instead
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in nested_A
  File "<stdin>", line 4, in A
NameError: name 'a' is not defined

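The disassembly of the bytecode shows that the symbol table has not recorded a as a closure variable: the compiler has emitted a LOAD_NAME bytecode to load the value, which looks in the class namespace, then in the globals and builtins, and never in the enclosing function: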

>>> from dis import dis
>>> nested_A.__code__.co_consts  # find the class code object
(None, 42, <code object A at 0x10d85b660, file "<stdin>", line 3>, 'A')
>>> dis(nested_A.__code__.co_consts[2])  # a uses LOAD_NAME, looking for globals!
  3           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('nested_A.<locals>.A')
              6 STORE_NAME               2 (__qualname__)

  4           8 LOAD_NAME                3 (a)
             10 STORE_NAME               3 (a)
             12 LOAD_CONST               1 (None)
             14 RETURN_VALUE

Since there is no such global, the function call fails with a NameError exception. Setting a new global a makes the function work:

>>> a = 17  # set a new global
>>> nested_A().a  # and things work again
17

The moment you use a different name for the class attribute, closures work again:

>>> def nested_A():  # new attempt, not masking the name
...     a = 42  # still a local named a
...     class A:
...         # but we set a new name, b
...         b = a
...     return A
...
>>> nested_A().b  # this works
42
>>> dis(nested_A.__code__.co_consts[2])  # Now LOAD_CLASSDEREF is used!
  3           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('nested_A.<locals>.A')
              6 STORE_NAME               2 (__qualname__)

  4           8 LOAD_CLASSDEREF          0 (a)
             10 STORE_NAME               3 (b)
             12 LOAD_CONST               1 (None)
             14 RETURN_VALUE
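The symbol table tells the same story. A short sketch, where class_symbol is a hypothetical helper that digs out the symbol for a name inside the nested class A:

```python
from symtable import symtable

SAME_NAME = """\
def nested_A():
    a = 42
    class A:
        a = a
    return A
"""

OTHER_NAME = """\
def nested_A():
    a = 42
    class A:
        b = a
    return A
"""


def class_symbol(source, name):
    """Look up `name` in the symbol table of class A inside nested_A."""
    top = symtable(source, "<example>", "exec")
    func = top.lookup("nested_A").get_namespace()
    return func.lookup("A").get_namespace().lookup(name)


masked = class_symbol(SAME_NAME, "a")     # class body assigns to a
unmasked = class_symbol(OTHER_NAME, "a")  # class body only reads a

flags = {
    "a = a": (masked.is_local(), masked.is_free()),
    "b = a": (unmasked.is_local(), unmasked.is_free()),
}
```

Assigning to a in the class body makes it a class-local name, so it is no longer free; when the class body only reads a, it is free and the closure is used.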

This is a long-standing “wart” in Python (17 years and counting), too gnarly to consider fixing, because doing so would break a lot of existing code.

Arguably, you’d expect a = a to not work in either situation as you are effectively assigning a local variable from a global.
