Painful details of variable scope mixed with classes

I’m revisiting variable scope technicalities in Python for my personal interpreter project. Some time ago, I asked about that and got the tip that CPython has a multi-pass system that figures out variables, scopes, and bindings ahead of generating byte code. I started doing similar, and I was doing great with just functions, but I realized I wasn’t managing classes with it correctly.

It’s pretty natural to delve into some really obscure stuff when you start testing for this stuff, and I can say that I’ve lost myself.

So here’s something that kind of puzzles me:

a = 1

class Foo1:
    a += 1

    def __init__(self):
        global a
        self.local_a = a + 1

    class Foo2:
        a += 100

        def __init__(self):
            global a
            self.local_a = a + 1

f = Foo1()

g = Foo1.Foo2()




It looks like Foo1 and Foo2 on first invocation will get a copy of the global a and do their own thing with their copy. The initializers grab the root level one.

If I qualify the class members with ‘global a’ then I’ll be playing with the global one at the root just fine. Otherwise, nothing happens to it. If I want Foo2 to touch the global ‘a’, I have to make it global in Foo1 as well:

class Foo1:
   global a     # Need this if I want Foo2 to see it.
   class Foo2:
      global a

If I had nested functions, I wouldn’t have to “carry” the global.

I hope somebody can explain how classes muddle with variable scopes. At first glance, it looked like a class declaration creates a jail blocking against upper scope and takes copies of globals inside of itself. Then a class method runs and can break right out of that to reach globals outside of the class anyways.

…work in a language for nearly 15 years and then break your brain on this kind of thing…

There’s a lot to cover here, so I’m going to break it down over multiple posts with simplified examples.

a = 1

class Spam:
   # Could also use a += 1
   a = a + 1

assert a == 1 and Spam.a == 2

In this snippet, the name lookup for a returns the global a but binds a local a.

This works all the way back to Python 1.5 so I guess it is intentional. The PyPy developers seem to believe it is intentional, as they have copied the behaviour.

(Disclaimer: I have only tested it in CPython 1.5 and 3.10, and PyPy 2.7.)
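For anyone writing their own interpreter, one way to see the difference in CPython is to disassemble a class body and a function body and compare how a is loaded. This is only a sketch: opcode names are CPython implementation details and change between versions, so it checks for nothing beyond LOAD_NAME and STORE_NAME, and body_opnames is a made-up helper name.

```python
import dis
import types


def body_opnames(source):
    """Return the opcode names used by the first nested code object in source."""
    module_code = compile(source, "<example>", "exec")
    # The class or function body is stored as a code object constant.
    body = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
    return {ins.opname for ins in dis.get_instructions(body)}


class_ops = body_opnames("class Spam:\n    a = a + 1\n")
func_ops = body_opnames("def spam():\n    a = a + 1\n")

# Class bodies use the dictionary-based lookup: locals, then globals, then builtins.
assert "LOAD_NAME" in class_ops and "STORE_NAME" in class_ops
# Function bodies do not use LOAD_NAME; `a` is compiled as a fast local instead.
assert "LOAD_NAME" not in func_ops
```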

I believe what is happening here is that inside a class, the interpreter is using the full LEGB scoping rule without the function optimization.

That is, any name lookup (outside of a function):

  1. searches the local scope for that name (L);
  2. if not found it searches any enclosing (nonlocal) function scopes (E);
  3. if not found it searches the module level globals (G);
  4. if still not found it searches the builtins (B);
  5. and if still not found it raises NameError.

At the top level of a module, the local scope is the global scope, so the search path is just GB, or LB if you prefer.

Name bindings theoretically apply in the same order, however the local binding always succeeds, so this is effectively just an immediate local binding. (Unless declared global.)
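Step 2 only comes into play when the class statement sits inside a function. Here is a small sketch of that case (make_class and Inner are illustrative names). The class body only reads a and never assigns it, so the only candidates are the enclosing function and the module globals:

```python
a = "global"


def make_class():
    a = "enclosing"

    class Inner:
        # No `a` is bound in this class body, so the lookup falls
        # through to the enclosing function scope before the globals.
        b = a

    return Inner


assert make_class().b == "enclosing"
assert a == "global"
```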

If my model of the interpreter is correct, then this:

a = 1

class Eggs:
    a = a + 1
    a = a + 1

assert a == 1 and Eggs.a == 3

should pass. And sure enough, it does.
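The "unless declared global" case can be checked the same way. In this sketch (Outer and Inner are placeholder names), the declaration appears only in the innermost class body, and both the lookup and the binding then go straight to the module globals:

```python
a = 1


class Outer:
    class Inner:
        global a   # applies to this class body only
        a += 100   # reads and rebinds the module-level a


assert a == 101
# No class attribute was created along the way.
assert not hasattr(Outer.Inner, "a")
assert not hasattr(Outer, "a")
```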

It is only functions which are special, and use an abbreviated lookup for locals (L only). So inside a function, you cannot replicate that Spam.a behaviour in CPython. Instead, you get a NameError subclass, UnboundLocalError:

a = 1
def spam():
    a = a + 1


Calling spam() raises UnboundLocalError: local variable 'a' referenced before assignment. (Newer CPython versions word the message differently.)

(Other interpreters are permitted to allow this, and behave like a class. The fast locals trick is documented as an interpreter implementation, not a language feature.)
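A quick way to confirm both halves of that in CPython: the function's code object already lists a as a local before anything runs, and the exception really is a NameError subclass. The spam_global variant is my own addition, to show the usual fix:

```python
a = 1


def spam():
    a = a + 1  # `a` is assigned here, so the compiler treats it as local


# The decision is made at compile time, before spam() is ever called.
assert spam.__code__.co_varnames == ("a",)

try:
    spam()
except NameError as exc:  # UnboundLocalError is caught as a NameError
    caught = exc

assert type(caught) is UnboundLocalError
assert a == 1  # the global is untouched


def spam_global():
    global a
    a = a + 1  # now both the lookup and the binding use the global


spam_global()
assert a == 2
```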

Next we can look at nested classes. When a class is nested inside another class, the surrounding class is not part of the variable search path. The E in LEGB only refers to enclosing functions.

a = 'global'
class Spam:
    a = 'class local'
    class Eggs:
        b = a  # Picks up the *global* a
        a = 'local'  # Now we have a local a
        a = a + '!'  # And this uses the local a

assert a == 'global'
assert Spam.a == 'class local'
assert Spam.Eggs.a == 'local!'
assert Spam.Eggs.b == 'global'

So inside the nested class Eggs, the scopes are L (the Eggs class body), G (module globals) and B (builtins). There is no E.

If you try inserting a nonlocal declaration inside Eggs, it fails because there is no enclosing function.
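The failure happens at compile time, so it is easy to check without running any class body. A minimal sketch using compile() on the source text (the variable names are arbitrary, and I have not relied on the exact wording of the error message):

```python
source = """
class Spam:
    a = 'class local'
    class Eggs:
        nonlocal a
        a = 'local'
"""

try:
    compile(source, "<example>", "exec")
except SyntaxError:
    nonlocal_rejected = True
else:
    nonlocal_rejected = False

# The surrounding class does not count as an enclosing scope for nonlocal.
assert nonlocal_rejected
```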

Oh man I didn’t even think about trying the a + 1 statement twice in a row to see what happens. That’s, uh, wow. Huh. I’d write more now but I’m hosting a party tonight and I think I’ll need to sneak in a drink! =D I’ll give you a cheers haha.

I’m still digesting all of it. Somehow I got it in my head that the double a + 1 statements resulted in Eggs.a == 2 instead of 3, which was really hurting my head. You did help me scheme some ways to perform these different operations without it becoming a huge pile of paranoid-red-yarn-if-else-garbage, so thanks. I may end up following up after a week or two of free-time hacking if I trip on something else.

How pedantic are we getting with functions? I had an interesting effect with methods:

a = 100

class Spam:
    a = 101

    def __init__(self, some_num):
        self.a = a + some_num

eggs = Spam(1000)

This will give me 1100. This appears to mean that the initializer is grabbing a as a global as you might expect of a non-function LEGB binding order lookup.

I have to admit I didn’t really understand the fasts optimization. I implemented it in my little project too to make comparisons much easier, but I didn’t really understand the . . . scope of it haw haw haw.

Steven covered this here:

You can access class or object variables through the dot operator (not by their plain name): Spam.a or self.a.

Spam.__init__ is a function. Spam(1000).__init__ is a method. When a method is called, the __func__ attribute of the method in turn gets called with the method’s __self__ attribute inserted as the first positional argument (conventionally named self). For example:

>>> Spam.__init__
<function Spam.__init__ at 0x7f4e33f6ad40>
>>> s = Spam(1000)
>>> s.__init__
<bound method Spam.__init__ of <__main__.Spam object at 0x7f4e33f5fd50>>
>>> s.__init__.__func__ is Spam.__init__
True
>>> s.__init__.__self__ is s
True
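The same relationship holds for any ordinary method, and it can be spelled out by making the call by hand. In this sketch the class and its greet method are invented for illustration:

```python
class Spam:
    def greet(self, name):
        return f"{type(self).__name__} greets {name}"


s = Spam()
bound = s.greet  # attribute lookup on the instance creates a bound method

assert bound.__func__ is Spam.greet
assert bound.__self__ is s
# Calling the bound method is the same as calling the plain function
# with the instance inserted as the first positional argument.
assert bound("Eggs") == bound.__func__(bound.__self__, "Eggs")
assert bound("Eggs") == "Spam greets Eggs"
```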

In your example, in the scope of the __init__ function call, variable a is the global variable defined by a = 100. The instance attribute self.a is thus assigned the value of the expression 100 + 1000.
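To make the contrast concrete, here is a variation on your example that stores both results side by side. The attribute names from_global and from_class are mine:

```python
a = 100


class Spam:
    a = 101

    def __init__(self, some_num):
        # Plain name: the class body is skipped, so this finds the global a.
        self.from_global = a + some_num
        # Attribute access: this finds the class attribute Spam.a.
        self.from_class = type(self).a + some_num


eggs = Spam(1000)
assert eggs.from_global == 1100
assert eggs.from_class == 1101
```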

The class itself can be referenced as the instance attribute self.__class__. One can also use the implicit __class__ closure. See "Creating the class object" in the data model chapter of the language reference. For example:

>>> code = compile(r'''
... a = 100
... class Spam:
...     a = 101
...     def __init__(self, some_num):
...         self.a = __class__.a + some_num
... eggs = Spam(1000)
... print(eggs.a)
... ''', '', 'exec')
>>> exec(code)
1101

In Spam.__init__, the variable __class__ is a free variable. It gets assigned the contents of the corresponding closure cell when the function is called:

>>> Spam.__init__.__code__.co_freevars
('__class__',)
>>> Spam.__init__.__closure__
(<cell at 0x7f4e3419faf0: type object at 0x555faaea4840>,)
>>> Spam.__init__.__closure__[0].cell_contents is Spam
True
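Note that the compiler only creates this cell for methods that actually mention __class__ (or super). A short sketch comparing two methods on the same class, with method names of my own choosing:

```python
class Spam:
    a = 101

    def uses_class(self):
        return __class__.a  # referencing __class__ makes it a free variable

    def plain(self):
        return 0            # no reference, so no closure is created


assert Spam.uses_class.__code__.co_freevars == ("__class__",)
assert Spam.uses_class.__closure__[0].cell_contents is Spam
assert Spam.plain.__code__.co_freevars == ()
assert Spam.plain.__closure__ is None
assert Spam().uses_class() == 101
```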

In theory, the compiler could emit code to implicitly reference __class__.a for variable a, but that’s new behavior. We’d need a keyword to indicate that a is a class variable.