There is a function f and an assignment z = f; why do f() and z() behave differently?

functions = []
for i in range(3):
    def f():
        return f.index
    f.index = i
    functions.append(f)

# functions == [<function f at 0x01C48BB0>, <function f at 0x01CA3340>, <function f at 0x01CA3388>]
# functions[0].__dict__=={'index': 0}, functions[1].__dict__=={'index': 1}, functions[2].__dict__=={'index': 2}

f.index = 8

lst = []
for f in functions:  # at first iter: f==<function f at 0x01C48BB0>, f()==0
    lst.append(f())
print(lst)  # [0, 1, 8]

lst = []
for z in functions:  # at first iter: z==<function f at 0x01C48BB0>, z()==8
    lst.append(z())
print(lst)  # [8, 8, 8]

I expected both print() calls to give the same result, but they don't. It was tested under Windows 10 and Python 3.8.


The code for the f function will look up the name f in the global namespace at the time that it’s called. The lookup is “lazy” - it does not capture whatever f meant at definition time. It’s the same as if you had return i, for example: all of the functions would return 2 - not the i value from when they were created. And if you changed i later, that would be “seen” by the functions, too.
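A minimal sketch of that `return i` variant, to see the late binding on its own:

```python
funcs = []
for i in range(3):
    def f():
        return i  # the name i is looked up when f is *called*
    funcs.append(f)

# By the time any of these runs, the loop is over and i == 2.
print([g() for g in funcs])  # [2, 2, 2]
```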

At this point in the code (after the loop), the global f means the last function object created in the for loop, i.e. functions[2]. Its index is set to 8; the others have their previous settings.

Each time through the loop, for f in functions: rebinds the global name f to be one of the elements of the list. (Remember: this code is outside of any function, and for loops don’t create their own scope in Python.) Therefore, when the function runs, and looks up the name f in the global namespace, it finds itself.

After the loop has finished, f means functions[2] again, because that was the last value that was assigned to it.

This loop doesn’t assign to f; therefore, each time the function is called (via the z name), it looks up the global name f and finds the value that was set from before, in the previous loop (i.e., the last function in the list, no matter which iteration of the loop we’re on). Its .index, of course, is 8, and that gets appended every time.
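The same effect in miniature, outside the loop (tag is just an illustrative attribute name): keep an alias to a function, rebind the global name f, and the alias’s body sees the new binding.

```python
def f():
    return f.tag  # resolves the *global* name "f" on every call

f.tag = "original"
alias = f          # a second reference, playing the role of z

def f():           # rebind the global name to a new function
    return "something else"
f.tag = "replacement"

print(alias())  # the old body finds the new f: "replacement"
```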


It’s actually quite tricky to make code inside the function refer to “the current function object, unambiguously”. I’m not sure I could write it off the top of my head. As we’ve seen, it’s certainly not enough to just use the name from the def statement.

For the same reason, if you decorate a recursive function, the recursive calls will call the decorated version. (If you don’t want that, it may be simplest to start the recursion from a helper, and decorate that instead.)
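For instance, with a hypothetical call-counting decorator (not from the thread), every recursive call re-enters the wrapper, because the name fact in the body is looked up globally and finds the decorated version:

```python
import functools

calls = []

def counting(fn):
    @functools.wraps(fn)
    def wrapper(*args):
        calls.append(args)  # records each pass through the wrapper
        return fn(*args)
    return wrapper

@counting
def fact(n):
    # "fact" here resolves to the decorated function, not the bare one
    return 1 if n <= 1 else n * fact(n - 1)

print(fact(3), len(calls))  # 6 3 - all three calls went through the wrapper
```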

Great interview question. Not to solve necessarily, but to hear someone think about it.

def g(i):
    def f():
        return f.index
    f.index = i
    return f

functions = [g(i) for i in range(3)]

lst = []
for f in functions:
    lst.append(f())
print(lst)  # [0, 1, 2]

lst = []
for z in functions:
    lst.append(z())
print(lst)  # [0, 1, 2]

One has to make the variable f inside function f a cell variable, created afresh for each index of the loop. It will then be in the closure created when f is defined. This requires an extra level of nesting, and it took me a while to find a correct expression of the idea.

Right, this is the simple way. I didn’t count it because the name is still actually looked up in an enclosing namespace - it’s just much harder to modify that namespace :wink:

>>> my_f = g(0)
>>> my_f.__closure__[0].cell_contents = 'test'
>>> my_f() # looks up the string instead, and finds its index method
<built-in method index of str object at 0x7fe05e047230>
>>> my_f.index # the function still has the original attribute
0

It also doesn’t solve the decorator problem. Well, I guess it does if you decorate f explicitly after it’s been returned from g.


I remember giving up on the hard version of the problem a while ago, actually. The goal is to end up with the function’s .__code__.co_consts containing a reference back to the function itself (so that a local variable access can find it); but that’s an immutable tuple. Even if you create a new code object and a new function object from there, the constructor for the code object isn’t going to be able to refer to the function object that hasn’t been constructed yet. And this is still assuming that you’ve provided a way for the lookup to be compiled as local; if you just write normal code and want to patch it afterwards, that will require bytecode-level surgery.

I do remember another attempt at a hack to replace the function’s __globals__. But then it still needs to support every other potential global lookup, and account for future changes to actual global variables. And the attribute is understandably readonly, so you still have to make a new function. And collections.ChainMap doesn’t work for the lookup logic (although it seems like it would accept a custom dict subclass…).
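For reference, this is what the “new function” step looks like on its own; SECRET is a made-up name used purely for illustration, and this doesn’t address the maintenance problems above:

```python
from types import FunctionType

def f():
    return SECRET  # hypothetical name - calling f() directly raises NameError

# __globals__ is read-only, so wrap the same code object in a new function
g = FunctionType(f.__code__, {**f.__globals__, "SECRET": 42}, f.__name__)
print(g())  # the new function's globals supply SECRET: 42
```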

True, but you can rely on closures.

>>> from functools import wraps
>>> def selfref(f):
...     @wraps(f)
...     def inner(*a, **kw): return f(f, *a, **kw)
...     return inner
... 
>>> @selfref
... def needs_self_reference(this_function, x, y):
...     print("I am", this_function)
...     return x + y
... 
>>> needs_self_reference(42, 12)
I am <function needs_self_reference at 0x7f59ea154720>
54

This requires that it be the first decorator applied to a function (or more generally, that the self-reference argument not be passed along to some other function), but that’s sufficient for cases where you need recursion.
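Recursion through that extra argument then looks like this (repeating selfref so the snippet runs standalone):

```python
from functools import wraps

def selfref(f):
    @wraps(f)
    def inner(*a, **kw): return f(f, *a, **kw)
    return inner

@selfref
def fact(this_function, n):
    # recurse via the passed-in reference, not via the (decorated) global name
    return 1 if n <= 1 else n * this_function(this_function, n - 1)

print(fact(5))  # 120
```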

Thank you Karl for your detailed explanation.

Even after learning Python for so many years, I still haven’t gotten used to its late (“lazy” :slight_smile:) binding characteristics.

Theoretically yes: if we simply enclose the entirety of the OP’s code in an outer function, f would be looked up in a cell variable via LOAD_DEREF instead of in the global namespace via LOAD_GLOBAL:

def main():
    functions = []
    for i in range(3):
        def f():
            return f.index
        f.index = i
        functions.append(f)

    f.index = 8

    lst = []
    for f in functions:
        print(f.__closure__[0].cell_contents) # should be a reference to itself
        lst.append(f())
    print(lst)

    lst = []
    for z in functions:
        print(z.__closure__[0].cell_contents) # should be a reference to itself
        lst.append(z())
    print(lst)

main()

But to my surprise, the above outputs:

<function main.<locals>.f at 0x000001540CD745E0>
<function main.<locals>.f at 0x000001540CDF5260>
<function main.<locals>.f at 0x000001540CDF5300>
[0, 1, 8]
<function main.<locals>.f at 0x000001540CDF5300>
<function main.<locals>.f at 0x000001540CDF5300>
<function main.<locals>.f at 0x000001540CDF5300>
[8, 8, 8]

Somehow the reference in cell_contents changes dynamically on its own, and I don’t quite understand why.

If I create a copy of f with cell_contents explicitly assigned to be a reference to f then it works as intended:

from types import FunctionType, CellType

def main():
    functions = []
    for i in range(3):
        def f():
            return f.index
        f = FunctionType(
            f.__code__,
            f.__globals__,
            f.__name__,
            f.__defaults__,
            (c := CellType(),)
        )
        c.cell_contents = f
        f.index = i
        functions.append(f)

    f.index = 8

    lst = []
    for f in functions:
        print(f.__closure__[0].cell_contents)
        lst.append(f())
    print(lst)

    lst = []
    for z in functions:
        print(z.__closure__[0].cell_contents)
        lst.append(z())
    print(lst)

main()

which outputs:

<function main.<locals>.f at 0x000001E7C8CF6660>
<function main.<locals>.f at 0x000001E7C8CF6700>
<function main.<locals>.f at 0x000001E7C8CF67A0>
[0, 1, 8]
<function main.<locals>.f at 0x000001E7C8CF6660>
<function main.<locals>.f at 0x000001E7C8CF6700>
<function main.<locals>.f at 0x000001E7C8CF67A0>
[0, 1, 8]

What makes my copy of f with an explicit assignment of itself to the cell content not dynamic then?

CC: @kknechtel, @Rosuav

Yes, that is correct; but that also applies to the function definition. You’ve created a single closure for main() and then assigned to f several times, so this is exactly the same as the situation with globals - each new def statement reassigns the name f. The only way to separate them is to have a separate closure for each one, which is what happens in the decorator (since the closure is just surrounding the one function).


It isn’t quite the same situation as with globals: LOAD_GLOBAL loads a name from the specified slot in co_names and then looks that name up in the function’s __globals__, whereas LOAD_DEREF loads an object reference directly from the specified slot in the function’s tuple of cells, with no name lookup involved. So I still fail to see how cell_contents changes dynamically in my first code snippet.

It’s not strictly the same as there are two levels of indirection (name to cell slot during compilation, and cell slot to value during execution) rather than one (name to value), but since you’re using the exact same name everywhere, the net effect is the same. You still have only one cell slot for the name “f”, since main() is only ever called once; which means that, for a given invocation of main, all assignments to f (which are the def statements) are overwriting the same cell slot.


Yes. But those tuples share the cells:

>>> def main():
...     functions = []
...     for i in range(3):
...         def f():
...             return f.index
...         f.index = i
...         functions.append(f)
...     return functions
... 
>>> fs = main()
>>> fs[0] is fs[1] is fs[2]
False
>>> fs[0].__closure__ is fs[1].__closure__ is fs[2].__closure__
False
>>> fs[0].__closure__[0] is fs[1].__closure__[0] is fs[2].__closure__[0]
True

It’s still the same thing as the “lambda in a loop” problem.
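For comparison, the classic shape of that problem, with the usual default-argument workaround:

```python
# Every lambda closes over the same cell for i in the comprehension's scope
fns = [lambda: i for i in range(3)]
print([f() for f in fns])  # [2, 2, 2]

# A default argument captures the *value* of i at definition time instead
fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])  # [0, 1, 2]
```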

Of course. This way, each time through the loop, the closure tuple (for the copy) is created with a new, separate cell object (explicitly) which is then loaded with the appropriate value.


Whoa thanks. Great find. Must be some runtime optimization logic here that makes each new definition of f share cells for the same names.

I wonder if this should be considered an undesired/counter-intuitive behavior and possibly “fixed” by making functions share cells only if they actually share the same object references rather than just names.

Ah, I think I now understand why free variables of the same name have to share the same cells since the name-to-cell mapping is decided at compilation time. Never mind my suggested “fix” above then. Thanks.

No. They need to work this way; the semantics are intentional, and the implementation isn’t made more complex to enable the sharing. The implementation exists to enable the sharing. If each function could rebind the enclosing-scope names without affecting anything else, that would be the same thing as just letting the function have those names and initializing them to alias whatever the enclosing scope names are bound to. There would be no need to create abstractions like the closure tuple or “cell objects” at all.

Instead, the cell object exists specifically so that the inner function “shares” a name with the enclosing function. By design, the enclosing function can see every change made by inner functions, and vice versa. (That’s what a closure is, in the general CS sense.) So it would be completely illogical to expect the inner functions not to see each others’ changes to the same name.

That’s also why if you disassemble something like

>>> def x():
...     a = 1
...     def y():
...         nonlocal a
...         return a

then the code for x has to use STORE_DEREF and not STORE_FAST: i.e., it writes to the cell_contents of the same cell object from which y will LOAD_DEREF. And that’s, presumably, why the opcode name has DEREF in it: at the machine level, there’s an extra level of indirection.
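That claim is easy to check with the dis module:

```python
import dis

def x():
    a = 1          # a is a cell variable here, so this compiles to STORE_DEREF
    def y():
        nonlocal a
        return a
    return y

ops = {ins.opname for ins in dis.get_instructions(x)}
print("STORE_DEREF" in ops)  # True - the write goes through the shared cell
```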


Yep! Here’s a simpler example of what is happening here:

def counter():
    n = 0
    def increment():
        nonlocal n
        n += 1
        print("Now up to", n)
    return increment

c = counter()
c()
c()
c()

It’s clear from this that there’s only one n slot (per counter - if you call the outer function more than once, they’re independent). The same would happen if I made it like this:

def counter():
    n = 0
    def one():
        nonlocal n
        n += 1
        return n
    def two():
        nonlocal n
        n += 1
        return n
    return one, two

o, t = counter()
o()
t()

And finally, this one should make it clear what’s happening:

def counter():
    n = 0
    n += 1
    def one():
        return n
    n += 1
    def two():
        return n
    return one, two

o, t = counter()
o()
t()

Even though the inner functions aren’t being called until later, the increments have already happened, and there’s only one variable still. This is what you’re seeing with the different f functions; they’re all getting assigned back to the same slot.

Closures take a lot of getting your head around. It’s often worth working through some examples in painstaking detail, getting yourself thoroughly familiar with exactly what’s happening.

Very true. I wasn’t connecting the dots between the closure concept and its implementation through cells properly. Thanks.

Right. I know how variables are shared in a closure fairly well, but forgot about it when viewing its implementation details through cells. Thanks again.

Eh.

Shared mutable state takes a lot of getting one’s head around. Late binding takes a bit more - shared mutable state is a prerequisite for late binding to make a difference, but after that you mostly just need to know that the late binding is there. (And, famously, one of the biggest gotchas [1] in Python becomes a gotcha exactly because it’s an example of early binding.)

And, sure, Python’s specific implementation of closures is a little bit weird, in order to support the high level of dynamism in the language while avoiding that indirection when it isn’t needed (LOAD_FAST etc.).

But closures in general? Nah, they’re about the simplest possible realization of the underlying idea.

Here’s a far more complex realization:

class n_closure:
    def __init__(self):
        self.n = 0
    def one(self):
        return self.n
    def two(self):
        return self.n

def counter():
    closure = n_closure()
    closure.n += 1
    one = closure.one
    closure.n += 1
    two = closure.two
    return one, two

Closures have a sort of equivalence with classes[2]: the purpose of both is to hold mutable state so that it can be shared between calls to functions (or methods). Because the shared state is, well, shared, it inherently represents an outer, “wider” scope. (The fact that this scope is “enclosing” is just a consequence of lexical scoping; you can just as easily have closures in a language with dynamic scoping, rare as those are.) “The attributes of the n_closure instance” are every bit as much a scope (i.e. a namespace) as an actual closure would be.
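Calling the class-based version shows the same sharing: both bound methods read the shared, final value of n (class repeated so the snippet runs standalone).

```python
class n_closure:
    def __init__(self):
        self.n = 0
    def one(self):
        return self.n
    def two(self):
        return self.n

def counter():
    closure = n_closure()
    closure.n += 1
    one = closure.one
    closure.n += 1
    two = closure.two
    return one, two

o, t = counter()
print(o(), t())  # both methods see the shared, final n: 2 2
```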

But classes also bring with them tons of other quirks. :wink:


  1. See if you can guess what I’m talking about without peeking! (Actually, for you in particular, I think this should be a gimme :wink: ) ↩︎

  2. Well, the concept does. The individual closure objects are more like instances. ↩︎

I know the one referred to, although personally I don’t think it’s as big a gotcha as people think. But maybe it’s one of Python’s biggest gotchas because Python doesn’t HAVE big gotchas (like, ahem, the ?: operator having its associativity backwards, but not in all versions - not that any mainstream language would ever do THAT…).