Make lambdas proper closures

Does nobody find this behavior bizarre? Is it actually useful for anything and does anybody rely on it?

>>> larr = [lambda: i for i in range(10)]
>>> iarr = [l() for l in larr]
>>> iarr
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

Lambdas do have closures; that’s not really the problem here. This happens because Python doesn’t have block scope: the for loop simply reassigns the same variable on each iteration. It’s unfortunate that it doesn’t do what you want here, but it’s not exactly obvious how you’d change the language to fix it.
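This is easy to see even without lambdas; the loop target is an ordinary assignment in the enclosing scope and survives the loop (a minimal sketch):

```python
for i in range(3):
    pass

# The loop target is just a regular variable in the enclosing scope,
# so it still exists, holding its last value, after the loop ends.
print(i)  # → 2
```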

Yes, it is definitely unexpected at first glance. You can use the following instead:

>>> larr = [lambda i=i: i for i in range(10)]
>>> iarr = [l() for l in larr]
>>> iarr
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
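The default-argument trick works because defaults are evaluated once, at definition time. Another common idiom with the same effect (not from the original post) is functools.partial, which stores the current value of i inside each callable:

```python
from functools import partial

def identity(x):
    return x

# partial(identity, i) captures the *value* of i at creation time,
# so each callable remembers its own number.
larr = [partial(identity, i) for i in range(10)]
print([f() for f in larr])  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```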

André


i’m sure there are ways to work around this; i’m saying python needs to change so that such workarounds aren’t needed. this behavior clearly breaks the principle of least surprise, to the point where i’d speculate it’s actually hurting the adoption of an entire syntactic construct.

i suppose everything hangs on the definition of “proper closure”. yes, python lambdas do close over locals available in their lexical scope, but is it actually valuable to store references to variables rather than references to the values of those variables? readability counts, and if i encounter lambda: x, i would prefer to read it as “the value of x in the immediate vicinity of where this lambda was created” rather than “the value of x where the lambda was created, but also possibly any other value that x may have been assigned as the function proceeded through loops and branches”.

as to how the language could be changed to fix this issue - i didn’t look at how lambdas are implemented, but if we’re going by how they function, i suppose changing something like lambda_scope = locals() to lambda_scope = dict(**locals()) would do it.

Edit: disregard this, it’s a dumb idea; the real issue is that loop variables are scoped to the loop rather than to the iteration.

Hi Maksym,

Python doesn’t need to change from late binding to early binding in
closures. Doing so would unnecessarily break code that requires late
binding and annoy people who expect the current behaviour and are
surprised by early binding.

Lambdas and closures here work with exactly the same execution model
(late binding) as functions using module or builtin variables and names,
and methods using attribute lookups and inheritance. If you write a
function or a method, you are using exactly the same model for name
lookups that is used by lambda. (Although the mechanism is different.)

Names are evaluated when the function is called, not when the function
is defined.

This is true for closures regardless of whether you use lambda or def;
it is true for methods and module-level functions regardless of whether
you use lambda or def. It is always true in Python.

You probably use this behaviour dozens, hundreds, maybe thousands of
times and take it completely for granted. You probably rely on this
behaviour and have code that would break if we shifted from late binding
to early binding.

There is one part of Python’s execution model which consistently uses
early binding instead of late binding: default arguments for function
parameters. And there people have the opposite “surprise” – they get
surprised and annoyed because parameter default values work the way you
want lambda to work, and demand that we follow the Principle Of Least
Surprise and swap to late binding.
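The classic form of that opposite surprise is a mutable default value, evaluated once when the def statement runs and shared across all calls (a minimal sketch, not from the original posts):

```python
def append_to(item, bucket=[]):
    # bucket is created once, when the def statement executes,
    # not fresh on every call -- early binding of the default.
    bucket.append(item)
    return bucket

print(append_to(1))  # → [1]
print(append_to(2))  # → [1, 2]  (the same list as before!)
```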

So we can’t win. Whatever behaviour we choose, early or late binding,
people will be surprised and annoyed.

What this demonstrates is one of the weaknesses of the Principle Of Least
Surprise: people aren’t surprised because lambdas in closures violate
Python’s execution model, or because it violates some fundamental
principle of behaviour. It doesn’t – it follows the same rules as other
functions and methods, and many other languages.

People get “surprised” because the computer does what they told it to
do, rather than what they wanted it to do. And there’s nothing we can do
about that.


Hi Steven,

as i was told in other discussions of this, the real issue is that loop variables are scoped to the loop rather than to the iteration. this works for pretty much every purpose, but not when one wants to create a lambda to delay execution of some task - loop variables are inputs to those delayed tasks, and having those inputs turn out to be the same value for all tasks is imo surprising.

and i mean surprising in the sense that the code clearly intends to do something other than what ends up happening, and it takes quite a deep understanding of the execution model to explain the difference.

here’s another example and tell me if you don’t find it surprising in the sense that i described:

>>> def lg():
...     for i in range(10):
...             yield lambda: i
...
>>> g = lg()
>>> iarr1 = [v() for v in g]
>>> g = lg()
>>> garr = list(g)
>>> iarr2 = [v() for v in garr]
>>> iarr1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> iarr2
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
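for reference, the same default-argument trick mentioned earlier makes both consumption orders agree, since it freezes each iteration’s value at lambda creation time (a sketch):

```python
def lg():
    for i in range(10):
        # i=i freezes the current value into each lambda's default,
        # so exhausting the generator first no longer matters.
        yield lambda i=i: i

iarr = [v() for v in list(lg())]
print(iarr)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```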

To answer your question, no, I don’t find it surprising. I won’t lie
and say I have never accidentally written code like that; I have. When
I do, I am annoyed but not surprised. I am annoyed that I made a
mistake.

A few days ago I was annoyed at myself to realise that this doesn’t sort
data from smallest to largest:

data = [x.as_integer_ratio() for x in numbers]
data.sort()

The code, or something very similar, had been in production for years
until I realised that not only does it not do what I thought it did, but
instead of being an optimisation, it was a pessimisation (slowed down
the code). Clearly sorting is broken because it didn’t do what I want,
only what I told it to do.
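To spell out why that sort is wrong: as_integer_ratio() returns (numerator, denominator) tuples, and tuples sort lexicographically, which has nothing to do with the fractions’ magnitudes (a minimal sketch of the mistake):

```python
numbers = [0.5, 0.25]
data = [x.as_integer_ratio() for x in numbers]
data.sort()
# Tuples compare element by element: (1, 2) < (1, 4),
# so 1/2 sorts before 1/4 even though 0.5 > 0.25.
print(data)  # → [(1, 2), (1, 4)]

# Sorting the floats directly gives the intended order.
print(sorted(numbers))  # → [0.25, 0.5]
```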

Consider this snippet of code:

def f():
    print(i, 'f', string)

for i, string in enumerate(('hello', 'goodbye')):
    f()

assert i == 1 and string == 'goodbye'
f()

What output do you expect? I think that nearly everyone will agree that
it should print

0 f hello
1 f goodbye
1 f goodbye

How about this?

for i, string in enumerate(('hello', 'goodbye')):
    def f():
        print(i, 'f', string)
    f()

assert i == 1 and string == 'goodbye'
f()

Now what output do you expect? I think most people will expect exactly
the same output as the first version.

What if we captured both of the function objects, rather than letting
the first be garbage-collected?

for i, string in enumerate(('hello', 'goodbye')):
    if i == 0:
        def f():  print(i, 'f', string)
        f()
    elif i == 1:
        def g():  print(i, 'g', string)
        g()

assert i == 1 and string == 'goodbye'
f()
g()

Apart from the one extra line of output, and the change from f to g, I
think most people will predict the behaviour of this will be the same as
the first two:

0 f hello
1 g goodbye
1 f goodbye
1 g goodbye

I don’t think there’s anything even a little bit surprising so far. Do
you agree, or are you surprised?

What if we changed the def statements to lambda? The behaviour is
exactly the same. Lambdas are not a distinct kind of thing with
different behaviour; they’re just syntactic sugar for an anonymous
function. Feel free to change the def statements into lambdas:

f = lambda: print(i, 'f', string)

but nothing should change. I hope that so far nothing surprises you.

Now let’s move that code, exactly as it is, into a function:

def test():
    for i, string in enumerate(('hello', 'goodbye')):
        if i == 0:
            def f():  print(i, 'f', string)
            f()
        elif i == 1:
            def g():  print(i, 'g', string)
            g()
    assert i == 1 and string == 'goodbye'
    f()
    g()


test()

Do you agree that moving a block of code into a function shouldn’t
change its behaviour? Nothing about the environment has changed except
that it is inside a function rather than top level module code. The
semantics are identical.

Now change the test() function so that it returns f and g, and call
them from outside the function:

def test():
    for i, string in enumerate(('hello', 'goodbye')):
        if i == 0:
            def f():  print(i, 'f', string)
            f()
        elif i == 1:
            def g():  print(i, 'g', string)
            g()
    assert i == 1 and string == 'goodbye'
    return f, g


a, b = test()
a()
b()

Why would you expect the behaviour to change? That really would be
surprising!

def f():
    print(i, 'f', string)

for i, string in enumerate(('hello', 'goodbye')):
    f()

assert i == 1 and string == 'goodbye'
f()

i would honestly expect this code to outright fail for using an undeclared variable inside f.

it is my personal preference for the loop variable to be scoped to a single iteration rather than to the entire loop, so to answer your question regarding the first f/g snippet - no, i would not expect the output to be the same; i would expect f to always have i bound to the variable at the iteration where f was created.

Do you agree that moving a block of code into a function shouldn’t
change its behaviour?

i’m not sure from your examples why moving that code into a function would change the result if the loop variable was scoped to the iteration.

as a followup here’s almost the same statement: contents of a sequence shouldn’t depend on how you’re iterating over that sequence. but right now it can seem like it does.

i see it as a language wart, and i agree that if one understands the model of execution it changes from being a surprise to being an annoyance. i’m just arguing that it should be neither.

i would honestly expect this code to outright fail for using an
undeclared variable inside f.

Can I ask how much experience you have actually using Python? And what
your programming background is?

If you expect to need to declare variables, it sounds like you don’t
know Python very well, and you are expecting it to work like some other
language which you do know.

Python never requires variables to be declared before use. (At most, you
may need a global or nonlocal statement to tell the compiler which scope
the variable belongs to.) It would be awfully inconvenient to have to
declare every variable you use, since every class, function, and module
name is a variable.

it is my personal preference for loop variable to be scoped to single
iteration

Ewwww.

That violates the expectation that unrolled and rolled loops are
equivalent.

# Unquestionably one scope.
i = 1
process(i)
i = 2
process(i)

# Equivalent to this.
for i in (1, 2):
    process(i)

As you have discovered, in Python the rolled up loop behaves just like
the unrolled loop, even if you have a closure and a function.

Anyway, it’s not clear to me why “loop variables” are so special that
they have to be handled differently from other variables.

i would expect f to always have i be bound to the
variable at the iteration where f was created.

That is precisely what Python already does.
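In fact, the shared binding is directly observable: closures created in the same scope all reference the same cell object (a sketch relying on the __closure__ introspection attribute):

```python
funcs = [lambda: i for i in range(3)]

# All three lambdas close over the *same* cell...
cells = [f.__closure__[0] for f in funcs]
assert cells[0] is cells[1] is cells[2]

# ...and that one cell holds the loop variable's final value.
print(cells[0].cell_contents)  # → 2
print([f() for f in funcs])    # → [2, 2, 2]
```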

[…]

as a followup here’s almost the same statement: contents of a sequence
shouldn’t depend on how you’re iterating over that sequence. but right
now it does.

How do you come to that conclusion?

The contents of the sequence don’t depend on how you iterate over it,
but on when you call the functions and the values currently held by any
variables used by the functions.

It shouldn’t be surprising that when you use variables that vary, the
results of calling the functions will likewise vary.

If you expect to need to declare variables

you misunderstood me.

print(x) # <- x is undeclared / out of scope

def f():
    print(x) # <- x is still undeclared / out of scope

x = 1 # <- current scope gains a variable x

def g():
    print(x) # <- g's scope doesn't have x,
             # but the scope in which g is defined does have x

hope that clears it up.

personally i think it’s a bug that x magically becomes visible inside f after f has been defined. to me it’s scope leaking.
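to make the behavior concrete (a sketch): the name in f’s body is only looked up when f is called, so calling it before x exists raises NameError, and calling it afterwards succeeds:

```python
def f():
    print(x)

try:
    f()  # x does not exist anywhere yet
except NameError:
    print("NameError: x is not defined")

x = 1
f()  # prints 1: the lookup happens now, and now x exists
```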

That violates the expectation that unrolled and rolled loops are equivalent.

this entirely depends on how you think loop unrolling should work; do you think there’s one true answer here?

# rolled
for i in (1,2):
  print(i)

# unrolled 1
i = 1
print(i)
i = 2
print(i)

# unrolled 2
i1 = 1
print(i1)
i2 = 2
print(i2)

i would expect f to always have i be bound to the variable at the iteration where f was created.

That is precisely what Python already does.

i will be bound to the single variable created for the entire loop, rather than to a variable isolated to the iteration.

How do you come to that conclusion?

i’ve edited my answer before you replied to include that “it can seem like” the contents of the collection depend on how it’s iterated. and it can seem so because that’s what the end user observes - they get hold of a collection and their result is different depending on how they iterate over it.

i know that in the current execution model it’s a mistake on the part of the developer that created that collection, but if loop variables were scoped to the iteration, those kinds of bugs would be fixed, and i’ve yet to see valid code that would break because of it.

Do you expect that no recursive function should work? Because when you define function f, the name f is not yet set. And even if we by some magic allow self-recursive functions, what do we do with indirect recursion? You would need forward declarations of functions, classes, global and local variables, methods, and attributes.

There are programming languages which require you to declare every name before using it. Python is not such language. It is a part of what makes it Python. If you cannot live with this, Python is not for you.
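Late binding is precisely what makes recursion work without forward declarations: the name inside the body is looked up at call time, by which point the def statement has bound it (a minimal sketch):

```python
def fact(n):
    # "fact" here is resolved when the function is called,
    # not when it is defined, so no forward declaration is needed.
    return 1 if n <= 1 else n * fact(n - 1)

print(fact(5))  # → 120
```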


It is a part of what makes it Python. If you cannot live with this, Python is not for you.

first of all, there’s no need for a condescending tone; there is no god-given decree that sets in stone what python is and how it has to be. anything can change, and a lot has already changed throughout this language’s history.

now on to the relevant stuff:

And even if by some magic allow self-recursive functions, what to do with indirect recursion? You will need a forward declaration of functions, classes, global and local variables, methods, attributes.

you’re reacting to an off-hand comment i made, treating it as if it’s my main contention and suggestion to change it in python. well, quite simply, it’s not.

since we’re on this topic, i can elaborate why i made that comment, but just so we’re clear - i’m ok with how name resolution works in python and absolutely don’t intend to suggest to change it.

HOWEVER

whether the code is interpreted or compiled, it usually passes through an AST parser first - right there it’s possible to tell if a name refers to something that hasn’t been and will not be declared in the relevant scope (dynamic declarations aside).

it is an implementation detail of python that functions are callable objects, so you may say it’s wrong to distinguish between accessing a variable before it’s been created and calling a function before it’s been defined. but IMHO allowing access to an undefined variable leads to more errors than allowing a call to an undefined function (provided the interpreter/compiler checks that the name will eventually be defined in the relevant scope), which is why i would ban the former and allow the latter.

personally i think it’s a bug that x magically becomes visible inside f after f has been defined.

That’s an opinion only you hold. “Fixing” that bug would break untold amounts of programs in production.

to me it’s scope leaking.

Scope leaking would be the following (hypothetically; in real Python this print raises NameError, since x stays local to f):

>>> def f():
...     x = 1
...
>>> f()
>>> print(x)
1

What you’re experiencing is late binding.

If you have a good, evidence-based case for why removing late binding from Python would make the language better, rather than “I am personally surprised by it”, please make that case.


If you have a good, evidence-based case for why removing late binding from Python would make the language better

i am not suggesting to remove late binding, please read my comments in this thread again.

print(x) # <- x is undeclared / out of scope

def f():
    print(x) # <- x is still undeclared / out of scope

x = 1 # <- current scope gains a variable x

def g():
    print(x) # <- g's scope doesn't have x,
             # but the scope in which g is defined does have x

That’s exactly what you’re suggesting with this snippet.

again, please read this thread or ask questions when confused, because it sure looks like you’ve made a lot of assumptions about what i think and suggest.

here: Make lambdas proper closures - #13 by maksym

For anyone who feels strongly about this, I’ve created a poll on this issue and mentioned it on the Python-Ideas mailing list.


Sorry Maksym, your attempt to explain why I have misunderstood you has
just confused me more.

You seem to understand why no declarations are needed, that forward
references are perfectly legal in Python (regardless of whether they are
within a function or not) and yet you were surprised that code using a
forward reference didn’t fail.

And then after demonstrating that you understand perfectly why it is
legal for a function to have a forward reference to a name not yet
defined (otherwise we would need Pascal-like “forward” declarations,
how 1970s is that?) you make a statement like this:

“personally i think it’s a bug that x magically becomes visible inside
f after f has been defined. to me it’s scope leaking.”

o_O

Anyway, moving on…

Obviously there is no “one true answer” to how loop unrolling should
work. Likewise for scoping and early/late binding, where programming
languages are free to make their own choices.

I may have strong opinions on what I like, but that’s not to say that
other languages should not make other decisions.

But if the rolled and unrolled loops are not equivalent, then that is
going to be surprising, and it would likely rule out future compiler
optimizations to do with loop unrolling. CPython doesn’t currently do
that, but some day it might, and other implementations such as PyPy
might even do it now.

One major issue is that there is a ton of code that assumes that the
body of a for loop shares the surrounding scope.

x = 1
for item in seq:
    x = something()

If the x inside the loop and the x outside the loop were different
variables, that would break lots of code. So any change to the scoping
rules would have to include an implicit nonlocal declaration inside
the body. And that would likely fail mysteriously in nested code like
this:

def outer():
    x = 999
    seq = something()
    assert seq

    def inner():
        for item in seq:
            # implicit nonlocal x
            x = process(item)
        do_something_with(x)

    inner()
    assert x == 999  # Fails!

The intent is for x to be local to inner(), but the implicit nonlocal
makes it local to outer() instead because that’s the first scope with
an existing name x. Ouch!

https://docs.python.org/3/reference/simple_stmts.html#the-nonlocal-statement

Now there are probably cunning ways to make it work so that x goes into
inner() rather than outer(), but the level of complexity is increasing
and the likelihood of bugs and gotchas and weird unexpected corner cases
is high. And even if we can get it perfectly right, it is still a
backwards-incompatible change due to putting the loop variable in its
own scope rather than the function scope.
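For completeness, the standard way to get per-iteration binding in today’s Python is a factory function; each call creates a fresh scope with its own i and string (a sketch reusing the print-based functions from the examples above; make_printer is a hypothetical name):

```python
def make_printer(i, string):
    # Each call to make_printer creates a new scope, so each returned
    # function closes over its own private i and string.
    def f():
        print(i, 'f', string)
    return f

printers = [make_printer(i, s) for i, s in enumerate(('hello', 'goodbye'))]
for p in printers:
    p()
# → 0 f hello
# → 1 f goodbye
```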

Steven, I appreciate your patience!

you were surprised that code using a forward reference didn’t fail

this was a bit tongue in cheek, and i’ve elaborated on why i made that comment in my reply to Serhiy: Make lambdas proper closures - #13 by maksym

to reiterate - i would make a distinction in name resolution between functions and variables, but i realize that in python it’s not possible because both are just names and functions just happen to be callable. not suggesting anything needs to change here.

Regarding loop rolling, we both agree that there is no one true answer on how to unroll, but then whichever way to unroll we choose, we can pick scoping rules that satisfy rolled/unrolled equivalence. I don’t understand why you seem to insist that there wouldn’t be equivalence with iteration-scoped loop variables?

As for your examples with the x variable, i think we’re getting sidetracked, as my suggestion is only to make loop variables iteration-scoped (only item in your examples) and leave everything else working as before. Also it seems completely doable to me to let the last iteration’s loop variables leak into the outer scope, to remain compatible with essentially all existing workflows.

Thanks for making that poll and linking lots of useful context!