Make lambdas proper closures

Does nobody find this behavior bizarre? Is it actually useful for anything and does anybody rely on it?

>>> larr = [lambda: i for i in range(10)]
>>> iarr = [l() for l in larr]
>>> iarr
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

Lambdas do have closures; that’s not really the problem here. This happens because Python doesn’t have block scope: the for loop simply reassigns the same variable on each iteration. It is unfortunate that it doesn’t do what you want here, but it’s not exactly obvious how you’d change the language to fix it.

Yes, it is definitely unexpected at first glance. You can use the following instead:

>>> larr = [lambda i=i: i for i in range(10)]
>>> iarr = [l() for l in larr]
>>> iarr
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
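
Two other common workarounds, sketched under the same assumptions as the example above: `functools.partial` binds its argument at creation time, and a small factory function (here called `make_const`, a made-up name) gives each lambda its own enclosing scope.

```python
from functools import partial

# partial() evaluates its arguments immediately, so each callable
# carries the value i had when it was created.
larr = [partial(lambda x: x, i) for i in range(10)]
print([l() for l in larr])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A factory function creates a fresh scope per call; each returned
# lambda closes over its own private n.
def make_const(n):
    return lambda: n

larr2 = [make_const(i) for i in range(10)]
print([l() for l in larr2])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```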

André

i’m sure there are ways to work around this, i’m saying python needs to change to not need such workarounds. this behavior is clearly breaking the least surprise principle to the point where i’d speculate it’s actually hurting the adoption of an entire syntactic construct.

i suppose everything hangs on the definition of “proper closure”. yes, python lambdas do close over locals available in their lexical scope, but is it actually valuable to store references to variables rather than references to values of those variables? readability counts and if i encounter lambda: x, i would prefer to read it as “the value of x in the immediate vicinity of where this lambda was created” rather than “the value of x where lambda was created, but also possibly any other value that x may have been assigned to as the function proceeded through loops and branches”.

as to how the language could be changed to fix this issue - i didn’t look at how lambdas are implemented, but if we’re going by how they function, i suppose changing something like lambda_scope = locals() to lambda_scope = dict(**locals()) would do it.

Edit: disregard this, it’s a dumb idea, the real issue is that loop variables are scoped to the loop rather than to the iteration.

Hi Maksym,

Python doesn’t need to change from late binding to early binding in
closures. Doing so would unnecessarily break code that requires late
binding and annoy people who expect the current behaviour and are
surprised by early binding.

Lambdas and closures here work with exactly the same execution model
(late binding) as functions using module or builtin variables and names,
and methods using attribute lookups and inheritance. If you write a
function or a method, you are using exactly the same model for name
lookups that is used by lambda. (Although the mechanism is different.)

Names are evaluated when the function is called, not when the function
is defined.

This is true for closures regardless of whether you use lambda or def;
it is true for methods and module-level functions regardless of whether
you use lambda or def. It is always true in Python.
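
A minimal sketch of that rule, with a hypothetical name `greeting`: the function can even be defined before the name exists, because the lookup only happens at call time.

```python
# `greeting` doesn't exist yet when f is defined; the name is
# looked up fresh each time f() is called (late binding).
def f():
    return greeting

greeting = "hello"
print(f())  # hello

greeting = "goodbye"
print(f())  # goodbye -- same function object, new value
```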

You probably use this behaviour dozens, hundreds, maybe thousands of
times and take it completely for granted. You probably rely on this
behaviour and have code that would break if we shifted from late binding
to early binding.

There is one part of Python’s execution model which consistently uses
early binding instead of late binding: default arguments for function
parameters. And there people have the opposite “surprise” – they get
surprised and annoyed because parameter default values work the way you
want lambda to work, and demand that we follow the Principle Of Least
Surprise and swap to late binding.
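
The early-bound default is easy to demonstrate. This sketch (hypothetical function names) shows both the “frozen” default expression and its best-known consequence, the shared mutable default:

```python
import time

# The default expression runs exactly once, at def time (early binding).
def stamp(t=time.time()):
    return t

assert stamp() == stamp()  # same timestamp on every call

# The same rule produces the classic shared-mutable-default surprise:
def append_to(item, bucket=[]):
    bucket.append(item)
    return bucket

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the single default list accumulates
```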

So we can’t win. Whatever behaviour we choose, early or late binding,
people will be surprised and annoyed.

What this demonstrates is one of the weaknesses of the Principle Of Least
Surprise: people aren’t surprised because lambdas in closures violate
Python’s execution model, or because it violates some fundamental
principle of behaviour. It doesn’t – it follows the same rules as other
functions and methods, and many other languages.

People get “surprised” because the computer does what they told it to
do, rather than what they wanted it to do. And there’s nothing we can do
about that.

Hi Steven,

as i was told in other discussions on this, the real issue is that loop variables are scoped to the loop rather than to the iteration. this works for pretty much every purpose, but doesn’t when one wants to create a lambda to delay execution of some task - loop variables are inputs to those delayed tasks and having those inputs turn out to be the same value for all tasks is imo surprising.

and i mean surprising in the sense that the code clearly intends to do something other than what ends up happening, and it takes quite a deep understanding of the execution model to explain the difference.

here’s another example and tell me if you don’t find it surprising in the sense that i described:

>>> def lg():
...     for i in range(10):
...             yield lambda: i
...
>>> g = lg()
>>> iarr1 = [v() for v in g]
>>> g = lg()
>>> garr = list(g)
>>> iarr2 = [v() for v in garr]
>>> iarr1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> iarr2
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
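
For what it’s worth, the per-iteration capture this post asks for can be spelled today with a small factory inside the generator (a sketch; `capture` is a made-up helper name):

```python
def lg():
    def capture(n):        # a fresh scope is created per call...
        return lambda: n   # ...so each lambda closes over its own n
    for i in range(10):
        yield capture(i)

# The result no longer depends on when the caller iterates.
garr = list(lg())
print([v() for v in garr])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```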

To answer your question, no, I don’t find it surprising. I won’t lie and
say I have never accidentally written code like that, I have. When I
do, I am annoyed but not surprised. I am annoyed that I made a
mistake.

A few days ago I was annoyed at myself to realise that this doesn’t sort
data from smallest to largest:

data = [x.as_integer_ratio() for x in numbers]
data.sort()

The code, or something very similar, had been in production for years
until I realised that not only does it not do what I thought it did, but
instead of being an optimisation, it was a pessimisation (slowed down
the code). Clearly sorting is broken because it didn’t do what I want,
only what I told it to do.

Consider this snippet of code:

def f():
    print(i, 'f', string)

for i, string in enumerate(('hello', 'goodbye')):
    f()

assert i == 1 and string == 'goodbye'
f()

What output do you expect? I think that nearly everyone will agree that
it should print

0 f hello
1 f goodbye
1 f goodbye

How about this?

for i, string in enumerate(('hello', 'goodbye')):
    def f():
        print(i, 'f', string)
    f()

assert i == 1 and string == 'goodbye'
f()

Now what output do you expect? I think most people will expect exactly
the same output as the first version.

What if we captured both of the function objects, rather than letting
the first be garbage-collected?

for i, string in enumerate(('hello', 'goodbye')):
    if i == 0:
        def f():  print(i, 'f', string)
        f()
    elif i == 1:
        def g():  print(i, 'g', string)
        g()

assert i == 1 and string == 'goodbye'
f()
g()

Apart from the one extra line of output, and the change from f to g, I
think most people will predict the behaviour of this will be the same as
the first two:

0 f hello
1 g goodbye
1 f goodbye
1 g goodbye

I don’t think there’s anything even a little bit surprising so far. Do
you agree, or are you surprised?

What if we changed the def statements to lambda? The behaviour is
exactly the same. Lambdas are not a distinct kind of thing with
different behaviour, they’re just syntactic sugar for an anonymous
function. Feel free to change the def statements into lambdas:

f = lambda: print(i, 'f', string)

but nothing should change. I hope that so far nothing surprises you.

Now let’s move that code, exactly as it is, into a function:

def test():
    for i, string in enumerate(('hello', 'goodbye')):
        if i == 0:
            def f():  print(i, 'f', string)
            f()
        elif i == 1:
            def g():  print(i, 'g', string)
            g()
    assert i == 1 and string == 'goodbye'
    f()
    g()


test()

Do you agree that moving a block of code into a function shouldn’t
change its behaviour? Nothing about the environment has changed except
that it is inside a function rather than top level module code. The
semantics are identical.

Now change the test() function so that it returns f and g, and call
them from outside the function:

def test():
    for i, string in enumerate(('hello', 'goodbye')):
        if i == 0:
            def f():  print(i, 'f', string)
            f()
        elif i == 1:
            def g():  print(i, 'g', string)
            g()
    assert i == 1 and string == 'goodbye'
    return f, g


a, b = test()
a()
b()

Why would you expect the behaviour to change? That really would be
surprising!

def f():
    print(i, 'f', string)

for i, string in enumerate(('hello', 'goodbye')):
    f()

assert i == 1 and string == 'goodbye'
f()

i would honestly expect this code to outright fail for using undeclared variable inside f.

it is my personal preference for loop variable to be scoped to single iteration rather than to the entire loop, so to answer your question regarding the first f/g snippet - no, i would not expect the output to be the same, i would expect f to always have i be bound to the variable at the iteration where f was created.

Do you agree that moving a block of code into a function shouldn’t
change its behaviour?

i’m not sure from your examples why moving that code into a function would change the result if the loop variable were scoped to the iteration.

as a followup here’s almost the same statement: contents of a sequence shouldn’t depend on how you’re iterating over that sequence. but right now it can seem like it does.

i see it as a language wart and i agree that if one understands model of execution it changes from being a surprise to being an annoyance. i’m just arguing that it should be neither.

i would honestly expect this code to outright fail for using
undeclared variable inside f.

Can I ask how much experience you have actually using Python? And what
your programming background is?

If you expect to need to declare variables, it sounds like you don’t
know Python very well, and you are expecting it to work like some other
language which you do know.

Python never requires variables to be declared before use. (At most, you
may need a global or nonlocal statement to tell the compiler which scope
the variable belongs to.) It would be awfully inconvenient to have to
declare every variable you use, since every class, function, and module
name is a variable.
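
A short sketch of that parenthetical: global and nonlocal do not declare a variable, they only tell the compiler which existing scope an assignment targets (the names here are made up for illustration).

```python
counter = 0

def bump():
    global counter   # assignments inside bump() target the module scope
    counter += 1

def make_counter():
    n = 0
    def inc():
        nonlocal n   # assignments target make_counter's local n
        n += 1
        return n
    return inc

bump()
bump()
print(counter)       # 2

c = make_counter()
print(c(), c())      # 1 2
```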

it is my personal preference for loop variable to be scoped to single
iteration

Ewwww.

That violates the expectation that unrolled and rolled loops are
equivalent.

# Unquestionably one scope.
i = 1
process(i)
i = 2
process(i)

# Equivalent to this.
for i in (1, 2):
    process(i)

As you have discovered, in Python the rolled up loop behaves just like
the unrolled loop, even if you have a closure and a function.

Anyway, it’s not clear to me why “loop variables” are so special that
they have to be handled differently from other variables.

i would expect f to always have i be bound to the
variable at the iteration where f was created.

That is precisely what Python already does.

[…]

as a followup here’s almost the same statement: contents of a sequence
shouldn’t depend on how you’re iterating over that sequence. but right
now it does.

How do you come to that conclusion?

The contents of the sequence doesn’t depend on how you iterate over
it, but on when you call the functions and the values currently held
by any variables used by the functions.

It shouldn’t be surprising that when you use variables that vary, the
results of calling the functions will likewise vary.

If you expect to need to declare variables

you misunderstood me.

print(x) # <- x is undeclared / out of scope

def f():
    print(x) # <- x is still undeclared / out of scope

x = 1 # <- current scope gains a variable x

def g():
    print(x) # <- g's scope doesn't have x,
             # but the scope in which g is defined does have x

hope that clears it up.

personally i think it’s a bug that x magically becomes visible inside f after f has been defined. to me it’s scope leaking.

That violates the expectation that unrolled and rolled loops are equivalent.

this entirely depends on how you think loop unrolling should work, do you think there’s one true answer in here?

# rolled
for i in (1,2):
  print(i)

# unrolled 1
i = 1
print(i)
i = 2
print(i)

# unrolled 2
i1 = 1
print(i1)
i2 = 2
print(i2)

i would expect f to always have i be bound to the variable at the iteration where f was created.

That is precisely what Python already does.

i will be bound to the single variable created for the entire loop, rather than to a variable isolated to the iteration.

How do you come to that conclusion?

i’ve edited my answer before you replied to include that “it can seem like” contents of the collection depend on how it’s iterated. and it can seem so because that’s what the end user observes - they get a hold of a collection and their result is different depending on how they iterate over it.

i know that in current execution model it’s a mistake on part of the developer that created that collection, but if loop variables were scoped to the iteration - those kinds of bugs would be fixed and i’m yet to see a valid code that would break because of it.

Do you expect that no recursive function should work? Because when you define function f, the name f is not yet set. And even if we by some magic allow self-recursive functions, what do we do with indirect recursion? You would need forward declarations of functions, classes, global and local variables, methods, and attributes.

There are programming languages which require you to declare every name before using it. Python is not such a language. It is part of what makes it Python. If you cannot live with this, Python is not for you.

It is a part of what makes it Python. If you cannot live with this, Python is not for you.

first of all, no need for condescending tone, there is no god-given decree that sets in stone what python is and how it has to be. anything can change and a lot has already changed throughout this language’s history.

now on to the relevant stuff:

And even if we by some magic allow self-recursive functions, what do we do with indirect recursion? You would need forward declarations of functions, classes, global and local variables, methods, and attributes.

you’re reacting to an off-hand comment i made, treating it as if it’s my main contention and suggestion to change it in python. well, quite simply, it’s not.

since we’re on this topic, i can elaborate why i made that comment, but just so we’re clear - i’m ok with how name resolution works in python and absolutely don’t intend to suggest to change it.

HOWEVER

whether the code is interpreted or compiled, usually it passes through an AST parser beforehand - right there it’s possible to tell if a name refers to something that hasn’t been or will not be declared in the relevant scope (dynamic declarations aside).

it is an implementation detail of python that functions are callable objects, so you may say that making a distinction between accessing a variable before it’s been created and accessing a function before it’s been defined is wrong. but IMHO allowing access to an undefined variable leads to more errors than allowing calls to an undefined function (provided that the interpreter/compiler checks that the name will eventually be defined in the relevant scope), which is why i would ban the former and allow the latter.

personally i think it’s a bug that x magically becomes visible inside f after f has been defined.

That’s an opinion only you hold. “Fixing” that bug would break untold amounts of programs in production.

to me it’s scope leaking.

Scope leaking would be the following:

>>> def f():
...     x = 1
...
>>> f()
>>> print(x)
1

What you’re experiencing is late binding.

If you have a good, evidence-based case for why removing late binding from Python would make the language better, rather than “I am personally surprised by it”, please make that case.

If you have a good, evidence-based case for why removing late binding from Python would make the language better

i am not suggesting to remove late binding, please read my comments in this thread again.

print(x) # <- x is undeclared / out of scope

def f():
    print(x) # <- x is still undeclared / out of scope

x = 1 # <- current scope gains a variable x

def g():
    print(x) # <- g's scope doesn't have x,
             # but the scope in which g is defined does have x

That’s exactly what you’re suggesting with this snippet.

again, please read this thread or ask questions when confused, because it sure looks like you’ve made a lot of assumptions about what i think and suggest.

here: Make lambdas proper closures - #13 by maksym

For anyone who feels strongly about this issue, I’ve created a poll on this issue and mentioned it on the Python-Ideas mailing list.
