I’m curious how Python’s code generation can tell that a variable will be needed by an inner function and therefore has to be stored in the enclosing scope’s closure rather than as an ordinary local. I’m asking specifically about compilation and code generation, because I’m trying to implement something similar in a personal project.
Some reference code:
>>> def outer():
...     a = 100
...     def inner():
...         return a + 1
...     b = inner()
...     return b
...
>>> from dis import dis
>>> dis(outer)
  2           0 LOAD_CONST               1 (100)
              2 STORE_DEREF              0 (a)
  3           4 LOAD_CLOSURE             0 (a)
              6 BUILD_TUPLE              1
              8 LOAD_CONST               2 (<code object inner at 0x000001869788BC90, file "<stdin>", line 3>)
             10 LOAD_CONST               3 ('outer.<locals>.inner')
             12 MAKE_FUNCTION            8 (closure)
             14 STORE_FAST               0 (inner)
  5          16 LOAD_FAST                0 (inner)
             18 CALL_FUNCTION            0
             20 STORE_FAST               1 (b)
  6          22 LOAD_FAST                1 (b)
             24 RETURN_VALUE
Disassembly of <code object inner at 0x000001869788BC90, file "<stdin>", line 3>:
  4           0 LOAD_DEREF               0 (a)
              2 LOAD_CONST               1 (1)
              4 BINARY_ADD
              6 RETURN_VALUE
The compiler knew to emit STORE_DEREF for the variable a in the outer function, and I should also highlight that it knew to emit LOAD_CLOSURE when building inner. Just to be thorough, the store becomes a STORE_FAST if no inner function references the variable:
>>> def outer2():
...     a = 100
...     def inner():
...         return 1
...     b = a + inner()
...     return b
...
>>> dis(outer2)
  2           0 LOAD_CONST               1 (100)
              2 STORE_FAST               0 (a)
  3           4 LOAD_CONST               2 (<code object inner at 0x0000018697DA2DF0, file "<stdin>", line 3>)
              6 LOAD_CONST               3 ('outer2.<locals>.inner')
              8 MAKE_FUNCTION            0
             10 STORE_FAST               1 (inner)
  5          12 LOAD_FAST                0 (a)
             14 LOAD_FAST                1 (inner)
             16 CALL_FUNCTION            0
             18 BINARY_ADD
             20 STORE_FAST               2 (b)
  6          22 LOAD_FAST                2 (b)
             24 RETURN_VALUE
Disassembly of <code object inner at 0x0000018697DA2DF0, file "<stdin>", line 3>:
  4           0 LOAD_CONST               1 (1)
              2 RETURN_VALUE
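The same distinction is visible without reading bytecode, through the code objects’ co_cellvars and co_freevars attributes. This is only a minimal sketch for inspection; nested_code is a hypothetical helper written for this example, not part of any library.

```python
def outer():
    a = 100
    def inner():
        return a + 1
    b = inner()
    return b

def outer2():
    a = 100
    def inner():
        return 1
    b = a + inner()
    return b

def nested_code(func, name):
    """Hypothetical helper: find the nested code object called `name`
    among the constants of func's code object."""
    for const in func.__code__.co_consts:
        if hasattr(const, "co_name") and const.co_name == name:
            return const
    raise LookupError(name)

# `a` is captured by inner, so outer keeps it in a cell.
print(outer.__code__.co_cellvars)               # ('a',)
# inner sees the same name as a free variable.
print(nested_code(outer, "inner").co_freevars)  # ('a',)
# Nothing captures `a` in outer2, so there are no cells at all.
print(outer2.__code__.co_cellvars)              # ()
```

A name listed in co_cellvars is what gets STORE_DEREF in the defining function, and a name listed in co_freevars is what gets LOAD_DEREF in the nested one.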
All I can really think is that the parser detects all the functions nested under the current one, generates the code for them first while keeping track of the variables involved, and then uses that information when generating code for the enclosing levels.
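One way to probe that guess is the standard-library symtable module, which exposes the symbol tables the compiler builds from the AST before any bytecode is generated. The sketch below only inspects those tables for the first example; the SOURCE string and the variable names are made up for illustration.

```python
import symtable

# Same function as the first example, as source text (illustrative).
SOURCE = """
def outer():
    a = 100
    def inner():
        return a + 1
    b = inner()
    return b
"""

module_table = symtable.symtable(SOURCE, "<example>", "exec")
outer_table = module_table.get_children()[0]   # table for outer
inner_table = outer_table.get_children()[0]    # table for inner

a_in_outer = outer_table.lookup("a")
a_in_inner = inner_table.lookup("a")

# In outer, `a` is a bound local; in inner it is classified as free.
print(outer_table.get_name(), a_in_outer.is_local(), a_in_outer.is_free())
print(inner_table.get_name(), a_in_inner.is_local(), a_in_inner.is_free())
print(inner_table.get_frees())
```

If the names are already classified here, no bytecode needs to be generated for inner before outer, and no fix-up afterwards is required. Whether that matches every detail of the real compiler is not something this snippet proves.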
The one strike against that theory is that the inner code objects don’t come first among the constants. In outer(), the code object for inner is constant 2 (the operand of LOAD_CONST), not 0. Is it really doing this on the fly and fixing things up afterwards, or something else?
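The index in question is a position in the enclosing function’s co_consts tuple, which can be listed directly. A minimal sketch follows. The exact contents of co_consts differ between CPython versions, so no output is shown.

```python
def outer():
    a = 100
    def inner():
        return a + 1
    b = inner()
    return b

# Print every constant with the index that LOAD_CONST would use for it.
for index, const in enumerate(outer.__code__.co_consts):
    print(index, repr(const))

# Code objects are the constants that carry their own bytecode.
code_consts = [c for c in outer.__code__.co_consts if hasattr(c, "co_code")]
print([c.co_name for c in code_consts])
```

In the disassembly above the constants appear in order of first use (100, then the code object, then the qualified name). That would be consistent with constants being numbered during code generation, independently of any earlier scope analysis, though that reading is an assumption rather than something this listing demonstrates.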