`append = mylist.append` optimization now harmful?

Python 3.11 optimized loading methods for calls (see the table there, third row from bottom). In the case of list.append, that now seems to beat the optimization of storing the method in a local variable. For dict.get and set.add, it now seems about equally fast:

15.4 ± 0.1 ns  a.append(0)
21.2 ± 0.1 ns  a_append(0)
37.5 ± 0.2 ns  d.get(0)
37.3 ± 0.1 ns  d_get(0)
28.5 ± 0.1 ns  s.add(0)
27.6 ± 0.1 ns  s_add(0)

Python: 3.11.4 (main, Sep  9 2023, 15:09:21) [GCC 13.2.1 20230801]

With Python 3.10 (on another machine):

38.2 ± 0.1 ns  a.append(0)
28.3 ± 0.1 ns  a_append(0)
46.9 ± 0.1 ns  d.get(0)
34.4 ± 0.1 ns  d_get(0)
46.0 ± 0.5 ns  s.add(0)
34.3 ± 0.3 ns  s_add(0)

Python: 3.10.9 (main, Jan 23 2023, 22:32:48) [GCC 10.2.1 20210110]
Benchmark script

Attempt This Online!

from timeit import timeit
from statistics import mean, stdev
import sys

setup = '''
a = []
a_append = a.append
d = {0: 0}
d_get = d.get
s = set()
s_add = s.add
'''

codes = [
    'a.append(0)',
    'a_append(0)',
    'd.get(0)',
    'd_get(0)',
    's.add(0)',
    's_add(0)',
]

times = {c: [] for c in codes}
def stats(c):
    ts = [t * 1e9 for t in sorted(times[c])[:5]]
    return f'{mean(ts):4.1f} ± {stdev(ts):3.1f} ns '
for _ in range(500):
    for c in codes:
        t = timeit(c, setup, number=10**5) / 1e5
        times[c].append(t)
for c in codes:
    print(stats(c), c)

print('\nPython:', sys.version)

The standard library uses that list.append optimization in various places, for example for copy.deepcopy:
https://github.com/python/cpython/blob/7dd3c2b80064c39f1f0ebbc1f8486897b3148aa5/Lib/copy.py#L191-L197

And removing that optimization does get me faster times, for example for deepcopying list(range(10000)):

2.83 ± 0.01 ms  current
2.75 ± 0.01 ms  deoptimized

(Attempt This Online!, the copy module is copied&pasted into the Header section.)

So should the standard library remove that optimization everywhere?

Does this persist for the nogil branch?

I recommend that you read through the source code to find out whether we intentionally did something about this. Also, 3.11, 3.12 and main (3.13-to-be) might be different.

Please don’t. Maybe we can recommend no longer doing this. But let’s not solicit mass PRs removing this. Such PRs tend to introduce bugs.

3 Likes

Not sure what you mean, as that seems obvious from the announcement about PEP 659 that I linked to.

Looking at the byte/source code, I see that a.append(0) ends up with LOAD_METHOD_NO_DICT and PRECALL_NO_KW_LIST_APPEND, whereas a_append(0) ends up with LOAD_FAST__LOAD_CONST and PRECALL_NO_KW_BUILTIN_O. And I guess PRECALL_NO_KW_LIST_APPEND is just significantly faster than PRECALL_NO_KW_BUILTIN_O as it is more special (its implementation looks simpler and hardcodes a _PyList_AppendTakeRef call). Although both are followed by CALL_ADAPTIVE, maybe the difference is in there, I don’t understand enough.

No idea, and I’m not equipped to compile/install CPython (also can’t test 3.12 or main).

Moved this to Help, as the suggested action (removing the optimization from the standard library) was rejected and I’m now more interested to see whether others and other versions can reproduce this and could use help for that.

Also, in the next Python release a new optimization can be added that will make the old trick relevant again (or at least make it not slower that more direct code).

1 Like

Yes, that’s partly why I asked whether it’s now harmful, as I don’t know enough to predict future developments but I imagine the people behind the interpreter optimizations might be.