Python 3.11 optimized loading methods for calls (see the table there, third row from bottom). For list.append, the plain method call now seems to beat the optimization of storing the method in a local variable. For dict.get and set.add, the two now seem about equally fast:
15.4 ± 0.1 ns a.append(0)
21.2 ± 0.1 ns a_append(0)
37.5 ± 0.2 ns d.get(0)
37.3 ± 0.1 ns d_get(0)
28.5 ± 0.1 ns s.add(0)
27.6 ± 0.1 ns s_add(0)
Python: 3.11.4 (main, Sep 9 2023, 15:09:21) [GCC 13.2.1 20230801]
With Python 3.10 (on another machine):
38.2 ± 0.1 ns a.append(0)
28.3 ± 0.1 ns a_append(0)
46.9 ± 0.1 ns d.get(0)
34.4 ± 0.1 ns d_get(0)
46.0 ± 0.5 ns s.add(0)
34.3 ± 0.3 ns s_add(0)
Python: 3.10.9 (main, Jan 23 2023, 22:32:48) [GCC 10.2.1 20210110]
Benchmark script
from timeit import timeit
from statistics import mean, stdev
import sys
setup = '''
a = []
a_append = a.append
d = {0: 0}
d_get = d.get
s = set()
s_add = s.add
'''
codes = [
'a.append(0)',
'a_append(0)',
'd.get(0)',
'd_get(0)',
's.add(0)',
's_add(0)',
]
times = {c: [] for c in codes}
def stats(c):
    ts = [t * 1e9 for t in sorted(times[c])[:5]]
    return f'{mean(ts):4.1f} ± {stdev(ts):3.1f} ns '
for _ in range(500):
    for c in codes:
        t = timeit(c, setup, number=10**5) / 1e5
        times[c].append(t)
for c in codes:
    print(stats(c), c)
print('\nPython:', sys.version)
The standard library uses that list.append optimization in various places, for example for copy.deepcopy:
cpython/Lib/copy.py at 7dd3c2b80064c39f1f0ebbc1f8486897b3148aa5 · python/cpython · GitHub
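For anyone unfamiliar with the pattern, here is a simplified sketch of the two styles being compared. The function names are made up for illustration and this is not the actual copy.py code, only the general shape of the optimization:

```python
def copy_items_cached(items):
    """Look up the bound method once and call it through a local name."""
    out = []
    append = out.append  # the "optimization": cache the bound method
    for item in items:
        append(item)
    return out


def copy_items_plain(items):
    """Call the method directly, looking it up on every iteration."""
    out = []
    for item in items:
        out.append(item)
    return out
```

Both produce the same list; the only difference is how the method is loaded for each call, which is what Python 3.11 changed.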
And removing that optimization does get me faster times, for example for deepcopying list(range(10000)):
2.83 ± 0.01 ms current
2.75 ± 0.01 ms deoptimized
(Attempt This Online!, with the copy module copied and pasted into the Header section.)
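A rough way to try something similar locally without copying the whole module might be to temporarily swap the list copier. This sketch assumes the private dispatch table copy._deepcopy_dispatch exists in your Python version (it is an implementation detail, not public API), and the names deepcopy_list_plain and best_ms are my own. It is not the setup used for the numbers above:

```python
import copy
from timeit import repeat


def deepcopy_list_plain(x, memo, deepcopy=copy.deepcopy):
    # A list copier that calls y.append directly instead of a cached local.
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(deepcopy(a, memo))
    return y


def best_ms(func, number=10, rounds=3):
    # Best average time per call over a few rounds, in milliseconds.
    return min(repeat(func, number=number, repeat=rounds)) / number * 1e3


data = list(range(10000))
original = copy._deepcopy_dispatch[list]  # private table, assumed to exist

current_ms = best_ms(lambda: copy.deepcopy(data))
copy._deepcopy_dispatch[list] = deepcopy_list_plain
try:
    deoptimized_ms = best_ms(lambda: copy.deepcopy(data))
    patched_copy = copy.deepcopy([[1, 2], [3]])  # sanity check while patched
finally:
    copy._deepcopy_dispatch[list] = original  # always restore the original

print(f'{current_ms:.2f} ms current')
print(f'{deoptimized_ms:.2f} ms deoptimized')
```

The try/finally is there so the real copier is restored even if the patched run fails.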
So should the standard library remove that optimization everywhere?