Python 3.11 optimized loading methods for calls (see the table there, third row from bottom). In the case of list.append, that now seems to beat the optimization of storing the method in a local variable. For dict.get and set.add, it now seems about equally fast:
15.4 ± 0.1 ns a.append(0)
21.2 ± 0.1 ns a_append(0)
37.5 ± 0.2 ns d.get(0)
37.3 ± 0.1 ns d_get(0)
28.5 ± 0.1 ns s.add(0)
27.6 ± 0.1 ns s_add(0)
Python: 3.11.4 (main, Sep 9 2023, 15:09:21) [GCC 13.2.1 20230801]
With Python 3.10 (on another machine):
38.2 ± 0.1 ns a.append(0)
28.3 ± 0.1 ns a_append(0)
46.9 ± 0.1 ns d.get(0)
34.4 ± 0.1 ns d_get(0)
46.0 ± 0.5 ns s.add(0)
34.3 ± 0.3 ns s_add(0)
Python: 3.10.9 (main, Jan 23 2023, 22:32:48) [GCC 10.2.1 20210110]
Benchmark script
from timeit import timeit
from statistics import mean, stdev
import sys
setup = '''
a = []
a_append = a.append
d = {0: 0}
d_get = d.get
s = set()
s_add = s.add
'''
codes = [
'a.append(0)',
'a_append(0)',
'd.get(0)',
'd_get(0)',
's.add(0)',
's_add(0)',
]
times = {c: [] for c in codes}
def stats(c):
    ts = [t * 1e9 for t in sorted(times[c])[:5]]
    return f'{mean(ts):4.1f} ± {stdev(ts):3.1f} ns '
for _ in range(500):
    for c in codes:
        t = timeit(c, setup, number=10**5) / 1e5
        times[c].append(t)
for c in codes:
    print(stats(c), c)
print('\nPython:', sys.version)
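To see how the two call forms from the benchmark differ at the bytecode level, something like the following sketch might help. The function names call_via_attribute, call_via_local and opnames are made up for illustration, and the opcode names printed depend on the Python version, so no particular output is assumed here.

```python
import dis

def call_via_attribute(a):
    # Looks up the append method on the list at the call site.
    a.append(0)

def call_via_local(a):
    # Stores the bound method in a local variable first, then calls it.
    a_append = a.append
    a_append(0)

def opnames(func):
    """Return the opcode names of a function's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

for func in (call_via_attribute, call_via_local):
    print(func.__name__, opnames(func))
```

Comparing the two printed lists on 3.10 and 3.11 should show which instructions the interpreter uses for the method call in each form.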
The standard library uses that list.append optimization in various places, for example for copy.deepcopy:
https://github.com/python/cpython/blob/7dd3c2b80064c39f1f0ebbc1f8486897b3148aa5/Lib/copy.py#L191-L197
And removing that optimization does get me faster times, for example for deepcopying list(range(10000)):
2.83 ± 0.01 ms current
2.75 ± 0.01 ms deoptimized
(Attempt This Online!, the copy module is copied and pasted into the Header section.)
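For a comparison that does not require editing the copy module, here is a minimal sketch of the same pattern in isolation. It is not the actual deepcopy code (there is no memo handling and no recursion), and the names copy_list_cached and copy_list_direct are hypothetical; it only contrasts a cached append with a direct y.append call inside a loop.

```python
from timeit import repeat

def copy_list_cached(x):
    # Caches the bound method in a local variable before the loop.
    y = []
    append = y.append
    for item in x:
        append(item)
    return y

def copy_list_direct(x):
    # Calls y.append directly on every iteration.
    y = []
    for item in x:
        y.append(item)
    return y

data = list(range(10000))

# Both variants must produce the same result.
assert copy_list_cached(data) == copy_list_direct(data) == data

for func in (copy_list_cached, copy_list_direct):
    # Best of 5 rounds, 100 calls per round, reported per call.
    best = min(repeat(lambda: func(data), number=100, repeat=5)) / 100
    print(f'{best * 1e6:7.1f} µs  {func.__name__}')
```

Which variant comes out ahead will likely depend on the Python version and the machine, as in the measurements above.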
So should the standard library remove that optimization everywhere?