Why is zip so much faster? For each pair, zip has to fetch elements from two input iterators, whereas pairwise reuses one element from the previous pair. The latter should be faster.
Comparing `zip_next` and `pairwise_next`, the only potential reason I see is the latter's `PyTuple_Pack(2, old, new)`. Is that it? Is `PyTuple_Pack` harmfully slow? Should it maybe be a call like `PyTuple_Pack(old, new)`, i.e., a dedicated version for exactly two elements?
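For reference, pairwise is roughly equivalent to this pure-Python sketch (adapted from the itertools docs), which makes the one-fetch-per-pair structure explicit:

```python
def pairwise_sketch(iterable):
    """Rough pure-Python equivalent of itertools.pairwise:
    each step fetches ONE new element and packs a 2-tuple."""
    iterator = iter(iterable)
    try:
        old = next(iterator)
    except StopIteration:
        return
    for new in iterator:
        yield old, new  # the C code does PyTuple_Pack(2, old, new) here
        old = new

print(list(pairwise_sketch('abcd')))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

So per pair, pairwise touches its input iterator once, while zip touches two iterators — which is what makes the benchmark result below so surprising.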
```python
from timeit import timeit
from statistics import mean, stdev
from collections import deque
from itertools import pairwise, repeat
import sys

def it():
    return repeat(None, 10**6)

def pairwise_():
    return pairwise(it())

def zip_():
    return zip(it(), it())

funcs = pairwise_, zip_
consume = deque(maxlen=1).extend
times = {f: [] for f in funcs}

def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} ms '

for _ in range(25):
    for f in funcs:
        t = timeit(lambda: consume(f()), number=1)
        times[f].append(t)

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

print('\nPython:', sys.version)
```
Note that I used `consume = deque(maxlen=1).extend` to consume the iterators. Times for other consumers:
A for loop keeping a reference shows a similar picture:

```python
def consume(iterable):
    for element in iterable:
        pass
```

```
26.20 ± 0.67 ms  zip_
37.37 ± 1.12 ms  pairwise_
```
`maxlen=0` would allow zip to reuse its result tuple, an optimization which pairwise doesn't have:

```python
consume = deque(maxlen=0).extend
```

```
 8.79 ± 0.05 ms  zip_
31.01 ± 0.09 ms  pairwise_
```
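For what it's worth, that reuse can be observed from Python via `id()`. This is a CPython implementation detail, so the sketch below is illustrative, not guaranteed behavior:

```python
# CPython implementation detail: zip may reuse its result tuple when
# nothing else holds a reference to the previous result.

# map() drops each tuple right after calling id(), so reuse can kick in;
# the set of ids then tends to be tiny (often just 1 on CPython).
reused_ids = set(map(id, zip(range(100), range(100))))
print(len(reused_ids))

# Keeping every tuple alive blocks reuse: 100 simultaneously live tuples
# necessarily have 100 distinct addresses.
held = list(zip(range(100), range(100)))
assert len({id(t) for t in held}) == 100
```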
Note that these benchmarks were run independently, so times across benchmark runs aren't directly comparable, as the machine may have been under different load each time. Only times within a single benchmark run are comparable.
Hi Steven,
That's a good question, but there can still be a difference on Windows machines too. Sorry - I meant to chip in earlier with my results from Python 3.11 on Windows 11, which confirm Stefan's findings, but thought I'd leave it to you experts. Out of curiosity, and to try to recreate your findings, I've just installed 3.10; I'll report back shortly with the results for 3.9.
```
36.50 ± 0.30 ms  zip_
45.76 ± 0.70 ms  pairwise_

Python: 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]
```

```
37.93 ± 0.20 ms  zip_
44.37 ± 0.56 ms  pairwise_

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
```
There's also a comment right above that, suggesting that it could reuse the tuple. That's exactly what `zip_next` does, although the comment references enumobject.c. Seems likely that's the culprit?
No, that tuple-reuse optimization isn't it. I mentioned it in my second post here, where I used `maxlen=0` to show its effect. My real benchmark uses `maxlen=1` precisely to avoid that optimization (the deque then keeps a reference to the latest tuple, so zip can't reuse it).
Interesting. For me, zip remains significantly faster, though less so. With `maxlen=7`, zip is still slightly faster; from `maxlen=8` upward, pairwise becomes slightly faster (but even with `maxlen=50` it's only slightly faster).