# Why is `pairwise` slower than `zip`?

Times for each of them producing and consuming a million `(None, None)` pairs:

```
 23.17 ± 0.83 ms  zip_
 33.27 ± 0.63 ms  pairwise_

Python: 3.11.4 (main, Sep  9 2023, 15:09:21) [GCC 13.2.1 20230801]
```

The functions under test:

```
from itertools import pairwise, repeat

def it():
    return repeat(None, 10**6)

def pairwise_():
    return pairwise(it())

def zip_():
    return zip(it(), it())
```

Why is `zip` so much faster? For each pair, `zip` has to fetch elements from two input iterators, whereas `pairwise` reuses one element and only fetches one. If anything, `pairwise` should be faster.

Comparing `zip_next` and `pairwise_next` in CPython’s C source, the only potential reason I see is the latter’s `PyTuple_Pack(2, old, new)`. Is that it? Is `PyTuple_Pack` harmfully slow? Should there maybe be a version for exactly two elements, i.e., a `PyTuple_Pack(old, new)` without the length argument?

Benchmark script (Attempt This Online!):

```
from timeit import timeit
from statistics import mean, stdev
from collections import deque
from itertools import pairwise, repeat
import sys

def it():
    return repeat(None, 10**6)

def pairwise_():
    return pairwise(it())

def zip_():
    return zip(it(), it())

funcs = pairwise_, zip_

consume = deque(maxlen=1).extend

times = {f: [] for f in funcs}

def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} ms '

for _ in range(25):
    for f in funcs:
        t = timeit(lambda: consume(f()), number=1)
        times[f].append(t)

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

print('\nPython:', sys.version)
```

I cannot duplicate with 3.10 on Windows 10.

```
15.93 ± 0.12 ms  pairwise_
16.14 ± 0.17 ms  zip_

Python: 3.10.8 | packaged by conda-forge | (main, Nov 24 2022, 14:07:00) [MSC v.1916 64 bit (AMD64)]
```

Note that I used `consume = deque(maxlen=1).extend` to consume them. Times for other consumers:

A `for` loop keeping a reference shows a similar picture:

```
def consume(iterable):
    for element in iterable:
        pass
```

```
26.20 ± 0.67 ms  zip_
37.37 ± 1.12 ms  pairwise_
```
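For clarity, `deque(maxlen=1).extend` drains an iterable at C speed while retaining a reference only to the most recent element:

```
from collections import deque

d = deque(maxlen=1)
consume = d.extend

consume(range(5))   # iterates everything, keeps only the last item
print(list(d))      # [4]
```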

`maxlen=0` would allow `zip` to reuse its result tuple, an optimization which `pairwise` doesn’t have:

```
consume = deque(maxlen=0).extend
```

```
 8.79 ± 0.05 ms  zip_
31.01 ± 0.09 ms  pairwise_
```
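That reuse can be observed directly. As a sketch (this is a CPython implementation detail, not a language guarantee): if nothing else references the previous result tuple when `zip` is advanced, it hands back the same tuple object again:

```
from itertools import repeat

z = zip(repeat(None, 10), repeat(None, 10))
first_id = id(next(z))    # the returned tuple is dropped immediately
second_id = id(next(z))   # zip can reuse the now-unreferenced tuple
print(first_id == second_id)  # True on CPython
```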

Note that these benchmarks were run independently, so times across benchmark runs aren’t totally comparable because the machine might’ve been differently busy. Only within each benchmark run are times comparable.


Thanks. I tried on replit.com’s 3.10.11 now and can reproduce it there:

```
 22.01 ± 0.35 ms  zip_
 28.33 ± 0.53 ms  pairwise_

Python: 3.10.11 (main, Apr  4 2023, 22:10:32) [GCC 12.2.0]
```

Perhaps a better question is “Why is `pairwise` slower than `zip` on Linux?”

(Edit: James Parrot has duplicated the slowdown on Windows for 3.10 and 3.11.)

Hi Steven,

That would be a good question, but there can still be a difference on Windows machines too. Sorry - I meant to chip in earlier with my results from Python 3.11 on Windows 11, which confirm Stefan’s findings, but I thought I’d leave it to you experts. I installed 3.10 just now out of curiosity, to try to recreate your findings, and will report back shortly with the results for 3.9.

```
 36.50 ± 0.30 ms  zip_
 45.76 ± 0.70 ms  pairwise_

Python: 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]

 37.93 ± 0.20 ms  zip_
 44.37 ± 0.56 ms  pairwise_

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
```

Thanks for double-checking the results on your machine.

No need to check as `pairwise` was added in 3.10.

Indeed it was. Python 3.10.8 still shows a difference on mine. I assume the speed up on yours is due to your hardware, not to conda-forge.

```
34.55 ± 0.16 ms  zip_
45.55 ± 0.28 ms  pairwise_

Python: 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
```

Apple M2:

```
 10.41 ± 0.04 ms  zip_
 11.45 ± 0.09 ms  pairwise_

Python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:33:12) [Clang 15.0.7 ]
```

If I modify

```
consume = deque(maxlen=1).extend
```

to

```
consume = deque(maxlen=3).extend  # or a value > 3
```

then I consistently get a reversed order:

```
 11.57 ± 0.01 ms  pairwise_
 12.05 ± 0.02 ms  zip_
```

There’s also a comment right above that `PyTuple_Pack` call in `pairwise_next`, suggesting that it could re-use the tuple. That’s what `zip_next` is doing, although the comment references enumobject.c. Seems likely that’s the culprit?

No, that tuple reuse optimization isn’t it. I mentioned that in my second post here, where I used `maxlen=0` to show the effect of that optimization. My real benchmark uses `maxlen=1` to avoid that optimization (the deque then keeps a reference to the latest tuple, so `zip` can’t reuse it).
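A quick way to see the `maxlen=1` effect: as long as the previous result tuple is still referenced, `zip` must allocate a fresh one, so two simultaneously live results are necessarily distinct objects:

```
from itertools import repeat

z = zip(repeat(None, 10), repeat(None, 10))
t1 = next(z)
t2 = next(z)      # t1 is still alive, so zip cannot reuse its tuple
print(t1 is t2)   # False
```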


Interesting. For me, `zip` remains significantly faster, but less so. With `maxlen=7`, `zip` is still slightly faster, and with `maxlen≥8`, `pairwise` becomes slightly faster (but even with `maxlen=50` it’s only slightly faster).