Why is `pairwise` slower than `zip`?

Times for each of them to create a million (None, None) pairs:

 23.17 ± 0.83 ms  zip_
 33.27 ± 0.63 ms  pairwise_

Python: 3.11.4 (main, Sep  9 2023, 15:09:21) [GCC 13.2.1 20230801]

The tested things:

def it():
    return repeat(None, 10**6)

def pairwise_():
    return pairwise(it())

def zip_():
    return zip(it(), it())

Why is zip so much faster? For each pair it has to get elements from two input iterators, whereas pairwise reuses one element and only fetches one new one. So pairwise should be faster.
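For reference, the itertools documentation gives a roughly equivalent pure-Python version of pairwise, which makes the element reuse explicit (this is a sketch of the semantics, not the actual C implementation):

```python
def pairwise_py(iterable):
    # Roughly equivalent to itertools.pairwise (per the itertools docs):
    # each step reuses the previous element instead of fetching it again.
    iterator = iter(iterable)
    a = next(iterator, None)
    for b in iterator:
        yield a, b
        a = b  # reuse: the second item of this pair is the first of the next
```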

Comparing zip_next and pairwise_next, the only potential reason I see is the latter’s PyTuple_Pack(2, old, new). Is that it? Is PyTuple_Pack harmfully slow? Should there maybe be a variant for exactly two elements, i.e., something like PyTuple_Pack(old, new) without the count argument?

Benchmark script

from timeit import timeit
from statistics import mean, stdev
from collections import deque
from itertools import pairwise, repeat
import sys


def it():
    return repeat(None, 10**6)

def pairwise_():
    return pairwise(it())

def zip_():
    return zip(it(), it())


funcs = pairwise_, zip_

consume = deque(maxlen=1).extend

times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} ms '
for _ in range(25):
    for f in funcs:
        t = timeit(lambda: consume(f()), number=1)
        times[f].append(t)
for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

print('\nPython:', sys.version)

I cannot duplicate with 3.10 on Windows 10.

15.93 ± 0.12 ms  pairwise_
16.14 ± 0.17 ms  zip_

Python: 3.10.8 | packaged by conda-forge | (main, Nov 24 2022, 14:07:00) [MSC v.1916 64 bit (AMD64)]

Note that I used consume = deque(maxlen=1).extend to consume them. Times for other consumers:

A for loop keeping a reference shows a similar picture:

def consume(iterable):
    for element in iterable:
        pass

 26.20 ± 0.67 ms  zip_
 37.37 ± 1.12 ms  pairwise_

maxlen=0 would allow zip to reuse its result tuple, an optimization which pairwise doesn’t have:

consume = deque(maxlen=0).extend

  8.79 ± 0.05 ms  zip_
 31.01 ± 0.09 ms  pairwise_
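That reuse is a CPython implementation detail: zip holds a reference to its last result tuple and overwrites it in place when nothing else still references it. It can be observed with a small sketch (the id match is typical on a standard CPython build but not guaranteed by the language):

```python
z = zip("ab", "cd")
t1 = next(z)      # ('a', 'c')
addr = id(t1)
del t1            # drop our reference; only zip's internal one remains
t2 = next(z)      # ('b', 'd')
# On CPython, zip typically reuses the tuple object, so the ids match:
print(t2, id(t2) == addr)
```

Keeping a reference to the previous tuple (as `deque(maxlen=1)` does) blocks this path, which is exactly why the benchmark uses maxlen=1.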

Note that these benchmarks were run independently, so times across benchmark runs aren’t totally comparable because the machine might’ve been differently busy. Only within each benchmark run are times comparable.

Thanks. I tried on replit.com’s 3.10.11 now and can reproduce it there:

 22.01 ± 0.35 ms  zip_
 28.33 ± 0.53 ms  pairwise_

Python: 3.10.11 (main, Apr  4 2023, 22:10:32) [GCC 12.2.0]

Perhaps a better question is “Why is pairwise slower than zip on Linux?”

*James Parrot has duplicated the slowdown on Windows for 3.10 and 3.11.

Hi Steven,
That would be a good question, but there can still be a difference on Windows machines too. Sorry, I meant to chip in earlier with my results from Python 3.11 on Windows 11, which confirm Stefan’s findings, but thought I’d leave it to you experts. I installed 3.10 just now out of curiosity to try to recreate your findings, and will report back shortly with the results for 3.9.

 36.50 ± 0.30 ms  zip_
 45.76 ± 0.70 ms  pairwise_

Python: 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]

 37.93 ± 0.20 ms  zip_
 44.37 ± 0.56 ms  pairwise_

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]

Thanks for double-checking the results on your machine.

No need to check as pairwise was added in 3.10.

Indeed it was. Python 3.10.8 still shows a difference on mine. I assume the speedup on yours is due to your hardware, not to conda-forge.

 34.55 ± 0.16 ms  zip_
 45.55 ± 0.28 ms  pairwise_

Python: 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]

Apple M2:

 10.41 ± 0.04 ms  zip_
 11.45 ± 0.09 ms  pairwise_

Python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:33:12) [Clang 15.0.7 ]

If I modify

consume = deque(maxlen=1).extend

to

consume = deque(maxlen=3).extend  # or a value > 3

then I consistently get a reversed order:

 11.57 ± 0.01 ms  pairwise_
 12.05 ± 0.02 ms  zip_

There’s also a comment right above that, suggesting that it could reuse the tuple. That’s what zip_next is doing, although the comment references enumobject.c. Seems likely that’s the culprit?

No, that tuple reuse optimization isn’t it. I mentioned that in my second post here, where I used maxlen=0 to show the effect of that optimization. My real benchmark uses maxlen=1 to avoid that optimization (the deque then keeps a reference to the latest tuple, so zip can’t reuse it).

Interesting. For me, zip remains significantly faster, but less so. With maxlen=7, zip is still slightly faster, and with maxlen≥8, pairwise becomes slightly faster (but even with maxlen=50 it’s only slightly faster).

The benchmark’s input repeat(None, 10**6) is so fast that the difference between calling the iterator and reusing an element is not noticeable. With a slower input, the difference is more evident.

def it():
    for _ in range(10 ** 6):
        yield None

The new timings are:

 32.01 ± 0.01 ms  pairwise_
 54.73 ± 0.03 ms  zip_

The pairwise_ and zip_ code are not equivalent.

>>> def it():
...     return iter('abc')
... 
>>> list(pairwise_())
[('a', 'b'), ('b', 'c')]
>>> list(zip_())
[('a', 'a'), ('b', 'b'), ('c', 'c')]

@Wombat Not sure what your point is. I showed something surprising and asked for the reason. You showed something expected and … I don’t see why. And yes, I know they’re not equivalent. The zip one involves more work and was still faster.

It varies quite a bit across runs for me on Windows 10, but pairwise is consistently faster than zip under the current python.org distribution. Typical:

 17.28 ± 0.03 ms  pairwise_
 18.79 ± 0.04 ms  zip_

Python: 3.13.6 (tags/v3.13.6:4e66535, Aug  6 2025, 14:36:00) [MSC v.1944 64 bit (AMD64)]

Also under PyPy, which is quite a bit slower(!) than CPython on this task:

 33.85 ± 0.28 ms  pairwise_
 42.54 ± 0.27 ms  zip_

Python: 3.10.12 (af44d0b8114cb82c40a07bb9ee9c1ca8a1b3688c, Jun 15 2023, 15:42:22)
[PyPy 7.3.12 with MSC v.1929 64 bit (AMD64)]

There’s typically no answer to things like this short of staring at the generated machine code. The compiler and compiler flags in use can have huge effects on that.

“Surprise” means that a theoretical expectation was violated. Your stated theoretical expectation was that reusing an element would be cheaper than a second iterator call.

First, reusing an element is cheap but it isn’t free. It involves reading a structure element, replacing that element, and updating reference counts.

Second, iterator calls can be very cheap, as in the case of repeat with None. The call step in pairwise is just new = (*Py_TYPE(it)->tp_iternext)(it), and the response in repeat is Py_NewRef(ro->element), which happens in parallel with if (ro->cnt > 0) ro->cnt--;.

With that knowledge, an updated expectation for your example should be that the costs would be about the same, plus or minus noise due to different builds run on different processors. Running your original code today on the Python 3.14 release candidate shows the zip variant as a bit slower.

 11.05 ± 0.02 ms  pairwise_
 11.81 ± 0.03 ms  zip_

But with another example (the one I provided), the iterator calls are in fact slower than the value reuse logic.

You asked for the reason and it is this: you were only surprised because you chose an example that violated your premise that iterator calls are more expensive than value reuse. With an alternate example, you can see that the expectation was correct in general and that there is no surprise.

The whole thread boils down to, “I am surprised that repeat.__next__ is so cheap that it can rival the speed of accessing a structure member.”

When investigating your question, my surprise and delight is how fast both pieces of code run. After dividing by the 10 ** 6 iterations, the timings are in the 11 ns range, making that iterator chain one of the least expensive things you can do in Python. Think about it. The repeat iterator supplies values, the zip and pairwise iterators retrieve those values, manipulate them, and emit tuples to a deque that consumes them, all in 11 ns. For Python, that is impressive.

Of course.

I know. That’s why I used it.

But it involves similar actions to reusing an element, and more. Especially in zip, which additionally fetches both iterators via a loop and via its tuple of iterators. So I still expect reusing an element and using only one iterator (with more direct access) to be faster.

Why talk about the low costs for pairwise? That’s not going to explain why zip was faster.

The code has changed, so I don’t see the point of this. How would that help explain speeds of the old code?

Very doubtful. Where’s the proof?

In particular, where’s the proof that the iterator calls were not more expensive or even were much less expensive than value reuse, and where is the proof that none of the other differences were reasons?

No. (Again, I see no proof that it did rival it, and it would have to significantly beat it in order to potentially explain why zip was significantly faster.)

11 ns, not μs.

I can’t try it on the old system anymore, but I believe (based on my memory as well as measuring again now) that deque(repeat(None, 10**6), 0) took about 3 ns per element. My original benchmark had 23 ns per pair for zip and 33 ns per pair for pairwise. So if Zeke were right and the reason for that 10 ns difference were that reusing an element is slow, it would have to take 3+10=13 ns per element. Over four times slower than getting an element from the iterator. I think that’s unrealistic, and more likely the reason was something else.
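For reference, that per-element baseline can be re-measured with a sketch like this (absolute numbers depend entirely on the machine, so only the rough magnitude matters):

```python
from collections import deque
from itertools import repeat
from timeit import timeit

n = 10**6
# Drain a million Nones into a zero-length deque and report the
# per-element cost; min of a few runs to reduce noise.
t = min(timeit(lambda: deque(repeat(None, n), maxlen=0), number=1)
        for _ in range(5))
print(f'{t / n * 1e9:.1f} ns per element')
```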

Topic has devolved into arguing and nitpicking. It was also resurrected after 2 years. Don’t do any of those things.