Sometimes (example) I use `islice` to get a tail part of an iterator/iterable:

```python
it = islice(iterable, start, None)
```
But for performance, I prefer getting a direct iterator and using `islice` (directly or via `consume`) only to advance it:

```python
it = iter(iterable)
consume(it, start)
```
This way I don't have all tail values going through an `islice` iterator, slowing down the iteration.
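For illustration, here is a minimal sketch showing that both approaches yield the same tail (the `consume` helper is the recipe from the itertools docs, also used in the benchmark script below):

```python
from itertools import islice
import collections

def consume(iterator, n=None):
    # Advance the iterator n steps ahead; if n is None, consume it entirely.
    # (The "consume" recipe from the itertools documentation.)
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

data = range(10)

# Way 1: every remaining value passes through the islice iterator.
tail1 = list(islice(data, 3, None))

# Way 2: advance a plain iterator once, then iterate it directly.
it = iter(data)
consume(it, 3)
tail2 = list(it)

print(tail1)           # [3, 4, 5, 6, 7, 8, 9]
print(tail1 == tail2)  # True
```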
Benchmark of consuming `repeat(None, 10**7)`, using the above two ways to start at index 100:

```
80.1 ms ± 0.2 ms  islice_iterator
38.6 ms ± 0.2 ms  direct_iterator
```
Python version:
3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.3.0]
Benchmark script

```python
from itertools import islice, repeat
from timeit import timeit
import collections
from statistics import mean, stdev
import sys

def islice_iterator(iterable, start):
    return islice(iterable, start, None)

def direct_iterator(iterable, start):
    it = iter(iterable)
    consume(it, start)
    return it

funcs = islice_iterator, direct_iterator

# Existing recipe
def consume(iterator, n=None):
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

# Test arguments
def args():
    return repeat(None, 10**7), 100

# Correctness
expect = list(funcs[0](*args()))
for f in funcs[1:]:
    print(list(f(*args())) == expect, f.__name__)

# For timing
times = {f: [] for f in funcs}

def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):6.1f} ms ± {stdev(ts):3.1f} ms '

# Timing
number = 1
for _ in range(25):
    for f in funcs:
        t = timeit(lambda: consume(f(*args())), number=number) / number
        times[f].append(t)

# sys.stdout = open('out.txt', 'w')

# Timing results
for f in sorted(funcs, key=stats, reverse=True):
    print(stats(f), f.__name__)

print('\nPython version:')
print(sys.version)
```
Could `islice` perhaps do this by default? I.e., when given only `start`, it would advance the iterator (that it creates anyway) and then return that iterator instead of an `islice` iterator? The two downsides I see:

- Would need some code to do this.
- Checking whether it is a `start`-only case takes a tiny bit of time.
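In pure Python, the proposed start-only fast path might look roughly like this (`islice_` is my hypothetical wrapper, not a real API; the C implementation would of course do this internally):

```python
from itertools import islice

def islice_(iterable, *args):
    # Hypothetical sketch: behave like islice, but for the start-only
    # case islice(iterable, start, None), advance the underlying
    # iterator and return it directly instead of an islice object.
    if len(args) == 2 and args[1] is None:
        start = args[0] or 0
        it = iter(iterable)
        # Advance past the first `start` items (consume recipe).
        next(islice(it, start, start), None)
        return it
    return islice(iterable, *args)

print(list(islice_(range(10), 3, None)))  # [3, 4, 5, 6, 7, 8, 9]
print(list(islice_(range(10), 3)))        # [0, 1, 2]
```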
I acknowledge that my use case is not the most “real world” one: I benchmark things. A lot. And I try to minimize the overhead of the benchmarking itself, in order to more clearly see speed differences of the things I’m actually benchmarking. Optimizations like the above are part of this effort to minimize the overhead. And yes, I can of course keep doing the optimization myself. But I imagine it could also be beneficial for others.