And by pairwise I mean nwise… my plan is to change my implementation of nwise to work in terms of tee. Then, benchmark it to see that I see at least equivalent performance
The reason the current C implementation pairwise
is slower than the python recipe seems to be the usage of Py_TuplePack
(as was already hinted at by @Stefan2 in the linked discussion on pairwise and zip).
Replacing Py_TuplePack
(which uses varargs) with a direct Py_TuplePack2
makes the pairwise
C implementation as fast as the pure python recipe. Branch: Comparing python:main...eendebakpt:putuple_pack2 · python/cpython · GitHub
The Py_TuplePack
with arguments 1 and 2 is used several times in the Python codebase, so perhaps we can add Py_TuplePack1
and Py_TuplePack2
to the python API. I will create an issue to discuss that a bit later.
Update: issue is at Add Py_TuplePack2 and Py_TuplePack1 · Issue #118222 · python/cpython · GitHub
I still think that implementing my solution for nwise in terms of _tee may be worth a try. I do want to make it perform as well as it can and make it more maintainable.
Instead of adding nwise
another option would be to add a second argument n
to pairwise
for the number of items to return (the default being n=2
)
well, then it’s not really pairwise. it’s nwise, also in the current c implementation that’s not possible due to how he only holds the previous value in the struct. You would need a way to hold more values. Which would be entirely changing the pairwise implementation.
I actually can agree with the sentiment that pairwise is a silly function as
nwise(iter, n=2) → is pairwise. I would rather have the ability to set the window or it really limits the use of a pairwise/nwise iterator
what do you guys think of something like this
static PyObject *
nwise_new_impl(PyTypeObject *type, PyObject *iterable, Py_ssize_t n) {
PyObject *it;
nwiseobject *no;
// Get iterator from the input iterable
it = PyObject_GetIter(iterable);
if (it == NULL) {
return NULL;
}
// Allocate memory for nwiseobject
no = (nwiseobject *)type->tp_alloc(type, 0);
if (no == NULL) {
Py_DECREF(it);
return NULL;
}
// Create n independent iterators using tee
PyObject *tee_args = Py_BuildValue("(Oi)", it, n);
if (tee_args == NULL) {
Py_DECREF(it);
Py_DECREF(no);
return NULL;
}
PyObject *tees = itertools_tee_impl(NULL, tee_args);
Py_DECREF(tee_args);
if (tees == NULL) {
Py_DECREF(it);
Py_DECREF(no);
return NULL;
}
// Initialize nwiseobject struct
no->it = it;
no->n = n;
no->tees = tees;
no->current_index = 0;
return (PyObject *)no;
}
I could use tee to give me the iterators… then I can just build a tuple out of the iterator quickly probably even reuse the memory, and it’s essentially wrapping tee, but making it convenient.
Just some nwise
implementations/benchmarks…
10 windows of length 1000:
iterable = list(range(1009))
n = 1000
consume = deque(maxlen=1).extend
0.07 ± 0.00 ms consume(nwise_tuplidoo(iterable, n))
0.07 ± 0.00 ms consume(nwise_deque(iterable, n))
0.33 ± 0.00 ms consume(nwise_staggered(iterable, n))
4.01 ± 0.05 ms consume(nwise_yield_from(iterable, n))
4.10 ± 0.15 ms consume(nwise_zip(iterable, n))
4.11 ± 0.21 ms consume(nwise_original(iterable, n))
Python: 3.12.0 (main, Oct 7 2023, 10:42:35) [GCC 13.2.1 20230801]
100 windows of length 100:
iterable = list(range(199))
n = 100
consume = deque(maxlen=1).extend
0.09 ± 0.00 ms consume(nwise_tuplidoo(iterable, n))
0.11 ± 0.00 ms consume(nwise_deque(iterable, n))
0.12 ± 0.00 ms consume(nwise_staggered(iterable, n))
0.16 ± 0.00 ms consume(nwise_zip(iterable, n))
0.16 ± 0.00 ms consume(nwise_yield_from(iterable, n))
0.17 ± 0.00 ms consume(nwise_original(iterable, n))
1000 windows of length 10:
iterable = list(range(1009))
n = 10
consume = deque(maxlen=1).extend
0.11 ± 0.00 ms consume(nwise_zip(iterable, n))
0.12 ± 0.00 ms consume(nwise_staggered(iterable, n))
0.15 ± 0.00 ms consume(nwise_yield_from(iterable, n))
0.15 ± 0.00 ms consume(nwise_original(iterable, n))
0.27 ± 0.00 ms consume(nwise_deque(iterable, n))
0.29 ± 0.00 ms consume(nwise_tuplidoo(iterable, n))
5000 windows of length 2:
iterable = list(range(5001))
n = 2
consume = deque(maxlen=1).extend
0.25 ± 0.00 ms consume(nwise_zip(iterable, n))
0.25 ± 0.00 ms consume(nwise_staggered(iterable, n))
0.44 ± 0.00 ms consume(nwise_original(iterable, n))
0.44 ± 0.01 ms consume(nwise_yield_from(iterable, n))
1.03 ± 0.01 ms consume(nwise_deque(iterable, n))
1.23 ± 0.03 ms consume(nwise_tuplidoo(iterable, n))
Benchmark script
from timeit import repeat
setup = '''
from itertools import pairwise, tee, islice
from collections import deque
import collections
def nwise_original(lst, n):
iters = tee(lst, n)
for idx, itr in enumerate(iters):
next(islice(itr, idx, idx), None)
for group in zip(*iters):
yield group
def nwise_yield_from(lst, n):
iters = tee(lst, n)
for idx, itr in enumerate(iters):
next(islice(itr, idx, idx), None)
yield from zip(*iters)
def nwise_zip(lst, n):
iters = tee(lst, n)
for idx, itr in enumerate(iters):
next(islice(itr, idx, idx), None)
return zip(*iters)
def nwise_staggered(lst, n):
iters = [None] * n
it = iter(lst)
for i in range(n):
it, iters[i] = tee(it)
next(it, None)
return zip(*iters)
def nwise_deque(iterable, n):
it = iter(iterable)
window = collections.deque(islice(it, n-1), maxlen=n)
for x in it:
window.append(x)
yield tuple(window)
def nwise_tuplidoo(lst, n):
it = iter(lst)
window = None, *islice(it, n-1)
for z in zip(it):
window = window[1:] + z
yield window
'''
case = '''\
iterable = list(range(5001))
n = 2
consume = deque(maxlen=1).extend
'''
setup += case
names = 'nwise_original nwise_yield_from nwise_zip nwise_staggered nwise_deque nwise_tuplidoo'.split()
exec(setup)
for name in names:
func = globals()[name]
print(*func(range(6), 3))
codes = [
f'consume({name}(iterable, n))'
for name in names
]
from timeit import timeit
from statistics import mean, stdev
import sys
import random
print(case)
times = {code: [] for code in codes}
def stats(code):
ts = [t * 1e3 for t in sorted(times[code])[:5]]
return f'{mean(ts):5.2f} ± {stdev(ts):4.2f} ms '
for _ in range(25):
random.shuffle(codes)
for code in codes:
number = 10**1
t = timeit(code, setup, number=number) / number
times[code].append(t)
for code in sorted(codes, key=stats):
print(stats(code), code)
print('\nPython:', sys.version)
That gives a really good goal for my implementation. I will see what I can come up with.
Btw I’d like n=0 to be supported as it once was, before @rhettinger disabled it for claimed “bogus results” that I still don’t believe and which wouldn’t even have been the fault of the recipe but a bug in deque
. (I think he must’ve made some mistake, tested some code different from the actual recipe).
deal! if I understand that should and empty iterator right?
ie
lst = [1, 2, 3, 4]
list(nwise(lst, 0)) # -> []
I’d focus on the API and use cases before being concerned with implementation and performance.
The use cases are for sliding_window problems. It’s a single function. I think it should take nwise(iterable, n). There are 100% times to use nwise.
No, I mean five empty tuples in your example. Like it did previously, and like I discussed back there. So that the general rule “nwise(iterable, n) yields len(iterable)-n+1 tuples” doesn’t break down for n=0. You might think that’s useless because the empty tuples have “no content”, but sometimes I just need the number of things, and then I’d want that general rule followed. Just like
import re
print('1234'.count(''))
print(len(re.findall('', '1234')))
also both print 5
.
That first SO question is rather an argument against this. It’s about the algorithm technique, where the whole point of “sliding” is to avoid the linear time cost of handling each window from scratch, by only updating your state with what the edges of the window have. See the top answer for a good exposition. So this nwise
or sliding_window
function would already by itself violate that goal, as it provides full window tuples which cost linear time to provide. And it encourages using them like that, i.e., encourages inefficient algorithms, and multiple times I’ve seen suggestions rejected with the argument that they encourage inefficient code.
The newest (I didn’t check further) of those 600+ questions is also an example of that. It does not look at full windows but only updates the edges. And not even at the same speed - it considers windows of different sizes. Even further away from nwise
/ sliding_window
.
LeetCode already has more-itertools installed. You just need to import it. Advent of Code, as far as I know, doesn’t have an editor and doesn’t even want your code at all, only your answer that you got by running your code somewhere else. And except for trivial problems, the O(Nn) runtime (where N is the iterable length and n is the window size) would likely be too inefficient for problems on such sites. They usually require the O(N) runtime of a proper sliding window algorithm.
While this is true right now, it’s not once there’s a JIT
Even then it should always be possible, but it might require quite a bit more effort. The JIT can’t exactly perform magic.
It can optimise a hot loop based on the types and functions within it, differently from other loops using the same windowing function. Maybe it won’t right now, but theoretically a JIT can outperform a monolithic C algorithm.
- That isn’t relevant for the example(s) at hand as far as I can tell.
- I didn’t say “monolithic”. Nothing prevents the C code from having dozens of different implementations for different data types. But this is ofcourse unmanageable in practice.
It would still need to switch between them, and wouldn’t have visibility into the code in the loop.
I think it is a little relevant, because moving to C for miniscule gains now may cost performance gains in future.
Right the gains would not really be the point. We have itertools batches. Which takes two lines to implement. I chose the name nwise because of pairwise. It’s a sliding window, but it’s all the same thing. I am working on changing my implementation.