Nwise itertools

lsproule · April 23, 2024, 10:14pm

And by pairwise I mean nwise… my plan is to change my implementation of nwise to work in terms of tee. Then, benchmark it to see that I see at least equivalent performance

eendebakpt · April 23, 2024, 10:32pm

The reason the current C implementation pairwise is slower than the python recipe seems to be the usage of Py_TuplePack (as was already hinted at by @Stefan2 in the linked discussion on pairwise and zip).

Replacing Py_TuplePack (which uses varargs) with a direct Py_TuplePack2 makes the pairwise C implementation as fast as the pure python recipe. Branch: Comparing python:main...eendebakpt:putuple_pack2 · python/cpython · GitHub

The Py_TuplePack with arguments 1 and 2 is used several times in the Python codebase, so perhaps we can add Py_TuplePack1 and Py_TuplePack2 to the python API. I will create an issue to discuss that a bit later.

Update: issue is at Add Py_TuplePack2 and Py_TuplePack1 · Issue #118222 · python/cpython · GitHub

lsproule · April 23, 2024, 10:36pm

I still think that implementing my solution for nwise in terms of _tee may be worth a try. I do want to make it perform as well as it can and make it more maintainable.

eendebakpt · April 23, 2024, 10:38pm

Instead of adding nwise another option would be to add a second argument n to pairwise for the number of items to return (the default being n=2)

lsproule · April 23, 2024, 10:39pm

well, then it’s not really pairwise. it’s nwise, also in the current c implementation that’s not possible due to how he only holds the previous value in the struct. You would need a way to hold more values. Which would be entirely changing the pairwise implementation.

I actually can agree with the sentiment that pairwise is a silly function as
nwise(iter, n=2) → is pairwise. I would rather have the ability to set the window or it really limits the use of a pairwise/nwise iterator

lsproule · April 23, 2024, 11:05pm

what do you guys think of something like this

static PyObject *
nwise_new_impl(PyTypeObject *type, PyObject *iterable, Py_ssize_t n) {
    PyObject *it;
    nwiseobject *no;

    // Get iterator from the input iterable
    it = PyObject_GetIter(iterable);
    if (it == NULL) {
        return NULL;
    }

    // Allocate memory for nwiseobject
    no = (nwiseobject *)type->tp_alloc(type, 0);
    if (no == NULL) {
        Py_DECREF(it);
        return NULL;
    }

    // Create n independent iterators using tee
    PyObject *tee_args = Py_BuildValue("(Oi)", it, n);
    if (tee_args == NULL) {
        Py_DECREF(it);
        Py_DECREF(no);
        return NULL;
    }
    PyObject *tees = itertools_tee_impl(NULL, tee_args);
    Py_DECREF(tee_args);
    if (tees == NULL) {
        Py_DECREF(it);
        Py_DECREF(no);
        return NULL;
    }

    // Initialize nwiseobject struct
    no->it = it;
    no->n = n;
    no->tees = tees;
    no->current_index = 0;

    return (PyObject *)no;
}

I could use tee to give me the iterators… then I can just build a tuple out of the iterator quickly probably even reuse the memory, and it’s essentially wrapping tee, but making it convenient.

Stefan2 · April 23, 2024, 11:32pm

Just some nwise implementations/benchmarks…

10 windows of length 1000:

iterable = list(range(1009))
n = 1000
consume = deque(maxlen=1).extend

 0.07 ± 0.00 ms  consume(nwise_tuplidoo(iterable, n))
 0.07 ± 0.00 ms  consume(nwise_deque(iterable, n))
 0.33 ± 0.00 ms  consume(nwise_staggered(iterable, n))
 4.01 ± 0.05 ms  consume(nwise_yield_from(iterable, n))
 4.10 ± 0.15 ms  consume(nwise_zip(iterable, n))
 4.11 ± 0.21 ms  consume(nwise_original(iterable, n))

Python: 3.12.0 (main, Oct  7 2023, 10:42:35) [GCC 13.2.1 20230801]

100 windows of length 100:

iterable = list(range(199))
n = 100
consume = deque(maxlen=1).extend

 0.09 ± 0.00 ms  consume(nwise_tuplidoo(iterable, n))
 0.11 ± 0.00 ms  consume(nwise_deque(iterable, n))
 0.12 ± 0.00 ms  consume(nwise_staggered(iterable, n))
 0.16 ± 0.00 ms  consume(nwise_zip(iterable, n))
 0.16 ± 0.00 ms  consume(nwise_yield_from(iterable, n))
 0.17 ± 0.00 ms  consume(nwise_original(iterable, n))

1000 windows of length 10:

iterable = list(range(1009))
n = 10
consume = deque(maxlen=1).extend

 0.11 ± 0.00 ms  consume(nwise_zip(iterable, n))
 0.12 ± 0.00 ms  consume(nwise_staggered(iterable, n))
 0.15 ± 0.00 ms  consume(nwise_yield_from(iterable, n))
 0.15 ± 0.00 ms  consume(nwise_original(iterable, n))
 0.27 ± 0.00 ms  consume(nwise_deque(iterable, n))
 0.29 ± 0.00 ms  consume(nwise_tuplidoo(iterable, n))

5000 windows of length 2:

iterable = list(range(5001))
n = 2
consume = deque(maxlen=1).extend

 0.25 ± 0.00 ms  consume(nwise_zip(iterable, n))
 0.25 ± 0.00 ms  consume(nwise_staggered(iterable, n))
 0.44 ± 0.00 ms  consume(nwise_original(iterable, n))
 0.44 ± 0.01 ms  consume(nwise_yield_from(iterable, n))
 1.03 ± 0.01 ms  consume(nwise_deque(iterable, n))
 1.23 ± 0.03 ms  consume(nwise_tuplidoo(iterable, n))

Benchmark script

from timeit import repeat

setup = '''
from itertools import pairwise, tee, islice
from collections import deque
import collections

def nwise_original(lst, n):
    iters = tee(lst, n)
    for idx, itr in enumerate(iters):
        next(islice(itr, idx, idx), None)
    for group in zip(*iters):
        yield group

def nwise_yield_from(lst, n):
    iters = tee(lst, n)
    for idx, itr in enumerate(iters):
        next(islice(itr, idx, idx), None)
    yield from zip(*iters)

def nwise_zip(lst, n):
    iters = tee(lst, n)
    for idx, itr in enumerate(iters):
        next(islice(itr, idx, idx), None)
    return zip(*iters)

def nwise_staggered(lst, n):
    iters = [None] * n
    it = iter(lst)
    for i in range(n):
        it, iters[i] = tee(it)
        next(it, None)
    return zip(*iters)

def nwise_deque(iterable, n):
    it = iter(iterable)
    window = collections.deque(islice(it, n-1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)

def nwise_tuplidoo(lst, n):
    it = iter(lst)
    window = None, *islice(it, n-1)
    for z in zip(it):
        window = window[1:] + z
        yield window
'''

case = '''\
iterable = list(range(5001))
n = 2
consume = deque(maxlen=1).extend
'''

setup += case

names = 'nwise_original nwise_yield_from nwise_zip nwise_staggered nwise_deque nwise_tuplidoo'.split()

exec(setup)
for name in names:
    func = globals()[name]
    print(*func(range(6), 3))

codes = [
    f'consume({name}(iterable, n))'
    for name in names
]

from timeit import timeit
from statistics import mean, stdev
import sys
import random

print(case)

times = {code: [] for code in codes}
def stats(code):
    ts = [t * 1e3 for t in sorted(times[code])[:5]]
    return f'{mean(ts):5.2f} ± {stdev(ts):4.2f} ms '
for _ in range(25):
    random.shuffle(codes)
    for code in codes:
        number = 10**1
        t = timeit(code, setup, number=number) / number
        times[code].append(t)
for code in sorted(codes, key=stats):
    print(stats(code), code)

print('\nPython:', sys.version)

Attempt This Online!

lsproule · April 23, 2024, 11:42pm

That gives a really good goal for my implementation. I will see what I can come up with.

Stefan2 · April 23, 2024, 11:54pm

Btw I’d like n=0 to be supported as it once was, before @rhettinger disabled it for claimed “bogus results” that I still don’t believe and which wouldn’t even have been the fault of the recipe but a bug in deque. (I think he must’ve made some mistake, tested some code different from the actual recipe).

lsproule · April 23, 2024, 11:56pm

deal! if I understand that should and empty iterator right?

ie

lst = [1, 2, 3, 4]
list(nwise(lst, 0))  # -> []

ericvsmith · April 24, 2024, 12:21am

I’d focus on the API and use cases before being concerned with implementation and performance.

lsproule · April 24, 2024, 12:23am

The use cases are for sliding_window problems. It’s a single function. I think it should take nwise(iterable, n). There are 100% times to use nwise.

Stefan2 · April 24, 2024, 12:34am

No, I mean five empty tuples in your example. Like it did previously, and like I discussed back there. So that the general rule “nwise(iterable, n) yields len(iterable)-n+1 tuples” doesn’t break down for n=0. You might think that’s useless because the empty tuples have “no content”, but sometimes I just need the number of things, and then I’d want that general rule followed. Just like

import re

print('1234'.count(''))
print(len(re.findall('', '1234')))

also both print 5.

Attempt This Online!

Stefan2 · April 27, 2024, 10:57am

That first SO question is rather an argument against this. It’s about the algorithm technique, where the whole point of “sliding” is to avoid the linear time cost of handling each window from scratch, by only updating your state with what the edges of the window have. See the top answer for a good exposition. So this nwise or sliding_window function would already by itself violate that goal, as it provides full window tuples which cost linear time to provide. And it encourages using them like that, i.e., encourages inefficient algorithms, and multiple times I’ve seen suggestions rejected with the argument that they encourage inefficient code.

The newest (I didn’t check further) of those 600+ questions is also an example of that. It does not look at full windows but only updates the edges. And not even at the same speed - it considers windows of different sizes. Even further away from nwise / sliding_window.

LeetCode already has more-itertools installed. You just need to import it. Advent of Code, as far as I know, doesn’t have an editor and doesn’t even want your code at all, only your answer that you got by running your code somewhere else. And except for trivial problems, the O(Nn) runtime (where N is the iterable length and n is the window size) would likely be too inefficient for problems on such sites. They usually require the O(N) runtime of a proper sliding window algorithm.

alicederyn · April 27, 2024, 2:38pm

While this is true right now, it’s not once there’s a JIT

MegaIng · April 27, 2024, 2:39pm

Even then it should always be possible, but it might require quite a bit more effort. The JIT can’t exactly perform magic.

alicederyn · April 27, 2024, 2:41pm

It can optimise a hot loop based on the types and functions within it, differently from other loops using the same windowing function. Maybe it won’t right now, but theoretically a JIT can outperform a monolithic C algorithm.

MegaIng · April 27, 2024, 2:44pm

That isn’t relevant for the example(s) at hand as far as I can tell.
I didn’t say “monolithic”. Nothing prevents the C code from having dozens of different implementations for different data types. But this is ofcourse unmanageable in practice.

alicederyn · April 27, 2024, 2:45pm

It would still need to switch between them, and wouldn’t have visibility into the code in the loop.

I think it is a little relevant, because moving to C for miniscule gains now may cost performance gains in future.

lsproule · April 27, 2024, 4:15pm

Right the gains would not really be the point. We have itertools batches. Which takes two lines to implement. I chose the name nwise because of pairwise. It’s a sliding window, but it’s all the same thing. I am working on changing my implementation.