Should itertools.starmap be complemented with itertools.starfilter and functools.starreduce?

Given that itertools.starmap exists in the standard library, should itertools.starfilter and functools.starreduce also exist?

I appreciate that not every five-line function belongs in the standard library (which is why the itertools docs have a large collection of “recipes” at the bottom), but having a “star” version of map and not of its functional-programming partners filter and reduce feels inconsistent.

I’m a big fan of starmap because it allows expressive variable names in multiple-argument lambdas, rather than having to index into a tuple. Both starfilter and starreduce would bring this same benefit.

starfilter

starfilter would essentially be equivalent to

def starfilter(function, iterable):
    for args in iterable:
        if function(*args):
            yield args

so that instead of:

filter(lambda x: x[0] * x[1] > 5, [(1, 0), (5, 6), (8, 9)])

you could write:

starfilter(lambda x, y: x * y > 5, [(1, 0), (5, 6), (8, 9)])

both resulting in an iterator yielding the tuples (5, 6) and (8, 9).
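As a quick sanity check, the two spellings can be compared directly, using the pure-Python starfilter sketch from above:

```python
def starfilter(function, iterable):
    # yield only those argument tuples for which function(*args) is true
    for args in iterable:
        if function(*args):
            yield args

pairs = [(1, 0), (5, 6), (8, 9)]
via_filter = list(filter(lambda x: x[0] * x[1] > 5, pairs))
via_starfilter = list(starfilter(lambda x, y: x * y > 5, pairs))
assert via_filter == via_starfilter == [(5, 6), (8, 9)]
```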

starreduce

starreduce would essentially be equivalent to

initial_missing = object()

def starreduce(function, iterable, initial=initial_missing, /):
    it = iter(iterable)
    if initial is initial_missing:
        value = next(it)
    else:
        value = initial
    for element in it:
        value = function(value, *element)
    return value

so that instead of:

reduce(lambda acc, x: acc * x[0] + x[1], [(1, 0), (5, 6), (8, 9)], 9)

you could write:

starreduce(lambda acc, x, y: acc * x + y, [(1, 0), (5, 6), (8, 9)], 9)

both with a result of 417.
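Tracing the accumulator step by step confirms that result:

```python
from functools import reduce

pairs = [(1, 0), (5, 6), (8, 9)]
# acc starts at 9, then:
# (1, 0):  9 * 1 + 0 = 9
# (5, 6):  9 * 5 + 6 = 51
# (8, 9): 51 * 8 + 9 = 417
result = reduce(lambda acc, x: acc * x[0] + x[1], pairs, 9)
assert result == 417
```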

Or just as an additional “recipe” in the docs?

Failing support for their inclusion in the standard library, do they at least deserve to be added as “recipes” in the docs?

Considering that more-itertools has neither of these, they don’t yet appear to be that commonly requested. Maybe add them there first and then see if they are being used?


starfilter(func, iterable) is equivalent to filter(lambda args: func(*args), iterable), or simply (args for args in iterable if func(*args)).
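A minimal check that these spellings select the same elements (reusing the earlier example data):

```python
data = [(1, 0), (5, 6), (8, 9)]
pred = lambda x, y: x * y > 5

via_filter = list(filter(lambda args: pred(*args), data))
via_genexp = list(args for args in data if pred(*args))
assert via_filter == via_genexp == [(5, 6), (8, 9)]
```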

starreduce(func, iterable) is equivalent to reduce(lambda acc, args: func(acc, *args), iterable). Anyway, an explicit loop is preferable to reduce(), so you would be better off using one instead of starreduce() too.
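For the starreduce example from the original post, the explicit-loop version would look something like this:

```python
pairs = [(1, 0), (5, 6), (8, 9)]

acc = 9  # the initial value passed to starreduce
for x, y in pairs:
    acc = acc * x + y
assert acc == 417
```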

BTW, if you use map() or filter() with a lambda, you are doing it incorrectly. Generator expressions are clearer, and map() or filter() with a lambda does not even have a performance advantage. Instead of

filter(lambda x: x[0] * x[1] > 5, iterable)

use

((x, y) for x, y in iterable if x * y > 5)

Old starfilter proposal for more-itertools:


Looks like you’re wrong: your generator is more than twice as slow in my benchmark:

  9.9 ± 0.1 ms  list(filter(lambda x: x[0] * x[1] > 5, iterable))
 21.2 ± 0.4 ms  list(((x, y) for x, y in iterable if x * y > 5))
  7.3 ± 0.1 ms  list((x for x in iterable if x[0] * x[1] > 5))
  7.2 ± 0.2 ms  list((xy for xy in iterable for x, y in [xy] if x * y > 5))

Python: 3.13.0 (main, Nov  9 2024, 10:04:25) [GCC 14.2.1 20240910]

That’s with iterable = [(2, 3)] * 10**5.

benchmark script
from timeit import timeit
from statistics import mean, stdev
import sys
import random

funcs = [
    'filter(lambda x: x[0] * x[1] > 5, iterable)',
    '((x, y) for x, y in iterable if x * y > 5)',
    '(x for x in iterable if x[0] * x[1] > 5)',
    '(xy for xy in iterable for x, y in [xy] if x * y > 5)',
]

funcs = [f'list({f})' for f in funcs]

setup = '''
import gc
iterable = [(2, 3)] * 10**5
gc.enable()
'''

times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):5.1f} ± {stdev(ts):3.1f} ms '
for _ in range(100):
    for f in random.sample(funcs, len(funcs)):
        t = timeit(f, setup, number=1)
        times[f].append(t)
for f in funcs:
    print(stats(f), f)

print('\nPython:', sys.version)

Attempt This Online!

I have some code that filters iterables with lambdas, and when I compared it with the equivalent code using generator expressions, the performance differences seemed to depend on the iterable type, although I don’t know exactly why: for ranges, for example, the generator expression was faster, while for iterables returned by custom object methods there was hardly any difference.

This is an example for finding all positive integers coprime to n=100, but counting backwards to 1:

In [147]: timeit.Timer(stmt='list(m for m in range(100, 0, -1) if math.gcd(m, 100) == 1)', setup='import math', globals=globals()).timeit(number=1000000)
Out[147]: 4.826646083034575

In [148]: timeit.Timer(stmt='list(filter(lambda m: math.gcd(m, 100) == 1, range(100, 0, -1)))', setup='import math', globals=globals()).timeit(number=1000000)
Out[148]: 7.005942624993622

The generator version is considerably faster, and I’ll probably use it just because of that. It’s probably more readable as well. But I’m not sure I would agree that the filter version is “incorrect”: if so, in what way?

Interesting. I only compared with single-variable iteration.

In your examples, the slow case creates 10**5 new tuples, while the fast cases just make 10**5 references to the same tuple. Creating and destroying 10**5 tuples takes time. If the iterable produces new tuples (for example, zip(), enumerate() or dict.items()), the difference should be smaller, or even reversed.
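The reuse-versus-creation point can be seen with an identity check: filter() passes the original tuple objects straight through, while the unpacking generator expression repacks x and y into brand-new tuples. A small sketch (with a shortened iterable):

```python
iterable = [(2, 3)] * 3  # three references to the same tuple

kept_by_filter = list(filter(lambda x: x[0] * x[1] > 5, iterable))
kept_by_genexp = list((x, y) for x, y in iterable if x * y > 5)

# filter() yields the very same tuple objects it was given ...
assert all(a is b for a, b in zip(kept_by_filter, iterable))
# ... while the generator expression builds a new tuple per item
assert all(a is not b for a, b in zip(kept_by_genexp, iterable))
# the results are still equal
assert kept_by_filter == kept_by_genexp
```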

The lambda version creates a frame for each item; the generator version reuses the same frame. yield adds some overhead, but it is smaller.

Because those who write like that are missing out on a better option.
