Should itertools.starmap be complemented with itertools.starfilter and functools.starreduce?

Given that itertools.starmap exists in the standard library, should itertools.starfilter and functools.starreduce also exist?

I appreciate that not every five-line function belongs in the standard library (which is why the itertools docs have a large collection of “recipes” at the bottom), but having a “star” version of map and not of its functional-programming partners filter and reduce feels inconsistent.

I’m a big fan of starmap because it allows expressive variable names in multiple-argument lambdas, rather than having to index into a tuple. Both starfilter and starreduce would bring this same benefit.

starfilter

starfilter would essentially be equivalent to

def starfilter(function, iterable):
    for args in iterable:
        if function(*args):
            yield args

so that instead of:

filter(lambda x: x[0] * x[1] > 5, [(1, 0), (5, 6), (8, 9)])

you could write:

starfilter(lambda x, y: x * y > 5, [(1, 0), (5, 6), (8, 9)])

both resulting in an iterator yielding the tuples (5, 6) and (8, 9).
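As a quick sanity check, the two spellings can be compared directly, using the pure-Python starfilter sketch from above:

```python
def starfilter(function, iterable):
    # yield only those argument tuples for which function(*args) is true
    for args in iterable:
        if function(*args):
            yield args

pairs = [(1, 0), (5, 6), (8, 9)]
via_filter = list(filter(lambda x: x[0] * x[1] > 5, pairs))
via_starfilter = list(starfilter(lambda x, y: x * y > 5, pairs))
assert via_filter == via_starfilter == [(5, 6), (8, 9)]
```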

starreduce

starreduce would essentially be equivalent to

initial_missing = object()

def starreduce(function, iterable, initial=initial_missing, /):
    it = iter(iterable)
    if initial is initial_missing:
        value = next(it)
    else:
        value = initial
    for element in it:
        value = function(value, *element)
    return value

so that instead of:

reduce(lambda acc, x: acc * x[0] + x[1], [(1, 0), (5, 6), (8, 9)], 9)

you could write:

starreduce(lambda acc, x, y: acc * x + y, [(1, 0), (5, 6), (8, 9)], 9)

both with a result of 417.
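Tracing the accumulator step by step confirms that result:

```python
from functools import reduce

pairs = [(1, 0), (5, 6), (8, 9)]
# acc starts at 9, then:
# (1, 0):  9 * 1 + 0 = 9
# (5, 6):  9 * 5 + 6 = 51
# (8, 9): 51 * 8 + 9 = 417
result = reduce(lambda acc, x: acc * x[0] + x[1], pairs, 9)
assert result == 417
```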

Or just as an additional “recipe” in the docs?

Failing support for their inclusion in the standard library, do they at least deserve to be added as “recipes” in the docs?

Considering that more-itertools has neither of these, they don’t yet appear to be that commonly requested. Maybe add them there first and then see if they are being used?


starfilter(func, iterable) is equivalent to filter(lambda args: func(*args), iterable), or simply (args for args in iterable if func(*args)).
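A minimal check that these spellings select the same elements (reusing the earlier example data):

```python
data = [(1, 0), (5, 6), (8, 9)]
pred = lambda x, y: x * y > 5

via_filter = list(filter(lambda args: pred(*args), data))
via_genexp = list(args for args in data if pred(*args))
assert via_filter == via_genexp == [(5, 6), (8, 9)]
```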

starreduce(func, iterable) is equivalent to reduce(lambda acc, args: func(acc, *args), iterable). Anyway, an explicit loop is preferable to reduce(), so you would be better off using one instead of starreduce() too.
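For the starreduce example from the original post, the explicit-loop version would look something like this:

```python
pairs = [(1, 0), (5, 6), (8, 9)]

acc = 9  # the initial value passed to starreduce
for x, y in pairs:
    acc = acc * x + y
assert acc == 417
```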

BTW, if you use map() or filter() with a lambda, you are doing it incorrectly. Generator expressions are clearer, and map() or filter() with a lambda does not even have a performance advantage. Instead of

filter(lambda x: x[0] * x[1] > 5, iterable)

use

((x, y) for x, y in iterable if x * y > 5)

Old starfilter proposal for more-itertools:


Looks like you’re wrong: your generator is more than twice as slow in my benchmark:

  9.9 ± 0.1 ms  list(filter(lambda x: x[0] * x[1] > 5, iterable))
 21.2 ± 0.4 ms  list(((x, y) for x, y in iterable if x * y > 5))
  7.3 ± 0.1 ms  list((x for x in iterable if x[0] * x[1] > 5))
  7.2 ± 0.2 ms  list((xy for xy in iterable for x, y in [xy] if x * y > 5))

Python: 3.13.0 (main, Nov  9 2024, 10:04:25) [GCC 14.2.1 20240910]

That’s with iterable = [(2, 3)] * 10**5.

benchmark script
from timeit import timeit
from statistics import mean, stdev
import sys
import random

funcs = [
    'filter(lambda x: x[0] * x[1] > 5, iterable)',
    '((x, y) for x, y in iterable if x * y > 5)',
    '(x for x in iterable if x[0] * x[1] > 5)',
    '(xy for xy in iterable for x, y in [xy] if x * y > 5)',
]

funcs = [f'list({f})' for f in funcs]

setup = '''
import gc
iterable = [(2, 3)] * 10**5
gc.enable()
'''

times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):5.1f} ± {stdev(ts):3.1f} ms '
for _ in range(100):
    for f in random.sample(funcs, len(funcs)):
        t = timeit(f, setup, number=1)
        times[f].append(t)
for f in funcs:
    print(stats(f), f)

print('\nPython:', sys.version)

Attempt This Online!

I have some code that filters iterables with lambdas, and when I compared it with the equivalent code using generator expressions, the performance differences seemed to depend on the iterable type, although I don’t know exactly why: for ranges, for example, the generator expression was faster, while for iterables returned by custom object methods there was hardly any difference.

This is an example for finding all positive integers coprime to n=100, but counting backwards to 1:

In [147]: timeit.Timer(stmt='list(m for m in range(100, 0, -1) if math.gcd(m, 100) == 1)', setup='import math', globals=globals()).timeit(number=1000000)
Out[147]: 4.826646083034575

In [148]: timeit.Timer(stmt='list(filter(lambda m: math.gcd(m, 100) == 1, range(100, 0, -1)))', setup='import math', globals=globals()).timeit(number=1000000)
Out[148]: 7.005942624993622

The generator version is considerably faster, and I’ll probably use it just because of that. It’s probably more readable as well. But I’m not sure I would agree that the filter version is “incorrect”: if so, in what way?

Interesting. I only compared with single-variable iteration.

In your examples, the slow case creates 10**5 new tuples, while the fast cases just make 10**5 references to the same tuple. Creating and destroying 10**5 tuples takes time. If the iterable produces new tuples (for example, zip(), enumerate() or dict.items()), the difference should be smaller, or even reversed.
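The reuse-versus-creation point can be seen with an identity check: filter() passes the original tuple objects straight through, while the unpacking generator expression repacks x and y into brand-new tuples. A small sketch (with a shortened iterable):

```python
iterable = [(2, 3)] * 3  # three references to the same tuple

kept_by_filter = list(filter(lambda x: x[0] * x[1] > 5, iterable))
kept_by_genexp = list((x, y) for x, y in iterable if x * y > 5)

# filter() yields the very same tuple objects it was given ...
assert all(a is b for a, b in zip(kept_by_filter, iterable))
# ... while the generator expression builds a new tuple per item
assert all(a is not b for a, b in zip(kept_by_genexp, iterable))
# the results are still equal
assert kept_by_filter == kept_by_genexp
```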

The lambda version creates a frame for each item; the generator version reuses the same frame. yield adds some overhead, but it is smaller.

Because those who write like that are missing out on a better option.
