itertools.ilen(iterable)

Meh, that doesn’t work for unhashable elements and seems rather slow:

 2.2 ns/element  deque_extend(repeat(0, n))
13.0 ns/element  set_isdisjoint(repeat(0, n))
 6.5 ns/element  for _ in repeat(0, n): pass
 6.5 ns/element  all(_ for _ in repeat(0, n) if False)
12.0 ns/element  all(zip(repeat(0, n)))

What did you measure where it was fast? Please share the code and results.

Script
from timeit import repeat 

setup = '''
from collections import deque
from itertools import repeat
deque_extend = deque(maxlen=0).extend
set_isdisjoint = set().isdisjoint
n = 10**6
'''

for code in [
    'deque_extend(repeat(0, n))',
    'set_isdisjoint(repeat(0, n))',
    'for _ in repeat(0, n): pass',
    'all(_ for _ in repeat(0, n) if False)',
    'all(zip(repeat(0, n)))',
] * 3:
    t = min(repeat(code, setup, number=1, repeat=25)) * 10**3
    print(f'{t:4.1f} ns/element ', code)

Attempt This Online!

I did measure:

import collections as coll  # alias assumed; the %timeit lines are IPython

a = range(1, 100_000)
%timeit coll.deque(iter(a), maxlen=0)    # 1.4 ms
%timeit set().isdisjoint(iter(a))        # 2.0 ms

But unhashable items make it unfit for the problem.
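
For example (a minimal illustration; in current CPython):

set().isdisjoint(iter([[1], [2]]))
# TypeError: unhashable type: 'list' -- isdisjoint hashes each element
# to test membership, even against an empty set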

It also risks breaking if someone gets the idea to optimize set().isdisjoint to return True without iterating whenever the set is empty (which would be a valid optimization, since an empty set is disjoint from everything).
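
The trick relies entirely on that iteration side effect; a minimal sketch of the assumption being made:

it = iter(range(5))
set().isdisjoint(it)   # True; consumes the iterator as a side effect (current CPython)
list(it)               # [] -- nothing left; a short-circuiting isdisjoint would change this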

Maybe this is the thing that merits inclusion in the standard library? Yes, we can all follow a recipe like

import collections
consumer = collections.deque(maxlen=0)
consume = consumer.extend

but it’s really not the most obvious thing in the world. Plus, code that wants to take advantage of this needs a clear place to put it, and it’s not at all difficult to mess things up and lose the desired performance by re-creating the deque (see the sketch below).

(Not to mention all the toy example code you’ll find out there that doesn’t optimize by grabbing the extend method - even in the more_itertools documentation, as already discovered. I reject the hypothesis that they “don’t prioritize speed over simplicity”: the example is extensively documented and commented to explain that they’ve explicitly selected approaches that iterate at C speed and don’t collect the skipped-over items.)
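
A minimal sketch of that pitfall (the function names are illustrative):

import collections

consume = collections.deque(maxlen=0).extend   # build the consumer once

def fast(iterables):
    for it in iterables:
        consume(it)                       # reuses the bound method each time

def slow(iterables):
    for it in iterables:
        collections.deque(it, maxlen=0)   # allocates a fresh deque per call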

I know, “not every three-line recipe needs to be in the standard library”. But in my mind, this particular example ticks all the boxes that e.g. random.choice does - maybe more.

Well, I’ve been following/participating in more-itertools development for a while and have seen speed improvements rejected for that reason.

Let’s find out 🙂

Such a change isn’t a pure speed benefit; it’s a generally nicer way to do it. Instead of calling deque(...maxmasjdsa=12312), which suffers from poor readability and generally confuses anyone who sees it for the first time, one can now define it as something “nice”. E.g.:

black_hole = deque(maxlen=0).extend

And performance is a benefit: no initialization cost and no wrapper overhead.

To me, this is mostly gain with little to no cost.

Maybe simplicity is not the right word here. What it does result in is the consumer no longer being independent, so the function is less portable: if someone wants to copy it, there is the extra step of finding black_hole.

Which is an aspect of simplicity, I guess, but whatever amount of simplicity it subtracts here, it adds back in other aspects.
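
A minimal sketch of that trade-off (black_hole is from the post above; the rest is illustrative):

from collections import deque

black_hole = deque(maxlen=0).extend      # shared consumer, defined once

def process_shared(items):
    black_hole(map(print, items))        # fast, but depends on black_hole existing elsewhere

def process_portable(items):
    deque(map(print, items), maxlen=0)   # self-contained and copy-pasteable, at a small cost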

For what it’s worth, I welcome all sorts of ideas to more-itertools. “It’s somewhat faster on this synthetic benchmark, but makes the code more complex” is usually something I’ll ultimately pass on. But if you come with examples from GitHub of people running whatever function in tight loops, I’ll be much more receptive.

Was looking at os.path.commonprefix. The current implementation:

def commonprefix(m):
    ...
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1

With ilen this could become (presumably itl = itertools, opr = operator, and bti = a module providing ilen):

def commonprefix(m):
    ...
    s1 = min(m)
    s2 = max(m)
    # count the equal leading characters; map() stops at the shorter string
    i = bti.ilen(itl.takewhile(bool, map(opr.eq, s1, s2)))
    return s1[:i]
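
A quick sanity check of the rewrite (a sketch; more_itertools.ilen stands in for the proposed function):

import itertools as itl
import operator as opr
import more_itertools as bti   # stand-in for the proposed itertools.ilen

def commonprefix(m):
    if not m:
        return ''
    s1 = min(m)
    s2 = max(m)
    i = bti.ilen(itl.takewhile(bool, map(opr.eq, s1, s2)))
    return s1[:i]

assert commonprefix(['interspecies', 'interstellar', 'interstate']) == 'inters'
assert commonprefix(['abc', 'abcd']) == 'abc'
assert commonprefix([]) == ''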