itertools.ilen(iterable)

More ilen implementations with benchmark (using your length 100000).

Thanks, I like the inventiveness of the recipe. However, it needs one extra parameter (the batch size), which I would never want to think about when using such a basic function. Adding it to the current benchmarks:

a = range(100000)
import collections as coll
import more_itertools
from itertools import batched                       # Python 3.12+
%timeit sum(1 for _ in iter(a))                     # 6.1 ms
%timeit sum(map(lambda i: 1, iter(a)))              # 5.5 ms
%timeit coll.deque(enumerate(iter(a)), maxlen=1)    # 4.4 ms
%timeit more_itertools.ilen(iter(a))                # 3.5 ms
%timeit sum(map(len, batched(iter(a), 5)))          # 2.9 ms
%timeit len(list(iter(a)))                          # 2.6 ms
%timeit sum(map(len, batched(iter(a), 10)))         # 2.5 ms
%timeit sum(map(len, batched(iter(a), 1000)))       # 2.2 ms
%timeit ilen(iter(a))                               # 1.4 ms
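
For anyone who wants to try the pure-Python variants outside IPython, here is a minimal sketch that wraps each one in a function. The function names (`ilen_genexpr`, `ilen_map`, `ilen_deque`, `ilen_batched`, `ilen_list`) and the default `batch_size=1000` are made up for illustration, and the `batched` fallback is only there so the snippet also runs before Python 3.12. Note that the deque variant here starts `enumerate` at 1 so that it actually returns a length, which the benchmark line above does not need to do.

```python
from collections import deque
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        """Fallback sketch: yield tuples of up to n items."""
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch


def ilen_genexpr(iterable):
    # One Python-level generator step per element.
    return sum(1 for _ in iterable)


def ilen_map(iterable):
    # Same idea, but map() drives the loop.
    return sum(map(lambda _: 1, iterable))


def ilen_deque(iterable):
    # Keep only the last (index, item) pair; the index is the length.
    last = deque(enumerate(iterable, 1), maxlen=1)
    return last[0][0] if last else 0


def ilen_batched(iterable, batch_size=1000):
    # Holds at most batch_size elements at a time.
    return sum(map(len, batched(iterable, batch_size)))


def ilen_list(iterable):
    # Fast, but materializes every element at once.
    return len(list(iterable))


for func in (ilen_genexpr, ilen_map, ilen_deque, ilen_batched, ilen_list):
    print(func.__name__, func(iter(range(100000))))
```

All five should agree on the count; they differ only in speed and in how many elements they hold at once.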

I think the utility of consuming an iterator immediately to find its length, without regard to the contents, is greater than some might expect. After all, there’s a Unix utility (wc) dedicated to exactly that.
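
To make the wc comparison concrete, here is a small sketch that counts lines and words of a text stream by consuming iterators and ignoring their contents. The `ilen` below is just a pure-Python stand-in, and the sample text is invented for the example.

```python
import io


def ilen(iterable):
    # Pure-Python stand-in: consume the iterable and count its items.
    return sum(1 for _ in iterable)


text = io.StringIO("alpha beta\ngamma\n\ndelta epsilon zeta\n")

# Roughly `wc -l`: iterating a text stream yields one item per line.
line_count = ilen(text)

# Roughly `wc -w`: flatten the lines into words and count them lazily.
text.seek(0)
word_count = ilen(word for line in text for word in line.split())

print(line_count, word_count)
```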

Meh, 1000 is my go-to number for such things :-). I also just added two more solutions that use 1000, although those only hold ints, not elements from the iterable (1000 elements could cause a memory problem, but at least the amount is bounded, unlike the list solution).
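
The bounded versus unbounded memory point can be checked with `tracemalloc`. This is only a sketch: `ilen_chunked` is a hand-rolled stand-in for the `batched`-based recipe (so it runs on older Pythons too), `peak_bytes` is a helper invented for this example, and the absolute numbers will vary by interpreter and platform.

```python
import tracemalloc
from itertools import islice


def ilen_list(iterable):
    # Materializes every element before counting.
    return len(list(iterable))


def ilen_chunked(iterable, chunk_size=1000):
    # Holds roughly chunk_size elements at a time.
    it = iter(iterable)
    total = 0
    while chunk := tuple(islice(it, chunk_size)):
        total += len(chunk)
    return total


def peak_bytes(func, n):
    """Return (count, peak traced memory in bytes) for func over n fresh objects."""
    tracemalloc.start()
    try:
        count = func(object() for _ in range(n))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return count, peak


n = 100_000
list_count, list_peak = peak_bytes(ilen_list, n)
chunk_count, chunk_peak = peak_bytes(ilen_chunked, n)
print(f"list:    {list_count} items, peak {list_peak} bytes")
print(f"chunked: {chunk_count} items, peak {chunk_peak} bytes")
```

The list version's peak grows with the length of the iterable, while the chunked version's peak depends on the chunk size and on how big each element is.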

By now numerous ways to do this have been invented, and the 2 main iterator libraries have both implemented it (one went through a process of selecting the best pure-Python recipe, the other implemented it in C)…

There is still a live benchmarking thread on Stack Overflow, as well as this thread.

To me that seems like a fairly good indicator that the “one obvious way to do it” does not exist yet. Not sure if it’s ilen, but it’s my best guess ATM. :)

One more use case for it - microbenchmarking iterators:

from collections import deque
%timeit deque(map(bool, range(10)), maxlen=0)   # 771   # high construction overhead (possible to address this of course, but ilen is nicer to work with)
%timeit list(map(bool, range(10)))              # 480   # memory becomes a factor for large iterators
%timeit ilen(map(bool, range(10)))              # 460   # ideal
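
The same comparison can be scripted with the standard `timeit` module. A rough sketch, with assumptions: `ilen` here is a pure-Python stand-in, so it will not reproduce the figure for the compiled `ilen` above, and `make_iterator`, `consumers` and `bench` are names invented for the example.

```python
import timeit
from collections import deque


def ilen(iterable):
    # Pure-Python stand-in, NOT the compiled ilen timed above.
    return sum(1 for _ in iterable)


def make_iterator():
    # The iterator under test; a fresh one is built for every call.
    return map(bool, range(10))


consumers = {
    "deque(maxlen=0)": lambda it: deque(it, maxlen=0),
    "list": list,
    "ilen": ilen,
}


def bench(consumer, number=2000, repeat=5):
    """Best-of-repeat time in seconds for one build-and-consume call."""
    timer = timeit.Timer(lambda: consumer(make_iterator()))
    return min(timer.repeat(repeat=repeat, number=number)) / number


results = {name: bench(consumer) for name, consumer in consumers.items()}
for name, seconds in results.items():
    print(f"{name:16} {seconds * 1e9:8.0f} ns per call")
```

Each measurement includes the cost of building the iterator and the lambda call, so only the differences between consumers are meaningful.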