All of my efforts to implement it in CPython have failed:
a) ilen was rejected because it would be the only function in itertools returning a scalar, so it doesn't fit.
b) count.consume was rejected too (and reasonably so). I was happy with the functionality (atomic counting at C speed), but itertools is just not the right place for it. I would probably have retracted it myself even if it had been accepted, tbh.
Challenge
But the question remains! How to count iterator items while consuming them at C speed?
There are two main paths (and I have failed on both so far):
a) A clever recipe using existing Python toolkit.
b) An addition to CPython which would provide it (directly or indirectly) and would be sensible, i.e. would not be rejected.
Criteria: Memory and Performance
Memory Optimal: Should not hold more than 1 iterator element in memory at a time.
Performance: The faster the better
Examples:
a = [1] * 100_000
%timeit more_itertools.ilen(iter(a))
# 2.8 ms (currently the fastest memory-optimal pure-Python recipe)
%timeit len(list(iter(a)))
# 0.27 ms (C speed, memory inefficient)
%timeit sum(map(len, itertools.batched(iter(a), 1000)))
# 0.25 ms (C speed, memory inefficient, although not fully)
Solutions will be compared on Python 3.12 via “Attempt This Online”, or, if any involve CPython extensions, all will be compared on 3.12 compiled locally on Unix.
import math, timeit
import statistics as sts

def time_to_count(ilen_func, weights=(0.7, 0.2, 0.1)):
    lengths = [100_000, 100, 10]
    results = list()
    for n in lengths:
        norm = 100_000 // n
        lst = [1] * n
        test_func = lambda: ilen_func(iter(lst))
        num = norm * 100
        res = timeit.repeat(test_func, number=num)
        res = sts.mean(res) / num * 1000  # 1000 -> ms
        res = res * norm  # -> time per 100K
        results.append(res)
    print([round(r, 2) for r in results])
    print(math.sumprod(results, weights))
def ilen(a):
    return sum(1 for _ in a)

time_to_count(ilen)
There is also c): an implementation in C, available as a library on PyPI, for those who need this very specific functionality. I would argue that this is the most popular path.
from itertools import *
from operator import indexOf

def ilen_s5(iterable):
    milestones = cycle((False,) * 99 + (True,))
    total = 99
    for _ in compress(iterable, milestones):
        del _
        total += 100
    return total - indexOf(milestones, True)
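The trick: compress() passes through only every 100th element (the ones paired with True in the milestones cycle), so the Python-level loop runs just once per 100 items, and indexOf() then walks the rest of the milestones cycle to recover how far into the last hundred the iterable stopped. A quick sanity check (the lengths below are arbitrary):

for n in (0, 1, 99, 100, 101, 12_345):
    # covers the empty case, exact multiples of 100, and odd remainders
    assert ilen_s5(iter([None] * n)) == n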
Yes, but before the assignment, the old and the new element are both held in memory. And in that solution, they’re not even adjacent elements but up to 100 apart.
Ok then:
def ilen_s6(a, many):
    # 'many' here presumably means how many elements one is willing to hold in memory at once
    return sum(map(len, batched(a, many - 1)))
In CPython, yes, but the current binding of _ keeps its referent alive until _ is rebound by the next loop value. So, after it gets started, two objects from the iterable can be kept alive.
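A minimal sketch of that effect, relying on CPython's reference counting (the Tracked class is only there to count live instances):

class Tracked:
    # Tracks how many instances are alive at the same time.
    live = 0
    peak = 0
    def __init__(self):
        Tracked.live += 1
        Tracked.peak = max(Tracked.peak, Tracked.live)
    def __del__(self):
        Tracked.live -= 1

def produce(n):
    for _ in range(n):
        yield Tracked()

# Plain loop: while the generator builds the next instance, the previous one
# is still bound to `item`, so two are alive at once.
Tracked.peak = 0
for item in produce(5):
    pass
del item  # drop the leftover binding so it doesn't affect the second run
print(Tracked.peak)  # 2 on CPython

# Dropping the binding inside the body frees each instance before the next
# one is created.
Tracked.peak = 0
for item in produce(5):
    del item
print(Tracked.peak)  # 1 on CPython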
A subtlety is that this can actually have a major effect on speed, although rarely so. For a very few iterables, like itertools.combinations(), the C code reuses the “immutable” result tuple if its reference count is 1. This saves the expense of allocating a new tuple object, which, while cheap, has significant cost compared to the typically very little computation it takes to advance to the next combination.
For example, on my box, this:
for x in combinations(range(100), 95):
    pass
takes nearly 3 times longer than
for x in combinations(range(100), 95):
    del x
In the first loop, a new tuple with 95 elements has to be completely filled in on each iteration. In the second loop, the very same 95-element tuple object is mutated in-place, and typically only needs to replace the final element. The latter is “safe” because the implementation knows it owns the only reference. The client can’t see the difference, unless it’s looking at the tuple’s memory address (id()).
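One way to see that reuse directly is to record id() of the yielded tuples; this is a CPython implementation detail, so the exact counts may vary:

from itertools import combinations

addresses = set()
for x in combinations(range(10), 3):
    addresses.add(id(x))
    del x
print(len(addresses))  # typically 1: the same tuple object is recycled in place

addresses = set()
for x in combinations(range(10), 3):
    addresses.add(id(x))
print(len(addresses))  # typically more than 1: a fresh tuple is allocated each step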
Should you worry about that? Up to you. I would not.