What the short discussion has boiled down to is that a simple ilen might be worth adding to itertools.
It is as basic as list.__len__, although much less frequently used. Nevertheless, it can be useful: a pure-Python implementation is more than 2x slower, which can become a significant difference for long iterables.
Apart from basic usage, it is useful for short-circuiting, such as “truth counting”:
import collections as coll
import itertools as itl
rng = [False] * 50_000 + [True] * 2 + [False] * 50_000
%timeit sum(filter(None, iter(rng))) == 1 # 319 µs
# replacing with:
%timeit ilen(itl.islice(filter(None, iter(rng)), 2)) == 1 # 176 µs
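The short-circuiting pattern can be tried without the C function. The sketch below is only an illustration: `ilen` here is a pure-Python stand-in for the proposed function (an assumption, and slower than a C version would be), and `exactly_one_true` is a hypothetical helper name.

```python
from itertools import islice


def ilen(iterable):
    """Pure-Python stand-in for the proposed ilen (assumption)."""
    n = 0
    for _ in iterable:
        n += 1
    return n


def exactly_one_true(iterable):
    """Return True if exactly one item is truthy.

    islice stops pulling from the filter after the second truthy item,
    so the rest of the input is never examined.
    """
    return ilen(islice(filter(None, iterable), 2)) == 1


data = iter([False] * 5 + [True] * 2 + [False] * 5)
print(exactly_one_true(data))  # False: two truthy items were found
print(ilen(data))              # 5: the trailing items were left unconsumed
```

The second print shows the short-circuit: the five trailing items are still in the iterator after the check.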
It is also an efficient iterator consumer, with efficiency similar to collections.deque but arguably more intuitive and less verbose:
rng = range(100_000)
%timeit coll.deque(iter(rng), maxlen=0) # 1.45 ms
%timeit ilen(iter(rng)) # 1.45 ms (tp_iternext)
%timeit ilen(iter(rng)) # 1.55 ms (PyIter_Next)
This has the extra benefit of reporting how many items were consumed, which can be useful in cases where there is no convenient way to know that otherwise.
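To illustrate the difference from the deque idiom, here is a small sketch. The `process` generator and its `sink` argument are hypothetical, and `ilen` is again a pure-Python stand-in for the proposed function rather than the C implementation.

```python
from collections import deque


def ilen(iterable):
    """Pure-Python stand-in for the proposed ilen (assumption)."""
    return sum(1 for _ in iterable)


def process(records, sink):
    """Hypothetical lazy pipeline: appends each cleaned record to sink."""
    for rec in records:
        sink.append(rec.strip())
        yield


sink_a = []
deque(process([" a ", " b "], sink_a), maxlen=0)  # drains, but gives no count

sink_b = []
n = ilen(process([" a ", " b ", " c "], sink_b))  # drains and counts
print(n, sink_b)  # 3 ['a', 'b', 'c']
```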
Implementation is trivial:
static PyObject*
ilen(PyObject* self, PyObject* args)
{
    PyObject* iterable;
    if (!PyArg_ParseTuple(args, "O", &iterable)) {
        return NULL;
    }
    PyObject *iter = PyObject_GetIter(iterable);
    if (iter == NULL) {
        return NULL;  // not iterable; the exception is already set
    }
    PyObject *item;
    Py_ssize_t i = 0;
    // Alternative: while ((item = Py_TYPE(iter)->tp_iternext(iter))) {
    while ((item = PyIter_Next(iter))) {
        Py_DECREF(item);
        if (i == PY_SSIZE_T_MAX) {
            PyErr_SetString(PyExc_TypeError,
                            "`iterable` for `ilen` is too long to count.");
            Py_DECREF(iter);
            return NULL;
        }
        i++;
    }
    Py_DECREF(iter);
    if (PyErr_Occurred()) {
        return NULL;  // PyIter_Next returns NULL on error as well as on exhaustion
    }
    return PyLong_FromSsize_t(i);
}
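For readers less familiar with the C API, the control flow of the loop corresponds roughly to the Python below. This is a model for reading purposes only, not the proposed implementation; `ilen_model` is a hypothetical name, and the comments map each step to the C call it stands in for.

```python
import sys


def ilen_model(iterable):
    it = iter(iterable)          # PyObject_GetIter: TypeError if not iterable
    i = 0
    while True:
        try:
            next(it)             # PyIter_Next; other exceptions propagate
        except StopIteration:
            break                # iterator exhausted
        if i == sys.maxsize:     # sys.maxsize equals PY_SSIZE_T_MAX
            raise TypeError("`iterable` for `ilen` is too long to count.")
        i += 1
    return i


print(ilen_model(iter(range(10))))  # 10
```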
Speed comparisons:
import more_itertools as mitl
from iteration_utilities import count_items
a = range(100_000)
%timeit mitl.ilen(a) # 3.6 ms
%timeit len(list(iter(a))) # 2.6 ms
%timeit ilen(iter(a)) # 1.6 ms
%timeit count_items(iter(a)) # 1.45 ms
iteration_utilities is slightly faster as it uses tp_iternext instead of PyIter_Next.
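The `%timeit` lines need IPython. For anyone who wants to rerun the pure-Python side of the comparison with only the standard library, a sketch along these lines should work. The function names are hypothetical, the repeat count of 20 is an arbitrary choice, and no timings are quoted because they depend on the machine and Python version.

```python
import timeit
from collections import deque
from itertools import count


def ilen_genexp(iterable):
    return sum(1 for _ in iterable)


def ilen_list(iterable):
    return len(list(iterable))


def ilen_zip_count(iterable):
    # The zip/count approach quoted from more_itertools in this thread.
    counter = count()
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)


candidates = [ilen_genexp, ilen_list, ilen_zip_count]
a = range(100_000)

for fn in candidates:
    # A fresh iterator is built on every call so each run does the full work.
    seconds = timeit.timeit(lambda: fn(iter(a)), number=20) / 20
    print(f"{fn.__name__:>15}: {seconds * 1e3:.2f} ms per call")
```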
Costs / benefits: the benefits are modest, and there is little to no cost.
INITIAL POST:
I am not sure whether there are any workarounds for this. I would be happy to hear if you know some good methods to improve performance for the following.
Ok, the issue:
a = range(100000)
%timeit len(list(iter(a))) # 2.6 ms
%timeit more_itertools.ilen(iter(a)) # 3.5 ms
I have searched Stack Overflow and thought about it, and there simply don’t seem to be any better ways, although in theory it should be even faster than len(list(iter(a))), as there is no need to construct a new list.
more_itertools solution (with some alternatives that are slower):
import collections as coll
import itertools as itl

def ilen(iterable):
    """
    Ref:
        more_itertools. Direct copy. GitHub tracker: #236, #230.
    Examples:
        >>> ilen(iter(range(10)))
        10
    XXX:
        a = range(100000)
        %timeit sum(1 for _ in iter(a)) # 6.1 ms
        %timeit sum(map(lambda i: 1, iter(a))) # 5.5 ms
        %timeit coll.deque(enumerate(iter(a)), maxlen=1) # 4.4 ms
        %timeit ilen(iter(a)) # 3.5 ms
        %timeit len(list(iter(a))) # 2.6 ms
    """
    counter = itl.count()
    # zip advances the counter once per item; deque(maxlen=0) discards the pairs.
    coll.deque(zip(iterable, counter), maxlen=0)
    return next(counter)
Proposal: implement the following in C:
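from numbers import Integral

def call_n_times(func, n_or_iterable, *args):
    # An integer n means "call n times"; an iterable means "call once per item".
    if isinstance(n_or_iterable, Integral):
        n_or_iterable = range(n_or_iterable)
    for _ in n_or_iterable:
        func(*args)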
Although this wouldn’t provide the optimal solution for ilen above, it would improve it in both simplicity and, hopefully, speed, as one component is eliminated from the equation:
def ilen(iterable):
    counter = itl.count()
    call_n_times(counter.__next__, iterable)
    return next(counter)
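Both branches of the proposed helper can be exercised with a pure-Python model. The sketch below is only a model of the behaviour, not the C implementation being proposed; the `log` list and the "tick" argument are made up for the demonstration.

```python
from itertools import count
from numbers import Integral


def call_n_times(func, n_or_iterable, *args):
    """Pure-Python model of the proposed helper."""
    if isinstance(n_or_iterable, Integral):
        n_or_iterable = range(n_or_iterable)
    for _ in n_or_iterable:
        func(*args)


# Integer form: repeat a call a fixed number of times.
log = []
call_n_times(log.append, 3, "tick")
print(log)  # ['tick', 'tick', 'tick']

# Iterable form: one call per item, which is how ilen would use it.
counter = count()
call_n_times(counter.__next__, iter("abcd"))
print(next(counter))  # 4
```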
A couple of directly related Stack Overflow threads that would gain “one obvious way to do it”.
Summary:
Although this doesn’t add anything new and there are already many ways to achieve what the proposed function does, I think it could provide a more efficient and more intuitive solution for iterator recipes, avoiding loops and complex, inefficient recipes where a callable simply needs to be repeated.