itertools.ilen(iterable)

What this short discussion has boiled down to is that a simple ilen might be worth adding to itertools.

It is as basic as list.__len__, although much less frequently used. Nevertheless, it can be useful: the pure-Python implementation is more than 2x slower, which can become a significant difference for long iterables.

Apart from basic usage, it is useful for short-circuiting, such as “truth counting”:

rng = [False] * 50_000 + [True] * 2 + [False] * 50_000
%timeit sum(filter(None, iter(rng))) == 1   # 319 µs
# replacing with:
%timeit ilen(itl.islice(filter(None, iter(rng)), 2)) == 1   # 176 µs
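For readers without the C build, the truth-counting example can be reproduced with a pure-Python stand-in (a sketch based on the more_itertools recipe quoted later in the thread; the name `ilen` and the aliases are assumptions):

```python
import collections
import itertools

def ilen(iterable):
    # Pure-Python model of the proposed C-level ilen:
    # zip advances the counter once per item; deque(maxlen=0) discards everything.
    counter = itertools.count()
    collections.deque(zip(iterable, counter), maxlen=0)
    return next(counter)

rng = [False] * 50_000 + [True] * 2 + [False] * 50_000
# "exactly one truthy item?" short-circuits after seeing at most 2 truthy items,
# instead of scanning the whole list as sum(filter(None, rng)) == 1 would
exactly_one = ilen(itertools.islice(filter(None, rng), 2)) == 1
```

Here `exactly_one` is False, since the list contains two truthy items, and only those two were inspected past the first one.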

Also, it is an efficient iterator consumer, with efficiency similar to collections.deque but arguably more intuitive and less verbose:

rng = range(100_000)
%timeit coll.deque(iter(rng), maxlen=0)      # 1.45 ms
%timeit ilen(iter(rng))    # 1.45 ms (tp_iternext)
%timeit ilen(iter(rng))    # 1.55 ms (PyIter_Next)

This also has the extra benefit of reporting how many items have been consumed, which is useful in cases where there is no convenient way to know this otherwise.
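As an illustration of that point (a sketch using a hypothetical pure-Python ilen), counting how many leading items a takewhile consumed:

```python
import collections
import itertools

def ilen(iterable):
    # pure-Python stand-in for the proposed ilen
    counter = itertools.count()
    collections.deque(zip(iterable, counter), maxlen=0)
    return next(counter)

# How many leading zeros were skipped? takewhile has no way to report this
# by itself, but counting its output with ilen does.
it = iter([0, 0, 0, 5, 7])
skipped = ilen(itertools.takewhile(lambda x: x == 0, it))
# caveat: takewhile also consumed the first non-matching element (the 5)
```

Here `skipped` is 3, and the next item obtained from `it` is 7, because takewhile consumed the 5 while testing the predicate.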

The implementation is trivial:

static PyObject*
ilen(PyObject* self, PyObject* args)
{
    PyObject* iterable;
    if (!PyArg_ParseTuple(args, "O", &iterable)) {
        return NULL;
    }
    PyObject *iter = PyObject_GetIter(iterable);
    if (iter == NULL) {
        return NULL;  // not iterable
    }
    PyObject *item;
    Py_ssize_t i = 0;
    // while ((item = Py_TYPE(iter)->tp_iternext(iter))) {
    while ((item = PyIter_Next(iter))) {
        Py_DECREF(item);
        if (i == PY_SSIZE_T_MAX) {
            PyErr_SetString(PyExc_TypeError,
                            "`iterable` for `ilen` is too long to count.");
            Py_DECREF(iter);
            return NULL;
        }
        i++;
    }
    Py_DECREF(iter);
    // PyIter_Next returns NULL both on exhaustion and on error,
    // so propagate a pending exception if there is one
    if (PyErr_Occurred()) {
        return NULL;
    }
    return PyLong_FromSsize_t(i);
}

Speed comparisons:

import more_itertools as mitl
from iteration_utilities import count_items
a = range(100_000)
%timeit mitl.ilen(a)         # 3.6 ms
%timeit len(list(iter(a)))   # 2.6 ms
%timeit ilen(iter(a))        # 1.6 ms
%timeit count_items(iter(a)) # 1.45 ms

iteration_utilities is slightly faster as it uses tp_iternext instead of PyIter_Next.

Costs / benefits: the benefits are modest, and the cost is little to none.


INITIAL POST:
I am not sure whether there are any workarounds for this. I would be happy to hear if you know some good methods to improve performance for the following.

Ok, the issue:

a = range(100000)
%timeit len(list(iter(a)))                  # 2.6 ms
%timeit more_itertools.ilen(iter(a))        # 3.5 ms

I have searched Stack Overflow and thought about it, and there simply aren’t any better ways, even though in theory this should be faster than len(list(iter(a))), as there is no need to construct a new list.

The more_itertools solution (with some slower alternatives):

import collections as coll
import itertools as itl

def ilen(iterable):
    """
    Ref:
        more_itertools. Direct copy. GitHub tracker: #236, #230.
    Examples:
        >>> ilen(iter(range(10)))
        10
    XXX:
        a = range(100000)
        %timeit sum(1 for _ in iter(a))                     # 6.1 ms
        %timeit sum(map(lambda i: 1, iter(a)))              # 5.5 ms
        %timeit coll.deque(enumerate(iter(a)), maxlen=1)    # 4.4 ms
        %timeit ilen(iter(a))                               # 3.5 ms
        %timeit len(list(iter(a)))                          # 2.6 ms
    """
    counter = itl.count()
    # zip advances the counter once per item; deque(maxlen=0) discards everything
    coll.deque(zip(iterable, counter), maxlen=0)
    return next(counter)

Proposal: implement the following in C:

from numbers import Integral

def call_n_times(func, n_or_iterable, *args):
    if isinstance(n_or_iterable, Integral):
        n_or_iterable = range(n_or_iterable)
    for _ in n_or_iterable:
        func(*args)
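A quick usage check of the proposed semantics (the definition is repeated here so the sketch is self-contained):

```python
from numbers import Integral

def call_n_times(func, n_or_iterable, *args):
    # proposed semantics: an integer means "this many times";
    # anything else is iterated over to drive the calls
    if isinstance(n_or_iterable, Integral):
        n_or_iterable = range(n_or_iterable)
    for _ in n_or_iterable:
        func(*args)

calls = []
call_n_times(calls.append, 3, "x")      # integer: call 3 times
call_n_times(calls.append, "ab", "y")   # iterable: once per element
```

After both calls, `calls` holds three "x" entries followed by two "y" entries.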

Although this wouldn’t provide the most optimal solution for ilen above, it would improve it in both simplicity and, hopefully, speed, as one component is eliminated from the equation:

def ilen(iterable):
    counter = itl.count()
    call_n_times(counter.__next__, iterable)
    return next(counter)

A couple of directly related Stack Overflow threads would then reach “one obvious way to do it”.

Summary:
Although this doesn’t add anything new and there already exist many ways to achieve what the proposed function does, I think it could provide a more efficient and more intuitive solution for iterator recipes, avoiding loops and complex, inefficient recipes where a callable simply needs to be called repeatedly.

Can you give an example where you’d call the same function repeatedly, with no arguments (or no changes in arguments)? I’ve done it once, with a for loop, for a GUI test with PySide6, and I was not happy with what I did.

Apart from iterator recipes, I would see myself mostly using this for “iter” methods.

E.g. I am currently working on global optimizer:

class Optimizer:
    def __init__(self):
        ...
    def __next__(self):
        ...
    def run(self, niter=1000):
        call_n_times(self.__next__, niter)

Also, fast way to call a function multiple times could be very useful where function runtime is very low and repetition overhead is preferably as small as possible. E.g. instantiation speed:

timeit.timeit(lambda: call_n_times(itertools.count, 1000))

If the runtime is < 50 nanoseconds, the current overhead of the timer distorts comparisons quite a lot.

template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        {stmt}
        pass
    _t1 = _timer()
    return _t1 - _t0
"""

Also, I don’t like the name of the function. Maybe repeat_call would be more appropriate.

Having said that, I think a large part of the use cases would be iterator recipes. One sure improvement is in the proposal above. But even before I got this idea (earlier today), I was looking for a way to do exactly this when thinking about more efficient implementations for the windowed function (related to the nwise thread). I am not sure it would have led anywhere had the function existed, but it was surely not the first time I was writing itertools code, wanted to grab an efficient function that does exactly this, and couldn’t find it.

from itertools import islice

def repeat(f, k=1):
    # note: islice is lazy, so the result must be consumed to execute the calls
    islice(iter(f, object()), k)

seems to perform well for this

Nice one. Of course, one would need to wrap it in list to actually execute it.

However, this doesn’t solve the case where n_or_iterable is an iterable rather than an integer.

It would change current solution to:

coll.deque(zip(iter(counter.__next__, object()), a), maxlen=0) # 6.37 ms

Which unfortunately is much slower.

Not really, no; it doesn’t work that way in practice. Similarly, sum will typically be faster with a list comprehension than with the corresponding generator expression.

Either way the elements of the source have to be iterated over and copied. A list will be resized several times, but it’s O(lg N) times, and generally AFAIK the larger allocations are not really slower than the smaller ones. The pointers also get copied during a resize, but that can use a fast memcpy. The iteration approach could probably beat the list approach easily if it, too, were implemented directly in C. But more_itertools has to retrieve objects from the iterator at the Python layer and use Python integers to count.

It took me quite a while to get my head around the transformation you’re applying to the original problem, actually. And if you only hope it will be faster, you’ll have a hard time getting someone else to implement it and see if it is.

Trying to make this more flexible while also aiming for performance in C seem at cross purposes, too. I’d rather just see the specialized iterable-counter.
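The claim about sum can be checked directly (a sketch; which variant wins depends on the interpreter and the input size, so no winner is asserted here):

```python
import timeit

data = list(range(10_000))

def via_listcomp():
    # builds an intermediate list, then sums it
    return sum([x for x in data])

def via_genexp():
    # sums lazily, one item at a time
    return sum(x for x in data)

# Both compute the same result; on CPython the list comprehension is
# typically (not always) faster despite the intermediate allocation.
t_list = min(timeit.repeat(via_listcomp, number=50, repeat=3))
t_gen = min(timeit.repeat(via_genexp, number=50, repeat=3))
```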

Would this be an adequate C implementation? It might be easier to just put this in your project as a C extension rather than an addition to itertools

static PyObject*
call_n_times(PyObject* self, PyObject* args)
{
    PyObject* func;
    PyObject* n_or_iterable;
    PyObject* func_args = NULL;

    if (!PyArg_ParseTuple(
        args,
        "OOO",
        &func,
        &n_or_iterable,
        &func_args
        ))
        return NULL;

    // note: this doesn't do overflow checking
    Py_ssize_t count;

    if (PyLong_Check(n_or_iterable)) {
        count = PyLong_AsSsize_t(n_or_iterable);
        if (count == -1 && PyErr_Occurred())
            return NULL;
    } else {
        count = PySequence_Length(n_or_iterable);
        if (count == -1)
            return NULL;
    }

    for (Py_ssize_t i = 0; i < count; ++i)
    {
        PyObject* result = PyObject_Call(
            func,
            func_args,
            NULL
        );
        if (result == NULL)
            return NULL;

        Py_DECREF(result);
    }

    Py_RETURN_NONE;
}
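A Python-level model of that C sketch (hypothetical, for illustration only) makes its limitation explicit: the args tuple is required, and a plain iterator is rejected because only the length is consulted, never the items:

```python
def call_n_times_model(func, n_or_iterable, func_args):
    # mirrors the C code: PyLong_Check -> int branch,
    # PySequence_Length -> len() branch (fails for plain iterators)
    if isinstance(n_or_iterable, int):
        count = n_or_iterable
    else:
        count = len(n_or_iterable)  # TypeError for plain iterators
    for _ in range(count):
        func(*func_args)

out = []
call_n_times_model(out.append, 3, ("a",))

works_on_iterators = True
try:
    call_n_times_model(out.append, iter(range(3)), ("b",))
except TypeError:
    works_on_iterators = False
```

The int and sequence cases work, but the iterator case raises, which is the shortcoming noted below in the thread.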

I am almost sure it will be faster than the current solution, but I don’t think it will be faster than len(list(iter(a))).

Maybe I will do it myself if I see interest in this. In that case I would also compare the specialised solution and the more flexible (proposed) one.

I agree that this is a bit like trying to kill two birds with one stone. Personally, I would need to compare performance to see if this makes sense.

Many thanks for this.

So this function:

def ilen2(iterable):
    counter = itl.count()
    call_n_times(counter.__next__, iterable, ())
    return next(counter)

%timeit ilen(iter(a))                               # 3.5 ms
%timeit len(list(iter(a)))                          # 2.6 ms
%timeit ilen2(a)                                    # 2.4 ms

is faster than len(list(iter(a))).

However, it doesn’t deal with iterables, only sequences.

while ((item = PyIter_Next(iter))) {
    PyObject_Call(func, func_args, NULL);
    Py_DECREF(item);
}
%timeit ilen2(iter(a))  # 4.9 ms

And if we simply implement a specialised ilen via:

while ((item = PyIter_Next(iter))) {
    i += 1;
    Py_DECREF(item);
}
%timeit ilen_specialised(a)    # 1.56 ms

It seems that calling counter.__next__ explicitly is not efficient.

If we just focus on a fast way to call f n times, and split off the concern of determining the number of times when given an iterator as a separate problem (because it is one)…

In [6]: %timeit f()
20.4 ns ± 0.083 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In [7]: %timeit next(batched(iter(f, object()), 100_000))
2.63 ms ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit [f() for _ in range(100_000)]
2.72 ms ± 7.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [9]: %timeit for _ in range(100_000):  f()
2.67 ms ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit list(islice(iter(f, object()), 100_000))
2.8 ms ± 2.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The best solution that I have here, with itertools.batched, has ~5 ns of overhead per call of f*. I’m not sure you’re chasing a gain worth chasing.

* (2.63ms - (20.4ns x 100000)) / 100000

No, me neither. It seems that what could make sense is implementing a simple ilen as suggested by @kknechtel.

It doesn’t seem that adding anything else would be justified from a performance perspective.

And yes, I think your proposal is most likely the best recipe for simply calling f() n times.

Just to correct myself: there is a faster way to consume an iterable than batched:

n = 100_000
%timeit next(itl.islice(iter(f, object()), n, n), None)     # 4.5 ms
%timeit next(itl.batched(iter(f, object()), n))   # 4.72 ms
%timeit for _ in range(n):  f()               # 5 ms

Also try the obscure-but-fast collections.deque(it, maxlen=0) for comparison.


Slightly slower in this case.

f = lambda: 1
n = 100_000
it = iter(f, object())
%timeit next(itl.islice(it, n, n), None)            # 4.5 ms
%timeit coll.deque(itl.islice(it, n), maxlen=0)     # 4.7 ms
%timeit next(itl.batched(it, n))                    # 4.7 ms
%timeit for _ in range(n):  f()                     # 5 ms

And with from collections import deque? (Probably just a couple nanoseconds faster).

Updated the initial post to match the changed (narrowed-down) proposal. I didn’t think creating a new thread was a good idea.

deque(maxlen=0) is negligibly slower than next(islice(..., n, n), None).

deque is better for full consumption, while next(islice(...)) is for partial consumption.
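The distinction can be demonstrated concretely (a sketch using the aliases from the thread):

```python
import collections as coll
import itertools as itl

it = iter(range(10))
# partial consumption: advance the iterator by exactly 4 items
next(itl.islice(it, 4, 4), None)
first_remaining = next(it)        # items 0-3 are gone, so this yields 4
# full consumption: drain whatever is left
coll.deque(it, maxlen=0)
fully_drained = next(it, None) is None
```

Both idioms are in the standard library's "consume" recipe; the islice form is the only one that can stop mid-stream.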

However, there actually is a significantly faster approach:

%timeit coll.deque(itl.starmap(f, itl.repeat((), n)), maxlen=0)   # 3.4 ms

And finally, solution with ilen implemented in C:

%timeit ilen(itl.starmap(f, itl.repeat((), n)))     # 3.4 ms (tp_iternext)
%timeit ilen(itl.starmap(f, itl.repeat((), n)))     # 3.6 ms (PyIter_Next)

So ilen implemented in C can also be a convenient iterator consumer, on par with deque(maxlen=0) and next(islice(...)).

This has been rejected before. The core issue is that it is not a very useful operation because it consumes the iterator.

Also, this isn’t an operation worth optimizing because most iterators that do something interesting are already slower than a genexp inside a sum().

Thanks for this.

The first point is very valid; this is the main reason why its usage is low.

The second point is valid from the POV of iterators used in a producer/consumer manner, i.e. iterating over hefty (especially in memory) payloads.

However, iterators are also used casually in various algorithms, where the elements are simply numbers or strings. In these cases ilen is a small building block used in other recipes, so its performance has a wider impact. E.g.:

# Call function n times (iterator consumer application instead of `deque`)
%timeit ilen(itl.starmap(f, itl.repeat((), n)))
# C: 3.4 ms (ilen here takes 1.4ms)
# Py: 5.6 ms (ilen here takes 3.6ms >50%)

# Append elements of an iterator to a list, where it needs to be done element-by-element
def foo(rng):
    a = list()
    ilen(map(a.append, rng))
%timeit foo(range(100_000))
# C: 3.8 ms (ilen: 1.4ms 50%)
# Py: 5 ms (ilen: 3.6ms >70%)
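For concreteness, the second recipe can be checked with a pure-Python ilen stand-in (a sketch; the stand-in mirrors the more_itertools recipe):

```python
import collections
import itertools

def ilen(iterable):
    # pure-Python stand-in for the proposed ilen
    counter = itertools.count()
    collections.deque(zip(iterable, counter), maxlen=0)
    return next(counter)

# element-by-element append driven by ilen;
# as a bonus, it reports how many items were appended
a = []
appended = ilen(map(a.append, range(5)))
```

After this runs, `a` contains the five elements and `appended` is 5.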

I am not trying to argue that the benefits are very big, but I am looking at this from the perspective of the benefit/cost ratio, and the cost here is negligible:
a) Implementation cost - minimal
b) Maintenance cost - minimal

Also, there are no ambiguities in naming or implementation; the name ilen is not going to make more sense for anything else.

Given that the cost is so low, I think the judgement has to account for it appropriately.

I think this at least deserves another look. After all, I see it has been proposed at least twice already.