Itertools.count enhancements

dg-pb · July 11, 2024, 11:46pm

This is a synthesis of my work on 3 different but related needs for efficient functionality:

My proposal for itertools.ilen, which was rejected
Exposing items_seen in itertools.count
Search for a more efficient more_itertools.countable (see comments)

After itertools.ilen was rejected, I started thinking about what else does counting and could be extended and made more useful while also covering the functionality of ilen.

Since collections.deque is special-cased to be an efficient consumer, I thought that, in a similar spirit, count could be extended to count items.

Also, a deque with maxlen=0 offers nothing beyond consuming, which is not really the job of a deque at all, whereas this extension is a sensible addition to the count class: consume-counting is a valid function of a counter.
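The deque idiom referred to here is the one behind the consume recipe in the itertools documentation: a deque with maxlen=0 stores nothing, so building one simply runs the iterator to exhaustion at C speed. A minimal sketch (the name exhaust is only for illustration):

```python
import collections

def exhaust(iterable):
    # maxlen=0 means every item is discarded as soon as it is added,
    # so constructing the deque just drives the iterator to the end in C
    collections.deque(iterable, maxlen=0)

it = iter(range(5))
exhaust(it)
print(next(it, 'done'))  # done
```

Note that this idiom consumes but does not count, which is the gap that ilen (and the proposed count.consume) is meant to fill.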

So the proposal is to add 4 new methods to itertools.count:

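class count:
    # Pure-Python model of the proposed behaviour (the real itertools.count is written in C)
    def __repr__(self):
        args = repr(self._value)
        if self._step != 1:
            args += f', {self._step!r}'
        return f'{type(self).__name__}({args})'

    def __init__(self, firstval=0, step=1):
        self._value = firstval
        self._step = step

    def __iter__(self):
        return self

    def __next__(self):
        _value = self._value
        self._value += self._step
        return _value

    @property
    def value(self):
        # NEW: current value, i.e. what the next call to next() would return
        return self._value

    @value.setter
    def value(self, value):
        # NEW: reset or adjust the counter in place
        self._value = value

    def consume(self, iterable):
        # NEW: exhaust iterable, advancing by step for every item
        for _ in iterable:
            self._value += self._step

    def along(self, iterable):
        # NEW: yield items unchanged, advancing by step for every item
        for el in iterable:
            next(self)
            yield el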

Performance benefits

This provides C-speed counting for 2 operations that current solutions implement with pure Python objects.

Namely: more_itertools.ilen and more_itertools.countable.

Expected performance of count.consume:

a = range(100_000)
%timeit more_itertools.ilen(a)    # 3.5 ms
%timeit PR.ilen(a)                # 1.45 ms (implementation from the rejected ilen PR)
%timeit count().consume(a)        # expected: slightly above 1.45 ms due to thread-safety overhead
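For readers outside IPython, a plain timeit harness for comparing today's baselines might look like the sketch below. Both function names are illustrative, not library APIs. ilen_zip_count uses the well-known zip/count/deque trick, which I believe is close to what more_itertools.ilen does, but treat that as an assumption; absolute numbers will depend on the machine and Python version.

```python
import collections
import itertools
import timeit

def ilen_genexpr(iterable):
    # Pure-Python counting: the generator expression resumes once per item
    return sum(1 for _ in iterable)

def ilen_zip_count(iterable):
    # C-driven counting: zip pulls one value from `counter` per item,
    # and a maxlen=0 deque throws the pairs away without storing them
    counter = itertools.count()
    collections.deque(zip(iterable, counter), maxlen=0)
    return next(counter)

a = range(100_000)
for fn in (ilen_genexpr, ilen_zip_count):
    per_call = timeit.timeit(lambda: fn(a), number=10) / 10
    print(f'{fn.__name__}: {per_call * 1e3:.2f} ms')
```

The zip trick works because zip stops as soon as the first iterable is exhausted, so the counter is advanced exactly once per item.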

Expected performance of count.along:

consume = collections.deque(maxlen=0).extend
%timeit consume(more_itertools.countable(a))   # 12 ms
%timeit consume(efficient_recipe_counter(a))   # 3.8 ms
%timeit consume(count().along(a))              # expected: > 1.45 ms and < 2 ms

For the efficient recipe, see “Allow accessing / retrieving the current item of `itertools.count`” (post #7 by Stefan2).

Its performance should be very similar to that of count.consume, as the only difference is an extra function call and returning each item. Of course, there is additional overhead if the items are returned to Python instead of being discarded by a consumer inside C, but that overhead is absent from the more_itertools.countable example above as well.
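For context, more_itertools.countable wraps an iterable and exposes an items_seen count. The wrapper below is a hypothetical sketch in that spirit (CountableSketch is not the library's actual source): every item passes through a Python-level __next__ plus an attribute update, which is the per-item overhead that count.along would move into C.

```python
class CountableSketch:
    """Hypothetical pure-Python wrapper that tracks how many items were seen."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.items_seen = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # StopIteration at the end leaves the count untouched
        self.items_seen += 1   # Python-level bookkeeping on every single item
        return item

c = CountableSketch('abc')
first = next(c)
print(first, c.items_seen)  # a 1
rest = list(c)
print(rest, c.items_seen)   # ['b', 'c'] 3
```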

Result

A more functional counter with thread-safe operations.

Example

counter = count()
print(next(counter))                # 0
print(counter.value)                # 1
counter.consume([0, 1, 2])
print(counter.value)                # 4
for el in counter.along(['a', 'b', 'c']):
    pass
print(counter.value)                # 7

Threading example

import itertools, threading, time

def task1(counter):
    while (i := counter.value) % 2:
        time.sleep(0.1)
    print(f'task1: {i}')

def job1(counter, n):
    counter.consume(map(task1, itertools.repeat(counter, n)))

def task2(counter):
    while not (i := counter.value) % 2:
        time.sleep(0.2)
    return i

def job2(counter, n):
    for i in counter.along(map(task2, itertools.repeat(counter, n))):
        print(f'task2: {i}')

counter = count()  # the proposed extended count, not today's itertools.count
t1 = threading.Thread(target=job1, args=(counter, 5))
t2 = threading.Thread(target=job2, args=(counter, 5))
t1.start()
t2.start()
t1.join()
t2.join()
# task1: 0
# task2: 1
# task1: 2
# task2: 3
# task1: 4
# task2: 5
# task1: 6
# task2: 7
# task1: 8
# task2: 9
print(counter.value)    # 10
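For comparison with the threading example, here is one way to get the same counter operations today without the proposal: a hypothetical LockedCounter (not an existing API) that guards a plain int with threading.Lock. Each item pays for a Python-level method call and a lock acquisition, which is the kind of overhead the proposal aims to move into C. Whether a lock is strictly needed depends on the interpreter; it is used here as the conservative assumption.

```python
import threading

class LockedCounter:
    """Hypothetical stand-in for the proposed methods, built from an int and a Lock."""

    def __init__(self, start=0, step=1):
        self._lock = threading.Lock()
        self._value = start
        self._step = step

    @property
    def value(self):
        with self._lock:
            return self._value

    def _advance(self):
        # Read-modify-write under the lock so concurrent updates are not lost
        with self._lock:
            self._value += self._step

    def consume(self, iterable):
        for _ in iterable:
            self._advance()

    def along(self, iterable):
        for el in iterable:
            self._advance()
            yield el

counter = LockedCounter()

def worker():
    counter.consume(range(10_000))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```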

JamesParrott · July 12, 2024, 10:04am

self._value += self.step

The example implementation is incorrect: it ignores self.step.

The word “efficient” has been used twice, but I’ve not seen any timing benchmarks.

I don’t see the point of a getter and setter at all on something this simple, if they’re not important for thread safety. Why not just expose .value?

.consume does exactly the same thing as .along (given the current __next__), except that, for a second time, it incorrectly implements the basic purpose of itertools.count by not using self.step:


        for _ in iterable:
            self._value += 1

I think this one is best put to bed.

AndersMunch · July 12, 2024, 11:36am

That’s too bad. Having a standard name for this common need would have been nice. What reason did Raymond give?

This new proposal just adds complication, I think, making it less palatable, not more. In particular, I wouldn’t want it mixed up with itertools.count, because count is a natural name for what you call ilen, which creates an opportunity for confusion.

By the way, I would have called the function consume or exhaust instead of ilen, to emphasize the side-effect. ilen sounds too much like the name of a pure function.

dg-pb · July 12, 2024, 1:36pm

Thank you, those are errors. They should all use step. Corrected.

You can look at the PR. Basically, it came down to general fit with itertools: naming, purpose, etc.

At least part of it is addressed in this proposal.

dg-pb · July 12, 2024, 2:41pm

Apologies, I assumed the performance benefits would be clear from the linked previous work and from the implementations of the current alternatives, where the overhead compared with an optimal C implementation can be seen.

Updated with indications of expected run times.

GotoRoto · July 15, 2024, 11:16pm

A possible concern is that developers may already use the count class to iterate over arithmetic progressions whose terms are rational numbers.

This could have arisen from the fact that the documentation says “Make an iterator that returns evenly spaced values beginning with start”, and from the fact that Fraction and Decimal objects are valid arguments to the count constructor.
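As a quick illustration of that existing use, the current itertools.count already accepts Fraction and Decimal for both start and step:

```python
from decimal import Decimal
from fractions import Fraction
from itertools import count, islice

# Arithmetic progression in exact thirds
thirds = list(islice(count(Fraction(0), Fraction(1, 3)), 4))
print(thirds)  # [Fraction(0, 1), Fraction(1, 3), Fraction(2, 3), Fraction(1, 1)]

# Arithmetic progression in exact decimal tenths
tenths = list(islice(count(Decimal('0.0'), Decimal('0.1')), 4))
print(tenths)  # [Decimal('0.0'), Decimal('0.1'), Decimal('0.2'), Decimal('0.3')]
```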

The proposed count.consume method increases self._value by the product of self.step and the length of the iterable, which doesn’t make much sense for determining the length of an iterable when self.step is anything other than 1.
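To make the concern concrete, here is a minimal pure-Python model of the proposed consume semantics (CountModel is a hypothetical name, not the proposed C implementation). With a non-default start or step, value is no longer the length of what was consumed:

```python
from fractions import Fraction

class CountModel:
    """Hypothetical pure-Python model of the proposed count.consume semantics."""

    def __init__(self, start=0, step=1):
        self.value = start
        self.step = step

    def consume(self, iterable):
        # One step per item, as if next() had been called once per item
        for _ in iterable:
            self.value += self.step

c = CountModel(start=10, step=3)
c.consume('abcd')            # 4 items
print(c.value)               # 22, i.e. 10 + 3 * 4
print((c.value - 10) // 3)   # 4, the length has to be recovered by hand

half = CountModel(step=Fraction(1, 2))
half.consume(range(3))
print(half.value)            # 3/2
```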

One could argue “Then, users should just specify the step argument in the constructor if they need to use an arithmetic progression and shouldn’t specify it when they need to find the length of an iterable.”

The root concern is that the proposed count class now has a confusing behavior (the aforementioned behavior when consume is called on a count instance whose step isn’t 1), and that it has two distinct, desired, and equally sensible behaviors, with the behavior taken depending on the step argument.

Even if the consume method only worked for count instances with a step of 1, the two sensible and distinct behaviors would still exist in the same class. I argue they are so distinct that, on the basis of the single responsibility principle, implementing the consume method would be less preferable than leaving the count class untouched and creating a new function or class that finds the length of an iterable.

I’m not sure how to address the along method, so I’ll leave that for other users to talk about.

It would also be very helpful to other users if you added a written explanation of your threading example. Ideally it would explain what the code is meant to do, how the code does what was intended, how this could be or has been implemented without your proposal, and perhaps more.

GotoRoto · July 15, 2024, 11:28pm

I meant the aforementioned behavior when self.step isn’t 1 and when calling the consume method. I apologize for my poor phrasing.

dg-pb · July 15, 2024, 11:29pm

What is the “nonsensical” behaviour you are referring to?

GotoRoto · July 15, 2024, 11:30pm

I’m sorry for my poor phrasing. I meant what happens if the consume method is called on an object with a step argument that isn’t 1. I’ll edit it right now.

dg-pb · July 15, 2024, 11:31pm

I imagine you are referring to this.

Why doesn’t it make sense?

It isn’t called “count”; it is called “consume”, meaning it consumes the iterator while advancing by whatever step was provided.

GotoRoto · July 15, 2024, 11:37pm

You’re correct in that I am referring to this. The consume method has the primary side effects of incrementing the count object by the length of the provided iterable and consuming the iterator from that iterable. Why would one need to count and consume the iterable in terms of rational numbers like Fraction and Decimal, which are valid start and step arguments to the count constructor?

dg-pb · July 15, 2024, 11:42pm

I have shown that it is useful with integers.

And that it is correct both technically and conceptually with any step.

I have never used it with fractions. Could you come up with a case where it would be useful in practice with fractions as well?

It would surely help support this idea.

GotoRoto · July 15, 2024, 11:42pm

I also wish to ask: why would a developer need to consume and count the number of items in an iterable in steps? Edit: I’m referring to the consume method.

dg-pb · July 15, 2024, 11:49pm

What do you mean by “in steps”? Could you give a specific example what you mean by that?

GotoRoto · July 15, 2024, 11:55pm

How are count objects used now? To generate arithmetic progressions. So I imagine there may be developers who use Fractions and Decimals in their arithmetic progressions, in math libraries and perhaps elsewhere. I don’t have evidence on whether Decimal and Fraction are commonly used as arguments to the count constructor, but we shouldn’t assume they aren’t, move on, and act as if they don’t exist. An important point is that if many use cases depend on the constructor accepting Decimal and Fraction values, then it may be better to put the proposed functionality in a separate class or function. It’s a potential concern, that is all.

dg-pb · July 16, 2024, 12:02am

But the fact that it is conceptually and theoretically sound with any type of step object is not a concern; it is a bonus.