This is a synthesis of work I did on 3 different but related needs for efficient functionality:
- My proposal for itertools.ilen, which was rejected
- Exposing items_seen in itertools.count
- Search for a more efficient more_itertools.countable
See Comments
After itertools.ilen was rejected, I started thinking about what else does counting and could potentially be extended and made more useful while including the functionality of ilen.
So I thought: if collections.deque is special-cased to be an efficient consumer, maybe in a similar spirit count can be functionally extended to count items.
Also, a maxlen=0 deque has no benefit beyond consuming, which is not the job of deque at all, whereas this is a sensible extension to the count class: consume-counting is a valid functionality for a counter.
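For context, the deque special case referred to above is the zero-length deque idiom for draining an iterator. A minimal sketch follows; the helper name consume is only illustrative.

```python
from collections import deque

def consume(iterator):
    """Exhaust an iterator, discarding every item."""
    # A deque with maxlen=0 stores nothing, so building it just drains the input.
    deque(iterator, maxlen=0)

it = iter(range(5))
consume(it)
remaining = list(it)  # nothing is left once the iterator has been drained
```

Note that nothing about the number of consumed items survives this call, which is the gap the proposed count methods aim to fill.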
So the proposal is to add 4 new methods to itertools.count:
class count:
    def __repr__(self):
        args = str(self._value)
        if self._step != 1:
            args += f', {self._step}'
        return f'{type(self).__name__}({args})'

    def __init__(self, firstval=0, step=1):
        self._value = firstval
        self._step = step

    def __iter__(self):
        return self

    def __next__(self):
        _value = self._value
        self._value += self._step
        return _value

    @property
    def value(self):
        # NEW: read the current value without advancing
        return self._value

    @value.setter
    def value(self, value):
        # NEW: reset the current value
        self._value = value

    def consume(self, iterable):
        # NEW: exhaust the iterable, advancing once per item
        for _ in iterable:
            self._value += self._step

    def along(self, iterable):
        # NEW: pass items through, advancing once per item
        for el in iterable:
            next(self)
            yield el
Performance benefits
This provides efficient methods, with C-speed counting, for 2 operations that current solutions perform with pure Python objects, namely more_itertools.ilen and more_itertools.countable.
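To illustrate the kind of pure-Python work being replaced, here is a rough sketch of how such helpers can be written today. The names ilen_sketch and CountableSketch are hypothetical, and this is not a copy of the more_itertools source; only the items_seen attribute name is taken from the discussion above.

```python
from collections import deque
from itertools import count

def ilen_sketch(iterable):
    """Count items by zipping them with a counter and draining the zip."""
    counter = count()
    deque(zip(iterable, counter), maxlen=0)
    # The counter advanced once per item, so its next value is the length.
    return next(counter)

class CountableSketch:
    """Wrap an iterable and track how many items have been yielded so far."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.items_seen = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)   # StopIteration propagates when exhausted
        self.items_seen += 1    # Python-level increment on every item
        return item

n = ilen_sketch(x for x in range(10) if x % 2)
wrapped = CountableSketch("abc")
items = list(wrapped)
```

The wrapper executes a Python-level __next__ for every item, which is the per-item overhead the proposal would move into C.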
Expected performance of count.consume:
a = range(100_000)
%timeit more_itertools.ilen(a) # 3.5 ms
%timeit PR.ilen(a) # 1.45 ms
%timeit count().consume(a) # expected to be slightly higher than 1.45 ms due to thread safety overhead
Expected performance of count.along:
consume = collections.deque(maxlen=0).extend
%timeit consume(more_itertools.countable(a)) # 12 ms
%timeit consume(efficient_recipe_counter(a)) # 3.8 ms
%timeit consume(count().along(a)) # > 1.45 ms & < 2 ms
For the efficient recipe, see Allow accessing / retrieving the current item of `itertools.count` - #7 by Stefan2
The performance of count.along will be very similar to that of count.consume, as the only difference is an extra function call and the item return. Of course, there is more overhead if items are returned to Python rather than burned by a consumer inside C, but that overhead is absent from the more_itertools.countable example above as well.
Result
A more functional counter with thread-safe operations.
Example
counter = count()
print(next(counter)) # 0
print(counter.value) # 1
counter.consume([0, 1, 2])
print(counter.value) # 4
for el in counter.along(['a', 'b', 'c']):
    pass
print(counter.value) # 7
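To make the link back to ilen and countable concrete, the sketch below expresses both on top of the proposed methods. The count class here is a pure-Python stand-in for the proposal (the real itertools.count has none of these methods), and ilen is a hypothetical helper.

```python
class count:
    """Pure-Python stand-in for the proposed extension (illustration only)."""

    def __init__(self, firstval=0, step=1):
        self._value = firstval
        self._step = step

    def __iter__(self):
        return self

    def __next__(self):
        value = self._value
        self._value += self._step
        return value

    @property
    def value(self):
        return self._value

    def consume(self, iterable):
        for _ in iterable:
            self._value += self._step

    def along(self, iterable):
        for el in iterable:
            next(self)
            yield el

def ilen(iterable):
    """Length of an iterable, expressed through the proposed consume()."""
    counter = count()
    counter.consume(iterable)
    return counter.value

# countable-style use: read the running total while iterating.
seen = count()
progress = [(el, seen.value) for el in seen.along("abc")]
```

Here progress pairs each element with the number of items seen so far, which is the information countable exposes through items_seen.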
Threading example
import itertools, threading, time
def task1(counter):
    while (i := counter.value) % 2:
        time.sleep(0.1)
    print(f'task1: {i}')

def job1(counter, n):
    counter.consume(map(task1, itertools.repeat(counter, n)))

def task2(counter):
    while not (i := counter.value) % 2:
        time.sleep(0.2)
    return i

def job2(counter, n):
    for i in counter.along(map(task2, itertools.repeat(counter, n))):
        print(f'task2: {i}')

counter = count()
t1 = threading.Thread(target=job1, args=(counter, 5))
t2 = threading.Thread(target=job2, args=(counter, 5))
t1.start()
t2.start()
t1.join()
t2.join()
# task1: 0
# task2: 1
# task1: 2
# task2: 3
# task1: 4
# task2: 5
# task1: 6
# task2: 7
# task1: 8
# task2: 9
print(counter.value) # 10
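The proposal describes these operations as thread safe. A pure-Python stand-in would need explicit locking to offer the same guarantee when several threads update one counter; a minimal sketch, assuming a hypothetical LockedCounter class that is not part of the proposal:

```python
import threading

class LockedCounter:
    """Illustrative counter whose updates are guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    @property
    def value(self):
        with self._lock:
            return self._value

    def consume(self, iterable):
        for _ in iterable:
            # The read-modify-write happens under the lock, so no update is lost.
            with self._lock:
                self._value += 1

counter = LockedCounter()
threads = [
    threading.Thread(target=counter.consume, args=(range(10_000),))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.value  # every increment from all 4 threads is accounted for
```

Taking the lock once per item is costly in Python, which is one reason to do the counting inside C instead.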