Struggling to rename a "parallel" accumulate function

Background:

So, I was looking at Python’s itertools.accumulate and thought to myself, “I wish there were a way to run accumulate on the same iterator with multiple different functions, so that I don’t have to make separate copies of the iterator and then create an accumulate object from each one.” And thus, I made the following code:

from collections.abc import Callable, Generator, Iterable, Iterator
from typing import Optional, TypeVar
from itertools import chain, repeat
from operator import add, mul, sub

A = TypeVar("A")

def parallel_accumulate(
    iterable: Iterable[A],
    *funcs: Callable[[A, A], A],
    initials: Optional[tuple[A, ...]] = None
) -> Generator[tuple[A, ...], None, None]:

  given_iterator = iter(iterable)
  primary_iterator: Iterator[A]

  try:
    first_element = next(given_iterator)
  except StopIteration:
    return

  if initials is None:
    totals = tuple(repeat(first_element, len(funcs)))
    primary_iterator = given_iterator

  else:
    totals = initials
    primary_iterator = chain([first_element], given_iterator)

  yield totals
  for element in primary_iterator:
    totals = tuple(
        func(total_element, element)
        for func, total_element in zip(funcs, totals))
    yield totals
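
For reference, a quick demonstration of the intended behavior (a running sum and a running product over the same iterator):

>>> from operator import add, mul
>>> list(parallel_accumulate([1, 2, 3, 4, 5], add, mul))
[(1, 1), (3, 2), (6, 6), (10, 24), (15, 120)]
>>> list(parallel_accumulate([1, 2, 3, 4, 5], add, mul, initials=(0, 1)))
[(0, 1), (1, 1), (3, 2), (6, 6), (10, 24), (15, 120)]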

My Problem:

The name of the function is misleading: it implies the function involves some form of parallelism, when in reality it is sequential. But I have no idea how to rename it so that the new name accurately describes its behavior. If this function is so niche, or so devoid of use cases, that no fitting name can be formed, please do tell me. Do note that the code is a toy example made out of boredom; until I find a good use case for it myself, I won’t be using it in bigger projects.

I’m not sure what to call this thing (maybe accumulate_map?), but here (if I understood you correctly) is a much more concise implementation:

from collections.abc import Callable, Iterable
from typing import Optional, TypeVar
from itertools import accumulate, repeat, tee

# A wrapper that accepts `initial` positionally (it is keyword-only
# in accumulate), so as to be suitable for use with the builtin `map`.
def accumulate_positional(iterable, func, initial):
    return accumulate(iterable, func, initial=initial)

A = TypeVar('A')

def accumulate_map(
    iterable: Iterable[A],
    *funcs: Callable[[A, A], A],
    initials: Optional[tuple[A, ...]] = None
) -> zip:
    return zip(*map(
        accumulate_positional,
        tee(iterable, len(funcs)),
        funcs,
        initials or repeat(None)
    ))

We use itertools.tee to create independent iterators over the original (it internally caches elements so that they can be used multiple times); use map to accumulate each of the input functions on a separate input iterator; then zip so that the results from the accumulators are grouped appropriately.
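
If tee’s caching is unfamiliar, a quick illustration: each branch advances independently, with tee buffering whatever one branch has seen but another has not:

>>> from itertools import tee
>>> a, b = tee(iter([1, 2, 3]), 2)
>>> next(a), next(a), next(b)
(1, 2, 1)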

Let’s test it:

>>> from operator import add, mul
>>> from itertools import count, islice
>>> list(islice(accumulate_map(count(1), add, mul, initials=(0, 1)), 10))
[(0, 1), (1, 1), (3, 2), (6, 6), (10, 24), (15, 120), (21, 720), (28, 5040), (36, 40320), (45, 362880)]

Hmm, I’m not too confident in unpacking the map object: because zip is a function that takes an arbitrary number of arguments, it seems like it would need to compute and load all the values at once, thereby exhausting memory and going against a strength of iterators as memory-saving tools. Then again, I suppose it isn’t too difficult to circumvent that with the transpose function below (I’m sure better ones exist).

from collections.abc import Generator, Iterable
from itertools import cycle, islice
from typing import TypeVar


A = TypeVar("A")

def transpose(
    matrix: Iterable[Iterable[A]]) -> Generator[tuple[A, ...], None, None]:

  # One iterator per row. The number of rows must be finite, because
  # every output tuple takes one element from each row.
  row_iters = tuple(iter(row) for row in matrix)
  if not row_iters:
    return

  def flat_zip() -> Generator[A, None, None]:
    # Visit the row iterators round-robin, yielding the transpose in
    # flattened column-major order; the first exhausted row ends the stream.
    try:
      for row_iter in cycle(row_iters):
        yield next(row_iter)
    except StopIteration:
      return

  F = flat_zip()
  # Regroup the flat stream into columns of one element per row
  # (rows are assumed to be of equal length).
  while column := tuple(islice(F, len(row_iters))):
    yield column
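
As a sanity check, this reproduces the earlier test without unpacking into zip (reusing accumulate_positional from above):

>>> from itertools import count, islice, tee
>>> from operator import add, mul
>>> rows = map(accumulate_positional, tee(count(1), 2), (add, mul), (0, 1))
>>> list(islice(transpose(rows), 5))
[(0, 1), (1, 1), (3, 2), (6, 6), (10, 24)]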

No, it lazily takes in values from the iterators passed to it, assembling one output tuple at a time:

>>> def seq(name):
...     print(name)
...     yield 1
...     print(name)
...     yield 2
... 
>>> x = list(zip(seq('a'), seq('b')))
a
b
a
b
>>> x
[(1, 1), (2, 2)]

Yes, one result from each accumulation is in memory all at once. But that would have to be the case anyway, because that equally describes the tuple you want to create.

My apologies for not properly clarifying the issue I had with your code in your previous example. It’s not that I think the zip instance prematurely loads all the values once it is constructed, or that I object to one tuple from each accumulation being in memory at once. It’s that I thought that, before construction of the zip instance was even complete, the unpacking operator *, in the process of passing arguments to the zip constructor, would immediately compute and load every accumulation tuple from the map instance at the same time.

For my transpose function, the while loop can be substituted with a proper use of itertools.batched on Python 3.12 and above, while the current implementation is compatible with Python 3.9 and above.
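
For illustration (itertools.batched requires Python 3.12), the regrouping step is exactly what batched does:

>>> from itertools import batched
>>> flat = iter([0, 1, 1, 1, 3, 2])  # a flattened column-major stream from two rows
>>> list(batched(flat, 2))
[(0, 1), (1, 1), (3, 2)]

so the while loop could become yield from batched(flat_zip(), len(row_iters)).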

It creates each itertools.accumulate iterator (by running the map) and passes those as separate arguments to zip. But that only does setup work; nothing from the underlying iterable has been consumed yet and no application of the funcs has been done. itertools.accumulate is, itself, also lazy, just like repeat and everything else in the library.
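
A quick way to see this: nothing is consumed until the first output tuple is actually requested.

>>> from itertools import accumulate, tee
>>> from operator import add, mul
>>> def noisy():
...     print('consuming!')
...     yield from (1, 2, 3)
... 
>>> z = zip(*map(accumulate, tee(noisy(), 2), (add, mul)))  # setup only, prints nothing
>>> next(z)
consuming!
(1, 1)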

I see. Thank you for the clarification you provided.

Interesting function! I don’t think I’ve seen it before.

Some name ideas:

  • accumulate_together - evokes multiple operations happening at the same time. “more_itertools” uses the word “together” in a few places, but I don’t know if their usage matches
  • accumulate_n - instead of accumulating 1 thing, it accumulates “N” things (any number of things).
  • multi_accumulate - similar to the above, different phrasing
  • aggregate - similar to using aggregate columns in a SQL query. It’s a little odd IMO because “aggregate” is both a noun and a verb.

“accumulate_together” → The word “together” is ambiguous because it isn’t clear what its subject is. Is it implying that accumulate_together returns the accumulated results of multiple iterables together under one given function, or that it returns the accumulated results of a single iterable under multiple given functions?

“accumulate_n” or “multi_accumulate” → Same ambiguity as above, though here it arises from wondering what “n” or “multi” refers to.

“aggregate” → In other contexts, it has the misleading implication that individual-level data has been combined into higher-level data. It would work well for a reduce-like function that takes an iterable and multiple functions and returns a collection of reduced results, but given how rarely reduce itself is used, such a function would see even rarer use.

In all honesty, I am probably being too pedantic about semantics here.

On second thought, I think there could be some neat, albeit niche, use cases for the “reducing” aggregate function I mentioned here. For instance, in statistics it’s pretty common to want to calculate the min, max, mean, and standard deviation of a given distribution. Maybe the aggregate function could be a handy and elegant way of doing that in a single pass, instead of having to load the entire distribution into memory and reducing for each statistic separately.
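
A minimal sketch of what I mean; the name aggregate and its signature are just my guess at such a function, and mean and standard deviation would need slightly richer state (e.g. a running (count, sum, sum-of-squares) triple), but the shape is the same:

from collections.abc import Callable, Iterable
from typing import TypeVar

A = TypeVar("A")

def aggregate(
    iterable: Iterable[A],
    *funcs: Callable[[A, A], A],
    initials: tuple[A, ...]
) -> tuple[A, ...]:
  # Fold each element into every running total in a single pass.
  totals = initials
  for element in iterable:
    totals = tuple(
        func(total, element) for func, total in zip(funcs, totals))
  return totals

>>> aggregate(iter([3.5, 1.2, 9.8, 4.4]), min, max, initials=(float("inf"), float("-inf")))
(1.2, 9.8)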

accumulate_many
or
tee_accumulate, after itertools.tee, since you could look at it as applying tee and then accumulate on each resulting iterator.