Add an istarmap function: a combination of imap and starmap

Feature or enhancement

In Python's multiprocessing module, Pool provides three built-in methods that matter here:

  1. map - applies a function to each element of an iterable and returns a list of the results.
  2. starmap - like map, but each element of the iterable is expected to be unpacked as arguments. It is used for functions with multiple parameters.
  3. imap - like map, but returns an iterator over the results instead of a list. It is used when we want to see results before all of the function calls have finished.
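
As a minimal sketch of the difference between the three methods (the helper functions `square` and `add` are made up for illustration, and `ThreadPool` is used only because it exposes the same `map`/`starmap`/`imap` methods as `multiprocessing.Pool` without needing a `__main__` guard):

```python
from multiprocessing.pool import ThreadPool  # same method names as multiprocessing.Pool


def square(x):
    return x * x


def add(x, y):
    return x + y


with ThreadPool(2) as pool:
    # map: blocks until every call is done, then returns a list.
    mapped = pool.map(square, [1, 2, 3])

    # starmap: each element is unpacked into positional arguments.
    starmapped = pool.starmap(add, [(1, 2), (5, 3)])

    # imap: returns an iterator; results are consumed one at a time, in input order.
    lazy = pool.imap(square, [1, 2, 3])
    first = next(lazy)
    rest = list(lazy)

print(mapped, starmapped, first, rest)
```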

See the multiprocessing Pool reference here.

I want to add a new method to the multiprocessing module called istarmap.
istarmap would be a combination of starmap and imap.
Like starmap, istarmap would take an iterable whose elements are unpacked as arguments.
Like imap, istarmap would return an iterator over the results.
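
To make the intended behaviour concrete, here is a rough sketch of how it can be approximated today on top of imap. The free function `istarmap(pool, ...)` and the wrapper `_star_call` are hypothetical names, not part of multiprocessing, and the demo uses `ThreadPool` (which has the same `imap` method as `Pool`) to stay independent of the process start method:

```python
from functools import partial
from multiprocessing.pool import ThreadPool


def _star_call(func, args):
    """Call func with args unpacked; defined at module level so it can be pickled."""
    return func(*args)


def istarmap(pool, func, iterable, chunksize=1):
    """Hypothetical helper: lazily yield func(*args) for each args tuple, in input order."""
    return pool.imap(partial(_star_call, func), iterable, chunksize)


def add(x, y):
    return x + y


if __name__ == "__main__":
    with ThreadPool(2) as pool:
        for r in istarmap(pool, add, [(1, 2), (5, 3), (8, 4)]):
            print(r)
```

The same helper should accept a `multiprocessing.Pool` as well, provided `func` is picklable (for example, a function defined at module level).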

Pitch

istarmap would be useful in the same cases where imap is useful, but would also support unpacked arguments for functions with multiple parameters, like starmap.

There is a Stack Overflow question about exactly this case here.

Example

import multiprocessing as mp
import time

def func(x, y):
    time.sleep(10)
    return f"{x=}, {y=}"

with mp.Pool() as pool:
    arguments = [(1, 2), (5, 3), (8, 4)]
    # istarmap is the proposed method; it does not exist in multiprocessing today
    results = pool.istarmap(func, arguments)
    for r in results:
        print(r)

This shows an example use of istarmap.
We have a function with multiple parameters, we want to run it across multiple processes, and we want to print each result to the screen as it becomes available.

In addition, I would be happy to add this feature as my first contribution to the CPython project.

istarmap wouldn’t add any new functionality: you can wrap fn in a single-argument function that unpacks the arguments, then use imap:

def unpack(args):
    return fn(*args)

for r in pool.imap(unpack, arguments):
    print(r)

Of course, this argument extends to the existing starmap, so really you’re arguing for consistency. In this regard, I would suggest soft-deprecating starmap (i.e., removing it from the documentation, with no change in API).

You can also use concurrent.futures.ProcessPoolExecutor with concurrent.futures.as_completed to get your proposed behaviour.
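
A minimal sketch of that approach follows. The helper name `iter_completed` is hypothetical, and the demo swaps in `ThreadPoolExecutor` so it runs without process start-method concerns; `ProcessPoolExecutor` has the same `submit` interface and could be passed instead, as long as `func` is picklable. Note that `as_completed` yields results in completion order, not input order, which differs from `imap`:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def add(x, y):
    return x + y


def iter_completed(executor, func, arg_tuples):
    """Hypothetical helper: yield func(*args) results as each call finishes."""
    futures = [executor.submit(func, *args) for args in arg_tuples]
    for future in as_completed(futures):
        yield future.result()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as executor:
        for result in iter_completed(executor, add, [(1, 2), (5, 3), (8, 4)]):
            print(result)
```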

This part in itself (presumably also genericized to accept the fn) is useful enough in enough contexts that I can imagine it (and the corresponding version for **kwargs, and some other such variants) being provided in a library.

Yes, it’s about as short as a function gets. But you still have to name it every time. Importing from the standard library is shorter and clearer. We get things like random.choices for a reason.
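
For illustration only, library-style versions of these helpers might look like the sketch below. `apply_args`, `apply_kwargs`, and `describe` are hypothetical names; nothing like them is claimed to exist in the standard library:

```python
from functools import partial


def apply_args(fn, args):
    """Call fn with a sequence unpacked as positional arguments."""
    return fn(*args)


def apply_kwargs(fn, kwargs):
    """Call fn with a mapping unpacked as keyword arguments."""
    return fn(**kwargs)


def describe(x, y):
    return f"{x=}, {y=}"


# partial binds the target function, leaving a single-argument callable.
describe_from_tuple = partial(apply_args, describe)
describe_from_dict = partial(apply_kwargs, describe)

print(describe_from_tuple((1, 2)))
print(describe_from_dict({"x": 5, "y": 3}))
```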

Won’t a lambda work here?

A lambda will work, but the benefit of the itertools-type functions is mainly speed: you can compose builtins together without involving any Python-level functions.
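
One caveat if the lambda is meant for the Pool.imap case from earlier in the thread: a process pool has to pickle the function it sends to its workers, and the standard pickle module cannot serialise a lambda. A small check, using hypothetical helper names `star_call` and `can_pickle`:

```python
import pickle
from functools import partial


def star_call(func, args):
    return func(*args)


def can_pickle(obj):
    """Return True if the standard pickle module can serialise obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


unpack_lambda = lambda args: pow(*args)    # pickle cannot look a lambda up by name
unpack_partial = partial(star_call, pow)   # a partial over a module-level function

print("lambda picklable: ", can_pickle(unpack_lambda))
print("partial picklable:", can_pickle(unpack_partial))
```

When run as a script, the partial is expected to pickle because `star_call` can be found by name in its module, while the lambda is not.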

I have often wanted unpack but probably more like:

def unpack(f):
    def fn(args):
        return f(*args)
    return fn

Similarly, the opposite:

def pack(f):
    def fn(*args):
        return f(args)
    return fn
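
As a quick sanity check of how these two behave (a sketch, not a proposal for an API): for the plain map case, map(unpack(f), it) gives the same results as the existing itertools.starmap(f, it), and unpack undoes pack.

```python
from itertools import starmap


def unpack(f):
    def fn(args):
        return f(*args)
    return fn


def pack(f):
    def fn(*args):
        return f(args)
    return fn


pairs = [(2, 3), (3, 2), (10, 2)]

# Both forms call pow(base, exp) once per pair.
via_unpack = list(map(unpack(pow), pairs))
via_starmap = list(starmap(pow, pairs))

# pack turns sum(iterable) into a varargs callable; unpack reverses that.
round_trip = unpack(pack(sum))((1, 2, 3))

print(via_unpack, via_starmap, round_trip)
```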

In the situations where I have wanted these it is generally because I wanted to compose builtins. This would be something like:

In [60]: data = [(1, 2, 3), (4, 5, 6)]

In [61]: totals = list(map(pack(sum), *data))

In [62]: totals
Out[62]: [5, 7, 9]

In this situation I compose many builtins (list, tuple, sum, map) but then also one Python function, pack, which is harder to optimise. I would prefer that to be an optimised builtin.

To me the speed of composing builtins is the reason for using itertools et al in the first place. Otherwise I think it’s usually clearer to write your code with things like list comprehensions:

In [64]: [data[0][i] + data[1][i] for i in range(len(data[0]))]
Out[64]: [5, 7, 9]

In [65]: [a + b for a, b in zip(*data)]
Out[65]: [5, 7, 9]