Differences between `Pool.map`, `Pool.apply`, and `Pool.apply_async`

bsn · January 11, 2021, 2:55am

I read the following in “An introduction to parallel programming using Python's multiprocessing module”:
“The Pool.map and Pool.apply will lock the main program until all processes are finished, which is quite useful if we want to obtain results in a particular order for certain applications. In contrast, the async variants will submit all processes at once and retrieve the results as soon as they are finished.”
I find this hard to understand. Could anyone please provide a simple example to illustrate the point?

EpicWink · January 13, 2021, 6:42am

The async variants return a promise of the result. Pool.apply_async and Pool.map_async return an object immediately after calling, even though the function hasn’t finished running. This object has a get method which will wait for the function to finish, then return the function’s result.

Pool.apply: when you need to run a function in another process for some reason (and you want to use a process pool instead of creating a new process to run the function).
Pool.map: run a function over a set of arguments in parallel.
Pool.apply_async: run a function in another process, but allow the main thread to keep running. Use this when you don’t need the result right now.
Pool.map_async: run a function over a list of arguments in parallel, but allow the main thread to keep running. Use this when you don’t need the results right now.
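
To make the blocking versus non-blocking distinction concrete, here is a minimal sketch. The worker `slow_square` and the helpers `run_blocking` and `run_async` are hypothetical names made up for this illustration; only `Pool.apply`, `Pool.apply_async` and `get()` come from the library.

```python
from multiprocessing import Pool
from time import perf_counter, sleep


def slow_square(x):
    """Hypothetical worker: pretend to do 0.2 s of work, then square x."""
    sleep(0.2)
    return x * x


def run_blocking(pool, values):
    # Pool.apply blocks until each call returns, so the calls run one after another.
    return [pool.apply(slow_square, (v,)) for v in values]


def run_async(pool, values):
    # Pool.apply_async returns an AsyncResult at once, so every call is
    # submitted before any of them has finished and they can run in parallel.
    pending = [pool.apply_async(slow_square, (v,)) for v in values]
    # get() waits for each call to finish and returns its result.
    return [r.get() for r in pending]


if __name__ == "__main__":
    with Pool(3) as pool:
        for run in (run_blocking, run_async):
            start = perf_counter()
            results = run(pool, [1, 2, 3])
            print(run.__name__, results, f"{perf_counter() - start:.2f} s")
```

Both helpers return the same list. With three workers, the blocking version should take roughly three times as long as the async one, though exact timings depend on the machine.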

Of further note is Pool.imap_unordered, which is like calling Pool.apply_async for each argument in a list and acting on each result as it arrives, in completion order rather than input order.
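
As a rough sketch of how Pool.imap_unordered differs from Pool.imap, the following compares the two on the same inputs. The names `wait_then_return`, `completion_order` and `input_order` are hypothetical helpers for this example, and the pool size is chosen so that every call can start at once.

```python
from multiprocessing import Pool
from time import sleep


def wait_then_return(t):
    """Hypothetical worker: sleep for t seconds, then return t."""
    sleep(t)
    return t


def completion_order(delays):
    with Pool(len(delays)) as pool:
        # Results are yielded as each call finishes, not in submission order.
        return list(pool.imap_unordered(wait_then_return, delays))


def input_order(delays):
    with Pool(len(delays)) as pool:
        # imap also yields results lazily, but keeps the order of the inputs.
        return list(pool.imap(wait_then_return, delays))


if __name__ == "__main__":
    delays = [0.6, 0.1, 0.3]
    print("imap:          ", input_order(delays))
    print("imap_unordered:", completion_order(delays))
```

The imap line always matches the input order. The imap_unordered line will typically list the shortest delay first, but its order depends on when each worker actually finishes, so it should not be relied on.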

Here is a tabulated example:

from time import sleep
from multiprocessing import Pool

def f(t):
    sleep(t)
    return t

p = Pool()

| call | result | took (s) |
| --- | --- | --- |
| `p.apply(f, (0.1,))` | `0.1` | 0.102 |
| `p.map(f, [0.3, 0.1, 0.2])` | `[0.3, 0.1, 0.2]` | 0.302 |
| `r = p.apply_async(f, (0.1,))` | `<ApplyResult object>` | 0.0 |
| `r.get()` | `0.1` | 0.104 |
| `r = p.map_async(f, [0.3, 0.1, 0.2])` | `<MapResult object>` | 0.0 |
| `r.get()` | `[0.3, 0.1, 0.2]` | 0.303 |
| `r = p.imap(f, [0.3, 0.1, 0.2])` | `<IMapIterator object>` | 0.0 |
| `list(r)` | `[0.3, 0.1, 0.2]` | 0.302 |
| `r = p.imap_unordered(f, [0.3, 0.1, 0.2])` | `<IMapUnorderedIterator object>` | 0.0 |
| `list(r)` | `[0.1, 0.2, 0.3]` | 0.302 |

Note that the result of list(p.imap_unordered(...)) is not in the same order as the input: you can act on finished calls as they arrive (e.g. for logging).

SiqingYu · January 13, 2021, 5:02pm

@EpicWink has illustrated this well. I also want to put the simple concepts into context:

Pool.apply and Pool.map are blocking, meaning that when you call them, you have to wait until the work has finished.
Pool.apply_async and Pool.map_async are asynchronous. Instead of waiting for the results, they immediately return a placeholder object (AsyncResult). You can call ready() on it at any time to check whether the work has finished, and get() to wait for and return the result. successful() is only meaningful once the work has finished; called earlier, it raises an exception (ValueError since Python 3.7).
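
For illustration, here is a small sketch of working with an AsyncResult. The worker `wait_then_return` and the helper `poll_until_done` are hypothetical names for this example; `ready()`, `successful()` and `get()` are the AsyncResult methods mentioned above.

```python
from multiprocessing import Pool
from time import sleep


def wait_then_return(t):
    """Hypothetical worker: sleep for t seconds, then return t."""
    sleep(t)
    return t


def poll_until_done(async_result, interval=0.05):
    """Count how many times ready() returned False before the call finished."""
    polls = 0
    while not async_result.ready():
        polls += 1
        sleep(interval)  # stand-in for useful work in the main process
    return polls


if __name__ == "__main__":
    with Pool(1) as pool:
        r = pool.apply_async(wait_then_return, (0.3,))
        print("ready right after submit:", r.ready())
        print("polled", poll_until_done(r), "times while waiting")
        # Only call successful() once ready() is True.
        print("successful:", r.successful())
        # The call has finished, so get() returns without waiting.
        print("result:", r.get())
```

Polling with ready() is one option for staying responsive. If the main process has nothing else to do, calling get() directly (optionally with a timeout) is simpler.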