is nesting processes bad? i am launching multiple tasks in different processes, but some of those tasks can use further parallelism within themselves. the easiest way (and it works) is to create another ProcessPoolExecutor inside the task and submit subtasks from it
here’s some example code that works. is this a do or a don’t? i can’t find much online about whether this is bad
note: i am doing this because it’s not possible to pass the same executor as an argument; if i pass the executor down it raises an error because it can’t be pickled (a minimal sketch of that failure is after the example below)
import os
import time
from concurrent.futures import ProcessPoolExecutor

def c():
    print('I am c():', os.getpid())
    time.sleep(1)

def b():
    print('I am b():', os.getpid())
    with ProcessPoolExecutor() as executor:
        # more parallelism is needed in this task,
        # start even more processes
        for i in range(5):
            executor.submit(c)

def a():
    with ProcessPoolExecutor() as executor:
        # start tasks for some work
        futures = [executor.submit(b) for i in range(3)]
        for future in futures:
            future.result()

if __name__ == '__main__':
    a()
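for reference, a rough sketch of the pickling problem i mean: submit() pickles its arguments before sending them to a worker, and the executor itself can’t be pickled (the exact error message may vary by python version):

import pickle
from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        # submit() would have to pickle the executor if i passed it
        # as an argument, and that pickling step is what fails
        try:
            pickle.dumps(executor)
        except Exception as exc:
            print('cannot pickle the executor:', exc)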
It’s not a huge problem if it’s working. The main thing I’d worry about is launching so much at once that you use all your memory or have far more jobs than cores. If the OS is spending a lot of time switching between tasks, it will be slower than it would be otherwise. If you run out of memory, things get really slow or crash. It can be tricky to hit the sweet spot of maximizing CPU usage without running into some other slowdown.
In general, I try to stick to a single level of multiprocessing and specify the number of processes via max_workers in ProcessPoolExecutor, so I have control over how many cores I will use at once. But that’s just because it keeps things simple.
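If you do keep the nesting, you can still cap both levels. Here’s a rough sketch (the worker counts are arbitrary and would need tuning; the point is just that the outer and inner caps multiply):

import os
from concurrent.futures import ProcessPoolExecutor

CORES = os.cpu_count() or 1
OUTER = 3                       # one process per top-level task
INNER = max(1, CORES // OUTER)  # cap inner pools so OUTER * INNER stays near CORES

def subtask(n):
    return n * n  # placeholder work

def task(items):
    # nested pool, as in your example, but with an explicit worker cap
    with ProcessPoolExecutor(max_workers=INNER) as inner:
        return list(inner.map(subtask, items))

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=OUTER) as outer:
        print(list(outer.map(task, [[1, 2], [3, 4], [5, 6]])))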
i’ll expand on my scenario: i have a function cpu_heavy(),
the other functions are wraps_cpu_heavy_a(), wraps_cpu_heavy_b() and wraps_cpu_heavy_c()
in my main() i take a list of arguments and run one of the wrapper functions for each argument in a subprocess.
however, wraps_cpu_heavy_c() takes its argument, splits it up even further into smaller chunks, and then calls cpu_heavy() multiple times. here i found that if i parallelize each call to cpu_heavy() i gain tons of speed.
i am not worried about resource exhaustion, more about rare bugs or deadlocks that could happen, but it sounds like i’m good!
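roughly, the current shape is something like this (only the c wrapper shown; the chunking and the work inside cpu_heavy() are just placeholders):

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(chunk):
    # placeholder for the actual expensive work
    return sum(x * x for x in chunk)

def split_into_chunks(arg, size=2):
    # placeholder for my real chunking logic
    return [arg[i:i + size] for i in range(0, len(arg), size)]

def wraps_cpu_heavy_c(arg):
    # inner pool: parallelize the cpu_heavy() calls for this one argument
    with ProcessPoolExecutor() as inner:
        return list(inner.map(cpu_heavy, split_into_chunks(arg)))

def main(args):
    # outer pool: one wrapper call per argument
    with ProcessPoolExecutor() as outer:
        futures = [outer.submit(wraps_cpu_heavy_c, arg) for arg in args]
        return [f.result() for f in futures]

if __name__ == '__main__':
    print(main([[1, 2, 3, 4], [5, 6, 7, 8]]))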
It sounds like you have a good handle on how many processes you’re launching, which makes it easier to make sure you stay within resource constraints.
If you’ve found that cpu_heavy is the real culprit in terms of performance, it might be possible to rewrite your code so that you only need subprocesses at that level; i.e. you don’t gain much by giving the wrapping functions their own processes, because they aren’t doing much on their own.
This might not make any difference in performance, but it could simplify your code.
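For example, something along these lines: one executor for the whole run, with the wrapper running in the main process so only cpu_heavy() calls land in subprocesses (the chunking, the worker count, and cpu_heavy() itself are placeholders for your real code):

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(chunk):
    # placeholder for the real expensive work
    return sum(x * x for x in chunk)

def split_into_chunks(arg, size=2):
    # placeholder for the real chunking logic
    return [arg[i:i + size] for i in range(0, len(arg), size)]

def wraps_cpu_heavy_c(arg, executor):
    # runs in the main process; only cpu_heavy() goes to the pool
    futures = [executor.submit(cpu_heavy, chunk) for chunk in split_into_chunks(arg)]
    return [f.result() for f in futures]

def main(args):
    # single executor, single level of subprocesses
    with ProcessPoolExecutor(max_workers=4) as executor:
        return [wraps_cpu_heavy_c(arg, executor) for arg in args]

if __name__ == '__main__':
    print(main([[1, 2, 3, 4], [5, 6, 7, 8]]))

Passing the executor around like this is fine as long as the functions that receive it run in the main process; the pickling problem only shows up when you try to send the executor into a worker. (If you want more overlap between arguments, you could submit all the chunks first and collect results afterwards.)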
yes that’s right, i will do that. i see how i can refactor it to create only one executor; it will require some work but is probably for the best. thanks a ton!