ProcessPoolExecutor() with *max_workers* set to default crashes on Rocks 7.0 (Manzanita)

I’m using Python 3.9.19 on Rocks 7.0 (Manzanita), and when I do not manually set max_workers to a value less than 16, it crashes. I have 64 CPU cores. Could the issue be that it is simply using too much memory and then crashing?

While debugging I set the max_workers parameter to 2 and it worked fine, just too slowly, so using only 2 workers is not practical for me.

I am currently running it with 8 workers and it is working fine for now.
I tried using 16 workers, and that leads to the same issue.
The issue does not seem to be with CPython itself, because when I run it in the base conda environment it runs fine, at least until my breakpoint in the code. If I hit any other crashes on further testing, I will report them here.

“Default value of max_workers is changed to min(32, os.cpu_count() + 4).”

(Quoted from the official documentation)

In my case, a value of 16 or more for max_workers leads to the error message below.

Traceback (most recent call last):
  File "/data1/xyz/Displace2024_baseline/speaker_diarization/SHARC_check/wespeaker/diar/spectral_clusterer.py", line 264, in main
    for (subsegs, labels) in zip(subsegs_list,
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Above is the error message I receive when I leave max_workers at its default, or set it to 16 or more, while using my conda environment. If needed, I can list the libraries installed in the environment.
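For context, the call pattern is roughly this. It is a simplified sketch, not the exact code from spectral_clusterer.py; cluster_one here is just a placeholder for the real per-recording work:

```python
from concurrent.futures import ProcessPoolExecutor

def cluster_one(subsegs):
    # placeholder for the real per-recording clustering work
    return ["spk1"] * len(subsegs)

def main(subsegs_list):
    # Crashes when max_workers is left at its default (or set to 16+),
    # works when it is set to a small value such as 2 or 8.
    with ProcessPoolExecutor(max_workers=8) as executor:
        for (subsegs, labels) in zip(subsegs_list,
                                     executor.map(cluster_one, subsegs_list)):
            print(len(subsegs), labels)

if __name__ == "__main__":
    main([[1, 2, 3], [4, 5]])
```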

When I logged my CPU usage, I got the following output:

Edit: Even in the base environment it leads to the same issue. The CPU usage in my log does not even cross 55%, so I really do not think it's due to CPU overloading.

2024-06-10 10:18:34,426 ERROR:Error occurred: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
  File "/data1/xyz/Displace2024_baseline/speaker_diarization/SHARC_check/wespeaker/diar/spectral_clusterer.py", line 287, in main
    for (subsegs, labels) in zip(subsegs_list,
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
2024-06-10 10:18:35,938 INFO:CPU usage: 50.2%

ProcessPoolExecutor might be using the spawn start method; that can require a lot of memory (roughly the amount used by one process, M, times the number of workers, N).
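If you want to rule that out, you can check which start method is in use and, on Linux, explicitly request fork, which shares memory copy-on-write instead of re-importing everything in each worker. A minimal sketch (some_function is just a stand-in for your real work):

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def some_function(x):
    return x * 2  # stand-in for the real per-item work

if __name__ == "__main__":
    # "fork" is the default on Linux; "spawn" starts each worker from a
    # fresh interpreter, so every worker re-imports your modules and data.
    print("start method:", mp.get_start_method())

    # The start method for the pool can be chosen explicitly via mp_context.
    ctx = mp.get_context("fork")
    with ProcessPoolExecutor(max_workers=16, mp_context=ctx) as executor:
        print(list(executor.map(some_function, range(8))))
```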

Do you also hit issues using the executor with a simple hello world, or only when using a big array?
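Something like this, with nothing heavy imported, would show whether the pool itself is broken on that machine (just a throwaway test script):

```python
from concurrent.futures import ProcessPoolExecutor

def hello(i):
    return f"hello from task {i}"

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=16) as executor:
        for line in executor.map(hello, range(64)):
            print(line)
```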

From your image, it looks like gigabytes are needed. Are you running out of RAM?
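You could log memory alongside the CPU numbers, for example with psutil (assuming it is available in your environment):

```python
import psutil

mem = psutil.virtual_memory()
print(f"RAM used: {mem.percent}% "
      f"({(mem.total - mem.available) / 1e9:.1f} of {mem.total / 1e9:.1f} GB)")
```

If a worker is being killed by the kernel's OOM killer, dmesg usually shows a line about it as well.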

I’ll try both of those things and let you know. I have not tried using the executor for a simple hello world yet, but I’ll try it out.

Have you tested this with any model other than multiprocessing? numpy is multithreading-friendly.

I do not understand exactly what you mean by “Have you tested this with any model other than multiprocessing?” Could you clarify?

The multiprocessing module is only one way to spread work over multiple CPU cores, and it’s the most isolated - which also means the most memory-hungry. Alternatives include the threading module (or ThreadPoolExecutor), and simply letting numpy itself handle the threads (write your own code as single-threaded). It’s entirely possible that one or both of these will do what you want; it’s also entirely possible that they won’t, but IMO it’s still worth testing.
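As a rough sketch of the first alternative, ThreadPoolExecutor has the same interface, so the swap is mechanical; cluster_one and subsegs_list below are placeholders for your own function and data, and this only helps if the heavy work releases the GIL (as most numpy operations do):

```python
from concurrent.futures import ThreadPoolExecutor

def cluster_one(subsegs):
    return subsegs  # placeholder for the real clustering work

subsegs_list = [[1, 2], [3, 4], [5, 6]]  # placeholder data

# Threads share memory, so there is no per-worker copy of the data
# and no re-import of libraries in child processes.
with ThreadPoolExecutor(max_workers=16) as executor:
    results = list(executor.map(cluster_one, subsegs_list))
    print(results)
```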


I am currently using ThreadPoolExecutor; I was thinking of trying out the multiprocessing module at a later time, because of the issue mentioned in my post, which I cannot seem to solve.

I am not running out of RAM. I have logged my RAM usage and it stays around 2.5 GB, and the system I am working on has hundreds of GB. I have also noticed the issue is intermittent: the program sometimes crashes with max_workers set to 8, but does not during another run with exactly the same parameters.