I’m using Python 3.9.19 on Rocks 7.0 (Manzanita), and when I do not manually set max_workers
to a value less than 16, it crashes. I have 64 CPU cores. Could the issue be that it is simply using too much memory and then crashing?
While debugging I set the max_workers
parameter to 2 and it worked fine, just too slowly, so using only 2 is not practically feasible for me.
I am currently running it with 8 workers and it is running fine for now.
I tried using 16 workers, and it leads to the same issue.
The issue does not seem to be with CPython, because when I run it in the base conda environment it runs fine, at least until my breakpoint in the code. If I receive any other crashes on further testing, I will update them here.
“Default value of max_workers is changed to min(32, os.cpu_count() + 4).” (Quoted from the official documentation.)
In my case, a value of 16 or more for max_workers leads to the error message below.
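For what it's worth, that formula is the one documented for ThreadPoolExecutor; on Python 3.9, ProcessPoolExecutor (which my traceback shows is in use) defaults to os.cpu_count(). Either way, with 64 cores the default lands well above 16, so crashing at the default is consistent with crashing at 16:

```python
import os

cores = os.cpu_count() or 1  # 64 on my machine

# ThreadPoolExecutor default on Python 3.8+ (the quoted formula)
thread_default = min(32, cores + 4)

# ProcessPoolExecutor default on Python 3.9 is simply os.cpu_count()
process_default = cores

print(thread_default, process_default)  # 32 and 64 with 64 cores
```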
Traceback (most recent call last):
File "/data1/xyz/Displace2024_baseline/speaker_diarization/SHARC_check/wespeaker/diar/spectral_clusterer.py", line 264, in main
for (subsegs, labels) in zip(subsegs_list,
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
for element in iterable:
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
yield fs.pop().result()
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 446, in result
return self.__get_result()
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Above is the error message I receive when I leave the max_workers
parameter at its default or set it to 16 or more while using my conda environment. If needed, I can list the libraries installed in my environment.
When I logged my CPU usage, I got the following message:
Edit: Even in the base environment it leads to the following issue, and the CPU usage in my log does not even cross 55%, so I really do not think it is due to CPU overloading.
2024-06-10 10:18:34,426 ERROR:Error occurred: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
File "/data1/xyz/Displace2024_baseline/speaker_diarization/SHARC_check/wespeaker/diar/spectral_clusterer.py", line 287, in main
for (subsegs, labels) in zip(subsegs_list,
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
for element in iterable:
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
yield fs.pop().result()
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 446, in result
return self.__get_result()
File "/data1/xyz/.conda/envs/wespeaker/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
2024-06-10 10:18:35,938 INFO:CPU usage: 50.2%
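For reference, the "CPU usage" line in that log is produced roughly like this (a sketch; my actual sampler is not shown above, so this stand-in approximates utilisation from the stdlib 1-minute load average):

```python
import logging
import os

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s:%(message)s")

# Rough system-wide load as a percentage of available cores.
# (A hypothetical stand-in for the real per-interval CPU sampler.)
cores = os.cpu_count() or 1
load_pct = os.getloadavg()[0] / cores * 100
logging.info("CPU usage: %.1f%%", load_pct)
```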