concurrent.futures vs multiprocessing vs subprocess

acampove · March 5, 2024, 11:26am

Dear Experts,

I am having problems with memory usage and I would like to make sure that my program, which is meant to run several programs, releases memory as soon as possible. For this I am using something like:

import concurrent.futures as cf

with cf.ProcessPoolExecutor(max_workers=1) as executor:
    executor.submit(fun).result()  # wait for fun and re-raise any exception it raised

so that all the memory used by fun is released when the worker process exits and is available to the next function. Is this the best approach, or would multiprocessing or subprocess be a better way?
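For comparison, here is a minimal sketch of the subprocess route. It assumes each step can run as standalone Python. In practice that would more likely be a script path than the inline -c strings used here for illustration. run_isolated and the jobs list are hypothetical names. Each job runs in a fresh interpreter, so the operating system reclaims all of its memory when it exits, and the child never starts with a copy of the parent's heap.

```python
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run a snippet of Python in a fresh interpreter and return its stdout.

    The child is a separate process, so all memory it allocated is
    returned to the operating system as soon as it exits.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
        timeout=timeout,
    )
    return result.stdout


# Hypothetical jobs; real ones would more likely be [sys.executable, "step.py"].
jobs = [
    "data = bytearray(10 * 1024 * 1024); print(len(data))",
    "print(sum(range(10)))",
]

for job in jobs:
    print(run_isolated(job).strip())
```

The trade-off is that results must travel through files, pipes or stdout rather than as Python objects. The with cf.ProcessPoolExecutor(max_workers=1) block in the question already gets the same release-on-exit behaviour. Leaving the block shuts the pool down and waits for the worker to finish. On Python 3.11+, max_tasks_per_child=1 would let a single long-lived pool replace its worker after every task.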

Cheers.

alicederyn · March 5, 2024, 11:48am

I would recommend using a memory profiler like Memray to see why your peak memory usage is so high before resorting to separate processes.
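Memray is the tool suggested here. If its flame graphs are hard to read, the standard library's tracemalloc module is a simpler and less complete alternative. It prints plain-text allocation sites and a peak figure. The sketch below assumes the code of interest can be wrapped in a single call. report_peak and build_list are hypothetical names. tracemalloc only sees allocations made through Python's memory allocators, so memory obtained directly by native libraries may not show up. Memray also intercepts allocations made by native code.

```python
import tracemalloc


def report_peak(func, *args, top=3, **kwargs):
    """Call func under tracemalloc; print top allocation sites, return (result, peak)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Largest allocation sites, grouped by source line.
    for stat in snapshot.statistics("lineno")[:top]:
        print(stat)
    return result, peak


def build_list(n):
    # Hypothetical stand-in for the real workload.
    return [str(i) for i in range(n)]


result, peak = report_peak(build_list, 100_000)
print(f"peak traced memory: {peak / 1024**2:.1f} MiB")
```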

acampove · March 5, 2024, 1:32pm

Dear Alice,

Thank you for your reply. The thing is that I have tried Memray, but I had a hard time interpreting the flame graphs and tracing them back to the root of the problem. I just ran Memray again, and from what I see:

I never seem to go beyond 116 MB of memory usage, which is strange, given that when I run it on our computing cluster the usage exceeds 8 GB.
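One way to narrow down a gap like this is to have the job print its own peak resident memory and compare that with what the cluster reports. Here is a minimal sketch using the standard resource module (POSIX only). peak_rss_mib is a hypothetical helper name. If the peak resident set size stays small while the cluster reports gigabytes, the cluster is probably counting something else, such as virtual memory.

```python
import resource
import sys


def peak_rss_mib(who=resource.RUSAGE_SELF):
    """Peak resident set size in MiB, for this process or its reaped children."""
    peak = resource.getrusage(who).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / 1024**2
    return peak / 1024


print(f"peak RSS (this process): {peak_rss_mib():.1f} MiB")
# For RUSAGE_CHILDREN, Linux reports the largest terminated child, not a sum.
print(f"peak RSS (largest child): {peak_rss_mib(resource.RUSAGE_CHILDREN):.1f} MiB")
```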

Cheers.

effigies · March 5, 2024, 1:48pm

Computing clusters often have strict overcommit policies when multiple jobs are permitted to run on a single physical compute node. The OS generally can’t allow you to overcommit and still constrain your processes to stay within a memory limit.

If you’re using numpy, then the virtual memory usage from import numpy alone scales with the OMP_NUM_THREADS variable, which will default to the number of available cores if you don’t set it. Most of these pages never get loaded, so it’s more an accounting quirk than an actual problem on a development system, but it’s pretty rough on systems that don’t allow overcommitting.
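To see the virtual vs. resident distinction on a Linux machine, compare a process's virtual size (VmSize, VmPeak) with its resident size (VmRSS, VmHWM) in /proc/self/status. Here is a minimal sketch. The helper names are hypothetical. The field names are the standard Linux ones, with values in kB. If the explanation above applies, running this after import numpy with and without OMP_NUM_THREADS=1 should show the virtual size changing while the resident size stays roughly flat.

```python
import sys

MEMORY_FIELDS = ("VmPeak", "VmSize", "VmHWM", "VmRSS")


def parse_proc_status(text):
    """Extract memory fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in MEMORY_FIELDS:
            fields[key] = int(value.split()[0])  # values look like "   123456 kB"
    return fields


def read_own_memory():
    """Linux only: virtual and resident memory figures of this process."""
    with open("/proc/self/status") as f:
        return parse_proc_status(f.read())


if sys.platform.startswith("linux"):
    # VmSize/VmPeak: virtual address space; VmRSS/VmHWM: pages actually resident.
    for key, kib in read_own_memory().items():
        print(f"{key}: {kib / 1024:.1f} MiB")
```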

acampove · March 5, 2024, 1:57pm

Dear Chris,

Thanks for your reply. I am not an expert on this issue, but from what you said and from:

I can conclude that I should do:

export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1

on the cluster machine that will run my code, before starting the data processing. I will try that and see what happens.
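If exporting in the job script is awkward, you can set the same variables from Python itself. They must be set before numpy or any other BLAS/OpenMP-backed library is imported, because these libraries generally read them once when they load. Here is a minimal sketch. limit_native_threads is a hypothetical helper. Child processes started afterwards, for example via subprocess or multiprocessing, generally inherit the modified environment.

```python
import os

THREAD_VARS = ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS")


def limit_native_threads(n=1):
    """Cap native thread pools; must run before numpy and friends are imported."""
    for var in THREAD_VARS:
        os.environ[var] = str(n)  # os.environ updates the real process environment


limit_native_threads(1)
# Only import numpy (or scipy, numexpr, ...) after this point, e.g.:
# import numpy as np
```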

Cheers.

acampove · March 12, 2024, 4:36pm

Dear Chris,

Those variables do not seem to make any difference. I see:

and with htop I see:

for pretty much as long as the process runs. Does anyone have any other ideas?
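One more diagnostic idea assumes the cluster caps virtual memory rather than resident memory. That fits the overcommit explanation earlier in the thread but is not confirmed here. You could reproduce such a cap locally with RLIMIT_AS, so the job fails on your own machine at the point where it would exceed the limit. Here is a Linux-oriented sketch. run_with_address_space_limit and the 512 MiB figure are hypothetical choices, and macOS does not reliably enforce this limit.

```python
import resource
import subprocess
import sys


def run_with_address_space_limit(cmd, limit_bytes):
    """Run cmd with its virtual address space capped at limit_bytes (POSIX)."""

    def set_limit():
        # Runs in the child just before exec; only the soft limit is lowered.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    return subprocess.run(cmd, preexec_fn=set_limit, capture_output=True, text=True)


# Hypothetical check: allocating 1 GiB under a 512 MiB cap should fail.
proc = run_with_address_space_limit(
    [sys.executable, "-c", "x = bytearray(1024**3)"], 512 * 1024**2
)
print(proc.returncode, proc.stderr.strip().splitlines()[-1:])
```

Suppose the real job dies with MemoryError under a cap matching the cluster's limit, while Memray reports only around 116 MB. That would point to virtual-memory accounting rather than real usage.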

Cheers.