concurrent.futures.ProcessPoolExecutor doesn't run on Windows Server

Hello everyone

I have an issue with multiprocessing, and I would appreciate it if you could help clarify it.

Consider the following code:

import time
import concurrent.futures


def say_hello(proc_num):
    # Toy workload: sleep, print a message, sleep again.
    time.sleep(5)
    print("Hello World")
    time.sleep(5)


def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(say_hello, [1, 2, 3, 4, 5, 6, 7, 8])

        # Consume the iterator so we wait for every task to complete.
        for result in results:
            pass


# The __main__ guard is required on Windows, where worker processes
# are spawned by re-importing this module.
if __name__ == "__main__":
    main()

The above code creates 8 processes; each one waits 5 seconds, prints a "Hello World" message, and then waits another five seconds. This is just a toy program, since the real project is far too large to post here, but it reflects the idea.

This program works perfectly well and runs as expected on an ordinary PC, a laptop with:

OS: Windows 11
Memory: 16 GB RAM
Python version: 3.7 (64-bit)
import os; os.cpu_count() -> 8

So when I run the above program and open the Task Manager (via CTRL+ALT+DELETE) right after starting it, I can see the 8 created processes in the task list during execution.

Now, if I run the very same program on our datalab environment, which is a far more powerful shared environment with the following characteristics:

OS: Windows Server
Memory: 320 GB RAM
Python version: 3.7.9
import os; os.cpu_count() -> 64

then I see a very strange behaviour: instead of 8 processes, dozens of processes are created, none of them does anything, and I get the following error message:

Exception in thread QueueManagerThread:
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python37\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Program Files\Python37\lib\concurrent\futures\process.py", line 361, in _queue_management_worker
    ready = wait(readers + worker_sentinels)
  File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 869, in wait
    ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
  File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 801, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 63

After googling this error message, it seems that there is an issue with Python on Windows Server machines whose processors have a large number of (logical) cores. There is already an issue for this on GitHub: n° 71090.
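One suggestion I found while searching is to cap the number of workers explicitly, since the WaitForMultipleObjects call in the traceback can wait on at most 64 handles. The following is only a sketch of that idea applied to my toy program; I have not been able to verify it on the datalab machine itself:

import concurrent.futures
import os
import time


def say_hello(proc_num):
    time.sleep(5)
    print("Hello World")
    time.sleep(5)


def main():
    # WaitForMultipleObjects on Windows handles at most 64 objects,
    # so keep the pool comfortably below that; 61 is the cap that
    # newer Python versions apply on Windows by default.
    workers = min(os.cpu_count() or 1, 61)
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        for result in executor.map(say_hello, [1, 2, 3, 4, 5, 6, 7, 8]):
            pass


if __name__ == "__main__":
    main()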

But I'm not sure whether I understood it properly. On the issue page, someone suggested using multiprocessing.Pool instead of concurrent.futures.ProcessPoolExecutor. I gave it a try, but the result was not good: the multiprocessing.Pool version was extremely slow compared to concurrent.futures.ProcessPoolExecutor, and among the created processes, almost only one was doing any work while the others remained mostly idle. What I tried looks roughly like the sketch below.
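This is approximately the multiprocessing.Pool variant (reconstructed from memory, so the details may differ from what I actually ran):

from multiprocessing import Pool
import time


def say_hello(proc_num):
    # Same toy task as in the original example.
    time.sleep(5)
    print("Hello World")
    time.sleep(5)


def main():
    # Pool takes an explicit process count; Pool.map also accepts a
    # chunksize argument, which influences how evenly the work is
    # spread across the workers.
    with Pool(processes=8) as pool:
        pool.map(say_hello, [1, 2, 3, 4, 5, 6, 7, 8])


if __name__ == "__main__":
    main()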

That is why I preferred to ask the question here, to see whether others have already encountered the same problem and what solutions they may have found to tackle it.

Thanks in advance

Hmm, this is definitely interesting; does the problem still occur on more recent versions of Python?

It's probably not relevant, but I am curious as to why your laptop's Python reports that it's a 64-bit one while the server's doesn't. What's the value of sys.maxsize on each of those Pythons? I'd expect 9223372036854775807 on the laptop; it's probably the same on the server.
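That is, something like this on each machine:

import sys

# Expect 9223372036854775807 (2**63 - 1) on a 64-bit build
# and 2147483647 (2**31 - 1) on a 32-bit build.
print(sys.maxsize)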


I think it's 64-bit on both sides. It was me who added "(64 bits)" in my comment above, but both environments have the same version of Python: 3.7.

The problem is that we have very limited access rights in our datalab environment; in particular, we cannot even create our own Python virtual environments. I did think that maybe the problem had already been resolved in more recent versions.

Tomorrow I am going to ask the administrator whether there is any possibility of getting a newer version of Python.

Yeah, I thought it would most likely be like that.

Cool cool. If it helps with the request: Python 3.7 no longer gets binary releases, and will soon stop getting source releases too (the final source-only security update for Python 3.7 is scheduled for June 2023). The server reports its version as 3.7.9, so unless that's a modified build, it hasn't received even security updates since 2020. This may help push the cost-benefit analysis towards a newer, fully supported version of Python.

Of course, I still don’t know whether this will actually make a difference to the issue you’re seeing; but given the rapid improvements to asyncio and concurrent.futures during the last few Python versions, it wouldn’t surprise me.


I can confirm that the problem is related to older versions of Python, at least version 3.7 in my case. Today our administrator gave me access to a Conda environment with Python 3.10 in our datalab environment, and everything worked well without my modifying anything at all in the code. So for those who may have encountered the same problem and are reading this thread: I suggest you try running your code with a more recent version of Python. Thanks for the help and the discussion.
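For reference, this lines up with the change that went into Python 3.8, where ProcessPoolExecutor caps max_workers at 61 on Windows (and raises ValueError if you ask for more). A quick way to inspect the effective worker count; note that _max_workers is an internal attribute, so this is just for poking around:

import concurrent.futures

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # On Windows with Python 3.8+ this prints at most 61,
        # even on a machine with 64 logical cores.
        print(executor._max_workers)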

That’s great to know. It still might be of curiosity value to delve into this, but your main system should be working now.

Plus, woohoo, bonus new features from three Python versions! :)