Get worker PIDs from ProcessPoolExecutor

In the documentation for concurrent.futures.ProcessPoolExecutor, it is mentioned that, by default, “worker processes will live as long as the pool”. I want to know the PIDs of the worker processes: in my use case, the main process should log the worker PIDs to stdout so that I can monitor the workers with third-party system monitoring tools, from top to py-spy. Right now, I see two limitations:

  1. Worker processes are not started until they are necessary. This means they are only assigned a PID after I first call submit or map.
  2. I can find no way of retrieving the current PIDs of the workers. I thought about executor.map(os.getpid, range(N_WORKERS)), but I’d have no guarantee that the map would be bijective (e.g. the first worker can be re-used if it runs os.getpid fast enough).

I’m working around (1) by calling executor._adjust_process_count() to force starting all N_WORKERS. (I know, private API, and I also know that it does what I want in Python 3.7 but not anymore, and I haven’t researched when this changed, and I know 3.7 is EOL, etc.) Also, I’m working around (2) by looking at executor._processes.keys() after all workers are up (I know, private API again).

How should I go about making PIDs available, but using only supported API?

In the initializer of the pool you can call os.getpid() (tested on Windows), like:

import time
import random
import os

QUEUE = None
ABORT = None

def initializer(queue, abort):
    # Runs once in every worker process, as soon as that worker starts.
    print(os.getpid())
    global QUEUE, ABORT
    QUEUE = queue
    ABORT = abort

class Task:
    def __init__(self, number):
        self.number = number
        self.sleep = random.randint(1, 10) / 10        # delay between ticks
        self.rate = int(random.randint(10, 100) / 10)  # progress per tick

    def __call__(self):
        completed = 0
        while completed < 100:
            time.sleep(self.sleep)
            completed = completed + self.rate
            QUEUE.put((self.number, completed))  # report progress to the main process
            if ABORT.is_set():
                break
        return completed

So obvious in hindsight, thanks!

But still, I’d rather not have to wait until the whole pool has started. I still can’t think of a method that forces all workers to spawn so that I can gather all of those PIDs. I guess I’ll spawn a thread that waits on a message from each of the initializers, and when it has gathered all N PIDs it can log them all at once to stdout.

Here is a visualization of the tasks’ progress (in wxPython) :rofl: