Get worker PIDs from ProcessPoolExecutor

In the documentation for concurrent.futures.ProcessPoolExecutor, it is mentioned that, by default, “worker processes will live as long as the pool”. I want to know the PIDs of the worker processes: in my use case, the main process should log the worker PIDs to stdout so that I can monitor the workers with third-party system monitoring tools, from top to py-spy. Right now, I see two limitations:

  1. Worker processes are not started until they are necessary. This means they are only assigned a PID after I first call submit or map.
  2. I can find no way of retrieving the current PIDs of the workers. I thought about executor.map(os.getpid, range(N_WORKERS)), but I’d have no guarantee that the map would be bijective (e.g. the first worker can be re-used if it runs os.getpid fast enough).

I’m working around (1) by calling executor._adjust_process_count() to force starting all N_WORKERS. (I know, private API, and I also know that it does what I want in Python 3.7 but not anymore, and I haven’t researched when this changed, and I know 3.7 is EOL, etc.) Also, I’m working around (2) by looking at executor._processes.keys() after all workers are up (I know, private API again).

How should I go about making PIDs available, but using only supported API?

In the initializer of the pool you can call os.getpid() (tested on Windows), like:

import time
import random
import os

QUEUE = None
ABORT = None

def initializer(queue, abort):
    # Runs once in every worker process, as soon as that worker starts.
    print(os.getpid())
    global QUEUE, ABORT
    QUEUE = queue
    ABORT = abort

class Task:
    def __init__(self, number):
        self.number = number
        self.sleep = random.randint(1, 10) / 10        # delay between ticks
        self.rate = int(random.randint(10, 100) / 10)  # progress per tick

    def __call__(self):
        completed = 0
        while completed < 100:
            time.sleep(self.sleep)
            completed = completed + self.rate
            QUEUE.put((self.number, completed))  # report progress to the main process
            if ABORT.is_set():
                break
        return completed

So obvious in hindsight, thanks!

But still, I’d rather not have to wait until the whole pool has started. I still can’t think of a method that forces all workers to spawn so that I can gather all of those PIDs. I guess I’ll spawn a thread that waits on a message from each of the initializers, and when it has gathered all N PIDs it can log them all at once to stdout.

Here is a visualization of the tasks’ progress (in wxPython) :rofl: