In the documentation for concurrent.futures.ProcessPoolExecutor, it is mentioned that, by default, “worker processes will live as long as the pool”. I want to know the PIDs of the worker processes (in my use case, I want the main process to log the worker PIDs to stdout so that I can eventually monitor the workers using 3rd-party system monitoring tools from top to py-spy). Right now, I see two limitations:
- Worker processes are not started until they are needed. This means that they only have a PID assigned after I first call `submit` or `map`.
- I can find no way of retrieving the current PIDs of the workers. I thought about `executor.map(os.getpid, range(N_WORKERS))`, but I’d have no guarantee that the map would be bijective (e.g. the first worker can be re-used if it runs `os.getpid` fast enough).
I’m working around (1) by calling `executor._adjust_process_count()` to force starting all N_WORKERS. (I know, private API, and I also know that it does what I want in Python 3.7 but not anymore, and I haven’t researched when this changed, and I know 3.7 is EOL, etc.) Also, I’m working around (2) by looking at `executor._processes.keys()` after all workers are up (I know, private API again).
How should I go about making PIDs available, but using only supported API?
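
One possible direction, sketched here under stated assumptions rather than as a confirmed answer: hand every worker a shared `multiprocessing.Barrier` through the documented `initializer`/`initargs` parameters, then submit exactly N_WORKERS tasks that each wait on that barrier before returning `os.getpid()`. A worker blocked on the barrier cannot pick up a second task, so the tasks should land on N_WORKERS distinct processes, which also forces all of them to start. The names `start_pool`, `_init_worker`, and `_report_pid` are hypothetical helpers, and the 30-second timeout is an arbitrary choice.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor

# Per-worker globals, populated by _init_worker in each worker process.
_barrier = None
_timeout = None


def _init_worker(barrier, timeout):
    """Runs once in every worker; stores the shared barrier for later use."""
    global _barrier, _timeout
    _barrier = barrier
    _timeout = timeout


def _report_pid():
    """Wait until every worker holds one of these tasks, then return the PID."""
    # Raises threading.BrokenBarrierError if not all workers arrive in time.
    _barrier.wait(timeout=_timeout)
    return os.getpid()


def start_pool(n_workers, mp_context=None, timeout=30):
    """Hypothetical helper: create a pool, force all workers up, collect PIDs."""
    ctx = mp_context or mp.get_context()
    # The barrier must come from the same context the executor uses.
    barrier = ctx.Barrier(n_workers)
    executor = ProcessPoolExecutor(
        max_workers=n_workers,
        mp_context=ctx,
        initializer=_init_worker,
        initargs=(barrier, timeout),
    )
    try:
        futures = [executor.submit(_report_pid) for _ in range(n_workers)]
        pids = [f.result() for f in futures]
    except BaseException:
        executor.shutdown(wait=False)
        raise
    return executor, pids


if __name__ == "__main__":
    N_WORKERS = 4
    executor, pids = start_pool(N_WORKERS)
    with executor:
        print("worker PIDs:", sorted(pids))
        # ... submit the real work here ...
```

The collected PIDs are only a snapshot: if workers are ever replaced (for example when `max_tasks_per_child` is set on Python 3.11+), the list would need to be refreshed.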