Use case
Deep within the codebase, I want to load a file, split it into parts, process it in parallel and return some dicts to the parent.
Problem
All imports in the module from which I submit the tasks are re-executed in the workers, and some of them have a large memory footprint or are slow to load.
Python supports different start methods in the multiprocessing library:
fork is fast but can lead to deadlocks.
forkserver and spawn are safer, but all imports will still be re-run (once for forkserver if set_forkserver_preload is used, N times for spawn).
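For reference, a minimal sketch of the forkserver route with preloading (the module name is a placeholder, and forkserver is only available on Unix):

# Sketch: preload the heavy module once in the fork server; every worker
# forked from it inherits the already-imported module.
import multiprocessing as mp

def work(i):
    return {'task': i}

if __name__ == '__main__':
    ctx = mp.get_context('forkserver')
    ctx.set_forkserver_preload(['mypackage.slow_import'])  # placeholder name
    with ctx.Pool(3) as pool:
        print(pool.map(work, range(3)))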
This makes me think: I don’t really need any of the context from the parent process in my subprocesses. I don’t want that baggage. My subprocesses just need to import the processing function once, and then process the content each time data is received.
Is there a lightweight way of doing that? Is this what PEP 734 would have been for?
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Sounds like you need to carefully inspect your package structure, payload and data to understand why unwanted “context” is imported.
# main.py
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor, as_completed

from . import slow_import  # noqa
from .helper import hello

if __name__ == '__main__':
    mp.set_start_method('spawn')
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(hello, i) for i in range(3)]
        for future in as_completed(futures):
            print(future.result())
There’s an example in the documentation, although I think you have to modify the call to find_spec and pass in package=__package__ to use relative imports.
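Presumably that is the lazy-import recipe from the importlib docs; a sketch of it with the package=__package__ tweak (my guess at what was meant) would be:

import importlib.util
import sys

def lazy_import(name, package=None):
    # Resolve the (possibly relative) name; package=__package__ is what
    # makes relative names like '.slow_import' work.
    spec = importlib.util.find_spec(name, package=package)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    # spec.name is the resolved absolute name, even for relative imports.
    sys.modules[spec.name] = module
    loader.exec_module(module)
    return module

# e.g. slow_import = lazy_import('.slow_import', package=__package__)
# nothing heavy actually runs until an attribute of slow_import is accessed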
Also, keep in mind (apologies if I’m pointing out the obvious) that you only have to pay this cost once per worker, not for each submitted task. In your example you’re creating 3 workers and submitting 3 tasks and being subjected to 3 slow imports, but if you were to keep the pool around and submit 1000 tasks, the slow import would still only occur the first 3 times.
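Roughly, reusing the imports and names from the example above, a sketch of that:

if __name__ == '__main__':
    mp.set_start_method('spawn')
    # 1000 tasks, but still only 3 workers: slow_import runs 3 times total.
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(hello, i) for i in range(1000)]
        for future in as_completed(futures):
            print(future.result())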
I don’t think this plays well with type annotations (one will get into the if TYPE_CHECKING: world, etc.).
My point is that this feels like a gap in the Python standard library. I should not need to do convoluted things to avoid re-importing all the modules when I want to interact with a subprocess.
As @pulkin shows with this “early return” code pointer, what I think should be provided is more granular control when starting a process. But I am still wondering whether I am missing something obvious that allows one to do that.
Joking aside, why shouldn’t I be able to spawn a simple clean process without refactoring a messy app? One shouldn’t need to design for every possible requirement upfront.
I think it is a valid concern: the documentation says spawn is strict about which parent resources are inherited, while it seems to import the main module unnecessarily.
Important disclaimer: this is valid for a specific example, on my platform and my Python version.
Well, modules are the basic organizational entities in Python (namespaces etc.) and (usually) hold common functionality together (they are definitely not just a collection of code).
multiprocessing cannot know what (global) state your application needs set up for the functions you are using to work correctly, so it executes the main module[1] in the hope that this sets up all the needed state in the new process.
It is now on you, the user of multiprocessing, to minimize the work being done. This means moving unnecessary, slow imports in your main file into the if __name__ == '__main__' block; this is not something the Python stdlib can do for you.
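For the OP's example, a sketch of that restructuring (with spawn, the child imports the main module under the name __mp_main__, so the guarded block does not run there):

# main.py, restructured so only the parent pays for the slow import
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor, as_completed

from .helper import hello  # cheap import, needed by the workers

if __name__ == '__main__':
    from . import slow_import  # noqa  # not re-run in spawned workers
    mp.set_start_method('spawn')
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(hello, i) for i in range(3)]
        for future in as_completed(futures):
            print(future.result())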
IMO the non-configurable behavior of running the parent’s main module in the worker process (_fixup_main_from_name / _fixup_main_from_path in multiprocessing/spawn.py) is more or less unfriendly. If it was not the only possible option (I can’t tell), I would say it’s assuming too much.
I think we can all agree it’s impossible to automatically import only what is strictly needed.
But not importing enough can be worked around by the user; importing too much cannot.
Ok, but then it isn’t your program and the overhead is not your problem. You just need to document “this library uses multiprocessing, make sure your main is compatible with that”.
I don’t think that is the right answer; it can lead to frustration and extra work. It seems better to allow the developer to opt in to more control.
Can you give an example of how to work around the OP’s case?
Unless, of course, it requires rewriting the parent’s entry point, because not everyone gets to control the entry point. There are cases like library authors, plugin authors, framework users, etc.