Why does multiprocessing not work properly in `__main__.py`?

The following is the first example on the “multiprocessing” page of the Python docs.

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

I have a project root directory, foo, containing:

-- __main__.py
-- file.py

I copy-pasted the above example code into each file.

Running python file.py or python -m file works and prints [1, 4, 9].

But running python foo or python -m foo doesn’t. It actually traps you in an “infinite KeyboardInterrupt loop” or something:

AttributeError: Can't get attribute 'f' on <module '__main__' (built-in)>
--- Ctrl + C ---
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-7:
Process SpawnPoolWorker-6:
Process SpawnPoolWorker-5:
Process SpawnPoolWorker-4:
<Some multi-line TraceBack>
--- Ctrl + C ---
And you are stuck forever with "Process SpawnPoolWorker-N" where N keeps increasing.

I just want to know why __main__.py is considered a special case.

In the “Using a pool of workers” section there is a note:

This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter.

I know that __file__ doesn’t exist in an interactive interpreter:

> python
# enters interpreter
> __file__
NameError: name '__file__' is not defined

but it exists when running a file with python path/to/file.py

So I wondered if __main__.py is somehow treated like the interactive interpreter, but obviously it’s not: printing __file__ there gives the path to the file, path/to/__main__.py.

Please tell me what’s going on. I mean, if the multiprocessing package treats __main__.py differently, the documentation should say so somewhere. Or maybe there is a page that explains how __main__.py itself is treated differently, in a way that affects multiprocessing, and I missed it.

I’ve read a doc page on __main__ but it doesn’t point out any edge case of __main__.py.

I’m dying to know. Please help.

When a script is run directly, __name__ is set to '__main__', whereas when that same script is imported, __name__ is set to the module name (the file name minus the extension). Therefore, if you name the script __main__, there’s no way it can tell whether it was called directly or imported.
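To make the difference concrete, here is a minimal sketch that runs the same one-line file both ways and records what it sees in __name__. The temporary file.py and the variable observed_name are just illustrative names, and runpy / importlib are used here only as rough stand-ins for "python file.py" and "import file".

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A one-line module that simply remembers the __name__ it was executed under.
SOURCE = "observed_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.py"
    path.write_text(SOURCE)

    # Roughly what "python file.py" does: execute the file as the top-level script.
    as_script = runpy.run_path(str(path), run_name="__main__")["observed_name"]

    # Roughly what "import file" does: execute the file under its module name.
    spec = importlib.util.spec_from_file_location("file", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    as_import = module.observed_name

print(as_script)  # __main__
print(as_import)  # file
```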

The multiprocessing module imports the script in the subprocesses it starts, so any initialisation that should be done only in the main script and not in the subprocesses should be protected by the if __name__ == '__main__': idiom. Naming the script __main__ will lead the subprocesses to think they are the main process and should start new subprocesses.
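The AttributeError in the question ("Can't get attribute 'f' on <module '__main__' ...>") fits how functions are passed to pool workers: they are pickled, and pickle stores a function only as a reference (module name plus function name), so the receiving side must be able to look that name up again. The sketch below shows just that pickling step, not everything multiprocessing does. The module name demo_main is a hypothetical stand-in for the worker's __main__.

```python
import pickle
import sys
import types

# Hypothetical stand-in module, registered so pickle can find it by name.
demo = types.ModuleType("demo_main")
exec("def f(x):\n    return x * x\n", demo.__dict__)
sys.modules["demo_main"] = demo

# Pickling a function records where to find it, not its code.
payload = pickle.dumps(demo.f)
print(b"demo_main" in payload)  # True

# While the module still defines f, the reference resolves and f is callable.
result = pickle.loads(payload)(3)
print(result)  # 9

# Mimic a process whose module never defined f: the lookup now fails.
del demo.f
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc
print(type(error).__name__)  # AttributeError

sys.modules.pop("demo_main")  # clean up the stand-in module
```

If this is what is happening, one thing that may be worth trying is to define f in an importable module such as file.py and import it from __main__.py, so the reference points at file.f instead of __main__.f.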

I guess this explains why I get Process SpawnPoolWorker-N, where N keeps increasing, when I try to exit, and why I can’t get out of it. The Python “multiprocessing” docs warn about this:

(If you try this it will actually output three full tracebacks interleaved in a semi-random fashion, and then you may have to stop the parent process somehow.)

It’s interesting how naming a file __main__.py has the same effect as not having the “main guard” (if __name__ == "__main__":). I guess it’s because, when a subprocess is created, __name__ in that subprocess would be "__main__" if the file is named __main__.py. For file.py, __name__ == "__main__" only when it is run as the top-level script, and when multiprocessing spawns a subprocess, __name__ is file or something?