Switching default multiprocessing context to "spawn" on POSIX as well

Hi, I wanted to have a more public discussion of “multiprocessing's default start method of fork()-without-exec() is broken” (python/cpython issue #84559 on GitHub), since it’s been open for a while without a final decision.

The problem

Right now, on Windows and macOS the default multiprocessing context is “spawn”. On Linux and other POSIX platforms, it’s “fork”.
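For concreteness, you can see which default your platform gives you with a quick probe (the exact result depends on platform and Python version):

```python
import multiprocessing as mp

# Historically "fork" on Linux and other POSIX platforms,
# "spawn" on Windows and macOS.
print(mp.get_start_method())
```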

The problem:

  1. fork() without execve() is fundamentally broken when threads are in use (see below).
  2. This is an implementation detail many users aren’t aware of.
  3. Many libraries use threads under the hood. Any time you import NumPy, for example, there’s a thread pool in the background. So even in the unlikely event people notice this warning in the documentation, they might still have deadlocks and not know why. The most recent examples are PyTorch and gRPC; see the comments in the issue for links.

The result then is users who have unexplained, mysterious deadlocks, possibly due to third party code they didn’t even write themselves. Things that break at a distance are no fun.

Why is fork() without execve() broken?

When fork() happens, all threads from the parent no longer exist in the child. This means that:

  1. Any locks that don’t specifically handle this situation may now be locked, leading to deadlocks.
  2. The data protected by such a lock may only be partially updated, so it may no longer be semantically valid even if the lock is manually released post-fork(). Conceivably it could just be complete garbage.

So if you have C libraries that start thread pools you might for example end up with a locked, corrupted static work queue. When the subprocess tries to start things up again, it won’t go well.
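A minimal POSIX-only sketch of that failure mode: a background thread holds a lock at the moment of fork(), and the child inherits the lock in its locked state with no thread left to release it. Here `worker` is a stand-in for something like a C library’s thread-pool internals:

```python
import os
import threading
import time

lock = threading.Lock()

def worker():
    # Stand-in for a library thread that holds a lock while working.
    with lock:
        time.sleep(2)

threading.Thread(target=worker, daemon=True).start()
time.sleep(0.1)  # give the thread time to acquire the lock

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # The worker thread does not exist in the child, so nothing will
    # ever release this lock; a plain lock.acquire() would hang forever.
    os.write(w, b"1" if lock.locked() else b"0")
    os._exit(0)
os.waitpid(pid, 0)
inherited_locked = os.read(r, 1) == b"1"  # locked, but owned by no one
```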

A solution

Switching to “spawn” as the default method would fix this.

The cost, of course, is that this will result in some backwards incompatibility:

  1. There is some performance impact in some situations.
  2. “spawn” has some requirements around the if __name__ == '__main__' guard if you’re using a single script to run everything; “fork” doesn’t have this requirement.

However, the error in this case is fairly straightforward:

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

So the experience would be “upgrade to Python 3.14, get an error message, grumble, follow the instructions and fix the code, move on”.

In contrast, the current failure mode is your Python process deadlocking at random, with no explanation. This can be impossible to debug for some people.

So both in the current situation and in the proposed change some people have problems, but the kind of problem will be much less significant and much easier to fix. And for those people who really want “fork”, it will still be available.

(As a side benefit, in the current situation code written on Linux might fail with that RuntimeError on macOS/Windows; that will no longer be the case.)

What do you all think?


ObBlog: How fork without exec is broken on macOS.

It would have to be 3.14, with a DeprecationWarning about the default being used in 3.12/3.13. I agree that mixing fork and threads is a real and all-too-common problem.

In addition, what about adding a warning to multiprocessing.popen_fork’s use of fork() – or even just os.fork() – when Python knows there are other threads? This wouldn’t catch all cases of code mixing fork with threads, but I imagine it could catch quite a few of them.
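A best-effort version of that check could look something like the sketch below. `warn_if_threads_before_fork` is a hypothetical helper name, and `threading.active_count()` only sees threads the threading module knows about, which is exactly the partial-signal caveat:

```python
import threading
import warnings

def warn_if_threads_before_fork():
    # Best effort: this only counts threads registered with the
    # threading module; threads started directly by C extensions
    # are invisible to it.
    if threading.active_count() > 1:
        warnings.warn(
            "fork() while other threads are running may deadlock the child",
            RuntimeWarning,
            stacklevel=2,
        )
```

A real implementation would presumably live in C next to os.fork() and try harder to count native threads, at the cost discussed below.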


I’m fine with us shifting the default to ‘spawn’ for the POSIX platforms, using a standard deprecation cycle with a warning shown when the default start method is used rather than one explicitly set. Various issues exist about this kind of thing. Let’s use “multiprocessing's default start method of fork()-without-exec() is broken” (python/cpython issue #84559 on GitHub) to track that.

As for warning on fork when we know of threads… I’ve thought about that many times. I always got hung up on the worry that a partial signal that only warns in some cases could give a misleading sense of security, with people assuming no warning means “all good”. It turns out that determining if there are any other threads running in a process is non-trivial on many platforms. It can be done some of the time, but not all of the time. If we’re happy with a best-effort attempt to quickly determine if threads exist as a warning trigger, we could implement it (cost: slowing down os.fork() - something that was once simple).

We all know that the absence of errors or warnings should never be seen as proof that code is correct. I would be more worried about the case where a warning is printed when there is nothing wrong with the program.

In the case of threads, as long as the warning doesn’t trigger when there are no other threads, it seems that the warning will be useful even if it cannot catch all cases where threads are created by 3rd party extensions/libraries.

I want to add an extra point of incompatibility:

  1. “spawn” requires more user code to be pickleable than “fork”. When switching to “spawn” I often see errors about how some closures are not pickleable. When this happens users don’t get a clear error message AFAIK.
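To illustrate (a sketch; `double` and the lambda are placeholders): module-level functions pickle by reference, while lambdas and closures fail at pickling time, which is the error users run into when a target or argument isn’t picklable:

```python
import pickle

# Module-level functions pickle by reference, so "spawn"/"forkserver"
# can ship them to a worker process:
def double(x):
    return x * 2

payload = pickle.dumps(double)
assert pickle.loads(payload)(21) == 42

# Lambdas and closures don't, which is the unclear-error case above;
# multiprocessing surfaces this as a pickling failure at start():
try:
    pickle.dumps(lambda x: x * 2)
    pickling_error = None
except Exception as exc:  # typically pickle.PicklingError
    pickling_error = exc
```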

+1 This has been a big source of problems for years in many scientific projects, that are very difficult to understand and fix when first running into it.


Making more things picklable by default would maybe be a good idea to reduce the burden of the incompatibility change. E.g. the pathos / dill pickle extensions (GitHub: uqfoundation/dill, “serialize all of Python”) are one common solution.

I don’t understand why you’re proposing “spawn” here instead of “forkserver”, which is similarly safe but provides much better performance.

I’ll add that while threads and fork() are a bad mix, there are ways to make the mix reasonably safe. POSIX even provides an API to help with that (see pthread_atfork).
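Python exposes the same idea as os.register_at_fork(). A minimal POSIX-only sketch, where `app_lock` stands in for state a library wants to keep consistent across fork():

```python
import os
import threading

# A lock that library code might hold from a worker thread; the at-fork
# handlers ensure no one holds it across fork(), so the child's copy is
# usable instead of permanently locked.
app_lock = threading.Lock()

os.register_at_fork(
    before=app_lock.acquire,           # parent: quiesce just before fork()
    after_in_parent=app_lock.release,  # parent: resume afterwards
    after_in_child=app_lock.release,   # child: start with the lock free
)

pid = os.fork()
if pid == 0:
    # Safe: the after_in_child handler released the child's copy.
    with app_lock:
        pass
    os._exit(0)
child_status = os.waitpid(pid, 0)[1]
```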

  1. “forkserver” is constrained enough that it’s probably fine, and it would mitigate some of the performance impact, so it might well be better than “spawn”, yes.

  2. Theoretically there are ways to make threads + fork() safe, yes, but random Python users aren’t going to be using pthread_atfork (nor will they be able to use it to fix other libraries, usually), and many library authors aren’t using it for their native code.

Even CPython isn’t making fork() safe when interacting with threads. Consider this program that deadlocks pretty reliably for me on Linux:

import threading
import sys
from multiprocessing import Process

# A background thread that repeatedly takes CPython's internal I/O
# locks while writing to stderr:
def thread1():
    for i in range(1000):
        print("hello", file=sys.stderr)

threading.Thread(target=thread1).start()

def foo():
    pass

# With the default "fork" start method on Linux, the child can inherit
# one of those locks in the locked state (the owning thread doesn't
# exist in the child) and hang during its own startup or teardown:
Process(target=foo).start()

So from a Python user’s perspective, that theoretical possibility isn’t very helpful.

Agreed! I was just mentioning this for the sake of completeness.

Consistent behavior across all platforms seems nice. I’d also be okay with forkserver.

Consistency is nice, but it’s always felt like Windows was the more restrictive option - “can’t use multiprocessing on Windows unless you follow these rules” - and now it’s going to be a hard-and-fast rule for all multiprocessing. That introduces a barrier to anyone who’s trying to compare or migrate threaded code; currently, you could pretty much switch things out one-for-one and get equivalent behaviour, and thus compare their differences. Now, there’s another thing that absolutely has to be done.

It may be a necessary evil, but IMO it’s an evil nonetheless.

The visible behavior for “forkserver” as well as “spawn” is that process targets and arguments need to be picklable (unlike “fork”). So in that sense they’re similar. “forkserver” should mostly be a transparent performance improvement over “spawn”.


There are some potential edge cases where "forkserver" can result in threads being started before fork():

  1. Python has an API, ForkServer.set_forkserver_preload(), to add modules to import in the parent process that gets fork()ed for workers; I’ve seen people recommend this specifically for large scientific computing/numeric libraries. It’s not in the Python documentation, though; perhaps it could just be deprecated too.
  2. __main__ is always preloaded by default, for some definition of __main__. I am not certain whether or not this is a problem; it’s a bunch of code and I don’t really understand what it’s doing at first glance (there’s even a one-off hack to handle an ipython bug from 2013!).
  3. Code that runs via .pth files; I do this for the Python profiler I’m working on, I believe manhole does this. This is fairly rare I suspect. My profiler does make an effort to handle fork() without execve().

set_forkserver_preload() should just be documented. Regardless, anyone calling it should be aware of what cannot be done before forking; we can cover that in the docs and generate a relevant warning if we detect threads started after the preload.

Otherwise for most purposes, forkserver and spawn behave the same by default with the same restrictions.
