Switching default multiprocessing context to "spawn" on POSIX as well

To answer myself, according to this message there is a set_forkserver_preload for fork server, which would mean that we’ll have to continue using “spawn” on macOS. That API is not documented in the multiprocessing documentation.
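For reference, a minimal sketch of how that preload hook can be called. The function name `configure_forkserver` and the choice of `json` as the preloaded module are illustrative assumptions, not something from the linked message.

```python
import multiprocessing as mp


def configure_forkserver(preload=("json",)):
    """Return a "forkserver" context with a preload list set (illustrative helper)."""
    # Raises ValueError on platforms without forkserver support (e.g. Windows).
    ctx = mp.get_context("forkserver")
    # Modules named here are imported once in the fork server process,
    # so children forked from it do not have to import them again.
    ctx.set_forkserver_preload(list(preload))
    return ctx


if __name__ == "__main__":
    print(configure_forkserver().get_start_method())
```

Setting the preload list only records the module names; it has to happen before the first process is started from that context to have any effect.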


I acknowledge that, and I disagree that “many” is a significant number at this point. I decided that the warning I originally added would do more harm than good for now, based on user experience and on past Python warnings that shipped in releases and showed up for the wrong users. Without evidence that the majority of Linux/BSD users are going to be broken, as opposed to not caring, I’m unlikely to change my mind.

Why did I decide that? Warnings that show up to the wrong people at the wrong point in time are much more harmful. When there are more wrong people and wrong times than the right people and right times, it shouldn’t be a warning. We’ve learned this lesson the hard way in the past. I really wanted that not to be the case when coding up this warning, but it already proved itself otherwise from projects testing against nightly builds.

The default start method was flipped on macOS in 3.8 without advance notice and without users showing up with pitchforks. (We didn’t want to do it that way but had no choice.)

I predict that the unpleasant surprise is likely to be far smaller than the frustration we would cause among users whose code is not broken by telling literally everybody on Linux to add a few lines of questionable logic to already working code (not necessarily owned by them), just to declare that they’d like it to keep working without seeing a warning.

The lines of code that we could ask them to add in a 3.7-3.14 compatible way (as the short-lived warning did) do not clearly match the intent we’d like to see expressed. There is no clean, understandable way to express “I don’t care, the default is fine, give me my processes” as a multiprocessing start method context. Telling people to explicitly pick a specific start method would move the world’s code backwards: most users do not care but cannot express that without forcing a potentially inferior, over-specific choice. What we want is only for the few that do care to be explicit about their actual requirement.

The fix for anyone surprised when the default changes away from "fork" is to add one line of code to explicitly request it. I see this as the least disruptive way forward. It is normal to need to make some changes on occasion when adopting a new Python release; this will be one of them for that presumed minority of code.
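As a rough illustration of that one-line fix, here is a minimal sketch. The helper name `fork_context` and the `square` worker are made up for the example; the essential line is the `get_context("fork")` call (or, globally, `multiprocessing.set_start_method("fork")` called once under the main guard).

```python
import multiprocessing as mp


def square(x):
    return x * x


def fork_context():
    """Return a context pinned to "fork" (illustrative helper)."""
    # The one line that matters: ask for "fork" explicitly instead of
    # relying on the platform default. Raises ValueError where fork is
    # unavailable (e.g. Windows).
    return mp.get_context("fork")


if __name__ == "__main__":
    ctx = fork_context()
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))
```

Using a context object keeps the choice local to the code that needs it, whereas `set_start_method` changes the default for the whole process.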

There are still user-helpful improvements that can be made to ease the default transition. We’ve still got time before 3.12 to explore these, in TODO list form:

  • 3.12 raises a DeprecationWarning from os.fork() on Linux & macOS when it is able to detect that the process is multithreaded. It is not guaranteed to be able to do so, but in common configurations it works. multiprocessing can trigger that warning internally. As is, this serves as a heads-up to people already in sketchy situations, which is one reason why we’re changing the default in the first place.
  • As noted in the default change issue, error messages in common failure modes when switching some code from "fork" to "spawn" or "forkserver" could probably be made more informative to the user about potential reasons and actions they need to take.
  • It seems reasonable to finish up exposing an API to try to find out how many threads the process has, and to use that to add a more directly informative DeprecationWarning when the presence of threads in the multiprocessing-forking process is detected, because those users already get the new os.fork() warning. This way it’d be specific text tailored to what actions can be taken in the context of multiprocessing or concurrent.futures API use, rather than blaming their internals without directed advice.
  • Consider adding the abstract ability to express what semantics are required or preferred from the start method rather than having code specify a specific start method.
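To make the last two items more concrete, here is a purely hypothetical sketch of what "express the requirement, not the start method" could look like in user code today. Nothing named `choose_context` or `needs_inherited_state` exists in multiprocessing; both are assumptions for illustration. The thread check uses `threading.active_count()`, which only sees threads created through Python's threading module, not threads started by C extensions.

```python
import multiprocessing as mp
import threading
import warnings


def choose_context(needs_inherited_state=False):
    """Hypothetical helper: pick a context from a stated requirement."""
    if not needs_inherited_state:
        # "I don't care, the default is fine": take whatever the platform uses.
        return mp.get_context()
    if "fork" not in mp.get_all_start_methods():
        raise RuntimeError("inherited state requires fork, which is unavailable here")
    if threading.active_count() > 1:
        # Only Python-level threads are visible here (assumption noted above).
        warnings.warn("forking a multithreaded process can deadlock the child",
                      RuntimeWarning, stacklevel=2)
    return mp.get_context("fork")


if __name__ == "__main__":
    print(choose_context().get_start_method())
    print(choose_context(needs_inherited_state=True).get_start_method())
```

Only callers that really depend on fork semantics name them; everyone else keeps following the default as it changes.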

I’ll probably be pasting that list into the issue or opening individual issues for some of that.

If we have a huge number of things breaking without the ability to adapt to this change during the 3.14 alpha and beta period, we can reconsider the default flip. I don’t expect that to happen. (Spanish inquisition meme goes here)

-gps


We made the decision that DeprecationWarning was invisible by default precisely to avoid such issues… I struggle to understand why you’re making this argument.

Agreed that it’s an interesting data point… but “users showing up with pitchforks” is unhelpful hyperbole. Most users don’t show up at all when they suffer a regression; that doesn’t mean they didn’t suffer it.

That is true, but users are more likely to understand which line of code they have to add if a warning message instructs them precisely about that.


Sorry to revive this old conversation, but I ran into this problem with the new “spawn” method on macOS after I upgraded to macOS 14 (Sonoma) last week. An old piece of Python multiprocessing code that has worked for ten years on Linux and macOS suddenly stopped working on macOS. It throws the error described in the original post about the need to call “freeze_support”. I still have access to a Linux machine that I can run the program on, but it is a hassle to have to move my data from my desktop Mac to the other Linux machine and then bring back the results of the program.

I had to search quite a bit in Google before I found this post that explained what had changed. Someone in the thread mentioned that the users did not complain, but in my case the change to “spawn” only hit me in the last few days.
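For readers who hit the same “freeze_support” error: it usually means that process creation runs at module import time. Under "spawn", each child re-imports the main module, so the code that starts processes has to sit behind a main guard. A minimal sketch, with made-up names `work` and `main`, not taken from the program described above:

```python
import multiprocessing as mp


def work(name):
    print(f"hello from {name}")


def main():
    # "spawn" is requested explicitly so the sketch behaves the same everywhere.
    ctx = mp.get_context("spawn")
    child = ctx.Process(target=work, args=("child",))
    child.start()
    child.join()
    return child.exitcode


if __name__ == "__main__":
    # Without this guard, every spawned child would re-run the process
    # creation while importing this module and fail with the RuntimeError
    # that mentions freeze_support().
    main()
```

The `freeze_support()` call named in the error message is only needed for frozen Windows executables; the guard itself is the actual fix. Alternatively, requesting "fork" explicitly restores the old behaviour.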