Concerns regarding deprecation of "fork()" with alive threads

gpshead · September 12, 2023, 12:03am

While that is a great thing for you to do, unfortunately on POSIX doing enough isn’t actually possible. It is impossible to safely write threaded code for an application that calls fork() in Python, because the CPython runtime itself is by definition unsafe for use after fork() in a multithreaded process. No C API other than those listed as “async-signal-safe” in signal-safety(7) (Linux manual page) and the related POSIX standard(s) can be presumed safe to call in the child of a multithreaded process after fork(). The CPython runtime relies on far more than that.
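To make the mechanism concrete: fork() copies only the calling thread, so any lock that another thread held at that moment stays locked forever in the child. Below is a minimal POSIX-only sketch that uses an ordinary threading.Lock as a stand-in for the many locks held inside the runtime and C libraries; the helper name is hypothetical and this is an illustration of the hazard, not a CPython API. On 3.12+ the os.fork() call in it emits the very DeprecationWarning under discussion.

```python
import os
import threading


def child_can_acquire_inherited_lock(timeout: float = 1.0) -> bool:
    """Fork while another thread holds a lock; report whether the child can take it.

    POSIX only. Hypothetical demo helper, not a CPython API.
    """
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder() -> None:
        with lock:
            held.set()      # tell the main thread the lock is now held
            release.wait()  # keep holding it until the parent says stop

    t = threading.Thread(target=holder)
    t.start()
    held.wait()

    r, w = os.pipe()
    pid = os.fork()  # on 3.12+ this emits the DeprecationWarning discussed here
    if pid == 0:  # child: only the forking thread exists; "holder" does not
        try:
            os.close(r)
            # The lock was copied in its locked state and no thread in the
            # child can ever release it, so this can only time out.
            ok = lock.acquire(timeout=timeout)
            os.write(w, b"1" if ok else b"0")
        finally:
            os._exit(0)  # never fall back into the parent's code path

    os.close(w)
    answer = os.read(r, 1)
    os.close(r)
    os.waitpid(pid, 0)
    release.set()
    t.join()
    return answer == b"1"


if __name__ == "__main__":
    print(child_can_acquire_inherited_lock())  # False
```

A timeout is used only so the demo terminates; real code in the same situation typically calls acquire() without one and simply hangs in the child.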

“It seems to work well so far” is unfortunately what people have been living with for ages despite not accurately reflecting execution safety.

The new DeprecationWarning is intended to make developers realize that they have a potential problem to look into. Being a DeprecationWarning, it isn’t supposed to show up to end users of applications.
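Since Python 3.7 the default warning filters hide DeprecationWarning unless it is attributed to code in `__main__`, so developers may need to opt in to see it, for example with `python -W error::DeprecationWarning` or `python -X dev`. A minimal sketch for a test suite, assuming you want to collect the warnings a call emits; both function names are hypothetical, and `old_api()` merely stands in for a deprecated call such as os.fork() with threads running.

```python
import warnings
from typing import Any, Callable, List


def deprecations_emitted_by(func: Callable[..., Any], *args: Any, **kwargs: Any) -> List[str]:
    """Call func and return the text of every DeprecationWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" overrides the default filters (which hide DeprecationWarning
        # outside __main__) and disables once-per-location deduplication.
        warnings.simplefilter("always", DeprecationWarning)
        func(*args, **kwargs)
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


def old_api() -> None:
    # Hypothetical stand-in for a deprecated call.
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)


if __name__ == "__main__":
    print(deprecations_emitted_by(old_api))  # ['old_api() is deprecated']
```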

Q1: Is this showing up to end users of applications? Can you give us a reproducible example?

Q2: Would it be useful to improve the text of the DeprecationWarning coming from os.fork() to help developers who see it? Got suggestions?

Developers should respond by adjusting their use of multiprocessing or concurrent.futures to explicitly specify either the “forkserver” or “spawn” start method via a context. I understand how you feel, but the message you don’t want to hear is that you are not technically “losing compatibility with fork()”: without knowing it, you never actually had it in the first place. The warning is in no way intended to be seen as blaming you!
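A minimal sketch of that adjustment, assuming you prefer “forkserver” where the platform offers it and fall back to “spawn” otherwise; the helper name is hypothetical, while get_all_start_methods(), get_context(), the mp_context parameter of ProcessPoolExecutor and context.Pool() are standard library APIs. With these start methods the worker function must be importable (a top-level function) and the entry point must sit behind `if __name__ == "__main__":`, which is why the example maps stdlib functions.

```python
import math
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def non_fork_context():
    """Return a context whose workers are not fork()ed copies of this process."""
    available = multiprocessing.get_all_start_methods()
    # "forkserver" exists only on some POSIX platforms; "spawn" exists everywhere.
    method = "forkserver" if "forkserver" in available else "spawn"
    return multiprocessing.get_context(method)


if __name__ == "__main__":
    ctx = non_fork_context()

    # concurrent.futures: pass the context explicitly via mp_context.
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        print(list(executor.map(math.factorial, [1, 2, 3, 4])))  # [1, 2, 6, 24]

    # multiprocessing: create the pool from the context, not the module.
    with ctx.Pool(processes=2) as pool:
        print(pool.map(abs, [-1, -2, 3]))  # [1, 2, 3]
```

Alternatively, calling multiprocessing.set_start_method() once inside the main guard changes the process-wide default; an explicit context keeps the choice local and does not affect other libraries in the same program.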

The ultimate goal of the warning is to make Python developers whose applications are already in risky situations aware of that, so they can consider appropriate corrective changes, such as switching start methods ASAP if they use multiprocessing or concurrent.futures.

Q3: Would silencing this warning help the world more than it harms it? Doing so would delay developers learning that they likely need to adapt their code to not use the “fork” start method, until the default multiprocessing start method can be changed away from “fork” per the deprecation process that has already started.

Such code is already at risk and always has been. In some environments it already doesn’t work. This would be hiding a truthful warning from developers.

Unfortunately, there is no possible implementation of OS-based threading that could make a Python thread “safe” for use while something is forking. The CPython runtime internals will never be async-signal-safe for use after fork(), so any thread running Python code is fundamentally unsafe at fork time. Adding APIs to mark Python threads as “okay for fork” vs. “unknown” would therefore create an API that cannot honestly make such guarantees: the design of the Python code doesn’t matter, because the very fact that the thread uses the Python VM at all is the problem. So no matter what we all wish were true, valid use cases aren’t possible.

Q4: So what’s the resolution here? The warning could be silenced until the default flips. BUT… that would keep the hidden-problem status quo of “your code has a potentially serious problem that you never realized,” which existing code has had for ages and has only occasionally tripped over when the execution environment or system libraries changed, requiring a systems-level debugging session to try to understand the likely deadlock. What people have been relying on as “seeming to work fine” already doesn’t work on various platform configurations; I expect that to increase over time.

Silencing an accurate warning is not without precedent. I originally added a much more verbose warning for any Linux user relying on the default “fork” context; we quickly determined it was too noisy, because most code just uses the default, and rolled it back.

In terms of deciding whether to remove this os.fork() warning in the presence of threads in 3.12.0, let’s let our release manager @thomas make that call.