I appreciate all your responses; they have helped me gain a better understanding of the rationale behind the deprecation. Thank you very much.
Initially, I believed that the combination of Lock and os.register_at_fork() was a safe solution, especially considering its early use by CPython to address issues related to multi-threading and forking: bpo-6721: Hold logging locks across fork().
However, it has become clear that this solution is challenging to implement correctly and can lead to subtle deadlocks (64aa6d2, 4c3da78, 99b00ef).
As a result, I concluded that combining multi-threading and fork had been deprecated because it is so error-prone.
Now, I understand from this discussion that this workaround is actually not compliant with the latest POSIX standards (see pthread_atfork(3), pthread_atfork(3p), fork(2), fork(3p) and also this defect report). The problem is summarized in the z/OS documentation here:
The intended purpose of pthread_atfork() is to provide a mechanism for maintaining the consistency of mutex locks between parent and child processes. The handlers are expected to be straightforward programs, designed simply to manage the synchronization variables and must return to ensure that all registered handlers are called. Historically, the prepare handler acquired needed mutex locks, and the parent and child handlers released them. Unfortunately, this usage is not practical on the z/OS platform and is not guaranteed to be portable in the current UNIX standards. When the parent process is multi-threaded (invoked pthread_create() at least once), the child process can only safely call async-signal-safe functions before it invokes an exec() family function. This restriction was added to the POSIX.1 standard in 1996. Because functions such as pthread_mutex_lock() and pthread_mutex_unlock() are not async-signal-safe, unpredictable results may occur if they are used in a child handler.
While combining multi-threading with a “lock protected fork” may be viable on some operating systems, it cannot be relied upon in the general case.
The warning may appear to end users in at least two scenarios:
- When they launch their application with warnings enabled (e.g., python -W default app.py).
- When they execute the test suite of my library (since I’m unit-testing the multi-threading + fork case, and pytest captures the warnings).
It’s true that in the most common case, they won’t see anything.
For example, the very last snippet in the multiprocessing documentation executed with -Wd would display a warning on Linux. It can be simplified as follows:
    import multiprocessing

    def worker(queue):
        while not queue.empty():
            item = queue.get()

    if __name__ == "__main__":
        queue = multiprocessing.Queue()
        data = [1, 2, 3]
        for val in data:
            queue.put(val)
        process = multiprocessing.Process(target=worker, args=(queue,))
        process.start()
        process.join()
Every usage of multiprocessing.Queue() with the "fork" start method now needs careful consideration. On Linux, any program that calls put() before starting a process will trigger a warning internally, because the first put() starts a background feeder thread. I suppose this applies to other parts of the standard library, such as logging.QueueHandler for example.
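As far as I can tell, the same program should not trigger the warning if it asks for the "spawn" start method explicitly. Here is a minimal sketch of that variant; the function name run_with_spawn and the None sentinel are my own additions (the sentinel replaces the queue.empty() loop, which is not a reliable way to detect the end of the data).

```python
import multiprocessing


def worker(queue):
    # Consume items until the None sentinel arrives.
    while queue.get() is not None:
        pass


def run_with_spawn(data):
    """Run the worker in a child created with the "spawn" start method."""
    ctx = multiprocessing.get_context("spawn")
    queue = ctx.Queue()
    for val in data:
        queue.put(val)
    queue.put(None)  # sentinel: tells the worker to stop
    process = ctx.Process(target=worker, args=(queue,))
    process.start()
    process.join()
    return process.exitcode


if __name__ == "__main__":
    # The guard is required with "spawn": the child re-imports this module.
    print(run_with_spawn([1, 2, 3]))
```

Using a context object rather than multiprocessing.set_start_method() keeps the choice local, which matters for a library that should not change the global default of the application using it.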
I’m not entirely sure if this idea is both useful and doable, but here’s what I’m thinking off the bat:
- Suggesting the use of the "spawn" start method instead of "fork" for multi-threaded programs.
- Including the name of one of the background threads (e.g., "QueueFeederThread"), which is especially important if it originates from a third-party library. The user may not be directly using threading, which could make it challenging for them to pinpoint the source of the warning.
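To illustrate the second point: today a user has to list the live threads by hand to find the culprit. A minimal sketch, where both helper names are hypothetical and not part of any CPython API:

```python
import multiprocessing
import threading


def other_thread_names():
    """Return the names of live threads other than the calling one."""
    current = threading.current_thread()
    return [t.name for t in threading.enumerate() if t is not current]


def names_after_put():
    """Put one item on a multiprocessing.Queue and list the live threads."""
    queue = multiprocessing.Queue()
    queue.put(1)  # the first put() starts the queue's feeder thread
    names = other_thread_names()
    queue.get(timeout=5)  # drain the item so the queue can shut down cleanly
    queue.close()
    queue.join_thread()
    return names
```

Calling names_after_put() should return a list that contains "QueueFeederThread", even though the program never touched the threading module itself.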
Additionally, it’s important to emphasize in the documentation that "fork" must not be used in a multi-threaded application. Currently, there’s only a faint warning: “Note that safely forking a multithreaded process is problematic.” We could explain why this combination isn’t a good idea and make it crystal clear that it’s likely incorrect and definitely not portable.
I guess no… :slight_smile:
Originally, my objective was to suppress the warning in what I believed to be a safe scenario, not to silence it globally. I concur that it’s crucial to promptly inform users about the risks associated with "fork" and initiate a transition to "spawn".
The fact that I’m here today serves as evidence of its effectiveness.
If we decide that combining multi-threading and forking should be universally disallowed, regardless of whether someone chooses to do so due to implicit OS support or any other reason, it logically follows that we should not offer an option to disable the warning.
The initial intent of flagging threads was to grant users and library authors the flexibility to make their own choice, along with the corresponding responsibilities and safety considerations. It wasn’t clear to me whether the combination is “not safe at all” or “not safe on some OS”. It felt to me as if we were adding quite a strict rule for a problem that in practice can be worked around most of the time.
Still, I won’t deny that offering a way to break the standard recommendations isn’t a great idea. Therefore, the solution for library authors is surely to implement the feature differently or drop support for forking.