Concerns regarding deprecation of "fork()" with alive threads

gpshead · September 13, 2023, 11:36pm

I understand the pain, but I don’t think it is possible to surface such a warning at the ideal place all of the time. This warning serves as an indication of the fundamental low-level problem. Why it fired, which code deserves “blame” for choosing something that happens to use os.fork, and therefore which workarounds or solutions are needed, is rarely simple or clear-cut.

I don’t see that high bar, requiring the warning to show up in exactly the right place before it can be enabled, ever being possible to meet.

The list of things in the stdlib that use os.fork() is very small:

os.fork() - blamed directly on use if threaded.
os.forkpty() - blamed directly on use if threaded.
socketserver.ForkingMixIn - socketserver.py ForkingMixIn.process_request’s os.fork call can be blamed. TODO: We can update the documentation to mention or link to the fork+threading problem.
http.server.CGIHTTPRequestHandler - http/server.py CGIHTTPRequestHandler.run_cgi called from its do_POST method can be blamed. But as far as I can tell, nobody uses this ancient thing so I’m proposing we just deprecate CGIHTTPRequestHandler like we already did with the cgi and cgitb modules.
multiprocessing and concurrent.futures.ProcessPoolExecutor - when using the multiprocessing “fork” start method (the default start method on POSIX platforms other than macOS until 3.14).
pty.fork() - The pty.py pty.fork() function’s os.forkpty or os.fork call can be blamed. Those seem obvious enough already. TODO: We can update documentation to mention this.
Various places in the CPython Lib/test/ test suite do use os.fork(), but these are outside of the stdlib: they are internal to us, and we can deal with any that actually trigger a warning as we see fit. I doubt most of them are a problem. Remediating those that remain in our internal test code is still a TODO.
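For the first entries in that list, the warning lands on the line that calls os.fork() itself. A minimal sketch of that situation, assuming a POSIX platform and Python 3.12 or later (earlier versions fork without warning); the helper name fork_with_live_thread is made up for illustration:

```python
import os
import threading
import warnings


def fork_with_live_thread():
    """Call os.fork() while a second thread is alive; return recorded warnings."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)  # idle thread, just to exist
    worker.start()
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            pid = os.fork()
            if pid == 0:
                # Child: leave immediately, skipping normal interpreter cleanup.
                os._exit(0)
            os.waitpid(pid, 0)  # parent: reap the child
    finally:
        stop.set()
        worker.join()
    return caught


if __name__ == "__main__":
    for w in fork_with_live_thread():
        print(f"{w.filename}:{w.lineno}: {w.category.__name__}: {w.message}")
```

The reported location is the os.fork() line, which is correct for direct use but is exactly what becomes unhelpful once the call is buried inside a library.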

In general, adding a wrapper to catch and re-warn at a different stacklevel would be boilerplate that slows everything down for the normal, non-warning case. For example, pty.py could take that approach, but it isn’t code I’d want.
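To make the cost concrete, here is a rough sketch of what such a catch-and-re-warn wrapper might look like. It is not code from pty.py; low_level_fork_like and public_wrapper are hypothetical stand-ins for os.fork() and a stdlib function that calls it:

```python
import warnings


def low_level_fork_like():
    # Hypothetical stand-in for os.fork(): the warning blames the direct caller.
    warnings.warn("fork-like call with threads alive", DeprecationWarning,
                  stacklevel=2)


def public_wrapper():
    # Record whatever the low-level call warns about...
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        low_level_fork_like()
    # ...then re-issue each warning so that it blames *our* caller instead.
    for w in caught:
        warnings.warn(w.message, w.category, stacklevel=2)
```

Every call pays for the catch_warnings setup and teardown even when nothing warns, and catch_warnings is documented as not thread-safe because it mutates global warnings state, which is awkward for a warning that only fires in threaded programs.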

3.12 supports something slightly better than “stacklevel=” in our warnings framework: the new warnings.warn(skip_file_prefixes=) feature. That lets code attempt to give a warning a more logical code location to show up at. It is only usable via a Python-level API call, not from any of the PyErr_Warn* C APIs. Making use of it from C code is thus pretty messy, though it might be usable to give a stdlib API surface some control over where a warning appears (e.g. blaming the user of pty.fork instead of pty.py’s call to os.fork might be feasible that way).
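As an illustration of skip_file_prefixes, the sketch below writes a throwaway two-module package to a temporary directory and checks where the warning is attributed. The package name demo_forkpkg and its functions are invented for the demo; the stacklevel=3 branch is only a fallback so the sketch also runs before 3.12, and it shows the hard-coded depth that skip_file_prefixes avoids:

```python
import importlib
import os
import sys
import tempfile
import textwrap
import warnings

# Hypothetical "library" module that issues the warning.
LOWER = textwrap.dedent('''
    import os
    import sys
    import warnings

    # Skip every frame whose source file lives inside this package.
    _SKIP = (os.path.dirname(os.path.abspath(__file__)),)

    def fork_like():
        if sys.version_info >= (3, 12):
            warnings.warn("fork-like call made", DeprecationWarning,
                          skip_file_prefixes=_SKIP)
        else:
            # Fallback: hard-code the call depth instead (fragile).
            warnings.warn("fork-like call made", DeprecationWarning,
                          stacklevel=3)
''')

# Hypothetical public API in the same package that calls the low-level code.
UPPER = textwrap.dedent('''
    from . import lower

    def spawn_helper():
        return lower.fork_like()
''')


def run_demo():
    """Return the recorded warning raised via demo_forkpkg.upper.spawn_helper()."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_forkpkg")
        os.makedirs(pkg)
        for name, src in (("__init__.py", ""), ("lower.py", LOWER),
                          ("upper.py", UPPER)):
            with open(os.path.join(pkg, name), "w") as f:
                f.write(src)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            upper = importlib.import_module("demo_forkpkg.upper")
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                upper.spawn_helper()  # the warning should point at this line
        finally:
            sys.path.remove(root)
            for mod in [m for m in sys.modules if m.startswith("demo_forkpkg")]:
                del sys.modules[mod]
    return caught[0]


if __name__ == "__main__":
    w = run_demo()
    print(f"blamed: {w.filename}:{w.lineno}")
```

The warning is attributed to the first frame outside the package directory, however many package-internal frames sit between the user's call and the warn call.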

I do not expect a skip_file_prefixes= warn call could meaningfully handle the socketserver.ForkingMixIn use case. There, what you really want is to store the location outside the standard library where a class derived from ForkingMixIn was defined or instantiated, and then blame that location when ForkingMixIn.process_request, much later, calls os.fork() and triggers the warning because threads exist. By then that code is long gone from the current call stack. Nothing can warn about some other code at a distance like that today (and it’d be a pretty esoteric feature request…).

There’s also this relevant issue: “Support full stack trace extraction in warnings” (python/cpython issue #87693). Warnings being focused on a single stack frame often misses the point and hides their utility. Why are we so single-frame focused? What people often need to see is a stack trace so they can follow it, like a traceback, and identify the actual entry point where a change might make sense, without having to rerun the code with some unique setup where one specific warning is selected to turn into a raised exception.
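Until something like that exists, one workaround is to replace warnings.showwarning with a hook that records the stack at the moment a warning is shown. This is only a sketch of that well-known recipe, not what the issue proposes; warnings_with_stacks is a made-up helper name:

```python
import contextlib
import traceback
import warnings


@contextlib.contextmanager
def warnings_with_stacks():
    """Collect (category, message, formatted stack) for each warning shown."""
    captured = []

    def hook(message, category, filename, lineno, file=None, line=None):
        # format_stack() walks the live call stack, so it includes the frames
        # that led to the warn() call (plus warnings internals and this hook).
        stack = "".join(traceback.format_stack())
        captured.append((category, str(message), stack))

    # catch_warnings() restores the filters and showwarning on exit.
    with warnings.catch_warnings():
        warnings.simplefilter("always")
        warnings.showwarning = hook
        yield captured


if __name__ == "__main__":
    def inner():
        warnings.warn("something fork-like happened", UserWarning)

    def outer():
        inner()

    with warnings_with_stacks() as captured:
        outer()
    for category, message, stack in captured:
        print(f"{category.__name__}: {message}\n{stack}")
```

Like the “error” filter approach, this still needs special setup in the running process, and it relies on catch_warnings, which is not thread-safe.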