Are all descriptions of the "caveats", listed near the end of _thread module's docs, up to date?

zuo · June 8, 2024, 8:34pm

Near the end of the documentation of the _thread module there is a list of caveats. It contains, among others, the following statements:

Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)

Is that still true? If it is, on what platforms? (Are there any platforms for which _thread is available and signal is not?)

When the main thread exits, it does not do any of its usual cleanup (except that try … finally clauses are honored), and the standard I/O files are not flushed.

Is that still true? If it is, does that behavior (missing cleanups) occur only if the program started any threads (besides the main one), or always if the current platform supports threading?

fonini · June 10, 2024, 2:07am

This one:

When the main thread exits, it is system defined whether the other threads survive. On most systems, they are killed without executing try … finally clauses or executing object destructors.

…also strikes me as odd. I thought this was the whole purpose of daemon vs non-daemon threads? To specify which threads should not be killed? From the threading docs:

A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.

I read that as: as long as there is still one non-daemon thread (which may be a non-main thread) alive, the Python process will not exit, and no thread will be killed.

zuo · June 10, 2024, 6:53pm

Daemon thread is a term related to the high-level stuff of threading, not to the low-level stuff of _thread I was referring to above.

When it comes to the threading’s Thread objects (high-level ones), their behavior on Python exit is quite clear:

Python waits until all non-daemon threads have exited on their own.
Only then are any atexit.register()-registered callbacks executed. So, if you need a non-main thread to still exist at this stage (e.g. to trigger its exit and wait for it…), it must be a daemon thread.
And then Python exits – without waiting for any daemon threads. What happens to them depends on the platform. Typically (e.g., on Linux), they just die immediately – in a brutal way (i.e., without any finally/etc. cleanups). As the docs say:

Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
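The shutdown order described here can be checked with a small experiment. The sketch below is not from the original posts. It assumes CPython and runs a hypothetical child script (the CHILD string) in a subprocess, so that the parent can inspect what the child managed to print before it exited.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: one non-daemon thread, one daemon thread,
# and one atexit callback. All names here are illustrative.
CHILD = textwrap.dedent("""
    import atexit, threading, time

    def worker():
        time.sleep(0.5)
        print("non-daemon thread done", flush=True)

    def daemon_worker():
        try:
            time.sleep(30)
        finally:
            # Expected NOT to run: daemon threads are stopped abruptly.
            print("daemon cleanup", flush=True)

    atexit.register(lambda: print("atexit callback", flush=True))
    threading.Thread(target=daemon_worker, daemon=True).start()
    threading.Thread(target=worker).start()
    print("main thread done", flush=True)
""")

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True, text=True, timeout=60,
)
lines = result.stdout.splitlines()
print(lines)
```

If the description above holds on your platform, the non-daemon thread's line should appear before the atexit callback's line, and the daemon thread's cleanup line should not appear at all.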

So you are right that:

as long as there is still one non-daemon thread (which may be a non-main thread) alive, the Python process will not exit, and no thread will be killed.

But it seems nobody who reads this forum (and cares enough to write a reply) knows the answers to my two questions about the _thread’s low-level stuff.

Rosuav · June 10, 2024, 7:05pm

Seems likely. Most of us haven’t used the low-level module.

elis.byberi · June 10, 2024, 7:06pm

It is platform-specific, platforms change, so you have to test them.

zuo · June 10, 2024, 7:32pm

The point is, the threading module, under the hood, makes use of _thread – so it seems that those statements apply to threading as well. So it is quite vital to know whether those statements are still true (and in what circumstances). Maybe they are just outdated artifacts from some early version of this module’s docs, say, from 25 years ago? If this is the case, an update would be helpful, and I’d be happy to propose a PR with a fix.

In particular, this one bothers me the most:

When the main thread exits, it does not do any of its usual cleanup (except that try … finally clauses are honored), and the standard I/O files are not flushed.

It seems really odd, as it seems to mean that for platforms with threading the main thread is never properly cleaned up (and that obviously seems not to be true!), or that spawning any thread breaks something in that cleanup machinery (which also seems unlikely). Shouldn’t this statement be either removed or updated somehow? And, if the latter, how should it be updated?

elis.byberi · June 10, 2024, 8:03pm

I may be overlooking something, but on my system, it is still true:

import _thread
import time

def thread_task():
    try:
        print("Secondary thread started.")
        time.sleep(2)  # Simulate some work
        print("Secondary thread finished.")
    finally:
        print("Secondary thread cleanup executed.")

def main():
    _thread.start_new_thread(thread_task, ())
    time.sleep(1)
    print("Main thread exiting.")

if __name__ == "__main__":
    main()

Result:

Secondary thread started.
Main thread exiting.

It would be helpful if you could provide some examples for testing purposes.

zuo · June 10, 2024, 8:09pm

@elis.byberi

My question concerns the statement about the main thread.

What you refer to is expected for the low-level _thread module’s threads which – when it comes to Python exit – behave just like threading module’s daemon threads (i.e., are abruptly stopped).

Rosuav · June 10, 2024, 8:09pm

The higher level module offers additional guarantees that you don’t have when using the low level interface.

zuo · June 10, 2024, 8:24pm

Yes, threading is a pure-Python module which, indeed, offers many useful mechanisms and guarantees.

But they do not seem to concern interpreter cleanup or flushing I/O files, which is what the aforementioned statement refers to.

zuo · June 10, 2024, 8:38pm

Hm, maybe somebody focused on Documentation has an idea whether/how those 2 statements should be removed/updated? (Note: I’d be happy to create an issue + PR with a fix – when it is clear what, if any, changes should be applied)

Rosuav · June 10, 2024, 8:52pm

Have you tried it?

import threading, time
threading.Thread(target=time.sleep, args=(10,)).start()

Takes 10 seconds to finish because the secondary thread holds things active.

import _thread, time
_thread.start_new_thread(time.sleep, (10,))

Takes almost no time to finish because the secondary thread is abruptly cancelled. You can explore threading.py to figure out how it does this if you like.
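To compare the two cases side by side, one could time each one-liner in its own interpreter. This is only a sketch: timed_run is a hypothetical helper, the sleep is shortened to 2 seconds, and the measured times include interpreter startup.

```python
import subprocess
import sys
import time

def timed_run(code):
    """Run `code` in a fresh interpreter and return the wall-clock duration."""
    start = time.monotonic()
    subprocess.run([sys.executable, "-c", code], check=True, timeout=60)
    return time.monotonic() - start

# High-level thread: the interpreter waits for it before exiting.
high_level = timed_run(
    "import threading, time; "
    "threading.Thread(target=time.sleep, args=(2,)).start()"
)

# Low-level thread: the interpreter is expected to exit without waiting.
low_level = timed_run(
    "import _thread, time; "
    "_thread.start_new_thread(time.sleep, (2,))"
)

print(f"threading: {high_level:.2f}s, _thread: {low_level:.2f}s")
```

The threading run should take at least as long as the sleep, while the _thread run should finish in roughly the time it takes to start the interpreter.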

zuo · June 10, 2024, 9:02pm

Thanks, I know how it works and how it is implemented (in particular, as I wrote in the reply to the @elis.byberi’s post, I know it is expected and perfectly OK, that low-level _thread.start_new_thread()-spawned threads are abruptly stopped, just like threading’s daemon threads).

But please note that that’s unrelated to what I focus on in this forum thread: the fragment of the documentation regarding the behavior of the main thread when it comes to (the pronounced lack of) its usual cleanup and I/O flushing.

Rosuav · June 10, 2024, 9:18pm

Again, have you tested it?

zuo · June 10, 2024, 10:25pm

Ad 1.:

Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)

I have no idea what you mean by testing this one.

On my system (with signal accessible), SIGINT is (quite obviously) handled in the main thread. It does not answer, however, whether the above statement is true/up-to-date.
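One way to observe this on a given system is to raise SIGINT from a secondary thread and record which thread ends up running the Python-level handler. The sketch below is illustrative only. It assumes Python 3.8+ (for signal.raise_signal) and that the snippet itself runs in the main thread, since signal.signal() raises ValueError otherwise. The names handler and seen are made up for this example.

```python
import signal
import threading
import time

seen = []  # threads in which the Python-level handler actually ran

def handler(signum, frame):
    seen.append(threading.current_thread())

previous = signal.signal(signal.SIGINT, handler)
try:
    # Raise SIGINT from a secondary thread rather than from the main one.
    worker = threading.Thread(target=signal.raise_signal, args=(signal.SIGINT,))
    worker.start()
    worker.join()
    # Give the main thread a chance to run the pending handler.
    deadline = time.monotonic() + 5
    while not seen and time.monotonic() < deadline:
        time.sleep(0.01)
finally:
    signal.signal(signal.SIGINT, previous)  # restore the original handler

handled_in_main = bool(seen) and seen[0] is threading.main_thread()
print("handler ran in main thread:", handled_in_main)
```

Like any single test, this only shows the behavior of one platform and one Python version.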

Ad 2.:

When the main thread exits, it does not do any of its usual cleanup (except that try … finally clauses are honored), and the standard I/O files are not flushed.

I’ve done a quick test (see below) which shows that, as I surmised, this statement is not true when it comes to my (Linux) system. But, of course, the test cannot prove that the statement is not true for every platform and that it should be removed from the docs.

The only sure thing about the above statement is that, at least for me, it is unclear: is the cleanup/flushing limitation supposed to apply always on platforms with threads, or only if a thread has been spawned?

I suppose that a core developer familiar with this part of the implementation could easily tell whether each of the two cited statements is true/up-to-date at all (universally, not just for my platform) – that’s why I ask these questions here.

As I stressed, I’d be happy to create a PR with a docs fix – but it seems it does not make sense to create one until somebody more competent than me confirms that a fix is needed indeed.

<my shell prompt>$ python3.11
Python 3.11.8 (main, Apr 10 2024, 21:47:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import atexit, sys, _thread
>>> f = open('spam', 'xb')
>>> atexit.register(f.write, b'1')
<built-in method write of _io.BufferedWriter object at 0x7fecc8ed8e00>
>>> _ = _thread.start_new_thread(sys.__stdout__.write, ('foo',))
>>> sys.exit()
foo<my shell prompt>$ cat spam
1

<my shell prompt>$ python3.14 -W ignore
Python 3.14.0a0 (heads/main:e83ce850f4, Jun  5 2024, 20:35:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import atexit, sys, _thread
>>> f = open('spam42', 'xb')
>>> atexit.register(f.write, b'42')
<built-in method write of _io.BufferedWriter object at 0x7fe6764816d0>
>>> _ = _thread.start_new_thread(sys.__stderr__.write, ('foo',))
>>> sys.exit()
foo<my shell prompt>$ cat spam42
42

elis.byberi · June 10, 2024, 10:59pm

I believe this is the correct example:

import _thread

def io():
    with open('test', 'wb') as w:
        w.write(b'Hello world!')

_thread.start_new_thread(io, ())

zuo · June 10, 2024, 11:22pm

Unfortunately, no. It just confirms the expected (and already mentioned in some previous posts) fact that _thread.start_new_thread()-spawned threads behave on Python exit in the same way as threading’s daemon threads (which, under the hood, are based on the former…), i.e., that they stop abruptly.

PS The tests from my previous post can be reformulated in a more readable way, e.g.:

import atexit, sys, time, _thread

def stdio_flush_test():
    sys.stdout.write('b') 
    time.sleep(10)

@atexit.register
def atexit_callback_test():
    with open('atexit', 'w') as f:
        print("Hello!", time.time(), file=f)

_thread.start_new_thread(stdio_flush_test, ()) 
time.sleep(1)

eryksun · June 11, 2024, 10:19am

The signal module should always be available on the currently supported platforms. Note that on Windows, the C runtime library just emulates a few signals, since Windows does not implement POSIX signals [1].

That’s not right. If the main thread exits normally back to Py_RunMain() or if it raises SystemExit, then Py_FinalizeEx() gets called, which, among other things, shuts down the threading module and calls Python atexit functions. If the process exits via C exit() or _exit(), then Py_FinalizeEx() isn’t called, so threading isn’t shut down and Python atexit functions aren’t called.
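The difference between the two exit paths can be seen from Python by comparing sys.exit() with os._exit(). This is a rough sketch, not a test of the C-level functions themselves: TEMPLATE and child_output are hypothetical names, and each variant runs in its own subprocess so that its output can be captured.

```python
import subprocess
import sys

# Hypothetical child script: registers an atexit callback, then exits
# in whichever way the parent asks for.
TEMPLATE = """
import atexit, os, sys
atexit.register(print, "atexit callback ran")
{exit_call}
"""

def child_output(exit_call):
    """Run the child with the given exit statement and return its stdout."""
    code = TEMPLATE.format(exit_call=exit_call)
    done = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=60,
    )
    return done.stdout

normal_exit = child_output("sys.exit(0)")  # SystemExit: finalization runs
hard_exit = child_output("os._exit(0)")    # process ends without finalization

print(repr(normal_exit), repr(hard_exit))
```

With sys.exit() the callback's output should be present. With os._exit() it should be missing, because the interpreter is never finalized.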

[1] SIGBREAK and SIGINT are based on the corresponding console control events. SIGSEGV, SIGILL, and SIGFPE are based on the corresponding OS exceptions. SIGABRT and SIGTERM are emulated just for use with C raise() and abort(). There isn’t support in Python for handling SIGSEGV, SIGILL, SIGFPE, and SIGABRT due to the design of the C signal handler, which just sets a flag and returns. Also, handling SIGINT and SIGBREAK is broken when reading from the console/terminal since EOFError is raised instead of restarting the read.

zuo · June 11, 2024, 10:23am

Therefore, do you agree that those two statements should be removed from the docs of _thread?

eryksun · June 11, 2024, 10:29am

I think that the two statements should be corrected and qualified.