Deadlock in python queue module

zdenek · September 9, 2021, 4:40am

I have a program that deadlocks from time to time and it very much looks like it is deadlocking in the standard python queue, which should not be possible given its multi-producer, multi-consumer properties. The code is approximately like this:
[the hacky 3-thread solution is because libusb does not support select]

thread 1
while True:
q.put(socket.recv())

thread 2
while True:
q.put(usbdev.read())

thread 3
while True:
process(q.get())

It is pretty much impossible to reproduce the problem on demand, but it does usually happen within several days of continuous running. When hung, it looks like all three threads are deadlocked on what I guess is the queue mutex:
pi@raspberrypi:~ $ sudo strace -p 554
strace: Process 554 attached
futex(0x75c005b8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, FUTEX_BITSET_MATCH_ANY
^Cstrace: Process 554 detached
<detached …>
pi@raspberrypi:~ $ sudo strace -p 562
strace: Process 562 attached
futex(0x75c22510, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, FUTEX_BITSET_MATCH_ANY
^Cstrace: Process 562 detached
<detached …>
pi@raspberrypi:~ $ sudo strace -p 568
strace: Process 568 attached
futex(0x752005d0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, FUTEX_BITSET_MATCH_ANY
^Cstrace: Process 568 detached
<detached …>

(gdb) i th
Id Target Id Frame

1 Thread 0x76ff2ad0 (LWP 554) “python3” 0x76f5e088 in futex_abstimed_wait_cancelable (
private=0, abstime=0x0, expected=1, futex_word=0x75c005b8)
at …/sysdeps/unix/sysv/linux/futex-internal.h:205
2 Thread 0x765c8460 (LWP 562) “python3” 0x76f5e088 in futex_abstimed_wait_cancelable (
private=0, abstime=0x0, expected=1, futex_word=0x75c22510)
at …/sysdeps/unix/sysv/linux/futex-internal.h:205
3 Thread 0x75bff460 (LWP 568) “python3” 0x76f5e088 in futex_abstimed_wait_cancelable (
private=0, abstime=0x0, expected=1, futex_word=0x752005d0)
at …/sysdeps/unix/sysv/linux/futex-internal.h:205

Note that this is a Pi distribution:
pi@raspberrypi:~ $ python3 --version
Python 3.7.3

I am looking for ideas. Thanks

cameron · September 9, 2021, 7:17am

I have a program that deadlocks from time to time and it very much
looks like it is deadlocking in the standard python queue, which should
not be possible given its multi-producer, multi-consumer properties.
The code is approximately like this:
[the hacky 3-thread solution is because libusb does not support select]

thread 1
while True:
q.put(socket.recv())

thread 2
while True:
q.put(usbdev.read())

thread 3
while True:
process(q.get())

Indeed that should never deadlock.

It is pretty much impossible to reproduce the problem on demand, but it does usually happen within several days of continuous running. When hung, it looks like all three threads are deadlocked on what I guess is the queue mutex:
pi@raspberrypi:~ sudo strace -p 554 strace: Process 554 attached futex(0x75c005b8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, FUTEX_BITSET_MATCH_ANY ^Cstrace: Process 554 detached <detached ...> pi@raspberrypi:~ sudo strace -p 562
strace: Process 562 attached
futex(0x75c22510, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, FUTEX_BITSET_MATCH_ANY
^Cstrace: Process 562 detached
<detached …>
pi@raspberrypi:~ $ sudo strace -p 568
strace: Process 568 attached
futex(0x752005d0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, FUTEX_BITSET_MATCH_ANY
^Cstrace: Process 568 detached
<detached …>

That looks naively like 3 distinct locks. (Of course a u eue will have
more than one internal lock.)

Are you sure it is locked on the queue? If you separate out the q.put
and its arguments?

while True:
    print("recv")
    data = socket.recv()
    print("got", len(data), "bytes")
    q.put(data)
    print("put completed")

and similarly for the other loops. Is it definitely blocked in the queue
operation?

Have you any opportunity to try this against the latest release Python?

Cheers,
Cameron Simpson cs@cskk.id.au

zdenek · September 9, 2021, 12:53pm

Well, no I don’t know how to exactly interpret the hung condition, other than guessing. All three threads are deadlocked the same way in the same piece of code:

(gdb) thr 1
[Switching to thread 1 (Thread 0x76ff2ad0 (LWP 554))]
#0 0x76f5e088 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1,
futex_word=0x75c005b8) at …/sysdeps/unix/sysv/linux/futex-internal.h:205
205 …/sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0 0x76f5e088 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1,
futex_word=0x75c005b8) at …/sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x75c005b8, abstime=0x0) at sem_waitcommon.c:115
#2 0x76f5e1f4 in __new_sem_wait_slow (sem=0x75c005b8, abstime=0x0) at sem_waitcommon.c:282
#3 0x000f9028 in PyThread_acquire_lock_timed ()
#4 0x000cbc44 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thr 2
[Switching to thread 2 (Thread 0x765c8460 (LWP 562))]
#0 0x76f5e088 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1,
futex_word=0x75c22510) at …/sysdeps/unix/sysv/linux/futex-internal.h:205
205 in …/sysdeps/unix/sysv/linux/futex-internal.h
(gdb) bt
#0 0x76f5e088 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1,
futex_word=0x75c22510) at …/sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x75c22510, abstime=0x0) at sem_waitcommon.c:115
#2 0x76f5e1f4 in __new_sem_wait_slow (sem=0x75c22510, abstime=0x0) at sem_waitcommon.c:282
#3 0x000f9028 in PyThread_acquire_lock_timed ()
#4 0x000cbc44 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thr 3
[Switching to thread 3 (Thread 0x75bff460 (LWP 568))]
#0 0x76f5e088 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1,
futex_word=0x752005d0) at …/sysdeps/unix/sysv/linux/futex-internal.h:205
205 in …/sysdeps/unix/sysv/linux/futex-internal.h
(gdb) bt
#0 0x76f5e088 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1,
futex_word=0x752005d0) at …/sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x752005d0, abstime=0x0) at sem_waitcommon.c:115
#2 0x76f5e1f4 in __new_sem_wait_slow (sem=0x752005d0, abstime=0x0) at sem_waitcommon.c:282
#3 0x000f9028 in PyThread_acquire_lock_timed ()
#4 0x000cbc44 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I am not familiar with the implementation so all I am considering is the user-visible stuff – the two producer threads should be waiting in the respective syscalls (which they are until the thing deadlocks). I tried instrumenting the code, too, but all I get is the expected output until the deadlock, then all is hung.

Milton_Mobley · September 24, 2021, 12:09am

I hope this may help:

In your original post you give only a simplified version of the code’s action.

The threads might appear to be blocked if socket.recv() and usbdev.read() are not actually producing
messages. What data do you expect socket.recv() to produce? Same for usbdev.read(). I mean not contents, but size and frequency of messages. You could write simple programs to measure these,
and verify the data you expect is being produced.

I looked at the Python source for the queue module. It looks pretty straightforward for queue code, and is probably not the source of the problem.

Otherwise, you could set some gdb breakpoints in the C backend code. I see that PyThread_acquire_lock_timed() is implemented in the source in the file Python/thread_pthread.h.
Perhaps you could find some place around there to set breakpoints, in an error path that you do not expect to be taken, not the main (success) path. The semaphore get operations are supposed to time out.

Alternately, you could try to use the queues in the multiprocessing module, which use the semaphores differently.

zdenek · September 24, 2021, 1:08am

What I showed is really not simplified. The code is almost exactly the way I showed – it is a simple brute force workaround around libusb not supporting select – literally these three threads just merge data produced by the two [dissimilar] syscalls.
I can generate the requisite USB events on demand and the socket is producing a continuous stream of [16-byte] messages. There is no doubt in my mind that the code is not waiting in either or both of the syscalls.
The most telling to me is the fact that all of the threads end up being stuck in the same function: futex_abstimed_wait_cancelable(). That is what I interpret as a deadlock of some sort although I don’t understand the actual arguments to those calls.
Note that when things are working fine (which is most of the time as it takes many days for this problem to occur) I can see the threads to be suspended in the respective syscalls exactly as expected.
Unfortunately, I can’t afford to spend more time debugging this platform. It is the second problem I found in a relatively short period of time, which makes me suspect that this platform (ARM) may not be as stable as the common x86 distribution. It doesn’t help that I am running code that is likely not exercised very often (CANsocket, and libusb) and I already verified that the other problem I found exists only in the ARM distribution.

Milton_Mobley · September 24, 2021, 5:50pm

Since you say the code works on x86 platforms, the only other suggestion I have is to switch to some other ARM platform besides raspberry pi

zdenek · September 24, 2021, 6:41pm

Yeah, I understand I don’t have too many choices here.

But in name of accuracy – unlike the first RPi problem I had which I traced down to a platform header error, this particular problem has not been diagnosed yet. I don’t have an identical setup on the x86 and on RPi it takes a very long time to reproduce. So I don’t want to make any conclusive platform statements (what if it took a month to reproduce on x86?). Synchronization mistakes (if that’s what this is) are in my opinion the hardest software problems to diagnose (only competing with hardware marginal design problems).

Milton_Mobley · September 25, 2021, 1:03am

Your gdb backtraces show that the call sequence is going from:
PyThread_acquire_lock_timed (in python backend C source)
__new_sem_wait_slow (not in python source)
do_futex_wait (not in python source)
futex_abstimed_wait_cancelable (not in python source)

This suggests that Python is going down successfully into system library support to access the locks, which are ultimately implemented in the kernel. The fact that gdb can display the system names suggests you have a debug version of those libraries. If you can get the source, you could add in
some print statements to see what is going on. There is probably already such a print system there.

PyThread_acquire_lock_timed is instrumented with a function ‘dprintf’, but you would have to make a debug version of Python to see its output.

These may be very complicated to do. So I suggest again:

Try the queues in the multiprocessing module as they seem to be accessing locks differently.

Switch to another target besides raspberry pi

zdenek · September 25, 2021, 5:18am

BTW you don’t need a debug version (DWARF) of an executable in order to display symbols. As long as the symbols are not stripped from the executable, you can display them.

I am actually not familiar with the Python multiprocessing module. Interestingly, a quick read says that its queue “is a near clone of queue.Queue”, but I may experiment with that as any change however slight may make all of the difference.

There is nothing I can do about the fact that the code runs on raspberry Pi – the Pi is part of a custom hardware solution that I am not ready to redesign. So as bad as it sounds my current solution is a watchdog process that restarts the app when it stops producing data (i.e., when it hangs).

Milton_Mobley · September 25, 2021, 11:18pm

I see that other people are having similar problems with Raspberry Pi.

See Application hangs - Raspberry Pi Forums, where the code also seems to be hung in futex_abstimed_wait_cancelable().

In another example I found, which used some proprietary code, the code was going down to the bottom of glibc to set up an ioctl to do a syscall to the kernel, where the mutex/semaphore locks actually live. In that example the developer found he had memory leaks in his code.

Here is another one specific to Python and RPi: Bug #40156: deadlock on MonCommandCompletion - mgr - Ceph

Perhaps you should go to the RPi forums to see if any fix has been found.

zdenek · September 26, 2021, 1:28am

Thanks for the suggestions and for doing the digging. I have little experience with the RPi platform. I chose it because of the $35 price tag, but I am starting to regret that decision. I have been somewhat turned away by the high level of misinformation stemming from the platform secrecy (they get away with it by targeting kids and unskilled hobbyists). As a result I shied away from their forums. So, my bad – I’ll try to be more open-minded. Either that, or I’ll end up revisiting your suggestion to switch to different hardware

Milton_Mobley · September 26, 2021, 1:39am

Since last message I found other examples that do not involve RPi.

Along the way I looked at the current source for glibc, where the Changelog file says that
futex_abstimed_wait_cancelable function is removed, after having been rewritten in the previous version. It is also not in the source. Are you using an old glibc? If so, you should probably upgrade.

zdenek · September 26, 2021, 2:54am

pi@raspberrypi:~ uname -a Linux raspberrypi 5.4.83-v7+ #1379 SMP Mon Dec 14 13:08:57 GMT 2020 armv7l GNU/Linux pi@raspberrypi:~ ldd --version
ldd (Debian GLIBC 2.28-10+rpi1) 2.28

out of curiosity, my desktop happens to run the same glibc:

uname -a Linux zr-mx 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux ldd --version
ldd (Debian GLIBC 2.28-10) 2.28

in what version was the function removed?

Milton_Mobley · September 26, 2021, 10:48pm

The “fix” appears in 2.34, the latest stable release. It would be very tedious to find out when it was done. This might require studying a lot of git logs for relevant changes, etc.

Again, my reasons for thinking the problem is in glibc code are:

I found a number of bug reports complaining of similar behavior, i.e. all threads appear to be blocked after some long, random runtime. All but one of the reports showed this function
futex_abstimed_wait_cancelable as the last function entered. Some of the reports were about RPi targets, others were not.

Search for info about this function revealed some concern by glibc developers that its transfer of control to and from the kernel was not atomic. ChangeLog files indicate they made several attempts to fix it, and then replaced it with a function __futex_abstimed_wait_cancelable64, which passes a token and the count of the number of waiting threads atomically on both 32 and 64 bit systems. The 32/64 distinction may explain why your code appeared to work on x86 systems.

Of course having to wait for the fix to appear in RPi releases may not help you much.

If you know the maximum size of incoming messages, you could replace your queue by ring buffers.
These are used extensively in Linux and NT kernels and in network software, in part because they do not require synchronization when there is only one reader and one writer.

You could replace your queue by one ring for each read source and have your processing thread alternate between the rings to test for available data, and sleeping for short intervals.

Milton_Mobley · September 27, 2021, 11:48pm

Good news! It looks like raspbian upgraded glibc to 2.32 two days ago. It appears to have the fixed code. Installing the latest version may solve your problem.

zdenek · September 28, 2021, 1:43am

Yeah, I just found out today. I’ll give it a try. Really appreciate your pointers and advice.

Milton_Mobley · September 28, 2021, 6:58pm

You are welcome. The process increased my knowledge of glibc usefully.

BrendanSimon · July 18, 2022, 2:09am

@zdenek Did glibc 2.32 solve your issue?

glibc 2.34 is only available in Debian Sid at the moment (July 2022).
glibc 2.33 is in Bookworm (not released yet).
glibc 2.21 is in Bullseye (stable).

I’m running a Buster OS on an embedded system. I would like to migrate to Bookworm (but it’s not released yet, so has more risk), but migrating to Bullseye might not resolve this issue (I see similar FUTEX messages in my strace so I am assuming I might be suffering the same issue, but not really sure).