Details of process.wait() deadlock

xitop · October 28, 2024, 3:05pm

Inspired by a recent SO question, I’d like to ask for a more detailed explanation of a possible deadlock in asyncio.subprocess.Process.wait(). The function is documented in the usual way: “Wait for the child process to terminate.”

Documentation quote:

Note

This method can deadlock when using stdout=PIPE or stderr=PIPE and the child process generates so much output that it blocks waiting for the OS pipe buffer to accept more data. Use the communicate() method when using pipes to avoid this condition.
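For reference, here is a minimal sketch of the communicate() pattern the note recommends. The child command is my own stand-in (a Python one-liner that writes 512 KiB to stdout), not something taken from the docs or the SO question.

```python
import asyncio
import sys

# Hypothetical child: writes 512 KiB to stdout, more than a typical OS pipe buffer.
CHILD_CODE = "import sys; sys.stdout.write('x' * (512 * 1024))"

async def run_with_communicate():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE)
    # communicate() reads stdout until EOF and waits for the process to exit,
    # so the child never stays blocked on a full pipe.
    out, _ = await proc.communicate()
    return proc.returncode, len(out)

returncode, nbytes = asyncio.run(run_with_communicate())
print(returncode, nbytes)
```

Note that communicate() collects the whole output in memory, which is fine for a sketch like this but may not suit very large outputs.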

My understanding was: when the executed process is blocked on a full pipe, it cannot finish its work and exit. In other words, process.wait() blocks because the process does not terminate. wait() does the right thing; the primary cause is that the pipe is not being read (i.e. the parent is not communicating properly).

However, process.wait() blocks even when a process in that state receives a signal and terminates. A full process.stdout stream buffer blocks process.wait(), and draining the buffer unblocks it. I think that wait() does not behave correctly in this case. It should return once the process has exited, regardless of the buffer being full.

barry-scott · October 28, 2024, 7:49pm

My untested assumption is that if you kill the process, wait() will return, since wait() is not interested in the stdin/stdout pipes.

As you describe, the issue is that the process writes to stdout and exits only after that write completes. But if the pipe is not read, the write never completes in the process and you wait forever.

The subprocess.Popen.communicate function exists in the sync world to fix this issue, and I assume it is used by subprocess.run.

I would assume that you can set up the async process so that its stdout/stderr are read while you also wait for the process to exit. That would fix the problem as well.
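A rough, untested-in-your-setup sketch of what I mean: read stdout and wait for the exit concurrently with asyncio.gather. The child command here is a placeholder of mine (a Python one-liner producing 512 KiB of output), so substitute your real command.

```python
import asyncio
import sys

# Hypothetical child: writes 512 KiB to stdout, more than a typical OS pipe buffer.
CHILD_CODE = "import sys; sys.stdout.write('x' * (512 * 1024))"

async def read_and_wait():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE)
    # Run both coroutines concurrently: read() drains stdout until EOF
    # while wait() waits for the child to exit.
    data, returncode = await asyncio.gather(proc.stdout.read(), proc.wait())
    return returncode, len(data)

returncode, nbytes = asyncio.run(read_and_wait())
print(returncode, nbytes)
```

If stderr is piped as well, it would need its own reader in the same gather() call.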

xitop · October 28, 2024, 9:03pm

My assumption was the same, but in reality it behaves the opposite way. wait() does not return when the process is killed (and the buffer is still full), which means it cares about the buffer being full even after the process is gone. This is the reason I posted here.

barry-scott · October 28, 2024, 10:16pm

Do you have a small test case to show the problem?

Are you doing this on Linux? If so, there are lots of tools to find out what is happening. I would try strace on the Python parent, for example.

xitop · October 29, 2024, 7:52am

Please find below a small Linux test program.

Preparation: create a 512 KiB file ./datafile, e.g.

 dd if=/dev/zero of=datafile bs=64k count=8

Code:

import asyncio

async def killproc(proc):
    await asyncio.sleep(1)
    print(f"{proc.returncode=}")
    proc.terminate()
    print("signal sent")
    await asyncio.sleep(1)
    print(f"{proc.returncode=}")
    await asyncio.sleep(2)
    print("reading pipe buffer")
    await proc.stdout.read()

async def main():
    proc = await asyncio.create_subprocess_exec(
        "/usr/bin/cat", "./datafile",
        stdout=asyncio.subprocess.PIPE)
    print("process started")
    # keep a reference so the background task is not garbage-collected
    task = asyncio.create_task(killproc(proc))
    print("wait() start")
    await proc.wait()
    print("wait() stop")

asyncio.run(main())

The output, with blank lines where there are delays, plus my comments:

-----
process started
wait() start

proc.returncode=None  # process 'cat' exists
signal sent

proc.returncode=-15   # process 'cat' terminated by signal 15
                      # (and Python is aware of that)

reading pipe buffer   # <-- without this the 'wait' blocks
wait() stop
-----
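Going by this output, the workaround seems to be to drain stdout after sending the signal and only then call wait(). Below is a minimal sketch of that order of operations. It assumes the behaviour observed above, and it uses a stand-in child (a Python one-liner writing 512 KiB) instead of cat with ./datafile so that it needs no preparation.

```python
import asyncio
import sys

# Hypothetical child: writes 512 KiB to stdout and blocks once the pipe is full.
CHILD_CODE = "import sys; sys.stdout.write('x' * (512 * 1024))"

async def terminate_drain_wait():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE)
    await asyncio.sleep(0.5)   # give the child time to fill the pipe and block
    proc.terminate()           # SIGTERM on POSIX
    # Drain whatever is buffered, up to EOF, before waiting for the exit.
    leftover = await proc.stdout.read()
    returncode = await proc.wait()
    return returncode, len(leftover)

returncode, nleft = asyncio.run(terminate_drain_wait())
print(returncode, nleft)
```

This only works around the symptom. It does not answer whether wait() should return on its own once the process has exited.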