Calling read on a subprocess pipe makes select block even when data is available

SimonMaracine · November 11, 2024, 5:33pm

Hello!

I have been using Python to launch child processes, to read from their output and to write to their input. I have been almost successfully doing that for a while, until I encountered some problems with graceful shutdown of the parent process. I decided then to use select in order to avoid blocking reads, but I’m having difficulties with that as well.

My problem is that select reports the pipe as not ready after a call to read on stdout, even though I know for a fact that there is still data available to read.

Here is an example:

child.py

#! /usr/bin/env python3
    
if __name__ == "__main__":
    print("READY", flush=True)
    input()

parent.py

#! /usr/bin/env python3
    
import subprocess
import select
    
def main():
    process = subprocess.Popen(["./child.py"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    
    # First char read
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.read(1)
    print(data)
        
    # Second char read (select returns false)
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.read(1)
    print(data)
        
    # Third char read (select returns false)
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.read(1)
    print(data)
        
    process.stdin.write("QUIT\n")
    process.stdin.flush()
        
    process.wait()
    
    
if __name__ == "__main__":
    main()

You can see that I erroneously ignore the result of select and read from the pipe anyway. The reading operation doesn’t block, as there is data. But select tells me that the read should block, that there is no data.

This was just an example. Actually I need to read whole terminated lines (with readline) and write whole terminated lines. The child program may sometimes output multiple lines per command from parent (one line of input). This issue makes it impossible to read more than one line of output without blocking, because after reading the first line, or even just one character, select returns false. select correctly returns true after I read the single line of output from the child.

child2.py

#! /usr/bin/env python3
    
if __name__ == "__main__":
    print("READY", flush=True)
    input()
    print("THING1", flush=True)
    input()
    print("THING2", flush=True)
    input()
    print("THING3", flush=True)
    # These two lines don't exist for select
    print("OH_NO1", flush=True)
    print("OH_NO2", flush=True)
    input()

parent2.py

#! /usr/bin/env python3
    
import subprocess
import select
    
def main():
    process = subprocess.Popen(["./child2.py"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    
    # Read READY
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.readline()
    print(data)
        
    process.stdin.write("NEXT\n")
    process.stdin.flush()
        
    # Read first THING
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.readline()
    print(data)
        
    process.stdin.write("NEXT\n")
    process.stdin.flush()
        
    # Read second THING
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.readline()
    print(data)
        
    process.stdin.write("NEXT\n")
    process.stdin.flush()
        
    # Read third THING
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.readline()
    print(data)
        
    # Read first OH_NO (select returns false)
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.readline()
    print(data)
        
    # Read second OH_NO (select returns false)
    result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
    print(bool(result))
    
    data = process.stdout.readline()
    print(data)
        
    process.stdin.write("QUIT\n")
    process.stdin.flush()
        
    process.wait()
    
if __name__ == "__main__":
    main()

In my situation I don’t know in advance how many lines of output the child will send after a certain input. In the end, select becomes useless.

My question is if this is the normal behavior for select in Python, because it seems very odd to me. I have written subprocessing code in C and C++ as well in the past and the select system call wasn’t behaving like that.

I’m using Python 3.12.7, Linux 6.11.5.

onePythonUser · November 11, 2024, 6:53pm

Hello,

Just one quick observation. It appears that you are missing the opening parenthesis.

On a related note, I have just started studying the subprocess library package module. From its documentation, it states the following:

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

From the assignment above, you’re using neither. Is there a reason for this? Since I am new to this library, I am curious as to why this approach has been taken.

SimonMaracine · November 11, 2024, 7:11pm

Sorry! That was a mistake. I fixed it.

I am actually using Popen. I know that there are other ways of running processes, but in my situation I cannot use anything other than Popen, because I need continuous communication between the two programs. The child program receives text commands, does something and sends back text responses. This needs to happen as long as the parent program hasn’t sent a “quit” command. This scenario is not something new.

SimonMaracine · November 11, 2024, 7:25pm

Before, I was reading from stdout in a loop on a separate thread, as it was blocking, and I was putting the messages in a thread-safe queue. The pipes were buffered readers, because text wrappers are not thread-safe according to the docs. It was working perfectly except that sometimes, after the parent process terminated the child gracefully, the read operation on stdout was not returning with an exception and the second thread was stuck running. For two days I could not find out why this was happening, and because I wanted graceful shutdown of everything, I decided to try using select instead in order to know when I can read without blocking, thus avoiding threads.

I then asked here for help, because after three days I still don’t have a working solution, and because I noticed that select in Python looks wrong.

onePythonUser · November 11, 2024, 7:32pm

I see that you have added the Popen. Not sure if this:

reflects your actual script, but it is still missing the opening parenthesis.

SimonMaracine · November 11, 2024, 7:35pm

Thank you. Maybe you can tell by now that I’m very tired.

I don’t have syntax or semantic errors in my actual scripts.

onePythonUser · November 11, 2024, 7:37pm

can you add this argument to the function:

stderr=subprocess.PIPE

SimonMaracine · November 11, 2024, 7:45pm

Sorry, I can’t see how that would help. stderr captures the error output. I know for a fact that the child process successfully wrote “READY\n” and all the other messages, because the parent process read them. Thus, the child couldn’t have crashed and that couldn’t have explained anyway the select problem.

onePythonUser · November 11, 2024, 8:00pm

From your initial post,

the reason that you added select was to circumvent (or workaround) “some problems” during initial testing of your script. If it had worked fine, I gather you would not have included it in the first place.

I think that adding the stderr is good programming practice in the unexpected cases when errors are captured.

SimonMaracine · November 11, 2024, 8:15pm

Yes, getting stuck in the read operation from the thread only sometimes, for no apparent reason was unacceptable to me, even though I could have easily killed the thread with Ctrl+C. Ignoring that symptom meant to ignore a bigger issue somewhere. I couldn’t solve it, so I gave up on threads. Using select was another valid approach.

In my case, actually the child is a C++ program. It doesn’t print anything in stderr. It does its job the same way, by printing and flushing strings in stdout. It also ignores SIGINT, in case the parent process is terminated by the user. I wrote a Python script just for the sake of simplicity.

I don’t want to create a new issue on the cpython’s repository page, as I’m not sure if select has a bug or not. I asked here for a reason.

barry-scott · November 11, 2024, 10:03pm

Select will return when there are some bytes that can be read.
You should not use textmode or readline if you expect not to hang.

Have a loop that reads the bytes into a buffer and checks whether it contains the delimiter, b'\n'. Then you can convert the bytes to unicode.

The suggestion to handle stderr and stdout is also a good one.
Just add both fds to the select and check which fds are signalled.
Now if something is written to stderr you will not hang the other process because you did not read the stderr pipe.

Make these changes and your code will be robust.
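
A minimal sketch of the loop described above, for illustration only. It assumes a Unix-like system (select on pipes), watches only stdout, and uses a hypothetical inline child (CHILD_CODE) in place of the real program. The helper name read_lines is not from the thread. It reads with os.read on the file descriptor, so Python's buffered reader is never involved.

```python
import os
import select
import subprocess
import sys

# Hypothetical stand-in for the real child: prints three lines, then waits for input.
CHILD_CODE = (
    "print('THING3', flush=True)\n"
    "print('OH_NO1', flush=True)\n"
    "print('OH_NO2', flush=True)\n"
    "input()\n"
)


def read_lines(fd, timeout=2.0):
    """Collect complete lines from fd until select() times out or EOF is reached."""
    buffer = b""
    lines = []
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # nothing arrived within the timeout
        chunk = os.read(fd, 4096)  # returns whatever is available, up to 4096 bytes
        if not chunk:
            break  # EOF: the child closed its end of the pipe
        buffer += chunk
        # Split off every complete line; a partial line stays in the buffer
        # (and is dropped at the end in this sketch).
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            lines.append(line.decode())
    return lines


def demo():
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    ) as process:
        lines = read_lines(process.stdout.fileno())
        process.stdin.write(b"QUIT\n")
        process.stdin.flush()
    # Leaving the with-block closes the pipes and waits for the child.
    return lines
```

Calling demo() should return all three lines even though the child sent them without waiting for input in between. To cover stderr as well, its file descriptor could be added to the list passed to select.select, with a separate buffer per descriptor.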

onePythonUser · November 11, 2024, 11:05pm

By the way, you can shorten your code when you are repeating the same algorithm over and over. You can do this by employing a for loop in conjunction with the range built-in function.
For example, your first main function can be shortened to:

def main():

    process = subprocess.Popen(["./child.py"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    for count in range(3):

        result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
        print(bool(result))

        data = process.stdout.read(1)
        print(data)

    process.stdin.write("QUIT\n")
    process.stdin.flush()

    process.wait()

This way, we shave off 7 lines of code.

Regarding this line, assigning two values to two variables (result and *_).

result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)

Is *_ a valid variable name? Per convention, variable names must start with a letter or underscore.

onePythonUser · November 12, 2024, 2:16am

Can you show how you have your code setup prior to using the select alternate workaround?
Otherwise, it seems that you will be using the select function as a band-aid.

SimonMaracine · November 12, 2024, 8:13am

Thank you. I know that. But for the sake of the MCVE (Minimal Complete Verifiable Example), I avoided loops. I instead inserted comments for you.

I need to pack the returned items and discard them, hence the underscore.

SimonMaracine · November 12, 2024, 9:14am

Thank you, Barry, for your help.

That is my issue. I’m using select to know when I can read from the pipe, but I noticed that in some cases it never returns (or the timeout runs out), even though the pipe was actually ready for reading.

I’m using text mode and readline, because all the transferred data is text ended by new lines (‘\n’). Either the parent or the child may transfer multiple such lines of text. That’s why I can safely expect entire lines of data.

I’ll gladly do that, but I just demonstrated in the first example that for some reason I cannot rely on select to tell me if bytes or characters can be read, because the moment I try to read the second or third, fourth etc. byte or character, it oddly tells me I don’t have anything to read, even though I did have data to read and it read successfully (because I did not listen to it).

At your suggestion, I just tried binary mode instead of text, but select still doesn’t work as expected.

parent.py

#! /usr/bin/env python3

import subprocess
import select

if __name__ == "__main__":
    process = subprocess.Popen(["./child.py"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    # Infinite loop
    while True:
        result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
        print(f"select says the pipe is ready for reading: {bool(result)}")

        print("Reading one byte anyway: ", end="")
        data = process.stdout.read(1)
        print(data)

    # Never gets executed
    process.stdin.write(b"QUIT\n")
    process.stdin.flush()

    process.wait()

child.py

#! /usr/bin/env python3

if __name__ == "__main__":
    print("READY", flush=True)
    input()

This is the output:

You can see that for some reason select reports the pipe as not ready on subsequent reads, which doesn’t seem to reflect the behavior of the select system call.

When select seems to work:

parent.py

#! /usr/bin/env python3

import subprocess
import select

if __name__ == "__main__":
    process = subprocess.Popen(["./child.py"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    for i in range(4):
        result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
        print(f"select says the pipe is ready for reading: {bool(result)}")

        print("Reading one byte anyway: ", end="")
        data = process.stdout.read(1)
        print(data)

        # Get past input() in child
        process.stdin.write(b"NEXT\n")
        process.stdin.flush()

    for i in range(4):
        result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
        print(f"select says the pipe is ready for reading: {bool(result)}")

        print("Reading one byte anyway: ", end="")
        data = process.stdout.read(1)
        print(data)

    process.stdin.write(b"QUIT\n")
    process.stdin.flush()

    process.wait()

child.py

#! /usr/bin/env python3

if __name__ == "__main__":
    print("A", end="", flush=True)
    input()
    print("B", end="", flush=True)
    input()
    print("C", end="", flush=True)
    input()
    print("D", end="", flush=True)
    input()
    print("E", end="", flush=True)
    print("F", end="", flush=True)  # Oh no
    print("G", end="", flush=True)
    print("H", end="", flush=True)
    input()

The output is:

Here, select finally returns correctly, but, for some reason, only when writing something else or only when the child waits on input(). But again, the child should be able to print multiple messages or multiple bytes.

The question remains: Is this the correct behavior of select in Python?

SimonMaracine · November 12, 2024, 9:20am

The previous solution was working fine except sometimes at shutdown. I don’t want to bring that up yet.

I don’t think using select is just a workaround. Honestly, I should have tried it first.

barry-scott · November 12, 2024, 10:46am

Did you get the EINTR error maybe?
I have never seen what you describe.

What OS is this running on?

I would not code that way as it's not robust.
You do not know what reads the text mode and readline will do.
They may well hang your code.

Have you set the FDs to be non-blocking?
In C I would use fcntl(fd, F_SETFL, O_NONBLOCK)
Then if there is no data you will get a 0 read or error (I forget which).

When you are reading in bytes mode then you will need to tell os.read how many bytes it can read. Use a large enough size to empty the pipe.

That will be a buffered read. Use os.read() to get control over exactly what is read.

What I’m saying is that you need to stop using the python textmode/buffered/blocking API and use a non-blocking os.read() level API. Then you will see deterministic behavior.
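
A rough sketch of that non-blocking approach, assuming a Unix-like system and using a plain os.pipe() in place of the child's stdout. The helper name drain is made up for this example. In Python, os.set_blocking(fd, False) sets O_NONBLOCK on the descriptor. Reading an empty non-blocking pipe raises BlockingIOError, while a zero-length result means end of file.

```python
import os


def drain(fd):
    """Read everything currently available from a non-blocking descriptor."""
    chunks = []
    while True:
        try:
            chunk = os.read(fd, 4096)
        except BlockingIOError:
            break  # the pipe is empty right now (EAGAIN)
        if not chunk:
            break  # EOF: every writer has closed its end
        chunks.append(chunk)
    return b"".join(chunks)


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)  # same effect as fcntl(fd, F_SETFL, O_NONBLOCK)

os.write(write_fd, b"OH_NO1\nOH_NO2\n")
first = drain(read_fd)   # both lines in one go
second = drain(read_fd)  # empty result, and the call does not block

os.close(write_fd)
os.close(read_fd)
```

With a real child, the descriptor would be process.stdout.fileno(), and drain would typically be called after select reports it readable.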

SimonMaracine · November 12, 2024, 11:04am

I just disabled buffering and now select works as I expected…

#! /usr/bin/env python3

import subprocess
import select

if __name__ == "__main__":
    process = subprocess.Popen(["./child.py"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0)

    while True:
        result, *_ = select.select([process.stdout.fileno()], [], [], 1.0)
        print(f"select says the pipe is ready for reading: {bool(result)}")

        if not result:
            break

        print("Reading: ", end="")
        data = process.stdout.read(3)
        print(data)

    process.stdin.write(b"QUIT\n")
    process.stdin.flush()

    process.wait()

Thank you for your help!
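
For readers wondering why bufsize=0 changes the result, here is a small demonstration, assuming a Unix-like system and a bare os.pipe() instead of a subprocess. The function name ready_after_one_byte is hypothetical. With the default buffering, the first read(1) pulls everything available into Python's own buffer, so the kernel pipe that select inspects is empty afterwards. Without buffering, only one byte leaves the pipe.

```python
import os
import select


def ready_after_one_byte(buffering):
    """Write b"READY\\n" to a pipe, read one byte, then poll select() once."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"READY\n")
    with os.fdopen(read_fd, "rb", buffering=buffering) as reader:
        reader.read(1)
        # Timeout 0: just poll whether the kernel still holds unread bytes.
        ready, _, _ = select.select([reader.fileno()], [], [], 0)
    # Close the write end only after polling, so EOF does not mark the pipe readable.
    os.close(write_fd)
    return bool(ready)


buffered = ready_after_one_byte(-1)   # default buffering, like Popen's bufsize=-1
unbuffered = ready_after_one_byte(0)  # raw file object, like bufsize=0
```

In the buffered case the remaining five bytes are still readable through the file object, which matches the observation that read(1) succeeded although select reported nothing to read.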

onePythonUser · November 13, 2024, 5:11am

Hello,

awesome that you got it to work. Like I stated in my original response, I have currently started studying this library package - so definitely interested in this thread. One of my take aways or understandings from my current study / research is:

From theory:

A pipe is a unidirectional communication channel that connects one process’s standard
output to another’s standard input. A pipe can connect the output of one command to the
input of another, allowing the output of the first command to be used as input to the
second command.

So, when you create a process, you can set it up with either stdin = subprocess.PIPE or stdout = subprocess.PIPE, but not both. The way you have set up your process is by enabling both stdin and stdout. Am I interpreting this correctly? Should you have created two distinct processes for this instead?

In the following example, the commands dir and sort /R are combined via two distinct processes, where p1 is the input to the p2 process.

import subprocess

p1 = subprocess.Popen('dir', shell=True, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p2 = subprocess.Popen('sort /R', shell=True, stdin=p1.stdout)

p1.stdout.close()
out, err = p2.communicate()

This example was taken from the following website:

Could setting up a process by enabling both the stdin and stdout have been the cause for the process to behave unexpectedly as you stated in your original post?

barry-scott · November 13, 2024, 8:37am

Not true. You can connect unique pipes to stdin, stdout and stderr.
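
A short sketch showing one process with three separate pipes. The inline child (CHILD_CODE) is a hypothetical stand-in that echoes one line to both stdout and stderr.

```python
import subprocess
import sys

# Hypothetical child: reads one line and echoes it on both output streams.
CHILD_CODE = (
    "import sys\n"
    "line = sys.stdin.readline()\n"
    "sys.stdout.write('out: ' + line)\n"
    "sys.stderr.write('err: ' + line)\n"
)

process = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,   # parent -> child
    stdout=subprocess.PIPE,  # child -> parent
    stderr=subprocess.PIPE,  # child -> parent, separate pipe
    text=True,
)

# communicate() sends the input, closes stdin, and collects both outputs.
out, err = process.communicate("hello\n")
```

Each of the three pipes is its own unidirectional channel, so a single child can be written to and read from at the same time.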