Subprocess.run unexpected behavior when forking in loop

ShreyasKallingal · November 13, 2023, 1:27am

I have a Python script that uses subprocess.run to run a compiled C executable. The executable forks 3 child processes, and the script then prints the captured output.

I am running Ubuntu 22.04.3 on x86_64 with Python 3.10.12. I’ve also tested other Python versions, and the observed behavior is the same.

Here is a minimal example:

```python
import subprocess
import tempfile
import os

# Temporary file that receives the C program's stdout
output_file = tempfile.TemporaryFile()
env = os.environ.copy()

command = ["./a.out"]
result = subprocess.run(command, stdout=output_file, stderr=subprocess.PIPE, env=env)

# Rewind and read back whatever the program wrote
output_file.seek(0)
output = output_file.read(512*1024).decode('utf-8', 'ignore').strip()
print(output)
```

and the C source code:

```c
#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>

int main() {
  pid_t child_pids[3];
  int statuses[3];

  for (int i = 0; i < 3; i++) {
    printf("forking\n");
    pid_t id = fork();
    child_pids[i] = id;
    if (child_pids[i] > 0) {
      // Parent: wait for this child before starting the next iteration
      waitpid(child_pids[i], &statuses[i], 0);
      printf("waited %d \n", i);
    } else if (child_pids[i] == 0) {
      // Child: print a message, then exit by returning from main
      printf("in child %d\n", i);
      return 0;
    } else {
      printf("Fork failed.\n");
      return 1;
    }
  }
  return 0;
}
```

I would expect the following output (which I do receive if I run the executable in a shell):

```
forking
in child 0
waited 0 
forking
in child 1
waited 1 
forking
in child 2
waited 2
```

However, my Python script gives me the following:

```
forking
in child 0
forking
waited 0 
forking
in child 1
forking
waited 0 
forking
waited 1 
forking
in child 2
forking
waited 0 
forking
waited 1 
forking
waited 2
```

It seems that only 3 child processes are forked (correctly), but the parent process keeps looping more times than it should. I have tried several variations of my C source to find the root cause, but none of them changed the behavior under my Python script. It almost seems like a Python bug, but that is obviously unlikely.

I would greatly appreciate any ideas on how to address this. Thanks!

ShreyasKallingal · November 13, 2023, 1:40am

Update: it seems to be related to how Python’s subprocess handles file descriptors. When I redirect the child process’s stdout to stderr (just to keep things separate), the parent’s output is as expected. Strangely, though, the child prints everything, including lines that came from the parent. I’m still not sure exactly what’s going on.

Rosuav · November 13, 2023, 2:05am

You might be getting bitten by buffering. Try flushing stdout after each printf call, or at the very least doing so before the fork.

csm10495 · November 13, 2023, 2:10am

This is an issue with buffering on stdout. Try this instead:

```c
#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>

int main() {
  pid_t child_pids[3];
  int statuses[3];

  for (int i = 0; i < 3; i++) {
    printf("forking\n");
    fflush(stdout); // <-- Added this.

    pid_t id = fork();
    child_pids[i] = id;
    if (child_pids[i] > 0) {
      waitpid(child_pids[i], &statuses[i], 0);
      printf("waited %d \n", i);
    } else if (child_pids[i] == 0) {
      printf("in child %d\n", i);
      return 0;
    } else {
      printf("Fork failed.\n");
      return 1;
    }
  }
  return 0;
}
```

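Notice the call to fflush(stdout), which ensures that the text is actually written to stdout before forking. Otherwise it is still sitting in the parent’s stdio buffer, fork() copies that buffer into the child, and both the parent and the child end up writing it.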

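The difference between Python and the console as the parent is in how stdout is buffered: connected to a terminal, it is line-buffered, so each newline-terminated printf is written immediately; redirected to a file or pipe (as subprocess.run does here), it is fully buffered. Either way, it’s best to flush before fork(), since otherwise whatever is still in the buffer gets written once by the parent and once more by every child that inherited a copy.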

ShreyasKallingal · November 13, 2023, 2:13am

Thank you so much! I had tried flushing only within the child, but this makes more sense. Glad I finally asked on this forum.