Profiling code: multiprocessing.Process sometimes takes 3 seconds to run _Popen

Hello,

I have a Python application that has been running without issues for over a year, but last week a delay at startup appeared without any update to the code or to the underlying Python interpreter. Can anyone help me investigate it?

cProfile shows this when the delay occurs:

   Ordered by: internal time
   List reduced from 143 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    3.005    1.503    0.001    0.001 /usr/local/lib/python3.12/multiprocessing/context.py:222(_Popen)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.001    0.001 /app/adapter.py:72(_process_message)
      2/1    0.000    0.000    0.001    0.001 /usr/local/lib/python3.12/multiprocessing/process.py:110(start)
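
For reference, a profile sorted like the one above can be produced with plain cProfile/pstats, roughly like this (a generic sketch with a dummy worker in place of my actual code):

import cProfile
import multiprocessing
import pstats


def _worker():
    pass


def main():
    profiler = cProfile.Profile()
    profiler.enable()
    p = multiprocessing.Process(target=_worker)  # goes through context.py _Popen
    p.start()
    p.join()
    profiler.disable()

    stats = pstats.Stats(profiler)
    stats.sort_stats("tottime")   # "Ordered by: internal time"
    stats.print_stats(50)         # "List reduced ... due to restriction <50>"


if __name__ == "__main__":
    main()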

The relevant part of the source file (multiprocessing/context.py) looks like this:

#
# Type of default context -- underlying context can be set at most once
#

class Process(process.BaseProcess):
    _start_method = None
    @staticmethod  # <--- line 222
    def _Popen(process_obj):
        return _default_context.get_context().Process._Popen(process_obj)
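
For completeness, _Popen here just forwards to whatever the default context resolves to; which start method that is can be checked with something like this (a generic snippet, not from my application):

import multiprocessing

# On Linux the default has historically been "fork", which is why the
# fast run below ends up in popen_fork.py.
print(multiprocessing.get_start_method())        # e.g. "fork"
print(multiprocessing.get_context().Process)     # concrete Process class used by _Popen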

The same profile without the delay looks like this (the call to _Popen takes no time):

   Ordered by: internal time
   List reduced from 92 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.005    0.005 {built-in method posix.fork}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.005    0.005 /usr/local/lib/python3.12/multiprocessing/popen_fork.py:62(_launch)
        1    0.000    0.000    0.005    0.005 /usr/local/lib/python3.12/multiprocessing/process.py:110(start)
        1    0.000    0.000    0.005    0.005 /app/recorder.py:65(run)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.12/pstats.py:118(init)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.12/site-packages/gelfformatter/formatter.py:103(format)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.12/pstats.py:108(__init__)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.12/logging/__init__.py:298(__init__)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.12/multiprocessing/process.py:80(__init__)
        4    0.000    0.000    0.000    0.000 /usr/local/lib/python3.12/json/encoder.py:205(iterencode)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.12/multiprocessing/util.py:189(__init__)
        1    0.000    0.000    0.005    0.005 /usr/local/lib/python3.12/multiprocessing/context.py:279(_Popen)

I also tried to trace the clone syscall using SystemTap, because I found with strace that _Popen calls it under the hood:

#!/usr/bin/env stap

global clones, start

probe begin { start = gettimeofday_ms() }

function timestamp:long() { return gettimeofday_ms() - start }
function proc:string() { return sprintf("%d,%s", pid(), execname()) }

probe syscall.clone
{
    printf("Clone enter,timestamp,%d,proc,%s\n", timestamp(), proc())
}
probe syscall.clone.return
{
    printf("Clone return,timestamp,%d,proc,%s\n", timestamp(), proc())
}

but it does not show any suspiciously long calls to clone().

Thanks for any ideas,
Martin

Something to look into: Popen defaults to close_fds=True, and closing the FDs is done with:

  • the fast close_range(2), if it’s available both at build and run time
  • listing /proc/self/fd, if available, and closing each open file descriptor individually
  • otherwise, going through all possible (!) file descriptors and closing each one individually. You want to avoid this, especially if your RLIMIT_NOFILE is high.

  • Try checking with strace.
  • Check python -m sysconfig for HAVE_CLOSE_RANGE (see the sketch below).
  • Make sure /proc is mounted & accessible.
  • Use close_fds=False if it’s safe for your use case.
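
All of those can also be checked from inside the container with something like this (a rough sketch assuming a Linux host):

import os
import resource
import sysconfig

# Was the interpreter built with close_range(2) support?
print("HAVE_CLOSE_RANGE:", sysconfig.get_config_var("HAVE_CLOSE_RANGE"))

# A very high limit makes the brute-force "close everything" fallback slow.
print("RLIMIT_NOFILE (soft, hard):", resource.getrlimit(resource.RLIMIT_NOFILE))

# If /proc is mounted and readable, only the actually-open fds need closing.
print("open fds:", os.listdir("/proc/self/fd"))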

AFAIK, the profiler only counts Python functions; called C functions contribute to tottime but not cumtime.

Thank you very much! I’ll try it and let you know the result :slight_smile:

Did any automatic or manual process update any libraries on the machine? Windows will force some updates on people, and if the update just sits there without being installed, we sometimes see weird problems appear.

We had a problem long ago when an OS upgrade changed a math library, which changed what Perl’s sprintf() did and caused major problems with inaccurately calculated prices.

Did anyone install or reconfigure any anti-malware package on that machine? Maybe all your programs are being scanned now where in the past they weren’t.

A few years ago I’d probably have said that’s not possible because it’s a Linux server, but now I’m not so sure. That already happened to me once, but fortunately (?) the system was broken completely, so I didn’t have to deal with it any more :slight_smile:

I’ll take a look.

HAVE_CLOSE_RANGE is not available even though I use the official Docker image for Python. I wonder why that is.

I also found a few interesting parts of the official Python documentation:

Changed in version 3.12: If Python is able to detect that your process has multiple threads, the os.fork() function that this start method calls internally will raise a DeprecationWarning. Use a different start method. See the os.fork() documentation for further explanation.

I wouldn’t use Python 3.12; not all third-party modules work with it yet. One is speedtest. There are actually a bunch of Speedtest packages on PyPI, and I can’t recall which one it is now.

Other major vendors, like MS Azure, may not support 3.12 yet either. I’m not sure about Jupyter notebooks.

I wasn’t able to find a close_fds option for multiprocessing.Process, and I’d prefer not to rebuild my containers to get close_range. But after reading the docs I tried multiprocessing.set_start_method(method="forkserver"), which failed because I also use threads.
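
What I tried was essentially this (a simplified sketch, not my real code):

import multiprocessing


def main():
    # set_start_method() may only be called once, before any worker is started.
    multiprocessing.set_start_method("forkserver")
    worker = multiprocessing.Process(target=print, args=("hello",))
    worker.start()
    worker.join()


if __name__ == "__main__":
    main()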

After reading even more docs I concluded that CPython discourages using both processes and threads in the same program, so I’ll try to get rid of the processes in the long run. For now I tried replacing the subprocess with a thread: I had used a subprocess because the function inside used to be computationally heavy, but it seems to be fine now. I’ll run more tests to verify it performs well enough, but this could hopefully solve the issue.
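
The swap itself is mostly mechanical, since threading.Thread takes the same target/args arguments as multiprocessing.Process (a simplified sketch with a made-up worker function and path, not my actual recorder code):

import threading


def record(path):
    # placeholder for the formerly CPU-heavy function
    print("recording to", path)


# before: worker = multiprocessing.Process(target=record, args=("/tmp/out",), daemon=True)
worker = threading.Thread(target=record, args=("/tmp/out",), daemon=True)
worker.start()
worker.join()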

Yes, MS Azure is a known source of trouble, but I have a fairly extensive test suite and everything seems to work fine even with Python 3.12. I do have to use an old Debian, though, because MS hasn’t upgraded their SDK to support OpenSSL > 1.1 :roll_eyes: