Hello,
I have a Python program that has been running without issues for over a year, but last week a delay at startup appeared without any change to the code or the underlying Python interpreter. Can anyone help me investigate it?
When the delay occurs, cProfile shows this:
Ordered by: internal time
List reduced from 143 to 50 due to restriction <50>
ncalls tottime percall cumtime percall filename:lineno(function)
2/1 3.005 1.503 0.001 0.001 /usr/local/lib/python3.12/multiprocessing/context.py:222(_Popen)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.001 0.001 /app/adapter.py:72(_process_message)
2/1 0.000 0.000 0.001 0.001 /usr/local/lib/python3.12/multiprocessing/process.py:110(start)
The source file looks like this:
#
# Type of default context -- underlying context can be set at most once
#
class Process(process.BaseProcess):
    _start_method = None

    @staticmethod  # <--- line 222
    def _Popen(process_obj):
        return _default_context.get_context().Process._Popen(process_obj)
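Since that static method just delegates to whatever the default context resolves to, one thing worth confirming is which start method is actually in effect (a diagnostic sketch, not part of my original code):

```python
import multiprocessing as mp

# _Popen above forwards to _default_context.get_context(), so the
# start method decides which Popen implementation actually runs
# ("fork" -> popen_fork, "spawn" -> popen_spawn_posix, etc.).
print(mp.get_start_method())     # e.g. "fork" on Linux
print(mp.get_context().Process)  # the Process class for that context
```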
The same profile without the delay looks like this (the call to _Popen takes no time):
Ordered by: internal time
List reduced from 92 to 50 due to restriction <50>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.005 0.005 0.005 0.005 {built-in method posix.fork}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.005 0.005 /usr/local/lib/python3.12/multiprocessing/popen_fork.py:62(_launch)
1 0.000 0.000 0.005 0.005 /usr/local/lib/python3.12/multiprocessing/process.py:110(start)
1 0.000 0.000 0.005 0.005 /app/recorder.py:65(run)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.12/pstats.py:118(init)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.12/site-packages/gelfformatter/formatter.py:103(format)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.12/pstats.py:108(__init__)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.12/logging/__init__.py:298(__init__)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.12/multiprocessing/process.py:80(__init__)
4 0.000 0.000 0.000 0.000 /usr/local/lib/python3.12/json/encoder.py:205(iterencode)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.12/multiprocessing/util.py:189(__init__)
1 0.000 0.000 0.005 0.005 /usr/local/lib/python3.12/multiprocessing/context.py:279(_Popen)
I also tried tracing the clone syscall with SystemTap, because strace showed that _Popen calls clone() under the hood:
#!/usr/bin/env stap
global clones, start
probe begin { start = gettimeofday_ms() }
function timestamp:long() { return gettimeofday_ms() - start }
function proc:string() { return sprintf("%d,%s", pid(), execname()) }
probe syscall.clone
{
printf("Clone enter,timestamp,%d,proc,%s\n", timestamp(), proc())
}
probe syscall.clone.return
{
printf("Clone return,timestamp,%d,proc,%s\n", timestamp(), proc())
}
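To separate the cost of the clone() syscall itself from multiprocessing's Python-level bookkeeping, one could also time a bare fork directly (a diagnostic sketch, again not from my original code):

```python
import os
import time

# Time one bare fork/exit/wait cycle: if this is fast while
# Process.start() is slow, the delay is above the syscall layer.
t0 = time.perf_counter()
pid = os.fork()
if pid == 0:
    os._exit(0)          # child: exit immediately, do nothing else
os.waitpid(pid, 0)
elapsed = time.perf_counter() - t0
print(f"fork+wait took {elapsed:.6f} s")
```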
but it does not show any suspiciously long calls to clone().

My questions:
- Can I debug _Popen even further to discover what takes 3 seconds?
- How is it possible that tottime is bigger than cumtime? I found these related Stack Overflow questions: "How come the tottime is *greater* than cumtime, when using cProfile?" and "What is the difference between tottime and cumtime on cProfile output?"
- Can anyone think of a reason for _Popen to take so long? I'm running out of ideas. If it were a problem with the clone() syscall, that would point to the hardware, the number of file descriptors, etc., but I tested the underlying hardware and the system has plenty of CPU time and free file descriptors.
- For some reason, the call to _Popen occurs twice in the delayed case, even though the code calls .start() on the multiprocessing.Process object only once. I don't know why that happens.
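On the last point: as far as I understand the cProfile docs, an ncalls value like 2/1 means total/primitive calls, i.e. the function re-entered itself rather than being called twice from my code. A toy illustration (f is just a made-up recursive function):

```python
import cProfile
import io
import pstats

def f(n):
    # made-up recursive function: one outer call plus one self-call
    if n:
        f(n - 1)

prof = cProfile.Profile()
prof.enable()
f(1)
prof.disable()

out = io.StringIO()
pstats.Stats(prof, stream=out).print_stats()
print(out.getvalue())  # the row for f shows ncalls as "2/1"
```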
Thanks for any ideas,
Martin