Perf_counter_ns() affected by external calls?

Hi,
While trying to write a simple test to measure the execution time of especially fast functions,

I’ve realized that if I execute another call, such as time.sleep(0.01) or print(1), before the time measurement, the test takes more time than if nothing is called beforehand.

Here is a simple code to reproduce what I am referring to:

import sys
import time


def product_times_10(num) -> int:
    return 10 * num


def measure_time(func, args) -> int:
    t0 = time.perf_counter_ns()
    func(*args)
    return time.perf_counter_ns() - t0


def main_time():
    test_func = product_times_10
    test_args = (10,)
    loops = 25
    initial_call = measure_time(test_func, test_args)
    call_without_break_ns = []
    call_with_break_time_sleep_ns = []
    for _ in range(loops):
        call_without_break_ns.append(measure_time(test_func, test_args))
    for _ in range(loops):
        time.sleep(0.01)
        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))

    print(time.get_clock_info("perf_counter"))
    print(sys.version_info)
    print(f"{initial_call=},")
    print(f"{call_without_break_ns=},")
    print(f"{call_with_break_time_sleep_ns=}.")


if __name__ == "__main__":
    main_time()

The code above produces the following output.

namespace(implementation='QueryPerformanceCounter()', monotonic=True, adjustable=False, resolution=1e-07)
sys.version_info(major=3, minor=12, micro=4, releaselevel='final', serial=0)
initial_call=2800,
call_without_break_ns=[800, 300, 200, 300, 200, 200, 300, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 100, 200, 200, 200, 200, 200, 200, 200],
call_with_break_time_sleep_ns=[900, 800, 2900, 2900, 2900, 4700, 4800, 4700, 4900, 9900, 5800, 5500, 8000, 3700, 4100, 2900, 3200, 3000, 3000, 4800, 9700, 5400, 5400, 5600, 5300].

I am running on Windows 11 with Python 3.12, but I see the same behavior on versions 3.11, 3.10, 3.9, and 3.8.
I also tested on an old Raspberry Pi with Python 3.7 and observed the same behavior.

Question

I think I may be making an obvious error here, but I am unable to see it.
Could someone point out my mistake and explain why this discrepancy exists?

Additional Info

What else I have tried:

  • I have replaced the time.sleep() with a custom_sleep() function:
    def custom_sleep(sleep_time: float) -> None:
        t0: int = time.perf_counter_ns()
        timeout: float = sleep_time * 1_000_000_000
        while time.perf_counter_ns() - t0 < timeout:
            pass

    This helps; it makes the difference between the “with_break” and “without_break” loops smaller, but there is still a noticeable difference.
  • I have also replaced the time.sleep with other “simple” functions like print or a lambda x: x**x; the lambda function does not affect the measurements, but the print function does.
  • I have tried disabling the garbage collector during the test, but this made no difference in the results.
  • I have also tried other counters (monotonic, process_time), but their resolution wasn’t enough to measure these calls (see the snippet after this list).
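
For reference, the resolutions of the different clocks can be compared with time.get_clock_info() (a small sketch; the exact values depend on the platform):

import time

for name in ("perf_counter", "monotonic", "process_time"):
    info = time.get_clock_info(name)
    # resolution is in seconds; perf_counter is usually the finest-grained
    print(f"{name}: implementation={info.implementation}, resolution={info.resolution}")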

Here is a slightly more complex script where I test all the mentioned scenarios:

import statistics
import sys
import time


def product_times_10(num) -> int:
    return 10 * num


def custom_sleep(sleep_time: float) -> None:
    t0: int = time.perf_counter_ns()
    timeout: float = sleep_time * 1_000_000_000
    while time.perf_counter_ns() - t0 < timeout:
        pass


GET_TIME_CLOCK = "perf_counter"
get_time_ns = time.perf_counter_ns
# sleep = custom_sleep
sleep = time.sleep


def measure_time(func, args) -> int:
    t0 = get_time_ns()
    func(*args)
    return get_time_ns() - t0


def measure_time_with_break(func, args):
    sleep(0.01)
    return measure_time(func=func, args=args)


def loop_func(func, args, loops):
    return [measure_time(func=func, args=args) for _ in range(loops)]


def loop_func_with_break(func, args, loops):
    return [measure_time_with_break(func=func, args=args) for _ in range(loops)]


def print_py_info():
    print(f"clock info for {GET_TIME_CLOCK}")
    print(time.get_clock_info(GET_TIME_CLOCK))
    print(sys.version_info)


def main_time():
    global sleep
    test_func = product_times_10
    test_args = (10,)
    loops = 25
    initial_call = measure_time(test_func, test_args)
    call_without_break = loop_func(test_func, test_args, loops=loops)
    sleep = lambda x: x**x
    call_with_break_lambda = loop_func_with_break(test_func, test_args, loops=loops)
    sleep = custom_sleep
    call_with_break_custom_sleep = loop_func_with_break(test_func, test_args, loops=loops)
    sleep = print
    call_with_break_print = loop_func_with_break(test_func, test_args, loops=loops)
    sleep = time.sleep
    call_with_break_time_sleep = loop_func_with_break(test_func, test_args, loops=loops)
    print_py_info()
    print(f"{initial_call=},")
    print(f"{call_without_break=},")
    mean_call_without_break = statistics.mean(call_without_break)
    print(f"{call_with_break_lambda=}.")
    print(f"{mean_call_without_break=}.")
    mean_call_with_break_lambda = statistics.mean(call_with_break_lambda)
    print(f"{mean_call_with_break_lambda=}")
    print(f"{call_with_break_custom_sleep=}.")
    mean_call_with_break_custom_sleep = statistics.mean(call_with_break_custom_sleep)
    print(f"{mean_call_with_break_custom_sleep=}")
    print(f"{call_with_break_print=}.")
    mean_call_with_break_print = statistics.mean(call_with_break_print)
    print(f"{mean_call_with_break_print=}")
    print(f"{call_with_break_time_sleep=}.")
    mean_call_with_break_time_sleep = statistics.mean(call_with_break_time_sleep)
    print(f"{mean_call_with_break_time_sleep=}")


if __name__ == "__main__":
    main_time()

Output of the above code:

0.01
...
0.01
clock info for perf_counter
namespace(implementation='QueryPerformanceCounter()', monotonic=True, adjustable=False, resolution=1e-07)
sys.version_info(major=3, minor=12, micro=4, releaselevel='final', serial=0)
initial_call=3200,
call_without_break=[400, 200, 200, 200, 200, 200, 200, 200, 100, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200],
call_with_break_lambda=[200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 100, 200, 100, 200, 200, 200].
mean_call_without_break=204.
mean_call_with_break_lambda=192
call_with_break_custom_sleep=[600, 1200, 500, 1300, 700, 1100, 1100, 500, 500, 300, 500, 500, 300, 400, 500, 200, 500, 900, 300, 500, 300, 500, 500, 200, 400].
mean_call_with_break_custom_sleep=572
call_with_break_print=[1100, 900, 1000, 800, 1400, 700, 800, 900, 700, 900, 1100, 900, 900, 800, 700, 700, 900, 800, 900, 800, 1100, 1000, 700, 1000, 1100].
mean_call_with_break_print=904
call_with_break_time_sleep=[2400, 3000, 2300, 2200, 2500, 2000, 2600, 2300, 2600, 3900, 3600, 4100, 8500, 5300, 5600, 5300, 5700, 5800, 5400, 5600, 5700, 5600, 6000, 5500, 5600].
mean_call_with_break_time_sleep=4364

You are expecting the performance to be consistent and reproducible, but it is not.
Things that affect the reproducibility include, but are not limited to:

  1. CPU caching of data and instructions
  2. Operating system kernel scheduling decisions
  3. Interrupts from hardware, for example from the network
  4. Demands on the CPU from other services and apps that are running

The normal way to benchmark is to run the test code a large number of times to average out the variability. This is what the timeit module does.
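
For example, a minimal timeit sketch for the product_times_10 function from the first post (the loop counts here are arbitrary choices):

import timeit


def product_times_10(num) -> int:
    return 10 * num


# repeat() runs the timing loop several times; the minimum of the runs is
# usually the most stable estimate, since it contains the least noise.
runs = timeit.repeat(lambda: product_times_10(10), number=1_000_000, repeat=5)
print(f"best per-call time: {min(runs) / 1_000_000 * 1e9:.0f} ns")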

As an example of the variability, try running this in the REPL:

time.perf_counter_ns()-time.perf_counter_ns()

Each time I run it I get a different answer.

Here is a set of answers from my Fedora Linux X86_64 system:

:>>> time.perf_counter_ns()-time.perf_counter_ns()
-1138
:>>> time.perf_counter_ns()-time.perf_counter_ns()
-576
:>>> time.perf_counter_ns()-time.perf_counter_ns()
-840
:>>> time.perf_counter_ns()-time.perf_counter_ns()
-1028
:>>> time.perf_counter_ns()-time.perf_counter_ns()
-965
:>>> time.perf_counter_ns()-time.perf_counter_ns()
-810

As you can see, there is a 2× difference between the values.

Thanks,
I understand that there are different things that could affect the reproducibility,
but why is this effect consistent when, for example, I call print(1) or time.sleep(0.001) between each call?
I also tried the REPL,
and if I call it one by one I get results similar to your example: some variance, in a range from min 400 to max 900.

>>> time.perf_counter_ns() - time.perf_counter_ns()
-800
>>> time.perf_counter_ns() - time.perf_counter_ns()
-500
>>> time.perf_counter_ns() - time.perf_counter_ns()
-900
>>> time.perf_counter_ns() - time.perf_counter_ns()
-500
>>> time.perf_counter_ns() - time.perf_counter_ns()
-400
>>> time.perf_counter_ns() - time.perf_counter_ns()
-400

but when I collect all the results in a list comprehension I also get some variance, but in another range (min 100, max 200):

>>> a = [time.perf_counter_ns() - time.perf_counter_ns() for _ in range(100)]
>>> a
[-200, -100, -100, -100, -100, -200, -100, -200, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -200, -100, -100, -100, -100, -100, -100, -100, -200, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -200, -100, -100, -100, -100, -100]
>>>

I think that when we run them one by one in the REPL, the additional work of printing to the console impacts the next run, which is what I am trying to understand.

>>> for _ in range (10):
...     print(time.perf_counter_ns() - time.perf_counter_ns())
...
-800
-300
-400
-400
-200
-400
-300
-300
-500
-400

In the end, what I am trying to understand is the following:

>>> import time
>>> import statistics
>>> a = []
>>> for _ in range (250):
...      a.append(time.perf_counter_ns() - time.perf_counter_ns())
...
>>> a
[-200, -200, -200, -100, -200, -100, -100, -200, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -200, -100, -100, -200, -200, -200, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -200, -100, -100, -100, -100, -200, -100, -200, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100]
>>> b = []
>>> for _ in range (250):
...      time.sleep(0.01)
...      b.append(time.perf_counter_ns() - time.perf_counter_ns())
...
>>> b
[-1400, -500, -700, -600, -700, -800, -1900, -2100, -1500, -1400, -1600, -1100, -2200, -1900, -2600, -2000, -2300, -2100, -2000, -2700, -1700, -2200, -1600, -1700, -2600, -1800, -2000, -1700, -2100, -1900, -1900, -1700, -1700, -2500, -2000, -1700, -1900, -2000, -1500, -1400, -1700, -1600, -2400, -1800, -2000, -1700, -1600, -2200, -2200, -2200, -2800, -2400, -800, -1100, -3100, -2100, -2300, -1800, -4600, -1600, -1400, -2500, -2100, -2200, -800, -800, -900, -800, -3700, -1800, -1800, -2400, -2300, -2200, -1500, -2400, -1900, -2800, -2000, -1800, -2100, -2100, -2000, -2500, -2000, -1700, -1900, -1700, -5100, -2400, -1700, -1400, -1900, -1800, -1900, -4300, -1900, -2100, -10600, -2300, -2000, -1800, -1600, -1300, -2200, -2100, -2500, -1800, -2000, -1200, -1700, -1600, -1600, -1400, -1700, -1600, -1800, -1800, -2000, -2400, -1800, -2600, -9200, -1500, -2600, -2500, -1700, -1700, -2700, -1700, -1600, -38500, -3800, -1600, -3200, -1400, -2500, -3200, -2800, -2200, -1600, -1700, -2200, -1700, -2900, -1800, -2300, -1700, -2900, -1500, -2100, -1700, -1200, -1900, -1800, -1300, -1600, -1600, -2700, -2200, -3000, -2000, -1800, -2400, -1800, -2100, -4700, -2800, -1800, -1600, -1700, -2200, -1900, -1700, -1900, -2900, -2100, -1900, -3200, -1500, -39800, -2500, -1600, -1600, -2000, -2200, -2000, -1600, -1800, -1500, -1900, -1600, -3700, -2000, -2100, -1900, -1600, -2100, -2400, -1900, -3200, -1800, -2500, -1900, -1700, -1800, -2100, -1800, -1700, -2100, -30500, -1900, -1700, -1700, -2500, -1700, -1600, -1800, -2200, -2200, -2100, -2000, -2000, -1700, -2100, -1800, -2000, -1700, -1900, -1400, -1800, -2300, -1600, -2100, -1400, -1400, -1700, -1400, -2900, -1900, -1800, -1900, -1600, -1800, -1300, -1800, -2100, -1700, -1700, -2200]
>>> statistics.mean(a)
-110.8
>>> statistics.mean(b)
-2451.2

Why is statistics.mean(a) so different from statistics.mean(b)? It looks oddly consistent that two consecutive calls to time.perf_counter_ns() take more time when those calls are preceded by a time.sleep() or print().

@barry-scott Did you look at their results? Getting ~200 ns many times versus getting ~3000–10000 ns many times is pretty consistent and reproducible. And I can reproduce it as well with their script as-is, seeing similar results. And your own experiments aren’t even trying what they’re doing, i.e., what this topic is about.

But they may show an easier way to reproduce it:

>>> time.perf_counter_ns()-time.perf_counter_ns()
-1000
>>> time.perf_counter_ns()-time.perf_counter_ns()
-1000
>>> time.perf_counter_ns()-time.perf_counter_ns()
-1300
>>> time.perf_counter_ns()-time.perf_counter_ns()
-1400
>>> time.perf_counter_ns()-time.perf_counter_ns()
-1200
>>> time.perf_counter_ns()-time.perf_counter_ns()
-1000
>>> time.sleep(0.01);time.perf_counter_ns()-time.perf_counter_ns()
-2000
>>> time.sleep(0.01);time.perf_counter_ns()-time.perf_counter_ns()
-1600
>>> time.sleep(0.01);time.perf_counter_ns()-time.perf_counter_ns()
-1800
>>> time.sleep(0.01);time.perf_counter_ns()-time.perf_counter_ns()
-4400
>>> time.sleep(0.01);time.perf_counter_ns()-time.perf_counter_ns()
-2600

Correct, I’m commenting on why benchmarking may not produce consistent results.
Isn’t that the point at issue?

Hi,
I have answered your posts previously, but the system bot has hidden them as spam; hopefully this one makes it through.

Thank you all for your comments.
I should have provided an easier way to reproduce this.
In the end, what I want to understand is the a versus b comparison from my previous post:


Why is statistics.mean(a) so different from statistics.mean(b)? It looks oddly consistent that the result of two consecutive time.perf_counter_ns() calls is larger if those calls are preceded by a time.sleep() or a print() than if nothing precedes them.

cc @vstinner

No, it’s not.

You are measuring the time to warm up after an interruption.

If you perform the warm-up after the interruption, you will get nearly the same time measurements:

    for _ in range(loops):
        time.sleep(0.01)

        # warm up
        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))
        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))
        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))
        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))
        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))
        del call_with_break_time_sleep_ns[-1]
        del call_with_break_time_sleep_ns[-1]
        del call_with_break_time_sleep_ns[-1]
        del call_with_break_time_sleep_ns[-1]
        del call_with_break_time_sleep_ns[-1]

        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))
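
The same discard-the-warm-up idea can be written more compactly (a sketch reusing the names from the first script; the sample count of 6 is an arbitrary choice):

    for _ in range(loops):
        time.sleep(0.01)
        # the first few samples absorb the post-sleep warm-up; keep only the last
        samples = [measure_time(test_func, test_args) for _ in range(6)]
        call_with_break_time_sleep_ns.append(samples[-1])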

In what way is it “cold”/slow so that it needs to “warm up”? What happens during warm-up that makes it faster again?

Thanks,
Yes, I can believe that, but
what is considered “an interruption” then?
I provided the worst case with a time.sleep(), but I also stated that I’ve tried other functions like print() or the custom_sleep(), which is just a while loop with a timeout; not all of the functions affected the measurement (caused a cool-down), like the arithmetic lambda that I used in one of the scenarios.
So should we just assume that any function call will cause a cool-down?
Should we always run time.perf_counter_ns() several times before making the next measurement?

Using psutil, it seems that the version without time.sleep() doesn’t incur interrupts and context switches, but the one with time.sleep() does:

Edit: I think this is only because more time is elapsing, sorry about the noise.

import psutil

[...]
    call_without_break_ns = []
    without_interrupts = []
    call_with_break_time_sleep_ns = []
    with_interrupts = []
    for _ in range(loops):
        without_interrupts.append(psutil.cpu_stats().interrupts)
        call_without_break_ns.append(measure_time(test_func, test_args))
    for _ in range(loops):
        time.sleep(1.1)
        with_interrupts.append(psutil.cpu_stats().interrupts)
        call_with_break_time_sleep_ns.append(measure_time(test_func, test_args))

    print(time.get_clock_info("perf_counter"))
    print(sys.version_info)
    print(f"{initial_call=},")
    print(f"{call_without_break_ns=},")
    print(f"{without_interrupts=}")
    print(f"{call_with_break_time_sleep_ns=}.")
    print(f"{with_interrupts=}")

Gives:

namespace(implementation='QueryPerformanceCounter()', monotonic=True, adjustable=False, resolution=1e-07)
sys.version_info(major=3, minor=12, micro=4, releaselevel='final', serial=0)
initial_call=16000,
call_without_break_ns=[1500, 800, 600, 600, 500, 500, 600, 600, 500, 500, 600, 500, 600, 500, 400, 400, 500, 500, 500, 500, 500, 600, 500, 600, 600],
without_interrupts=[1749536682, 1749536682, 1749536682, 1749536682, 1749536682, 1749536682, 1749536682, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683, 1749536683]
call_with_break_time_sleep_ns=[2900, 3200, 3500, 3500, 2900, 2500, 3100, 3500, 3500, 3100, 3100, 2900, 2600, 3800, 4300, 4700, 3400, 3800, 3700, 2700, 3400, 3700, 3000, 3000, 3400].
with_interrupts=[1749539438, 1749541742, 1749543744, 1749545818, 1749549177, 1749551230, 1749553318, 1749555144, 1749557689, 1749559790, 1749561837, 1749565217, 1749567213, 1749569033, 1749570825, 1749573802, 1749575928, 1749577872, 1749579917, 1749582877, 1749584806, 1749586728, 1749590002, 1749592538, 1749594963]

So, while I’m not sure this explains a lot (that I can understand), the stats are different for each case.

Any of these are possible:

  • The CPU caches fill with benchmark code and data.
  • OS caches fill with benchmark data.
  • The scheduler raises the priority of the thread running the benchmark.


So, does this mean that time.sleep always causes any or all of those you mentioned?

In the benchmark I ran, it was almost always consistent that time.sleep impacted the measurements significantly more than other calls did.

It is likely that one or more of them happen.

If you look in the Windows Task Manager just after you log in, you can see the number of processes and threads the OS is running.
I see 190 processes and 2k threads.
If you observe them, you will see that even without you doing anything the number of processes changes a little.

All that background activity is putting noise into your benchmarking.
Averaging over a number of long runs will reduce the impact of that noise on your measurements.

When you call time.sleep(seconds), the calling thread is suspended for the specified number of seconds, allowing other threads to run. This suspension gives the operating system an opportunity to switch context to another thread or process, enabling concurrency and improving the efficiency of multitasking.

Context switching is costly and is the main issue when using threads.

Here, you can use time.sleep to switch between threads:

import threading
import time

def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)  # Simulates a blocking operation

def print_letters():
    for letter in 'abcde':
        print(letter)
        time.sleep(1)  # Simulates a blocking operation

# Create two threads
t1 = threading.Thread(target=print_numbers)
t2 = threading.Thread(target=print_letters)

# Start the threads
t1.start()
t2.start()

# Wait for both threads to complete
t1.join()
t2.join()

Is this then the same case for other kinds of function calls, like a “simple” print call?

import statistics
import sys
import time



def main_time():
    loops = 100
    initial_call = time.perf_counter_ns()-time.perf_counter_ns()
    call_without_break_ns = []
    call_with_break_time_sleep_ns = []
    for _ in range(loops):
        print(1)
        call_with_break_time_sleep_ns.append(time.perf_counter_ns()-time.perf_counter_ns())
    for _ in range(loops):
        call_without_break_ns.append(time.perf_counter_ns()-time.perf_counter_ns())

    print(time.get_clock_info("perf_counter"))
    print(sys.version_info)
    print(f"{initial_call=},")
    print(f"{call_with_break_time_sleep_ns=}.")
    print(f"Mean: {statistics.mean(call_with_break_time_sleep_ns)}, \n")
    print(f"{call_without_break_ns=},")
    print(f"Mean: {statistics.mean(call_without_break_ns)}, \n")


if __name__ == "__main__":
    main_time()

In this example I call print(1) between each measurement, and one set of results takes “longer” than the other; maybe not by as much as in the case of time.sleep(1), but it is still a noticeable difference.

1
...
1
namespace(implementation='QueryPerformanceCounter()', monotonic=True, adjustable=False, resolution=1e-07)
sys.version_info(major=3, minor=10, micro=11, releaselevel='final', serial=0)
initial_call=-200,
call_with_break_time_sleep_ns=[-400, -400, -400, -300, -300, -400, -300, -400, -300, -300, -400, -300, -400, -300, -300, -300, -300, -300, -300, -300, -400, -200, -300, -300, -300, -300, -300, -300, -300, -400, -200, -400, -300, -400, -200, -300, -400, -400, -400, -300, -400, -400, -400, -400, -400, -400, -500, -300, -400, -300, -700, -400, -300, -500, -300, -200, -300, -300, -400, -300, -300, -300, -400, -400, -400, -400, -400, -400, -400, -300, -400, -300, -300, -300, -400, -1900, -400, -400, -1200, -800, -400, -400, -400, -400, -300, -400, -400, -300, -300, -200, -300, -300, -200, -400, -400, -400, -1500, -400, -800, -500].
Mean: -395,

call_without_break_ns=[-200, -100, -200, -100, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -200, -200, -100, -100, -100, -100, -200, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100],
Mean: -108,

So would it be safe, then, to assume that any unknown previous statement before time.perf_counter_ns() could cause any or many of those OS cache fills / priority changes / thread interruptions,
which consequently impacts the measurement, since we would also be measuring the “warm-up”?


A print call does I/O, and that will cause Python to release the GIL and allow other Python threads to run.
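
For instance, one quick way to see the I/O component (a sketch; printing into an in-memory StringIO avoids the console round-trip, and the exact numbers will vary by system):

import io
import statistics
import time

def mean_delta_ns(n, pre):
    # call pre() before each back-to-back pair of perf_counter_ns() calls
    out = []
    for _ in range(n):
        pre()
        out.append(time.perf_counter_ns() - time.perf_counter_ns())
    return statistics.mean(out)

buf = io.StringIO()
print("after console print:", mean_delta_ns(100, lambda: print(1)))
print("after buffered print:", mean_delta_ns(100, lambda: print(1, file=buf)))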

Once you are happy that your measurement code is working as you want, remove the debug print() and time.sleep() calls.


Thank you all for your answers; they have been really informative.
Here are the conclusions I draw from this discussion:

  • The time.perf_counter_ns()-time.perf_counter_ns() statement will actually give a larger result if it is preceded by a print() or a time.sleep().
  • The reason this happens is that time.sleep(), print(), and potentially other kinds of calls may cause one or many of the following OS events:
    • The CPU caches fill with benchmark code and data.
    • OS caches fill with benchmark data.
    • The scheduler raises the priority of the thread running the benchmark.
    • An OS context switch caused by a thread interruption.
  • These OS events mean that the time.perf_counter_ns()-time.perf_counter_ns() statement needs to be warmed up, which is why it gives a larger result.
  • Knowing this, it is safe to assume that any unknown previous statement before time.perf_counter_ns() could cause any or many of those OS events,
    which consequently impacts the measurement, since we would also be measuring the “warm-up”.

So if a somewhat reliable benchmark of performance-critical calls is to be executed, one must measure the targeted call/code a large number of times and ensure that there is no intermediate call or statement between measurements that causes these OS events. This can be achieved, for example, by using the built-in timeit module.

If one must necessarily make an intermediate call or statement between measurements, then take into account the potential noise caused by the mentioned OS events.
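
For example, one possible mitigation (a sketch combining the warm-up advice above; the sample count of 10 is an arbitrary choice) is to take several back-to-back samples after each intermediate call and keep only the fastest one:

import time

def product_times_10(num) -> int:
    return 10 * num

def measure_min_ns(func, args, samples=10) -> int:
    # take several back-to-back samples and keep the fastest one; the slower
    # samples are the ones most polluted by warm-up and scheduling noise
    best = None
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        func(*args)
        dt = time.perf_counter_ns() - t0
        if best is None or dt < best:
            best = dt
    return best

results = []
for _ in range(25):
    time.sleep(0.01)  # the intermediate call we cannot avoid
    results.append(measure_min_ns(product_times_10, (10,)))
print(results)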