Unexpectedly long execution time

Hi Daniel,
Thanks for introducing me to the concept of a profiler. A few questions, if you do not mind:
1- After installing line_profiler and issuing the command `kernprof -lv my_code.py`, what happens? Do I still need to run Python as usual: `python3 my_code.py`?

2- It also seems that I need to modify my_code.py and insert a line `@profile` above a function definition.
But how do you limit this profiling to a specific function? That is, how do we indicate that one very specific function requires profiling and not all the other functions?
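For what it's worth, here is roughly how a file set up for kernprof might look, as a minimal sketch assuming line_profiler is installed. The try/except fallback is my own addition so the same file also runs under a plain `python3` invocation; only functions decorated with `@profile` get line-by-line timings, which is how you target one specific function:

```python
# my_code.py -- run with:  kernprof -lv my_code.py
# kernprof injects a `profile` builtin at runtime; the fallback below
# (my own addition) keeps the file runnable with plain `python3 my_code.py`.
try:
    profile
except NameError:
    def profile(func):
        return func  # no-op decorator when kernprof is not driving the run

@profile
def hot_loop(n):
    # Only this decorated function is line-profiled by kernprof.
    m = 0
    for _ in range(n):
        m += 1
    return m

def helper():
    # Undecorated: kernprof leaves this function out of the report.
    return "not profiled"

print(hot_loop(10_000))
```

Under kernprof, the `-l` flag enables line-by-line mode and `-v` prints the report when the script finishes, so no separate `python3` run is needed.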

Thanks a lot.

In case you're interested, I modified and simplified the code as follows, so that no function call is used other than a simple, stupid while loop:

from datetime import datetime
import gc

gc.disable() 
m=0
s = 2000
t = 500
j = 7
i = 750
all_multiplied = s * t * j * i
k = 0
start_time0 = datetime.now()
#for k in range (all_multiplied):
while (k < all_multiplied):
    m = m + 1
    k = k + 1

end_time0 = datetime.now()
elapsed_time0 = (end_time0 - start_time0).total_seconds() * 1000
print ("All done! Number of iterations = ", k+1, ", of the total= ",  all_multiplied)
print ("start time= ", start_time0)
print ("end   time= ", end_time0)
print (f"      Elapsed time: {elapsed_time0} miliseconds")

I then ran the code twice. One run took about an hour and a half, as can be seen here:

All done! Number of iterations = 5250000001 , of the total= 5250000000
start time= 2023-05-12 19:18:55.621718
end time= 2023-05-12 20:52:51.384111

      Elapsed time: 5635762.393 miliseconds

And the second one took 5 hours!

All done! Number of iterations = 5250000001 , of the total= 5250000000
start time= 2023-05-12 23:23:46.436322
end time= 2023-05-13 04:26:53.443783

      Elapsed time: 18187007.461000003 miliseconds

So something really strange is going on here.

Some further info on my system:
I am on a MacBook Pro (macOS Mojave) with a 2.5 GHz Intel Core i5 and 8 GB of 1333 MHz DDR3 RAM.

And you’ve confirmed that this is true with garbage collection active, and with monotonic and perf_counter clocks too?

This detail is irrelevant and distracting; n = 5250000000 would be better, and a smaller, rounder n would be better still. Posted code should be minimized and irrelevant detail eliminated.

You have been asked more than once to use a timer function designed to measure elapsed CPU time instead of one not so designed. Using datetime.now across a 'fall back' wall-clock change, or across a system clock reset from the internet, could even result in a negative elapsed time.

Have you tried disabling App Nap? I believe the command to disable it system-wide is something along the lines of `defaults write NSGlobalDomain NSAppSleepDisabled -bool YES`. (Not sure whether a reboot is required to make that take effect.)

Terry,
Nobody complained about these 4 small additional lines, and it slipped my mind (I had them in the code originally).

On using a timer function: if you have read the thread, I mentioned that I have tried a couple of them (time.monotonic and time.perf_counter) to no avail.

Furthermore, if you had paid attention to the thread and checked the latest code, you would have noticed that there is no reference to datetime.now within the loop, which makes this recommendation irrelevant.

Hi Mark,
No, I have not tried that, but what is App Nap?! I am not familiar with it; how is it related to Python?

Thanks

Hi Chris,
In the latest code that I have shown, I am not even using time.monotonic or time.perf_counter within the loop. Also, as you may have noticed, gc is disabled. I can try using time.monotonic once again outside the loop and remove the gc.disable() call to see if that has any impact, though I doubt it.

Yes, but you’re using datetime.now() as your definition of how long it takes. As several people have mentioned, this is not an ideal way to measure execution time; it would be more reliable to use monotonic or process_time for this. Ideally, measure multiple clocks and show all of them at the end.

I strongly recommend reading up on the different clocks available and how best to use them.
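The "measure multiple clocks and show all of them" suggestion can be sketched like this (the `sum` call here is a stand-in workload, not the original loop):

```python
import time
from datetime import datetime

# Take a reading from three different clocks before and after the work,
# so any disagreement between them shows up at the end.
start_wall = datetime.now()       # wall clock: can jump on DST / NTP adjustments
start_mono = time.monotonic()     # monotonic: never goes backwards
start_cpu  = time.process_time()  # CPU time actually consumed by this process

total = sum(range(1_000_000))     # stand-in for the real workload

wall_ms = (datetime.now() - start_wall).total_seconds() * 1000
mono_ms = (time.monotonic() - start_mono) * 1000
cpu_ms  = (time.process_time() - start_cpu) * 1000
print(f"wall: {wall_ms:.1f} ms  monotonic: {mono_ms:.1f} ms  cpu: {cpu_ms:.1f} ms")
```

If the monotonic and CPU readings are close but the wall-clock reading is wildly larger, the process was suspended or descheduled; if monotonic is large but CPU time is small, the process was alive but mostly waiting.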

Definitely something to look into. Do you know how this interacts with process_time()?

@curiousg

I tried your code on my machine (macOS Ventura; Python 3.11.3). The first one finished in about 75 minutes, and the one without the timing code in the loop took 7 minutes to run.

If you were to do (for example) 1/20th of the number of iterations, does it take around 1/20th of the time? Or does it take much less/more time than that?
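That scaling check could be run with something like the sketch below; the `n` values here are placeholders chosen to finish quickly, not the original 5,250,000,000:

```python
import time

def count_up(n):
    # The bare counting loop from the posted code, wrapped in a function.
    k = 0
    while k < n:
        k += 1
    return k

def elapsed(n):
    t0 = time.monotonic()
    count_up(n)
    return time.monotonic() - t0

big = 2_000_000                    # scaled-down stand-in for the real n
ratio = elapsed(big) / elapsed(big // 20)
print(f"time ratio big/small: {ratio:.1f} (roughly 20 if scaling is linear)")
```

A ratio far from 20 on the problem machine would point at something external (throttling, swapping, suspension) rather than at the loop itself.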

Chris,
I am currently trying the latest code with gc disabled as well as changing the delay using time.monotonic().

On your second question, I have no idea what App Nap is, and obviously not the slightest idea about its interaction with process_time(). I need to educate myself on that.

Hi J. Rivers,
I guess you do not need to focus on the first version, as I am being criticized for measuring the delay within the loop. But for the latest version, where I have no delay measurement within the loop, I wonder if you could try the code a couple of times and see if you get the same result each time.

On reducing the number of iterations, I will try it later, after my current experiment using time.monotonic() in the code instead (although I vaguely remember trying to reduce the number of iterations before, I do not remember exactly the outcome!).

For kernprof usage, please see python - How do I use line_profiler (from Robert Kern)? - Stack Overflow

I’m now thinking you’re probably seeing garbage collection.

I realize you’re turning off the generational GC, but that’s a different thing from Python’s reference counting, which I hear can’t be turned off.

What leads you to think that that’s the cause? I’m curious here. With the generational GC disabled, refcounting in CPython will only ever result in immediate disposal of objects (other language interpreters do other things, like batching up garbage disposal until a function returns, but CPython always deallocates objects immediately). So that shouldn’t give the results shown here.

As a side note, you wouldn’t ever WANT to disable the basic refcounting garbage collection. If you did, every temporary and intermediate value would stick around until the process terminates. You could probably hack CPython to disable Py_DECREF altogether, and you’d find that your RAM usage quickly spirals out of control. There are good reasons for disabling the generational GC, usually relating to consistency (especially timing consistency), but the basic refcounter doesn’t have inconsistency to be removed.
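To make the refcounting-versus-cycle distinction concrete, here is a small CPython-specific sketch (the `Tracked` class is invented for illustration): plain objects are freed the instant their refcount hits zero even with `gc.disable()` in effect, while a reference cycle waits for the generational collector.

```python
import gc

gc.disable()  # disables only the generational (cycle) collector

class Tracked:
    alive = 0
    def __init__(self):
        Tracked.alive += 1
    def __del__(self):
        Tracked.alive -= 1

obj = Tracked()
del obj                    # refcount hits zero: freed immediately, gc or no gc
freed_immediately = (Tracked.alive == 0)

# A reference cycle is the one thing refcounting alone cannot reclaim:
a, b = Tracked(), Tracked()
a.partner, b.partner = b, a
del a, b
cycle_leaked = (Tracked.alive == 2)   # both still alive: the cycle keeps them

gc.enable()
gc.collect()                          # cycle collector reclaims the pair
cycle_collected = (Tracked.alive == 0)
print(freed_immediately, cycle_leaked, cycle_collected)
```

This is CPython behavior specifically; other implementations (PyPy, for instance) schedule deallocation differently.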

https://news.ycombinator.com/item?id=32276580

I could be wrong though.

Quoting from one of those comments:

Essentially, whereas tracing GC has pauses while tracing live data, reference counting has pauses while tracing garbage.

That’s pretty much how it goes. The two most popular forms of garbage collection (or perhaps I should say, the two popular broad families, since each one comes in many variations) are “reference counting” and “mark-and-sweep” (also called “tracing” as used in that quote). One important distinction, though, is what happens when you temporarily disable the tracing part of the algorithm. With refcounting, it falls back on a naive “if nothing is using it, it can be thrown away” algorithm, which works on everything that isn’t part of a reference cycle; with mark-and-sweep, it now fails to deallocate anything at all, which is significantly faster but has the slight downside of, well, never reusing any memory.

The comment is exactly right; both forms of garbage collection can induce pauses within the running time. With CPython, the pauses for garbage collection are managed by recognizing that most garbage is fresh, and thus it can do a quick GC pass by only looking at objects that were allocated since the most recent GC pass (that’s why it’s called “generational”: anything that survives the first GC pass after its creation is likely to survive more, so it’s less important to check it), but the pauses will still be there, particularly in refloop-heavy code.

Are you running this in an iTerm2 on your Macs?

I’ve seen iTerm2 (or something it depends on) cause sluggishness from time to time.

Not a chance here. gc is used only to reclaim trash in cycles, and there are none here. It’s not triggered at all unless the number of object allocations significantly exceeds the number of object deallocations so far, but that never happens in this code: refcounting cleans up the trash this program creates at the same rate it’s created.
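That trigger condition can be inspected directly: CPython exposes the generational collector's thresholds and current allocation counters. A quick sketch (the default values noted in the comments are for recent CPython and may differ by version):

```python
import gc

# CPython's generational collector is triggered by an allocation surplus,
# not by elapsed time: a pass runs when (allocations - deallocations) since
# the last pass exceeds the gen-0 threshold.
print(gc.get_threshold())   # (700, 10, 10) by default in CPython
print(gc.get_count())       # current counters per generation

# The posted loop frees each old int as fast as it creates a new one, so
# the gen-0 counter never climbs toward the threshold and the collector
# never fires, whether or not gc.disable() was called.
```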

I don’t see anything notable running this on my box, but it’s Windows. One nasty possibility nobody has mentioned yet: code that unexpectedly takes very long at times can be a symptom of impending hard drive failure (among other, even less likely, hardware problems).

That would have explained this issue as well, if I had not run the same code on two different machines (both MacBook Pros, one of them rather new). So the chances are slim here, but possible.

For all to know, I have removed the line “gc.disable()” from the code and am running the experiment with time.monotonic() to see if that makes a difference. I doubt it though!

At this point, I am about to give up. First, here is the entire code that I experimented with in the latest runs (note that I have removed lines from the code based on the recommendations received):

#from datetime import datetime
#import gc

#gc.disable()
import time
m=0
s = 2000
t = 500
j = 7
i = 750
all_multiplied = s * t * j * i
k = 0
#start_time0 = datetime.now()
start_time0 = time.monotonic()
#for k in range (all_multiplied):
while (k < all_multiplied):
    m = m + 1
    k = k + 1

#end_time0 = datetime.now()
end_time0 = time.monotonic()
#elapsed_time0 = (end_time0 - start_time0).total_seconds() * 1000
elapsed_time0 = (end_time0 - start_time0) * 1000
print ("All done! Number of iterations = ", k+1, ", of the total= ",  all_multiplied)
print ("start time= ", start_time0)
print ("end   time= ", end_time0)
print (f"      Elapsed time: {elapsed_time0} miliseconds")

I then ran the above code 4 times. The first three took about 17–20 minutes each (unacceptable, yet still reasonable). What REALLY threw me off was the fourth one! The reported elapsed time again indicated about 20 minutes, yet going by the real-time clock, the code ACTUALLY took more than 10 hours to finish!! :shushing_face:

So maybe the time.monotonic() call is not appropriate here.
Here are the recordings of these 4 runs:

First Run:
All done! Number of iterations = 5250000001 , of the total= 5250000000
start time= 155263.557062098
end time= 156270.338826857
Elapsed time: 1006781.7647590127 miliseconds

2nd Run:
All done! Number of iterations = 5250000001 , of the total= 5250000000
start time= 156586.463894025
end time= 157763.631888741
Elapsed time: 1177167.994716001 miliseconds

3rd Run:
All done! Number of iterations = 5250000001 , of the total= 5250000000
start time= 157916.881159114
end time= 158963.813152404
Elapsed time: 1046931.9932899962 miliseconds

4th Run:
All done! Number of iterations = 5250000001 , of the total= 5250000000
start time= 159231.365576018
end time= 160424.175859087
Elapsed time: 1192810.283069004 miliseconds