Using IDLE, script terminates and shows python prompt but it still consuming 100% cpu for hours

I am running a script that consumes 100% CPU when running (looping 2^k times, k=17) and uses LOTS of bigints. When the script ends, I get the last output line and the python prompt (>>>), BUT cpu is still at 100% and ignores Ctrl-C - for hours!. All I want is the output listing, but clicking on the IDLE output window can take minutes before showing the selected text. Is there a secondary process (such as garbage collection) that might have got itself lost? I’ve never used debuggers or loggers in python, but I strongly suspect that the cause is not my code (there’s no input), but the demanding load for several/many hours. Any thoughts welcome, since for every increase of k, the time to run doubles (and so does the output - for k=17, 515841 lines). Once it gets to a week for a k value, then I’ll stop and see if I have enough data for my study.

I am running IDLE 3.10.12 on Linux Mint 21.3 Cinnamon

Note: running python3 from CLI is superquick, so that’s what I’ll use. But still want to know why IDLE locks up.

Does the script create a background thread?
Did your script shutdown the background thread?

no, and no; just 2 nested loops, but outer is 2^17 loops.

Did you run top to check which process is using the CPU time?

Yes. IDLE - 98% to 100% for 2 hours after shows python prompt. I suspect it is the IDLE interface, since running in CLI is mere seconds!

Barry’s the expert. But are you sure IDLE’s not creating background threads? My VS Code set up with various extensions, creates 10 separate processes to open a single repo. If someone reported a bug that VS Code used 100% CPU, I wouldn’t be surprised in the slightest. It’d probably turn out to be populating an auto-complete cache, syntax highlighting (or doing something with Electron or Copilot).

My code is 3 defs: 1 to implement a nested if, next def is a while loop using the first def, and the third def is a range loop from 1 to 2^k (with k=17). Uses lots of bigints, some about 10^100 or more so there’s lots of overhead converting back to decimal, plus bigints are immutable so lots of garbage. What takes two hours using IDLE ran in less than a minute using ‘python3 'xxx.py’ in a terminal. Note that the output is over 500,000 lines and 32MB, so IDLE might just be overloaded.

It would help if you posted your code. But you’ve essentially fixed this already yourself, by dropping IDLE. My suggestion is about not ruling out IDLE any more than you would any other ‘text editor’ / IDE. Not about your code.

There’s a few leftovers from previous work, but here’s the code:

import math

printlim= 9 # higher number cuts off more printing

def CheckElement(p):
    if p[0] % 2 == 0:   # is modulus even?
        if p[1] % 2 == 0:   # is remainder even?
            return "EVEN"       # result is EVEN
        else:               # remainder is odd
            return "ODD"        # result is ODD
    else:                   # modulus is odd
        return "MIXED"  # result is mix of odds and evens


def loopbackQ(p):
    pj=p
    print("[{},{}]".format(pj[0],pj[1]), end="")
    k = int(math.log(p[0])/math.log(2))
    xy=""
    ctrsteps=0
    while pj[1] > p[1] or pj[0] == p[0]: # not loopback
        res=CheckElement(pj)
        ctrsteps += 1
        if res == "ODD":
               print("x ", end="")
               xy = xy + "x"
               pj = [3*pj[0], 3*pj[1]+1]
        if res == "EVEN":
               print("y ", end="")
               xy = xy + "y"
               pj = [int(pj[0]/2), int(pj[1]/2)] 
        if res == "MIXED":
            print("m ", end="")
            break
        print("[{},{}]".format(pj[0],pj[1]), end="")
    print()
    print("  steps=", ctrsteps, "k=",k, '"' + xy + '"', end="")
    if pj[1] <= p[1]:
        print(" loopback")
    else:
        print(" not yet resolved")
    return pj[1] <= p[1]

#print(loopbackQ([32,7]))


def listSubsets(k):
        listLoopbacks=[]
        listCN=[]
        listNYR=[]
        modulus=pow(2,k)
        for n in range(0,modulus):
                if loopbackQ([modulus,n]):
                        listLoopbacks.append([modulus,n])
                        if not loopbackQ([modulus//2,n]):
                            listCN.append([modulus,n])
                else:
                        listNYR.append([modulus,n])
        print("list of loopbacks for", k, "=", listLoopbacks)
        print("list of Collatz Numbers for", k, "=", listCN)
        print("list of not-yet-resolved for", k, "=", listNYR)
        print(len(listLoopbacks), "loopbacks (incl. ", len(listCN), "CNs) and ", len(listNYR), "not yet resolved")
        print(len(listLoopbacks)*100/modulus, "% loopbacks and ", len(listNYR)*100/modulus, "% not yet resolved")
        return len(listLoopbacks)       


listSubsets(17)

Yes, it is Collatz stuff.

Nice one - thanks. None of that should cause an issue. If the initial values were fixed width integers, that’s more simple enough to rewrite in Rust or C, by the way.

I’ve just found myself sometimes, that adding so many print statements to performance critical code, can cause the device or terminal being printed to to be the speed limit, instead of the computational goal. Maybe all the prints (or printing entire lists of length 2 ** k) over loads a frame buffer in IDLE or something.

You may be interested in the integer bit_length method:

In [1]: for i in range(10):
   ...:     print(i, 'has', i.bit_length(), 'bits')
   ...: 
0 has 0 bits
1 has 1 bits
2 has 2 bits
3 has 2 bits
4 has 3 bits
5 has 3 bits
6 has 3 bits
7 has 3 bits
8 has 4 bits
9 has 4 bits

This avoids using floating point arithmetic so it is easier to reason about.

It could be that this is something that can be improved in Idle. My suggestion would be to avoid printing all the numbers to the console and instead write them to a file (or three files). It looks like your program does not need to even keep the numbers in memory and could write the numbers out as it goes.

Alternatively given that your program does keep them all in memory you could avoid needing to compute loopbackQ([modulus//2,n]) by checking what you previously computed if you put the numbers into sets.

1 Like

Oscar,

All good grist for the mill. Thank you. I do not call myself a python coder; it’s just the most accessible tool for me to fiddle with. I am just exploring for Collatz patterns at this time, and all code is rather ad-hoc. I am really learning as I go (both Collatz and python), which is both fun and frustrating. All languages get to be idiosyncratic at the coalface. Back in 1980, FORTH was my favourite. FORTRAN was the first language I used, and COBOL was the first language I got paid to use. None of those are of any use in this problem space! I have made no effort to optimise any code yet, because I do not know what I really am going to find. All assistance and ideas are more than welcome.

Another suggestion is to use multiple arguments to a function rather than putting them into a list. Your code does:

def func(p):
   # stuff with p[0] and p[1]

def other_func():
    func([x, y])

The usual way to do this is like

def func(a, b):
    # stuff with a and b

def other_func():
    func(x, y)

I ran your program. It outputs about 64 million characters. Running it in the terminal is fine, but IDLE has problems dealing with huge amounts of printed text, which leads to the problem you’re seeing.

But yes, you’re right: Either use something else to run the script (the terminal or another IDE), or have the script write its output to a text file instead of printing to the screen.

I don’t know if IDLE is slow to handle this amount of output text because of an IDLE issue or something with the tkinter GUI framework it uses.

2 Likes

Either IDLE or tkinter or both could be the problem. Enough long lines will cause a tkinter Text widget get slower and slower or even seem frozen. But this should be prevented by the squeezer feature if not disabled.

In IDLE’s normal mode with a separate execution process, each print statement pushing output back to Shell throuch the socket takes much longer than the REPL printing directly to a terminal or Command Prompt window. Even if the ratio were 100 (say 10 microseconds versus a millisecond), the difference would not be noticeable with normal output. But with a million prints, it would be. This is mentioned in the IDLE doc, which suggests printing fewer bigger chunks. Printing more text to shell than a normal person reads in a year is obviously exceptional.

If anyone wants to experiment, start IDLE with the ‘-n’ (no subprocess) option and see if it runs better.