Mutually referenced objects not getting garbage collected in loop

The objects a and b hold references to each other, and their memory does not get freed during the loop.
Manually calling the garbage collector (gc.collect()) in the loop does free the memory.
Is there a better way to handle it?

import numpy as np

class A:
    def __init__(self):
        self.data = np.random.rand(10000000).tolist()
    def setb(self, b):
        self.b = b
        
class B:
    def __init__(self):
        self.data = np.random.rand(10000000).tolist()
    def seta(self, a):
        self.a = a


def test():
    a = A()
    b = B()

    a.setb(b)
    b.seta(a)
    
for i in range(100):
    print(i)
    test()
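Here is a minimal sketch of the manual workaround mentioned in the question. It assumes CPython. The data lists are shrunk so it runs quickly. The live WeakSet is a hypothetical helper that is not in the original code; it is used only to count how many A/B instances are still alive.

```python
import gc
import weakref

# Hypothetical registry (not in the original code), used only to count live instances.
live = weakref.WeakSet()

class A:
    def __init__(self):
        self.data = [0.0] * 1000  # small stand-in for the large random list
        live.add(self)
    def setb(self, b):
        self.b = b

class B:
    def __init__(self):
        self.data = [0.0] * 1000
        live.add(self)
    def seta(self, a):
        self.a = a

def test():
    a = A()
    b = B()
    a.setb(b)
    b.seta(a)

gc.disable()  # turn off automatic collection so the result is deterministic
try:
    for i in range(5):
        test()
    alive_before = len(live)  # all 10 objects survive: the cycles keep them alive
    for i in range(5):
        test()
        gc.collect()  # the workaround: reclaim the cycles right after each call
    alive_after = len(live)
finally:
    gc.enable()

print(alive_before, alive_after)  # 10 0
```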

del a

…will release the object assignment and free up memory, as discussed here.

This would not help in this case. When the test() function exits, its local scope ceases to exist. In effect, all of its local variables are deleted.

def test():
    a = A()
    b = B()

    a.setb(b)
    b.seta(a)

    del a, b  # This is redundant. Local variables are deleted without this anyway.
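The following small check shows that returning from a function releases its locals when no cycle is involved. It assumes CPython's reference counting. The class Plain and the function make_and_drop are hypothetical, and only a weak reference is allowed to escape the function.

```python
import gc
import weakref

class Plain:
    """A hypothetical class with no references to other objects."""

def make_and_drop():
    obj = Plain()
    return weakref.ref(obj)  # only a weak reference leaves the function

gc.disable()  # rule out the cyclic collector; only reference counting is active
try:
    ref = make_and_drop()
    freed_on_return = ref() is None  # freed as soon as the function's frame exited
finally:
    gc.enable()

print(freed_on_return)  # True
```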

The problem here is that object a refers to object b and vice versa. So when the names a and b are deleted, the objects’ reference counts do not drop to zero, and the objects are not freed immediately. A simpler example of a reference cycle between two objects is:

l1 = [None]
l2 = [l1]
l1[0] = l2

del l1, l2
# This deletes the variables but it does not delete the objects immediately.
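The same effect can be observed directly. Lists do not support weak references, so this sketch swaps them for a small hypothetical Node class. It assumes CPython and disables automatic collection so that the timing is deterministic.

```python
import gc
import weakref

class Node:
    """Hypothetical stand-in for the lists, which cannot be weakly referenced."""
    def __init__(self):
        self.other = None

gc.disable()  # only the explicit gc.collect() below runs the cycle detector
try:
    n1, n2 = Node(), Node()
    n1.other, n2.other = n2, n1  # build the two-object cycle
    probe = weakref.ref(n1)      # observe n1 without keeping it alive

    del n1, n2
    alive_after_del = probe() is not None      # True: the cycle keeps both alive
    gc.collect()                               # explicit run of the cycle detector
    alive_after_collect = probe() is not None  # False: the cycle was reclaimed
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)  # True False
```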

To handle such cases, Python’s garbage collector detects reference cycles. However, it does not run immediately; it runs periodically, based on allocation thresholds. You can tune these thresholds using the gc module:

https://docs.python.org/3/library/gc.html

https://devguide.python.org/internals/garbage-collector/

@chcl3 So maybe the problem is that the garbage collection had not run yet during the loop, and you need to tune the GC parameters for your case. By the way, how do you monitor the garbage collection? Are you sure that it did not happen?
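One way to monitor collections is CPython's gc.callbacks hook, which calls registered functions before and after each collection. This sketch assumes CPython. Node and record are hypothetical names, and the loop only creates a few abandoned cycles for the collector to find.

```python
import gc

class Node:
    """Hypothetical class used only to create reference cycles."""

events = []

def record(phase, info):
    # Called with phase "start" or "stop"; on "stop", info["collected"]
    # is the number of unreachable objects the collection freed.
    if phase == "stop":
        events.append((info["generation"], info["collected"]))

gc.callbacks.append(record)
try:
    for _ in range(3):
        a, b = Node(), Node()
        a.other, b.other = b, a  # each iteration abandons the previous cycle
    del a, b
    gc.collect()  # force a full collection so at least one event is recorded
finally:
    gc.callbacks.remove(record)

for generation, collected in events:
    print(f"generation {generation}: collected {collected}")
```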

Thank you for your reply.

Yes, one has to call GC manually. That does not seem like a good solution when the classes are part of a module.
Maybe one should never use such mutual references between large objects in loops.

I didn’t monitor the GC process, but I found that removing a.setb(b) and b.seta(a) solves the problem.
So I guess it is a GC problem.

Actually, I solved it by passing a and b as arguments to the methods that use them, rather than storing them as attributes that refer to each other.
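The approach described above can be sketched as follows, assuming CPython. The method combined_size and the live WeakSet are hypothetical; the original post does not show which methods actually use the other object. Because neither object stores the other, no cycle forms, and reference counting frees each pair as soon as test() returns.

```python
import gc
import weakref

live = weakref.WeakSet()  # hypothetical registry, only for counting live objects

class A:
    def __init__(self):
        self.data = [0.0] * 1000
        live.add(self)

    def combined_size(self, b):
        # Hypothetical method: b is received as an argument, not stored on self.
        return len(self.data) + len(b.data)

class B:
    def __init__(self):
        self.data = [0.0] * 1000
        live.add(self)

def test():
    a = A()
    b = B()
    return a.combined_size(b)  # no a <-> b cycle is ever created

gc.disable()  # only reference counting is active
try:
    for _ in range(5):
        test()
    still_alive = len(live)  # each pair was freed when test() returned
finally:
    gc.enable()

print(still_alive)  # 0
```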

Yes, certainly: if you can avoid circular references between such huge objects, get rid of them. Cycles prevent the fast, immediate cleanup done by reference counting, so the memory has to wait for the slower cyclic garbage collector.

If you need the references but still want the objects to be freed by that fast mechanism, check the weakref module (Weak references, Python 3.10.5 documentation).
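Here is a sketch of the weakref approach applied to the original classes, assuming CPython. A keeps a normal reference to B, while B keeps only a weakref.ref back to A. The a property on B is a hypothetical convenience that is not in the original code. The data lists are shrunk for speed.

```python
import gc
import weakref

class A:
    def __init__(self):
        self.data = [0.0] * 1000
    def setb(self, b):
        self.b = b  # strong reference: A owns B

class B:
    def __init__(self):
        self.data = [0.0] * 1000
    def seta(self, a):
        self._a = weakref.ref(a)  # weak back-reference: does not keep A alive
    @property
    def a(self):
        # Returns the A instance, or None once it has been freed.
        return self._a()

def test():
    a = A()
    b = B()
    a.setb(b)
    b.seta(a)
    assert b.a is a  # the back-reference works while a is alive
    return weakref.ref(a), weakref.ref(b)

gc.disable()  # show that reference counting alone frees both objects
try:
    ref_a, ref_b = test()
    both_freed = ref_a() is None and ref_b() is None
finally:
    gc.enable()

print(both_freed)  # True
```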

I was interested in how it works, so I simplified your code:

import gc

class A:
    def __init__(self, ref=None):
        self.data = [object() for _ in range(10_000_000)]
        self.ref = ref

def testfn():
    a = A()
    b = A(a)
    a.ref = b

print(f'{gc.get_threshold() = }')
gc.set_threshold(30)
print(f'{gc.get_threshold() = }')

for i in range(1000):
    print(f'{i:3}\t{gc.get_count()}')
    testfn()

When you lower the GC threshold, for example from 700 (my default) to 30, the cyclically referenced objects are garbage collected every few iterations of the loop.
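If lowering the threshold helps, one option is to limit the change to the allocation-heavy code. The sketch below uses a hypothetical helper, gc_threshold, built on gc.get_threshold() and gc.set_threshold(). It changes only the first (generation 0) threshold and restores the previous values afterwards.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_threshold(threshold0):
    """Hypothetical helper: temporarily change the generation-0 threshold."""
    saved = gc.get_threshold()
    gc.set_threshold(threshold0, *saved[1:])  # keep the other thresholds unchanged
    try:
        yield
    finally:
        gc.set_threshold(*saved)  # restore the previous settings

original = gc.get_threshold()
with gc_threshold(30):
    inside = gc.get_threshold()
    # ... run the allocation-heavy loop here ...
restored = gc.get_threshold()

print(original, inside, restored)
```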

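The first number in gc.get_count() is not the number of objects created, otherwise it would grow by millions. According to the gc documentation, a generation-0 collection starts when the number of allocations minus the number of deallocations exceeds threshold0. In CPython, only container objects managed by the collector are counted, and a bare object() is not one of them. When this count exceeds the threshold, garbage collection of the cyclically referenced objects starts.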
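A quick way to check which objects the collector manages is gc.is_tracked(). This sketch assumes CPython, and A here is a bare hypothetical class. It shows why ten million object() instances barely move the count: only containers, such as lists and instances of user-defined classes, are tracked.

```python
import gc

class A:
    """A user-defined class; its instances can hold references, so they are tracked."""

print(gc.is_tracked(object()))  # False: a bare object() cannot refer to anything
print(gc.is_tracked(42))        # False: atomic values are not tracked
print(gc.is_tracked("a"))       # False
print(gc.is_tracked([]))        # True: containers are tracked
print(gc.is_tracked(A()))       # True
```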

Why do you care? Are you asking out of curiosity, or because you have an actual problem that needs to be solved?

For most people, you don’t need to care. As in other garbage-collected languages such as Java, the garbage collector will eventually run and reclaim the a and b objects even though they form a cycle. It is hard to predict how often it runs, but you can tune the garbage collector thresholds to make it run more or less frequently.

So the first solution is: don’t do anything. You’re probably worrying for nothing; the garbage collector will solve your problem after a few seconds or so.

If that’s not enough of a solution, you can:

  • manually run a collection, but that hurts performance;
  • use weakrefs to avoid forming a reference cycle, but that takes more work;
  • manually break the reference cycle, which then allows the reference counter to recover the object’s memory immediately;
  • avoid forming the reference cycle in the first place.
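The third option, breaking the cycle by hand, could look like the sketch below. It assumes CPython and shrinks the data lists for speed. Where the real work goes is a placeholder, and returning weak references is only for checking the result. Clearing b.a in a finally block removes the cycle, so reference counting frees both objects as soon as test() returns.

```python
import gc
import weakref

class A:
    def __init__(self):
        self.data = [0.0] * 1000
    def setb(self, b):
        self.b = b

class B:
    def __init__(self):
        self.data = [0.0] * 1000
    def seta(self, a):
        self.a = a

def test():
    a = A()
    b = B()
    a.setb(b)
    b.seta(a)
    try:
        pass  # ... work that needs a.b and b.a goes here ...
    finally:
        b.a = None  # break the cycle so reference counting can free both objects
    return weakref.ref(a), weakref.ref(b)

gc.disable()  # no cyclic collection; reference counting does all the work
try:
    ref_a, ref_b = test()
    both_freed = ref_a() is None and ref_b() is None
finally:
    gc.enable()

print(both_freed)  # True
```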

Without knowing what your actual problem is, it is hard to know which is the best solution.