Different value from gc.get_count() VS len(gc.get_objects(gen))

DanielLee343 · October 9, 2023, 5:55pm

I’m doing some dummy tests on the Python gc module and found inconsistent behavior between gc.get_count() and gc.get_objects(). For example:

import gc
gc.collect()
import random
random.seed(1)
matrix_size = 5
# initialize a random 2D-list matrix
matrix_A = [[random.randint(1, 10) for _ in range(matrix_size)] for _ in range(matrix_size)]
print(gc.get_count())          # (59, 1, 0)
print(len(gc.get_objects(0)))  # 78
print(len(gc.get_objects(1)))  # 363
print(len(gc.get_objects(2)))  # 4655
print(len(gc.get_objects()))   # 5096

As you can see, the values from gc.get_count() do not match the lengths of gc.get_objects(gen). I expected both to give the number of objects tracked by each generation of the GC. I’m using Python 3.9 built from source with configure --enable-optimizations. Python 3.10 gives different values, but the mismatch remains. Am I missing something here? Thanks for any explanation.

tjreedy · October 9, 2023, 11:24pm

gc.get_count()
Return the current collection counts as a tuple of (count0, count1, count2).
gc.get_objects(generation=None)
Returns a list of all objects tracked by the collector,

I presume that ‘collection counts’ is the number of objects that have been or will be collected (for instance by an immediate gc.collect), which is much smaller than the number of objects tracked for possible collection.

tim.one · October 10, 2023, 3:29am

get_count() doesn’t have anything to do with numbers of objects. It has to do with the number of garbage collections that still need to be done to meet the thresholds returned by get_threshold():

>>> import gc
>>> gc.get_count()
(30, 2, 1)
>>> gc.get_threshold()
(700, 10, 10)
>>> gc.collect()
0
>>> gc.get_count()
(22, 0, 0)

As shown, each value in a get_count() tuple is typically no larger than the corresponding value in a get_threshold() tuple, and the values get_count() returns typically fall after a collection occurs. They’re generally not in any sense counting objects, but are instead saying how many collections of a generation have occurred since the last time that generation was collected. Although, yes, the first is giving the excess of the number of object allocations over deallocations (due to reference counting) since the last time gc ran. When that excess exceeds get_threshold()[0] (700 above), that triggers a gc run.
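A minimal sketch of the first counter's behaviour, assuming a standard CPython build where reference counting frees objects immediately. The Node class and the count0_trace helper are made up for illustration. Automatic collection is disabled during the measurement so that a gc run cannot reset the counter partway through.

```python
import gc


class Node:
    """Plain class (hypothetical) whose instances are tracked by the cyclic GC."""


def count0_trace(n=2000):
    """Return get_count()[0] before allocating, after allocating, and after freeing n Nodes."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from resetting the counter
    try:
        gc.collect()  # full collection, so the counter starts from a low baseline
        before = gc.get_count()[0]
        nodes = [Node() for _ in range(n)]  # n allocations of GC-tracked objects
        after_alloc = gc.get_count()[0]
        del nodes  # on CPython, refcounting deallocates the nodes right away
        after_free = gc.get_count()[0]
    finally:
        if was_enabled:
            gc.enable()
    return before, after_alloc, after_free


before, after_alloc, after_free = count0_trace()
print("threshold0:", gc.get_threshold()[0])
print("count0 before / after alloc / after free:", before, after_alloc, after_free)
```

The counter should rise by roughly n after the allocations and drop back once the objects are freed, which is why it tracks allocations minus deallocations rather than the number of live objects in generation 0. Exact values depend on the interpreter version and on whatever else is allocating at the time.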

What can you do with this info? Beats me. I’ve never had a real use for it.

DanielLee343 · October 10, 2023, 5:06am

Thanks for the clarification! I’m actually hacking on the CPython source to track all PyObjects and understand their access patterns. To do so, I need each PyObject *op along with its size in bytes. It seems gc.get_objects() already gives me all GC-tracked container objects, so gc.get_count() sounds less interesting for this. I have a follow-up question.

Do you think there is any relationship between a PyObject’s access frequency (i.e., hotness) and its lifetime (i.e., which generation it is in)? In other words, can I assume objects in older generations are accessed less frequently than objects in gen 0? Or is it the opposite? Or maybe there’s no correlation.
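As a starting point for that kind of measurement, here is a rough sketch of listing the tracked objects per generation with their sizes from pure Python, before touching the C source. The summarize_generations helper is hypothetical. sys.getsizeof reports shallow sizes only (it does not follow references), and nothing here measures access frequency.

```python
import gc
import sys


def summarize_generations():
    """Map each generation number to (object count, total shallow size in bytes)."""
    summary = {}
    for gen in range(3):
        objs = gc.get_objects(gen)  # only objects currently in this generation
        # The default of 0 is returned for objects that do not report a size.
        total = sum(sys.getsizeof(o, 0) for o in objs)
        summary[gen] = (len(objs), total)
    return summary


for gen, (n_objs, n_bytes) in summarize_generations().items():
    print(f"gen {gen}: {n_objs} objects, {n_bytes} bytes (shallow)")
```

Only container objects tracked by the collector show up here. Untracked objects such as ints and strs are not included, so this is not a full census of PyObjects.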

tjreedy · October 10, 2023, 7:18pm

The current doc for get_count is inadequate. @DanielLee343 Could you open a doc issue with a suggestion for an improved sentence, based on Tim’s explanation?