Can we expose the head of gc generation0?
We run a Linux server that provides services to multiple users and holds a large number of PyObjects. We want to avoid the pauses caused by garbage collection, which can lead to a poor user experience. Therefore, we have adopted a parallel garbage collection method:
- Fork a child process and call gc.collect in the child process, then send the results back to the main process.
- The main process cleans up the garbage it receives.
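A minimal sketch of the child-side half of this first version, for illustration only. It assumes the child captures unreachable objects with gc.DEBUG_SAVEALL and reports their addresses over a pipe; the names Cycle and collect_in_child are hypothetical, and the cleanup step in the main process is not shown.

```python
import gc
import os
import struct


class Cycle:
    """Hypothetical payload type: each instance references itself."""

    def __init__(self):
        self.me = self


def collect_in_child(kind):
    """Fork, run gc.collect() in the child, and return the ids (addresses)
    of the unreachable objects of type `kind` that the child found."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            os.close(read_fd)
            # Assumption: keep unreachable objects in gc.garbage instead of
            # freeing them, so they can be reported to the parent.
            gc.set_debug(gc.DEBUG_SAVEALL)
            gc.collect()
            ids = [id(obj) for obj in gc.garbage if type(obj) is kind]
            with os.fdopen(write_fd, "wb") as out:
                out.write(struct.pack(f"{len(ids)}Q", *ids))
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    # parent process
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        payload = inp.read()
    os.waitpid(pid, 0)
    return set(struct.unpack(f"{len(payload) // 8}Q", payload))
```

Because the child is a copy of the parent's address space, the ids it reports are the same addresses the objects have in the main process.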
However, calling gc.collect in the child process causes an increase in memory usage. This is because gc.collect frequently modifies the reference counts of PyObjects and the GC generation lists, which triggers copy-on-write. Instagram has a blog post that explains this issue very well: Dismissing Python Garbage Collection at Instagram
So our second version of the implementation works as follows:
- Fork a child process.
- The child process traverses the GC generations and records the PyObjects in an unordered_map, using a separate attribute (e.g., refcnt) to record the reference count of each PyObject.
- As gc.collect does, perform a subtract_refs pass on the map to identify the unreachable objects, and send the results back to the main process.
- The main process performs the garbage cleanup.
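The idea behind the shadow map can be sketched with a toy model in Python. This is not our actual implementation: ShadowEntry and find_unreachable are hypothetical names, string keys stand in for object addresses, and the referent lists are assumed to have been recorded during the traversal.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowEntry:
    """Hypothetical shadow record for one tracked object."""
    refcnt: int                                    # copy of the real reference count
    referents: list = field(default_factory=list)  # keys of objects it points to


def find_unreachable(shadow):
    """Return the keys of entries that nothing outside the map can reach."""
    # Work on copied counts so the recorded snapshot stays untouched.
    gc_refs = {key: entry.refcnt for key, entry in shadow.items()}

    # subtract_refs: cancel every reference that originates inside the map.
    for entry in shadow.values():
        for target in entry.referents:
            if target in gc_refs:
                gc_refs[target] -= 1

    # A positive remainder means a reference from outside the map: a root.
    reachable = {key for key, count in gc_refs.items() if count > 0}
    stack = list(reachable)
    while stack:  # everything a root can reach is alive as well
        key = stack.pop()
        for target in shadow[key].referents:
            if target in shadow and target not in reachable:
                reachable.add(target)
                stack.append(target)

    return set(shadow) - reachable


shadow = {
    "a": ShadowEntry(1, ["b"]),  # a and b only reference each other
    "b": ShadowEntry(1, ["a"]),
    "c": ShadowEntry(2, ["d"]),  # c has one reference from outside the map
    "d": ShadowEntry(1, ["c"]),  # d is kept alive through c
}
print(sorted(find_unreachable(shadow)))  # ['a', 'b']
```

Only the copied counts are decremented, so no page belonging to a real PyObject is written to.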
This approach creates new lightweight data structures instead of modifying CPython data structures in place. The challenging part is how to access the head of the GC list. In Python 2.7, this is quite straightforward because we have:
extern PyGC_Head *_PyGC_generation0;
However, in Python 3, _PyGC_generation0 has been removed. We have to use a very convoluted method to access generation0:
- Call PyObject_GC_New to obtain a new PyObject.
- Call PyObject_GC_Track on this new PyObject.
- Return the next linked element of this new PyObject, which is generation0.
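Why the trick works can be shown with a toy model of the list. The sketch assumes generation0 is a circular doubly linked list with a sentinel head and that newly tracked objects are appended at its tail; Node, track, untrack and find_head_via_probe are hypothetical names, and the untrack step is an addition that is not part of the steps above.

```python
class Node:
    """Toy stand-in for a PyGC_Head: one link in a circular doubly linked list."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def track(head, node):
    """Append `node` at the tail of the circular list whose sentinel is `head`."""
    last = head.prev
    last.next = node
    node.prev = last
    node.next = head
    head.prev = node


def untrack(node):
    """Unlink `node` from whatever list it is in."""
    node.prev.next = node.next
    node.next.prev = node.prev
    node.prev = node.next = node


def find_head_via_probe(track_fn):
    """Track a throwaway node and read the list head from its `next` link."""
    probe = Node("probe")
    track_fn(probe)    # the probe becomes the tail of the list
    head = probe.next  # the tail's successor is the sentinel head
    untrack(probe)     # leave the list as it was found
    return head


generation0 = Node("generation0")
track(generation0, Node("x"))
found = find_head_via_probe(lambda node: track(generation0, node))
print(found is generation0)  # True
```

The result depends entirely on the append-at-tail behaviour, which is exactly the internal detail the trick relies on.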
This method is a little tricky and relies on internal implementation details. Therefore, could we add an exposed attribute in pycore_object.h, like this:
PyGC_Head *generation0;
...
generation0 = &interp->gc.young.head;
Thanks.