We have the get_stats function in the gc module. It works well, but in my view it has a few drawbacks:
- It displays only a small part of the statistics that have been collected.
- To get the statistics, it has to go through a chain of calls from Python to C and back, for each interpreter.
- The application that is being monitored should explicitly call the method.
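To illustrate the first and last points, here is a minimal sketch of what the monitored application has to do today. It only uses the existing gc.collect and gc.get_stats functions; the three keys named in the comment are the ones I am aware of, and the exact set may differ between CPython versions.

```python
import gc

# Force a collection so that the counters are not all zero.
gc.collect()

# gc.get_stats() has to be called from inside the monitored process.
# It returns one dict per generation.
stats = gc.get_stats()

for generation, entry in enumerate(stats):
    # Each dict carries a handful of counters, such as
    # "collections", "collected" and "uncollectable".
    print(generation, sorted(entry.items()))
```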
I propose adding a new internal module to read the gathered statistics from GC.
It is meant to work out-of-process, which addresses the last two points above.
For the first point, I propose extending the set of statistics that we gather and expose, for example the number of visited objects and the number of objects reachable from roots. We could, of course, also add the new data to the results of get_stats, but that’s not part of the current proposal.
The module will use _Py_DebugOffsets and do only minimal work to convert the GC’s statistics into Python objects. These objects can then be processed by an external tool of your choice.
I have a working prototype.
Example of data visualized with Perfetto UI:
I have added the following new statistics to the prototype:
- heap_size - the number of live objects.
- work_to_do - an internal value from the incremental GC. It controls the number of objects that are processed in one increment.
- object_visits - the number of objects that were visited during GC work.
- objects_transitively_reachable - the number of objects that were reachable from the global and local roots when the incremental GC was called.
- objects_not_transitively_reachable - the number of objects that were reachable while the increment was being constructed.
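As a rough sketch of how an external tool could process these values, the snippet below turns a few samples into counter events in the JSON Trace Event Format, which Perfetto UI can load. The GCSample class, the ts_us field, the to_counter_events helper and all the numbers are hypothetical and only show the shape of the data; none of them is the prototype's actual API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class GCSample:
    """One hypothetical snapshot of the statistics listed above."""
    ts_us: int  # sample time in microseconds (assumed, not part of the proposal)
    heap_size: int
    work_to_do: int
    object_visits: int
    objects_transitively_reachable: int
    objects_not_transitively_reachable: int


def to_counter_events(samples, pid=1):
    """Convert samples to Trace Event Format counter events ("ph": "C")."""
    events = []
    for sample in samples:
        fields = asdict(sample)
        ts = fields.pop("ts_us")
        # One counter event per statistic, so each gets its own track.
        for name, value in fields.items():
            events.append(
                {"name": name, "ph": "C", "ts": ts, "pid": pid,
                 "args": {"value": value}}
            )
    return events


# Made-up numbers, only to demonstrate the conversion.
samples = [
    GCSample(0, 1000, 50, 120, 80, 40),
    GCSample(1000, 1100, 60, 300, 200, 100),
]
trace_json = json.dumps({"traceEvents": to_counter_events(samples)})
```

Writing trace_json to a file gives something that can be opened in a trace viewer, with one counter track per statistic.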
