Switching from refcounting to libgc

nas · April 18, 2020, 11:17pm

This is an old thread but contains a lot of interesting discussion. I had mentioned this “crazy” idea:

maybe the gc.freeze() idea could be extended to disable reference counting on all objects found when it is called.

Something like that idea has now been implemented by some Instagram people:

https://bugs.python.org/issue40255

They don’t use memory bitmaps to disable reference counts. Instead they use the high bits of the refcnt field. I’ve been thinking about how that change could be extended to be more useful to general Python programs. In Instagram’s case, the big win comes from their application’s use of copy-on-write OS memory pages and avoidance of dirtying them. Most Python programs are not written that way and will not benefit from PR 19474 as it is.

My thought for today is could we change CPython to use a hybrid of mark-and-sweep (M&S) and automatic-reference-counting (ARC) for memory management? I found this paper today:

A Unified Theory of Garbage Collection:
https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon04Unified.pdf

From the abstract:

Using this framework, we show that all high-performance collectors (for example, deferred reference counting and generational collection) are in fact hybrids of tracing and reference counting.

The cyclic GC that Python uses is like a kind of M&S collector. We are close to the hybrid approach but we are paying for both the ARC and the M&S. Can we extend PR 19474 to get better performance for the “average” Python program? My idea is to use ARC on the young generation of objects and M&S for the old. Maybe avoid more of the ARC overhead on old objects?

Leaving out a bunch of details, it would be like calling the PR 19474 immortalize_heap() at the end of a full cyclic GC pass. Or, alternatively, having the M&S collector set the bit on the refcnt field to disable ARC as it is working. I.e. everything the M&S collector sees is considered “old”.

What is involved with making that work? If we use libgc, as per the original proof-of-concept by Thomas, not too much. As I recall, libgc already supports a hybrid approach in that you can manually call free(). We need to replace the current cyclic GC with an actual mark-and-sweep collector that can find all “roots”. The libgc library has magic (e.g. C stack crawling, heuristic pointer detection) to do that. Or, we could enhance the current cyclic GC to make it into a M&S collector. That would involve making sure every container object has an accurate tp_traverse method, dealing with the C stack somehow, marking all global PyObject pointers. Tedious but doable.

It would be easy to support the HPy handle API with this approach. Use another bit in the refcnt field to mark objects that have an “open” handle to them. Then, the M&S would always consider them to be alive as long as they are open. Alternatively, can add them to a handle data structure.

A possible alternative to libgc would be to adapt the Apple “libauto” library:

https://opensource.apple.com/source/libauto/libauto-77.1/README.html
https://opensource.apple.com/source/libauto/libauto-77.1/

I didn’t study the code too closely but it looks quite clean and fairly portable. It has support for write memory barriers which would be a performance enhancement for the M&S collection. BTW, it is interesting that Apple moved away from M&S collection and the newer versions of Objective-C use pure ARC. At least, that’s my understanding.