Switching from refcounting to libgc

(Thomas Wouters) #1

I’ve talked to a couple of people about this at PyCon2019, and there was more interest than I expected, so I figure I should just post this here:

I’ve (hackishly, as a proof of concept) replaced refcounting with libgc (aka BDW GC, aka Boehm GC, the canonical generic, conservative mark-sweep (non-copying, non-moving) GC):

I initially just disabled pymalloc and linked against a libgc with --enable-replace-malloc, which interposes malloc() and free(). I then commented out the free() call in _PyMem_RawFree, and added a libgc destructor (called when libgc thinks the object is no longer reachable) that warned when the object had a refcount that wasn’t 0. This showed me several things: libgc didn’t try to deallocate anything that shouldn’t have been deallocated, and (comparing against a regular build without pymalloc) the performance difference wasn’t really noticeable.

(Using libgc with --enable-replace-malloc means libgc tracks all allocations, not just ones made with _PyMem_RawMalloc. For libgc to work it needs to be able to see all memory that might potentially contain pointers to its allocations. Because with refcounting code can just store a PyObject * anywhere and INCREF it to make it not go away, using --enable-replace-malloc is the easy way to do that.)

I then disabled Py_INCREF and Py_DECREF, disabled the gc module (because libgc already takes care of circular references), and fixed weakrefs to work with libgc. This is the current state of the thing. It works well enough to run setup.py and much of the testsuite (and passes a whole bunch of tests), but it has a tendency to segfault randomly while running tests – probably because of not locking correctly in the right places. (I haven’t seen it segfault while running setup.py, so I’m pretty sure it’s to do with threads.) I have not measured the performance of this at all, yet.

Just to be clear: this doesn’t remove the GIL – it doesn’t even remove the ob_refcnt struct field – it just removes refcounting, which is the main (but not only) reason for the GIL. I don’t think this approach is source-level-compatible enough for the real world. It’s just an experiment, and I pushed the branch to GitHub primarily so Larry could look at it for his Gilectomy work.

For a more backward-compatible approach at introducing real GC, I have something in mind like this:

  • Introduce a new, completely separate PyGCObject API.
  • Leave PyObject to be refcounted as normal. Don’t change the struct at all.
  • Provide a proxy PyObject wrapper for PyGCObjects, so that anything taking (or returning) a PyGCObject can be called from PyObject-oriented code.
  • Provide a proxy PyGCObject wrapper for PyObjects, so that anything taking (or returning) a PyObject can be called from PyGCObject-oriented code.
  • Migrate CPython itself from PyObject to PyGCObject.
  • Only require the GIL for PyObject-touching code (and many fine-grained locks for PyGCObjects that need locking).

I’m not yet sure how feasible this approach is (quite apart from the 2x explosion of exposed APIs) – I have the beginnings of an experiment with this in a different branch, but I won’t know how well it works until I make at least dicts, strings, ints and ceval.c use PyGCObjects… I may get to that in the next couple of weeks.

(Antoine Pitrou) #2

Interesting. I’ll be curious to learn about your findings along the way.

(Neil Schemenauer) #3

Hi Thomas,

I did the same kind of experiment way back in 2000. There is still a patch on my web site for it. If I recall, performance was a fair bit worse and there was a few issues to work out.

After I did the Boehm GC experiment, I moved to the idea of cyclic GC. Tim Peters helped a lot of make the gcmodule.c code polished. The idea for the double-linked list design was apparently from Eric Tiedemann. I think I was working on similar GC ideas (based on Tim’s cyclops tool) but I was using hash tables or trees rather than linked links.

Even after these 19 years, I still wish that we could have a higher performance GC for Python. I don’t think it can be done without breaking the C-API. I feel like the overhead of the current reference counting plus the cycle GC is much higher than compared to other GC languages.