According to the docs, gc.freeze() is useful because “avoiding unnecessary copy-on-write in child processes will maximize memory sharing and reduce overall memory usage.”
But I don’t see how that could be true. As soon as another process so much as looks at a single object on a page, its reference count will be modified and copy-on-write will kick in.
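For example, here’s a quick CPython-specific sketch of why merely referencing an object counts as a write to the page it lives on:

import sys

obj = object()
print(sys.getrefcount(obj))  # e.g. 2: `obj` itself plus the getrefcount argument
alias = obj                  # merely taking another reference...
print(sys.getrefcount(obj))  # ...writes to ob_refcnt, dirtying the object's
                             # page, which is what triggers COW after fork()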
The original issue claimed that for Instagram this improved memory sharing, but no numbers were given. Unless an application has huge numbers of classes and functions that are never used, I don’t see how this would help. If it does have that many unused functions and classes, then I feel gc.freeze() is the wrong solution for that problem.
Should we update the docs to say that gc.freeze() doesn’t work in general and that objects should be made immortal instead, which does make them COW-safe?
AFAIR from the original discussion, having “many unused functions and classes” is exactly the use case. It’s easy to claim that gc.freeze is the wrong solution, but how would you fix it without rearchitecting your entire codebase into much smaller services?
As I recall, the idea was that a full (3 generations at the time, maybe different now) collection will modify absolutely every tracked object in existence, regardless of whether user-level code has even looked at an object since the last time. It does this by mucking with bits in the gc header to keep track of whether or not all references to an object come from the generation being collected.
Moving possibly-never-referenced-by-“real-code”-in-the-child objects into the frozen “(non)generation” ensures that gc won’t modify them either. Note that while Python-level code will incref a referenced object, the internal object traversals within gc read pointers from objects without changing the refcounts. No W, no COW then.
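For concreteness, a minimal sketch of what freezing looks like from the Python level (CPython-specific; the exact counts printed vary by version and workload):

import gc

gc.collect()                   # run a full collection first
print(gc.get_count())          # per-generation allocation counts
print(gc.get_freeze_count())   # 0: nothing in the permanent generation yet

gc.freeze()                    # move every tracked object into the permanent
                               # generation; future collections ignore them,
                               # so gc never touches their headers again
print(gc.get_freeze_count())   # now reports all the frozen objects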
Rearchitecting isn’t even desirable for many applications. Large applications grow tons of conditional functionality, but for any given piece of work they’re handling, they will rarely touch all of it. When you fork a worker to handle some set of work, the chances that the set of work you process before recycling hits even 80% of the code paths are quite low. Doubly so in a dynamic language like Python.
Though immortalizing things is not exposed via a Python-level API. We should point 3.12+ users there from the gc.freeze() docs as a potentially more powerful starting point. People who want this are more likely the types of users who are fine using C APIs and actually understanding internals…
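As a rough Python-level illustration of what immortality buys (CPython 3.12+ only; the exact refcount sentinel is an implementation detail): objects the interpreter already immortalizes, such as small ints, never have their refcount field written, which is exactly the COW-safety being asked for here.

import sys

x = 5                               # small ints are immortal in 3.12+ (PEP 683)
before = sys.getrefcount(x)
refs = [x for _ in range(100_000)]  # create a pile of new references
after = sys.getrefcount(x)
print(before == after)              # True on 3.12+: ob_refcnt is never written,
                                    # so the page holding `x` stays shared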
gc.freeze() is a “because we could do it within the existing design without breaking anything else” feature. I expect it’ll become a legacy API no-op in the long run. It did its job and saved those who needed it a lot of compute resources (and money) in the interim.
To illustrate Mark’s point, here’s a small reproducer that crashes my system. So far as I have found, the only value of gc.freeze is that it allows you to avoid the atexit.register(os._exit, 0) trick discussed in Instagram’s post here. I agree with Mark’s point, although my preference would be for gc.freeze’s behavior to be extended so that objects become immortal, rather than just updating the docs.
import gc
import os
from typing import Self
from dataclasses import dataclass

# This program will crash no matter whether FREEZE is False or True
FREEZE = True

# Each ListNode instance is 80 bytes, so the total process memory usage should
# be 320 MiB or so.
NODE_COUNT = 4 * 1024 * 1024

# On unpatched cpython, this program will OOM if not given at least
# 125 GiB (and change) to use.
CHILD_COUNT = 400


@dataclass(frozen=True)
class ListNode:
    cdr: Self | None


BIG_DATA = None
for i in range(NODE_COUNT):
    BIG_DATA = ListNode(BIG_DATA)

gc.collect()
if FREEZE:
    gc.freeze()

r, w = os.pipe()
for i in range(CHILD_COUNT):
    pid = os.fork()
    if pid == 0:
        break
else:
    # Parent process
    os.write(w, b"\x00" * CHILD_COUNT)
    while True:
        try:
            os.wait()
        except ChildProcessError:
            break
    raise SystemExit

# Child process
# Use a pipe to force all children to wait until we've finished forking
os.read(r, 1)
if not FREEZE:
    # When gc.freeze is not called, a simple SystemExit will OOM, due to
    # the GC that is performed during interpreter finalization
    raise SystemExit

# With gc.freeze, we must read the data (incref'ing it)
# to produce an OOM
elem = BIG_DATA
while elem:
    elem = elem.cdr
As expected, I was able to run this program successfully on my laptop (32GB of RAM) if I patched cpython to make frozen objects also immortal:
For what it’s worth, I’ve also seen advice around the Internet suggesting to call gc.freeze() for long-running processes, like API servers, right after start up, before serving the first request. The idea being that all of the initialized objects up to that point can be assumed to live for the entire life of the process, reducing the length of GC pauses.
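For that pattern, the sequence usually suggested looks roughly like the sketch below (create_app() here is just a placeholder for whatever your framework's startup work is):

import gc

def create_app():
    # Placeholder: import handlers, load config, warm caches, build routes...
    return object()

app = create_app()
gc.collect()   # reclaim startup garbage so it isn't frozen permanently
gc.freeze()    # survivors are assumed to live for the whole process; later
               # collections skip them, so there is less to scan per pause
# ... then hand `app` over to the serving loop ...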
Thanks for sharing that link. It is interesting to hear about how GC performance affects a real-world application. IMHO, gc.freeze() is not pointless, but it only makes sense for some subset of applications or programs. The implementation is fairly simple, so I think the cost of maintaining it in CPython is worth it even though it is unlikely to be widely used. At least, for now. Some future version of CPython might make gc.freeze() hard to support, for example.
I’ve been doing some thinking about how to further reduce the cyclic GC pause time for CPython. I have a prototype of a more incremental collector (it runs the “mark alive” phase of the collector incrementally, with low pauses, and then only does a full collection after the marking is done). Mark Shannon’s recent work on the collector should have also made the pauses shorter. You’ll need to wait for 3.14 to get most of those benefits.