It seems to me that `gc.freeze()` is pointless and the documentation misleading

According to the docs, gc.freeze() helps because avoiding “unnecessary copy-on-write in child processes will maximize memory sharing and reduce overall memory usage.”

But I don’t see how that could be true. As soon as another process so much as looks at a single object on a page, that object's reference count will be modified and copy-on-write will kick in for the whole page.
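
A minimal sketch of the concern, assuming CPython, where the reference count is stored in the object header: taking one more reference to an ordinary object changes the value that sys.getrefcount() reports, and that change is a write to the memory holding the object. The `Handler` class and the variable names are hypothetical.

```python
import sys


class Handler:
    """Hypothetical stand-in for an object created before fork()."""


obj = Handler()
refs = []

before = sys.getrefcount(obj)
refs.append(obj)  # merely holding another reference bumps the count
after = sys.getrefcount(obj)

# In CPython the count is stored inside the object itself, so this change
# is a write to the page that holds `obj`.
print(f"refcount before: {before}, after: {after}")
```

In a forked child, a write like this is what would force the kernel to copy the page.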

The original issue claimed that for Instagram this improved memory sharing, but no numbers were given. Unless an application has huge numbers of classes and functions that are never used, I don’t see how this would help. If it does have that many unused functions and classes, then I feel gc.freeze() is the wrong solution for that problem.

Should we update the docs to say that gc.freeze() doesn’t work in general and that objects should be made immortal instead, since that does make them COW-safe?


Does immortalization also untrack objects? Otherwise, it seems like both immortalizing and freezing the heap is necessary to get the desired effect.

AFAIR from the original discussion, having “many unused functions and classes” is exactly the use case. It’s easy to claim that gc.freeze is the wrong solution, but how would you fix it without rearchitecting your entire codebase into much smaller services?

Is there a public Python API to do that?


As I recall, the idea was that a full collection (3 generations at the time, maybe different now) will modify absolutely every tracked object in existence, regardless of whether user-level code has even looked at an object since the last collection. This happens via mucking with bits in the gc header to keep track of whether or not all references to an object come from the generation being collected.

Moving possibly-never-referenced-by-“real-code”-in-the-child objects into the frozen “(non)generation” ensures that gc won’t modify them either. Note that while Python-level code will incref a referenced object, the internal object traversals within gc read up pointers from objects without changing the refcounts. No W, no COW then.
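
The usage pattern this implies can be sketched as follows, following the recommendation in the gc module documentation: disable the collector early in the parent, freeze right before fork(), and re-enable it in the child. `Plugin` and `spawn_worker` are hypothetical names, and os.fork() is POSIX-only, so the sketch defines the forking helper without calling it.

```python
import gc
import os


class Plugin:
    """Hypothetical stand-in for rarely used application state."""

    def __init__(self, name):
        self.name = name


def spawn_worker(task):
    """Fork a child that runs task() with the parent's heap frozen (POSIX only)."""
    gc.freeze()  # move every tracked object into the permanent generation
    pid = os.fork()
    if pid == 0:
        gc.enable()  # collections in the child skip the frozen objects
        try:
            task()
        finally:
            os._exit(0)  # leave the child without running parent cleanup
    _, status = os.waitpid(pid, 0)
    return status


gc.disable()  # avoid collections in the parent while state is built up
plugins = [Plugin(f"plugin-{i}") for i in range(1000)]

gc.freeze()
frozen_count = gc.get_freeze_count()
print(f"objects in the permanent generation: {frozen_count}")

# Undo the freeze so this demonstration does not leave objects frozen.
gc.unfreeze()
gc.enable()
```

This only keeps the collector from writing to frozen objects; any refcount change made by code in the child still dirties the page.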


Rearchitecting isn’t even desirable for many applications. Large applications grow tons of conditional functionality, but for any given piece of work they handle, they rarely touch all of it. When you fork a worker to handle some set of work, the chances that the work it processes before being recycled hits even 80% of the code paths are quite low. Doubly so in a dynamic language like Python.

Immortal objects are effectively the follow-on work; see “Introducing Immortal Objects for Python” on the Engineering at Meta blog.

Immortalizing things is not exposed through a Python-level API, though. We should point 3.12+ users there from the gc.freeze() docs as a potentially more powerful starting point. People who want this are more likely the types of users who are fine using C APIs and actually understanding internals…
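
As a rough illustration of why immortal objects are COW-safe, assuming CPython 3.12 or later (PEP 683), where None is immortal: creating many new references to None leaves its reported reference count unchanged, which suggests the refcount field is not being written. This only observes a built-in immortal object; it does not immortalize anything itself.

```python
import sys

before = sys.getrefcount(None)
extra_refs = [None] * 10_000  # 10,000 additional references to None
after = sys.getrefcount(None)

if sys.version_info >= (3, 12):
    # Immortal objects skip refcount updates, so the field stays constant.
    print("None's refcount unchanged:", before == after)
else:
    # Before immortal objects, every new reference was counted.
    print("pre-3.12: refcount grew by", after - before)
```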

gc.freeze() is a “because we could” feature: it fit within the existing design without breaking anything else. I expect it’ll become a legacy no-op API in the long run. It did its job and saved those who needed it a lot of compute $resources in the interim.


Yes, at least since 3.13:
