Caching methods causes instances to live forever

I will use this class for the examples:

import time
from functools import cache 


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    @cache
    def think_about_math(self, n: int) -> int:
        time.sleep(n)
        return n*2

Let's say I have a long-running script that creates and deletes Person instances:

def foo(count: int) -> None:
    for n in range(count):
        p = Person(name="bob", age=99)
        p.think_about_math(n)

foo(5)

One would expect that after the function returns, all the Person instances are garbage-collected and therefore the cache entries are freed as well, but this is not the case.
If I execute this code:

import gc

p = Person("bob", age=99)
p.think_about_math(2)
print(len(gc.get_referrers(p)))

2

There are 2 referrers to that object even though I created only one reference, p. So when I drop p, the instance will still be alive, because the cache holds another reference to it.
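To make the leak concrete, here is a self-contained sketch (using a weakref, with the sleep removed so it runs instantly) showing that the instance survives even after the only name referring to it is dropped:

```python
import gc
import weakref
from functools import cache


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    @cache
    def think_about_math(self, n: int) -> int:
        return n * 2


p = Person("bob", 99)
ref = weakref.ref(p)
p.think_about_math(1)
del p
gc.collect()
# The cache key still references the instance, so it is never collected
alive = ref() is not None
print(alive)  # True
```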

why?

All this time I was under the impression that the cache decorator computes a hash of the passed parameters (including self) and stores it in a dict, where the value is the original function's result.

But it turns out that Python stores the whole set of function parameters in a tuple (which is hashable), and sets that tuple object itself as the key in the dict.
Here is the function that creates the tuple stored in the cache dict:
https://github.com/python/cpython/blob/main/Lib/functools.py#L464

At the end, the _make_key function returns:

return _HashedSeq(key)

_HashedSeq is just an object that inherits from list but implements __hash__. Our instance reference is in there, in that list; that's why the referrer count jumped to 2, and that's why the instance and the cache entry will never die.
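A simplified sketch of what _HashedSeq amounts to (the real implementation also handles keyword arguments and type markers; this is just the core idea):

```python
class HashedSeq(list):
    """Simplified stand-in for functools._HashedSeq: a list that is
    hashable because it precomputes and caches the hash of the tuple."""
    __slots__ = ("hashvalue",)

    def __init__(self, tup):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue


class Person:
    pass


p = Person()
key = HashedSeq((p, 2))  # roughly what the cache stores as its dict key
print(p in key)          # True: the key holds a strong reference to p
```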

The same thing happens in the C module of functools:
https://github.com/python/cpython/blob/main/Modules/_functoolsmodule.c#L858

The return value is a Python object (in that case a tuple) that stores all of our references.

# snippet
885  |  key = PyTuple_New(key_size);

Why does it return a Python object instead of the hash itself? I don't know; maybe someone can explain. Should it be changed?

Would you prefer fake cache hits, where a person gets a previous person’s results just because they happen to have the same hash?

Why? _make_key should return a hash instead of a tuple. Why create a new reference to the instance in a tuple and set the tuple as the key in the cache dict? The dict converts the tuple key to a hash behind the scenes anyway.

Yes, but dict hashes aren’t unique - hash collisions are expected and catered for in the dict implementation.
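A contrived illustration of that point: two distinct, unequal objects can share a hash, so a cache keyed on the hash alone would hand one caller the other's result. (The class name and values here are made up for the demo.)

```python
class Weird:
    """Two distinct, unequal objects that deliberately share a hash."""
    def __hash__(self):
        return 42


a, b = Weird(), Weird()
print(hash(a) == hash(b))  # True
print(a == b)              # False

# A cache keyed only on hash(args) would treat calls with a and b as the
# same entry: a fake cache hit. Keying on the full argument tuple lets
# the dict fall back to == comparison and keep them separate.
fake_cache = {hash((a,)): "result for a"}
print(hash((b,)) in fake_cache)  # True: b would wrongly get a's result
```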

The example is incorrectly designed, because the method body does not depend on self. Use staticmethod instead:

@staticmethod
@cache
def think_about_math(n: int) -> int:
    time.sleep(n)
    return n*2
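With that change, the instance is no longer part of the cache key, and the cache is shared across instances. A quick check (sleep removed so it runs instantly):

```python
from functools import cache


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    @staticmethod
    @cache
    def think_about_math(n: int) -> int:
        return n * 2


a = Person("bob", 99)
b = Person("alice", 42)
a.think_about_math(21)
b.think_about_math(21)  # cache hit: no instance is part of the key
print(Person.think_about_math.cache_info().hits)  # 1
```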

Another incorrect design is decorating with cache instead of lru_cache. Applying cache to any function or method causes its inputs to live until cache_clear is called. The cache decorator translates to lru_cache(maxsize=None).
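For comparison, a bounded lru_cache evicts its oldest entries (and drops its references to them) once maxsize is reached. A minimal demo with a made-up function:

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def double(n: int) -> int:
    return n * 2


for i in range(5):
    double(i)

info = double.cache_info()
print(info.currsize, info.misses)  # 2 5
```

Only the two most recently used entries survive; everything older has been evicted and is eligible for garbage collection.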

Linking follow-up GitHub issue for future context:

As a side note, this scenario is used in the tutorial of memray (a memory profiler) as a hard-to-find memory “leak”:

https://bloomberg.github.io/memray/tutorials/3.html

It is easy to find, because cache is documented to accumulate entries without bound. This is easy to recognize with any callable, not just with methods: everywhere it occurs, the inputs and outputs live until the cache is cleared.

Almost all other memory leaks are harder to find, since almost anything can create a reference to an object. At least cache is clear about what it does, and it is easy to fix: replace cache with lru_cache and an explicit size limit.

I didn’t mean to get into the discussion of whether or not it’s particularly easy or hard - my bad for saying that.

I do still think this behavior is unexpected. When I showed it to beginners, they expected a new cache object to be created for each instance. I think that’s more intuitive.

This is memray’s solution to the problem:

class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age
        self.think_about_math = cache(self._think_about_math)

    def _think_about_math(self, n: int) -> int:
        time.sleep(n)
        return n*2
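With this per-instance cache, the only path from the cache back to the instance is a reference cycle (instance → cached wrapper → bound method → instance), which the cyclic garbage collector can break. A self-contained check (sleep removed so it runs instantly):

```python
import gc
import weakref
from functools import cache


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age
        # Per-instance cache: it dies together with the instance
        self.think_about_math = cache(self._think_about_math)

    def _think_about_math(self, n: int) -> int:
        return n * 2


p = Person("bob", 99)
ref = weakref.ref(p)
p.think_about_math(2)
del p
gc.collect()  # collects the instance <-> cache reference cycle
gone = ref() is None
print(gone)  # True: the instance, and its cache, are gone
```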