I will use this class for the examples:
import time
from functools import cache


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    @cache
    def think_about_math(self, n: int) -> int:
        time.sleep(n)
        return n * 2
Let's say I have a long-running script that creates and deletes Person instances:
def foo(count: int) -> None:
    for n in range(count):
        p = Person(name="bob", age=99)
        p.think_about_math(n)


foo(5)
I would expect that after the function returns, all the Person instances are deleted and therefore their cached results are deleted too, but this is not the case.
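A minimal sketch of how this can be observed, assuming CPython 3.9+ and with the time.sleep call left out so it runs instantly. It relies on cache_info(), which functools.cache exposes on the wrapped function; the cache lives on the function stored on the class, so it outlives every instance created inside foo:

```python
from functools import cache


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    @cache
    def think_about_math(self, n: int) -> int:
        # time.sleep(n) is omitted here so the sketch runs instantly
        return n * 2


def foo(count: int) -> None:
    for n in range(count):
        p = Person(name="bob", age=99)
        p.think_about_math(n)


foo(5)

# The cache is attached to the function on the class, not to an instance,
# so it is still populated after foo() has returned.
info = Person.think_about_math.cache_info()
print(info.currsize)  # 5
```

Each of the five entries is keyed by a different (self, n) pair, so all five Person instances remain reachable through the cache.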
If I execute this code:
import gc

p = Person("bob", age=99)
p.think_about_math(2)
print(len(gc.get_referrers(p)))
2
There are 2 referrers for that object even though I created only 1 reference, p. So when I drop p, the instance stays alive because something else still references it.
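One way to check this is with a weak reference, which does not keep its target alive. The sketch below assumes CPython 3.9+ and drops the time.sleep call; cache_clear() is the documented way to empty a functools.cache:

```python
import gc
import weakref
from functools import cache


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    @cache
    def think_about_math(self, n: int) -> int:
        return n * 2


p = Person("bob", age=99)
p.think_about_math(2)

alive = weakref.ref(p)  # a weak reference does not keep p alive
del p
gc.collect()
survived_del = alive() is not None  # the cache key still holds the instance

Person.think_about_math.cache_clear()  # drops every cached key, including self
gc.collect()
freed_after_clear = alive() is None

print(survived_del, freed_after_clear)  # True True
```

The instance survives `del p` and is only released once the cache entry that references it is gone.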
Why?
All this time I was under the impression that cache computes a hash of the passed arguments (including self) and stores only that hash in a dict, with the function result as the value.
But it turns out that Python stores all the function arguments in a tuple (which is hashable) and uses the tuple object itself as the key in the dict.
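The difference between the two designs can be sketched as follows. This is only an illustration, not a claim about why CPython is designed this way: hash_only_cache and double_hash_only are hypothetical names, and the example relies on the CPython detail that hash(-1) and hash(-2) are both -2. A dict keyed only by the hash cannot tell such arguments apart, whereas a dict keyed by the arguments themselves falls back to an equality check:

```python
from functools import cache

# Hypothetical "hash only" cache: the dict key is hash(args), not args.
hash_only_cache: dict[int, int] = {}


def double_hash_only(n: int) -> int:
    key = hash((n,))  # only the hash is kept, the argument itself is dropped
    if key not in hash_only_cache:
        hash_only_cache[key] = n * 2
    return hash_only_cache[key]


@cache
def double(n: int) -> int:
    return n * 2


# In CPython, -1 and -2 hash to the same value.
same_hash = hash((-1,)) == hash((-2,))

first = double_hash_only(-1)   # computes and stores -2
second = double_hash_only(-2)  # collides with the entry for -1

print(same_hash, first, second)  # True -2 -2
print(double(-1), double(-2))    # -2 -4
```

Keeping the key objects lets the dict resolve collisions correctly, at the cost of holding strong references to every argument.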
Here is the function that builds the tuple to store in the cache dict:
https://github.com/python/cpython/blob/main/Lib/functools.py#L464
At the end, the _make_key function returns:
return _HashedSeq(key)
which is just an object that inherits from list and implements __hash__. Our instance reference is in that list, which is why the reference count of the object jumps to 2, and why the instance is never freed while its entry stays in the cache.
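To make the idea concrete, here is a minimal sketch of such a key object. HashedSeq below is a hypothetical stand-in written for illustration, not the actual functools._HashedSeq, and cache_dict is just a plain dict playing the role of the cache:

```python
import gc


class HashedSeq(list):
    """Hypothetical stand-in: a list that caches the hash of its items."""

    __slots__ = ("hashvalue",)

    def __init__(self, items: tuple) -> None:
        self[:] = items
        self.hashvalue = hash(items)

    def __hash__(self) -> int:
        return self.hashvalue


class Person:
    pass


p = Person()
key = HashedSeq((p, 2))  # roughly what a call like p.method(2) would produce
cache_dict = {key: 4}

# The key holds a strong reference to the instance itself.
holds_instance = key[0] is p
key_is_referrer = any(r is key for r in gc.get_referrers(p))

# An equal key built later finds the same entry (same hash, equal contents).
looked_up = cache_dict[HashedSeq((p, 2))]

print(holds_instance, key_is_referrer, looked_up)  # True True 4
```

As long as cache_dict keeps key, p cannot be garbage collected, which matches the extra referrer seen earlier.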
The same thing happens in the C implementation of functools:
https://github.com/python/cpython/blob/main/Modules/_functoolsmodule.c#L858
The return value is a Python object (in this case a tuple) that stores all our references:
# snippet
key = PyTuple_New(key_size);
Why does it return a Python object instead of the hash itself? I don't know. Maybe someone can explain? Should it be changed?