Memoizing methods considered harmful

I recently discovered that memoizing a class’s methods using functools.cache in the obvious way has the unwanted side-effect that instances of that class will never be garbage-collected.

Here is a small example:

import functools
import objgraph   # pip install objgraph
import gc

class Foo:
    # This cache is held somewhere in global memory, and prevents Foo objects from being reaped
    @functools.cache
    def func(self, key):
        return key + key

class Bar:
    def __init__(self):
        # This cache dies when Bar dies, so it doesn't prevent reaping
        self._cache = {}

    def func(self, key):
        if key not in self._cache:
            self._cache[key] = key + key
        return self._cache[key]

for i in range(10):
    Foo().func(i)
    Bar().func(i)

gc.collect()

print(f"Foo: {objgraph.by_type('Foo')}")  # 10 objects
print(f"Bar: {objgraph.by_type('Bar')}")  #  0 objects

I have tested this in Python 3.9 and 3.10 with identical results.

We figured this out at my $work when a “memory leak” was observed (behavior: memory grows without bound as Foo objects are created and destroyed; whether you want to call this a memory leak is debatable, but the end result is the same), and we eventually narrowed it down to this behavior of functools.cache. In our case, the result was the straightforward outcome of a typical memory leak: eventually we ran out of memory and crashed. Other unintended effects could also occur, such as finalizers not running at the expected time.

Now, to be fair, functools.cache does not advertise that it’s okay to decorate methods with it - it only uses the word “function” in its documentation. So maybe this use case was never expected to work.
But since methods are “just” a specific type of function (one that knows its associated self, IIUC), one could expect that it’s okay to use functools.cache on them, too.

So I would argue that the principle of least surprise, when a method is decorated with functools.cache, is that the cache’s lifespan should be the same as the method’s lifespan, which is the same as the object’s lifespan, and that adding a cache should not affect the object’s lifespan.

This behavior has been observed before, for example in this Stack Overflow thread: “Python functools lru_cache with instance methods: release object”.

I propose that functools.cache be amended so that when it decorates an instance method, it doesn’t have this behavior of altering the lifespan of that instance. How do people feel about that?

I have not attempted to create a solution yet. I’ve only looked briefly at the functools.cache code to see where the cache lives, which I kind of expected to be in some kind of global, but that does not seem to be the case in a straightforward way.

The cause seems clear to me: @functools.cache is caching a value to return for a specific combination of arguments to a function; the decorator is being applied to a function and not to a bound method; and self is an argument for a call to that function. Thus, the cache indirectly contains references to that instance, via the objects used for cache keys (to represent argument-combinations).
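The reference chain can be observed directly with a weakref; a minimal sketch showing that the instance survives the deletion of its last external reference:

```python
import functools
import gc
import weakref

class Foo:
    @functools.cache
    def func(self, key):
        return key + key

foo = Foo()
foo.func(1)              # the cache key now includes a reference to `foo`
r = weakref.ref(foo)
del foo
gc.collect()
print(r() is not None)   # True: the cache still holds the instance alive
```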

So I would argue that the principle of least surprise, when a method is decorated with functools.cache, is that the cache’s lifespan should be the same as the method’s lifespan, which is the same as the object’s lifespan, and that adding a cache should not affect the object’s lifespan.

This would make a little sense if the method (Foo().func) were being decorated, but it isn’t. The function (an attribute of the class: Foo.func) is what gets decorated. Methods can’t be decorated until they exist, and they don’t exist until they’re looked up on an instance (at which point they are created, via the descriptor protocol).
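The distinction can be checked interactively; a small sketch of the lookup mechanics described above (plain class, no caching involved):

```python
class Foo:
    def func(self, key):
        return key + key

foo = Foo()
print(type(Foo.func).__name__)        # function: the class attribute is a plain function
print(type(foo.func).__name__)        # method: lookup on the instance creates a bound method
print(foo.func is foo.func)           # False: each lookup creates a new bound-method object
print(foo.func.__func__ is Foo.func)  # True: both wrap the same underlying function
```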

But aside from that, obviously the cache extends the lifetime of the objects used as other parameters for the function. Code like the following will “leak” in the same way:

import functools, itertools

@functools.cache
def func(x):
    return x

for x in itertools.count():
    func(x)  # the cache will map an unbounded number of integers to themselves

because the point of the cache is to remember the result of calling func with varying values of x, in case the same value is tried again later. The only reason this doesn’t register to you as a “leak” is because you intuitively expect func to be long-lived, and the cache to grow without limit (as functools.cache doesn’t implement a limit). But in your example code, Foo.func is not different; it’s just a function.

Given foo = Foo(), doing foo.func(i) in Python is not a single-step process. First foo.func is evaluated, which creates a new callable object every time this code runs, and then that resulting object is called. (The semantics of the call are to delegate to Foo.func - which is a perfectly ordinary function in 3.x - and insert foo at the beginning of the arguments, so that it can fill in the value for the self parameter.)

It is, of course, possible to cache a result of that method lookup:

>>> class Foo():
...     def __init__(self): print(self, 'was created') # for testing
...     def __del__(self): print(self, 'was GCd') # for testing
...     def func(self, v): return v
>>> foo = Foo()
<__main__.Foo object at 0x7fc02112af10> was created
>>> foofunc = foo.func
>>> foofunc(1)
1
>>> del foo # `foofunc` is still usable;
>>> # it holds a reference to the Foo instance internally
>>> foofunc(2)
2
>>> del foofunc
<__main__.Foo object at 0x7fc02112af10> was GCd

And the cached foofunc can be procedurally decorated:

>>> foo = Foo()
<__main__.Foo object at 0x7fc021141f70> was created
>>> from functools import lru_cache
>>> foofunc = lru_cache(foo.func)
>>> foofunc(1)
1
>>> del foo
>>> foofunc(2)
2
>>> del foofunc
<__main__.Foo object at 0x7fc021141f70> was GCd

(I am using 3.8; the behaviour with functools.cache in newer versions should be the same.)

Of course, this creates a separate memoization cache associated with this specific bound method object, which has been explicitly remembered with its own name. It would not be useful, for example, to write code like functools.cache(foo.func)(1), because this would create a temporary cache for that call (along with the temporary bound-method object), call the method, store the result in the temporary cache, return the result, and then throw both the bound-method and the cache away. It would avoid leaking memory, but it would also defeat the purpose of caching.

The sort of behaviour you seem to want does seem like it might be possible using weakrefs, though I imagine there would be significant overhead, and it also isn’t possible to create a weakref to built-in types like int, AFAICT. They’re also just generally really obnoxious to work with.

Do not rely on garbage collection for finalization in Python. The reference-counted GC in CPython is only an implementation detail. with and finally exist for a reason.

It lives within the object created by the decorator, which is a wrapper for the original function (which will generally be stored as a __wrapped__ attribute). In CPython, the implementation appears to depend both on the Python version and on whether an optimized C implementation is available (or else a fallback is used). For example, it might return another function which has the cache as a closure, or the cache might be implemented by C internals not otherwise accessible from Python code.
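Regardless of the internals, the wrapper’s public surface can be inspected; a small sketch using the documented `cache_info()` and `__wrapped__` attributes:

```python
import functools

@functools.cache
def double(x):
    return x + x

double(1)
double(1)                    # second call is a cache hit
double(2)
print(double.__wrapped__)    # the original, undecorated function
print(double.cache_info())   # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
```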

(This is how it works conceptually and semantically, though the interpreter is free to optimize this down for the extremely common case where this “new callable object” is used just for the call and nothing more.)

@functools.cache only works on methods where the instance is hashable anyway – obviously, as it is simply caching a function, and all the arguments must be hashable.

Anyway, this is a limitation of caching for methods, and the issue with “memory leaks” is yet another limitation.

Just spitballing here, but it seems that a memoizer designed specifically for methods could be made to work here – it would keep a cache specifically for that instance, and it would then go away when the instance was no longer referenced. (probably a weakref to the instance).
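A rough sketch of such a memoizer (the decorator name is hypothetical), using a `WeakKeyDictionary` so that each instance’s sub-cache disappears when the instance does — note that it still requires the instance to be hashable and weak-referenceable:

```python
import functools
import weakref

def weak_cached_method(func):
    # hypothetical decorator: one sub-cache per instance, held weakly,
    # so the cache entry dies together with the instance
    caches = weakref.WeakKeyDictionary()

    @functools.wraps(func)
    def wrapper(self, *args):
        try:
            cache = caches[self]
        except KeyError:
            cache = caches[self] = {}
        if args not in cache:
            cache[args] = func(self, *args)
        return cache[args]

    return wrapper

class Foo:
    @weak_cached_method
    def func(self, key):
        return key + key
```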

Could this help with the hashability? Probably not, but maybe? I know in my use cases, I often want to memoize a method on an instance that isn’t changing – and it’s kind of a pain to make it hashable – though how else to keep track of the changes??

Kinda makes me want a general purpose make_this_instance_immutable function of some sort.

functools has singledispatch and singledispatchmethod. I suspect we should have a cachemethod too, to avoid caching the instance. Or, more likely, to use a weak reference for it.

In the meantime you can avoid unbounded caching by giving the cache a max size.
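For example, a bounded cache (assuming an LRU bound of 128 entries is acceptable) caps the memory, though entries still in the cache will keep their instances alive until they are evicted:

```python
import functools

class Foo:
    @functools.lru_cache(maxsize=128)  # bounded: old entries (and the
    def func(self, key):               # instance references they hold)
        return key + key               # are evicted once the limit is hit
```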

Thanks for the reply.

I want to be a bit more clear about my intention. I’m not saying this is a bug in functools.cache, or that I want to understand why it’s having this effect, or even necessarily (but possibly) that I want the behavior to change.

I’m saying that memoizing methods by functools.cache is an unexpectedly (at least to me) bad idea, for reasons that aren’t immediately obvious to a user.

One approach that could work well would be to stick the cache inside self, like a “normal” user implementation inside the method would. Initialize the cache to an empty dictionary if it doesn’t exist yet. Then no weak references would be necessary.
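A sketch of that approach as a decorator (the decorator name and the cache attribute name are hypothetical); the cache lives on the instance itself, so no weak references are needed and the cache dies with the instance:

```python
import functools

def instance_cached_method(func):
    # hypothetical decorator: keep the memoization dict on self,
    # creating it lazily on first call
    attr = f"_cache_{func.__name__}"

    @functools.wraps(func)
    def wrapper(self, *args):
        try:
            cache = getattr(self, attr)
        except AttributeError:
            cache = {}
            setattr(self, attr, cache)
        if args not in cache:
            cache[args] = func(self, *args)
        return cache[args]

    return wrapper

class Foo:
    @instance_cached_method
    def func(self, key):
        return key + key
```

Note this assumes the class allows arbitrary attributes (no `__slots__`) and that the chosen attribute name doesn’t clash, which is exactly the limitation raised later in the thread.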

In the meantime you can avoid unbounded caching by giving the cache a max size.

Not in my case. I’m not using caching to avoid computation cost, I’m using it to ensure that subsequent calls to the method return the same object as the first call. There’s no fixed maximum number of distinct calls that can be known in advance.

If you’re looking for advice, the simplest recommendation is to use lru_cache() instead of cache(). The latter is solely for cases where you really do want to have cache entries kept forever or until you call cache_clear(). Perhaps we should have called it infinitycache to be even more clear :wink:

This is not an informative or useful example because it doesn’t depend on self. The func should be either a plain function (outside of the class definition) or it should be a staticmethod:

class A:
    @staticmethod
    @functools.cache
    def func(key):
        return key + key

As I explained in my previous comment, that’s not a feasible solution in this case because there’s no set limit to how many elements can be in the cache associated to a particular Foo object.

I do think it’s an informative example, because it shows what’s going on, using the simplest possible code. :person_shrugging:

Thanks Steven, that captures the intent nicely.

I had to take a day off from this topic because I’ve been really upset about the way it was handled in the bug tracker. I’m hoping it’s permissible to re-open discussion here about it.

So one of the participants closed the ticket, saying:

I’m going to close this for the time being. Shortly (not right now), I will kick off a new discussion with an outline of the perceived problem, various use cases, and what things can or should be done about it. I’ve been thinking about this for months and have been putting together an analysis.

I don’t think it’s helpful, or considerate, to shut down a conversation because you’ve “been thinking” about it and (I guess?) you don’t want other people talking about it instead of you.

If the intention is to discourage people from trying to collaborate, it’s working. :slightly_frowning_face: I thought the process was working fairly well, I was happy that I might get to help make a contribution, and happy that other people also seemed to see the need for something to help the situation. It seemed like a healthy discussion around different alternatives was starting to take place.

Is there a pecking order, or a chain-of-command here, that I’m not aware of? I’m confused about why one participant can shut the whole process down, but maybe if that person is “in charge of” this topic or something, it explains it? I didn’t think that was the case here, though.

The following was also stated:

Note the python-help post was not useful. The example was for something that didn’t use self at all and could have easily been cached as a staticmethod. Also, it focused on @cache which is explicitly documented to be unbounded. The tone was also presumptuous, aggressive, and mildly alarmist.

I certainly didn’t want to come across as “presumptuous, aggressive, and mildly alarmist” in my comments. But when I re-read the previous comments now, really trying to have an open mind, I still don’t think that’s a fair characterization. And I explained twice why the “unbounded” nature of the cache is not the relevant fact here, so I’m not sure why that’s being focused on again.

Finally, is it honestly helpful to the discussion to tell me that the post that started it is “not useful”? It certainly doesn’t seem very nice.

For the record, the original issue that I raised - that memoizing the results of instance methods can bite you in a serious and non-obvious way - was something that came up in the real world: I and several other people at my organization spent several days trying to analyze why a very large application (involving optimization engines, and threading, and database handles, and web service endpoints, and circular references for gc to handle) was growing without bound. After discovering that it was such an extremely simple cause compared to the complexities of the application, I thought it would be good to help future coders not fall into the same trap. So that’s why I’m here.

I will attempt to follow up on the more technical aspects in subsequent comments.


There exists a linter with a warning devoted to this specific behavior: flake8-bugbear has a rule for it (B019).

This discussion is how I learned of it, and I reviewed my team’s codebase and found two affected places. I’ve updated both and added the lint rule. In both cases, even though the behavior makes sense for cache, it wasn’t something I’d thought about at all.

Edit: I think a pure documentation note would be sufficient. If you know about it, it’s likely doable to adjust your code to put the cache on a different, related function.

@Kenahoo: Note, I don’t have any official role here, but I’ve been engaged with the Python community a LONG time, so I’ll try to smooth things over:

Raymond Hettinger is a core dev, a long time contributor to Python, and very well known and respected in the community. Which you may already know. What you may not know is that while maybe not “official”, developers that contributed particular parts of the stdlib and are continuing to maintain it are, in an informal way, “in charge” of that part of Python. Raymond contributed most of itertools, and I’m pretty sure is the original author of the functools caching functions – so yes, he’s “in charge” of this topic.

I do think there’s been unfortunate language used, but we all have to be very careful not to get offended by tone in internet technical discussions. For my part, I went to take a look at the issue, and was pretty surprised, and dismayed when I saw it had already been closed. However, when I read the closing note, I thought it was fine. I read it as:

“This is an important topic to discuss, and perhaps do something about, but as I was already thinking about it, I want to pause the discussion until I can gather my thoughts and put it all down in an organised way, and then resume the discussion”

Note that that was in a GitHub issue for CPython – the core devs control that forum, and I think it’s intended for discussion around code that’s already in, or ready to go into, the code-base.

Which doesn’t mean you aren’t free to discuss this (and other) topics freely here – as long as we’re all respectful.

I also think the response to your simple example was harsh – it was certainly clear to me that it was a simple-as-you-can-get quick example, not intended to really demonstrate the core issue. But that’s me.

Oh – the “considered harmful” in the title might have been interpreted as alarmist – I read it as a playful take.

Let me see if I can address this concern using a different kind of example. I’m invested in this because it’s actually quite close to the real-world code that inspired this post in the first place.

Consider the following variation:

import functools
import random

class Foo:
    @functools.cache
    def func(self, key):
        """ Assign each key to a random number, unchanging for this instance """
        return random.random()

Like the original example, it doesn’t look like it uses self - this one doesn’t even look like it uses key. The intention is to be 100% equivalent to the following:

class Bar:
    def func(self, key):
        """ Assign each key to a random number, unchanging for this instance """
        if not hasattr(self, '_cache'):
            setattr(self, '_cache', {})
        if key not in self._cache:
            self._cache[key] = random.random()
        return self._cache[key]

For an instance method, I would argue that this type of caching is what “most people” (probably impossible to measure) would expect it to do. Perhaps I’m biasing myself because this is the use case I’m coming from - but I’d say at the very least this is a really common pattern.

Notice that this type of caching is not necessarily just for saving computational cycles - it’s for guaranteeing the contract that given a class instance and a key, the same result will always be returned. I point that out because I feel that some of the conversation assumes a position of “what’s the big deal, you can always just expire objects to prevent memory leaks”. In this case, no, you can’t. That would break the method.

Is this a legitimate use of caching? I say it is, for the same reason it’s a legitimate use case of a dictionary.

Does functools ever declare anything like “caching is only for saving computational cycles, not making interface guarantees, be warned that anything in your caches can be evicted without notice”? Not that I can find. @functools.cache says that it’s “a thin wrapper around a dictionary lookup for the function arguments”, which I take to mean any valid use case for a dictionary is a valid use case for @functools.cache.

Finally, I also assert that, specifically because of the caching (using either what I’m calling the “harmful” behavior WRT methods, or the behavior I’m suggesting would be helpful – it doesn’t matter for this demonstration), the result of my original example actually does depend on self, as the following shows:

import functools

class Baz:
    @functools.cache
    def func(self, key):
        return key + key

print(Baz().func((1, 2)) is Baz().func((1, 2)))  # False
b = Baz()
print(b.func((1, 2)) is b.func((1, 2)))  # True

Thanks Christopher, I appreciate the response, it’s helpful. I try not to get so bothered by these kinds of interactions, but this did really weigh me down.

Sure – but in a way that matters? Checking object identity is a special case – the results would be equal, yes?

If the point you are making is that the cache is getting filled up, yes, it sure is – which is why folks have been saying that perhaps there should be a method-specific cache. But in this case, why not use a staticmethod?

In [17]:  class Baz2:
    ...:     @staticmethod
    ...:     @functools.cache
    ...:     def func(key):
    ...:        return key + key

In [18]: b = Baz2()

In [19]: b.func(4)
Out[19]: 8

In [20]: b2 = Baz2()

In [21]: b.func(4) is b2.func(4)
Out[21]: True

You’ve got two instances, but only one item in the cache.

If the cached method DOES depend on self, then you do need to have it cached including self anyway – and you’ll need a good hash function for your class as well.

But that’s not true, as the example demonstrates: self does not have to be hashable in order for the cached result to depend on it. If I stick a list or another unhashable object into self as a member, the Bar example’s caching continues to work exactly the same, no hashing of self required.

This is simply because instead of putting self in the cache, we’ve put the cache into self, avoiding the need for hashability of self, or weak references to self, or any of that stuff that doesn’t need to enter the picture.

You can simply do this because you’ve written a custom method and you know that self exists and that you can add a _cache attribute without issue.

functools.cache can’t make these same assumptions; it may be decorating a plain function.

Even with a potential method_cache, the class may use __slots__, so the decorator can’t add an arbitrary _cache attribute to store data on the instance – or the name chosen for the _cache attribute may clash with something that already exists.

Making a version that doesn’t make any assumptions about slots or name clashes, similar to the current @cache design, would require hashability for the dictionary key, plus a weakref to prevent the instance’s use as a dictionary key from keeping it alive.

With the current @cache decorator you can quite easily break it by adding an __eq__ method so instances are no longer hashable.

import functools

class Baz:
    @functools.cache
    def func(self, key):
        return key + key

    def __eq__(self, other):
        # defining __eq__ without __hash__ sets __hash__ to None,
        # making instances unhashable
        return self is other

Baz().func((1, 2))
TypeError: unhashable type: 'Baz'

You can do something similar by using a closure, which is probably more straightforward anyway, as it won’t require you to have a hashable self.

class A:
    def func(self):
        def closure(k):
            # your normal code goes here
            return self.someproperty + k
        return closure

You MAY need to wrap the instance with a weakref, which won’t work all the time but should at least work most of the time. I’m not 100% sure the closure will get garbage collected without the weakref, because of the circular reference.

from weakref import ref

class A:
    def func(instance):
        inst = ref(instance)

        def closure(k):
            self = inst()
            if self is None:
                raise Exception("closure called after object was gc'd")
            # your normal code goes here
            return self.someproperty + k
        return closure

The function is then bound to the instance so the cache will share its lifespan with the instance.

I am unsure if this is actually an adaptable pattern for a wrapper, because it requires removing self from the function arguments.