Memoizing methods considered harmful

What I meant by “depends on self” is that the function uses attributes of self – and in that case, you need a way to check whether those attributes are the same in order to reuse a cached result. Usually that check is done via the hash (and equality) of self. If your class is immutable, I guess it doesn’t matter, but that’s a special case that doesn’t hold in general.

import functools

class Mutable:
    def __init__(self, x):
        self.x = x

    @functools.cache
    def method(self, val):
        # depends on self.x, but the cache key is only (self, val)
        return self.x + val

m = Mutable(5)

assert m.method(5) == m.x + 5

m.x = 10

assert m.method(5) == m.x + 5

...
AssertionError                            Traceback (most recent call last)
~/Junk/method_cache.py in <module>
     19 m.x = 10
     20 
---> 21 assert m.method(5) == m.x + 5
     22 
     23 

AssertionError:

If the method doesn’t depend on any attributes of self, then why not make it a staticmethod?
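
For instance, a minimal sketch (Converter and to_celsius are made-up names here) – with staticmethod stacked on top of the cache, self never enters the picture:

import functools

class Converter:
    @staticmethod          # staticmethod on top, the cache underneath
    @functools.cache
    def to_celsius(fahrenheit):
        # uses no attributes of self, so caching is perfectly safe
        return (fahrenheit - 32) * 5 / 9

assert Converter.to_celsius(212) == 100.0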

NOTE: this actually brings up another gotcha with caching methods – if a method DOES depend on attributes of self, and __eq__ was not defined, then you won’t get any errors, but it will very much break when the instance is mutated :frowning:

Probably deserves a note in the docs.

Looking back – this is a special case – it’s pretty rare in Python to care about the identical object rather than an equal value – and I think the general use case of caching is about values. So perhaps just write your own decorator to do this.

Another note: cache works by using the arguments as keys in a dict – which requires hashability. That works great for immutable types: numbers, strings, tuples, etc. But not so much for others.
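
For example (total is a made-up name), a tuple argument caches fine while a list blows up immediately:

import functools

@functools.cache
def total(values):
    return sum(values)

total((1, 2, 3))  # fine: tuples are hashable
total([1, 2, 3])  # TypeError: unhashable type: 'list'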

I suppose you could make a cache that was based on values – keying on a stringification of the object rather than the object itself. And then the cache wouldn’t interfere with garbage collection either.

It would only work for things that stringify cleanly, but it could be documented to work only for objects with an appropriate __repr__.

Here’s a naive, quickie implementation – it works for at least some functions / values :slight_smile:

def reprcache(func):
    cache = {}

    def wrapper(*args):
        # key on the repr of each argument rather than the argument itself,
        # so unhashable-but-printable values (e.g. lists) can be cached too
        key = tuple(repr(arg) for arg in args)
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]

    return wrapper


import time

@reprcache
def a_func(x, y, z):
    time.sleep(1)  # to simulate a slow function
    return x + y + z

@reprcache
def a_mut_func(lst, x):
    time.sleep(1)  # to simulate a slow function
    return [x + i for i in lst]
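
And a quick check that it behaves as advertised – a fresh but equal list has the same repr, so the second call is a cache hit even though the argument is unhashable:

start = time.perf_counter()
print(a_mut_func([1, 2, 3], 10))   # takes ~1 s: computed and cached
print(time.perf_counter() - start)

start = time.perf_counter()
print(a_mut_func([1, 2, 3], 10))   # near-instant: repr matches, cache hit
print(time.perf_counter() - start)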

Well, you could simply say “you can’t do that” – similar to functools.cached_property:
"""
Also, this decorator requires that the __dict__ attribute on each instance be a mutable mapping. This means it will not work with some types, such as metaclasses (since the __dict__ attributes on type instances are read-only proxies for the class namespace), and those that specify __slots__ without including __dict__ as one of the defined slots (as such classes don’t provide a __dict__ attribute at all).
"""
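
A quick illustration of that limitation (Slotted is a made-up name) – the failure only shows up when the property is first accessed, with a TypeError complaining about the missing __dict__:

import functools

class Slotted:
    __slots__ = ("x",)  # no per-instance __dict__, so nowhere to cache

    def __init__(self, x):
        self.x = x

    @functools.cached_property
    def doubled(self):
        return self.x * 2

Slotted(3).doubled  # TypeError: No '__dict__' attribute on 'Slotted' instance ...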

Well, you could use something like __special__name_for_cache__, which would be VERY unlikely to clash – or be defensive on the first call and raise an exception if the name is already there. Or just let it break if it’s there – that would show up pretty fast in tests :slight_smile:
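
A rough sketch of that idea (instance_cache and the attribute name are made up): stash the per-instance cache under the unlikely name, and complain if someone else got there first. A nice side effect is that the cache dies with the instance, so garbage collection is unaffected:

import functools

def instance_cache(method):
    attr = "__special__name_for_cache__"  # very unlikely to clash

    @functools.wraps(method)
    def wrapper(self, *args):
        d = vars(self)
        if attr not in d:
            d[attr] = {}                  # first call: claim the name
        elif not isinstance(d[attr], dict):
            # defensive: the name was already taken by something else
            raise AttributeError(f"{attr!r} is already in use on {self!r}")
        cache = d[attr]
        if args not in cache:
            cache[args] = method(self, *args)
        return cache[args]

    return wrapper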

… but the actual cache implementation respects that contract, because self will be part of the composite key used for the unified cache. I don’t see an issue here.

Nothing is getting evicted from any caches here. That requires functools.lru_cache with a non-None cache size specified. And for that case, the documentation clearly states not only that elements will be evicted from the cache, but the eviction policy. (As a reminder, the lru_ part stands for “least recently used”.)
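
A quick demonstration of that policy (square is a made-up example):

from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    return n * n

square(1); square(2); square(3)    # 1 is now least recently used and evicted
square(1)                          # recomputed: a miss, not a hit
print(square.cache_info())         # CacheInfo(hits=0, misses=4, maxsize=2, currsize=2)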

Your interpretation is correct as far as I can tell.

If you care about object identity, sure. But the claim is that the calculation in the example doesn’t depend on self – which it clearly doesn’t – and Raymond’s point was that this means it should not be an instance method in the first place. You would have the same object-identity issue regardless of whether the cache were per-instance or unified, and would have it with a function independent of the class: after all, Baz().func((1, 2)) will eventually resolve to Baz.func(Baz(), (1, 2)), and clearly the cache cannot just store a precomputed result for all such calls unless it is established that the results of calling Baz() repeatedly are all equal. IOW, what Chris told you.
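
Easy to see with cache_info() – two fresh Baz() instances hash differently (by identity), so each call is a miss, and both instances end up pinned in the cache:

from functools import lru_cache

class Baz:
    @lru_cache(maxsize=None)
    def func(self, arg):
        return sum(arg)

Baz().func((1, 2))
Baz().func((1, 2))
print(Baz.func.cache_info())  # hits=0, misses=2, currsize=2: no sharing between instances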

Hashable does not mean the same thing as immutable. First off, true class immutability isn’t really possible; protection schemes like underscore naming and __getattr__/__getattribute__ hacks can be worked around. More to the point, object.__hash__ exists, and thus class instances are hashable by default. When Python hashes an object it does not attempt any kind of mutability check (there is not any sane way to code this, and the insane ways would probably be imperfect and very slow). It checks for a __hash__ method. That’s it. Simply doing self.mylist = [] is completely irrelevant here.
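
In other words (Plain is a made-up name):

class Plain:
    def __init__(self):
        self.mylist = []          # mutable state...

p = Plain()
h = hash(p)                       # ...but hashable anyway, via object.__hash__
p.mylist.append(1)
assert hash(p) == h               # mutation doesn't change the identity-based hash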

If you made, say, a class that inherits from a non-hashable built-in type, you would hit the limitation:

>>> from functools import lru_cache
>>> class example(list):
...     @lru_cache(None)
...     def computation(self):
...         return 1
... 
>>> example().computation()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'example'

As David points out, it will also break if you define __eq__ without defining __hash__, since object.__hash__ will complain about that. And the lack of implementations of __eq__ and __hash__ is why you still get duplicate cache entries - Python can’t determine that they’re duplicates, because they hash differently (based on object identity) and wouldn’t compare equal.
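
The __eq__-without-__hash__ case fails the same way (Example2 is a made-up name), because defining __eq__ implicitly sets __hash__ to None:

from functools import lru_cache

class Example2:
    def __eq__(self, other):
        return isinstance(other, Example2)
    # no __hash__ defined, so __hash__ is implicitly set to None

    @lru_cache(None)
    def computation(self):
        return 1

Example2().computation()  # TypeError: unhashable type: 'Example2'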

Incorrect on both counts. We’ve put the cache into a closure around the function (created by a decorator), which in turn is in the class (not the instances). We’ve also put self into an object that is used as a key for the cache.

Here is a new answer to the SO question above that fixes this issue: caching - Python functools lru_cache with instance methods: release object - Stack Overflow.
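
For reference, the usual shape of that fix (a sketch of the weakref approach, not necessarily the linked answer verbatim) is to key the shared cache on a weak reference to self, so the cache no longer keeps instances alive:

import functools
import weakref

def weak_lru(maxsize=128):
    def decorator(method):
        @functools.lru_cache(maxsize)
        def _cached(self_ref, *args):
            return method(self_ref(), *args)

        @functools.wraps(method)
        def wrapper(self, *args):
            # weakrefs to the same live object hash/compare like the object,
            # but don't prevent it from being garbage collected
            return _cached(weakref.ref(self), *args)
        return wrapper
    return decorator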
