Adding a method caching decorator to functools

In my company’s project, we needed isolated caches for each instance. After researching existing implementations, we adopted cached_method.

Implementation: cached_method/cached_method.py at main · martenlienen/cached_method · GitHub

The cached_method behaves as follows:

  1. The decorated method generates a cache instance upon the first access, similar to a cached property.

  2. The cache instance is created by wrapping the bound method with lru_cache, meaning that self is not included in the cache key.

  3. When wrapping the bound method with lru_cache, weak references are used to prevent circular references. However, even without using weak references, memory leaks do not occur.

Our project is a web API server that utilizes multithreading. When an instance occupied by a thread uses cached_method, its cache is completely isolated from other threads too. This design is simple and robust, and since the cache is released immediately when the instance is released, there is little concern about worsening memory pressure.

3 Likes

I’d looked at doing something similar, there are a couple of downsides for something intended as general use.

  1. Instances become unpicklable as the lru_cache wrapped function stored on the instance isn’t picklable. This can be worked around, but I’m not sure if there’s a way to preserve the cache while doing so?

  2. This implementation doesn’t work for slotted classes. This is the same as cached_property but is something people have already brought up as desirable.

1 Like

Reading various comments about the implications of a shared cache, with weakrefs, the one thing that troubles me is that when an object is GCed and the weakref is removed, other objects matching that cache entry would experience a cache miss. So! Although I think that is solvable, let’s think about how we’d do a method cache per instance.

I agree that this is the simplest, easiest implementation. It also requires that we define what attribute name gets used, at least by default, and how it gets initialized.

It’s actually pretty easy to use one dict as a per-instance cache, if you make the method names part of the cache keys.

Picking a single attribute name for all method caches – e.g., self.__method_cache__ – makes things easier for explicit __slots__ usage as well.

My main question here would be when do we expect the cache to be created?

I was considering suggesting a descriptor – and I see that’s how the cached_method implementation @methane linked works. It’s very similar to what cached_property does as well.[1]

If we have to have a lazily evaluated descriptor which updates instance.__dict__ in order to initialize the cache, why wouldn’t we want to use that opportunity to replace the decorated method with a caching variant, just like cached_property and the suggested cached_method implementation do? I can’t see a significant advantage to having a separate attribute per-instance which stores the same information and needs the same late-initialization behavior.


  1. I had no idea about this particular implementation. I was wondering if it was possible to do something like it, and evidently it is! ↩︎

2 Likes

The main problem with caching on __dict__ is that it won’t work with __slots__ (even if '__dict__' is included), since __slots__ creates its own descriptor:

>>> class C:
...     __slots__ = ('f',)
...     def f(self): ...
...     
Traceback (most recent call last):
  File "<python-input-18>", line 1, in <module>
    class C:
        __slots__ = ('f',)
        def f(self): ...
ValueError: 'f' in __slots__ conflicts with class variable

Granted, this is already the case for cached_property, but I’m not sure we’d want to impose the same restriction on cached_method.

(In fact, if cached_method does work with slots, we could recommend composing it with @property as a workaround to cached_property.)

If __dict__ is included then you wouldn’t create the slot for ‘f’. You can do this for cached_property already, although you lose the benefit of not being able to assign arbitrary attributes.

>>> from functools import cached_property
... 
... class C:
...     __slots__ = ("__dict__",)
...     @cached_property
...     def f(self):
...         return 42
...         
>>> c = C()
>>> c.f
42
>>> c.__dict__
{'f': 42}
>>> c.n = "Dent"
>>> c.__dict__
{'f': 42, 'n': 'Dent'}

If you’re constructing the class with something like @dataclass it’s possible to instead wrap the slot descriptor in another descriptor and avoid the need for __dict__, I do this with my classbuilder although unfortunately attribute access is then slower than just creating __dict__[1].

>>> from functools import cached_property
>>> from ducktools.classbuilder.prefab import Prefab
>>> 
>>> class C(Prefab):
...     @cached_property
...     def f(self):
...         return 42
...   
>>> c = C()
>>> c.f
42
>>> C.__slots__
{'f': None}
>>> c.__dict__
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    c.__dict__
AttributeError: 'C' object has no attribute '__dict__'. Did you mean: '__dir__'?

  1. Also slower than the workaround __getattr__ logic attrs currently uses but it works with super(). ↩︎

If there is a desire to add this, then I’d like to suggest my implementation from this cpython issue, which notably:

  • works on self types without a __dict__;

  • works on self types that are not hashable;

  • should be reasonably efficient as it avoids re-hashing the runtime arguments;

  • does not rely on descriptors, so it will cache even when called via SomeClass.some_cached_method(self, *args, **kwargs), not just self.some_cached_method(*args, **kwargs);

  • is static typing compatible.

(On the conversation here: I’m -1 on implementations that mutate self, which is an approach that I find interacts very poorly with dataclasses, immutable types, pickling, etc.)

3 Likes

This is pretty neat, the callback on the weakref as part of the value was something I hadn’t considered[1].

I’d suggest reusing functools._make_key over having different caching logic between this and @cache both for performance and consistency.

I understand the goal is to improve cache hits if the same arguments are given in different forms but from brief testing it seems like the additional inspect.signature based argument management for _CacheKey makes both attribute retrieval and class creation slower.

Timing 'WeakIdDict with _CacheKey':
Class creation for 100000 classes: 0.887s
Attribute retrieval 100000 times: 0.210s

Timing 'WeakIdDict with _make_key':
Class creation for 100000 classes: 0.371s
Attribute retrieval 100000 times: 0.033s

That’s not really an argument against descriptors, just an argument that descriptors should also handle this correctly.


  1. To be fair I’d forgotten the callback existed, I don’t tend to use weakref often. ↩︎

1 Like

Thanks, I’m glad you like it!

No strong feelings on the points you raise! E.g. agreed that if this was merged into functools then it would make sense to be consistent on key creation.

Okay, I just got a couple of spare hours to play around with the available prior art. Again, thanks all for the various pointers!

@patrick-kidger’s version was a good place for me to start hacking around, but I think inspect usage is a no-go: as @DavidCEllis noted it’s a performance killer. My current preference, of the various versions I’ve tried, would be for the following:

def _make_callback(cache_dict, id_key):
    def callback(ref):
        cache_dict.pop(id_key)
    return callback


def _make_cached_func(ref, unbound_method, maxsize, typed):
    @functools.lru_cache(maxsize, typed)
    def wrapped(*args, **kwargs):
        return unbound_method(ref(), *args, **kwargs)
    return wrapped


def cached_method(maxsize=None, typed=False):
    def decorator(func):
        per_instance_cache = {}

        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            self_id = id(self)

            try:
                ref, func_using_ref = per_instance_cache[self_id]
            except KeyError:
                ref = weakref.ref(self, _make_callback(per_instance_cache, self_id))
                func_using_ref = _make_cached_func(ref, func, maxsize, typed)
                per_instance_cache[self_id] = ref, func_using_ref

            return func_using_ref(*args, **kwargs)

        return wrapper
    return decorator

(I’ve omitted types, as a matter for typeshed rather than stdlib.)

This works on slotted classes just fine, as long as they are weak-reference-able. Hashability is not required, and the cache does not get pickled (I consider this good).

Because this operates on the unbound function, not the bound method, it’s simpler than several other versions. Keying by id but mapping this back to a weakref ensures that each instance gets a unique cache – having had additional time to consider it, that seems to me like necessary behavior to avoid confusing people.

There are only two issues I could find. (1) it’s a little slower than I’d like, taking around 100ns of overhead on my machine to use a cached_method vs a similar lru_cache-wrapped function. (2) we don’t have access to cache_clear() or cache_info() for the method – providing this would require a descriptor.[1]


The biggest missing ingredient here is buy-in from the core-devs who look after functools.

After or during PyCon, I may try to figure out how to get a clearer idea of what they would consider the minimum requirements for an implementation of functools.cached_method.


  1. I’m leaving it out for a start, with the thought that I could add it later if the rest of this seems solid. ↩︎

3 Likes

Out of curiosity, why this rather than either a lambda function or defining the callback where it’s used? I only see _make_callback being used in one place, unless there’s somewhere that I’m missing.

I really like this version. For whatever it’s worth, +1 from me on this.

1 Like

A lambda works for this too – I’ll refrain from changing it this minute but I have no attachment to that detail. :slightly_smiling_face:

I’ve hurt myself a couple of times on the sharp corners around lambdas and scoping/name binding. So I started to avoid them by habit for deferred work like callbacks.

1 Like

Yeah, that’s fair! Having a def function right at the point of definition is also valid, if you want an alternative. But whichever way will work.

1 Like

A much simpler (and arguably more elegant) approach that doesn’t require the instance to be weakref-able or hashable is to simply make the id of the instance and the arguments the key to the cache.

Here’s a minimal implementation for easy illustration:

def cached_method(func):
    cache = {}
    def wrapper(self, *args, **kwargs):
        key = id(self), args, tuple(kwargs.items())
        if key in cache:
            return cache[key]
        cache[key] = value = func(self, *args, **kwargs)
        return value
    return wrapper

so that:

class Foo:
    @cached_method
    def bar(self):
        print(f'called for {id(self)}')
        return id(self)

foo1 = Foo()
foo2 = Foo()
print(foo1.bar())
print(foo1.bar())
print(foo2.bar())
print(foo2.bar())

outputs:

called for 1983853619456
1983853619456
1983853619456
called for 1983854053008
1983854053008
1983854053008

Replace {} with an LRU dict and a deleted instance would simply age out of the cache organically without the overhead of any reference management. The only downside is that there has to be a maximum size to the LRU dict for the automatic cleanup to happen.

IDs can be reused, entries need to be removed when the object is deleted. Otherwise a new instance can retrieve cached entries from a previous instance.

With your implementation I get this for example:

>>> @dataclass
... class Example:
...     a: int
...     @cached_method
...     def get_a(self):
...         return self.a
...         
>>> def get_ex():
...     for i in range(1000):
...         ex = Example(i)
...         ex.get_a()
...     return ex
...     
>>> ex = get_ex()
>>> ex.get_a()
27
>>> ex.a
999

5 Likes

Ah good point. Never mind the naive approach then.

IMO it’s always good to at least explore the naive approach. Every piece of complexity needs to be justifiable, usually as “the naive approach fails under this circumstance”.

4 Likes