_lru_cache_wrapper should be optimized for argument-less functions

functools.cache (or lru_cache) on argument-less functions is a convenient way to create lazy globals when “enforced” singleton behavior is undesirable (which I think is most of the time) or difficult (a value is being cached, not an instance of some class, and adding a wrapper singleton just for that purpose is generally overkill, if it’s desirable at all).
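For illustration (the names here are made up), a zero-argument cached function gives you a lazily-initialised global without any singleton machinery:

```python
from functools import cache

@cache
def settings():
    # Runs at most once, and only on first use; every later call
    # returns the same cached dict directly.
    print("loading settings...")
    return {"debug": False, "retries": 3}

# First call initialises, later calls reuse the same object.
assert settings() is settings()
```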

However, both (and especially functools.lru_cache) are quite sub-optimal: by definition, a 0-argument function can only ever have a single cache entry, at the key (), yet by default lru_cache will do the whole LRU dance (including locking) on each access. cache (i.e. lru_cache(None)) skips the LRU part but still has a fair amount of overhead. Nor is this intuitive: I’ve seen people use lru_cache(1) on argument-less functions, which could not be less useful.

I think that if user_function takes no arguments and maxsize is non-zero, the wrapper should avoid all mapping interactions entirely: it should only need to check whether the cached value has been initialised (or reset), return it, and increment the relevant counter.
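A minimal sketch of what such a fast path could look like, written as a standalone decorator (the names and structure here are mine, not the actual CPython implementation, and thread-safety of the miss path is glossed over):

```python
from functools import wraps

_UNSET = object()  # sentinel: "not initialised (or reset)"

def _nullary_cache(user_function):
    """Cache for a 0-argument function: no dict, no key, no LRU links."""
    result = _UNSET
    hits = misses = 0

    @wraps(user_function)
    def wrapper():
        nonlocal result, hits, misses
        if result is _UNSET:
            misses += 1
            result = user_function()
        else:
            hits += 1
        return result

    def cache_info():
        return hits, misses

    def cache_clear():
        nonlocal result, hits, misses
        result = _UNSET
        hits = misses = 0

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper
```

Each call is just one sentinel comparison and a counter increment, versus a tuple key build, a dict lookup, and linked-list bookkeeping in the general wrapper.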

6 Likes

Yes, that seems useful! I’d encourage you to implement your suggestion, run some benchmarks to demonstrate the speedup, and submit a PR.

5 Likes

I think that if user_function takes no arguments and the maxsize is non-zero, it should avoid all mapping interactions entirely

How can you reliably detect the number of arguments that user_function accepts?

That seems like a different, but related, function. Maybe it should be @memoize ?

2 Likes

@Rosuav Do you mean a decorator just for the special case of nullary functions? Then @memoize would be a misnomer.

Yes

Sure. I’m not sure what WOULD be a good name, but I don’t think this is really an optimization of lru_cache, it’s separate enough to be its own thing.

FWIW, we have a @cache_once helper for that exact purpose at work. I agree that a dedicated function would be better than (ab)using lru_cache!
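For reference, such a helper only takes a few lines. This is a hypothetical sketch (not the actual helper mentioned above), using double-checked locking so the wrapped function runs at most once even under concurrent first calls:

```python
import threading
from functools import wraps

_UNSET = object()  # sentinel distinguishing "no result yet" from None

def cache_once(user_function):
    result = _UNSET
    lock = threading.Lock()

    @wraps(user_function)
    def wrapper():
        nonlocal result
        if result is _UNSET:          # fast path: lock-free once initialised
            with lock:
                if result is _UNSET:  # re-check under the lock
                    result = user_function()
        return result

    return wrapper
```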

1 Like

Timings do not show “a fair amount of overhead”.

def f():
    pass

@cache
def g():
    pass

Calling f in timeit gives 21.4 nsec per loop. Calling g gives 19.7 nsec per loop. The cached version is already cheaper than calling an empty function. There’s not much fat to cut off here; this is already very cheap.

Also, I’m unclear why this would ever matter in a real use case. Have you seen tight loops that call a constant function over and over again, but don’t do anything interesting with the result? Shouldn’t the code hoist the constant function call out of the loop? I can’t think of any circumstances where cutting this from 21 nsec to perhaps 15 nsec would make a difference to the calling code.
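i.e., on the rare occasion those nanoseconds do matter, the caller can hoist the call itself (a made-up example, of course):

```python
from functools import cache

@cache
def scale_factor():
    return 2.5

values = range(1_000_000)

# Paying the wrapper overhead on every iteration...
total_slow = sum(scale_factor() * v for v in values)

# ...versus hoisting the constant call out of the loop:
k = scale_factor()
total_fast = sum(k * v for v in values)

assert total_slow == total_fast
```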

3 Likes

With what Python version, and exactly how did you measure? For me, f is much faster than g.

Simple way:

f 31.2 ns
g 48.0 ns
f 30.5 ns
g 47.3 ns
f 30.0 ns
g 47.7 ns

Python: 3.13.0 (main, Nov  9 2025, 11:53:23) [GCC 15.2.1 20250813]
code for above
from functools import cache
import timeit, sys

def f():
    pass

@cache
def g():
    pass

for h in [f, g] * 3:
    t = min(timeit.repeat(h)) * 1e3
    print(h.__name__, f'{t:.1f} ns')

print('\nPython:', sys.version)

Attempt This Online!

Reduced timeit-overhead:

f 23.1 ns
g 40.4 ns
f 23.3 ns
g 40.0 ns
f 22.8 ns
g 38.9 ns

Python: 3.13.0 (main, Nov  9 2025, 11:53:23) [GCC 15.2.1 20250813]
code for above
from functools import cache
import timeit, sys

def f():
    pass

@cache
def g():
    pass

for h in 'fg' * 3:
    t = min(timeit.repeat(f'{h}();' * 25, 'from __main__ import f, g', number=40000)) * 1e3
    print(h, f'{t:.1f} ns')

print('\nPython:', sys.version)

Attempt This Online!