_lru_cache_wrapper should be optimized for argument-less functions

functools.(lru_)cache on argument-less functions is a convenient tool for creating lazy globals when “enforced” singleton behavior is undesirable (which I think is most of the time) or difficult (what is cached is a value, not an intrinsic type, so adding a wrapper singleton just for that purpose is generally overkill, if it’s desirable at all).

However, both, and especially functools.lru_cache, are quite sub-optimal: by definition, a 0-argument function can only have a single cache entry, at the key (), but by default lru_cache will do the whole LRU dance (including locking) on each access. cache (i.e. lru_cache(None)) skips the LRU part but still has a fair amount of overhead. And this does not seem intuitive, as I’ve seen people use lru_cache(1) on argument-less functions, which could not be less useful.

I think that if user_function takes no arguments and the maxsize is non-zero, it should avoid all mapping interactions entirely: it should only need to check whether it has been initialised (or reset) and increment the relevant counter.
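
A minimal sketch of that idea, in pure Python and under assumptions that are not part of the proposal itself: the name `cache_nullary` is hypothetical, `cache_info()` is simplified to a plain `(hits, misses)` tuple, and no locking is done. It only illustrates the logic (one sentinel check plus a counter increment, no dict or key); a pure-Python wrapper like this is not expected to beat the C implementation of lru_cache.

```python
import functools

_UNSET = object()  # sentinel, so that a cached None is not mistaken for "empty"


def cache_nullary(user_function):
    """Hypothetical decorator: cache the one result of a zero-argument function."""
    value = _UNSET
    hits = misses = 0

    @functools.wraps(user_function)
    def wrapper():
        nonlocal value, hits, misses
        if value is _UNSET:      # not initialised yet, or reset by cache_clear()
            misses += 1
            value = user_function()
        else:
            hits += 1
        return value

    def cache_clear():
        nonlocal value, hits, misses
        value = _UNSET
        hits = misses = 0

    def cache_info():
        # Simplified: a plain (hits, misses) tuple, not functools' CacheInfo.
        return hits, misses

    wrapper.cache_clear = cache_clear
    wrapper.cache_info = cache_info
    return wrapper


calls = []


@cache_nullary
def settings():
    calls.append("computed")
    return {"debug": True}


first, second = settings(), settings()  # the body of settings() runs only once
```

The sentinel object matters because the wrapped function may legitimately return None, which still has to count as a cached value.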

Yes, that seems useful! I’d encourage you to implement your suggestion, run some benchmarks to demonstrate the speedup, and submit a PR.

I think that if user_function takes no arguments and the maxsize is non-zero, it should avoid all mapping interactions entirely

How can you reliably detect the number of arguments that user_function accepts?
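
One possible approach, sketched here with a hypothetical helper name `takes_no_arguments`, is to ask `inspect.signature` and treat anything uncertain as "may take arguments". This is conservative rather than fully reliable: some callables expose no signature at all, and a signature such as `(*args)` or `(x=1)` can be called with no arguments but still accepts them.

```python
import inspect


def takes_no_arguments(func):
    """Hypothetical helper: True only if the signature is known to be empty."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        # No introspectable signature (e.g. some builtins): assume arguments.
        return False
    # *args, **kwargs and defaulted parameters all count as parameters here,
    # so only a truly empty signature qualifies.
    return len(sig.parameters) == 0


def nullary():
    pass


def variadic(*args):
    pass


def defaulted(x=1):
    pass


checks = [takes_no_arguments(f) for f in (nullary, variadic, defaulted)]
```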

That seems like a different, but related, function. Maybe it should be @memoize ?

@Rosuav Do you mean a decorator just for the special case of nullary functions? Then @memoize would be a misnomer.

Yes

Sure. I’m not sure what WOULD be a good name, but I don’t think this is really an optimization of lru_cache, it’s separate enough to be its own thing.

FWIW, we have a @cache_once helper for that exact purpose at work. I agree that a dedicated function would be better than (ab)using lru_cache!

Timings do not show “a fair amount of overhead”.

from functools import cache

def f():
    pass

@cache
def g():
    pass

Calling f in timeit gives 21.4 nsec per loop. Calling g gives 19.7 nsec per loop. The cached version is already cheaper than calling an empty function. There’s not much fat to cut here; this is already very cheap.

Also, I’m unclear why this would ever matter in a real use case. Have you seen tight loops that call a constant function over and over again, but don’t do anything interesting with the result? Shouldn’t the code hoist the constant function call out of the loop? I can’t think of any circumstances where cutting this from 21 nsec to perhaps 15 nsec would make a difference to the calling code.

With what Python version, and exactly how did you measure? For me, f is much faster than g.

Simple way:

f 31.2 ns
g 48.0 ns
f 30.5 ns
g 47.3 ns
f 30.0 ns
g 47.7 ns

Python: 3.13.0 (main, Nov  9 2025, 11:53:23) [GCC 15.2.1 20250813]
Code for the above:
from functools import cache
import timeit, sys

def f():
    pass

@cache
def g():
    pass

for h in [f, g] * 3:
    t = min(timeit.repeat(h)) * 1e3
    print(h.__name__, f'{t:.1f} ns')

print('\nPython:', sys.version)

Reduced timeit-overhead:

f 23.1 ns
g 40.4 ns
f 23.3 ns
g 40.0 ns
f 22.8 ns
g 38.9 ns

Python: 3.13.0 (main, Nov  9 2025, 11:53:23) [GCC 15.2.1 20250813]
Code for the above:
from functools import cache
import timeit, sys

def f():
    pass

@cache
def g():
    pass

for h in 'fg' * 3:
    t = min(timeit.repeat(f'{h}();' * 25, 'from __main__ import f, g', number=40000)) * 1e3
    print(h, f'{t:.1f} ns')

print('\nPython:', sys.version)

It’s worth noting that the locking is needed even in the case of 0 argument functions due to threads + nondeterministic functions being cached.

If someone is using this for lazy globals, as in your use case, and the cached value is, say, an ID created once by a database to tie together everything done by this invocation of the application, there needs to be one and only one outcome, not two parallel outcomes where one replaces the other because both calls happened to be cache misses.

The example isn’t perfect, but the locking is part of the semantics of this, and people can reasonably rely upon it, so you can’t remove it. You can write your own function that doesn’t do it if you don’t need it, and also drop anything else the functools caches do, like attaching cache statistics.

Why are nondeterministic functions being cached?

The example there is a slight modification of one I’ve seen in real code, granted in the real code it wasn’t done using functools cache, and it was done in a way where it was sure to not be called from other threads.

We don’t have a guarantee of determinism in Python. We have behaviors that people can analyze to reason about the level of determinism offered, so inevitably there will be uses where people rely on properties of a function that other people feel are inappropriate to rely upon for that use case. Whether you agree with that use or not isn’t relevant here. Changing those properties would break those users, and it would not be their fault for relying on how the function actually behaves.

I don’t understand your point though. This thread is about functools.cache and how it behaves with a zero-argument function. By definition, if you’re caching the result, you don’t expect it to change. And if the function is nondeterministic, then you’re getting a random selection from its possible outputs.

No, if you’re caching the result of a nondeterministic function, you are choosing to use one consistent one of its possible outputs for the duration it is cached.

That’s a behavior I’ve seen relied upon in real world code, and the example I gave in the same post you’re questioning even gives an example of how that can be useful. I’ve seen more complex versions of this with examining what happens with certain systems under every possible initial state, but that example is also going to be a practical vs purity argument on if it should be supported vs what it actually does currently.

The purity argument for not caching such a function doesn’t apply here, because that’s not the semantics this has ever had. Assuming everyone prefers that purity over a reasonable practical application of the existing behavior is how things get broken.

What I mean is, you are now saying “once this has been done once, it is now not going to change”. That is the entire point of caching. Which means that it has to be acceptable for it to not change now.

Yes, I understand that, and even gave an example where that’s literally the intent. It’s using the cache behavior to pick a value and use it consistently from then on. The locking is necessary for that behavior to hold with nondeterministic functions that are cached and used across multiple threads.

Oh, I think I understand. So what you’re saying is that a naive caching function might not be suitable for such a situation, and the “inefficient” use of functools.cache is actually ideal?

Right, the fact that the locking behavior is there and guarantees that multiple threads see the same result even with a nondeterministic function isn’t something we can remove without breaking some existing uses.

Additionally, I believe this must have been at least somewhat intentional at some point (though I haven’t tracked down history to check) because if we assume the only functions the cache is intended to support are deterministic, then the lock is unnecessary.

Gotcha. I’m not entirely sure that the locking is sufficient to handle nondeterministic functions, though. Consider:

import functools, threading, time

@functools.cache
def first_thread_ident():
    time.sleep(1)  # simulate a slow call so that concurrent callers overlap
    return threading.current_thread().ident

The obvious correct behaviour here is that this will return the identifier for the first thread that called it. Let’s try that.

import threading, time, functools
def thrd(delay):
    ident = threading.current_thread().ident
    print("Starting thread", ident)
    time.sleep(delay)
    print("Thread", ident, "got", first_thread_ident())

for delay in (0, 0.5, 1.5):
    threading.Thread(target=thrd, args=(delay,)).start()
time.sleep(2)

Result:

Starting thread 139680391079616
Starting thread 139680382686912
Starting thread 139680374294208
Thread 139680391079616 got 139680391079616
Thread 139680382686912 got 139680382686912
Thread 139680374294208 got 139680382686912

Two threads start, and both call the function. Both find that the cache is empty, and so they call the function. Each one gets its own thread identifier back. The third thread waits until the cache is populated, and it gets the second thread’s identifier.

So, no, multiple threads don’t see the same result, at least not with the current locking. In order to get that, you would need to populate the cache with a blocking marker that says “call already in progress”, and then somehow know when to resume. (For example, the blocking marker could be an event semaphore: the second thread waits on it, and the first thread sets the event once its call completes.) That is a lot of overhead for the normal case where you don’t get two threads concurrently calling the function.
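
For illustration, here is a rough sketch of a call-at-most-once wrapper for a zero-argument function. Note that it swaps the event-marker idea for a simpler technique, a `threading.Lock` with a double check, which gives the same "later callers block until the first call finishes" behaviour. The name `call_once` is hypothetical, and this is not how functools.cache works today. The lock is not reentrant, so a function that calls itself through the wrapper would deadlock; if the function raises, nothing is cached and the next caller retries.

```python
import functools
import threading
import time

_UNSET = object()  # sentinel, so that a cached None is not mistaken for "empty"


def call_once(user_function):
    """Hypothetical decorator: run a zero-argument function at most once."""
    lock = threading.Lock()
    value = _UNSET

    @functools.wraps(user_function)
    def wrapper():
        nonlocal value
        if value is _UNSET:          # once populated, the lock is never taken
            with lock:               # concurrent first callers queue up here
                if value is _UNSET:  # re-check: another thread may have won
                    value = user_function()
        return value

    return wrapper


@call_once
def first_thread_ident():
    time.sleep(0.1)  # slow call, so the second thread arrives mid-call
    return threading.current_thread().ident


results = []


def worker(delay):
    time.sleep(delay)
    results.append(first_thread_ident())


threads = [threading.Thread(target=worker, args=(d,)) for d in (0, 0.05, 0.15)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With this wrapper all three threads end up with the same identifier, at the cost of taking a lock on every call made before the value is populated.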

🤔 Okay, now I’m significantly more curious about the lock, as well as glad that my own use doesn’t rely on that. If it’s just for updating the cache statistics, the “optimization” available might be a version that doesn’t track cache stats at all, for those who don’t use them, and which therefore doesn’t need to be something separate that drops any of the extra steps I suggested people could write themselves.
