functools.(lru_)cache on argument-less functions is a convenient tool for creating lazy globals when “enforced” singleton behavior is undesirable (which I think is most of the time) or difficult (a value is cached, not an intrinsic type; adding a wrapper singleton just for that purpose is generally overkill, if it’s desirable at all).
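For illustration, a minimal sketch of that lazy-global pattern (`config` and its contents are made up for the example; `functools.cache` itself requires Python 3.9+):

```python
from functools import cache

# A lazy global: the value is built once, on first access, then shared.
@cache
def config():
    # Imagine expensive work here: reading a file, opening a connection, ...
    return {"debug": False, "retries": 3}

assert config() is config()  # every call returns the very same object
```

Nothing forbids constructing a second value elsewhere, which is exactly the point: you get laziness and sharing without enforcing singleton-ness.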
However, both are quite sub-optimal, especially functools.lru_cache: by definition a 0-argument function can only ever have a single cache entry, at the key (), yet by default lru_cache will do the whole LRU dance (including locking) on each access. cache (i.e. lru_cache(None)) skips the LRU part but still has a fair amount of overhead. And this does not seem intuitive: I’ve seen people use lru_cache(1) on argument-less functions, which could not be less useful.
I think that if user_function takes no arguments and maxsize is non-zero, the wrapper should avoid all mapping interactions entirely: it just needs to check whether it has been initialised (or reset), and increment the relevant counter.
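In pure-Python terms, the specialization I have in mind could look roughly like this (`zero_arg_cache` is a hypothetical name, not anything in functools; a real implementation would also keep hits/misses counters and live in C):

```python
from functools import wraps

_UNSET = object()  # sentinel: distinguishes "not yet computed" from a cached None


def zero_arg_cache(func):
    """Sketch of a cache specialized for no-argument functions:
    one closure cell instead of a dict, no keys, no LRU bookkeeping."""
    value = _UNSET

    @wraps(func)
    def wrapper():
        nonlocal value
        if value is _UNSET:
            value = func()  # runs at most once until cache_clear()
        return value

    def cache_clear():
        nonlocal value
        value = _UNSET

    wrapper.cache_clear = cache_clear
    return wrapper
```

The fast path is a single identity check on a closure variable, with no hashing of () and no linked-list maintenance.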
Calling f in timeit gives 21.4 nsec per loop; calling g gives 19.7 nsec per loop. The cached version is already cheaper than calling an empty function. There’s not much fat to cut off here; this is already very cheap.
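A sketch of the kind of measurement being described, assuming f is an empty function and g is a cached 0-argument function (the absolute numbers will vary by machine and Python version, so none are asserted here):

```python
import timeit
from functools import cache


def f():
    pass  # baseline: an empty function call


@cache
def g():
    return 42  # cached: after the first call, lru_cache's fast path


n = 1_000_000
f_ns = timeit.timeit(f, number=n) / n * 1e9  # per-call cost in nanoseconds
g_ns = timeit.timeit(g, number=n) / n * 1e9
print(f"f: {f_ns:.1f} ns/call, g: {g_ns:.1f} ns/call")
```

On the machine quoted above, this prints roughly 21.4 ns for f and 19.7 ns for g.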
Also, I’m unclear why this would ever matter in a real use case. Have you seen tight loops that call a constant function over and over again but don’t do anything interesting with the result? Shouldn’t the code hoist the constant function call out of the loop? I can’t think of any circumstances where cutting this from 21 nsec to perhaps 15 nsec would make a difference to the calling code.