There are lots of other examples in the ecosystem outside of Django - it’s a convenient way to lazily initialize objects like Boto3 clients, “default” instances, or even ML models.
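For context, the trick looks something like this (a minimal sketch; the boto3 client is purely illustrative):

```python
import functools

import boto3  # illustrative dependency


@functools.lru_cache(maxsize=None)
def s3_client():
    # Constructed once on the first call; every later call
    # returns the same cached client instance.
    return boto3.client("s3")
```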
Looking at the source code in _functoolsmodule.c, I cannot see any special-casing for this kind of function. An lru_cache on a function with no parameters is a bit of an oxymoron - there can only ever be one output, and it cannot depend on any inputs.
I’d like to suggest that we add a special case to lru_cache() for callables that take no parameters, to avoid the current overhead of a PyDict lookup on every call and make it more akin to the code sample above - a single PyObject variable that’s stored on the first call and returned on subsequent calls.
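For concreteness, the single-variable pattern described here looks something like this (a minimal sketch with illustrative names):

```python
_instance = None


def get_default():
    # A single variable instead of a dict: set on the first call,
    # returned unchanged on every subsequent call.
    global _instance
    if _instance is None:
        _instance = object()  # stand-in for an expensive construction
    return _instance
```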
If this proposal is generally well received, I’d really like to give it a shot as my first contribution to CPython.
Can you show there’s enough of a performance improvement on subsequent calls to the functions to warrant the maintenance cost of special-casing for this?
Agreed, a special @once decorator sounds like a better approach here, although its behaviour is straightforward enough that I usually write it myself when I need it (I don’t even keep it in a library, I just rewrite it each time).
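For reference, a minimal sketch of such a decorator (no locking, so not thread-safe - a caveat that comes up later in the thread):

```python
import functools


def once(func):
    sentinel = object()  # distinguishes "never called" from a None result
    result = sentinel

    @functools.wraps(func)
    def wrapper():
        nonlocal result
        if result is sentinel:
            result = func()
        return result

    return wrapper
```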
And to be honest, I’ve never heard of the idea of using functools.lru_cache for this before. Maybe it’s a trick that’s only common in Django, which makes me think that Django could expose a custom version of it if the performance improvement warrants it?
I see this trick in some large, interconnected projects to reduce startup time (versus calculating those values at import), and Django is one of them. This is not the only solution to the problem; pip uses classes and cached properties, for example, which makes me wonder whether it would be better to make descriptors work on modules so we can just use @cached_property for this… but that significantly expands the scope of the solution.
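The class-based approach looks roughly like this (a sketch with made-up names, not pip’s actual code):

```python
from functools import cached_property


class Defaults:
    @cached_property
    def settings(self):
        # Computed on first attribute access, then stored on the
        # instance so later accesses bypass this method entirely.
        return {"timeout": 15, "retries": 3}


defaults = Defaults()
defaults.settings  # computed here
defaults.settings  # served from the instance's __dict__
```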
> Can you show there’s enough of a performance improvement on subsequent calls to the functions to warrant the maintenance cost of special-casing for this?
I believe I can; I’ll try to create a PoC this weekend. Looking at the implementation of lru_cache, I also think it’s pretty simple to do - I was going to use the PyObject member that’s currently always a PyDict as the place to store the returned object.
I think most of the savings will come from the PyDict’s memory, but there are also hashing and PyDict calls that will be removed.
> And to be honest, I’ve never heard of the idea of using functools.lru_cache for this before. Maybe it’s a trick that’s only common in Django, which makes me think that Django could expose a custom version of it if the performance improvement warrants it?
@once might be semantically better, but it doesn’t exist in the stdlib, and people (myself included) have clearly cottoned on to the idea of using lru_cache as a simple, copy-paste-free replacement. Plus it’s faster (using the once implementation above):
```
In [14]: %timeit return_1_once()
98.8 ns ± 4.51 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: %timeit return_1_lru_cache()
56.7 ns ± 1.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
It undoubtedly doesn’t matter at all; I’m just guessing at why people use lru_cache in the way they currently do. It’s a use case people currently have, it should be simple to implement, and it has the upside of being even faster and more memory-efficient.
If a patch really turns out to be simple enough then it seems like everybody wins.
I guess the main contention here is whether we should encourage people to use the lru_cache decorator like that, even implicitly?
If you can confirm that the maintenance cost of the special case is justified by the performance improvement, this should probably just be submitted as a bpo item and PR (treating it as a special-case optimisation, not trying to promote or emphasise it as a “call once” design pattern, or anything like that).
If you can’t justify the cost, I suspect it should just be dropped.
IMO no. By all means pursue this as an optimisation (if it proves justifiable), but don’t try to make this the “sanctioned” way to do call-once functions. If it’s to be “official”, it should be more discoverable/obvious (i.e., a @once decorator), rather than a side-effect of adding a LRU cache.
Ok, thank you for your guidance. I’ll try and submit a PR with this optimization and see how it goes.
I’m keen to also add a functools.once decorator; I’ll draft up a proposal and submit it to the python-ideas mailing list (or does this forum supersede that? I’m not clear on that, sorry).
Yep, and I’m eager to try. I’ll start with the optimization, then take a stab at making a patch for once(). I’ll then post to the python-ideas mailing list to see if I can get consensus on the PR, and see whether it needs a PEP or not.
My point is that a pure Python version won’t [1] be faster than using a C-accelerated lru_cache, and if once can’t out-perform lru_cache there’s no point (beyond naming [2], which can be covered by once=lru_cache…)
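That is, something as small as (a sketch of the aliasing idea):

```python
import functools

# Naming-only variant: reuse the C-accelerated cache machinery
# under a more intention-revealing name.
once = functools.lru_cache(maxsize=None)


@once
def get_setting():
    return 42
```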
I totally agree that this discussion is all about a micro-optimisation that hasn’t yet been demonstrated to be worth the cost.
Correct, that’s just standard policy though (PEP 399).
[1] Conceded, that’s an unproven assertion. I think it’s likely, though.
[2] I assume when you say “clarity” you mean “of the user’s code”, rather than “of the implementation”.
try/except NameError may be faster than an “is None” check (especially after implementing a zero-overhead try). I don’t know whether it will be faster than lru_cache().
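That pattern would look something like this (a sketch; the cached name is illustrative):

```python
def get_default():
    # The first call raises NameError, runs the slow path and binds
    # the global; every later call returns straight from the try body.
    global _cached
    try:
        return _cached
    except NameError:
        _cached = object()  # stand-in for an expensive construction
        return _cached
```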
I didn’t want to bump the thread unnecessarily, but it occurred to me that thread safety is also a bonus of lru_cache over the pure-Python implementations above. Those work in a single-threaded environment, but the moment concurrent threads might call the function, you can end up with your “once” function being called two or more times.
So the “simple” implementations above are not equivalent, and the equivalent with locking is obviously not something you want to write out repeatedly.
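For illustration, roughly what the locked version has to look like (a sketch using double-checked locking):

```python
import functools
import threading


def once(func):
    lock = threading.Lock()
    sentinel = object()
    result = sentinel

    @functools.wraps(func)
    def wrapper():
        nonlocal result
        # Fast path: skip the lock entirely once the result exists.
        if result is sentinel:
            with lock:
                # Re-check inside the lock: another thread may have
                # computed the result while we were waiting.
                if result is sentinel:
                    result = func()
        return result

    return wrapper
```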
It is easier to implement the variant with locking in Python than in C. There is no special C API for reentrant locks.
lru_cache() does not guarantee that the function will be called only once with the same argument. It only guarantees that the internal structure will not be broken and that there will be no leaks or crashes even if the function is called multiple times.
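That is easy to demonstrate (a sketch; the sleep just widens the race window):

```python
import functools
import threading
import time

calls = []


@functools.lru_cache(maxsize=None)
def slow():
    calls.append(None)  # record every real invocation
    time.sleep(0.1)     # widen the window in which all threads miss
    return 42


threads = [threading.Thread(target=slow) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))  # frequently greater than 1
```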
I worked on a simple implementation and found that the speed benefits are probably not worth the changes, especially given that using lru_cache like this isn’t semantically correct.
I’ve started a thread on the python-ideas mailing list about adding functools.once: