Simplify lru_cache

Hence lru_cache uses some rather awkward doubly-linked lists

It isn’t awkward at all. It’s a rather straightforward idea based on the notion that doubly-linked lists are ideal for this task. Every cache hit extracts a link and reinserts it at the head.

If a rewrite seems simpler, it is only because the task of maintaining order has been shifted to another data structure. Underneath the hood, the regular dict maintains order by storing data a sequence of consecutive pointers. Deletions are handled by leaving holes in the sequence. New entries are appended. This works fine for day to day dictionary use, but is not well suited to an lru_cache() where sequence rearrangements dominate.

When the cache is large, full, and has a high hit rate, essentially all of the work is rearranging the sequence and while leaving the dict unchanged.

and has to be really careful about multi-threading and reentrancy.

No matter how lru_cache() is implemented it has to be “really careful about multi-threading and reentrancy”. Every dict lookup or store is potentially reentrant. It is a non-trivial task to keep the cache in a consistent state between each of those calls. See the pure python implementation of lru_cache() — it uses a slow reentrant lock and still has to take a great deal of care.

I have a prototype of a re-implementation of lru_cache

I really wish people would discuss these ideas at a conceptual level before embarking on a large rewrite. It puts us all in a difficult position when someone shows up saying, look at all this work I’ve done, especially when the idea has been considered and rejected previously.

At least in theory, the doubly-linked list approach should always beat a maintain a sequence of pointers approach. That is why production quality ordered dicts and LRU caches are almost always implemented with a dict pointing to links in a doubly linked list.

Another thought is that I don’t what problem you’re trying to solve. Serhiy’s amazing implementation of the lru_cache() has astonishingly good performance. With the current implementation, it is easy to see that there are no unfavorable workload patterns — updating order for a cache hit takes constant (very fast) time regardless of the users pattern of access.

AFAICT the only benefit of a rewrite is to shift some of the complexity back into the regular dict, but in doing so we lose the ability to choose the best underlying store and we constrain future development for regular dicts (in particular, we can’t shrink the resize point for the underlying sequence).

Raymond

P.S. Here’s quick primer on tuning the regular dict for lru_cache style workloads:

Unhappy case: 1000 entries are stored in an array and we compact/resize at 1005 entries. Every cache hit deletes an entry and appends a new one at the end. After five hits, a compaction is triggered and all 1000 entries need to be reinserted. This is a massive overhead.

Happy case: 5 entries are stored and we resize at 1005 entries. After 1000 hits, a compaction is triggered and the 1000 entries are reinserted. That gets us back to the single update per hit that we have now.

The regular dict is currently tuned for space-efficiency and to not have noticeable performance issues for typical dict workloads (where it is uncommon to intermix deletions and insertions). The regular dict is not tuned for an lru_cache() workload where every hit deletes an entry and appends a new one at the end. This is appropriate — almost the entire point of the compact and ordered dictionary was to save space.

I’ll post some benchmarks later.

Be sure to time the unhappy cases. It is very easy to fool yourself into thinking that a dict store, dict deletion, and periodic compaction is cheaper than a few pointer updates per cache hit.

Also note that any timings reflect the current tuning of the dict which is something would like to be able to change in the future without having to worry about the lru_cache().

Remember, when there is a cache hit, the current implementation makes no changes to the dict at all. Only a few DLL pointers are updated. This is VERY hard to beat.

2 Likes