Anyone here who follows my posts closely may have noticed that I’ve been studying the Python dict implementation quite intensely over the last few days.^{[1]} I have essentially two ideas for improvements that could save a fair bit of space (varying from 10% to 50%, with 25% typical for smallish dicts) above and beyond recent improvements, with what I believe would be minimal impact on performance. But I’ll present those separately. For the moment, I want to talk about trading some space back for performance, since so much has already been saved.

Some background: the `USABLE_FRACTION` macro in the CPython code governs the number of slots that can conceptually be filled in the hash table powering a Python dict (a table whose size is always a power of two). Essentially it multiplies the table size by a load factor, currently two-thirds. Thus, when the first key is added to an empty dict, and it’s given the smallest allowed hash table (8 entries), that dict can hold up to 5 items before it will be rehashed and the table resized.
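For reference, the macro itself is tiny; a rough Python rendering of the `(((n) << 1)/3)` definition in CPython’s `dictobject.c`:

```python
def usable_fraction(n: int) -> int:
    """Python rendering of CPython's USABLE_FRACTION macro: the number
    of entries a table of n slots may hold before a resize is triggered
    (n is always a power of two)."""
    return (n << 1) // 3

print(usable_fraction(8))    # -> 5 (the smallest table)
print(usable_fraction(128))  # -> 85
```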

Since the table size doubles whenever it’s resized, and this is triggered by reaching a density of about 2/3, it stands to reason that the new table has a density of about 1/3. If we approximate that dicts of each size are equally common across any given small range of sizes, the average density is 1/2.^{[2]}
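A quick sketch of that oscillation, under the simplified model above (the table doubles exactly when the 2/3 threshold is crossed; real CPython sizes the new table from the number of used entries, so the details differ):

```python
# Simplified model: insert-only dict, table doubles at the 2/3 threshold.
size = 8
for n_items in range(1, 172):
    if n_items > (size << 1) // 3:  # USABLE_FRACTION exceeded
        size *= 2
        print(f"resize at {n_items} items: "
              f"table {size}, load {n_items / size:.2f}")
```

Every resize leaves the table at a load of roughly 1/3, and it then climbs back toward 2/3.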

Before the Python 3.6 optimization, this meant that a dict with a table size of, say, 128, would use 384 words for that table (a key pointer, value pointer and hash value for each table entry), plus whatever header, and on average represent 64 key-value pairs. And that was the tradeoff for a while. Some other languages offered hash tables that defaulted to a higher maximum load factor (from what I can tell, Java used 0.75), and perhaps offered some customization.

But now, we can *mostly* collapse each empty entry in the table to a single byte, and 64-bit architectures are common. The same dict will use 255+16 = 271 words on a 64-bit machine: a dense array of 85 three-word entries (255 words), plus a 128-byte index table (16 words). Even though the table’s average load is nominally ~1/2, the actual storage for full-sized table entries averages ~3/4 full.

Not to mention the other optimizations available: setting aside shared-key dicts for a moment, dicts that have only string keys get a special optimization that drops the hashes out of the table (instead using a hash carried with the string itself) - so they would use only 170+16 = 186 words. That’s less than half the original amount.
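Putting those per-layout numbers side by side (a sketch; header words are omitted, and the 85-entry count is `USABLE_FRACTION(128)`):

```python
WORD = 8  # bytes per word on a 64-bit machine
size = 128
used = (size << 1) // 3  # USABLE_FRACTION(128) = 85

pre_36   = size * 3                 # 3 words per slot        -> 384
compact  = used * 3 + size // WORD  # 255 entry + 16 index    -> 271
str_keys = used * 2 + size // WORD  # no stored hashes        -> 186

print(pre_36, compact, str_keys)  # -> 384 271 186
```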

So at this point I’d like to propose setting the maximum load factor to half of what it currently is - in practical terms, doubling each “table index” size, but leaving “table entry” sizes alone. This causes a hiccup for dicts with 86…170 keys, since they get shifted up a “shoebox size” and need to use 16-bit values for indices (so the “table index” storage space actually quadruples). However, I think this is well worthwhile. The worst-case increase in space used is about 26% (`(340+H+128)/(340+H+32)`, where H is the relatively small number of words of space in the dict headers etc.). The normal cases are more like 6% (`(255+H+32)/(255+H+16)`).
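Plugging a few plausible values of H into those two ratios shows how little the result depends on it (the header sizes here are hypothetical, just to bracket the range):

```python
for H in (4, 8, 16):  # hypothetical dict-header sizes, in words
    worst  = (340 + H + 128) / (340 + H + 32)  # 86...170-key case
    normal = (255 + H + 32) / (255 + H + 16)   # typical case
    print(f"H={H:2d}: worst +{(worst - 1) * 100:.1f}%, "
          f"normal +{(normal - 1) * 100:.1f}%")
```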

Plus, again, I have more possible space optimizations in mind - why not give some of that savings back, if the performance can improve?

So let’s examine that: with random^{[3]} probing, the expected number of attempts to find an empty slot at load factor α is 1/(1 − α). When the table is 2/3 full, that works out to an average of 3 attempts; at 1/3, it’s only half that, at 1.5. Averaging 1/(1 − α) over loads uniformly distributed between 1/3 and 2/3 gives 3 ln 2, so the average number of probes is about 2.08.^{[4]} If we lower the maximum load to 1/3, on the other hand, then the load instead ranges from 1/6 to 1/3 by the previous argument, and the average falls to 6 ln(5/4) ≈ 1.34 : about a 36% improvement.
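The averages work out as follows (integrating 1/(1 − α) over a uniform load range and dividing by its width):

```python
from math import log

def avg_probes(lo: float, hi: float) -> float:
    """Mean of 1/(1 - a) for load a uniform on [lo, hi]: the integral
    evaluates to ln((1 - lo)/(1 - hi)) / (hi - lo)."""
    return log((1 - lo) / (1 - hi)) / (hi - lo)

current  = avg_probes(1/3, 2/3)  # 3*ln(2)   ~= 2.08
proposed = avg_probes(1/6, 1/3)  # 6*ln(5/4) ~= 1.34
print(f"{current:.2f} {proposed:.2f} "
      f"improvement {(1 - proposed / current) * 100:.0f}%")
# -> 2.08 1.34 improvement 36%
```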

Of course, the returns are rapidly diminishing^{[5]}, and the time spent inserting a key depends on more than just the number of attempts to find a place for it^{[6]}. So I’m not proposing to go any further than that. But this seems to me like an easy win.

In fact, this has been part of an effort to design a combined sequence/mapping data structure for my own programming language. ↩︎

Of course, it’s not possible to have a uniform distribution over an unbounded interval. In practical terms, a dictionary’s maximum size is limited by memory, as well as by the type used to record the hash table size. But more importantly, in practice the distribution is not uniform, but weighted towards smaller dicts. One can imagine that the actual average density, in practice, is slightly lower than the naive calculation gives. ↩︎

Of course, probing does need to be deterministically based on the hash, but it should ideally have the characteristic of a random probe - except slightly better, since it should avoid trying the same slot twice. ↩︎

Again, slightly less if we account for a practical distribution rather than a uniform one. But the same applies regardless of load factor; to a first approximation I will say that it cancels out. ↩︎

Obviously, it takes at least 1 attempt on average. ↩︎

The overall impact would need to be benchmarked, of course. ↩︎