I’m trying to put together a really complete, detailed technical understanding of the Python dict implementation and how it has varied across Python versions - so that I can do a writeup and have all the information in one place.
So far, I have:
- refreshed my general understanding of hash tables (it’s been 20+ years since I studied this in university)
- referred to a Stack Overflow Q&A, @rhettinger's original proof-of-concept and proposal for the new implementation in 3.6, and some issue tracker entries such as GROWTH_RATE prevents dict shrinking · Issue #77386 · python/cpython
- looked through several versions of the relevant source code, along with `dictnotes.txt` (although the latter seems to be out of date and didn’t really clarify much for me)
- written some test code to investigate some interesting scenarios in different Python versions, and put it on GitHub
However, there have been details I haven’t been able to figure out. I’ll explain my aims with the test code and what I think I’ve demonstrated so far, and ask some questions along the way.
First: across many Python versions it has been a constant, as far as I can tell, that the smallest available hash table has 8 slots, and that Python will rehash when an attempt is made to use the 6th of these. This is because of the `USABLE_FRACTION` macro, which seems to have been left alone across versions, and which implies that `(8 << 1) / 3`, i.e. 5, of the table slots may be used.
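To make that arithmetic concrete, here is the threshold as I understand it, translated from the macro into Python (my own sketch, not the actual CPython code):

```python
# My reading of the CPython macro USABLE_FRACTION(n) == (n << 1) / 3,
# translated to Python (integer division, as in C).
def usable_fraction(n_slots):
    return (n_slots << 1) // 3

assert usable_fraction(8) == 5    # an 8-slot table holds 5 entries
assert usable_fraction(16) == 10  # ... and a 16-slot table holds 10

# So the 6th insertion into a fresh dict should be what triggers the
# first resize, on the versions where this macro is unchanged.
```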
My tests, therefore, first check `sys.getsizeof` for a few specific example dictionaries: an empty one, ones with 5 elements (as many as will fit before the first resize), and ones with 10 elements (as many as should fit before the second resize - keep reading). They also try both strings (single characters 'a', 'b', etc.) and numbers (small integers 0, 1, etc.) as keys.
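In simplified form, the size checks amount to something like this (a sketch of what the test code does, not the exact code from the repo):

```python
# Simplified sketch of the size checks: compare sys.getsizeof for an
# empty dict and for 5- and 10-element dicts, with both string and
# small-integer keys.
import string
import sys

def build(keys, n):
    d = {}
    for key in keys[:n]:    # add the keys one at a time
        d[key] = None
    return d

str_keys = list(string.ascii_lowercase)   # 'a', 'b', 'c', ...
int_keys = list(range(26))                # 0, 1, 2, ...

for label, keys in (('str', str_keys), ('int', int_keys)):
    print(label, [sys.getsizeof(build(keys, n)) for n in (0, 5, 10)])
```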
As some general notes: the results I show in the README on GitHub are from 64-bit builds. I am, for the most part, not considering shared-key dictionaries here.
Observations and questions (in bold), part 1:
- The implementation seems to change at least a little bit in most minor versions.
- (Not reflected in the test code) It seems to make no difference whether keys are added one at a time or all at once (e.g. by `dict.fromkeys`). This is a little strange to me, because from what I could understand of the discussion around `GROWTH_RATE`, this is supposed to be explicitly considered. (Otherwise: why not just count the non-dummy entries, and choose the power-of-two size that puts the density somewhere between 1/3 and 2/3? See the first sketch after this list.)
- (Not reflected in the test code) Python 3.5 seems to resize the table in the normal way, from 8 slots to 16; but a 16-slot table will hold 11 elements, rather than the 10 (from the `USABLE_FRACTION` macro rounding down) that it holds in all other versions. **What happened?**
- Python 2.7 seems to resize the table by a factor of 4 instead of doubling it. Some of what I could find about `GROWTH_RATE` seems to imply that the table gets resized twice in a row; but I couldn’t understand how the calculation of `GROWTH_RATE` really works, nor how it’s applied in order to resize the table. Anyway, this of course means that tables created with up to 21 keys (assuming no deletions) will be the same size. **Was this intentional? If so, why was it changed back?**
- But even weirder than that: the 2.7 dict size jumps from 280 bytes to 1048. In other words, 768 more bytes are taken up - enough for 32 more table slots (in the old scheme). This is bizarre to me - it’s as if the original 8 slots are held onto, and wasted. **Why?** This doesn’t happen with later resizes: e.g. the next jump in size is from 1048 to 3320 (96 more slots, going from 32 to 128 - another quadrupling).
- In 3.8, empty dicts finally stop taking as much space as 5-element dicts. It appears that they not only save the 128 bytes needed for 5 “entries” plus 8 “indices”, but in fact drop the entire `PyDictKeysObject` and leave the `PyDictObject` with a null pointer. **Was using the space a conscious decision before, or just an oversight?**
- Compared to Python 3.8, non-empty dicts in 3.6 were 8 bytes larger, and in 3.7 16 bytes larger. **Why? Is this extra space in the `PyDictObject`, or in the `PyDictKeysObject` (or both in 3.7)?** I can’t check code versions this old on GitHub, of course.
- Starting in 3.11, we see an optimization for dicts whose keys are all strings. From what I can tell in the source code: a different entry type is used which omits the hash, and the implementation instead relies on a cached hash value in the string objects (see the second sketch after this list). **Is that right?** Also: I thought I had heard about dicts being optimized for string keys in much older versions of Python. **Did I just imagine that?** Maybe I had it confused with the key-sharing optimization (since both of these would be relevant to the `__dict__` of instances of the same class).
- Also in 3.11, the fixed-size portion of `PyDictKeysObject` becomes 8 bytes smaller, e.g. by taking advantage of the fact that the table size will be a power of two (and thus storing a smaller table-size number in a smaller type). Nice. It also looks like `PyDictKeysObject` gained a 4-byte `dk_version` which seems intended to replace the 8-byte `ma_version_tag` in `PyDictObject` (deprecated in 3.12). **Is this used for detecting modifications during iteration?** I couldn’t understand the description in the source comments about how the field is updated. **Or else, just what is it for?**
- It seems that the combined overhead of the miscellaneous “bookkeeping” bits - everything aside from the actual table entries (and the table indices, for the new implementation) - was less before: 88 bytes in 2.7, 96 in 3.5, and 104 bytes today (and as much as 120 bytes in 3.7). **What sorts of things changed? In particular, was the `PyDictKeysObject` data inlined before, or something?** (The shared-key optimization seems to exist in 3.5, but not in 2.7. I think that optimization requires `PyDictKeysObject`s to exist separately, but I’m not sure.)
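To spell out the hypothetical sizing rule from the first item above (to be clear, this is my own strawman, not anything CPython actually does): given the number of live entries, pick the smallest power-of-two table whose 2/3 usable limit isn’t exceeded.

```python
# Hypothetical sizing rule (my strawman, not CPython's): given the number
# of live, non-dummy entries, pick the smallest power-of-two table size
# whose 2/3 usable limit is not exceeded; the resulting density is then
# roughly between 1/3 and 2/3 (except at the minimum size).
def hypothetical_table_size(used, minsize=8):
    size = minsize
    while used > (size << 1) // 3:   # over the usable limit -> double
        size <<= 1
    return size

assert hypothetical_table_size(5) == 8    # 5 live entries still fit in 8 slots
assert hypothetical_table_size(6) == 16   # 6 would exceed 2/3 of 8
assert hypothetical_table_size(10) == 16
assert hypothetical_table_size(11) == 32
```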
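And here is my conceptual picture of the 3.11 string-key entry layout versus the general one, written as Python stand-ins for the C structs (the field names and sizes are my reading of the 3.11 source, so treat them as tentative):

```python
# Conceptual stand-ins for the two entry layouts I think I'm seeing in the
# 3.11 source (my reading of it, not authoritative).
from typing import Any, NamedTuple

class GeneralEntry(NamedTuple):   # cf. PyDictKeyEntry: 3 x 8 bytes = 24
    me_hash: int
    me_key: Any
    me_value: Any

class UnicodeEntry(NamedTuple):   # cf. PyDictUnicodeEntry: 2 x 8 bytes = 16
    me_key: str                   # the hash is re-read from the str object's
    me_value: Any                 # cached hash field when probing
```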
In the second part, I try something more involved, but which gives a much simpler diagnostic: I create a dict with 5 keys, delete some number of them, then attempt to add a key, and see if the dict gets bigger. Generally, deleting a key from a dict needs to leave behind a “dummy” in the table (so that keys which were inserted later and had a hash collision can still be found - the dummy signals the probe to keep looking). That means that, in general, we still have to rehash even if we nominally “made room”.
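In simplified form, the diagnostic is roughly this (again a sketch rather than the exact repo code, parameterised by how many keys are deleted before the new insertion):

```python
# Simplified sketch of the part-2 diagnostic: build a 5-key dict, delete
# `n_deleted` keys, add one new key, and see whether sys.getsizeof
# reports a bigger table afterwards.
import sys

def grows_after_delete_then_add(n_deleted):
    d = dict.fromkeys('abcde')
    before = sys.getsizeof(d)
    for key in 'abcde'[:n_deleted]:
        del d[key]
    d['f'] = None                 # the insertion that may force a resize
    return sys.getsizeof(d) > before

for n in range(6):
    print(n, grows_after_delete_then_add(n))
```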
- In 3.5, the behaviour is not consistent between runs. What’s going on, exactly? It seems like there’s some attempt to check whether the new key could replace a dummy before deciding to resize. If that’s it, why don’t other versions seem to do this too? I also remember something about the hash function having some kind of random seed, to avoid denial-of-service attacks (by providing a large number of keys in input data that would be known to collide). Is that relevant here? (See the sketch after this list.)
- In 3.6 through 3.9, if we remove at least three keys before adding a new one, the dictionary consistently ends up taking the same amount of space as before. My guess is: the rehash happens regardless, but since dummies aren’t included when rehashing (there’s no reason to), the number of entries being hashed is now small enough that Python decides to allocate the same amount of space as before (instead of doubling the size). Is that right?
- Starting in 3.10, the old 2.7 behaviour appears to come back: unless the table is completely emptied, adding a new key will force a rehash and a size increase. Is there a good reason for this, or should it be considered a regression?
- Speaking of which: is emptying a dict completely special-cased somehow? In my tests, I `del` each key one at a time; so if there’s a different code path for empty dicts (rather, ones where every table entry is a dummy), it needs to actually detect or track that somehow - it isn’t something intrinsic to `.clear()`, for example.
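Regarding the hash randomization I mentioned in the 3.5 item: string hashes really do differ between interpreter runs unless PYTHONHASHSEED is pinned, which is easy to demonstrate from the outside (whether that’s actually what makes my 3.5 results inconsistent is exactly what I’m asking):

```python
# Quick demonstration of per-run hash randomization for str keys: run the
# same hash() call in fresh interpreters, with and without a pinned seed.
# (Small integers, by contrast, hash to themselves, so int-keyed tables
# lay out the same way on every run.)
import os
import subprocess
import sys

cmd = [sys.executable, '-c', "print(hash('a'))"]

def run_three(seed):
    env = dict(os.environ, PYTHONHASHSEED=seed)
    return [subprocess.run(cmd, capture_output=True, text=True, env=env)
            .stdout.strip() for _ in range(3)]

print(run_three('random'))   # normally three different values
print(run_three('0'))        # three identical values
```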