Why does my set seem sorted?

I was having some tension between a set and a list in a class and idly decided to see if sets seemed to preserve insertion order like dicts. Well, no, but I got a surprise:

Python 3.10.6 (main, Jan  1 2024, 19:58:15) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s={1,3,5,7,9}
>>> list(s)
[1, 3, 5, 7, 9]
>>> s.add(11)
>>> list(s)
[1, 3, 5, 7, 9, 11]
>>> s.add(9)
>>> list(s)
[1, 3, 5, 7, 9, 11]
>>> s.add(4)
>>> list(s)
[1, 3, 4, 5, 7, 9, 11]
>>> s.add(2)
>>> list(s)
[1, 2, 3, 4, 5, 7, 9, 11]
>>> s.add(10)
>>> s
{1, 2, 3, 4, 5, 7, 9, 10, 11}
>>> list(s)
[1, 2, 3, 4, 5, 7, 9, 10, 11]

This is a small set, but is this expected? An artifact of int hash functions? Something else?

Small ints are interned, and they’re more predictable:

>>> {1, 2, 3, 4, 5, 7, 9, 10, 11}
{1, 2, 3, 4, 5, 7, 9, 10, 11}
>>> {11, 10, 9, 7, 5, 4, 3, 2, 1}
{1, 2, 3, 4, 5, 7, 9, 10, 11}

Larger ints, however:

>>> {255, 256, 257}
{256, 257, 255}

I wondered about interning but couldn’t see how it would tie into the ordering. I’m unconvinced.

I see that ints seem to use themselves as their hash function:

>>> for i in 1,2,3,4,5,6,7,8,9,10,11,256,257,258,3001,4001: print(i,hash(i))
...
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
256 256
257 257
258 258
3001 3001
4001 4001

I can imagine a set having a minimum initial number of buckets, and while the hash function is below that size you’d get ordered results.

That’s my current hypothesis.

Has nothing to do with interning. Matthew’s result {256, 257, 255} for example has that order because the hash values of those ints modulo the hash table size (8 for three elements) are 0, 1 and 7.

3 Likes

Stefan is correct, and I’d like to expand on his point with more details. A set is an unordered data structure that uses a hash table. This means the order of elements in a set is determined by the hash values and how those hash values are mapped to slots in the hash table.

When you create an empty set or dict, the hash table starts with 8 buckets. This is the default initial size for both structures in Python.

For set , the modulo-based bucket lookup happens in the following code, it’s implemented using bitwise operations rather than the traditional % operator:

2 Likes