Add `OrderedSet` to stdlib

steven.daprano · October 22, 2022, 8:54am

An ordered set would be a data structure which supports the set API, but additionally guarantees that elements keep some specified order (usually insertion order).

Sets are most certainly not merely “fancy lists”. The most important part of the set API is fast membership testing, preferably O(1) (constant time) but sometimes O(log N) is acceptible. This contrasts with lists, where it is O(N).

For example, if colours is a list of strings, then "red" in colours will take time proportional to the size of the list. If you double the size of the list, it will double the average time to search it. If you increase the size of the list to be 100 times bigger, then on average it will take 100 times longer to search.

But with sets, the time is (approximately) independent of the size of the set. "red" in colours will take the same amount of time regardless of whether the set has ten entries or ten million entries.

That fast membership testing allows sets to efficiency support operations like intersection and disjoint union, which are possible with lists but slow.

Python’s sets are unordered: they do not guarantee the order of elements:

>>> print({"red", "blue", "green", "yellow", "orange"})
{'blue', 'red', 'orange', 'yellow', 'green'}

An ordered set would guarantee the order of elements. This is useful for many reasons:

having a consistent and predictable display;
making it easier to write doctests involving sets;
guaranteeing iteration order;
predictability of set.pop().

But adding order to a set doesn’t come for free: it requires a significantly more complex implementation, which is more work, probably requires more memory, and may slow down set performance.