Add `OrderedSet` to stdlib

I do not need it that often, but one use case is serializing/deserializing JSON, which has no sets, only lists. I can use a library that deserializes some JSON lists to sets (e.g. hinted via annotations) so I can work with sets in code. In one of my applications I use content addressing, and the order of the sets matters when serializing back.

  • when we finally came up with a good ordered dict implementation, that implementation unfortunately does not carry over to sets;

Why can’t the ordered dict implementation be used for sets?

A set can be implemented with a dict where you simply don’t use/care about the value associated with the keys, no?
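For illustration, a minimal sketch of that idea: back an ordered set with a dict whose values are ignored, relying on dicts preserving insertion order (Python 3.7+). The class name and method set here are just an assumption of what such a type might look like, not a full `collections.abc.MutableSet` implementation:

```python
class OrderedSet:
    """Minimal sketch: a dict used purely for its keys (values are all None)."""

    def __init__(self, iterable=()):
        self._data = dict.fromkeys(iterable)

    def add(self, item):
        self._data[item] = None

    def discard(self, item):
        self._data.pop(item, None)

    def __contains__(self, item):
        return item in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


s = OrderedSet("abracadabra")
print(list(s))  # ['a', 'b', 'r', 'c', 'd'] -- first-insertion order preserved
```

This sidesteps the question of whether the compact-dict layout could actually be reused at the C level (the concern quoted above), but it shows why the dict-as-set workaround is so common.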


My memory may fail me, but at my previous job, as an intern, I had to design a CRUD application that generated both aggregate and per-file statistics for a collection of CSV files, filtering out invalid rows based on a number of predefined conditions. Deduplicating indices that had failed multiple expectations at once with `indices = {}` and `indices = indices | dict.fromkeys(df.index.values)` - read: with an ordered hash set - inside a for loop let me build a sequence of the indices that had failed at least one expectation, while keeping the intuitive order in which the invalid rows appeared across the files. I believe my code would have been more readable with a stdlib hash set for this purpose, instead of a hash map acting as a proxy for an ordered hash set - I needed a set, but my stand-in data structure lacked all the good stuff that comes from inheriting from `collections.abc.Set`.
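A hedged sketch of that pattern, with a hypothetical `df` and made-up checks (Python 3.9+ for the `|=` dict merge):

```python
import pandas as pd

df = pd.DataFrame({"value": [3, -1, 7, -1, 0]})

checks = [
    df["value"] < 0,   # rows failing the "non-negative" expectation
    df["value"] == 0,  # rows failing the "non-zero" expectation
]

failed = {}                                  # a dict abused as an ordered hash set
for mask in checks:
    failed |= dict.fromkeys(df.index[mask])  # keys only; every value stays None

print(list(failed))  # [1, 3, 4] -- deduplicated, in order of first failure
```

With an `OrderedSet` in the stdlib, `failed` could be a real set (updated with `|=`) instead of a dict standing in for one.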

Now, as a more experienced coder, I see that I probably should have used `pd.concat` and `pd.unique`, but the point still stands. Deduplicating a sequence while (1) keeping it a subtype of a sequence and (2) gaining the `collections.abc.Set` methods is something I imagine comes in handy in many CRUD applications.

Side note: multisets actually are pretty useful. We’ve been benefiting from them through `collections.Counter`, which is practically a multiset 🙂

Is it? I see it more as a special case of `defaultdict(int)`. Maybe I’m missing something fundamental about it.
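A small sketch of the overlap between `Counter` and `defaultdict(int)` for plain counting (example data is made up):

```python
from collections import Counter, defaultdict

words = ["spam", "eggs", "spam"]

counts = defaultdict(int)
for w in words:
    counts[w] += 1              # hand-rolled counting with defaultdict(int)

print(dict(counts))             # {'spam': 2, 'eggs': 1}
print(Counter(words))           # Counter({'spam': 2, 'eggs': 1}) -- same tallies
print(Counter(words)["ham"])    # 0 -- missing keys read as zero, much like defaultdict(int)
```

The distinction drawn in the next reply is about what a multiset proper stores: the equal objects themselves, not just their count.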

The idea behind a multiset / multimap, as I understand it, is to be able to store multiple equivalent keys (objects that compare equal to one another), or multiple values bound to the same key, and then find all of them with a lookup. Not just count how many there are, but actually store and return them all.

A `multimap[K, V]` can be implemented using a mapping from `K` to `Collection[V]`. A `multiset[K]` could be considered equivalent to `multimap[K, K]`.
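A minimal sketch of that encoding, using `defaultdict(list)` as the `Collection[V]` container (the example objects are made up):

```python
from collections import defaultdict

# multimap[K, V] as a mapping from K to a list of values:
multimap = defaultdict(list)
multimap["a"].append(1)
multimap["a"].append(2)
print(multimap["a"])        # [1, 2] -- one lookup returns every value bound to "a"

# multiset[K] as multimap[K, K]: each object is stored under itself,
# so equal-but-distinct objects are all kept and all returned.
multiset = defaultdict(list)
for obj in (1, 1.0, True):  # these compare equal and hash the same
    multiset[obj].append(obj)
print(multiset[1])          # [1, 1.0, True]
```

This is the two-level structure mentioned below: every lookup pays for an extra list indirection, which is what a dedicated multiset/multimap hash table avoids.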

The advantage of a dedicated multiset/multimap data structure is that it works without requiring this two-level structure. In fact, it usually isn’t much of a problem to take the hash table backing sets and maps and configure it to allow inserting multiple “equal” keys.

But the need for such sets/mappings doesn’t come up that often, and the “Collection[V] as value” way usually does the job, as sub-optimal as it is.
