I’m a big fan of collections.Counter
.
In some sense, collections.Counter
is the evolution of defaultdict(int)
. It takes that class and adds sugar including appropriate constructor overloading, methods like .most_common()
and set operations.
Thinking about what I still use defaultdict
for, the main cases are defaultdict(list)
and defaultdict(set)
, which I use often (perhaps more than Counter
). However, I miss the sugar of a Counter.
Here are some ideas:
Construction
Construct with a mapping or an iterable of pairs (like dict
):
>>> from collections import ListCollector
>>> ListCollector([('a', 1), ('b', 2), ('a', 3)])
ListCollector({'a': [1, 3], 'b': [2]})
This might be the most common usage, an alternative to itertools.groupby()
which collects non-consecutive groups:
>>> ListCollector(range(5), key=lambda v: v % 2)
ListCollector({0: [0, 2, 4], 1: [1, 3]})
Sometimes you already have a sequence of sequences:
>>> ListCollector.from_iterables([('a', [1, 2]), ('a', [3, 4])])
ListCollector({'a': [1, 2, 3, 4]})
Operators
Because these are effectively defaultdicts, item access inserts and returns an empty item:
>>> x = ListCollector()
>>> x['spam'].append('eggs')
Lists support + and += for concatenation; ListCollector would apply these item-wise:
>>> x = ListCollector({'a': [1, 2], 'b': [3]})
>>> y = ListCollector({'a': [4], 'b': [5], 'c': [6]})
>>> x += y
>>> x
ListCollector({'a': [1, 2, 4], 'b': [3, 5], 'c': [6]})
Methods
As with Counter, dict methods would be identical to a normal dict, but additional methods are available to treat the collector as a collection of key-value pairs.
Counter has .elements()
; these types are duck-typed to do the same. This is arguably more useful than Counter.elements()
.
>>> x = ListCollector({'a': [1, 2], 'b': [3]})
>>> list(x.elements())
[('a', 1), ('a', 2), ('b', 3)]
We can implement an equivalent of Counter.most_common()
for this type, by comparing the len()
of the values.
>>> ListCollector({'x': [1, 2], 'y': [3]}, 'z': [4, 5, 6]).most_common(2)
[('z', [4, 5, 6]), 'y': [3])]
Other implementations
If the values are hashable, then other possibilities are available. We can collect only unique items using sets, or use Counter as multisets/bags to avoid losing the frequency data.
>>> from collections import SetCollector
>>> SetCollector('Apoiehjr-8141¬', key=unicodedata.category)
SetCollector({'Lu': {'A'}, 'Ll': {'j', 'o', 'p', 'h', 'r', 'e', 'i'}, 'Pd': {'-'}, 'Nd': {'1', '4', '8'}, 'Sm': {'¬'}})
>>> CounterCollector('AARRR!!!!', key=unicodedata.category)
CounterCollector({'Lu': Counter({'A': 2, 'R': 3}), 'Po': Counter({'!': 4})})
Unlike ListCounter, both of these could support set operations (considering them as sets/counters of key-value items).