Counter.fromkeys

This is half a feature proposal and half a documentation proposal.

I’m using the collections.Counter class, and I wanted to initialize a Counter with a series of keys all set to the same integer value, say 3. I discovered that Counter.fromkeys is disabled outright, and that workarounds using the Counter constructor are recommended instead.
The reason is obvious at first glance: Counter relies on its values supporting addition, so a default value of None is a non-starter. So I wondered: why not keep the method but make its value parameter required instead of optional? In other words, why is Counter.fromkeys(iterable, v) disabled even when I do provide a v? Counter(iterable) * v would be a workaround if Counter supported multiplication by a scalar, but it doesn’t (and I think that shouldn’t change).
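
For reference, here is a small sketch of the current behaviour as I understand it in CPython: the fromkeys call fails even when a value is supplied, and multiplying a Counter by an integer is not defined either. The variable names are only for illustration.

```python
from collections import Counter

# Counter.fromkeys is disabled regardless of whether a value is passed.
try:
    Counter.fromkeys("abc", 3)
except NotImplementedError as exc:
    fromkeys_error = exc

# Counter defines +, -, & and |, but not multiplication by a scalar.
try:
    Counter("abc") * 3
except TypeError as exc:
    mul_error = exc

print(type(fromkeys_error).__name__)  # NotImplementedError
print(type(mul_error).__name__)       # TypeError
```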

New feature
I dug into the code and quickly saw that one difficulty would be respecting Counter’s take on multiplicity: if a key is repeated in an iterable, Counter accumulates its count. So a naive implementation like this:

@classmethod
def fromkeys(cls, iterable, v): # notice v is not optional
    return super().fromkeys(iterable, v)

… would fail to take repetitions into account.
I put together a simple implementation of that. It requires adapting the C version of the _count_elements helper function, but the idea is there. It can be found here.
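
To make the repetition-aware behaviour concrete, here is a pure-Python sketch in a subclass. It is not the prototype mentioned above and does not touch _count_elements; the class name StackingCounter is hypothetical, and it assumes v supports addition with 0.

```python
from collections import Counter


class StackingCounter(Counter):
    """Hypothetical subclass: fromkeys adds v once per occurrence of a key."""

    @classmethod
    def fromkeys(cls, iterable, v):  # v is required, not optional
        result = cls()
        for key in iterable:
            # A missing key reads as 0 in a Counter, so the first
            # occurrence stores 0 + v and later ones keep adding v.
            result[key] += v
        return result


print(StackingCounter.fromkeys("aab", 3)["a"])  # 6
```

A key that appears twice ends up with 2 * v, which matches how the Counter constructor treats repeated elements.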

Doc update
However, I found that you can already get the version of this that ignores repetitions in the input iterable (which I don’t care about) in your own code, without modifying the stdlib and without building a dict for the Counter to copy, with this simple expression: super(Counter, Counter).fromkeys(iterable, v).
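
A short sketch of what that expression does, assuming CPython’s dict.fromkeys semantics (the sample list is made up):

```python
from collections import Counter

colors = ["red", "green", "red"]

# super(Counter, Counter) skips Counter's own fromkeys and resolves to
# dict.fromkeys, still bound to the Counter class, so the result is a Counter.
c = super(Counter, Counter).fromkeys(colors, 3)

print(type(c).__name__)      # Counter
print(c["red"], c["green"])  # 3 3
```

The repeated "red" is simply assigned twice, so repetitions in the input have no effect on the result.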

My doc proposal is this: if the change to Counter.fromkeys is deemed a bad idea (which I can understand; the conditions on the value are a bit odd, since it must support addition with 0 and with the results of that addition), how about adding that line to the docs as a documented workaround?
I suspect a fair number of people have landed there wondering why the fromkeys method doesn’t work, so giving them a workaround would be a fair trade for taking the method away.

So the docs would list super(Counter, Counter).fromkeys(iterable, v) when you have a default value, Counter(iterable) when you don’t, and Counter(iterable.keys()) if the input is a mapping. And too bad if you want both repetitions in the input iterable to matter and a default value other than 1: you’ll have to do that yourself, but at least we offer help for the other cases.
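
The reason .keys() matters for a mapping can be sketched as follows (the stock dict is a made-up example): passing the mapping itself copies its values as counts, while passing its keys counts each key once.

```python
from collections import Counter

stock = {"apple": 5, "pear": 2}  # hypothetical mapping

copied = Counter(stock)        # the mapping's values become the counts
ones = Counter(stock.keys())   # each key is counted exactly once

print(copied["apple"], ones["apple"])  # 5 1
```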

What do you all think?

If e.g. your input is a list, you could just multiply the list before using the Counter constructor. Of course, that temporarily wastes the space.
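
A minimal sketch of the list-multiplication idea, using a made-up list:

```python
from collections import Counter

colors = ["red", "green", "red"]

# colors * 3 builds a temporary 9-element list, which is the wasted space.
c = Counter(colors * 3)

print(c)  # Counter({'red': 6, 'green': 3})
```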

It would still have to do a type check and arguably a value check (should it be allowed to initialize to a negative value?). If it were available, I can’t see why it wouldn’t make sense to have a default of 1 instead. (Surely not 0, since then there would be no point in pre-setting the key.)

Well, yes; that’s the point of using the class: to count the times that each value appears in the inputs.

For this exotic circumstance, just build a dict first and then convert it:

Counter(dict.fromkeys(seq, 3))

The docs already have an example for converting from a dict for a more common case:

Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs

The comments in the Counter source are clear that the inherited fromkeys method is disabled because it would be bug bait:

colors = 'red green red'.split()
Counter.fromkeys(colors, 3)    # User intent is ambiguous.

Since a Counter is just a dict with a few overridden methods, it is easy to bend it to your will either by building a regular dict first, or with simple update loops, or with a subclass.
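
To illustrate the ambiguity and two of the workarounds, here is a sketch of both plausible readings of the colors example. The variable names per_key and per_occurrence are mine.

```python
from collections import Counter

colors = "red green red".split()

# Reading 1: every distinct key gets the value 3 (build a dict first).
per_key = Counter(dict.fromkeys(colors, 3))

# Reading 2: every occurrence contributes 3 (simple update loop).
per_occurrence = Counter()
for color in colors:
    per_occurrence[color] += 3

print(per_key["red"], per_occurrence["red"])  # 3 6
```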


The idea is to provide a faster option than that, by avoiding the need to create a whole dict and then copy it key-by-key, value-by-value. Both the super way and the one I prototyped avoid that.

Then we make the method’s doc unambiguous. That’s a design choice already made in the Counter constructor; I don’t understand why one would be more bug-prone than the other.
In my case I was replacing dict.fromkeys(some_d, v), and I doubt there are many cases where repetitions occur and would change anything.

Because that can be done by passing the iterable to the constructor directly (using .keys() if it’s a dict or iter() if you’re not sure of the type). If using 1-parameter Counter.fromkeys is the same as calling Counter, then there’s no point in enabling 1-parameter Counter.fromkeys at all.

My input is usually a dict or another Counter, which I want to (copy and) set to a constant value, so multiplying a list would waste both time and memory.
The super solution is a really good fit for that use case: it doesn’t care about repetition, but neither do I, since I pass dicts. Adding it to the docs would also show a creative use of super, which I think has a lot of unused potential.

That is a mischaracterization. Converting from a dict invokes super().update(iterable) which is optimized for an exact dict input.

>>> from timeit import Timer
>>> from collections import Counter
>>> from random import random, randrange
>>> Timer("super(Counter, Counter).fromkeys(d, 3)", "d={randrange(10**8):random() for _k in range(1000)}", globals=globals()).autorange()
(5000, 0.2569447999703698)
>>> Timer("Counter(dict.fromkeys(d, 3))", "d={randrange(10**8):random() for _k in range(1000)}", globals=globals()).autorange()
(20000, 0.27109410002594814)
>>> Timer("super(Counter, Counter).fromkeys(d, 3)", "d={randrange(10**8):random() for _k in range(1000000)}", globals=globals()).autorange()
(2, 0.30232439999235794)
>>> Timer("Counter(dict.fromkeys(d, 3))", "d={randrange(10**8):random() for _k in range(1000000)}", globals=globals()).autorange()
(5, 0.45290790003491566)
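
The same comparison can be written as a standalone script that reports per-call times directly. This is only a sketch: the helper names via_super and via_dict, the dict size and the loop count are arbitrary choices of mine, and the numbers will vary by machine and Python version.

```python
from collections import Counter
from random import random, randrange
from timeit import timeit


def via_super(d, v=3):
    # dict.fromkeys bound to Counter: sets each key on a Counter one by one.
    return super(Counter, Counter).fromkeys(d, v)


def via_dict(d, v=3):
    # Build a plain dict first, then let Counter copy it.
    return Counter(dict.fromkeys(d, v))


d = {randrange(10**8): random() for _ in range(1000)}

loops = 200
t_super = timeit(lambda: via_super(d), number=loops) / loops
t_dict = timeit(lambda: via_dict(d), number=loops) / loops

print(f"super: {t_super * 1e6:.1f} us per call")
print(f"dict:  {t_dict * 1e6:.1f} us per call")
```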

It turns out that in practice the super way is slower per call (about 3.8 times slower for 1,000 keys and about 1.7 times slower for 1,000,000 keys, going by the loop counts above), and the intermediate dict is more efficient than I thought, even for massive dicts.
So it’s clearly not worth documenting, and a dedicated Counter.fromkeys like the one I prototyped could still be useful if people want duplicate stacking support, which I don’t need.

And I will implement fromkeys that way in my Counter subclass :upside_down_face: