Defaultdict: Optionally pass key to default function

Kacarott · January 23, 2024, 7:33pm

On multiple occasions I have found myself reaching for defaultdict as a good solution to a particular requirement, only to realise that the generated default value should actually depend on the key being accessed.

Currently, the function passed as an argument to defaultdict is called without any arguments to generate a default value. In my opinion a very simple but useful improvement would be to additionally support functions which expect to be passed the key as argument.

This could be implemented as simply as adding a keyword argument to defaultdict: keyarg=False (or some other argument name), which could be set to True where this behaviour is desired. I believe this should also be fully backwards compatible.

As a motivating example for its use case and how the code might look, consider this:

from collections import defaultdict

def run_query(tag):
    # do expensive database query based on tag
    return result

def predict_changes(tags_changes):
    data = defaultdict(run_query, keyarg=True)
    for tag, change in tags_changes:
        data[tag] += change
    return data

Other things to consider:

I am aware that such a class could be easily implemented by subclassing defaultdict, but I am making this suggestion to make it more convenient, and more directly available.
In some cases, the same effect can be achieved by simply using @functools.cache, however the downside of that is that the stored data is not as easily accessible, for example to modify, inspect, or to potentially delete and recompute values.
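To make the functools.cache downside concrete, here is a minimal sketch. The body of run_query is a hypothetical stand-in for the database query, and the calls list exists only to show when the function really runs.

```python
import functools

calls = []  # records real invocations, for illustration only


@functools.cache
def run_query(tag):
    # Hypothetical stand-in for the expensive database query.
    calls.append(tag)
    return len(tag)


run_query("spam")
run_query("spam")  # second call is served from the cache

# The wrapper exposes aggregate statistics and a full reset only;
# there is no public way to read, edit, or drop one cached entry.
info = run_query.cache_info()
run_query.cache_clear()
```

With a dict-based approach, by contrast, the stored values are ordinary items that can be inspected, reassigned, or deleted one key at a time.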

Are there potential downsides I am missing? What do you think? I am keen to hear feedback.

Rosuav · January 23, 2024, 7:51pm

It can actually be implemented by subclassing dict, which might be easy enough?

merwok · January 23, 2024, 8:02pm

In this case, you would subclass dict and define the __missing__ magic method:

class QueryDict(dict):
    def __missing__(self, key):
        # expensive computations
        return result
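A runnable version of that idea might look like the sketch below. The len(key) computation is only a placeholder for the expensive work, and storing the result inside __missing__ is a choice made here so that each key is computed once (dict itself does not store the value returned by __missing__).

```python
class QueryDict(dict):
    """dict subclass whose missing values are computed from the key."""

    def __missing__(self, key):
        # Placeholder for the expensive computation (assumption).
        result = len(key)
        # Store the result so later lookups do not recompute it.
        self[key] = result
        return result


data = QueryDict()
data["spam"] += 10  # looks up the default for "spam", then adds to it
print(data)  # {'spam': 14}
```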

Kacarott · January 23, 2024, 8:43pm

Yes, I am aware. To reiterate, I do know that it is possible to implement this just by subclassing, just as it would be very easy to implement defaultdict itself by subclassing dict. The idea here is simply to make defaultdict (which is essentially a convenience class) even more convenient, with effectively no downsides that I can see.

jamestwebber · January 23, 2024, 9:02pm

One downside is that all of the current uses of defaultdict would get a tiny bit slower, right? Because now, every use of __missing__ has to check if keyarg is true or not.

edit: although given that the core of defaultdict is implemented in C, this is probably negligible

merwok · January 23, 2024, 9:12pm

Ah, I missed that. In this case, I don’t see defaultdict being changed.
The true protocol to produce values for a missing key is __missing__ in a dict subclass; defaultdict is just one convenience when a default factory (function that doesn’t take a key) does what is needed. I don’t think having two ways of solving the problem would be good.

(Also they generally have a different code style: with defaultdict it’s just one line to have registry = defaultdict(list), but in your example you need to define a whole function run_query, so defining a class that contains that function is a natural step)

Kacarott · January 23, 2024, 9:44pm

Your point about code style is a good one. I think I tend to prefer functions over singleton style classes which are only used once, but there probably isn’t really a good reason for that.

I would argue that just as defaultdict is a convenience class for a common use case of __missing__ (i.e. a factory function with no args), the keyarg usage could be a convenience for another common use case. But perhaps I am overestimating how common it is.

zware · January 23, 2024, 9:45pm

I’m definitely sympathetic to the request, I’ve wanted this myself several times in the past. However, I’m not sure there’s a good way to get there.

I don’t think adding keyarg=True will fly; it feels like a hack and also precludes passing keyarg as an initial member. That is, this is currently legal:

>>> from collections import defaultdict
>>> defaultdict(int, keyarg=True)
defaultdict(int, {'keyarg': True})

defaultdict.__init__ could inspect default_factory to determine whether it takes an argument, but that feels a bit too magical (and also causes problems with defaultdict(int), where you don’t want the key to be passed).
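A small sketch of why automatic detection is ambiguous. The helper name is hypothetical, and it uses a try-the-call fallback as a stand-in for real signature inspection; the point is only that int and list are valid both with and without an argument.

```python
def call_factory_magically(factory, key):
    # Hypothetical dispatch: pass the key if the factory accepts it,
    # otherwise fall back to a no-argument call.
    try:
        return factory(key)
    except TypeError:
        return factory()


zero_arg = call_factory_magically(lambda: 0, "7")  # lambda rejects the key
with_int = call_factory_magically(int, "7")        # int happily accepts it
with_list = call_factory_magically(list, "ab")     # so does list

print(zero_arg, with_int, with_list)  # 0 7 ['a', 'b']
```

Under such a scheme defaultdict(int) would stop producing 0 for string keys like "7", and the except clause would also hide a TypeError raised inside the factory itself.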

I think the only feasible option would be to add a new variant of defaultdict that passes the key to default_factory, but it’s not going to have a nice name
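As a rough sketch of what such a separate variant could look like in pure Python: the class name KeyFactoryDict and the attribute key_factory are placeholders invented here, not a proposed API.

```python
class KeyFactoryDict(dict):
    """Hypothetical defaultdict variant whose factory receives the missing key."""

    def __init__(self, key_factory=None, /, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.key_factory = key_factory

    def __missing__(self, key):
        if self.key_factory is None:
            raise KeyError(key)
        # Compute the default from the key and remember it.
        value = self[key] = self.key_factory(key)
        return value


squares = KeyFactoryDict(lambda n: n * n, {2: 4})
print(squares[3])  # 9
print(squares)     # {2: 4, 3: 9}
```

Because the behaviour lives in its own class, keyword arguments keep their usual meaning as initial members, as they do for defaultdict today.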

jamestwebber · January 23, 2024, 9:51pm

It sounds potentially useful–I suspect I’d use it, although I can’t think of specific examples when I would. But it feels conceptually different from the existing defaultdict to me, and should be a separate class^[1].

A proof-of-concept that people could install for themselves would be nice to play with. Maybe it is a common need, and a usable package might be the first step toward incorporation in the stdlib.

[1] no idea what to name it though!

Kacarott · January 23, 2024, 9:55pm

Oof, I knew that this idea seemed too easy, I completely overlooked this.

Well in that case I agree with you guys, if anything it would have to be its own class. Thanks for hearing out the idea anyway. Maybe I will put together a small module to play with the idea.

zware · January 24, 2024, 12:15am

Actually, here’s a thought that I don’t completely dislike^[1] in a partial Python implementation:

from typing import Callable, Optional

class defaultdict(dict):
    def __init__(
        self,
        default_factory: Optional[Callable] = None,
        default_factory_key_argument: Optional[str] = None,
        /,
        *args,
        **kwargs,
    ):
        self.default_factory = default_factory
        if not isinstance(default_factory_key_argument, (str, type(None))):
            # Not a keyword name: treat it as the first positional dict argument.
            args = (default_factory_key_argument, *args)
            default_factory_key_argument = None
        self.default_factory_key_argument = default_factory_key_argument
        super().__init__(*args, **kwargs)
        ...

    def __missing__(self, key):
        if self.default_factory is not None:
            factory_args = {}
            if self.default_factory_key_argument is not None:
                factory_args[self.default_factory_key_argument] = key
            default = self.default_factory(**factory_args)
            self[key] = default
            return default
        raise KeyError(key)

The defaultdict call in the original example would look like defaultdict(run_query, 'tag') with this implementation, which skirts backwards compatibility concerns because that call currently raises a ValueError.
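The current behaviour can be checked against the real collections.defaultdict. The string is handed to dict as an iterable of key/value pairs, and single characters are not pairs.

```python
from collections import defaultdict

try:
    defaultdict(int, "tag")
except ValueError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)  # ValueError

# Edge case worth knowing: an empty string is an empty iterable,
# so this call is accepted today and yields an empty mapping.
empty = defaultdict(int, "")
```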

[1] though I still dislike it quite a bit

ntessore · January 24, 2024, 8:30am

I would have liked this functionality several times in the past. Another option could be an alternative constructor, e.g. defaultdict.with_key(factory, /[, ...]), which sets some internal flag for passing the key.
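One way this might look as a pure-Python sketch, assuming a subclass named KeyAwareDefaultDict and a private _pass_key flag (both names are made up for illustration):

```python
from collections import defaultdict


class KeyAwareDefaultDict(defaultdict):
    """Hypothetical defaultdict with an alternative key-passing constructor."""

    _pass_key = False  # ordinary instances behave like defaultdict

    @classmethod
    def with_key(cls, factory, /, *args, **kwargs):
        self = cls(factory, *args, **kwargs)
        self._pass_key = True
        return self

    def __missing__(self, key):
        if not self._pass_key or self.default_factory is None:
            return super().__missing__(key)
        # Key-passing mode: the factory receives the missing key.
        value = self[key] = self.default_factory(key)
        return value


shouting = KeyAwareDefaultDict.with_key(str.upper)
registry = KeyAwareDefaultDict(list)
print(shouting["abc"])  # ABC
print(registry["x"])    # []
```

The regular constructor keeps its existing signature, so keyword arguments such as keyarg=True would still become initial members.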

yoavdw · January 24, 2024, 10:44pm

This sounds really useful, and to me conceptually similar to PEP 712: Adding a “converter” parameter to dataclasses.field (though not the same since converter applies to all values).

It’s like the difference between dataclasses.field’s default_factory and converter: one sets a default value from a 0-argument callable, and the other converts the initial value using a single-argument callable.

So for naming, I think converterdict or converter_defaultdict may fit.

Daverball · January 25, 2024, 7:12am

I don’t think the name fits, since a converter converts the value you pass in, whereas in this case you would “convert” the key to a value, and only when __getitem__ is called for a key that is not already present. So the conversion isn’t really the primary function, and I don’t see a whole lot of real similarities with PEP 712.

jamestwebber · January 25, 2024, 3:33pm

I’ve been trying to figure out why this structure feels off to me. I think it’s the implied relationship between keys and values. dicts and defaultdicts don’t suggest any relationship: they’re arbitrary mappings from whatever key you need, to values.

This dictionary is sneakily typed: the key needs the right type to be valid for the constructor function, and the values are typed by the output. But there’s a twist, which is that you’re free to use other types as keys/values on construction and assignment. But you can break things if you do some common dictionary operations with the wrong types. defaultdict implies a type on the values, based on the factory. But not on the keys.

Calling this thing FuncDict as a placeholder:

# d is implicitly typed dict[int, int] based on the key function
# but it starts with a [str, list[str]] item
d = FuncDict(lambda k: k // 2, {'a': ["list", "of", "stuff"]})
d['b'] = d[4] + d[6]  # this will work!
d['c'] += d[7]  # but what about this?

Kacarott · January 25, 2024, 4:34pm

This seems to me like it applies just as much to defaultdict though.

# d is implicitly typed dict[T, int] based on the default function
# but it starts with a [str, list[str]] item
d = defaultdict(lambda: 2, {'a': ["list", "of", "stuff"]})
d['b'] = d[4] + d[6]  # this will work!
d['c'] += d[7]  # but what about this?

In both cases the type nastiness comes from mixing “improper” types into the object from the start.

Daverball · January 25, 2024, 4:41pm

Either of those improper uses would be caught by a type checker so I don’t see the problem:

The other case would be detected just the same, as long as the annotations on __init__ are correct.

Other type checkers may allow the initial statement and infer defaultdict[str, int | list[str]] but then you would get errors for the += case, since it’s only valid for one of the union members.

jamestwebber · January 25, 2024, 5:01pm

Sure, the difference is that defaultdict never specifies the type of the key, and d['c'] += d[7] works fine, whereas for FuncDict it’s a type error.
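The two snippets can be run side by side. FuncDict below is only a guess at the placeholder's behaviour, implemented with __missing__ for the sake of the comparison.

```python
from collections import defaultdict


class FuncDict(dict):
    """Hypothetical placeholder: missing values are computed from the key."""

    def __init__(self, func, /, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.func = func

    def __missing__(self, key):
        value = self[key] = self.func(key)
        return value


plain = defaultdict(lambda: 2, {"a": ["list", "of", "stuff"]})
plain["c"] += plain[7]  # fine: both defaults are 2, so plain["c"] is 4

keyed = FuncDict(lambda k: k // 2, {"a": ["list", "of", "stuff"]})
keyed["b"] = keyed[4] + keyed[6]  # 2 + 3
try:
    keyed["c"] += keyed[7]  # the default for "c" needs "c" // 2
except TypeError:
    failed = True
else:
    failed = False
```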

I agree that type checkers could figure this out, but I’d rather make the two cases explicitly different by construction.

I should have been more clear: my post was about trying to figure out why it doesn’t feel like a defaultdict to me. It’s a different structure–maybe useful enough to add to collections, but I wouldn’t want it as an option for defaultdict.

merwok · January 25, 2024, 5:12pm

Previous proposition for this sub-thread: PEP 455 – Adding a key-transforming dictionary to collections

yoavdw · January 25, 2024, 6:19pm

Is that talking about the same thing? This proposal is about the default value being based on the key, not the key being converted to a different key.