Add value factory parameter to `dict.fromkeys` to enable use with mutable values

The built-in function dict.fromkeys(iterable, value=None, /) is not friendly to mutable values:

>>> d = dict.fromkeys(range(3), [])
>>> d
{0: [], 1: [], 2: []}
>>> d[0].append(7)
>>> d
{0: [7], 1: [7], 2: [7]}
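The shared mutation happens because fromkeys stores the very same list object under every key. A quick identity check makes that visible (a minimal sketch; the variable names are only illustrative):

```python
# dict.fromkeys evaluates the value once and reuses that single object.
d = dict.fromkeys(range(3), [])

# Every key maps to the same list, so the identity comparison holds.
all_same = all(v is d[0] for v in d.values())

d[0].append(7)
shared_view = d[2]  # the append is visible through every other key
```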

What do you think of dict.fromkeys(iterable, value=None, value_factory=None, /), which accepts a factory for the value? The session above would look like this:

>>> d = dict.fromkeys(range(3), None, list)
>>> d
{0: [], 1: [], 2: []}
>>> d[0].append(7)
>>> d
{0: [7], 1: [], 2: []}

d = {x: [] for x in range(3)}
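
For comparison, the proposed factory behaviour can be approximated today with a small helper built on that comprehension. `fromkeys_with_factory` is a hypothetical name used for illustration only; it is not an existing or proposed API:

```python
def fromkeys_with_factory(iterable, value_factory):
    """Hypothetical helper: call value_factory once per key."""
    return {key: value_factory() for key in iterable}


d = fromkeys_with_factory(range(3), list)
d[0].append(7)  # only the list stored under key 0 changes
```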

Should dict.fromkeys be deprecated?

No; it still serves a good purpose. For example, if you’re initializing a dictionary with a bunch of zeroes, it doesn’t matter that they’re all the same object. But when you DON’T want them to be the same object, you’re no longer in fromkeys’s domain, and it’s best to reach for something more flexible - such as a comprehension.
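
A small sketch of why sharing is harmless for zeroes: integers are immutable, so `+=` rebinds one key to a new object rather than mutating the shared one. The key names are made up for the example.

```python
counts = dict.fromkeys(["a", "b", "c"], 0)

# This rebinds counts["a"] to a new int; the other keys still hold 0.
counts["a"] += 1
```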


I might use

d = dict(zip(range(3), map(list)))

if that weren’t artificially forbidden :frowning:
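
For reference, a sketch of what happens today when `map` is called with a function but no iterables: it raises `TypeError` instead of producing an endless stream of `list()` results.

```python
try:
    map(list)  # no iterable supplied
except TypeError as exc:
    outcome = f"TypeError: {exc}"
else:
    outcome = "no error"
```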

Not sure what you mean by “artificially”, and I’m not sure what map(list) ought to do. Can you elaborate?


I mean map’s implementation would already naturally do what I want it to if it weren’t artificially forbidden to do it. We talked about it here:


I agree with pf_moore’s response there: this is an extremely unobvious way to spell this, and there are better alternatives.

It’s worth noting that explicit calls to map tend to compose poorly, quickly becoming hard to read. Comprehensions are almost always better for anything other than the very simple (and most common) case of mapping a predefined callable over a single iterable.
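
As an illustration of that readability point, here is the same computation spelled both ways (the data and the transformation are arbitrary examples):

```python
numbers = range(10)

# Nested map/filter with lambdas: has to be read inside-out.
via_map = list(map(lambda n: n * n, filter(lambda n: n % 2 == 1, numbers)))

# Equivalent comprehension: reads left to right.
via_comprehension = [n * n for n in numbers if n % 2 == 1]
```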

A dictionary with all zero values can be initialized with a comprehension too. Why do we want to keep dict.fromkeys if a comprehension addresses both cases and doesn’t leave the door open to a subtle bug?


I recall pytest 1.x back in 2006 using dict.fromkeys for lack of a set type.

The API is simply ancient, and its utility has diminished over the years.

It may predate comprehensions and sets (please excuse me for not checking; I’m on mobile).

I consider it something that keeps aged code running.
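
For readers who have not seen the idiom: dict keys are unique, so `dict.fromkeys` can stand in for a set. On Python 3.7+, where dicts preserve insertion order, the same trick gives order-preserving de-duplication. A sketch with made-up data:

```python
items = ["b", "a", "b", "c", "a"]

# Keys are unique, so duplicates collapse; all values are None.
seen = dict.fromkeys(items)

unique_in_order = list(seen)  # first occurrences, in original order
is_member = "c" in seen       # set-like membership test
```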


Because it’s there, it works just fine, and there’s no reason to remove it. Why do you want to break something that works just fine, simply because it isn’t suited to every possible task? Will you deprecate hammers because they’re not good at inserting screws?

FYI, dict.fromkeys is recommended by linters where it’s appropriate: unnecessary-dict-comprehension-for-iterable (C420) | Ruff

It’s also helpful in situations where you want to pass a callable to do this task.
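
One way to read that last point: `dict.fromkeys` is itself a callable, so it can be handed to `map` or stored as a factory where a comprehension would need a wrapping `lambda`. A minimal sketch with made-up field names:

```python
groups = [("x", "y"), ("name", "age")]

# Each tuple of keys becomes a dict whose values are all None.
blank_records = list(map(dict.fromkeys, groups))
```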


I already mentioned one reason: avoid a subtle bug. Another is general reduction of parts and thus complexity.

Deprecating is not breaking.

Let’s not escape into metaphors, which aren’t always isomorphisms.

So far, I haven’t gotten any concrete reason to keep the function in the long term. So I set out to find one myself.

I suspected a good answer had to do with performance, because dict.fromkeys is written in C. So, I went ahead and measured it against the comprehension with the following code:

#!/usr/bin/env python3

import timeit

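N = int(input("Samples: "))  # number of timed repetitions
K = int(input("Keys: "))  # number of keys per dictionary

keys = range(K)

# Time N constructions of a K-key dict with each approach.
time_fromkeys = timeit.timeit(lambda: dict.fromkeys(keys, 0), number=N)

time_comprehension = timeit.timeit(lambda: {k: 0 for k in keys}, number=N)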

print(f"Time in seconds: fromkeys={time_fromkeys} comprehension={time_comprehension}")

and an example execution (of many that yield similar results) is:

Samples: 100000

Keys: 1000

Time in seconds: fromkeys=2.100494457874447 comprehension=2.450188084039837

So, we see that the comprehension takes about 17% more time than dict.fromkeys. That’s pretty significant if one is using this a lot.
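
The figure follows directly from the two timings in the example run; the ratio comes out just under 17%:

```python
time_fromkeys = 2.100494457874447
time_comprehension = 2.450188084039837

# Extra time taken by the comprehension, relative to fromkeys.
overhead_percent = (time_comprehension / time_fromkeys - 1) * 100
```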

I’m filing that one under “reasons to keep dict.fromkeys”. I do like the simpler syntax, not gonna lie, and that’s why I created this post in the first place, but that’s more subjective. We can’t argue as much about performance!!!


You said:

“Keep” is not the opposite of “deprecate”. It is the opposite of “remove”. Asking “why do we want to keep this” is asking “why can we not remove it”, and the answer is that we don’t want to break things.

Deprecation is a half-way house to breaking things. It is saying “this will break in the future”. Note for contrast the camelCase names in the threading module, which were quite deliberately NOT deprecated: 16.2. threading — Higher-level threading interface — Python 2.7.18 documentation.

Now, maybe you can argue that dict.fromkeys wouldn’t have been needed if people had used comprehensions from the start, but I believe dict.fromkeys predates dict comprehensions by quite a few years. Deprecating or removing it NOW would cause breakage, which has to be justified by a lot more than merely “you can do this with a comprehension”.

I’m honestly not sure what people expect “deprecate” to mean, given that you’re far from the only person who thinks that deprecation doesn’t cause any issues.

I can at least tell you that for me “deprecation” means “controlled removal” or even “controlled replacement”, and that a critical aspect of it is that you don’t just break whoever or whatever depends on the functionality. Instead you give them control over the process to move towards a state in which they no longer use the deprecated functionality. Their system stays operating during this whole process. In order to make the process advance, you probably also need to provide incentives, either positive or negative.
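
In Python terms, that kind of controlled process usually starts with a `DeprecationWarning` that leaves the old call working. A minimal sketch; `old_fromkeys` is a hypothetical wrapper for illustration, not something CPython does or plans to do:

```python
import warnings


def old_fromkeys(iterable, value=None):
    """Hypothetical deprecated wrapper: warns, then behaves as before."""
    warnings.warn(
        "old_fromkeys is deprecated; use a dict comprehension instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return dict.fromkeys(iterable, value)


# Record warnings so the example can inspect them instead of printing.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_fromkeys("ab", 0)  # still returns the usual dict
```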

Thanks for the concrete example.

That’s a lovely theory. Unfortunately, the fact is that this IS breaking things; the system won’t stay operating. You say that you “don’t just break whatever depends on” the thing you’re deprecating, but the entire point IS to break it.

Incentives. Well, let’s have a look at it. Suppose Python 3.16 deprecates dict.fromkeys, with plans to remove it in ten years. For nine of those years, the incentive is “keep using what works”. This is especially true if the recommended new way to do things was introduced in the same version that deprecated the old way, since the “new way” simply won’t work on older versions. Why should anyone change? All the incentives are pushing to stability, not churn. Churn is bad. So finally, in the very last year, suddenly there’s a reason to change, and it’s not the deprecation - it’s the imminent breakage.

Unless there is some extremely strong reason to change, most people won’t want to. Deprecating a perfectly-working API breaks people’s code for no reason.

I’m totally convinced that these are your opinions and that you won’t budge.

I’ve lived through big deprecations in large private organizations before and, while they sometimes still meet massive resistance, they eventually get done successfully. Positive incentives can be leaderboards and recognition. A negative incentive is having your team, or even yourself, exposed as the bottleneck. Breaking operations are prepared with rollbacks, and sometimes they do need to be rolled back, but eventually they succeed, and so does the deprecation.

Your view is very pessimistic but I don’t question the reality of your experiences. The obstacles along the way can be justified by the amazing feeling of deleting a bunch of code and/or ultimately saving a lot of money, energy and pain every single day.

Enjoy your week.

IMO there’s a qualitative, not just quantitative, difference between deprecating a feature in a popular language and deprecating one in a large organization.

The quantitative difference is that deprecating a well-used feature in Python (and this feature is recommended by linting tools) is probably 1,000x to 1,000,000x more impactful than a deprecation in one large private organization, because at least that many more organizations will be affected.

And the qualitative difference is that in a private organization you’re being paid to improve the organization’s code base. But for a language you’re not being paid to effect this change, and your bigger impact is on the users’ codebases, not yours. So the reasoning has to be more solid to deprecate a feature. Is it causing you significant maintenance burden? Is it a footgun for users? Etc.

Not that language clean-up isn’t important, but this isn’t even that, at least until there’s a general consensus that this feature shouldn’t be used, which I don’t see happening soon.


Well, defaultdict has this concept, so why not? Or maybe defaultdict could have a fromkeys static method.
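
For context, a sketch of the `defaultdict` concept referred to here: the factory is called once for each missing key, so every key gets its own list.

```python
from collections import defaultdict

d = defaultdict(list)

d[0].append(7)    # key 0 is missing, so list() is called to create its value
untouched = d[1]  # key 1 gets a separate, fresh list
```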

Side note: -1 to fromkeys deprecation. A ton of code will break for no reason. Maybe you skipped the Python tutorial:

Simple assignment in Python never copies data. When you assign a list to a variable, the variable refers to the existing list. Any changes you make to the list through one variable will be seen through all other variables that refer to it.
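
A minimal illustration of that tutorial passage (the variable names are arbitrary):

```python
a = [1, 2]
b = a        # no copy is made; b refers to the same list as a

b.append(3)  # the change is visible through both names
same_object = a is b
```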

So, if you want to deprecate fromkeys for this “subtle bug”, I suppose you have to completely change the language and probably a lot of other programming languages.

Just for the sake of sticking to functional programming:

dict(zip(range(3), iter(list, None)))

which makes an unnecessary comparison for each iteration.

dict(zip(range(3), map(list, repeat(()))))

which calls list with a redundant empty tuple.

dict(zip(range(3), map(call, repeat(list))))

which looks slightly verbose.
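
For anyone who wants to run these: the second and third spellings need `repeat` from `itertools`, and the third needs `call` from `operator`, which was added in Python 3.11. The fallback definition below is an assumption for older versions, not part of the original snippets.

```python
from itertools import repeat

try:
    from operator import call  # available on Python 3.11+
except ImportError:
    def call(obj, /, *args, **kwargs):
        """Assumed stand-in for operator.call on older Pythons."""
        return obj(*args, **kwargs)

# iter(list, None) calls list() until it returns None, which never happens.
via_iter = dict(zip(range(3), iter(list, None)))

# list(()) builds a new empty list from an empty tuple each time.
via_map_list = dict(zip(range(3), map(list, repeat(()))))

# call(list) is simply list(), evaluated once per key.
via_call = dict(zip(range(3), map(call, repeat(list))))
```

In each case zip stops after the three keys from range(3), so the infinite iterators on the right are never exhausted.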

Pick your poison. :slight_smile:
