PEP 814: Add frozendict built-in type

The PEP doesn’t mention anything about requiring inner data structures to be immutable. The inner data is only required to be hashable if you want to hash a frozendict instance.

1 Like

frozendict(dict) has a complexity of O(n): items are copied (shallow copy).

Keys must be hashable and so must be immutable, but values don’t have to be hashable and so can be mutable. For example, frozendict(x=[1, 2, 3]) is a valid frozendict.

>>> d=frozendict(x=[1, 2, 3])
>>> d['x'].append(4)
>>> d
frozendict({'x': [1, 2, 3, 4]})

If you require a fully immutable frozendict, you can call hash() on the instance to check that its values are hashable (and so, in practice, immutable).
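For example, hashing a frozendict with a mutable value would fail (illustrative sketch only; frozendict does not exist yet, and the exact traceback text is an assumption based on how hashing a list fails today):

>>> hash(frozendict(x=[1, 2, 3]))
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'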

8 Likes

Should tuple.copy be added as well?
It is the odd one out now…

1 Like

Please make a new topic/PEP about it, would love that.

1 Like

Re: https://peps.python.org/pep-0814/#construction

frozendict(collection) creates a mapping from the passed collection object. The passed collection object can be:

  • a dict,

  • another frozendict,

  • or an iterable of key/value tuples.

IMHO, when it comes to accepting a mapping, the constructor should mimic the behavior of the dict() constructor, i.e., accept any object that has a keys attribute which is an argumentless callable (expected to return an iterable of the mapping’s keys) and whose type implements subscription (__getitem__(), making instance[key] operations possible). See the sketch below.
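For illustration, dict() already accepts such an object today (MyMapping here is a made-up example class):

class MyMapping:
    def keys(self):
        return ('a', 'b')            # argumentless callable returning the keys
    def __getitem__(self, key):
        return {'a': 1, 'b': 2}[key]

print(dict(MyMapping()))             # {'a': 1, 'b': 2}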

1 Like


It’s one thing for keys to disappear from underneath you, but having an object change its type is something else entirely. If you were mutating a list and it suddenly became a tuple in the middle of a function call, would you be surprised by that? And don’t forget that dict and frozendict don’t meet until object, so the guards in your code that expect a dict at the top of your function suddenly turn into a lie.

Remember, this is a built-in type that’s being proposed. That means it will affect general Python practices. If we let dicts suddenly change their type in place to become a frozendict, we start encouraging swapping out an object’s type much more often than is done now (which I would argue is extremely rare, and for good reasons). I don’t think these sorts of tricks should be encouraged just to make creating a frozendict instance a bit cheaper.

To be a bit blunt: if there’s a way to make a dict instance a frozendict instance in-place via a method on the object I am -1 on the PEP.

22 Likes

If we execute fd = frozendict(d), can’t we make d point to the hash table of fd and only copy it when we add, change, or remove a key (but not when we clear the dict)? That would give us O(1) construction without changing the class and would also work for d = dict(fd).
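A pure-Python sketch of that copy-on-write idea (all names here are made up; the real optimization would share the hash table at the C level):

class COWDict:
    """Shares another mapping's storage until the first mutation."""
    def __init__(self, shared):
        self._data = shared          # borrowed, treated as read-only
        self._owned = False

    def __getitem__(self, key):
        return self._data[key]

    def _materialize(self):
        # Deferred O(n) copy, paid only on the first write.
        if not self._owned:
            self._data = dict(self._data)
            self._owned = True

    def __setitem__(self, key, value):
        self._materialize()
        self._data[key] = value

    def __delitem__(self, key):
        self._materialize()
        del self._data[key]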

5 Likes

Thinking out loud, I think it would be possible to get O(1) copies without in-place shenanigans. dict.freeze would have to do two different things:

  1. Return the new frozen dictionary object, instead of freezing the current dictionary.
  2. Clear the items of the current dictionary.

Assuming frozen dictionaries have a similar internal layout, the items of the mutable dictionary could simply be moved into a frozen dictionary and then returned. The usage would be similar to what was proposed above:

mutable = build_complicated_dict()
shared_data = mutable.freeze()
# "mutable" is now empty, but that's fine since we're not planning
# on using it anymore.
start_serving_requests(shared_data)

I agree that this will not work well under concurrency, nor have I thought much about the implications here. I’m only bringing it up because my primary disappointment with frozenset in the past has been that you generally need to copy a set to make a frozen set… which probably isn’t noticeable in practice, but feels painful as a developer.
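A rough pure-Python model of the intended effect (dict.freeze and frozendict don’t exist yet; in C the storage would presumably be moved rather than copied):

def freeze(d):
    frozen = frozendict(d)   # the C version could move the storage instead of copying
    d.clear()                # the original dict is left empty, as described above
    return frozen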

10 Likes

Question: Will

def f(**kwargs): print(type(kwargs))
f()

print <class 'dict'> or <class 'frozendict'>?

I see that the current implementation prints dict, but will that remain so?

Seems like it could be more efficient, but would also be a large backwards compatibility concern!

Since mutating your **kwargs has no real side effects, I can only imagine that it is mutated in plenty of codebases, which would now break when upgrading.

3 Likes
def foo(a=None, **kwds):
    if a is not None:
        kwds['a'] = a
    return bar(**kwds)

Not only would it break the above, but the above is very useful: the benefit (convenience plus the performance win of not needing to copy the dict) is, IMO, much greater than what would be gained by making kwargs a frozendict.

5 Likes

It is indeed a common pattern to mutate kwargs by popping arguments you can handle and then passing the rest on to another function (e.g., to a superclass constructor via super()). I think this would preclude making kwargs a frozendict.
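For example (Base, Derived, and color are placeholder names):

class Base:
    def __init__(self, **kwargs):
        self.extra = kwargs

class Derived(Base):
    def __init__(self, **kwargs):
        # Pop the argument handled here, then forward the rest to the superclass.
        self.color = kwargs.pop('color', 'red')
        super().__init__(**kwargs)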

3 Likes

Nit: objects can support hash() yet be mutable!
The formal contract does not relate hash() directly to mutability, only to equality:

An object is hashable if it has a hash value which never changes during its lifetime (…¹) and can be compared to other objects (…¹).
Hashable objects which compare equal must have the same hash value.

Mutable containers like lists, dicts, and sets deliberately block hash() because they compare by content, which can change => their hash would need to change too :person_gesturing_no:.
But objects that compare by identity are free to hash by identity too => the hash will stay stable :ok:.

And this is not a rare corner case: custom classes by default hash and compare by identity! Custom classes are by default mutable; even without explicit mutating methods, one can assign arbitrary obj.attr = ... from outside, and it takes non-trivial effort to guarantee immutability. Even if the class itself is immutable, it frequently allows initialization with parameters which in turn could be mutable, e.g. lists.

  • This might be an unusually pedantic view of “mutable” for most code. But it does matter for discussions of shallow vs. deep copies and cross-thread sharing!

¹ I omitted the wording that it “needs” __hash__() and __eq__() methods, so as not to confuse explicitly defining such methods with merely supporting them; inheriting their default implementations from object is fine.
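A minimal illustration of the hashable-yet-mutable point (Node is a throwaway class):

class Node:
    def __init__(self, value):
        self.value = value

n = Node(1)
h = hash(n)            # default object.__hash__: hashes by identity
n.value = 2            # mutate the instance...
assert hash(n) == h    # ...its hash is unchanged, so it remains a valid dict key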

3 Likes

Argh, not sure how I got the concepts of shallow and deep copies so badly mixed up. It is now so clear to me in retrospect that shallow copies care only about references to the items and not the items themselves. Thanks again.

One difference not listed in the PEP (going by the reference implementation) is that while the augmented union operator of a dict supports any object that either implements the mapping protocol or is an iterable of key/value pairs, the augmented union operator of a frozendict supports only a dict or a frozendict:

>>> d = {'a': 1}
>>> d |= [('b', 2)]
>>> d
{'a': 1, 'b': 2}
>>> fd = frozendict({'a': 1})
>>> fd |= {'b': 2}
>>> fd
frozendict({'a': 1, 'b': 2})
>>> fd = frozendict({'a': 1})
>>> fd |= [('b', 2)]
Traceback (most recent call last):
  File "<python-input-51>", line 1, in <module>
    fd |= [('b', 2)]
TypeError: unsupported operand type(s) for |=: 'frozendict' and 'list'
>>>

Is it meant to be different, or is it simply the reference implementation that is missing the functionality?

Oh. You can also reproduce the issue with dict: dict |= list is supported, but dict = dict | list is not.

>>> d=dict({'a': 1}); d |= [('b', 2)]; d
{'a': 1, 'b': 2}

>>> d=dict({'a': 1}); d = d | [('b', 2)]; d
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for |: 'dict' and 'list'

frozendict doesn’t implement the update operator |=: fd1 |= fd2 works as fd1 = fd1 | fd2.

If we decide to support frozendict | list, we should do the same for dict | list. PEP 584 seems to specify this behavior: PEP 584 – Add Union Operators To dict (peps.python.org). So I suggest not changing it (i.e., not supporting dict | list or frozendict | list).

5 Likes

But then it would be possible to do a_dict |= a_list but not a_frozendict |= a_list – which would also seem somewhat inconsistent…

Unless frozendict did provide a dedicated non-mutating (!) implementation of |= – just to make it more lenient about the second operand’s type than bare |. :thinking: But such a non-mutating implementation of __i*__() would be something unusual (even though technically possible), contradicting the official prescription that such methods should attempt to do the operation in-place (modifying self).
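A toy sketch of such a non-mutating __ior__ (FrozenDictDemo is a stand-in class, not the PEP’s type):

class FrozenDictDemo:
    def __init__(self, data=()):
        self._data = dict(data)

    def __repr__(self):
        return f"FrozenDictDemo({self._data!r})"

    def __ior__(self, other):
        # Build and return a new object instead of mutating self;
        # `fd |= other` then rebinds the name fd to the result.
        merged = dict(self._data)
        merged.update(other)       # dict.update accepts mappings or key/value iterables
        return FrozenDictDemo(merged)

fd = FrozenDictDemo({'a': 1})
fd |= [('b', 2)]                   # works even though the right operand is a list
print(fd)                          # FrozenDictDemo({'a': 1, 'b': 2})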

This is how I’d do it, yeah (a “copy-on-write” or “must rebucket” flag inside dict would do it). But it’s also an entirely internal optimization that doesn’t need to be in this PEP, provided that this PEP doesn’t guarantee that frozendict(d) is always O(n) and lock us out of improving it later.[1]


  1. Which would be a silly thing to guarantee, but hot topics like this often lead to us guaranteeing silly things because it makes people stop complaining. ↩︎

1 Like

I find this very consistent: frozendict is like a dict, except for the mutable aspects such as in-place operators. In this context “consistent” does not mean that frozendict should be able to replace a dict everywhere; it’s perfectly normal that some operations are not available.

6 Likes

While it would be an optimization for frozendict performance, it would in some sense penalize dict performance. Let’s say:

# Setup phase
d = {} # a huge dict with lots of configs
backup = frozendict(d)  # blazingly fast

... # my game starts and I expect everything to be "fast"

d['username'] = new_name  # gotcha → superslow, gotta copy the config first

For the case where d is never changed after frozendict(d) is called, I agree it’s better.

But I would be very surprised if my expected-O(1) operation suddenly became a deterministic Ω(n).

Or did I misunderstand something?

1 Like

Dict updates are already formally unpredictable, since rebucketing can occur at any time. But yeah, there’s always a risk here.

Generally Python goes for usability over precise performance characteristics (feel free to use a different language if those few milliseconds matter more than writing the code quickly and conveniently). So rather than adding a hundred new APIs for users to learn and tweak to perfectly control line-by-line performance, we would make the majority case fast at the expense of the minority case (and spend the time to figure out which is which).

3 Likes