PEP 814: Add frozendict built-in type

I’d really like to see finally a frozendict implemented in the stdlib. It will be a dream finally come true. But maybe my dreams are too nerdy…

@csm10495 quoted a frozendict package on PyPI. I took a quick look and this library claims it’s faster than dict.

Is this true? If so, this will be true also for the Python frozendict?

Furthermore, it implements also some nice additional methods, like setdefault, set key, value, index and delete.

Is there a way now to create a new frozendict from an existing frozendict removing some keys? If not, a delete method will be nice.

This seems useful.

Maybe same as in [frozen]set, def union(self, *others) could be a good idea for [frozen]dict.

I’d like to have such a method or operator too, but that method or operator would apply to dict in general and not just frozendict, and therefore belongs to a different discussion.

1 Like

Making frozendict a subclass of dict is IMO a definitively bad idea. frozendict is immutable, while dict is mutable. The contrary has more sense apparently, apart the fact it’s not :wink:

The fact that json and company doesn’t support collections.abc bestiary is still a mistery to me.

In Python, every builtin can be overridden, so why it should be a problem?

What do you mean?

I forgot that discussion. Well, we can implement only delete for one key for now. It will be the “equivalent” of del d[key] (apart the fact the frozendict will be not changed, but a new one will be created instead)

2 Likes

That’s probably because instance checking of an ABC is very slow.

? Is this irony? ^^

I think that what you’re saying is that frozendict(a=1) | {"b": 2} | {"c": 3} creates 3 frozendicts, while only 2 are really needed. In general, you could have n maps, and only 2 frozendicts are really needed.

I suppose the trick is easy: create a dict from the initial frozendict, merge all the dicts in it and create a frozendict from the merge. You only need to create a dict and a final frozendict.

So I think your code can be simplified this way (untested):

def merge_with(
    self, *dicts: Mapping[_KT, _VT]
) -> immutabledict[_KT, _VT]:
    temp = dict(self)

    for d in dicts:
        temp.update(d)
    
    if temp:
        return immutabledict(temp)
    
    return util.EMPTY_DICT

? I really don’t understand what you mean. How can you save another 10k dicts?

I really don’t understand what you mean. How can you save another 10k dicts?

here’s the idea:

start with the easy operation, empty immutabledict merged with other empty mappings:

EMPTY_DICT = util.immutabledict({})

So the first operation, we are merging some other immutabledicts. suppose they are all empty as well (which in our code, means we started with EMPTY_DICT - we never call util.immutabledict({}) a second time):

a = EMPTY_DICT
b = EMPTY_DICT
c = EMPTY_DICT

new_dict = EMPTY_DICT.merge_with(a, b, c)

immutabledict iterates through the empty dictionaries, sees that they are all empty, and says, hey, we have nothing to merge. Let’s return ourselves:

> new_dict is EMPTY_DICT
True

so if the above EMPTY_DICT merge thing happened 10000 times, we created zero new dictionaries. literally nothing happened.

The next scenario, hey, let’s say we’re empty, and one of those dictionaries has something in it:

a = EMPTY_DICT
b = util.immutabledict({"hey": "some data"})
c = EMPTY_DICT

new_dict = EMPTY_DICT.merge_with(a, b, c)

merge_with() right now sees that dictionary “b” is non-empty. so it creates a new immutabledict, fills up the internal dictionary representation with the data from the left side (EMPTY_DICT in this case), then fills it up with the data from “b”. Here’s an older version of the code that uses plain python:

    def merge_with(
        self, *dicts: Optional[Mapping[_KT, _VT]]
    ) -> immutabledict[_KT, _VT]:
        new = None
        for d in dicts:
            if d:
                if new is None:
                    new = ImmutableDictBase.__new__(self.__class__)
                    dict.__init__(new, self)
                dict.update(new, d)
        if new is None:
            return self

        return new

So the above code called 10000 times creates 20000 dictionaries; b = util.immutabledict(...) plus the merge_with operation. We can see that the returned dictionary is new:

>>> new_dict is EMPTY_DICT
False   # obviously, because it's not empty
>>> new_dict is b
False  # also false
>>> new_dict == b
True   # but it has the identical contents

The idea for optimizing the above further is that across a, b, c, EMPTY_DICT, we still have exactly one dictionary that is non-empty. We should just return that:

>>> new_dict is EMPTY_DICT
False   # obviously, because it's not empty
>>> new_dict is b
True  # because it's identical
>>> new_dict == b
True   # but it has the identical contents

So with the above optimization, the code example creates only 10K dictionaries for 10K runs, rather than 20K. that is, it detects that the merge operation produces a dictionary that is identical to one of the dictionaries it already has, so it can return that without making a new one.

In other words:

  • There is only one instance of an empty immutable dict and that is pre-initialized. Defining a „new“ empty immutable dict just assigns the variable to the existing instance.
  • A merge with an empty immutable dict is a no-op.

That‘s indeed neat.

Ok, I got it now. I think you can do this optimization now only if the only one map is an immutabledict. If not, you have to convert it to a immutabledict.

I think you can change the SQLAlchemy code so, where merge_with is called, immutabledicts are used. I think it’s better to create them using the immutabledict(key1 = value1, ...) constructor when possible, so you will avoid to construct an intermediate dict.

Anyway, take a look to the code I posted above for merge_with. IMHO it’s dead simple and faster of some picoseconds. If you want, I can propose you a PR, but I have to create a Github account (yeah, I’m quite lazy).

I get it now. toml is an already existing package. So Python, to not break already existing code, named the Python package tomllib.

I think here the problem doesn’t exist. frozendict is not a package or a module, is a builtin type. So no code will be broken.

? I don’t think Python devs should apologize to anyone. Why?

Yes, as you said, people claimed frozendict, and by many years. And so many people tried to implement it. I wanted it for one project I worked on many years ago, so I did a fast Google search before choosing one. But, as you said, it’s the obvious name for it. Furthermore, no code will be broken. I really fail to see where’s the problem.

I don’t see why we need a merge_with method when we can already unpack dicts as kwargs into the dict/frozendict constructor:

frozendict(a=1, **{"b": 2}, **{"c": 3})

Right, I forgot it! I did a quick benchmark, and the unpacking is marginally slower (less than 8%). Since I did a really hard bench (merging 100 dicts with 1000 elements), I suppose in the real life the improvement is very little.

Anyway, I suppose it could be a “good to have” syntactic sugar. The reason is the same for the d1 | d2. You can also do {**d1, **d2}, but | operator was introduced because this “trick” is quite obscure for beginners (and indeed I forgot it, since when | was introduced by PEP 584, I never used it again :slight_smile: )

If you’re using unpacking, you can’t return the same object:

fd1 = frozendict()
fd2 = frozendict(a=1)
fd3 = frozendict(**fd1, **fd2)
fd4 = fd1 | fd2
print(fd3 is fd2) # False
print(fd4 is fd2) # True
3 Likes

About the merge_with() method and EMPTY_DICT, I implemented the obvious optimization on frozendict1 | frozendict2 when one frozendict is empty: return the non-empty frozendict rather than creating a copy. Extract of tests:

        fd = frozendict(x=1, y=2)
        self.assertIs(fd | frozendict(), fd)
        self.assertIs(fd | {}, fd)
        self.assertIs(frozendict() | fd, fd)
11 Likes

Good optimization. Perhaps the constructor can share the same optimization?

>>> fd1 = frozendict()
>>> fd2 = frozendict(a=1)
>>> fd4 = frozendict(fd2, **fd1)
>>> fd4 is fd2
False # True would be nice
>>> fd4 = frozendict(fd2)
>>> fd4 is fd2
False # True would be nice
>>> fd4 = fd2.copy()
>>> fd4 is fd2
True
3 Likes

as others have mentioned the main thing we want is to avoid creating additional dictionaries when we dont need to. that is, we could call:

a = frozendict(foo="bar")
b = frozendict()
c = frozendict()

Then this line would provide some way to return “a”, and not create any other dictionaries even temporarily:

new_dict = merge_frozendicts(frozendict(), a, b, c)
1 Like

Is there any reason why you created 2 methods: union and merge_with instead of just naming merge_with as union.

Apart from allowing None (and allowing more than 1 other), there doesn’t seem to be much difference between the 2.

set.union accepts many others.
I guess it would make sense for dict’s union as well?

1 Like

this is just historical from when we started as a more direct subclass of dict and didn’t want to change the signature of union(). I would not expect an official Python frozendict to use this name.

seems good yup, my main point is that when you are immutable /frozen, you can do things with these operations that you could not with a mutable collection.

1 Like

I played a bit with your shared code and I think it just makes sense.

This is already being done with tuple, so this is quite familiar:

() is ()    # True

And in case of dict, benefit is greater as dict operations are more expensive in general (e.g. compared to list).


Just wanted to inquire about API choices here.

After careful deliberation, the Python Steering Council is pleased to accept PEP 814 – Add frozendict built-in type.

The absence of an immutable dict counterpart has been a long-standing gap in Python and has been requested for a long time in different forms. We agree this is a clear net positive for the language. We are accepting this PEP with the following modifications:

  • Performance should not be framed as a goal or motivation. The value of frozendict is in its immutability semantics, not in being faster than dict. Any performance implications are not compelling enough to highlight.
  • The thread safety claims should be toned down. frozendict is only shallowly immutable: mutable values inside it can still be modified from other threads without synchronization. The rationale reads better when focused on hashability, safe defaults, and structural immutability rather than leaning on free threading.
  • The docs should clearly note that frozendict is not a dict subclass. We acknowledge that this should be evident, as the design mirrors set/frozenset, but dicts are much more pervasive and developers will naturally try to pass a frozendict where a concrete dict is required (e.g. as the globals argument to exec), leading to exceptions.
  • The docs should also be clear about the hashability semantics of frozendicts, with care taken to describe the requirement that both keys and values must be hashable for the frozendict to be hashable.
  • Stdlib adoption should happen with each module maintainer’s approval, not as a mechanical sweep.

Congratulations to Victor and Donghee! We look forward to seeing frozendict in Python 3.15!

— The Python Steering Council

72 Likes