PEP 603: Adding a frozenmap type to collections

First hash would be expensive (as you say, in addition to calculating hash(frozenset(d.items())), each underlying map also needs to be hashed to confirm it is immutable), but since a successful hash calculation indicates everything is immutable it would be OK to cache the result, making it O(1) after that. (the hash storage field would be excluded from both hashing and equality checks, so it can still be lazily calculated and stored as PEP 416 suggested for frozendict)

I don’t actually have a concrete use case, though (it was just an idle thought), so it might be better left out. The question just never came up before, since there weren’t any hashable mappings to be chained in the first place.

1 Like

The article at The case for immutable dictionaries; and the central misunderstanding of PEP 351 (linked earlier in the thread at its original, now 404, home) is genuinely interesting.

One valid point that it makes is that stronger static type analysis capabilities make explicit assertions about data structure immutability more beneficial. This is something worth bringing up in a revived proposal (even PEP 416 predates PEP 484 by a few years).

3 Likes

Nope, not possible, .maps is documented as a public, mutable list, meaning the ChainMap type can be mutable even if all mappings are not. This would require a new type, FrozenChainMap, but I’m my understanding that fills (almost) the same niche has HAMT? So I don’t think that’s worth aiming for.

3 Likes

A perfect example of this is “Segmentation fault”. We now use paging instead of segmentation, but the name wasn’t changed to “Paging fault”. So, I think this is fine.

2 Likes

Ah, I had completely forgotten about that. You’re right, not worth worrying about in that case.

Hi, considering that some time has passed without a new PEP draft being introduced, here is the draft I mentioned there.

I named it PEP 416-1, but I’m sure that won’t last (maybe it will be the new 416, or the new 603 but it would be weird, or a new number entirely). So I’m not sure whether we should continue discussing this here, open a new thread or whatever else.

Feel free to give some feedback !

2 Likes

PEP editor opinion: This should almost certainly be a new PEP, or possibly a rewrite of PEP 603 if the author of that PEP agrees to it. PEP 416 is ancient history and should not be changed.

7 Likes

This is what I currently use due to the missing feature:

from collections.abc import Hashable


type frozendict[K: Hashable, V: Hashable] = frozenset[tuple[K, V]]


def freeze_dict[K: Hashable, V: Hashable](d: dict[K, V]) -> frozendict[K, V]:
    return frozenset(d.items())


def unfreeze_dict[K: Hashable, V: Hashable](fd: frozendict[K, V]) -> dict[K, V]:
    d = dict(fd)
    assert len(d) == len(fd)
    return d


d = {1: 2, 3: 4}
fd = freeze_dict(d)
assert repr(d) == repr(unfreeze_dict(fd))
hash(fd)  # works

For my use case (core algorithm that is part of a complex AI system), these were my requirements, from high to low priority:

  1. Must be hashable and immutable
  2. Must work for type annotations
  3. Must be intuitive to use and understand, like dicts
  4. Must have at least the same restrictions that dicts have (e.g. disallowing duplicate keys)
  5. Must have a simple implementation and not be overengineered, to prevent bugs
  6. Must easily support future Python versions
  7. Where they’re used, must not be significantly slower than dicts

With these requirements, frozendict can be used in data structures that require hashable objects, and developer experience is as good as it can get. It would be great to have better time complexity, actual dict semantics, and a function like frozenset().

Over the years, I’ve often found myself needing to reimplement something like this. It’s time that we add frozendict to Python.

5 Likes

I think the assumptions for anybody reading OrderedDict is indeed that methods like move_to_end() have O(1) performance. While the exact implementation isn’t specified, algorithmic behavior is important to users

I’d like to chime in from a user’s point of view. I am the lead maintainer for the Vyper compiler, and having an immutable map is extremely useful for certain algorithms. During analysis, there are a lot of analyses which maintain a dict (of whatever) at every point in the program. We usually do this as copy-on-write (vyper/vyper/venom/analysis/liveness.py at d0e3e90e8ee6416fabb5df3f59dd6d17c7d1df04 · vyperlang/vyper · GitHub), and the performance is very sensitive to copy performance. Another example of such an analysis would be available expression analysis (I’d post a link, but as a new user, I am limited to 2 links per post)

I think the data structure is generally useful. Another example that comes to mind is a checkpointing dict – if you are implementing a kind of dict with journaling, with mutable hashmaps, at every checkpoint you need to either copy the current dict at checkpoint or rollback time. With a HAMT, it’s an O(1) reference copy.

Furthermore, I think the usefulness of the datastructure is already motivated by the fact that the standard library uses it already in contextvars.

The case for it shipping with CPython would be that it’s a generally useful data structure. For us, due to the risk of supply chain attacks, we prefer to keep dependencies as few as possible. While there is certainly the possibility of using a pypi package, e.g. immutables or pyrsistent each dependency adds some supply chain risk that we would prefer not to take on.

As for the naming, I think collections.immutabledict or collections.immutablemap signals the right thing – it’s a bit niche data structure, and it’s immutable (in the functional data structures sense) rather than frozen, which to me just signals that it’s blocked from being mutated.

2 Likes

Having finally read through this thread[1], I thought I’d at least chime in with my own thoughts/wishes

Overall, my own ideal scenario if I had a magic wand and decision rights would be that both a PEP 416 style frozendict was added as a builtin and a PEP 603 style HAMT based immutable not-a-dict mapping was added to collections.

They both serve different purposes and would both be useful in different scenarios, and I imagine having just the HAMT based PEP 603 frozenmap would lead to inevitable confusion (seen a couple of times throughout this thread) that it works just like dict when that is not at all what is optimized for.

Alyssa’s suggestion of the name SnapshotMap is my personal favorite suggestion I’ve seen for the bikeshed-naming question and would fit in well alongside ChainMap in the collections module.

In my mind, it immediately evokes what the mapping is good at, i.e. creating space- and time-efficient snapshots of the mapping at different states. The fact that it’s a mapping that’s frozen/immutable isn’t the actual useful, interesting part, it’s that you can easily create modified versions and then rollback those changes to a previous version when needed.

The most compelling argument I’ve seen for it being in the standard library is that the implementation already is, so why not[2] give it a public API and let everyone benefit from the great work that was done in implementing contextvars.

Similarly, the most compelling argument I’ve seen for adding a builtin frozendict is that it feels like the odd one out of the builtin collections, with

  • list: tuple
  • set: frozenset
  • dict: ???

Ideally it could reuse all of the implementation and optimizations that have gone into making dict efficient, keep all the same semantics as dict (e.g. O(1) lookup time, remembering insertion order, equality based on unordered keys and values), but just truly forbid anything that mutates it and add hashing support similar to how tuples work (i.e. hashable if all the values are hashable, raising a TypeError otherwise), then it seems like it would be a useful addition in some cases.

I personally don’t use frozensets very often, but there have been a couple of times where it’s fit a need perfectly and I’ve been very happy to have it, and I imagine that frozendicts would end up the the same


  1. And hopefully not being too annoying by adding on again to an old and very long thread ↩︎

  2. I know that is actually minimizing the additional maintenance and backwards-compatibility concerns of making something public, but I’ve got the magic wand to get exactly what I personally want ↩︎

6 Likes

Inspired by Add zero-copy conversion of `bytearray` to `bytes` by providing `__bytes__()` - #18 by storchaka, what I would love to have is a zero-copy dict.freeze method that constructs in O(1) time a frozendict instance, a hashable, picklable and immutable object that shares the same hash table with the original dict. This allows a dict to be efficiently made “gold” after it’s iteratively built.

And then one of the following two things can happen after dict.freeze:

  1. The original dict gets cleared by getting a new allocation of an empty hash table if its reference count is 1. Make a copy if the reference count isn’t 1.
  2. The original dict keeps a pointer to the frozendict such that if the dict ever gets mutated (via either Python or C API), the frozendict would then make a copy of the hash table and detaches.

To me option 1 would be good enough for my use cases, while option 2 offers more flexibility at the cost of an extra pointer per dict.

1 Like

Chiming in. Having a persistent map type is something I want as a python user and have seen as an omission. The term I’ve searched for for such a thing is frozendict as a natural counterpart to my mind of the useful and interesting frozenset. I don’t really have a strong conception of the distinction between dict and other notions of mapping in python that seems to have motivated the name frozenmap.

I implement theorem proving algorithms that often involve backtracking with some environment mapping in GitHub - philzook58/knuckledragger: A Low Barrier Proof Assistant . I also have had to do a moderate amount of caching to improve performance. When I port things from the functional programming world, the lack of a persistent map type is felt. Basically, I get by by constantly copying dicts every time I use them and this is probably fine but makes me uneasy, and that uneasiness increases my cognitive load. I think it is a slight headwind to writing certain kinds of mathematical algorithms in python and gives the impression that the language is somewhat inhospitable to them. There are some experiments I haven’t even done because the copying feels too egregious. Perhaps taking on a persistent map dependency is acceptable to me.

I also felt the need for a hashable dictionary type in this blog post A Python frozenset interpretation of Dependent Type Theory | Hey There Buddo!

4 Likes

From my standpoint as a scientist who often has to deal with (potentially large) dicts to be merged/updated and where accidental mutations are a huge concern, I think even if calling the hamt.c implementation a frozenmap ends up being the wrong term, having a hamt object exposed in the standard library, the same way there’s a heapq would be amazing and cut down on the burden of ensuring nothing breaks with yet another dependency. This is especially true since the implementation already exists in CPython and would only need the bindings to be exposed.