PEP-603: Adding a frozenmap type to collections

+1 on adding memory usage benchmarks comparing at least dict and frozenmap. Including ChainMap as well would be useful and wouldn't require much additional effort.

Edit: Working on this, I’ll upload the gist for it later.

I believe it’s the latter.

So it isn't explicitly tied to dict: anything that quacks enough like a mapping type can be consumed into a frozenmap instance. It also means dict is supported automatically, without needing explicit support.

@yselivanov I created a benchmark that compares the construction speed and memory usage between dict, ChainMap, and frozenmap. Feel free to include it anywhere and modify it in any manner: https://gist.github.com/aeros/ec2678304b45245a78310780fd47ec19.

I mostly did it to sate my own curiosity, but figured that you might get some use out of it for the PEP.

Edit: The benchmark I created uses the default ChainMap and only sets values in the top-level mapping. A benchmark that writes to deeper chains while providing a fair comparison (as @cben was asking for) would require some additional complexity. But this benchmark still provides some useful comparative information between the three container types.
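For anyone who just wants the shape of it, a minimal sketch of that kind of comparison (not the full gist; it assumes the third-party immutables package, whose Map is the reference HAMT implementation behind the PEP) might look like:

import sys
import timeit
from collections import ChainMap
from immutables import Map   # reference HAMT-based map; pip install immutables

items = {str(i): i for i in range(10_000)}

# Note: ChainMap(items) merely wraps the dict (no copy), and the Map() call
# assumes the constructor accepts a mapping, as immutables.Map does.
for name, ctor in [("dict", dict), ("ChainMap", ChainMap), ("Map", Map)]:
    build_time = timeit.timeit(lambda: ctor(items), number=100)
    obj = ctor(items)
    # getsizeof is shallow: for Map it won't include the internal HAMT nodes.
    print(f"{name:10} build: {build_time:.4f}s  getsizeof: {sys.getsizeof(obj)} bytes")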

2 Likes

From MicroPython’s point of view this is a welcome addition. MicroPython already has fixed/frozen dicts internally, used for built-in modules and classes (so you’re unable to edit the members of a built-in module/class). In such a case the dict can go in ROM rather than take up RAM.

Allowing the user to specify when a dict is fixed/frozen/persistent would be very handy because then the compiler could put that data structure in ROM.

(Probably MicroPython would not implement it using a HAMT in order to keep it minimal.)

2 Likes

I think it should be pretty straightforward to implement all the necessary infra without importing typing. I can help with this after the initial implementation lands. Also PEP 585 will simplify this a lot if accepted.

1 Like

It’s more that type checkers need to handle it. Currently our type checker for VS and VS Code understands subclasses of typing.Generic, which means if there is an annotation class that includes that we will handle it correctly (because we specialise the entire typing module). But if FrozenMap becomes its own generic type definition then we have to manually add it to our type checker before support will appear.

PEP 585 would be great, though has the same problem that type checkers cannot infer never-before-seen types based solely on source code. Using meta-type-annotations in typing makes things future proof. (Though in this case, FrozenMap will almost certainly not have a pure-Python equivalent that we can analyse anyway, so it’ll require all type checkers to release updates for it regardless.)

Mypy (and many other type checkers) don't use the stdlib sources and instead use typeshed stubs. And indeed, for C files, stub files are the only option.
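As an illustration, a typeshed-style stub for a C-implemented frozenmap could be sketched roughly like this (names and signatures here are illustrative only, not the final PEP 603 API):

from typing import Iterator, Mapping, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class frozenmap(Mapping[K, V]):
    # Only the abstract Mapping methods plus two PEP 603-style
    # update methods are sketched here.
    def __getitem__(self, key: K) -> V: ...
    def __iter__(self) -> Iterator[K]: ...
    def __len__(self) -> int: ...
    def including(self, key: K, value: V) -> "frozenmap[K, V]": ...
    def excluding(self, key: K) -> "frozenmap[K, V]": ...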

Yet another implementation: https://github.com/isi-vista/immutablecollections

(I’m the author, not that there’s any meaningful creativity involved in implementing it)

1 Like

Instead of creating a whole new data structure with different and surprising properties, I propose that we expose the existing mappingproxy() wrapper, perhaps renaming it to something more beautiful and perhaps imbuing it with additional functionality.

This will be much easier to use than a data structure that is frozen from the outset since it will allow full use of the existing dict API during the data accumulation phase.

It will work with existing hard-wired language syntax such as dict literals and dict comprehensions. It will work great with existing tools that already return dictionaries. Also, it can be used with existing dict variants such as defaultdict, vars(Namespace), os.environ, etc.

d1 = mapping_proxy({'queen': 9, 'rook': 5})     # literals

d2 = mapping_proxy({x: f(x) for x in data})     # comprehensions

# example with a defaultdict
from collections import defaultdict

d3 = defaultdict(list)          # accumulation phase
for x in somedata:
    if somepred(x):
        k = feature(x)
        d3[k].append(x)
d3.default_factory = None
d3 = mapping_proxy(d3)          # become frozen without rebuilding from scratch
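Worth noting: this wrapper already exists today as types.MappingProxyType, so the same pattern can be tried right now without any rename (mapping_proxy above is just the proposed nicer spelling):

from types import MappingProxyType

d = {'queen': 9, 'rook': 5}
frozen = MappingProxyType(d)    # read-only view of d
frozen['queen']                 # 9
# frozen['pawn'] = 1            # would raise TypeError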
2 Likes

I just created my own version of frozendict and discovered that there's a PEP discussing a frozenmap.

There are some differences I’d like to discuss:

  1. the PEP proposes to add it to the collections module. IMHO, for symmetry with set and frozenset, it should be added to builtins

  2. the PEP proposes the introduction of frozenmap.including(key, value), frozenmap.excluding(key) and frozenmap.union(mapping=None, **kw) for creating new frozenmaps with added, updated or removed keys. My frozendict, by contrast, simply implements __add__() and __sub__() (a minimal sketch of these semantics follows after this list), so you can do, for example:

frozendict({"Sulla": "Marco", 2: 3}) + {"Sulla": "Marò", 4: 7, 5: 1}
# frozendict({'Sulla': 'Marò', 2: 3, 4: 7, 5: 1})

frozendict({"Sulla": "Marco", 2: 3}) - [2]
# frozendict({'Sulla': 'Marco'})
  3. The PEP proposes a frozenmap.mutating() for returning a mutable version of the frozenmap that can be used as a context manager. What are the advantages over a simple dict(myfrozenmap)?

  4. The PEP does not preserve insertion order, as dict does since CPython 3.6. My frozendict is probably much slower, since its implementation is in pure Python and it uses MappingProxyType, but at least its API is completely conformant to the dict API and insertion order is preserved.
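To make the comparison concrete, here is a minimal pure-Python sketch of the __add__/__sub__ semantics described in point 2 above (an illustrative FrozenDict, not the published frozendict package and not the PEP 603 API):

from types import MappingProxyType

class FrozenDict:
    # Illustrative only; assumes hashable keys and values.
    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)          # private backing dict
        self._proxy = MappingProxyType(self._data)  # read-only view

    def __getitem__(self, key):
        return self._proxy[key]

    def __iter__(self):
        return iter(self._proxy)

    def __len__(self):
        return len(self._proxy)

    def __add__(self, other):
        # Merge; the right-hand mapping wins on key collisions.
        return FrozenDict({**self._data, **dict(other)})

    def __sub__(self, keys):
        # Drop the given keys, ignoring any that are absent.
        keys = set(keys)
        return FrozenDict({k: v for k, v in self._data.items() if k not in keys})

    def __hash__(self):
        return hash(frozenset(self._data.items()))

    def __repr__(self):
        return f"FrozenDict({self._data!r})"

# FrozenDict({"Sulla": "Marco", 2: 3}) + {4: 7}  ->  FrozenDict({'Sulla': 'Marco', 2: 3, 4: 7})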

I think that Raymond's suggestion of exposing the mappingproxy wrapper (with or without a new name) is an excellent idea worth considering, but there is one pretty critical difference between the proxy and a “real” frozendict: if you have a reference to the underlying dictionary, you can mutate the proxy.

>>> mappingproxy = type(vars(object))
>>> d = {}
>>> e = mappingproxy(d)  # a frozen, immutable dict, right?
>>> d[None] = "Surprise!"
>>> e
mappingproxy({None: 'Surprise!'})

I fear that this will become a “Gotcha” and a FAQ “Why can you mutate frozen dicts?”.

Additionally, the proxies aren't hashable, so that would need to change if we are to support the “dict as key” use-case (and I think we should).

Still, we already have mappingproxy, so this is a low-impact change that lets us dip our toe in the water, without committing to a whole new data type. Expose the mappingproxy, make it hashable, and see if that solves the use-cases for frozendict.
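One low-tech way to sidestep that gotcha today is to copy before wrapping, so that no outside reference to the backing dict survives (frozen_view here is a hypothetical helper, not an existing API):

from types import MappingProxyType

def frozen_view(mapping):
    # Copy first so callers cannot retain a handle to the backing dict.
    return MappingProxyType(dict(mapping))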

See my implementation: you don't have access to the original dict that backs the MappingProxyType, and frozendict is hashable and can be used as a dict key.

Anyway, this is completely outside the scope of my questions. I repeat:

  1. IMHO frozenmap should be added to builtins
  2. IMHO, instead of including(), excluding() and union(), frozenmap should simply implement __add__() and __sub__()
  3. I do not understand the usefulness of mutating()
  4. IMHO frozenmap should preserve the insertion order.

Well, I've done some benchmarks. IMHO, apart from cloning, the results are not so encouraging when comparing dict to frozenmap (immutables.Map). The source code of the benchmark is here.

Marco, the results are expected, as most of the proposed frozenmap methods are O(log N) and not O(1). The core advantage of frozenmap is in producing modified versions: with your frozendict that is an O(N) operation, while for the proposed frozenmap it is only O(log N), where log N is actually log32 N.
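To put the log32 N factor in perspective, the tree stays very shallow even for large maps:

import math

# Approximate HAMT depth (levels of 32-way branching) needed for n items.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, math.ceil(math.log(n, 32)))   # -> 2, 4, 6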

To all in this thread: I’ll update the PEP eventually and address all comments/questions in that update.

4 Likes

@yselivanov ok, but what do you think about __add__ and __sub__? IMHO they are much simpler and closer to what other immutable types do, like tuples and strings.

I like them and it’s no problem to add support for them, but I want to wait until there’s resolution on PEP 584 acceptance.

1 Like

For clarification on this, is it O(log32 N) because each node in the data structure uses a 32-bit bitmap, giving it space for up to 32 pointers to child nodes? Additionally, would it be feasible for the user to customize the number of bits per node?

I don't suspect this would make it into the initial PEP, and it wouldn't be added to the API without significant user demand, but it seems like it could be useful to customize the branching factor to a user's specific needs (such as 16 or 64 instead of 32). From my understanding, increasing the bits per node would improve lookup speed but slow down copying, while reducing the bits per node would do the opposite. These effects would become more apparent as the number of items grows.

I’m not claiming this has adequate practical value to justify adding; it was more of a thought experiment to better understand the HAMT implementation.
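For anyone curious about the mechanics, the usual way a 32-way HAMT node uses its bitmap is roughly the following (a generic sketch of the technique, not CPython's actual _hamt.c code):

def child_slot(bitmap, hash_chunk):
    # hash_chunk is the next 5 bits of the key's hash (a value 0..31).
    bit = 1 << hash_chunk
    if not (bitmap & bit):
        return None                             # no child at this position
    # Children are stored densely; the index is the number of set bits
    # below ours (a popcount of the lower portion of the bitmap).
    return bin(bitmap & (bit - 1)).count("1")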