PEP 603: Adding a frozenmap type to collections

+1 on adding memory usage benchmarks comparing at least dict and frozenmap. Including ChainMap as well would be useful and wouldn’t require much additional effort.

Edit: Working on this, I’ll upload the gist for it later.

I believe it’s the latter.

So it isn’t explicitly tied to dicts: any object that quacks enough like a mapping can be consumed into a frozenmap instance. It also means dict is automatically supported without any explicit special-casing.
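To illustrate the duck typing involved (my own toy class, not from the PEP): dict() already consumes any object that supplies keys() and __getitem__, and a frozenmap constructor could rely on the same protocol:

```python
# Hypothetical minimal mapping: anything with keys() and __getitem__
# can be consumed by dict() today -- the same duck typing a frozenmap
# constructor could rely on.
class SquareMap:
    def keys(self):
        return [1, 2, 3]

    def __getitem__(self, key):
        return key * key

d = dict(SquareMap())
print(d)  # {1: 1, 2: 4, 3: 9}
```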

@yselivanov I created a benchmark that compares the construction speed and memory usage between dict, ChainMap, and frozenmap. Feel free to include it anywhere and modify it in any manner: Construction time benchmark for frozenmap (immutables.Map) · GitHub.

I mostly did it to sate my own curiosity, but figured that you might get some use out of it for the PEP.

Edit: The benchmark I created uses the default ChainMap and is only setting values to the surface level mapping. A benchmark which writes to deeper chains while providing a fair comparison (as @cben was asking for) would require some additional complexity. But, this benchmark still provides some useful comparative information between the three container types.
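For anyone who wants a rough idea of the shape of such a benchmark, here is a minimal sketch (my own names; immutables.Map is omitted here since it’s a third-party dependency, and the linked gist is more thorough):

```python
import timeit
import tracemalloc
from collections import ChainMap

ITEMS = [(str(i), i) for i in range(1000)]

def build_dict():
    return dict(ITEMS)

def build_chainmap():
    cm = ChainMap()
    for k, v in ITEMS:
        cm[k] = v  # writes go to the first (surface-level) mapping
    return cm

for name, builder in [("dict", build_dict), ("ChainMap", build_chainmap)]:
    seconds = timeit.timeit(builder, number=100)
    tracemalloc.start()
    obj = builder()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{name}: {seconds:.4f}s for 100 builds, peak {peak} bytes")
```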

3 Likes

From MicroPython’s point of view this is a welcome addition. MicroPython already has fixed/frozen dicts internally, used for built-in modules and classes (so you’re unable to edit the members of a built-in module/class). In such a case the dict can go in ROM rather than take up RAM.

Allowing the user to specify when a dict is fixed/frozen/persistent would be very handy because then the compiler could put that data structure in ROM.

(Probably MicroPython would not implement it using a HAMT in order to keep it minimal.)

3 Likes

I think it should be pretty straightforward to implement all the necessary infra without importing typing. I can help with this after the initial implementation lands. Also PEP 585 will simplify this a lot if accepted.

1 Like

It’s more that type checkers need to handle it. Currently our type checker for VS and VS Code understands subclasses of typing.Generic, which means if there is an annotation class that includes that we will handle it correctly (because we specialise the entire typing module). But if FrozenMap becomes its own generic type definition then we have to manually add it to our type checker before support will appear.

PEP 585 would be great, though has the same problem that type checkers cannot infer never-before-seen types based solely on source code. Using meta-type-annotations in typing makes things future proof. (Though in this case, FrozenMap will almost certainly not have a pure-Python equivalent that we can analyse anyway, so it’ll require all type checkers to release updates for it regardless.)
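A minimal sketch of the distinction (FrozenMap here is a toy stand-in, not the proposed type): a class built on typing’s own generic machinery is something checkers already know how to interpret, whereas a brand-new generic type has to be special-cased.

```python
from typing import Generic, Iterator, Mapping, TypeVar

K = TypeVar("K")
V = TypeVar("V")

# Toy FrozenMap: because it subclasses Mapping[K, V] (built on
# typing.Generic), any checker that understands the typing module can
# infer FrozenMap[str, int] without special-casing this class.
class FrozenMap(Mapping[K, V]):
    def __init__(self, data: Mapping[K, V]) -> None:
        self._data = dict(data)

    def __getitem__(self, key: K) -> V:
        return self._data[key]

    def __iter__(self) -> Iterator[K]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

m = FrozenMap({"a": 1})
```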

Mypy (and many other type checkers) doesn’t use the stdlib sources and instead uses typeshed stubs. And indeed, for C modules, stub files are the only option.

Yet another implementation: https://github.com/isi-vista/immutablecollections

(I’m the author, not that there’s any meaningful creativity involved in implementing it)

1 Like

Instead of creating a whole new data structure with different and surprising properties, I propose that we expose the existing mappingproxy() wrapper, perhaps renaming it to something more beautiful and perhaps imbuing it with additional functionality.

This will be much easier to use than a data structure that is frozen from the outset since it will allow full use of the existing dict API during the data accumulation phase.

It will work with existing hard-wired language syntax such as dict literals and dict comprehensions. It will work great with existing tools that already return dictionaries. Also, it can be used with existing dict variants such as defaultdict, vars(Namespace), os.environ, etc.

d1 = mapping_proxy({'queen': 9, 'rook': 5})     # literals

d2 = mapping_proxy({x: f(x) for x in data})     # comprehensions

# example with a defaultdict
d3 = defaultdict(list)          # accumulation phase
for x in somedata:
    if somepred(x):
        k = feature(x)
        d3[k].append(x)
d3.default_factory = None
d3 = mapping_proxy(d3)          # become frozen without rebuilding from scratch
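For reference, mapping_proxy doesn’t exist under that name today; the existing wrapper is types.MappingProxyType, so the defaultdict example can be run right now (with placeholder data in place of somedata/somepred/feature):

```python
from collections import defaultdict
from types import MappingProxyType  # the existing mappingproxy wrapper

d3 = defaultdict(list)             # accumulation phase
for x in [1, 2, 3, 4]:
    d3[x % 2].append(x)            # group by parity (stand-in for feature())
d3.default_factory = None          # stop auto-creating keys on lookup
frozen = MappingProxyType(d3)      # read-only view, no rebuilding

print(frozen[0])                   # [2, 4]
try:
    frozen[0] = []                 # writes through the proxy are rejected
except TypeError as e:
    print("read-only:", e)
```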
3 Likes


I think that Raymond’s suggestion of exposing the mappingproxy wrapper
(with or without a new name) is an excellent idea worth considering, but
there is one pretty critical difference between the proxy and a “real”
frozendict: if you have a reference to the underlying dictionary, you
can mutate the proxy.

>>> mappingproxy = type(vars(object))
>>> d = {}
>>> e = mappingproxy(d)  # a frozen, immutable dict, right?
>>> d[None] = "Surprise!"
>>> e
mappingproxy({None: 'Surprise!'})

I fear that this will become a “Gotcha” and a FAQ “Why can you mutate
frozen dicts?”.

Additionally, the proxies aren’t hashable, so that would need to change
if we are to support the “dict as key” use-case (and I think we should).

Still, we already have mappingproxy, so this is a low-impact change that
lets us dip our toe in the water, without committing to a whole new
data type. Expose the mappingproxy, make it hashable, and see if that
solves the use-cases for frozendict.
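The unhashability point can be demonstrated directly (a proxy over a plain dict cannot be used as a dict key today):

```python
from types import MappingProxyType

proxy = MappingProxyType({"a": 1})
try:
    hash(proxy)                # a proxy over a mutable dict is unhashable
except TypeError:
    print("not hashable")
```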


Marco, the results are expected, as most of the proposed frozenmap methods are O(log N) rather than O(1). The core advantage of frozenmap is on updates: producing a modified version of a frozendict is an O(N) operation, but for the proposed frozenmap it is only O(log N); and that log N is actually log32 N.
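To illustrate the O(N) side of that comparison (a sketch, not the PEP’s implementation; `updated` is my own name): any copy-on-write scheme built on a plain dict must copy every entry, whereas a HAMT rebuilds only one root-to-leaf path of ~log32(N) nodes.

```python
from types import MappingProxyType

def updated(frozen, key, value):
    # Copy-on-write with a plain dict: copies all N entries -> O(N).
    # A HAMT-based frozenmap would instead rebuild only the ~log32(N)
    # nodes on the path to the changed key, sharing the rest.
    new = dict(frozen)
    new[key] = value
    return MappingProxyType(new)

m1 = MappingProxyType({"a": 1})
m2 = updated(m1, "b", 2)
print(dict(m1), dict(m2))  # {'a': 1} {'a': 1, 'b': 2}
```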

To all in this thread: I’ll update the PEP eventually and address all comments/questions in that update.

5 Likes

I like them and it’s no problem to add support for them, but I want to wait until there’s resolution on PEP 584 acceptance.

1 Like

For clarification on this, is it O(log32 N) because each node in the data structure has a 32-bit bitmap, giving it up to 32 slots for pointers to child nodes? Additionally, would it be feasible for the user to customize the number of bits consumed per node?

I don’t suspect this would make it into the initial PEP, and it wouldn’t be added to the API without significant user demand, but it seems potentially useful to be able to customize the branching factor depending on the specific needs of the user (such as 16 or 64 children per node). From my understanding, increasing the bits per node would speed up lookups but slow down copying, while reducing the bits per node would do the opposite. These effects would become more apparent as the number of items grows.
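A rough sketch of the depth arithmetic behind that tradeoff (`hamt_depth` is my own helper, not part of any implementation): wider nodes make the tree shallower, so lookups touch fewer nodes, but each node that must be copied on update is larger.

```python
import math

def hamt_depth(n_items, bits_per_level):
    # Maximum depth of a HAMT: each level consumes bits_per_level bits
    # of the hash, so depth grows as log base 2**bits_per_level of N.
    branching = 2 ** bits_per_level
    return max(1, math.ceil(math.log(n_items, branching)))

for bits in (4, 5, 6):  # 16-, 32-, and 64-way nodes
    print(2 ** bits, hamt_depth(10**6, bits))
```

For a million items this gives depths of 5, 4, and 4 for 16-, 32-, and 64-way nodes, which is why the effect only shows up at larger sizes.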

I’m not claiming this has adequate practical value to justify adding; it was more of a thought experiment to better understand the HAMT implementation.

I have found myself wishing for it and probably said so on comp.lang.python at one time or another. My one reaction is that I don’t care much about preserving insertion order, but would like to be able to traverse in key order, to find the next key larger or smaller than a given key, or to find the nearest key to something. These are traditional operations for balanced trees, as in Haskell’s Data.Map, C++ std::map, SQL tables with B-tree indices, etc.

Balanced trees lend themselves easily to persistent implementation. I’ve wished for this in Python when I was writing something that needed to schedule and unschedule a lot of cancellable timeouts. The Linux kernel used to do that with “timer wheels”, but the “right” way to do it is with balanced trees, and the current GHC I/O manager uses balanced trees (maybe Data.Map) that way, I think. HAMT may be a little faster, but it gives up this very useful functionality. I’d be surprised if the speed benefit is enough to matter much of the time.
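The ordered-key operations described here (next larger/smaller, nearest) can be sketched today with bisect over a sorted key list, though unlike a balanced tree this costs O(N) to keep sorted under insertion:

```python
from bisect import bisect_left, bisect_right

keys = sorted({10, 20, 30, 40})  # sorted key index for an ordered mapping

def next_larger(k):
    # Smallest key strictly greater than k, or None.
    i = bisect_right(keys, k)
    return keys[i] if i < len(keys) else None

def next_smaller(k):
    # Largest key strictly less than k, or None.
    i = bisect_left(keys, k)
    return keys[i - 1] if i > 0 else None

def nearest(k):
    # Existing key with the smallest absolute distance to k.
    candidates = [x for x in (next_smaller(k), next_larger(k)) if x is not None]
    return min(candidates, key=lambda x: abs(x - k))

print(next_larger(25), next_smaller(25), nearest(24))  # 30 20 20
```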

I do like the proposal in general since it can make transactional operations in a concurrent program a lot simpler. Each client just has its own “copy” of the complete dataset, but those copies share structure, so they don’t burn storage the way truly separate copies would.

A great old book on the general topic of immutable data structures is “Purely Functional Data Structures” by Chris Okasaki, if anyone wants to check out the topic further.

Added: Issue 40411: frozen collection.Counter - Python tracker is a recent rejected suggestion I entered for being able to freeze collections.Counter so it could be used as a dict key. Frozenmap would handle that nicely if a Counter can be converted to a dictionary and then into a frozenmap.
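As a workaround available today (not a substitute for the ergonomics of a frozen Counter), a frozenset of the Counter’s items is hashable and order-independent, so it can stand in as a dict key:

```python
from collections import Counter

c = Counter("abracadabra")

# frozenset of (element, count) pairs is hashable and compares equal
# for equal multisets, so it can serve as a dict key today.
key = frozenset(c.items())

cache = {key: "cached result"}
print(cache[frozenset(Counter("abracadabra").items())])  # cached result
```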

I just came across this discussion / PEP-603 / PEP-416 after searching for this feature. I have a couple of dictionaries that are used to look up classes. Imagine a machine learning library where you can configure a pipeline via configuration files. You can define which preprocessing steps (classes) you want via string. Currently, those classes are looked up like this:

from typing import Dict, Literal, Type

PreprocessingStepNames = Literal["scale", "crop", "shift", "rotate"]
lookup: Dict[PreprocessingStepNames, Type[PreprocessingStep]] = {
    "scale": Scale,
    "crop": Crop,
    "shift": Shift,
    "rotate": Rotate,
}

So I need to repeat every key.

I was hoping to be able to let mypy figure out the keys automatically by defining the dict as a FrozenDict / FrozenMap or similar.

There is no activity for several months in this PEP and Python 3.9 is out for a while now. Is this PEP rejected? Should the “Python-Version” be updated in the PEP?

1 Like

Sorry Martin, I don’t see the connection between frozen dicts and your
reported issue that you have to repeat the keys. In a frozen dict, you
would still have to repeat the keys. A frozen dict is an immutable
key:value mapping, it doesn’t infer the value from the key.

Can you explain further how frozen dicts would help your use-case?

1 Like

What are the current blockers for this?

I don’t recall it being discussed on the Python-Ideas mailing list
(although I could be wrong). If it was, what was the result? Was there
consensus that it is a good idea?

Did the core devs argue against it or for it, or were they simply uninterested?