PEP-603: Adding a frozenmap type to collections

Formatting broken?
Finally off my phone and looking at an example in two browsers it seems the formating was lost somewhere. I think the formatting on one of your examples above should be:

numbers = frozenmap((i, i ** 2) for i in range(1_000_000))

with numbers.mutating() as copy:
    for i in numbers:
        if not (numbers[i] % 997):
            del copy[i]

    numbers_without_997_multiples = frozenmap(copy)

    # at this point, *numbers* still has 1_000_000 key/values, and
    # *numbers_without_997_multiples* is a copy of *numbers* without
    # values that are multiples of 997.

    for i in numbers:
       if not (numbers[i] % 593):
            del copy[i]
    
    numbers_without_593_multiples = frozenmap(copy)
    
    print(copy[10])  # will print 100.

print(copy[10]) # This will throw a ValueError as copy
# has been closed when the “with” block
# was executed.

The value of this PEP can’t be really understated … can I ask whether this could be extended to becoming a standard method of “casting” such as might be seen in changing int type to a string type for all classes? Eg

frozen_somenumber = frozen (some number)

somefunction(frozen (variable))

I acknowledge it broadens the scope of the PEP but a common “frozen” progressively implemented keyword may be perceived as simpler in the long term.

OK, now I’m off my phone and on my laptop, I can express myself more clearly. In the with numbers.mutating() as copy: example, (which I have hopefully mentioned a fix for the formatting earlier), You can hack and add to copy as needed then applying frozenmap(copy) creates the new frozen map with efficiencies for data common to both numbers and numbers_without_593_multiples.

I was wondering if numbers_without_593_multiples could be created from numbers using a comprehension expression rather than the with statement and with the same efficiencies?
(`cos I like comprehensions :slight_smile:)

You know Oscar, that seems horribly like you are complaining that the

code is too fast and efficient. "Why would I want a fast constant time

operation when I could have a slow O(N) operation???"

wink

I believe that the idea is that the second mapping is a modification of

the first. We frequently do that with other immutable collections:

new_tuple = old_tuple[:5] + (2, 3, 4) + my_tuple(1:8)

and especially strings. I don’t think I like this “mutating” method, for

reasons I shall explain in another post, but the idea of copy-with-

variation seems very useful.

All numeric types in the standard library are already frozen
(immutable), so that would be a no-op.

See PEP 351.

Immutable dicts have been requested and implemented many times.
PEP 416
describes some of the many third-party implementations.

Here is one which doesn’t seem to be listed in the PEP (perhaps
it is too new?):

Another implementation not found in the PEP is a port of immutable
datastructures from Clojure, Funktown.

Discussions on Stackoverflow:

Another discussion on python-list
(note that I no longer hold the opinions expressed in that thread).

A very old PEP proposed a freeze protocol
which would allow freezing dicts.

A couple of articles justifying why immutable dicts would be useful:

https://www.cs.toronto.edu/~tijmen/programming/immutableDictionaries.html

I’m not sure whether the specific implementation and API of this
frozenmap is a good one, but some kind of immutable, hashable mapping
seems to be commonly requested and commonly re-invented.

Just a few observations on the PEP.

That’s not what dict accepts.

I think it might be simpler to simply say that frozenmap duplicates the

dict constructor API.

Why the limitation to a single key and value at a time?

Given that dicts have an “update” method, perhaps “updated” would be a

better name for the method, with a similar API:

frozenmap.updated([obj,] \*\*kwargs)

where obj (if present) is either a mapping with a keys() method, or

an iterable of (key, value) pairs.

Likewise for the “excluding” method – why on a single key at a time?

Is it intentional that the union method accepts None as the first

positional argument? I think that should be an error.

Immutable dicts have been requested many times and it’s not hard to see situations where they could be used. The PEP emphasises though that this is not an immutable “dict” because it differs in some ways. I haven’t seen a request for something with these different characteristics. Also the PEP defines new methods/types that don’t exist for dict: I haven’t seen requests for those either.

An immutable dict on its own would seem a simpler proposal than the PEP. The PEP might be better but I would say that I haven’t seen a user request for the proposed functionality as an alternative to an immutable dict.

Oscar

Yury may respond for himself, but I did have this discussion with him and the primary issue is that dict now retains insertion order and this one does not (at least not without sacrificing its performance benefits). So we can’t provide the equivalent of the current dict, though the 3.6 and earlier one would have been fine (note that 3.6 did preserve insertion order, but it was not guaranteed as part of the dict specification).

Otherwise, despite lookup technically not being O(1) (it’s logN, where the base is 32 which makes it close enough to O(1) for the first few billion or so elements), it probably could act as an equivalent for dict. I too wish it could be, but that ship sailed when we guaranteed insertion order.

1 Like

(I’m a new user on discuss… consequently I only get two links… sorry, you will have to go search for them!)

Please add a question/answer pair for “How difficult will this be for other implementations (PyPy, CircuitPython, MicroPython, IronPython, Jython)?”

Jython would use the existing and substantial persistent collection work in the JVM space. In particular, kotlinx.collections.immutable would be a good starting point. Interestingly, it also supports insertion ordering! No need for frozenmap, we can have frozendict in Python. Read on if you are interested.

Per https://github.com/Kotlin/kotlinx.collections.immutable/issues/6#issuecomment-239132434:

@ilya-g, yes CHAMP can be used to implement hash-sets or hash-maps that preserve element insertion order. I implemented such a data structure last year for another project and now started to back port that data structure to the github repo usethesource/capsule (removed link - jimbaker). You can expect the implementation to appear in the codebase by September [2016].

It was in fact in fact implemented; the implementation for this code, in Kotlin, can be found in the github repo for kotlinx.collections.immutable, specifically persistentOrderedMap

The algorithm is documented in papers/presentations that are linked from the github repo usethesource/capsule, especially his PhD thesis. It is possible that this algorithm is too tightly tied to the JVM, eg code generation, but worthwhile to explore further. (I think the codegen is more about layout/access, we do similar things in Jython for Python arrays, but I have not yet read closely this dissertation or the Kotlin code.)

It would be cool if CPython would port code from the JVM space. Return the favor? :wink: Regardless, very much doable.

1 Like

Yes, freezing a dict has come up many times before. It is such an obvious thing to think about.

This PEP-603 is somewhat different and unprecedented. It proposes a complete new mapping type base on HAMT and that is optimized for fast copies at the expense of lookups. It is profoundly different. As far as I can tell, this function has never been requested (let alone imagined) in Python’s history.

If all we want is to freeze a dict, there are a lot of simple ways to do that. But the PEP is about something different – it is algorithm focused. Usually, we don’t do that. We offer core APIs when thery are essential and talk very little about the implementation choice. We don’t have module littered with every interesting data structure we can think of (treaps, skiplists, pairing heaps, HAMT, etc. That is left for PyPI.

I would really like to see multiple cases of real-world code that would be substantially improved by having this in the standard library.