Formatting broken?
Finally off my phone and looking at an example in two browsers it seems the formating was lost somewhere. I think the formatting on one of your examples above should be:
numbers = frozenmap((i, i ** 2) for i in range(1_000_000))
with numbers.mutating() as copy:
for i in numbers:
if not (numbers[i] % 997):
del copy[i]
numbers_without_997_multiples = frozenmap(copy)
# at this point, *numbers* still has 1_000_000 key/values, and
# *numbers_without_997_multiples* is a copy of *numbers* without
# values that are multiples of 997.
for i in numbers:
if not (numbers[i] % 593):
del copy[i]
numbers_without_593_multiples = frozenmap(copy)
print(copy[10]) # will print 100.
print(copy[10]) # This will throw a ValueError as copy
# has been closed when the âwithâ block
# was executed.
The value of this PEP canât be really understated ⌠can I ask whether this could be extended to becoming a standard method of âcastingâ such as might be seen in changing int type to a string type for all classes? Eg
frozen_somenumber = frozen (some number)
somefunction(frozen (variable))
I acknowledge it broadens the scope of the PEP but a common âfrozenâ progressively implemented keyword may be perceived as simpler in the long term.
OK, now Iâm off my phone and on my laptop, I can express myself more clearly. In the with numbers.mutating() as copy: example, (which I have hopefully mentioned a fix for the formatting earlier), You can hack and add to copy as needed then applying frozenmap(copy) creates the new frozen map with efficiencies for data common to both numbers and numbers_without_593_multiples.
I was wondering if numbers_without_593_multiples could be created from numbers using a comprehension expression rather than the with statement and with the same efficiencies?
(`cos I like comprehensions )
Iâm not sure whether the specific implementation and API of this
frozenmap is a good one, but some kind of immutable, hashable mapping
seems to be commonly requested and commonly re-invented.
Immutable dicts have been requested many times and itâs not hard to see situations where they could be used. The PEP emphasises though that this is not an immutable âdictâ because it differs in some ways. I havenât seen a request for something with these different characteristics. Also the PEP defines new methods/types that donât exist for dict: I havenât seen requests for those either.
An immutable dict on its own would seem a simpler proposal than the PEP. The PEP might be better but I would say that I havenât seen a user request for the proposed functionality as an alternative to an immutable dict.
Yury may respond for himself, but I did have this discussion with him and the primary issue is that dict now retains insertion order and this one does not (at least not without sacrificing its performance benefits). So we canât provide the equivalent of the current dict, though the 3.6 and earlier one would have been fine (note that 3.6 did preserve insertion order, but it was not guaranteed as part of the dict specification).
Otherwise, despite lookup technically not being O(1) (itâs logN, where the base is 32 which makes it close enough to O(1) for the first few billion or so elements), it probably could act as an equivalent for dict. I too wish it could be, but that ship sailed when we guaranteed insertion order.
(Iâm a new user on discuss⌠consequently I only get two links⌠sorry, you will have to go search for them!)
Please add a question/answer pair for âHow difficult will this be for other implementations (PyPy, CircuitPython, MicroPython, IronPython, Jython)?â
Jython would use the existing and substantial persistent collection work in the JVM space. In particular, kotlinx.collections.immutable would be a good starting point. Interestingly, it also supports insertion ordering! No need for frozenmap, we can have frozendict in Python. Read on if you are interested.
@ilya-g, yes CHAMP can be used to implement hash-sets or hash-maps that preserve element insertion order. I implemented such a data structure last year for another project and now started to back port that data structure to the github repo usethesource/capsule (removed link - jimbaker). You can expect the implementation to appear in the codebase by September [2016].
It was in fact in fact implemented; the implementation for this code, in Kotlin, can be found in the github repo for kotlinx.collections.immutable, specifically persistentOrderedMap
The algorithm is documented in papers/presentations that are linked from the github repo usethesource/capsule, especially his PhD thesis. It is possible that this algorithm is too tightly tied to the JVM, eg code generation, but worthwhile to explore further. (I think the codegen is more about layout/access, we do similar things in Jython for Python arrays, but I have not yet read closely this dissertation or the Kotlin code.)
It would be cool if CPython would port code from the JVM space. Return the favor? Regardless, very much doable.
Yes, freezing a dict has come up many times before. It is such an obvious thing to think about.
This PEP-603 is somewhat different and unprecedented. It proposes a complete new mapping type base on HAMT and that is optimized for fast copies at the expense of lookups. It is profoundly different. As far as I can tell, this function has never been requested (let alone imagined) in Pythonâs history.
If all we want is to freeze a dict, there are a lot of simple ways to do that. But the PEP is about something different â it is algorithm focused. Usually, we donât do that. We offer core APIs when thery are essential and talk very little about the implementation choice. We donât have module littered with every interesting data structure we can think of (treaps, skiplists, pairing heaps, HAMT, etc. That is left for PyPI.
I would really like to see multiple cases of real-world code that would be substantially improved by having this in the standard library.
Is there any particular reason as to why this hasnât already been implemented? Regardless of whether or not this PEP is accepted, the demand for a standard frozendict seems to be rather unanimous.
This has been on PyPI as âimmutablesâ since December 2018, so perhaps some download and usage statistics would provide some useful insight. Testimonies from users of the package would also be helpful in determining some of the pragmatic applications.
IIRC, previous efforts stalled to due lack of real interest and need. Once we had frozenset(), it was obvious to think of frozendict(), but it didnât seem to be essential enough for anyone to push it through. There are some implementations on PyPI but they donât seem to have gained traction. Even frozenset() is in the rarely used category.
I think this is a great idea; canât wait to start using it.
Iâve got a couple of questionsâŚ
A persistent data structure is defined as a data structure that
preserves the previous version of the data when the data is modified.
Does it give you access to the previous versions explicitly, or is this implicit by holding onto references to the previous version?
frozenmap(collection) ⌠an object with an items() method that is expected to return
a series of key/value tuples
What is the use case for this? I think frozenmap(collection.items()) is sufficient here, though I think dict should be supported explicitly.
On the including API, I tend to agree with others that the use cases seem to be covered by union. Would it be possible and more efficient to provide an API that excluded some keys, whilst adding others in a single operation/version/âtransactionâ? Something like
On FrozenMapCopy, the description covers the API mechanics well, but I donât understand the motivation behind .close/context manager - whatâs the use case for this?
Hmm, if even frozenset hasnât seen that much usage across the user base and the more standard implementations of frozendict on PyPI havenât seen much usage, I can see why you might be hesitant about the amount of usage that frozenmap will receive. To get a rough idea for user demand, I decided to start looking into the download statistics via pypistats.org.
The package immutables has had a significant number of downloads since itâs creation: PyPI Download Stats. The recent popularity might be slightly inflated due to the PEP proposal, but the daily downloads from April 2019 to now have sustained in the several thousands so there does seem to be some demand for this.
For comparison, the package frozendict that simply implements a frozen dictionary without any of the other changes has similar popularity: https://pypistats.org/packages/frozendict. The other packages on PyPI were significantly less popular, so I think this is the main implementation for a standard frozendict.
Of course, the download metrics alone are not an entirely reliable indicator of a packageâs usage. They would be significantly more useful if downloads from unique IP addresses were tracked separately. But either way, theyâre still useful for a gathering a rough approximation of user demand.
The PEP text doesnât appear to clearly state if itâs the keys or the values or both that cannot be changed. Please make that clear.
On order
It took a long while to get dicts that remember the key order, I feel this proposal is incomplete if order is lost.
On naming
My gut tells me it should be frozendict.
.including() seems weird, in a sense that itâs not the word Iâd naturally call it. attrs uses .evolve() which is also an odd name. dataclasses.replace() is cumbersome.
perhaps the most common operation should be called .update(kwarg1=..., kwarg2=...)?
or binary operations could be used, like d1 + d2 and d1 + dict(a=42) and d1 - ("name",)?
On CoW mutable API
The idea is great, but API needs polish.
Contrast with immer.js (adapted to fake Python syntax): state = immer.produce(state, lambda draft: { draft["x"] = 2*state["y"]; }) which of course wonât work as lambdas cannot have statementsâŚ
The idea of a context manager is really nice, and perhaps it could be more concise:
earth = forzendict(radius=1e7, ...)
with earth as moon:
moon.diameter = 1.7e6
print(moon)
My understanding is that the proposal is for a mapping type that is better known in the functional programming community (Haskell, et al) where immutability is the norm. And in those contexts, I believe it is a useful data structure. As such, this seems like something that would be entirely reasonable as a PyPI module - whether it would get high usage or remain niche is less relevant in that context.
However, in this case, the implementation is already almost entirely present in core Python because it is used in the implementation of an existing feature (context variables). So that changes the balance somewhat - publishing on PyPI would mean duplicating (and keeping in sync with) the core implementation, whereas exposing the type in the stdlib would be mostly just a case of layering an API on top of the existing implementation.
While adding a class to the stdlib âbecause we canâ is not a good justification, the fact that the implementation already exists does, in my view, lower the bar for how much benefit needs to be proved in advance. It should also be noted that context variables are (apparently ) an existing use case - albeit one that weâve now provided so that users donât need to reimplement it.
Maybe a useful approach would be to expose the type as an experimental feature, with the API (the part of the implementation that has not yet been proven) subject to change based on usage and experience?
I will say, however, that Iâve never yet needed a frozen dictionary in my code. So any use I can (currently) imagine would just be experimenting with a novel and potentially useful data type.
@yselivanov the 2nd graph pep-0550-lookup_hamt.png is broken in https://www.python.org/dev/peps/pep-0603/.
Both graphs are broken in your post here, as are many code blocks, especially with numbers.mutating() ...
(maybe pandoc -f rst -t markdown can help)
Love the direction! Not sure this is the ideal answer, but happy itâs getting discussion.
I think the PEP could better explain its motivation by the following simple question:
If a mapping type is immutable, how does one actually populate it?
For tuples & frozensets, a straightforward & efficient answer is frequently âfrom an iteratorâ;
For mappings, itâs very common to need more complicated logic, frequently reading back previous value, e.g.:
d.setdefault(x, {}).setdefault(y, 0)
d[x][y] += 1
The PEP generally needs more worked scenarios of populating the mapping, specifically covering cases that proved annoying with dict â nested dictionaries, defaultdict.
I think the proposed interface is missing a .setdefault() equivalent â copy and add a k/v pair iff the key is not there yet?
Of course, one could populate a mutable dict, then freeze it. This is what all âjust a frozen dictâ implemetations force you to do. This requires O(N) extra work to make a frozen copy.
Wait, thatâs actually asymptotically faster, compared to N calls to including() each taking O(log N)!
But copying a dict becomes much slower AND memory-hungry when you actually need persistence â when you populate, take a snapshot, populate some more, take a snapshot⌠Use cases:
Youâre working with conceptually mutable state, but want to retain frequent snapshots.
Backtracking â passing down modified state, returning to current state after every call. (Example: chess move search)
Youâre building/changing the mapping by a âpipelineâ of small steps, and want to express each as pure function, without mutation.
Itâs more about convenience than performance?
Some parts of the interface, notably âCalling the union() once to add/replace N keysâ and .mutating(), feel like optimizations.
However, FrozenMapCopy get/set is still O(log N) which means itâs not strictly optimal, even âwhen the frozenmap in question contains thousands of key/value pairs and thereâs a need to update many of themâ => when doing sufficiently heavy processing, itâs asymptotically possible for conversion to dict and back to beat FrozenMapCopy (?)
Is mutating a FrozenMapCopyM times m[k] = v any faster than doing m = m.including(k, v)M times? Please clarify this in the PEP.
Please document the complexity of constructing from a large mapping. Is it O(N ¡ log N) or better?
However, as the PEP says, the log(N) is nearly constant in practice, and the exact performance misses the point of immutable data structures.
The point is convenience/elegance and safety, with good enough performance. (Of all operations, including copy. Under this view, itâs actually dict and list that are weird, theyâre optimized for fast access/mutation at expense of copy taking prohibitive O(N), which warped coding style so deeply that we imperative programmers no longer notice it ).
The point of .mutating() is nicely exposing the duality between immutable and mutable coding style, while sharing the same good-enough persistent representation underneath.
Memory is more compelling than speed
This is actually an aspect of performance that easily beats dict â with many snapshots much of the memory can be shared (similar to how Git stores many commits densely)
Can you add some benchmark showing memory usage, with sharing?
Ideally compare it also to a deep chain of ChainMap, which is current âpoor manâs persistent mapâ.
Could dict benefit from .including(k, v), .excluding(k) methods?
Part of the gain in this PEP is introducing easy âcopy with changesâ methods that were so far lacking on dict! Maybe those are worth adding?
For comparison, Ruby hashes are very similar to Python dicts but expose a rich set of non-mutating methods like h.merge(k1 => v1, k2 => v2), h.slice(k1, k2), h.select { ... }, h.reject { ... } etc.
(most of which have in-place mutating versions .merge!, .reject! etc.)
Code like options = options.merge(k => v) is extremely common in Ruby to avoid mutating your arguments.
[Strawman: Python collections did historically shy away from exposing operations they canât implement efficiently. But most collections do expose some expensive but handy operations: lst[100:200] and [1] + lst + [2] for lists; dict(d, k1=v1, k2=v2) or dict(d1, **d2) for merging dict as long as keys are strings; most set operations â and all these are widely used. Also dict.keys() / .values() / .items() were expensive in python 2 and nobody cared.]