PEP 814: Add frozendict built-in type

frozendict implements the Mapping ABC and has almost the same interface as dict (the differences from dict are listed in a dedicated section). dict's interface is a superset of the Mapping interface. The interface is defined by the Specification section.

frozendict has no __ior__() (a |= b) method, but implements __or__() (a | b). Python still supports a |= b for frozendict, by falling back to a = a | b. It’s the same as:

class MyClass:
    def __init__(self, value):
        self.value = value
    def __or__(self, other):
        return MyClass(self.value | other)

a = MyClass(1)
a |= 2
print(a.value)  # output: 3
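The same fallback already applies today to frozenset, which defines __or__() but not __ior__(), so frozendict would be consistent with existing behavior:

```python
# frozenset has __or__ but no __ior__, so s |= other falls back to
# s = s | other: the name is rebound, the original object is unchanged.
s = frozenset({1, 2})
original = s
s |= {3}
assert s == frozenset({1, 2, 3})
assert original == frozenset({1, 2})  # the original frozenset was not mutated
assert isinstance(s, frozenset)
```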

frozendict[str, list[int]] is a valid frozendict. Why would it raise an exception? Or do you expect a linter to emit a warning/error on hash(x)?

It can be easily added after PEP 814 implementation. IMO the PEP doesn’t have to cover the full scope of all possible frozendict usages. typing.TypedDict was already discussed earlier.

3 Likes

I understand the motivation to be consistent with the language and frozenset and friends. The language has evolved and we don’t really use += on a str or list anymore, and people are often confused or surprised when they see += on a tuple. |= seems like it’s in the same boat.

I’m talking about in the context of mypy and other static analysis tools.

I agree. There’s a lot of typing leaning Python users now.

The language still provides for executing |= by using __or__ and an assignment (__ior__ is just the ability to optimise, it isn’t the only way the operation exists). If you’re proposing to remove this from the language, this is definitely not the right PEP or thread.

Victor’s point is that |= works naturally because of the language. The PEP doesn’t have to indicate this one way or the other - those who know how augmented assignments work will understand, those who don’t will learn something, and those who will evaluate the PEP are (hopefully) drawn from the first group.

4 Likes

But we use it on int. And the exact same mechanisms that make it work on int make it work on str and tuple, and make |= work with frozendict. [1]

If you genuinely think this is a footgun, make a new thread. But be prepared for that to be a massive uphill battle.

In general, I would suggest you make sure you understand how and why the language works before making sweeping claims, especially when using language like “we don’t” implying you speak for the community when you may not be doing that.


  1. Note that list is not part of this list because it actually behaves differently for a += b and a = a + b ↩︎
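The difference the footnote refers to can be seen directly: for list, a += b mutates the existing object in place, while a = a + b rebinds the name to a new object:

```python
a = [1]
alias = a
a += [2]          # in place: list.__iadd__ mutates the existing list
assert alias == [1, 2]

a = [1]
alias = a
a = a + [2]       # new object: the alias still refers to the old list
assert alias == [1]
```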

6 Likes

My apologies. My intent wasn’t to derail the discussion.

2 Likes

Speaking as a person who has implemented TypedDict features in type checkers: There’s enough specialized knowledge§ to get type system changes specified (and implemented in type checkers) that I’d advocate keeping any TypedFrozenDict out of the scope of the initial PEP.

§ There’s different specialized knowledge to get a built in type like frozendict landed in CPython

11 Likes

We submitted PEP 814 to the Steering Council! Thanks everyone for this constructive discussion! We tried to address all feedback in PEP 814. For example, a Type annotation section has been added.

36 Likes

I’m excited for this and have some clarifying questions for the PEP.

The PEP mentions:

Using an immutable mapping as a function parameter’s default value avoids the problem of mutable default values.

Could an example be given to make it more clear how and when frozendict should be used to obtain these benefits?

If I understand correctly, previously one could erroneously write:

def f(key, value, d={}):
    d[key] = value
    return d

f("a", 1) # {"a": 1}
f("b", 2) # {"a": 1, "b": 2}

And thus would instead have to write:

def f(key, value, d=None):
    if d is None:
       d = {}
    d[key] = value
    return d

f("a", 1) # {"a": 1}
f("b", 2) # {"b": 2}

As I understand, with frozendicts, the idea would be:

def f(key, value, d=frozendict()):
    d[key] = value # error
    return d

But this would fail for trying to update an immutable.

So the expected case is instead:

def f(key, value, d=frozendict()):
    d |= {key: value} # O(n)
    return d

f("a", 1) # {"a": 1} (frozendict)
f("b", 2) # {"b": 2} (frozendict)

With the caveat being that this and all future updates are O(n)?

And thus to avoid this, would the following need to be written instead?

def f(key, value, d=frozendict()):
    d = dict(d) # O(n)
    d[key] = value # not O(n)
    return d

f("a", 1) # {"a": 1} (dict)
f("b", 2) # {"b": 2} (dict)

Am I understanding the use case correctly to not have a way to replace the if d is None syntax without an O(n) slowdown?

1 Like

For now, it seems so.

This is the opposite issue - turning a frozendict → dict, but it’s a similar case.

So at some point, there may be a way - but this PEP is really focused on getting the feature into the language first :slight_smile:

1 Like

Yes but your example is not typical:

What is this function? What name would you give to it rather than f that would describe what it does? Something like set_key_in_dict_or_make_new_dict?

The function mutates a dict that was passed in but the argument to be mutated is optional and if you don’t pass it then it creates a different dict and returns that? Why does the function both mutate its argument and return the mutated result?

The more common case here is that you don’t really want to mutate the argument whether passed in or using the default value. Although you don’t intend to mutate it you don’t like risking the possibility that some unthinking code added in future might mutate it or that the function might return a reference that some other code mutates.

Seeing that the default value is immutable immediately reassures that obvious mistakes that could mutate it will not happen without needing to trace all of the rest of the code to verify that nothing mutates it or leaks a reference to it.

5 Likes

Correct.

The use case from the PEP is the if d is None one, but when the code already doesn’t mutate d. It’s more of a safety mechanism so you don’t accidentally mutate d (which is typically a bug that’s hard to notice if you use a dict as the default value).

3 Likes

Yes but your example is not typical …

What is this function? …

Apologies for not specifying, the example was purely hypothetical to match what is typically seen for explaining how mutable arguments can go wrong.

The questions raised regardless are quite valid!

you don’t really want to mutate the argument whether passed in or using the default value.

Agree, hence it may be helpful to provide examples of how best to use frozendict for mutable defaults.

Kind of, but with one nuance - it is not frozendict’s job to do what dict does.

An object is either mutable or it isn’t, and somehow you expect the object to be in a quantum state of being both at the same time.


The use case for it in defaults is the same as the one for tuple.

def foo(these_are_ok=('a', 'b')):
    # I need to extend once
    these_are_ok += ('c',)
    # Or, if many times, convert to a list first
    these_are_ok = list(these_are_ok)
    these_are_ok.append('d')
    these_are_ok.append('e')
    these_are_ok.append('f')

The point here is to have a default which can not be modified.

If you need to extend it, you find the solution that serves your case best.

In short, you can draw examples and best practices from tuple default cases.
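A sketch of that advice applied to mappings today, using types.MappingProxyType as a stand-in for the proposed frozendict (the names _EMPTY and f are illustrative):

```python
from types import MappingProxyType

# MappingProxyType is today's closest stand-in for a read-only mapping
# default; the PEP's frozendict would fill the same role.
_EMPTY = MappingProxyType({})

def f(key, value, d=_EMPTY):
    out = dict(d)      # copy rather than mutate the shared default
    out[key] = value
    return out

assert f("a", 1) == {"a": 1}
assert f("b", 2) == {"b": 2}   # no state leaks between calls
```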


Now this is regarding convenience.
But as you have noted complexities, I suppose you are worried about performance as well.

On this I can not comment much as I haven’t investigated it.
All I know is that it hasn’t become an issue for tuple despite long practice of such use.

The same might not apply to frozendict.
But if it does need addressing, it is unlikely to be worth breaking any semantics for it.

I think your input with real life use cases where you are concerned with performance would be very helpful in this regard.

As far as I have seen, there are ideas for optimizations, but I think all input is valuable to determine the order of importance.

1 Like

From the PEP:

Keys must be hashable and therefore immutable, but values can be mutable.

I think someone mentioned this above, but it’s worth repeating: hashable doesn’t imply immutable.

Using immutable values creates a hashable frozendict.

I don’t think immutable implies hashable either. Perhaps these two sentences should be re-written to avoid referring to mutability at all, something like:

Keys must be hashable, but values need not be. Using hashable values creates a hashable frozendict.
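Both directions can be demonstrated with existing types: an instance of a plain class is hashable (by identity) yet mutable, while a tuple holding a list is immutable yet unhashable:

```python
class Point:
    # no __eq__/__hash__ overrides: hashable by identity, yet mutable
    def __init__(self, x):
        self.x = x

p = Point(1)
h = hash(p)
p.x = 2                      # mutate the object
assert hash(p) == h          # still hashable; hash unchanged

t = (1, [2])                 # the tuple itself is immutable...
try:
    hash(t)
except TypeError:
    pass                     # ...but unhashable, since it contains a list
```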

5 Likes

Are we at all worried about a name collision with frozendict · PyPI ?

I guess if someone imports it already they would override the new builtin?

Still: for things like toml we went out of our way to not overload an existing implementation. So devil’s advocate: Why not here?

1 Like

Clashing with tomllib would mean import tomllib (or whatever was chosen) would change behavior based on whether a package was installed.

I don’t see a way for frozendict({y: z}) to change behavior based on whether frozendict is installed. Within a scope, either you import frozendict, in which case it’s shadowed in py315+ throughout the scope, or you don’t, and it’s not.
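A small illustration of that scoping rule with an existing builtin:

```python
import builtins

# Binding a name in a scope shadows the builtin of the same name there;
# the builtin itself is untouched and still reachable via builtins.
len = lambda x: 42
assert len([1, 2, 3]) == 42            # the local binding wins
assert builtins.len([1, 2, 3]) == 3    # the real builtin is unaffected
```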

2 Likes

Because we shouldn’t be worried about name collisions at all, really. Python is one of the few languages that equips the developer to deal with them themselves, so we should get to use the most-builtin-like name for builtins, even if other people previously chose builtin-like names for their builtin-like types.

In five to ten years’ time, people will be calling a weird builtin name a “wart” rather than a considerate design. We should be designing for what people at that time will think, and apologising to those who have tried to claim obvious builtin names now.

21 Likes

hello -

I’d like to chime in with support for this proposal, and I invite everyone here to look at SQLAlchemy’s current frozendict implementation which can be seen in its cythonized version at
https://github.com/sqlalchemy/sqlalchemy/blob/83ded020c92609c3dd228abdc4d8db96d7a7e915/lib/sqlalchemy/util/_immutabledict_cy.py#L109
where we call it “immutabledict”. SQLAlchemy uses “immutabledict” all over the place to represent default sets of options for many functions and classes. It of course provides a great solution to the “dictionary as default kw argument” problem, but we also have lots of cool constructor patterns we use to get immutabledicts built up from plain mappings (like a merge_with() method that receives any number of mappings and merges them in). Here’s an example (I would link them, however there seems to be a limit on links):

    if execution_options:
        execution_options = util.EMPTY_DICT.merge_with(
            execution_options,
            {
                "_sa_orm_load_options": load_options,
            },
        )
    else:
        execution_options = {
            "_sa_orm_load_options": load_options,
        }

in particular, the pattern of having a global EMPTY_DICT that we can use as a default “empty mapping” argument anywhere we want, then building it up with union / | / merge_with(), is one of my favorite patterns in the whole codebase.

I’ve paraphrased a common pattern we use in the example below: a class with a class-level default empty dictionary. Instances of the class are used as elements in a larger composition with other objects that also use the same dictionary attribute, but they support an instance-level version of it. By building on “immutabledict” and using “union” or “merge_with” as needed, we achieve a huge savings on how many new dictionaries we need to create in practice, which has significant performance / memory savings at scale. Issues where these shared dictionaries leak details across different objects are nonexistent, and this pattern is definitely one of my favorite coding patterns in SQLAlchemy. CompilerColumnElement is some atomic unit within a larger composition:

class CompilerColumnElement:

    __slots__ = ()

    _propagate_attrs = util.EMPTY_DICT

    ...

note the slots above. it means we can’t set _propagate_attrs at the instance level. How is this useful? Well CompilerColumnElement is only a building block inside of other kinds of objects, lots of which don’t use slots, so as we build up objects they do things that are effectively equivalent to this:

class SomeSQLThing:
    def __init__(self, *elements):
        self._propagate_attrs = util.EMPTY_DICT.merge_with(*[e._propagate_attrs for e in elements])    

So every CompilerColumnElement has that mapping already set up on it for free, no __init__ or new dict every time needed, then on other classes that need to do more with the attributes, it can use union / merge to build up new dictionaries. The merge_with() instruction is optimized to not waste new dictionaries if not needed:

>>> util.EMPTY_DICT.merge_with({}, {}) is util.EMPTY_DICT
True
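A minimal sketch of that optimization (hypothetical code, not SQLAlchemy’s actual implementation): merge_with() returns self unchanged when no argument contributes any keys, and only copies once it must:

```python
class immutabledict(dict):
    # Hypothetical sketch of the idea described above: a dict subclass
    # that blocks mutation and merges lazily.
    def _readonly(self, *args, **kwargs):
        raise TypeError("immutabledict is read-only")

    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly

    def merge_with(self, *others):
        merged = None
        for mapping in others:
            if mapping:
                if merged is None:
                    merged = dict(self)   # copy only when actually needed
                merged.update(mapping)
        return self if merged is None else immutabledict(merged)

EMPTY_DICT = immutabledict()
assert EMPTY_DICT.merge_with({}, {}) is EMPTY_DICT   # no new dict created
```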

So another class that needs to be able to populate its _propagate_attrs would do something like this:

class SomeOtherSQLThing:
    def __init__(self):
        # no dictionary construction time/memory overhead!
        self._propagate_attrs = util.EMPTY_DICT

    def add_option(self, new_token):
        # only if we need it!
        self._propagate_attrs = self._propagate_attrs | {"new_token": new_token}

Here’s a demo of the above three classes:

>>> cce = CompilerColumnElement()
>>> other_sql_thing = SomeOtherSQLThing()
>>> other_sql_thing._propagate_attrs
immutabledict({})
>>> other_sql_thing.add_option("some option")
>>> other_sql_thing._propagate_attrs
immutabledict({'new_token': 'some option'})
>>> some_sql_thing = SomeSQLThing(cce, other_sql_thing)
>>> some_sql_thing._propagate_attrs
immutabledict({'new_token': 'some option'})

So above we have three different objects, all of which refer to a local “propagate_attrs” dictionary. Suppose the above code happens 10000 times. In that case, the above code uses exactly 20001 “real” dictionaries; the global util.EMPTY_DICT dictionary, and then the 10000 dictionaries we created in each “add_option” call, then the merge_with() created a new dictionary because it received one that was populated (merge_with() could be further improved so that if only one of its dictionaries is non empty, it returns just that non-empty immutable dictionary, so that would save another 10K dictionaries). That is compared to the 30000 dictionaries that would be needed at the least to do the equivalent pattern with plain mutable dictionaries (noting it would be a non-starter to have CompilerColumnElement’s class level dict be mutable, and without having to add complex logic to have _propagate_attrs be None by default and provide a facade around that).

But that example assumes we called add_option() - this mutation-of-options case is actually relatively infrequent for us. So if add_option() was not called at all, then 10000 calls creating those three objects would still only require exactly one real dictionary, rather than the 30000 that would be needed to accomplish the above patterns with plain mutable dictionaries. Even if we say “well, you could just have _propagate_attrs be None if you didn’t need it” and wanted to deal with the extra complexity, the immutabledict pattern also works if some of the classes have fixed options at the class level, like this:

class IHaveOptions:
    _propagate_attrs = util.immutabledict({"fixed_option": True})

being able to create these immutabledicts everywhere and never having to worry that incorrect code is going to cause leakages of data between classes / instances is a gigantic win. There’s no need for code to be careful about it, it just works.

so anyway, if “SQLAlchemy likes the idea and already uses it heavily” is an effective endorsement, then great! Otherwise, if “SQLAlchemy likes the idea and already uses it heavily” is a glaring red flag “oh no those SQLAlchemy goofs use this, forget it”, I apologize ! :slight_smile:

11 Likes

Done (commit), I updated the PEP to not link hashable values to immutable values/frozendict.

3 Likes