Dict constant folding, new frozendict type

Hello community! While Python isn’t a fast language, that’s no reason to skip useful optimizations that benefit everyone’s workload. I recently decided to pursue an idea I’ve had for some time: dict constant folding. The basic idea is to enable the same tuple/list/set constant folding optimization for dictionaries. Instead of constructing them each time, use a LOAD_CONST in safe contexts. To achieve this, I propose introducing a simple frozendict type.

:link: Demo fork: GitHub - Zaczero/cpython at dictfold


TL;DR

>>> def foo(bar): return {'a': 1, 'b': 2}.get(bar)
... 
>>> dis.dis(foo)
  1           RESUME                   0
              LOAD_CONST               0 ('a')
              LOAD_SMALL_INT           1
              LOAD_CONST               1 ('b')
              LOAD_SMALL_INT           2
              BUILD_MAP                2
              LOAD_ATTR                1 (get + NULL|self)
              LOAD_FAST_BORROW         0 (bar)
              CALL                     1
              RETURN_VALUE

becomes

>>> def foo(bar): return {'a': 1, 'b': 2}.get(bar)
... 
>>> dis.dis(foo)
  1           RESUME                   0
              LOAD_CONST               0 (frozendict({'a': 1, 'b': 2}))
              LOAD_ATTR                1 (get + NULL|self)
              LOAD_FAST_BORROW         0 (bar)
              CALL                     1
              RETURN_VALUE

dict constant folding:

During bytecode generation/optimization, apply a technique similar to existing tuple/list/set constant folding. In safe, side-effect-free contexts with provably constant dictionaries, replace their build instructions with a single LOAD_CONST. The following example is not constant and therefore not optimized:

>>> def foo(bar): return {'a': 1, 'b': bar}.get(bar)
... 
>>> dis.dis(foo)
  1           RESUME                   0
              LOAD_CONST               0 ('a')
              LOAD_SMALL_INT           1
              LOAD_CONST               1 ('b')
              LOAD_FAST_BORROW         0 (bar)
              BUILD_MAP                2
              LOAD_ATTR                1 (get + NULL|self)
              LOAD_FAST_BORROW         0 (bar)
              CALL                     1
              RETURN_VALUE

frozendict:

Initially, I wanted this to work without introducing a new type. It quickly became apparent that one would be needed. The amount of patching and workarounds required to support unhashable types in LOAD_CONST is simply too much, not to mention how fragile it would be against accidental mutations.

Beyond the optimization benefits, frozendict addresses a real gap in Python’s type system.

My draft looks like this:

class frozendict(object)
 |  frozendict() -> empty frozendict
 |  frozendict(mapping, /, **kwargs) -> immutable copy of mapping
 |  frozendict(iterable, /, **kwargs) -> from an iterable of (key, value) pairs
 |  frozendict(**kwargs) -> mapping built from keyword arguments
 |
 |  Return an immutable mapping containing the supplied key/value pairs. Keys
 |  and values must be hashable; TypeError is raised otherwise. The result
 |  preserves insertion order, provides the standard read-only mapping API, and
 |  is itself hashable.
 |
 |  Methods defined here:
 |
 |  ... truncated magic methods ...
 |
 |  get(...)
 |
 |  items(self, /)
 |
 |  keys(self, /)
 |
 |  values(self, /)

Like frozenset, frozendict can only be initialized and cannot be modified. It’s always hashable, which means all its keys and values must be hashable too. It’s conditionally hashable, based on whether all of its values are hashable (just like tuple). It supports the same constructors as dict and exposes the read-only methods of dict.

Later, it should include update utilities to make code like frozendict | dict possible. Maybe an .update(...) method returning an updated copy would be nice. For now, I didn’t bother.

I propose that frozendict should be a new builtin type. While I understand adding builtins is discouraged, in this case it seems like the right choice.

  1. Having a frozendict will logically complete this builtin pattern:
Type Mutable Immutable
Sequence list tuple
Set set frozenset
Mapping dict frozendict
  1. This type will also appear in .pyc files. It feels wrong for these to contain types that require importing additional modules. It will be marshalable under the } key, bumping the marshal version 5→6:
Type Mutable Immutable
Sequence [ ( )
Set < >
Mapping { }

Ref: cpython/Python/marshal.c at dictfold · Zaczero/cpython · GitHub

You should have a look at PEP 416 – Add a frozendict builtin type.

2 Likes

It doesn’t sound bad to me.
But…

and an example used:

{'a': 1, 'b': []}
1 Like

Hey! I’m vaguely aware of attempts to introduce the same/similar type into the Python ecosystem.

The core of the idea is to introduce dict const folding, and I actually succeeded in implementing it on the normal dict type. But the amount of “special case” handling to use a mutable type is unacceptable to me. It is difficult to maintain and fragile.

I also experimented with a hybrid solution, where _frozendict is an internal type, and .pyc files store it publicly as a dict. But it feels wrong and convoluted. Plus, anyone working with .pyc files would likely need to reimplement this internal type in some form (or special case the dict) instead of simply using the new type.

The introduction of a new frozendict type seems like the most idiomatic way if we want to enable dict const folding. Hopefully this concrete implementation will give it another chance.

Rather than (in effect) creating a new, competing, proposal for a frozendict type, you’d be better putting your support behind one of the existing proposals[1], and using your idea of dict constant folding as an example of a use case where having a frozen dict type would offer additional benefits.


  1. Whichever one works best with your constant folding approach ↩︎

5 Likes

I found PEP 603. Are there any others? I used PEP-Explorer find and filter Python Enhancement Proposals but I have no idea whether my searches were exhaustive.

From a reading, I found certain unwanted properties:

  • different performance characteristics (slower access[1])
  • “Iteration in frozenmap, unlike in dict, does not preserve the insertion order” could be a potential conflict issue as well

It seems like frozenmap is quite complex in its spec, whereas I originally envisioned frozendict to be closer to frozenset in simplicity – like a dict, just frozen.

What do you think I should do in such a case? Voice my opinion on modifying PEP 603 and wait? Or maybe voice my opinion and attempt the frozendict idea in parallel? frozenmap doesn’t seem to me like a good match for the dict constant folding optimization. I am very new to PEP stuff.

If you are new to frozendict trend in Python, PEP stuff might be quite far down the line so don’t worry too much about it if you want to make things happen and not only write a PEP which will most likely be rejected.

So the way I understand it:

  1. You have a proposal to do “dict constant folding”
  2. For that to happen, there needs to be a functional frozendict type
  3. There has been a lot of effort towards frozendict, indicating that this is most likely a bottleneck for your idea.

So your option are:

  1. Using your existing implementation (or any other) to prove that constant folding delivers benefit. What are preconditions (minimum performance threshold) for frozendict implementation that would make “dict constant folding” worthwhile? This could be one of valuable inputs that could potentially affect potential future implementation of frozendict. Then relax and wait until/when/if frozendict gets implemented while nudging ongoing attempts if they aren’t sufficient to extract the “dict constant folding” benefit (given you have proven that the benefit is significant enough to make effort/trouble - collected real life use cases where this would apply and benchmarked).
  2. Start working on frozendict in parallel to “dict constant folding” idea. This would be a bit more involved. I haven’t done any work here so don’t have much to suggest on main obstacles of “why this hasn’t happened yet”. But superficially, if community wants it and it hasn’t happened yet, then the “right” approach hasn’t been found yet - the one where desirable benefits are shining, the cost is minimised, all decisions are nailed down perfectly (e.g. naming and location) and all the best knowledge gathered so far incorporated (e.g. HAMT vs dict tradeoffs of memory and copy vs access performance - which one is better for frozendict? Why? Maybe some hybrid? Why? Maybe something else entirely?).

For (2), here are some starting points (will repeat already mentioned for completeness):

  1. PEP 416 – Add a frozendict builtin type | peps.python.org
  2. PEP 603 – Adding a frozenmap type to collections | peps.python.org
  3. Discussion: PEP 603: Adding a frozenmap type to collections
  4. from _testinternalcapi import hamt
  5. Also, some thoughts on comprehensions as expanding number of core types puts a strain on syntax. This is not necessarily related, but personally if I was to work on frozendict I would keep an eye on that as well:
2 Likes

How can “provably constant dictionaries” have “accidental mutations”?

1 Like

What I meant is that it would be possible to accidentally modify the data (for example, due to a subtle bug), whereas a new type that guards against mutations would always protect against it, making it easier to reason about the code and maintain it. Also, in user code, one may access foo.__code__.co_consts (returns a tuple of constants) and modify the mutable dict, possibly opening a new can of worms:

>>> def foo(bar): return {'a': 1, 'b': 2}.get(bar)
... 
>>> foo.__code__.co_consts
(frozendict({'a': 1, 'b': 2}),)

Note that tuples can store mutable values and can still be constant folded:

>>> dis.dis("(1, 2)[i]")
  0           RESUME                   0

  1           LOAD_CONST               0 ((1, 2))
              LOAD_NAME                0 (i)
              BINARY_SUBSCR
              RETURN_VALUE
>>> dis.dis("(1, [])[i]")
  0           RESUME                   0

  1           LOAD_CONST               0 (1)
              BUILD_LIST               0
              BUILD_TUPLE              2
              LOAD_NAME                0 (i)
              BINARY_SUBSCR
              RETURN_VALUE
1 Like

That’s a really good catch.

>>> a = ({},)
>>> hash(a)
Traceback (most recent call last):
  File "<python-input-18>", line 1, in <module>
    hash(a)
    ~~~~^^^
TypeError: unhashable type: 'dict'
>>> a = (0,)
>>> hash(a)
-8753497827991233192

So there is one more way of doing the dict constant folding: make dict hashable when all of its keys and values are hashable, copying tuple’s idea. I haven’t thought of that before, so I don’t know yet how good or bad it would be.

Edit1: It would still suffer from the potential issues described here.

Edit2: I dislike that because dict is fundamentally mutable, and once it’s hashable, it must remain that way, which implies it is immutable (frozen). Tuple does not suffer from this conflict because it’s immutable, so conditional hashing works well.

I’m not sure what you’re saying here. In your first example, there are no mutable values, and LOAD_CONST is used. In the second, a list is built and then a tuple from the list. How is this constant folding?

2 Likes

I think they meant that tuple (similarly to dict) allows mutable objects to be stored within it. And yet, tuple can be a part of LOAD_CONST while dict cannot. This is because tuple is conditionally hashable. LOAD_CONST requires hashability.

1 Like

A tuple can become a constant only when it is entirely constant. This depends on the values inside it. (I don’t think LOAD_CONST requires hashability, but the two will go together fairly strongly.)

Tuples are conditionally hashable. If the OP wants frozendicts to always be hashable, then the natural consequence is that everything that contributes to the hash must be hashable.

1 Like

If the OP wants frozendicts to always be hashable

I am now thinking that frozendict should inherit this conditionally hashable property from tuple. Allowing it to store hashable and unhashable values. That would still satisfy constraints needed for const folding and make it nicer to use in user code.

1 Like

Yeah, exactly. The immutable values restriction didn’t seem necessary to me.

1 Like