Introduce nested creation of dictionary keys

jcampbell05 · January 13, 2025, 10:51pm

I recently came across a use case where I needed to create nested keys. Basically, I needed to convert a web-form-like data structure with keys like “customer[profile][name]” into a deeply nested JSON structure.

In the end it was easier to pull in a specific library to handle this rather than hand roll the nesting logic.
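For reference, a hand-rolled version of that nesting logic might look roughly like the sketch below. The names `parse_form_key` and `unflatten_form` are made up for illustration, and the sketch assumes every key is non-empty, uses plain `name[part][part]` syntax, and never conflicts with another key (no list-style `items[]` keys, no validation).

```python
import re
from typing import Any

def parse_form_key(key: str) -> list[str]:
    # 'customer[profile][name]' -> ['customer', 'profile', 'name']
    # Every run of characters that is not a bracket becomes one path segment.
    return re.findall(r"[^\[\]]+", key)

def unflatten_form(form: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical helper: build a nested dict from bracket-style form keys.
    tree: dict[str, Any] = {}
    for key, value in form.items():
        *parents, leaf = parse_form_key(key)
        node = tree
        for part in parents:
            # Create the intermediate dict if it does not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree

form = {
    "customer[profile][name]": "James",
    "customer[profile][age]": "42",
    "customer[id]": "7",
}
nested = unflatten_form(form)
print(nested)
# {'customer': {'profile': {'name': 'James', 'age': '42'}, 'id': '7'}}
```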

But it would have been great to have an os.makedirs-style API for dictionaries. In an ideal world the API would look something like this:

tree = {}
tree.set(('customer', 'id', 'name'), {})

This would give us a dict that would look like this

tree = { 
   'id': {
      'name': {
      }
   }
}

This would throw an exception when trying to go through a nested key that is set to anything other than a dict and isn’t a leaf node, i.e. if “profile” was an array or string.

It would also throw an error if the key is already set.
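As a rough sketch of those semantics written as a free function rather than a dict method: `strict_nested_set` is a hypothetical name, and the choice of `TypeError` and `KeyError` for the two failure cases is an assumption, not part of any existing API.

```python
from collections.abc import Hashable, Sequence
from typing import Any

def strict_nested_set(tree: dict, path: Sequence[Hashable], value: Any) -> None:
    # Hypothetical helper illustrating the behaviour described above.
    if not path:
        raise ValueError("path must contain at least one key")
    *parents, leaf = path
    node = tree
    for key in parents:
        # Create missing intermediate dicts, like os.makedirs does for folders.
        node = node.setdefault(key, {})
        if not isinstance(node, dict):
            raise TypeError(f"{key!r} is already set to a non-dict value")
    if leaf in node:
        raise KeyError(f"{leaf!r} is already set")
    node[leaf] = value

tree = {}
strict_nested_set(tree, ('customer', 'id', 'name'), {})
print(tree)  # {'customer': {'id': {'name': {}}}}
```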

blhsing · January 14, 2025, 12:58am

How about using a recursive collections.defaultdict as suggested by this StackOverflow answer?

from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

tree = recursive_defaultdict()
tree['id']['name'] = {}
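One caveat worth knowing, shown as a small sketch: with this approach a plain lookup of a missing key silently creates it, so typos become empty entries. The helper `to_plain_dict` is a hypothetical name, added here only to turn the result back into regular dicts.

```python
from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

def to_plain_dict(node):
    # Hypothetical helper: recursively convert defaultdicts into plain dicts.
    if isinstance(node, defaultdict):
        return {k: to_plain_dict(v) for k, v in node.items()}
    return node

tree = recursive_defaultdict()
tree['customer']['id']['name'] = 'James'
_ = tree['typo']  # merely reading a missing key creates an empty entry

plain = to_plain_dict(tree)
print(plain)  # {'customer': {'id': {'name': 'James'}}, 'typo': {}}
```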

mikeshardmind · January 14, 2025, 2:37am

The recursive defaultdict above should probably work fine for most cases where you control the creation of the objects in question.

Even if you don’t though, I’m not sure this belongs in the standard library (I am in favor of the std library’s json module gaining jsonpath support as functions but not as methods on dicts) as you’ve got a few requirements here that likely don’t apply broadly, namely:

Which would prevent the use of such a standard library inclusion to update existing data.

As it stands, for the nested dict-only case you have, the below should work. You can tweak it to also raise an error on the last line if d[last] exists prior to setting it.

from typing import Any

def nested_set(d: dict[str, Any], path: tuple[str, ...], value: Any):
    if not path:
        # raise (not return) so an empty path is actually reported as an error
        raise ValueError("some message about needing to provide keys here")
    *most, last = path
    for k in most:
        d = d.setdefault(k, {})
    d[last] = value

sayandipdutta · January 14, 2025, 12:56pm

Similar to @blhsing's solution, you could also use the __missing__ method if you want to maintain a dict-like repr and add some of your own utility methods, e.g. raising an error when attempting to get a non-existent key:

from functools import reduce
from operator import getitem


def _get_if_exists(tree, k):
    if k in tree:
        return tree[k]
    raise KeyError(k)

class nesteddict(dict):
    def __missing__(self, key):
        return self.setdefault(key, nesteddict())
    def deepset(self, *keys, value):
        *path, key = keys
        reduce(getitem, path, self)[key] = value
    def deepget(self, *keys):
        return reduce(_get_if_exists, keys, self)

d = nesteddict()
d["a"]["b"] = 5              # same as d.deepset("a", "b", value=5)
print(d.deepget("a", "b"))   # 5
d.deepset("a", "b", value=6)
print(d.deepget("a", "b"))   # 6
print(d)                     # {'a': {'b': 6}}
print(d.deepget("a", "c"))   # KeyError: 'c'

If you want to deepset only for existing paths, you could reuse deepget inside deepset.
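A minimal sketch of that idea, assuming "existing paths" means that every parent key must already be present while the final key may be new. The method name `deepset_existing` is hypothetical; the rest repeats the class above so the example runs on its own.

```python
from functools import reduce
from operator import getitem

def _get_if_exists(tree, k):
    if k in tree:
        return tree[k]
    raise KeyError(k)

class nesteddict(dict):
    def __missing__(self, key):
        return self.setdefault(key, nesteddict())
    def deepset(self, *keys, value):
        *path, key = keys
        reduce(getitem, path, self)[key] = value
    def deepget(self, *keys):
        return reduce(_get_if_exists, keys, self)
    def deepset_existing(self, *keys, value):
        # Hypothetical variant: deepget raises KeyError for a missing parent
        # instead of creating it through __missing__.
        *path, key = keys
        self.deepget(*path)[key] = value

d = nesteddict()
d.deepset("a", "b", value=5)
d.deepset_existing("a", "c", value=6)   # parent "a" exists, so this works
print(d)                                # {'a': {'b': 5, 'c': 6}}
try:
    d.deepset_existing("x", "y", value=1)
except KeyError as exc:
    print("missing parent:", exc)       # missing parent: 'x'
```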

gerardw · January 14, 2025, 5:33pm

What happened to “customer”?

jcampbell05 · January 14, 2025, 11:13pm

I guess it wouldn’t matter if no exception were thrown, if that’s the only reason this wouldn’t be considered for inclusion as an API.

I do wonder if this could be a nice utility function in a collectiontools library (if there is such a thing?).

That was a mistake in my sample; that indeed should be the initial dict.

gerardw · January 15, 2025, 6:54pm

Isn’t this just:

tree = {'customer': {'id': {'name':{}}}}

blhsing · January 15, 2025, 11:25pm

No, this assigns a new dict rather than updating an existing one.
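A small illustration of the difference, using made-up sample data (the email entry is purely hypothetical):

```python
existing = {'customer': {'email': 'james@example.com'}}

# A literal builds a brand-new dict; nothing from `existing` carries over.
replaced = {'customer': {'id': {'name': {}}}}

# Updating in place keeps whatever was already stored along the path.
existing.setdefault('customer', {}).setdefault('id', {})['name'] = {}

print(replaced)  # {'customer': {'id': {'name': {}}}}
print(existing)  # {'customer': {'email': 'james@example.com', 'id': {'name': {}}}}
```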

UltimateLobster · January 16, 2025, 7:05am

You can also use setdefault which can help when nesting lists as well:

d = {}

d.setdefault('a', {}).setdefault('b', {}).setdefault('c', []).append('d')

jsbueno · January 16, 2025, 1:16pm

I implement similar resources in a library.

What I have to say is: this kind of thing is complicated. It has edge cases. Tons of them. And it is hard to use - as people might have different ideas of what should be the best approach.

As far as I know there is no preferred library or idiom for this.

Using my code, you can do:

from extradict import NestedData

>>> tree = NestedData({"customer.id.name": "James"})
>>> tree
{'customer': {'id': {'name': <str>}}}
>>> tree.data
{'customer': {'id': {'name': 'James'}}}

The “tree.data” attribute is a regular dictionary - NestedData does some dynamic wrapping if it is used directly.

Feel free to interact with it, and find what could be improved in its ergonomics.
So, for the time being, we can build a reasonable consensus on what could be a better approach to this feature (and we may find that the better thing is to keep this kind of functionality in 3rd-party libs).

best regards!

jsbueno · January 16, 2025, 1:17pm

That is certainly not easy to type or to read. It is a workaround, but certainly not an argument that the feature the O.P. described wouldn’t be welcome.

NeilGirdhar · January 16, 2025, 7:54pm

Nice, and very interesting.

Here’s another example from the wild: flax/flax/nnx/traversals.py at main · google/flax · GitHub

See flatten_mapping and unflatten_mapping, which flatten a nested dictionary into {('customer', 'id', 'name'): 'James'} and back. This allows similar access patterns to the ones requested.
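To give an idea of the pattern without depending on flax, here is a plain-Python sketch. It is not flax's implementation and does not mirror its signatures; `flatten_nested` and `unflatten_nested` are hypothetical names, and the sketch assumes empty dicts should be kept as leaves so the round trip is lossless.

```python
from typing import Any

def flatten_nested(tree: dict, prefix: tuple = ()) -> dict[tuple, Any]:
    # Hypothetical helper: map each leaf to the tuple of keys leading to it.
    flat: dict[tuple, Any] = {}
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            flat.update(flatten_nested(value, path))
        else:
            # Non-dict values and empty dicts are treated as leaves.
            flat[path] = value
    return flat

def unflatten_nested(flat: dict[tuple, Any]) -> dict:
    # Hypothetical helper: rebuild the nested dict from tuple keys.
    tree: dict = {}
    for path, value in flat.items():
        *parents, leaf = path
        node = tree
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return tree

tree = {'customer': {'id': {'name': 'James'}}}
flat = flatten_nested(tree)
print(flat)                    # {('customer', 'id', 'name'): 'James'}
print(unflatten_nested(flat))  # {'customer': {'id': {'name': 'James'}}}
```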

Lucas_Malor · January 16, 2025, 10:16pm

This is interesting, but what’s the use case?

NeilGirdhar · January 17, 2025, 2:02am

Did you look at the rest of the project?