Deep get functionality for dictionaries

Feature or enhancement

Proposal:

Right now, if I have a nested dictionary (a common occurrence when dealing with JSON files) and I want to safely reach a value inside it, I have two main ways to do it (technically there are more):

try:
    name = data["user"]["name"]
except KeyError:
    name = "default_value"

Or

data.get("user", {}).get("name", "default_value")
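
Neither idiom is fully safe for JSON data, though. Here is a minimal sketch that assumes the intermediate value is a JSON null (Python None) rather than a missing key. The function names via_try_except, via_chained_get, and error_name are illustrative only:

```python
def via_try_except(data):
    try:
        return data["user"]["name"]
    except KeyError:
        return "default_value"


def via_chained_get(data):
    return data.get("user", {}).get("name", "default_value")


def error_name(func, data):
    """Return the name of the exception raised by func(data), or None."""
    try:
        func(data)
    except Exception as exc:
        return type(exc).__name__
    return None


# A missing key is handled by both idioms...
print(via_try_except({}), via_chained_get({}))  # default_value default_value

# ...but a null in the middle of the path is not.
data = {"user": None}  # e.g. the result of json.loads('{"user": null}')
print(error_name(via_try_except, data))   # TypeError
print(error_name(via_chained_get, data))  # AttributeError
```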

I would like to propose a new method called deepget.
It would be called like this:

name = data.deepget(["user", "name"], "default_value")

Or

name = data.deepget("user", "name", default="default_value")

I would also like your opinion on which signature looks better.
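
For comparison, both call styles can already be sketched as plain functions. The names deepget_seq and deepget_args are hypothetical. This sketch treats only a missing key (KeyError) as a broken path:

```python
def _walk(d, keys, default):
    """Follow keys one level at a time; return default at the first missing key."""
    for key in keys:
        try:
            d = d[key]
        except KeyError:
            return default
    return d


def deepget_seq(d, keys, default=None):
    """Style 1: keys as one sequence, default positional like dict.get."""
    return _walk(d, keys, default)


def deepget_args(d, *keys, default=None):
    """Style 2: keys as varargs, default keyword-only."""
    return _walk(d, keys, default)


data = {"user": {"name": "Jill"}}
print(deepget_seq(data, ["user", "name"], "default_value"))        # Jill
print(deepget_args(data, "user", "age", default="default_value"))  # default_value
```

In the second style, default is keyword-only, so it can never be mistaken for another key. The first style mirrors the positional dict.get(key, default) signature.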

I’m wondering what the simplest way would be to implement deepget as a function in existing Python. If a very simple way exists, it might obviate the need for a new method.

This solution was proposed on Stack Overflow in 2010:

def get_nested(d, list_of_keys, default):
    for k in list_of_keys:
        if k not in d:
            return default
        d = d[k]
    return d
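
One caveat with get_nested is that `k not in d` only tests for a key when d is a mapping. If an intermediate value is a string, `in` becomes a substring test, and the following d[k] can raise TypeError. Below is a sketch of a variant, named get_nested_strict here for illustration only, that checks for collections.abc.Mapping first:

```python
from collections.abc import Mapping


def get_nested(d, list_of_keys, default):
    # The Stack Overflow version: `in` is only a key test when d is a mapping.
    for k in list_of_keys:
        if k not in d:
            return default
        d = d[k]
    return d


def get_nested_strict(d, list_of_keys, default):
    # Hypothetical variant: stop as soon as the current value is not a Mapping.
    for k in list_of_keys:
        if not isinstance(d, Mapping) or k not in d:
            return default
        d = d[k]
    return d


data = {"user": "xyz"}
try:
    # "x" in "xyz" is a substring test (True), so "xyz"["x"] is attempted.
    get_nested(data, ["user", "x"], "default_value")
except TypeError:
    print("get_nested raised TypeError")
print(get_nested_strict(data, ["user", "x"], "default_value"))  # default_value
```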

Here’s a recursive solution:

def deepget(d, keys, default_value=None):
    if keys[0] not in d:
        return default_value
    if len(keys) == 1:
        return d[keys[0]]
    return deepget(d[keys[0]], keys[1:], default_value)

Or using reduce:

from functools import reduce

def deepget(d, keys, default_value=None):
    final_dict = reduce(lambda d, key: d.get(key, {}), keys[:-1], d)
    return final_dict.get(keys[-1], default_value)

A different way of solving the problem might be to create a new dict class that allows tuple indexing.

E.g.

from collections import nesteddict  # hypothetical: collections has no nesteddict today

my_dict = {"user": {"name": "Jill"}}
nested_dict = nesteddict(my_dict)
print(nested_dict["user", "name"])

or

print(nested_dict.get(("user", "name"), "default_value"))
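
Here is a rough sketch of how such a class could sit on top of today's dict. It uses the hypothetical name NestedDict, since collections has no such class. dict.get does not call __getitem__, so get has to be overridden as well:

```python
class NestedDict(dict):
    """Hypothetical dict subclass where a non-empty tuple key is a path of nested keys."""

    def __getitem__(self, key):
        if isinstance(key, tuple) and key:
            value = super().__getitem__(key[0])
            for part in key[1:]:
                value = value[part]  # ordinary indexing on the nested values
            return value
        return super().__getitem__(key)

    def get(self, key, default=None):
        # dict.get does not go through __getitem__, so it is overridden here.
        try:
            return self[key]
        except KeyError:
            return default


nested_dict = NestedDict({"user": {"name": "Jill"}})
print(nested_dict["user", "name"])                        # Jill
print(nested_dict.get(("user", "age"), "default_value"))  # default_value
```

One trade-off is that tuples are already valid dict keys. A subclass like this makes a literal tuple key unreachable through [].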

If you’re willing to use a third-party library, there’s glom (available on PyPI).

It’s definitely possible to implement it in Python, but I felt this feature could be part of Python’s core functionality. Until now I have written a nested_get function in some random utility file, but I feel it’s common enough that it should be considered for addition to Python.

The implementation is not complicated, but neither is the implementation of dict.get():

def get(d, key, default=None):
    if key in d:
        return d[key]
    return default

As for the implementation, I was able to add it to the CPython code with relative ease
(gh-136776: Adding deepget method to dictionary, by omerpresler, Pull Request #136805 on python/cpython, GitHub).

It’s also worth noting that, if the case is indeed common enough, a C implementation can reduce run time compared with creating a new class, running Python-level loops, and so on.

P.S. If anyone is curious, let me know and I will write a benchmark and post its results. 🙂

Why? The current easily understood one-liner has four variables. The 3-argument and therefore deficient replacement is just as long. No saving in typing. The true replacement is more typing and to me less clear.

Also, check the existing discussions on the None coalescing operator, which could allow, for example, data["user"]?["name"]? without the surrounding try/except.

That will give you even more arguments for why such a feature would be nice to have, and it could get the ball rolling toward a (mostly) consensual approach.

As a side note, I have published similar features in extradict.NestedData, so give it a try if you like. At the moment it requires keys to be strings, with "." as the separator for nested keys. (I think I will allow tuples as nested keys in a coming release, and explicitly disallow non-string keys for the nested dictionaries, something regular dictionaries can’t do.)
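
To show why tuple paths are more flexible than a "." separator, here is a small sketch. get_dotted and get_path are hypothetical helpers and not the extradict API:

```python
def get_dotted(d, path, default=None):
    """Look up "a.b.c" as d["a"]["b"]["c"]; return default on a missing key."""
    for part in path.split("."):
        try:
            d = d[part]
        except KeyError:
            return default
    return d


def get_path(d, path, default=None):
    """Same lookup, but the path is a tuple of keys (any hashable type)."""
    for part in path:
        try:
            d = d[part]
        except KeyError:
            return default
    return d


data = {"version.major": 3, "user": {"name": "Jill"}, 404: "not found"}
print(get_dotted(data, "user.name"))                 # Jill
print(get_dotted(data, "version.major", "missing"))  # missing: "." always splits
print(get_path(data, ("version.major",)))            # 3
print(get_path(data, (404,)))                        # not found
```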

The one-liner {}.get("name", "default_value") will return "default_value" if "name" doesn’t exist. Similarly, data.get("user", {}).get("name", "default_value") will also return "default_value", not {}, if "user" is missing.

Both my implementation and the proposed one-liner return only "default_value" and never {}. Your suggested “complete solution” adds new functionality, which may be a nice feature, but it is not the goal here.

My goals are:

  1. Minimize function calls: each .get in a chain returns an intermediate value (the nested dictionary, or a freshly created {} default) that the next call then has to look into. Reducing chained .get calls improves performance and clarity.

  2. Improve readability: it should be immediately clear that the default value is returned if the lookup path is broken. No one should mistakenly think {} might be returned unless explicitly intended.

  3. Match the behavior of the indexing operator: the goal is to make .deepget() act like dict["first_key"]["second_key"]..., but in a safe, clean, and intuitive way for nested dictionaries.

Your “complete solution” introduces complexity by allowing alternate default values per level, which makes it more of an extension than a replacement. While that may be useful in some cases, it’s not my goal here with deepget.

Do you have actual code where this operation is a performance bottleneck? Otherwise, this feels like a mostly theoretical benefit.

It’s immediately clear in data.get("user", {}).get("name", "default_value"). The proposed alternative, data.deepget(["user", "name"], "default_value") isn’t any more clear (it’s less explicit, but shorter, which is overall neutral to me).

It’s not always safe: if dict["first_key"] is 0, it will raise an exception (with plain indexing, 0["second_key"] raises TypeError), so you still have to be prepared to catch exceptions. And it’s no cleaner than get: it’s shorter, but get is more explicit, and there’s no consensus that short beats explicit[1]. And I’m not even sure it’s “intuitive”: it’s learnable, certainly, but I’m not sure I’d immediately understand it if I came across it for the first time.

On the plus side, I find the deepget method more understandable than the None coalescing operator data["user"]?["name"]? which was mentioned in a previous comment. So there’s that, at least…

To be honest, if I needed this, I’d just write it for myself as a utility function and move on.


  1. Indeed, the Zen of Python states that “explicit is better than implicit”.

Fair point.

Do you have actual code where this operation is a performance bottleneck?

I don’t have a piece of code that is slow solely because of this problem, but I do have a benchmark (10M iterations each):

try/except (OK)        : 4.6907 sec
try/except (1st)       : 8.7040 sec
try/except (last)      : 10.8111 sec
chained .get (OK)      : 9.5443 sec
chained .get (1st)     : 9.2013 sec
chained .get (last)    : 9.3769 sec
safeget (OK)           : 8.0189 sec
safeget (1st)          : 10.4065 sec
safeget (last)         : 13.8650 sec
reduce (OK)            : 30.9550 sec
reduce (1st)           : 31.7279 sec
reduce (last)          : 31.6069 sec
recursive (OK)         : 37.9125 sec
recursive (1st)        : 36.9373 sec
recursive (last)       : 38.2307 sec
deepget (OK)           : 3.8801 sec
deepget (1st)          : 2.9919 sec
deepget (last)         : 3.6450 sec

import timeit
from functools import reduce

# Deep nested dictionary
data = {
    "a": {
        "b": {
            "c": {
                "d": {
                    "e": "final_value"
                }
            }
        }
    }
}

# Helpers
def try_except_access(d):  # Existing
    try:
        return d["a"]["b"]["c"]["d"]["e"]
    except KeyError:
        return "default_value"

def try_except_first_missing(d):  # "x"
    try:
        return d["x"]["b"]["c"]["d"]["e"]
    except KeyError:
        return "default_value"

def try_except_last_missing(d):  # "f"
    try:
        return d["a"]["b"]["c"]["d"]["f"]
    except KeyError:
        return "default_value"

def chained_get(d):  # Existing
    return d.get("a", {}).get("b", {}).get("c", {}).get("d", {}).get("e", "default_value")

def chained_get_first_missing(d):  # "x"
    return d.get("x", {}).get("b", {}).get("c", {}).get("d", {}).get("e", "default_value")

def chained_get_last_missing(d):  # "f"
    return d.get("a", {}).get("b", {}).get("c", {}).get("d", {}).get("f", "default_value")

def safeget(dct, *keys):
    for key in keys:
        try:
            dct = dct[key]
        except KeyError:
            return "default_value"
    return dct

def deep_get_reduce(dictionary, keys, default="default_value"):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

def deep_get_recursive(d, keys, default="default_value"):
    if not keys or d is None:
        return default
    if len(keys) == 1:
        return d.get(keys[0], default) if isinstance(d, dict) else default
    return deep_get_recursive(d.get(keys[0], {}), keys[1:], default)

# Method name -> statement mapping
benchmarks = {
    "try/except (OK)": "try_except_access(data)",
    "try/except (1st)": "try_except_first_missing(data)",
    "try/except (last)": "try_except_last_missing(data)",

    "chained .get (OK)": "chained_get(data)",
    "chained .get (1st)": "chained_get_first_missing(data)",
    "chained .get (last)": "chained_get_last_missing(data)",

    "safeget (OK)": "safeget(data, 'a', 'b', 'c', 'd', 'e')",
    "safeget (1st)": "safeget(data, 'x', 'b', 'c', 'd', 'e')",
    "safeget (last)": "safeget(data, 'a', 'b', 'c', 'd', 'f')",

    "reduce (OK)": "deep_get_reduce(data, 'a.b.c.d.e')",
    "reduce (1st)": "deep_get_reduce(data, 'x.b.c.d.e')",
    "reduce (last)": "deep_get_reduce(data, 'a.b.c.d.f')",

    "recursive (OK)": "deep_get_recursive(data, ['a','b','c','d','e'])",
    "recursive (1st)": "deep_get_recursive(data, ['x','b','c','d','e'])",
    "recursive (last)": "deep_get_recursive(data, ['a','b','c','d','f'])",

    # deepget exists only on a CPython build with the proposed patch applied
    "deepget (OK)": "data.deepget(['a','b','c','d','e'], 'default_value')",
    "deepget (1st)": "data.deepget(['x','b','c','d','e'], 'default_value')",
    "deepget (last)": "data.deepget(['a','b','c','d','f'], 'default_value')",
}

setup_code = """
from __main__ import (
    data,
    try_except_access, try_except_first_missing, try_except_last_missing,
    chained_get, chained_get_first_missing, chained_get_last_missing,
    safeget, deep_get_reduce, deep_get_recursive
)
"""

# Run benchmarks
print("Benchmark: 10M iterations each")
for name, stmt in benchmarks.items():
    t = timeit.timeit(stmt, setup=setup_code, number=10_000_000)
    print(f"{name:23}: {t:.4f} sec")
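
The harness could possibly be simplified, assuming Python 3.5+, where timeit accepts a globals argument. Passing globals() removes the need for the `from __main__ import` setup string, and taking the minimum of timeit.repeat reduces noise. best_time is a hypothetical helper and is shown for one case only:

```python
import timeit

data = {"a": {"b": {"c": {"d": {"e": "final_value"}}}}}


def chained_get(d):
    return d.get("a", {}).get("b", {}).get("c", {}).get("d", {}).get("e", "default_value")


def best_time(stmt, number=100_000, repeat=5):
    """Return the fastest of several timing runs (in seconds) for stmt."""
    # globals=globals() lets stmt see module-level names without a setup import.
    return min(timeit.repeat(stmt, globals=globals(), number=number, repeat=repeat))


print(f"{'chained .get (OK)':23}: {best_time('chained_get(data)'):.4f} sec")
```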

My point was that I felt it complemented .get, since .get could also be written as a plain Python function, yet it is a built-in method. Perhaps this problem is not as common as I thought. I am thankful for your feedback anyway.

I think the problem is common, but it is also more general than this. The suggested deepget method on dict seems a little confused to me, because dict methods generally don’t make any assumptions about the type of the values, whereas deepget supposes that the values are themselves dicts, or at least objects with __getitem__. The more general problem is that you might want to do something like

x = data['foo'][0].attribute['bar']

where it is not just a nested dict of dicts of dicts.
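
One way to sketch that more general lookup is to describe the path as a sequence of steps, where plain values are subscripts and a marker object means attribute access. The names Attr and deep_lookup are hypothetical. The sketch returns the default only for a missing key, index, or attribute:

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Attr:
    """Path step meaning getattr(obj, name) rather than obj[name]."""
    name: str


def deep_lookup(obj, path, default=None):
    """Follow a mixed path of subscripts and attribute accesses.

    Returns default on a missing key/index (LookupError) or a missing
    attribute (AttributeError); a TypeError, e.g. indexing an int, propagates.
    """
    for step in path:
        try:
            if isinstance(step, Attr):
                obj = getattr(obj, step.name)
            else:
                obj = obj[step]
        except (LookupError, AttributeError):
            return default
    return obj


# Equivalent of data['foo'][0].attribute['bar'], with a fallback
data = {"foo": [SimpleNamespace(attribute={"bar": 42})]}
print(deep_lookup(data, ["foo", 0, Attr("attribute"), "bar"]))          # 42
print(deep_lookup(data, ["foo", 1, Attr("attribute"), "bar"], "none"))  # none
```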

FTR, there is a third-party library implementing something similar, although its runtime is likely not the best since it’s pure Python: https://pydash.readthedocs.io/en/latest/api.html#pydash.objects.get. I’ve also seen many topics where the requested features already existed in (or could be implemented with) this library; it’s the Python equivalent of lodash, a very common JavaScript utility library.

I’d like something like this to be built in too, but PEP 505 would be much more broadly applicable, so you may want to voice your support in the “Revisiting PEP 505” thread instead.