json.AttrDict - yes or no?

Dear all,
Having watched one of the YouTube videos from anthonywritescode, I found this closed CPython PR, which proposed adding a data structure, json.AttrDict, that offers an easy way of navigating deeply nested dictionaries via dot notation (config.servers.setup.interfaces.mgmt.protocol vs config["servers"]["setup"]["interfaces"]["mgmt"]["protocol"]).
Regardless of whether this data structure belongs in the json or collections module, or whether it should extend dict or types.SimpleNamespace, I feel like some data structure in the spirit of this PR would definitely make working with deeply nested dictionaries or JSON files easier. It would also work nicely with my IDE and make debugging much easier. Surprisingly, I haven't found any discussion about this PR on this forum.
What do you all think? Bikeshedding aside, adding this data structure to the standard library would definitely make my code more readable, and judging by GitHub stars, I am not alone in this judgement. I would be interested in hearing what the trade-offs are here.

I thought it made sense as a tool, and I don't think it'll turn Python into JS. But I also think I wouldn't reach for it very often. I guess a PEP would be in order if people want it added.

  • It doesn't conform to the JSON specification; therefore, it shouldn't be included in the json module. For instance, JSON permits keys with spaces, which can't be spelled as attribute names.
  • It is quite straightforward and has no relation to JSON whatsoever:
class AttrDict(dict):
    def __getattr__(self, attr):
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __setattr__(self, attr, value):
        self[attr] = value

    def __delattr__(self, attr):
        try:
            del self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __dir__(self):
        return list(self) + dir(type(self))
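
For illustration, the class above can be combined with json's existing object_hook parameter, which is called for every decoded JSON object, so nested objects become dot-accessible as well (the document below is made up):

import json

doc = json.loads(
    '{"servers": {"mgmt": {"protocol": "ssh"}}}',
    object_hook=AttrDict,  # every JSON object becomes an AttrDict
)
doc.servers.mgmt.protocol           # 'ssh'
doc["servers"]["mgmt"]["protocol"]  # same value via item access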
1 Like

I don’t see why it needs to be in the standard library.

3 Likes

IMO, the main trade-off is about what e.g. config.keys should return.

  • Should it return the data, config["keys"]? That means config is no longer a proper dict subclass.
  • Should it return the dict.keys method? Then you need to actively remember all the dict methods, so you know to write config.encryption["keys"] rather than config.encryption.keys (see the sketch below).
  • Should it depend on the data? Then your software can break when a future version of the data adds a new key.

The right behaviour for your use case might be obvious, but if this goes into the standard library, it had better be right (and unsurprising) for everyone, including people who don't read the docs very carefully.
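
To make the second option concrete, here is a sketch using the AttrDict class quoted earlier in this thread (the nested config is invented for illustration). Since __getattr__ only runs when normal attribute lookup fails, dict methods always shadow same-named keys:

config = AttrDict(encryption=AttrDict(keys=["k1", "k2"]))

config.encryption["keys"]  # ['k1', 'k2'] -- the data
config.encryption.keys     # <built-in method keys of AttrDict ...> -- dict.keys wins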

8 Likes

How about option 2, but with a SyntaxWarning raised for conflicting keys?

import json

with open("example.json") as f:
    attr_dict = json.load(f, object_hook=json.AttrDict)

SyntaxWarning: AttrDict instance key "keys" conflicts with the built-in dictionary method dict.keys and therefore cannot be accessed through attribute notation.

I don’t think SyntaxWarning should be based on data values.

5 Likes

That’s right. Not sure which warning category fits here, though.

So an option-2-style proposal would look something like this:

import warnings
from functools import lru_cache


class AttrDict(dict):
    __slots__ = ()

    @staticmethod
    @lru_cache(maxsize=1)
    def _all_attrdict_methods() -> frozenset[str]:
        return frozenset(dir(AttrDict))

    @staticmethod
    def _validate_key(key: str) -> None:
        if key in AttrDict._all_attrdict_methods():
            warnings.warn(
                f'Key "{key}" in this AttrDict instance conflicts with a built-in AttrDict '
                "method and therefore cannot be accessed using attribute notation.",
                Warning,
            )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Validate every initial key, including keys from a positional mapping
        # (e.g. the dict that json.load's object_hook passes in), not just kwargs.
        for key in self:
            AttrDict._validate_key(key)

    def __getattr__(self, attr):
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._validate_key(key)

    def __setattr__(self, attr, value):
        self[attr] = value

    def __delattr__(self, attr):
        try:
            del self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __dir__(self):
        return list(self) + dir(type(self))
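
For illustration, the warning-based variant above would behave roughly like this (the key names are made up):

d = AttrDict(items=[1, 2, 3])  # warns: "items" conflicts with a built-in AttrDict method
d["items"]                     # [1, 2, 3] -- item access still works
d.items                        # the dict method, not the data
d.port = 22                    # no warning; "port" doesn't shadow anything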

I like the enthusiasm here, but after reading this thread more, it's a hard no for me. However, I really like the glom library for more “declarative” data access and manipulation: glom · PyPI
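
As a rough sketch of the kind of declarative access glom offers, using its basic glom(target, spec) form with a dotted-path spec (the data below is made up):

from glom import glom  # third-party: pip install glom

config = {"servers": {"mgmt": {"protocol": "ssh"}}}
glom(config, "servers.mgmt.protocol")  # 'ssh'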

I think needing both item and attribute access on the same object is fairly rare, and it leads to the issues mentioned above (obj.keys vs obj["keys"], etc.). But attribute access for deeply nested configs etc. is often useful. My usual recipe is:

from types import SimpleNamespace


class Namespace(SimpleNamespace):
    def __init__(self, data, /, **kwargs):
        super().__init__(**data, **kwargs)
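
For example (hypothetical keys; note that only the top level is converted, nested dicts stay plain dicts):

cfg = Namespace({"host": "10.0.0.1", "port": 22}, verbose=True)
cfg.host     # '10.0.0.1'
cfg.verbose  # True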

A good (and even backward-compatible?) addition to the standard library would be to allow SimpleNamespace to accept a mapping as a positional argument.

1 Like

Serhiy created a PR for this feature some days ago. See gh-108195.

5 Likes

Ah, lovely, thanks to Serhiy! I guess that settles this thread.