Dear all,
Having watched one of the YouTube videos from anthonywritescode, I found this closed CPython PR, which proposed adding a data structure, json.AttrDict, with an easy way of navigating deeply nested dictionaries via dot notation (config.servers.setup.interfaces.mgmt.protocol vs. config["servers"]["setup"]["interfaces"]["mgmt"]["protocol"]).
Regardless of whether this data structure belongs in the json or collections module, or whether it should extend a dictionary or types.SimpleNamespace, I feel like some data structure in the spirit of this PR would definitely make working with deeply nested dictionaries or JSON files easier. It would also work nicely with my IDE, and make debugging much easier. Surprisingly, I haven’t found any discussion about this PR on this forum.
What do you all think? Bikeshedding aside, adding this data structure to the standard library would definitely make my code more readable, and judging by GitHub stars, I am not alone in that judgement. I would be interested in hearing what the trade-offs are.
I thought it made sense as a tool, and I don’t think it’ll turn Python into JS. But I also think I wouldn’t reach for it very often. I guess a PEP would be in order if people want it added.
- It doesn’t conform to the JSON specification; therefore, it shouldn’t be included in the json library. For instance, JSON permits keys with spaces.
- It is quite straightforward and has no relation to JSON whatsoever:
class AttrDict(dict):
    def __getattr__(self, attr):
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __setattr__(self, attr, value):
        self[attr] = value

    def __delattr__(self, attr):
        try:
            del self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __dir__(self):
        return list(self) + dir(type(self))
I don’t see why it needs to be in the standard library.
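To make the two points above concrete, here is a minimal sketch of how such a class plugs into the existing json API via object_hook, and where a key containing a space falls outside attribute notation. The class is a condensed, read-only copy of the recipe above, and the JSON document is invented for illustration.

```python
import json


class AttrDict(dict):
    """Condensed copy of the recipe above (attribute reads only)."""

    def __getattr__(self, attr):
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None


# Hypothetical configuration document, invented for this illustration.
text = '{"servers": {"setup": {"mgmt": {"protocol": "ssh"}}}, "log level": "debug"}'

# object_hook is called for every decoded JSON object, so nested
# objects become AttrDict instances as well.
config = json.loads(text, object_hook=AttrDict)

protocol = config.servers.setup.mgmt.protocol           # attribute access
same = config["servers"]["setup"]["mgmt"]["protocol"]   # still a regular dict

# "log level" is a valid JSON key but not a valid identifier, so it is
# only reachable by subscript (or by getattr with a string).
log_level = config["log level"]
```

Nothing here depends on the json module beyond object_hook, which accepts any callable that takes a dict.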
IMO, the main trade-off is about what e.g. config.keys should return.
- Should it return the data, config["keys"]? That means config is no longer a proper dict subclass.
- Should it return the dict.keys method? Then you need to actively remember all dict methods, so you know to write config.encryption["keys"] rather than config.encryption.keys.
- Should it depend on the data? Then your software can break when a future version of the data adds a new key.
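For reference, a dict subclass that only defines __getattr__ (like the recipe posted earlier in the thread) behaves like the second option, because Python calls __getattr__ only after normal attribute lookup has failed. A small sketch, with invented data:

```python
class AttrDict(dict):
    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails, so real
        # dict attributes such as .keys always take precedence.
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None


# Hypothetical data, invented for this illustration.
config = AttrDict(encryption=AttrDict(keys=["k1", "k2"], cipher="aes"))

cipher = config.encryption.cipher     # dict has no "cipher" attribute: the data
by_item = config.encryption["keys"]   # subscript always returns the data
by_attr = config.encryption.keys      # the bound dict.keys method, not the data
```

Getting the first option instead would mean overriding __getattribute__, at which point ordinary dict code that calls config.keys() no longer works on the object.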
The right behaviour for your use case might be obvious, but if this goes in the standard library, it had better be right (and unsurprising) for everyone, including people who don’t read the docs very carefully.
How about option 2, but with raising a SyntaxWarning in case of conflicting keys?
import json
with open("example.json") as f:
    attr_dict = json.load(f, object_hook=json.AttrDict)
SyntaxWarning: AttrDict instance key AttrDict["keys"] conflicts with a built-in dictionary method dict.keys and therefore cannot be accessed through attribute notation.
I don’t think SyntaxWarning should be based on data values.
That’s right. Not sure which warning category fits here, though.
So an option 2-style proposal would look something like this:
import warnings
from functools import lru_cache


class AttrDict(dict):
    __slots__ = ()

    @staticmethod
    @lru_cache(maxsize=1)
    def _all_attrdict_methods() -> frozenset[str]:
        # Names that normal attribute lookup resolves before __getattr__ runs.
        return frozenset(dir(AttrDict))

    @staticmethod
    def _validate_key(key: str) -> None:
        if key in AttrDict._all_attrdict_methods():
            warnings.warn(
                f'Key "{key}" in this AttrDict instance conflicts with a built-in AttrDict '
                "method and therefore cannot be accessed using attribute notation.",
                Warning,
            )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Check every stored key, including those passed positionally
        # as a mapping or as key/value pairs, not only keyword arguments.
        for key in self:
            AttrDict._validate_key(key)

    def __getattr__(self, attr):
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._validate_key(key)

    def __setattr__(self, attr, value):
        self[attr] = value

    def __delattr__(self, attr):
        try:
            del self[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __dir__(self):
        return list(self) + dir(type(self))
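On the open question of which warning category fits: one possibility (an assumption on my part, not something proposed in the PR) is a dedicated subclass of UserWarning, so that callers can filter it separately from other warnings. In the sketch below, AttrDictKeyWarning is a hypothetical name and the class is condensed to item assignment only.

```python
import warnings


class AttrDictKeyWarning(UserWarning):
    """Hypothetical category for keys shadowed by dict attributes."""


class AttrDict(dict):
    """Condensed sketch: only item assignment is checked here."""

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # hasattr on the class sees dict methods but not stored keys.
        if isinstance(key, str) and hasattr(type(self), key):
            warnings.warn(
                f"Key {key!r} is shadowed by an attribute of {type(self).__name__}",
                AttrDictKeyWarning,
                stacklevel=2,
            )

    def __getattr__(self, attr):
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    d = AttrDict()
    d["host"] = "example.org"  # no conflict, no warning
    d["keys"] = ["k1"]         # conflicts with dict.keys, warns

# Callers who accept the limitation can silence just this category.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", AttrDictKeyWarning)
    d["items"] = []
```

A data-dependent warning still means that new input can change program output, so this softens the third trade-off rather than removing it.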
I like the enthusiasm here, but after reading more of this thread, it’s a hard no for me. However, I really like the Glom library for more “declarative” data access and manipulation: glom · PyPI
I think needing both object and attribute access on the same object is fairly rare, and leads to the mentioned issues (obj.keys vs obj["keys"] etc.). But attribute access for deeply nested configs etc. is often useful. My usual recipe is:
from types import SimpleNamespace

class Namespace(SimpleNamespace):
    # Accept the mapping positionally and unpack it into attributes.
    def __init__(self, data, /, **kwargs):
        super().__init__(**data, **kwargs)
A good (and even backward-compatible?) addition to the standard library would be to allow SimpleNamespace to accept a mapping as a positional argument.
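As a rough illustration of how the Namespace recipe above can be used for nested configs, it can be passed to json.loads as object_hook. The JSON text is made up for the example.

```python
import json
from types import SimpleNamespace


class Namespace(SimpleNamespace):
    # Same recipe as above: take the mapping positionally.
    def __init__(self, data, /, **kwargs):
        super().__init__(**data, **kwargs)


# Hypothetical configuration document, invented for this illustration.
text = '{"servers": {"setup": {"mgmt": {"protocol": "ssh"}}}}'

# Every decoded JSON object is wrapped, so nesting works with dot notation.
config = json.loads(text, object_hook=Namespace)
protocol = config.servers.setup.mgmt.protocol

# A namespace has no dict methods, so a "keys" entry is plain data.
ns = Namespace({"keys": ["k1"]})
keys = ns.keys
```

Since the result is not a dict, there is no obj.keys versus obj["keys"] ambiguity, at the cost of losing subscript access entirely.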
Serhiy created a PR for this feature some days ago. See gh-108195.
Ah, lovely, thanks to Serhiy! I guess this solves this thread.