Part 1. AttrDict
I am trying to figure out a way to have an object which has BOTH:
a) Efficient attribute access
b) Efficient key access of the same dictionary
What is efficient? (Note, %timeit has ~8ns overhead)
d = dict(a=1)
class Ns: pass
ns = Ns()
ns.a = 1
%timeit d['a'] # 31.7 ns
%timeit ns.a # 19.6 ns
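For anyone reproducing this without IPython, roughly the same measurement can be made with the standard-library `timeit` module. This is only a sketch: the helper name `ns_per_call` is made up here, names are resolved through a globals dict (so the overhead differs slightly from `%timeit`), and the absolute numbers will vary by machine and Python version.

```python
import timeit

def ns_per_call(stmt, namespace, number=200_000, repeat=5):
    """Return the best observed time per execution of `stmt`, in nanoseconds."""
    timer = timeit.Timer(stmt, globals=namespace)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9

class Ns:
    pass

d = dict(a=1)
ns = Ns()
ns.a = 1

# Both statements look up `d` / `ns` in this dict, so the comparison is fair.
env = {"d": d, "ns": ns}
key_ns = ns_per_call("d['a']", env)
attr_ns = ns_per_call("ns.a", env)
print(f"d['a']: {key_ns:.1f} ns, ns.a: {attr_ns:.1f} ns")
```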
So the bar is set. Is it possible to make an object that matches both? (Pure-Python solutions only.)
There are many attempts at this, but every one I have found is fast on only one of the two access paths. E.g.:
# 1. `dotwiz` has a penalty on `__getitem__`
from dotwiz import DotWiz
dw = DotWiz(a=1)
%timeit dw['a'] # 84.5 ns (+55ns)
%timeit dw.a # 23.9 ns (+0ns)
# 2. Any attempt to route attribute access to the dict's `__getitem__` via `__getattr__` or `__getattribute__` results in terrible attribute-access performance.
class DictAttr3(dict):
    def __getattribute__(self, k):
        # Try the dict contents first, then fall back to normal attribute lookup
        if k in self:
            return self[k]
        return object.__getattribute__(self, k)
da3 = DictAttr3(a=1)
%timeit da3['a'] # 57.3 ns (+27ns)
%timeit da3.a # 151 ns (+130ns)
# 3. The same holds the other way round from 2. I guess this is what `dotwiz` does.
class DictAttr4:
    def __init__(self, *args, **kwds):
        # Store every item as an instance attribute
        for k, v in dict(*args, **kwds).items():
            setattr(self, k, v)
    def __getitem__(self, k):
        return getattr(self, k)
da4 = DictAttr4(a=1)
%timeit da4['a'] # 77.5 ns (+55ns)
%timeit da4.a # 19.4 ns (+0ns)
The best solution I have found so far is the following recipe, which sacrifices a bit of performance on both __getitem__ and attribute access, but keeps both in a competitive range:
class DictAttr(dict):
    def __new__(cls, *args, **kwds):
        instance = super().__new__(cls, *args, **kwds)
        instance.__dict__ = instance
        return instance
d = DictAttr(a=1)
%timeit d['a'] # 51.1 ns (+20ns)
%timeit d.a # 41.6 ns (+20ns)
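To make explicit what this recipe relies on: because the instance is its own `__dict__`, attribute writes and key writes land in the same mapping. A small check, repeating the class so the snippet runs on its own (the variable name `da` is just for illustration):

```python
class DictAttr(dict):
    def __new__(cls, *args, **kwds):
        instance = super().__new__(cls, *args, **kwds)
        # The dict itself serves as the attribute storage.
        instance.__dict__ = instance
        return instance

da = DictAttr(a=1)
da.b = 2       # attribute write
da["c"] = 3    # key write
print(da["b"], da.c, sorted(da))
```

Both views stay in sync because there is only one underlying dict; nothing is copied.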
Part 2. argparse.Namespace
While working on this I tried many different solutions and objects. One of the approaches is to access obj.__dict__ directly. At first the approach does not look too bad:
class DictAttr5:
    __getitem__ = object.__getattribute__
    def __init__(self, *args, **kwds):
        for k, v in dict(*args, **kwds).items():
            setattr(self, k, v)
da5 = DictAttr5(a=1)
%timeit da5['a'] # 88.3 ns (+60ns)
%timeit da5.a # 19.5 ns (+0ns)
The strange thing (there is probably a good reason for it) is that after setting a new item via obj.__dict__, attribute access becomes slower.
da5.__dict__['b'] = 1
%timeit da5.a # 48.6 ns (+25ns)
However, what is even stranger is that this applies to all similar objects except argparse.Namespace. Even subclasses of argparse.Namespace lose this property, and even copying the code directly from the standard library doesn’t retain it:
import argparse as argp
an = argp.Namespace(a=1)
%timeit an.__dict__['a'] # 54.4 ns (+25ns)
%timeit an.a # 23.5 ns (+0ns)
an.__dict__['b'] = 1
%timeit an.a # 23.6 ns
# subclass
class NewNamespace(argp.Namespace):
    pass
ann = NewNamespace(a=1)
%timeit ann.__dict__['a'] # 53.4 ns (+25ns)
%timeit ann.a # 19.5 ns (+0ns)
ann.__dict__['b'] = 1
%timeit ann.a # 49.4 ns (+30ns)
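The comparison can be reproduced outside IPython with a script along these lines. It is a sketch: `Plain`, `SubNamespace` and `measure` are names made up here, and since the outcome may depend on the CPython version, no expected numbers are given.

```python
import argparse
import timeit

class Plain:
    """Stand-in for the hand-written attribute classes above."""
    def __init__(self, **kwds):
        for k, v in kwds.items():
            setattr(self, k, v)

class SubNamespace(argparse.Namespace):
    pass

def measure(obj, number=200_000, repeat=5):
    """Best time for `obj.a`, in nanoseconds per access."""
    timer = timeit.Timer("obj.a", globals={"obj": obj})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9

results = {}
for cls in (Plain, argparse.Namespace, SubNamespace):
    obj = cls(a=1)
    before = measure(obj)
    obj.__dict__["b"] = 1   # write through __dict__ instead of setattr
    after = measure(obj)
    results[cls.__name__] = (before, after)
    print(f"{cls.__name__:>13}: {before:6.1f} ns -> {after:6.1f} ns")
```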
If it were possible to expose Namespace().__dict__.__getitem__ directly, it would be the new best solution. However, if I subclass it, it doesn’t work anymore.
Any ideas why argparse.Namespace is special?