When iterating a dict, the json module iterates the internal structure of the dict directly (e.g. PyDict_Iter). It is very fast.
On the other hand, when iterating a Mapping, it would do something like:
for key in m.keys():
    value = m[key]
    serialize_map(key, value)
Since m.__getitem__ is implemented in Python, we need one Python call for each item.
Calling a Python method is much slower than iterating over a C array.
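The per-item call can be made visible with a small sketch. CountingMapping is a hypothetical class written only for this illustration; it counts how often its Python-level __getitem__ runs when dict() copies it.

```python
from collections.abc import Mapping


class CountingMapping(Mapping):
    """Hypothetical Mapping that counts calls to its Python-level __getitem__."""

    def __init__(self, data):
        self._data = dict(data)
        self.getitem_calls = 0

    def __getitem__(self, key):
        self.getitem_calls += 1
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = CountingMapping(dict.fromkeys(range(1000)))
copied = dict(m)  # dict() reads a non-dict mapping through keys() and m[key]
print(m.getitem_calls)  # one Python call per item
```

A real dict passed to dict() takes the C fast path instead and never goes through a Python-level __getitem__.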
$ pyperf timeit -s 'd = dict.fromkeys(range(1000))' -- 'dict(d)'
.....................
Mean +- std dev: 18.8 us +- 0.2 us
$ pyperf timeit -s 'import collections; d = collections.UserDict.fromkeys(range(1000))' -- 'dict(d)'
.....................
Mean +- std dev: 208 us +- 1 us
$ pyperf timeit -s 'import collections; d = collections.UserDict.fromkeys(range(1000))' -- 'dict(d.data)'
.....................
Mean +- std dev: 18.8 us +- 0.2 us
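For readers without pyperf, a rough equivalent using the standard-library timeit module might look like the sketch below. The names plain, wrapped, cases and timings are made up for this illustration, and absolute numbers will differ by machine and Python version.

```python
import collections
import timeit

plain = dict.fromkeys(range(1000))
wrapped = collections.UserDict.fromkeys(range(1000))

cases = {
    "dict(plain)": lambda: dict(plain),
    "dict(wrapped)": lambda: dict(wrapped),
    "dict(wrapped.data)": lambda: dict(wrapped.data),
}

timings = {}
for label, func in cases.items():
    # Best of 5 runs of 200 calls each, reported per call in microseconds.
    best = min(timeit.repeat(func, number=200, repeat=5))
    timings[label] = best / 200 * 1e6
    print(f"{label}: {timings[label]:.1f} us")
```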
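Consider this example. If json serialized arbitrary Mappings natively, it would stop working: default= is only called for objects that json cannot serialize by itself, so custom_serialize would never see a MyMapping and the "__type__" tag would be lost.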
import json
from collections import abc

class MyMapping(abc.Mapping):
    def __init__(self, data):
        self._data = data
    ...

def custom_serialize(obj):
    # json calls default= only for objects it cannot serialize natively
    if isinstance(obj, MyMapping):
        d = {"__type__": "MyMapping"}
        d.update(obj._data)
        return d
    ...

J = json.dumps(someobj, default=custom_serialize)
...

def custom_deserialize(d):
    # object_hook is called for every decoded JSON object
    if d.get("__type__") == "MyMapping":
        del d["__type__"]
        return MyMapping(d)
    return d
...

json.loads(J, object_hook=custom_deserialize)
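A complete, runnable version of the round trip might look like the sketch below. The parts the example elides are filled in with assumptions: the __getitem__, __iter__ and __len__ methods, the TypeError fallback in custom_serialize, and the sample value are not from the original example.

```python
import json
from collections import abc


class MyMapping(abc.Mapping):
    """Minimal read-only mapping wrapping a dict (method bodies are assumed)."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def custom_serialize(obj):
    # Called by json only for objects it cannot serialize natively.
    if isinstance(obj, MyMapping):
        d = {"__type__": "MyMapping"}
        d.update(obj._data)
        return d
    # Assumed fallback: reject anything else, as the default encoder would.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def custom_deserialize(d):
    # Called for every decoded JSON object; rebuild MyMapping when tagged.
    if d.get("__type__") == "MyMapping":
        del d["__type__"]
        return MyMapping(d)
    return d


original = {"config": MyMapping({"a": 1, "b": 2})}  # hypothetical sample data
encoded = json.dumps(original, default=custom_serialize)
decoded = json.loads(encoded, object_hook=custom_deserialize)
```

The round trip depends on MyMapping being rejected by the default encoder. If the encoder accepted any Mapping, the tag would never be written and decoded["config"] would come back as a plain dict.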