Json does not support Mapping and MutableMapping

I created a class that implements Mapping. Using json.dumps(mymap) I get

TypeError: Object of type 'MyMap' is not JSON serializable

I can do json.dumps(dict(mymap)), but it's slower. I can do json.dumps(mymap._dict), but it's not elegant. So I think I'll implement a MyMap.jsonload.

Anyway, is this behaviour normal? Why can't the json module serialize Mapping classes by default?

Yes, I believe this behaviour is normal. Only Python dicts can be directly translated to JSON objects by the json module; see the conversion table in the json.JSONEncoder documentation. I'm not sure if there are any technical limitations/reasons as to why Mapping and MutableMapping wouldn't be supported, though.

This might be worth opening an issue on bugs.python.org (check for duplicates first, though). If you do, be sure to add the currently active core developers who maintain the module to the nosy list: "rhettinger" and "ezio.melotti".

Don't worry about it. Iterating a Mapping is terribly slow compared to iterating a dict anyway.

This is the best way when you care about performance.
Or you can have some public method (e.g. as_dict()) which returns self._dict.copy().
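A minimal sketch of that idea, assuming the wrapped attribute is called _dict as in the posts above (the class and method names are just for illustration):

import json
from collections.abc import Mapping

class MyMap(Mapping):
    def __init__(self, data):
        self._dict = dict(data)

    def __getitem__(self, key):
        return self._dict[key]

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)

    def as_dict(self):
        # Hand out a copy so callers cannot mutate the internal dict.
        return self._dict.copy()

print(json.dumps(MyMap({"a": 1}).as_dict()))  # prints {"a": 1}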

Using an ABC instead of an actual type is not as easy as people think.

1. Since an ABC is abstract, the subclass relationship is more complicated.

If json.dumps supported Mapping, why not support Sequence too?
Then strings are Sequences as well. How should a subclass of str be serialized?

2. There is a default option

json.dumps has the default option to customize how user types are serialized.
If we serialized all classes which are subtypes of Mapping or Sequence, users could no longer customize how to serialize them.
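For illustration, a minimal sketch of the kind of customization that would be lost; the Registry class and its fields are made up for this example:

import json
from collections.abc import Mapping

class Registry(Mapping):
    # A user Mapping that also carries extra state (a version number).
    def __init__(self, data, version):
        self._data = dict(data)
        self.version = version

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

def encode(obj):
    # The user chooses the JSON representation, including the extra state.
    if isinstance(obj, Registry):
        return {"version": obj.version, "entries": dict(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

print(json.dumps(Registry({"a": 1}, version=2), default=encode))
# {"version": 2, "entries": {"a": 1}}

If json serialized every Mapping automatically, default would never be called for Registry and the version field would silently be dropped.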

Custom Mappings can have more state than just what’s exported by the Mapping interface. Serializing as a dict would leave that out.

Why? Usually whoever creates a Mapping uses a private dict attribute, and __iter__ usually returns iter(self._dict).

dict.copy() is the same as dict(mymap)… I ended up with this method:

def jsonDump(self, fp=None, *args, **kwargs):
    # Serialize the wrapped dict directly, avoiding dict(self).
    mydict = self._dict

    if fp is None:
        # No target given: return the JSON string.
        return json.dumps(mydict, *args, **kwargs)

    kwargs.setdefault("indent", 4)

    if hasattr(fp, "write"):
        # File-like object: dump into it.
        return json.dump(mydict, fp, *args, **kwargs)

    # Otherwise treat fp as a path.
    with open(fp, "w") as f:
        f.write(json.dumps(mydict, *args, **kwargs))
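Roughly, the intended usage would be (assuming mymap is an instance of the class defining this method):

s = mymap.jsonDump()            # no fp: return the JSON string
with open("out.json", "w") as f:
    mymap.jsonDump(f)           # file-like object: json.dump into it (indent=4 by default)
mymap.jsonDump("other.json")    # plain path: the file is opened and written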

json already serializes any str.

…why? o___0

So why does dict.update(Mapping) work?

When iterating a dict, the json module iterates the internal structure of the dict directly (e.g. with PyDict_Next). It is very fast.
On the other hand, when iterating a Mapping, it does something like:

for key in m.keys():
    value = m[key]
    serialize_map(key, value)

Since m.__getitem__ is implemented in Python, we need one Python call per item iterated.

Calling a Python method is much slower than iterating a C array.

$ pyperf timeit -s 'd = dict.fromkeys(range(1000))' -- 'dict(d)'
.....................
Mean +- std dev: 18.8 us +- 0.2 us

$ pyperf timeit -s 'import collections; d = collections.UserDict.fromkeys(range(1000))' -- 'dict(d)'
.....................
Mean +- std dev: 208 us +- 1 us

$ pyperf timeit -s 'import collections; d = collections.UserDict.fromkeys(range(1000))' -- 'dict(d.data)'
.....................
Mean +- std dev: 18.8 us +- 0.2 us

See this example. If json serialized arbitrary Mappings by default, this example would not work.

import json
from collections.abc import Mapping

class MyMapping(Mapping):
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

def custom_serialize(obj):
    if isinstance(obj, MyMapping):
        # Tag the object so it can be reconstructed when loading.
        d = {"__type__": "MyMapping"}
        d.update(obj._data)
        return d
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# someobj is any structure that may contain MyMapping instances
J = json.dumps(someobj, default=custom_serialize)

...

def custom_deserialize(d):
    if d.get("__type__") == "MyMapping":
        del d["__type__"]
        return MyMapping(d)
    return d

...

json.loads(J, object_hook=custom_deserialize)

And I said that in 99% of custom Mappings, __iter__() returns iter(self._dict), so it could use the fast dict iteration (PyDict_Next) anyway. It would just have to check that tp_iter has not been changed, as PyDict_Merge does.
And if tp_iter has been changed, PyDict_Merge uses a slower method, but in C, not in Python.

Anyway, I repeat: dict.update(Mapping) works, even if tp_iter has been changed. We don't have to do dict.update(dict(mapping)). And furthermore, I'm not sure that would be faster…

Furthermore, there are third-party JSON (de)serialization packages for Python that are way faster, such as orjson. So it does not seem to me that speed is a priority for json.

I still don't understand… obviously I can't test your example, since Mapping is not supported by json… but are you really saying that, for data types that json supports directly, custom (de)serialization can't be done? It seems very strange to me.

No, PyDict_Merge cannot use the fast dict iteration if other is not a subclass of dict.

Remember, you cited performance as a reason for supporting Mapping in your first comment.
"Anyway, I repeat…" and "speed is not a priority" do not make sense together.
OK, let's stop talking about performance. It cannot be a reason to support Mapping.

You can use the default option to customize the serialization of types which are not supported by json. Read the manual; there are enough examples.

You cannot customize the serialization of types json already supports (e.g. str, int, list, dict) with default, because the default callback is not called for them.
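A quick way to see that, using a made-up str subclass:

import json

class MyStr(str):
    pass

# MyStr is already a str, so the encoder serializes it directly and
# the default callback below is never invoked.
print(json.dumps(MyStr("hi"), default=lambda o: "custom"))  # "hi"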

And this is IMHO wrong, but, as I said, even when the fast method cannot be used, the slow method is used instead, without forcing the coder to convert the object to a dict first.

Performance is not secondary, but the first reason is elegance and practicality. If json also supported collections.abc subclasses, it would be more elegant and practical to write json.dumps(mymap) instead of json.dumps(dict(mymap)). And it could also be as fast as json.dumps(dict) if json simply checked whether the object is a subclass of Mapping and its tp_iter is equal to dict_iter.

So your proposal is that json should not support any other type natively anymore, I suppose.

@Marco_Sulla please hold on and take some time to meditate on @methane's answers.
He is completely correct; I share his opinion 100%.

If meditation is not enough, I suggest making a patch implementing your proposal. I expect you'll see a lot of failing tests, but it can be an excellent exercise.
After getting the tests green, you can bring up your proposal again if you still want to get it done.


Ok, you're right. He is completely correct, so please remove the support for dict.update(Mapping) and any other function or method that directly supports Mapping, MutableMapping, Sequence, or any other collections.abc subclass without first converting it to the respective builtin type, because they are not consistent with json's behavior.

Please don't ask us volunteers to do the work.
Please don't hesitate to make a prototype demonstrating your awesome idea instead; a working example helps enormously in defending an idea.
A proven proposal weighs much more than an abstract thought, doesn't it?


@asvetlov First of all, I'm busy. Secondly, if I have time, I'll spend it finishing the implementation of frozendict. Thirdly, why should I implement something that you and @methane think is wrong?

If it's wrong, please change all the other CPython behavior involving collections.abc classes to be consistent with the json package.

Furthermore, maybe you have a short memory, but I tried in the past to contribute to your aiohttp project. And I don't remember you being so sarcastic when I opened bug reports.

When a Mapping is not a subclass of dict, how can its tp_iter be the same as dict_iter?

I am not proposing anything. Please read your first post again. I just answered your question:

I think json cannot support any other types natively by default.
But it could add opt-in options, like mapping_to_object or datetime_to_iso8601. I don't have any opinion about that.
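Something like that opt-in behaviour can already be approximated in user code today; a rough sketch (the helper name is made up):

import json
from collections.abc import Mapping

def dumps_with_mappings(obj, **kwargs):
    # Opt-in: convert any Mapping to a plain dict before encoding.
    # The returned dict is encoded recursively, so nested Mappings are
    # converted too.
    def to_dict(o):
        if isinstance(o, Mapping):
            return dict(o)
        raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")
    return json.dumps(obj, default=to_dict, **kwargs)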

For the third time:

def __iter__(self):
    return iter(self._dict)

Anyway, as I have already said many times, it does not matter, since other functions in CPython, like PyDict_Merge(), fall back to a slow but generic iteration to get the keys and values of a generic mapping. json could do the same without any problem.

Ok. Why?

No, its tp_iter just calls your __iter__ method (i.e. slot_tp_iter, not dict_iter).

If this is the third time, then I could not understand you because you misunderstood tp_iter.

I have already described it several times in this thread. It breaks backward compatibility. It means users can no longer use the default option to customize the serialization of user types which are Mappings.

If you feel I am being sarcastic, I'm sorry about that. I am not good at English, so it is difficult to use the right nuance. (I am not a maintainer of the aiohttp project anyway.)

I just wanted to answer your questions and to ask about the points where I cannot understand what you mean.

Oooook, but as I already said 4 times :smiley:, this does not matter anyway, since json could just use the "slow" method of PyDict_Merge().

I suppose it would break nothing; on the contrary, default would simply be ignored and the object would be serialized faster.

I was not replying to you, but to Svetlov, who it seems has a very short memory.