Add `array_hook` to JSON decoder options

This is something that could be very straightforward:

When decoding a JSON string, the stdlib decoder allows for a series of customizations in the form of callbacks for (almost) every element encountered: `object_hook` for dicts, and `parse_float`, `parse_int` and `parse_constant` for converting scalar values.
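For reference, here is a minimal illustration of the existing hooks in the stdlib `json` module. The input document and the callbacks are arbitrary examples chosen for this sketch:

```python
import json
from decimal import Decimal

doc = json.loads(
    '{"price": 1.10, "qty": 3, "limit": Infinity}',
    parse_float=Decimal,               # receives the number's source text, e.g. "1.10"
    parse_int=int,                     # the default behaviour, shown for completeness
    parse_constant=lambda name: None,  # receives "NaN", "Infinity" or "-Infinity"
    object_hook=lambda dct: {k.upper(): v for k, v in dct.items()},
)
print(doc)  # {'PRICE': Decimal('1.10'), 'QTY': 3, 'LIMIT': None}

# There is no matching hook for arrays: they always come back as lists.
print(type(json.loads("[1, 2]")).__name__)  # list
```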

The one JSON data type missing from these customizations is the array, so JSON arrays must always be decoded as Python lists. Maybe, when the API was written, people simply couldn’t find a use case for it.

But with the advent of `frozendict`, if JSON arrays could be decoded as tuples, it would be trivial to decode a whole deeply nested data structure into an immutable data structure on the Python side.

The use cases for that would be the same as those for `frozendict`, so I believe I don’t need to justify them. And it would also fix a strange asymmetry in JSON decoding.

For the record, decoding JSON into a frozen data structure is already almost possible with frozendicts and the `object_hook` argument:

    import json
    frozen = json.loads(json_str, object_hook=lambda dct: frozendict(
        {k: (tuple(v) if isinstance(v, list) else v) for k, v in dct.items()}))

But this won’t work if the outermost JSON element is itself an array (list) instead of an object (dict), and arrays nested directly inside other arrays also remain lists.
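A quick check with the stdlib `json` module shows both gaps. This is only a sketch: since a `frozendict` type may not be available, it falls back to a read-only `types.MappingProxyType` view as a stand-in, and `freeze_object` is simply the lambda above given a name.

```python
import builtins
import json
from types import MappingProxyType

# Assumption: use a builtin frozendict if this Python has one; otherwise
# fall back to a read-only MappingProxyType view as a stand-in.
frozendict = getattr(builtins, "frozendict", None) or (lambda d: MappingProxyType(dict(d)))


def freeze_object(dct):
    # Same logic as the object_hook above: freeze the dict and turn any
    # list *directly* inside it into a tuple.
    return frozendict({k: (tuple(v) if isinstance(v, list) else v) for k, v in dct.items()})


doc = json.loads('[{"a": [1, [2, 3]]}, [4, 5]]', object_hook=freeze_object)

print(type(doc).__name__)     # list: object_hook never sees the outermost array
print(type(doc[1]).__name__)  # list: an array inside an array is not converted
print(doc[0]["a"])            # (1, [2, 3]): only one level of nesting becomes a tuple
```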

In addition to, or as an alternative to, an `array_hook` parameter, the decoder could simply have a `frozen` flag that would decode straight to tuples and frozendicts (which would also be more efficient).

Also, note that the JSON encoder already handles immutable data structures made of frozendicts and tuples by default, so having the option to decode to immutable structures would also yield a nice symmetry.
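The tuple side of this asymmetry is easy to see with the stdlib `json` module alone (frozendicts are left out here so the sketch does not depend on any particular implementation):

```python
import json

original = (1, (2, 3))
text = json.dumps(original)          # tuples are encoded as JSON arrays by default
print(text)                          # [1, [2, 3]]
print(json.loads(text))              # [1, [2, 3]]: decoding always yields lists
print(json.loads(text) == original)  # False: a list never compares equal to a tuple
```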


For the record, with the proposed `array_hook` and no special parameter to load the structure as immutable, this is the call that would do it:

    # array_hook is the proposed parameter; it does not exist in json today
    frozen = json.loads(json_str, object_hook=frozendict, array_hook=tuple)
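Until something like this exists, the same result can be approximated with a second pass over a plain `json.loads()` result. The `loads_with_array_hook` helper below is hypothetical (it is not part of `json`), and it reuses a `MappingProxyType` stand-in when no `frozendict` type is available:

```python
import builtins
import json
from types import MappingProxyType

# Assumption: builtin frozendict if present, else a read-only stand-in.
frozendict = getattr(builtins, "frozendict", None) or (lambda d: MappingProxyType(dict(d)))


def loads_with_array_hook(s, *, object_hook=None, array_hook=None):
    """Hypothetical helper: approximate the proposed array_hook by
    post-processing a plain json.loads() result bottom-up, so each hook
    receives containers whose children were already converted."""

    def convert(node):
        if isinstance(node, dict):
            node = {k: convert(v) for k, v in node.items()}
            return object_hook(node) if object_hook else node
        if isinstance(node, list):
            node = [convert(v) for v in node]
            return array_hook(node) if array_hook else node
        return node  # str, int, float, bool and None are left unchanged

    return convert(json.loads(s))


frozen = loads_with_array_hook(
    '[{"a": [1, [2, 3]]}, [4, 5]]', object_hook=frozendict, array_hook=tuple
)
print(frozen[1])       # (4, 5)
print(frozen[0]["a"])  # (1, (2, 3))
```

Because it walks the tree bottom-up, each hook sees already-converted children, similar to how `object_hook` is called during decoding; the extra pass over the whole document is the overhead a native hook or a `frozen` flag would avoid.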

As this looks simple enough, and no objections were raised, I just went ahead and created a PR here:

`object_hook` was added in simplejson commit 3035dac ("support a decoding object_hook", simplejson/simplejson on GitHub) to support creating objects depending on the content of the dict. For example:

    >>> import simplejson
    >>> def as_complex(dct):
    ...     if '__complex__' in dct:
    ...         return complex(dct['real'], dct['imag'])
    ...     return dct
    ... 
    >>> simplejson.loads('{"__complex__": true, "real": 1, "imag": 2}',
    ...     object_hook=as_complex)
    (1+2j)

This is not the case for JSON arrays, so `array_hook` would have fewer use cases than `object_hook`. In any case, we can post-process the result of deserialization and use not only the object/array content but also its position in the tree. But that is also true for the original `object_hook` parameter. There are third-party packages that convert JSON into a structured tree of objects based on rules and patterns, and they don’t use `object_hook`.
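As a minimal sketch of that post-processing route, a second pass can hand each container's path to a callback, which is information neither `object_hook` nor an `array_hook` would receive. The `transform` helper and the "point" rule below are hypothetical, invented only for illustration:

```python
import json


def transform(node, fn, path=()):
    """Hypothetical helper: rebuild a decoded JSON tree bottom-up, calling
    fn(value, path) on every dict and list, where path is the tuple of keys
    and indices leading from the root to that value."""
    if isinstance(node, dict):
        node = {k: transform(v, fn, path + (k,)) for k, v in node.items()}
        return fn(node, path)
    if isinstance(node, list):
        node = [transform(v, fn, path + (i,)) for i, v in enumerate(node)]
        return fn(node, path)
    return node


def points_to_tuples(value, path):
    # Example rule based on position: only arrays stored under a "point" key
    # become tuples; every other container is returned unchanged.
    if isinstance(value, list) and path and path[-1] == "point":
        return tuple(value)
    return value


doc = transform(json.loads('{"point": [1, 2], "tags": ["a", "b"]}'), points_to_tuples)
print(doc)  # {'point': (1, 2), 'tags': ['a', 'b']}
```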

cc @bob

I don’t have a strong opinion about it, but the use case for a frozen result sounds reasonable enough to consider.
