json should support collections.abc.Mapping (revisit)

Proposed: json.dump / json.dumps should support collections.abc.Mapping instead of just dict.

With PEP 814 getting a lot of support, the json encoder would have to add yet another type-specific isinstance check (for frozendict). Perhaps it’s time to revisit the idea of supporting dict’s abstract superclass, Mapping.

Prior discussion and open bug report.
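For context, a minimal sketch of the current stdlib behavior: dict subclasses serialize fine, other Mapping implementations raise TypeError, and a default= hook is the usual workaround.

```python
import json
from types import MappingProxyType

class MyDict(dict):
    pass

# dict subclasses go through the fast isinstance(obj, dict) path:
print(json.dumps(MyDict(a=1)))  # {"a": 1}

# Non-dict mappings are rejected outright:
try:
    json.dumps(MappingProxyType({"a": 1}))
except TypeError as e:
    print(e)  # Object of type mappingproxy is not JSON serializable

# Usual workaround: a default= hook that converts the mapping to a dict.
print(json.dumps(MappingProxyType({"a": 1}), default=dict))  # {"a": 1}
```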


Maybe a way to opt in, similar to this?

>>> from collections import UserDict, UserList
>>> import jsonyx as json
>>> json.dump(UserDict(a=1, b=2, c=3), types={"object": UserDict})
{"a": 1, "b": 2, "c": 3}
>>> json.dump(UserList([1, 2, 3]), types={"array": UserList}) 
[1, 2, 3]

Would round-tripping be expected to work?

Given that tuple round tripping doesn’t work, probably not.

>>> json.loads(json.dumps((1, 2, 3)))
[1, 2, 3]

This boils down to two issues: (1) the role and responsibility of json, and (2) the status of Mapping duck typing.

json uses internal data structures to speed up serialization. Custom mappings implemented in Python will certainly be slower. For a standard library implementation, parity with language features is more important than performance (though the stdlib json is not known for top performance anyway, for that third-party implementations like orjson exist).

Mapping is one area where duck typing falls apart in Python. We already have language syntax (** unpacking, the match statement) that works for non-dict mappings. We already have non-dict mappings in the standard library (MappingProxyType). On the other hand, eval/exec still accept only pure dicts for their globals argument (though passing a non-dict as locals is OK):

>>> from types import MappingProxyType
>>> mpt = MappingProxyType({"foo": "bar"})
>>> eval("foo", mpt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    eval("foo", mpt)
    ~~~~^^^^^^^^^^^^
TypeError: globals must be a real dict; try eval(expr, {}, mapping)
>>> eval("foo", {}, mpt)
'bar'

I can understand that “module globals are supposed to be a string-keyed dict” is a core assumption of the language. But the json situation leans towards the eval / exec case despite json being much less of a core part of the language. The performance or complexity argument also makes little sense, as json already “supports” non-string keys in the incoming dict, a feature whose purpose I find difficult to understand.
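The non-string-key behavior referred to above: json.dumps silently coerces int, float, bool and None keys to strings, which already makes object keys lossy on round-trip.

```python
import json

# int/float/None keys are coerced to their string forms:
print(json.dumps({1: "a", None: "b", 2.5: "c"}))  # {"1": "a", "null": "b", "2.5": "c"}

# ... so round-tripping gives back different keys:
print(json.loads(json.dumps({1: "a"})))  # {'1': 'a'}
```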

Overall, the json story is just like that of the standard library’s re module: it’s there, batteries included, but as soon as you want more features, better performance, or strict conformance to external specs, you should use third-party libraries. But third-party libraries are tailored to their own needs, aren’t always available on every platform or environment, and are under no obligation whatsoever to match language / stdlib features as new ones get introduced. (The third-party orjson does not accept non-dict mappings either, but that’s more understandable, as its main selling point is performance, and it does other things differently than stdlib json anyway.)


No, that’s not reasonable: e.g. the collection classes defaultdict, Counter, and OrderedDict would all produce the same JSON output for the same values (while other stdlib mapping-like types raise TypeError).

#!/usr/bin/env python3
"""
Demo: passing various collections mapping types directly to json.dumps()

Python: 3.14
"""

import json
from collections import defaultdict, Counter, OrderedDict, ChainMap, UserDict


DEMO_ITEMS = [("a", 1), ("b", 2), ("c", 3)]


def try_dump(label: str, obj):
    print(f"\n{label}")
    try:
        print(json.dumps(obj, indent=2, sort_keys=True))
    except TypeError as e:
        print(f"TypeError: {e}")


def main():
    # 1. defaultdict -> dict subclass (WORKS)
    dd = defaultdict(int, DEMO_ITEMS)
    try_dump("defaultdict", dd)

    # 2. Counter -> dict subclass (WORKS)
    counter = Counter(dict(DEMO_ITEMS))
    try_dump("Counter", counter)

    # 3. OrderedDict -> dict subclass (WORKS)
    od = OrderedDict(DEMO_ITEMS)
    try_dump("OrderedDict", od)

    # 4. ChainMap -> mapping view (FAILS)
    cm = ChainMap(
        {"a": 1, "b": 2},
        {"b": 20, "c": 3},
    )
    try_dump("ChainMap", cm)

    # 5. UserDict -> dict wrapper (FAILS)
    ud = UserDict(dict(DEMO_ITEMS))
    try_dump("UserDict", ud)


if __name__ == "__main__":
    main()

output:


defaultdict
{
  "a": 1,
  "b": 2,
  "c": 3
}

Counter
{
  "a": 1,
  "b": 2,
  "c": 3
}

OrderedDict
{
  "a": 1,
  "b": 2,
  "c": 3
}

ChainMap
TypeError: Object of type ChainMap is not JSON serializable

UserDict
TypeError: Object of type UserDict is not JSON serializable

What I think would be reasonable from an end-user viewpoint is for stdlib json to just work with anything that is a duck-typed Mapping.
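As a sketch of what “just work with any Mapping” could look like, here is a hypothetical JSONEncoder subclass (an illustration, not the proposed stdlib change):

```python
import json
from collections import ChainMap, UserDict
from collections.abc import Mapping

class MappingEncoder(json.JSONEncoder):
    """Hypothetical: serialize any Mapping by shallow-converting to dict."""

    def default(self, o):
        if isinstance(o, Mapping):
            # Nested non-dict mappings are handled too, because the encoder
            # calls default() again for each unsupported value it meets.
            return dict(o)
        return super().default(o)

print(json.dumps(ChainMap({"a": 1}, {"b": 2}), cls=MappingEncoder))
print(json.dumps(UserDict(a=1), cls=MappingEncoder))  # {"a": 1}
```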

Since I personally consider isinstance checks a poor, Liskov-substitution-violating practice, I did some searching on Python philosophy and came across this 2015 GvR quote:

Actually, "eat your own dogfood" is not one of the goals of the stdlib -- nor is it supposed to be an example of how to code. This is often misunderstood.  The stdlib contains a lot of Python code, and you can learn a lot from it, but good coding habits aren't generally something you learn there -- the code is crusty (some of the oldest Python code in existence lives in the stdlib!), often has to bend over backwards to support backward compatibility, and is riddled with performance hacks.

Based on some events in the distant past, there's actually an active ban against sweeping changes to the stdlib that attempt to "modernize" it or use new features -- because there is so much code in the stdlib, review of such sweeping (often near-mechanical) changes is inevitably less thorough than when a new feature is implemented, and even the best tests don't catch everything, so regressions in dark corners are often the result.

I’m not knowledgeable enough about the CPython stdlib json implementation to know whether, in 2025, with the potential addition of frozendict to the module’s isinstance checks, the trade-off between performance and what a user would expect has shifted. That is why this is a “revisit the idea” post rather than a pull request or pre-PEP.

While this might work for Mappings, for Sequences it quickly falls apart:

>>> import jsonyx as json
>>> from collections.abc import Sequence
>>> encoder = json.Encoder(types={"array": Sequence})
>>> encoder.dump(b"foo")
[102, 111, 111]
>>> encoder.dump(range(3))
[0, 1, 2]
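The same pitfall would apply to a duck-typed check in stdlib json: str, bytes, bytearray and range are all registered as Sequence, so a blanket isinstance(obj, Sequence) check would (mis)encode them as arrays.

```python
from collections.abc import Sequence

# All of these pass an isinstance(obj, Sequence) check:
for obj in ("foo", b"foo", bytearray(b"foo"), range(3), (1, 2), [1, 2]):
    print(type(obj).__name__, isinstance(obj, Sequence))  # all True

# set is not a Sequence, so it would still need explicit handling:
print(isinstance({1, 2}, Sequence))  # False
```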

I don’t think this will be fixed any time soon, so it’s better to manually register the types you want to serialize:

>>> import jsonyx as json
>>> encoder = json.Encoder(types={"array": set})
>>> encoder.dump({1, 2, 3}) # this is not even a sequence (like np.ndarray)
[1, 2, 3]
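The stdlib equivalent of manually registering types is an explicit default= hook that handles exactly the types you opt in to (the helper name here is hypothetical):

```python
import json

def encode_extra(o):
    """Hypothetical hook: opt-in serialization for sets only."""
    if isinstance(o, (set, frozenset)):
        return sorted(o)  # sorted for deterministic output
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

print(json.dumps({3, 1, 2}, default=encode_extra))  # [1, 2, 3]

# Anything not opted in still fails loudly:
try:
    json.dumps(range(3), default=encode_extra)
except TypeError as e:
    print(e)
```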