Json.register()?

Personally, I prefer the stdlib version to be stateless. Adding a (presumably global) registry of converters is easy to do in your own app, if you need it.

As a library author, I’d rather not have to worry about the possibility that calling json.dumps might run some arbitrary user code.


Is there something special about json.dumps compared to, say, the + operator or str()?

The possibility of running user code is inescapable in Python.

Personally, I think that a stronger argument is that json.dumps(obj, cls=CustomEncoder) is nicely explicit and obvious, while a stateful registry is implicit and obscure, and not very Pythonic. A dunder method of some sort or another would be more in the spirit of the language.
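For reference, a minimal sketch of what the explicit style looks like in practice. Spam and SpamEncoder are made-up names for illustration, not anything in the stdlib:

```python
import json


class Spam:
    """Hypothetical application class, used only for illustration."""

    def __init__(self, flavour):
        self.flavour = flavour


class SpamEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is only called for objects json cannot serialise natively
        if isinstance(obj, Spam):
            return {"spam": obj.flavour}
        # Defer to the base class, which raises TypeError
        return super().default(obj)


# The encoder is named at the call site, so nothing happens behind the scenes
encoded = json.dumps([1, Spam("salty")], cls=SpamEncoder)
print(encoded)
```

The call site says exactly which encoder is in play, which is the "explicit and obvious" part.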

What would be the advantage of this over using functools.partial?

Pretty much everything :slight_smile:

Note that I’m not necessarily arguing for a json registry, but I am arguing against partial being a reasonable alternative.

With a registry:

json.register(Spam, SpamEncoder)
json.register(Eggs, EggsEncoder)
json.register(Cheese, CheeseEncoder)

# later
json.dumps(obj)

With partial:

dump_spam = functools.partial(json.dumps, cls=SpamEncoder)
dump_eggs = functools.partial(json.dumps, cls=EggsEncoder)
dump_cheese = functools.partial(json.dumps, cls=CheeseEncoder)

# later
if isinstance(obj, Spam):
    dump_spam(obj)
elif isinstance(obj, Eggs):
    dump_eggs(obj)
elif isinstance(obj, Cheese):
    dump_cheese(obj)

In Python 3.10, that could be re-written using match…case:

# Untested
match obj:
    case Spam():
        dump_spam(obj)
    case Eggs():
        dump_eggs(obj)
    case Cheese():
        dump_cheese(obj)

Or in any version, we could rewrite it as a dispatch table, at the cost of losing support for subclassing:

dispatch_table = {Spam: dump_spam, Eggs: dump_eggs, Cheese: dump_cheese}
dispatch_table.get(type(obj), lambda o: None)(obj)

But such a dispatch table solution is essentially re-inventing the registry concept, only less effective.
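Subclass support can be recovered by walking the MRO, which is roughly what any registry would have to do internally anyway. A rough sketch, where the classes and dump_spam are placeholders rather than real encoders:

```python
import json


class Spam:
    pass


class SmokedSpam(Spam):
    pass


def dump_spam(obj):
    # Stand-in for functools.partial(json.dumps, cls=SpamEncoder)
    return json.dumps({"spam": type(obj).__name__})


dispatch_table = {Spam: dump_spam}


def dump(obj):
    # Check the object's own class first, then each base class in MRO order
    for klass in type(obj).__mro__:
        if klass in dispatch_table:
            return dispatch_table[klass](obj)
    # Nothing registered for this type: fall back to plain json.dumps
    return json.dumps(obj)


result = dump(SmokedSpam())
```

At this point the helper is a small registry in all but name.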

OP didn’t mention using multiple classes when registering.

Would a registry class be a good way to handle this? That would allow multiple simultaneous registries that could be used on a case-by-case basis.

I prefer a two-stage approach.

When serializing:

  1. Convert Python application objects to JSON compatible values (str, int/float, list, and dict).
  2. Serialize it to JSON.

When deserializing:

  1. Deserialize JSON to JSON compatible values.
  2. Convert it to application objects.

This is what pydantic does.

This approach is symmetric. Users can choose a converter without changing the serializer. This is a nice separation of concerns.
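A minimal sketch of the two stages using only the stdlib. Event and the two converter functions are hypothetical names, and pydantic's real API looks different:

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    """Hypothetical application object."""
    name: str
    day: date


def event_to_jsonable(event):
    # Serialising, stage 1: application object -> JSON-compatible values
    return {"name": event.name, "day": event.day.isoformat()}


def event_from_jsonable(data):
    # Deserialising, stage 2: JSON-compatible values -> application object
    return Event(name=data["name"], day=date.fromisoformat(data["day"]))


original = Event("PyCon", date(2024, 5, 15))

# Serialising, stage 2: the json module only ever sees plain dicts and strings
text = json.dumps(event_to_jsonable(original))

# Deserialising, stages 1 and 2
restored = event_from_jsonable(json.loads(text))
```

The converter functions never touch json, so they could be swapped without changing the serializer.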

Of course, this approach has drawbacks. Temporary JSON-compatible objects consume some RAM (although most ints, floats, and strings are shared between the application objects and the JSON-compatible objects).

To reduce RAM usage, a SAX-like parser and an incremental writer would be needed, but those are not very usable.
I think the two-stage approach works fine for 99% of cases, and I don’t think the Python stdlib should have a hard-to-use writer/parser only for the remaining 1% of use cases.


A simpler option would be to simply decorate json.dumps with functools.singledispatch.

There are pros and cons to dispatch, but it doesn’t seem popular in Python today.

If you have a registry, by definition it supports multiple classes. At the very least it has to support the default JSON encoder plus at least one optional custom encoder. But why would anyone create a registry that only supports a maximum of one custom encoder?

With partial, you have to know which encoder to use at call time. Even if there is only one custom encoder, you still have to decide whether to use it or the default encoder.

if isinstance(obj, Spam):
    dump_spam(obj)
else:
    json.dumps(obj)

Whereas with a registry, you just call json.dumps(obj) without caring what obj is, and the registry handles the decision.

I guess this is perhaps an argument in favour of a registry over the status quo:

if isinstance(obj, Spam):
    json.dumps(obj, cls=SpamEncoder)
else:
    json.dumps(obj)

The bottom line is that the best case for partial is that it is no better than a registry, and the worst case is that it is significantly worse.

I’m still neutral on the idea of a registry, and still lean towards some sort of dunder as a more Pythonic solution.

Why would you need more than one registry? How would you specify which registry gets used?

If I remember correctly, a dunder method was proposed and rejected in the past. I proposed json.register(), inspired by ABCClass.register().

It seems… really boring ^^’
I mean, pickle has the __reduce__ magic method, which lets you skip all of this trouble. I agree that JSON does not deserve a magic method, but a way to make life simpler? I suppose so.

Hey, this could be an idea! I’ll try it when I have time.


pickle and JSON are very different formats.

pickle is for Python. pickle can serialize and deserialize user objects.

JSON is not for Python. Even if we add support for a magic method like __for_json__, it would only allow serialization.
For example, we can serialize a datetime object into an ISO 8601 string, but the json module cannot tell whether a given string is an encoded datetime or just a string.
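A small sketch of that ambiguity; the keys "when" and "note" are arbitrary:

```python
import json
from datetime import datetime

stamp = datetime(2021, 3, 4, 5, 6, 7)

# One value started life as a datetime, the other was always a plain string
text = json.dumps({"when": stamp.isoformat(), "note": "2021-03-04T05:06:07"})
decoded = json.loads(text)

# Both come back as str; nothing in the JSON marks "when" as a datetime
same_type = type(decoded["when"]) is type(decoded["note"]) is str
same_value = decoded["when"] == decoded["note"]
print(same_type, same_value)
```

Only the application knows which fields should be turned back into datetime objects.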

So an additional conversion layer like pydantic is necessary, not just boring.

As a bonus, the same conversion layer can be used for other formats like msgpack, YAML, and TOML.


A registry doesn’t have to be global.

import json
my_registry = json.Registry()
my_registry.register(CustomType, CustomTypeEncoder)
# more registrations
json.dumps(some_object, registry=my_registry)

A shared standard registry would then simply be an instance of Registry located in the json module or some other agreed upon location:

import json
json.global_registry.register(CustomType, CustomTypeEncoder)
json.dumps(some_object, registry=json.global_registry)

This way, json.dumps does not use any global state unless you ask for it. You can still do:

json.dumps(some_object) # doesn't know about CustomType at all
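Neither json.Registry nor a registry= parameter exists in the stdlib today, so here is a rough sketch of how the idea could be emulated on top of the existing default= hook. Registry, dumps, and CustomType are all hypothetical, and for brevity it registers plain converter functions rather than encoder classes:

```python
import json


class Registry:
    """Hypothetical stand-in for the proposed json.Registry."""

    def __init__(self):
        self._converters = {}

    def register(self, cls, converter):
        # converter: callable turning an instance into JSON-compatible values
        self._converters[cls] = converter

    def convert(self, obj):
        # Walk the MRO so subclasses of a registered type are handled too
        for klass in type(obj).__mro__:
            if klass in self._converters:
                return self._converters[klass](obj)
        raise TypeError(
            f"Object of type {type(obj).__name__} is not JSON serializable"
        )


def dumps(obj, registry=None, **kwargs):
    # Stand-in for json.dumps(obj, registry=...); with no registry this is
    # plain stdlib behaviour and no shared state is consulted
    if registry is None:
        return json.dumps(obj, **kwargs)
    return json.dumps(obj, default=registry.convert, **kwargs)


class CustomType:
    def __init__(self, value):
        self.value = value


my_registry = Registry()
my_registry.register(CustomType, lambda obj: {"custom": obj.value})

encoded = dumps([CustomType(1), {"k": CustomType(2)}], registry=my_registry)
```

Because the hook is default=, the registry is consulted for nested objects as well as the top-level one.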

What’s the advantage of a registry object over using single dispatch?

I might be misreading things, but my understanding was that single dispatch would only work on the top-level object, whereas the registry would work at any point in the tree. For instance, a list of custom objects, or a dictionary mapping strings to those objects, would simply look like a list or dict to singledispatch.
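To make the concern concrete, here is a sketch of what happens if the dispatch is applied to a dumps-like function itself. The dumps wrapper below is hypothetical, not a proposal for the real json.dumps:

```python
import json
from datetime import date
from functools import singledispatch


@singledispatch
def dumps(obj):
    # Fallback: plain stdlib behaviour
    return json.dumps(obj)


@dumps.register
def _(obj: date):
    return json.dumps(obj.isoformat())


# Dispatches on date, so the custom handler runs
top_level = dumps(date(2024, 1, 2))

try:
    # Dispatches on list, so the date inside is never seen by the handler
    dumps([date(2024, 1, 2)])
    nested_ok = True
except TypeError:
    nested_ok = False
```

Dispatching inside the encoder's default() hook instead avoids this, since that hook is called for unsupported objects at any depth.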

Whether you have a manual registry or use single dispatch, the handler for a type has to go back through the json.dumps machinery to use it, so it’s the same either way. Using single-dispatch means that the type registry would already be implemented and tested.

The following custom encoder uses singledispatch. It’s easy enough to write and use that I’d much rather recommend this approach over any sort of global registry built into the json module.

import json
from functools import singledispatch

# Implement an extensible encoder

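@singledispatch
def to_json(obj):
    # Fallback for unregistered types: raise the same kind of error that
    # json.JSONEncoder.default does (there is no `self` to delegate to here)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")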

class CustomisableEncoder(json.JSONEncoder):
    def default(self, obj):
        return to_json(obj)

# Example of adding custom serialisation for dates and datetimes

from datetime import date, datetime

@to_json.register
def _(obj: datetime):
    return f"Datetime: {obj}"

@to_json.register
def _(obj: date):
    return f"Date: {obj}"

example = [1, 2, {"a": 3, "b": 4}, datetime.now(), date.today()]

print(json.dumps(example, cls=CustomisableEncoder))

I never said JSON and pickle are the same beasts :slight_smile: I just said that JSON serialization could be made easier, the way it already is for pickle.

Unfortunately it’s impossible to do the same thing when decoding a JSON string, but I never proposed anything in that direction. Furthermore, I suppose people use a custom JSON serializer more often than a custom deserializer.

I agree with this method; I simply proposed a registry that makes step one easier.

I don’t understand exactly why you’re so opposed to a registry. copyreg, for example, is a dedicated module that adds a registry for pickle. Why is this so bad in your eyes?

Well, I just realized that nothing stops me from doing this horrible thing:

>>> import json
>>> original_json_dumps = json.dumps
>>> _sentinel = object()
>>> 
>>> def my_json_dumps(*args, **kwargs):
...     if args:
...         obj = args[0]
...         newargs = args[1:]
...     else:
...         # obj may have been passed by keyword
...         obj = kwargs.pop("obj", _sentinel)
...         newargs = ()
...         
...         if obj is _sentinel:
...             # No object given: let the original raise its usual TypeError
...             return original_json_dumps()
...     
...     if isinstance(obj, CustomTuple):
...         return original_json_dumps(tuple(obj), *newargs, **kwargs)
...     # elif [...]
...     
...     return original_json_dumps(obj, *newargs, **kwargs)
... 
>>> json.dumps = my_json_dumps

:stuck_out_tongue: