Personally, I prefer the stdlib version to be stateless. Adding a (presumably global) registry of converters is easy to do in your own app, if you need it.
As a library author, I’d rather not have to worry about the possibility that calling json.dumps might run some arbitrary user code.
Is there something special about json.dumps compared to, say, the + operator or str()?
The possibility of running user code is inescapable in Python.
Personally, I think that a stronger argument is that json.dumps(obj, cls=CustomEncoder) is nicely explicit and obvious, while a stateful registry is implicit and obscure, and not very Pythonic. A dunder method of some sort or another would be more in the spirit of the language.
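For illustration, a hypothetical `__json__` dunder (the name and protocol are assumptions; nothing like it exists in the stdlib today) could be prototyped on top of the existing `default` mechanism:

```python
import json

# Hypothetical: an encoder that looks for a __json__ dunder.
# The name is an assumption; no such protocol exists in the stdlib.
class DunderEncoder(json.JSONEncoder):
    def default(self, obj):
        method = getattr(type(obj), "__json__", None)
        if method is not None:
            return method(obj)
        return super().default(obj)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __json__(self):
        return {"x": self.x, "y": self.y}

print(json.dumps(Point(1, 2), cls=DunderEncoder))  # {"x": 1, "y": 2}
```

The lookup goes through `type(obj)` rather than the instance, matching how other dunders are resolved.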
When serializing:

1. Convert Python application objects to JSON-compatible values (str, int/float, list, and dict).
2. Serialize them to JSON.

When deserializing:

1. Deserialize JSON to JSON-compatible values.
2. Convert them to application objects.
This is what pydantic does.
This approach is symmetric: the user can choose a converter without changing the serializer, which is a nice separation of concerns.
Of course, this approach has drawbacks. The temporary JSON-compatible objects consume some RAM (although most ints, floats, and strings are shared between the application objects and the JSON-compatible objects).
To reduce RAM usage, a SAX-like parser and an incremental writer would be needed, but those are not good for usability.
I think the two-stage approach works fine for 99% of cases, and I don’t think the Python stdlib should have a hard-to-use writer/parser only for the 1% of use cases.
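For illustration, the two-stage flow might be sketched like this (the `User` class and the converter functions are made up for the example, not from pydantic or the stdlib):

```python
import json

# Illustrative application object (not from any real library).
class User:
    def __init__(self, name, age):
        self.name, self.age = name, age

# Stage 1: convert the application object to JSON-compatible values.
def user_to_jsonable(user):
    return {"name": user.name, "age": user.age}

# Stage 2: serialize the JSON-compatible value.
def dump_user(user):
    return json.dumps(user_to_jsonable(user))

# Deserializing runs the same two stages in reverse.
def load_user(text):
    data = json.loads(text)                  # JSON -> JSON-compatible values
    return User(data["name"], data["age"])   # -> application object

text = dump_user(User("alice", 30))
restored = load_user(text)
```

The serializer (`json.dumps`/`json.loads`) never sees the application type; only the conversion stage knows about it.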
If you have a registry, by definition it supports multiple classes. At the very least it has to support the default JSON encoder plus at least one optional custom encoder. But why would anyone create a registry that only supports a maximum of one custom encoder?
With partial, you have to know which encoder to use at call time. Even if there is only one custom encoder, you still have to decide whether to use it or the default encoder.
```python
if isinstance(obj, Spam):
    dump_spam(obj)
else:
    json.dumps(obj)
```
Whereas with a registry, you just call json.dumps(obj) without caring what obj is, and the registry handles the decision.
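A minimal sketch of what such a registry could look like (the `ENCODERS` dict and `register` function are assumptions, not a real API; it is wired through `cls=` here for illustration, whereas the actual proposal would build it into `json.dumps` itself):

```python
import json

# Hypothetical global registry mapping types to converter functions.
ENCODERS = {}

def register(cls, func):
    ENCODERS[cls] = func

class RegistryEncoder(json.JSONEncoder):
    def default(self, obj):
        # Consult the registry for any type json can't handle natively.
        for cls, func in ENCODERS.items():
            if isinstance(obj, cls):
                return func(obj)
        return super().default(obj)

class Spam:
    def __init__(self, value):
        self.value = value

register(Spam, lambda s: {"spam": s.value})

# The caller no longer needs to know what obj is.
print(json.dumps([Spam(1), "plain"], cls=RegistryEncoder))
```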
I guess this is perhaps an argument in favour of a registry over the status quo:
```python
if isinstance(obj, Spam):
    json.dumps(obj, cls=SpamEncoder)
else:
    json.dumps(obj)
```
The bottom line is that the best case for partial is that it is no better than a registry, and the worst case is that it is significantly worse.
I’m still neutral on the idea of a registry, and still lean towards some sort of dunder as a more Pythonic solution.
Why would you need more than one registry? How would you specify which registry gets used?
It seems… really boring ^^’
I mean, pickle has the __reduce__ magic method, which lets you skip all of this trouble. I agree that JSON does not deserve a magic method, but a way to simplify life? I suppose yes.
pickle is for Python. pickle can serialize and deserialize user objects.
JSON is not for Python. Even if we add support for some magic method like __for_json__, it allows only serialization.
For example, we can serialize a datetime object into an ISO 8601 string, but the json module cannot distinguish whether a given string is a datetime encoded as a string or just a string.
So an additional conversion layer like pydantic is necessary, not just boring.
As a bonus, the same conversion layer can be used for other formats like msgpack, yaml, and toml.
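The asymmetry is easy to demonstrate: even if a datetime is serialized as an ISO 8601 string via the existing `default=` hook, the fact that it was a datetime is lost on the way back:

```python
import json
from datetime import datetime

# Serialize a datetime as an ISO 8601 string via the default= hook.
stamp = datetime(2024, 1, 2, 3, 4, 5)
text = json.dumps({"when": stamp}, default=lambda o: o.isoformat())

# On the way back, json.loads can only see a plain string; it has no
# way to know the value used to be a datetime.
data = json.loads(text)
print(type(data["when"]))  # <class 'str'>
```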
I might be misreading things, but my understanding was that single dispatch would only work on the top-level object, whereas the registry would work at any point in the tree. For instance, a list of custom objects, or a dictionary mapping strings to those objects, would simply look like a list or dict to singledispatch.
Whether you have a manual registry or use single dispatch, the handler for a type has to go back through the json.dumps machinery to use it, so it’s the same either way. Using single-dispatch means that the type registry would already be implemented and tested.
The following custom encoder uses singledispatch. It’s easy enough to write and use that I’d much rather recommend this approach over any sort of global registry built into the json module.
```python
import json
from functools import singledispatch

# Implement an extensible encoder
@singledispatch
def to_json(obj):
    # Base case: no converter registered for this type, so reject it
    # just as json.JSONEncoder.default does.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

class CustomisableEncoder(json.JSONEncoder):
    def default(self, obj):
        return to_json(obj)

# Example of adding custom serialisation for dates and datetimes
from datetime import date, datetime

@to_json.register
def _(obj: datetime):
    return f"Datetime: {obj}"

@to_json.register
def _(obj: date):
    return f"Date: {obj}"

example = [1, 2, {"a": 3, "b": 4}, datetime.now(), date.today()]
print(json.dumps(example, cls=CustomisableEncoder))
```
I never said JSON and pickle are the same beast; I just said that things could be easier for JSON serialization, as they are for pickle.
Unfortunately it’s impossible to do the same thing for a JSON string, but I never proposed anything in that direction. Furthermore, I suppose people use a custom JSON serializer more often than a custom deserializer.
I agree with this method, but I simply proposed a registry that makes point one easier.
I don’t understand exactly why you’re so opposed to a registry. copyreg is a dedicated module that adds a registry for pickle, for example. Why is this so bad in your eyes?