Adding `__serialize__` and `__deserialize__` Dunder Methods for Object Serialization

Yes, in fact, this already extends to quite a few standard library types. This is what I want to extend to arbitrary types, so that they don’t need explicit support from framework authors.

There’s also the fact that a third-party type could be a vanilla class, a pydantic.BaseModel, or a msgspec.Struct, completely out of your control. That third-party type most likely already provides methods that do the hard work of (de)serialization, but neither Pydantic nor Msgspec knows how to invoke them.

Take this example. I wrote a simple Package class where I want to use third-party types like packaging.version.Version and whenever.Instant.

from typing import Annotated, Any

import msgspec
from packaging.version import InvalidVersion, Version
from pydantic import BaseModel, BeforeValidator, ConfigDict, PlainSerializer
from whenever import Instant

# --- Pydantic Implementation ---

def version_validator(obj: Any) -> Version:
    if isinstance(obj, str):
        try:
            return Version(obj)
        except InvalidVersion:
            raise ValueError("Bad Version!")
    raise ValueError(f"Cannot convert {type(obj).__name__} to Version.")

class PackageModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    name: str
    version: Annotated[
        Version,
        BeforeValidator(version_validator),
        PlainSerializer(lambda ver: str(ver), return_type=str, when_used='json'),
    ]
    upload_date: Instant # whenever.Instant supports Pydantic out of the box.

# --- Msgspec Implementation ---

def enc_hook(obj: Any) -> Any:
    if isinstance(obj, Version):
        return str(obj)
    if isinstance(obj, Instant):
        return obj.format_common_iso()
    raise NotImplementedError(f"Objects of type {type(obj)} are not supported")


def dec_hook(type: type[Any], obj: Any) -> Any:
    if type is Version and isinstance(obj, str):
        return Version(obj)
    if type is Instant and isinstance(obj, str):
        return Instant.parse_common_iso(obj)
    raise NotImplementedError(f"Objects of type {type} are not supported")


class PackageStruct(msgspec.Struct):
    name: str
    version: Version
    upload_date: Instant

# --- Usage Example ---

obj = {"name": "foobar","version": "1.0", "upload_date": "2025-07-18 16:12:30.481346+00:00"}

pydantic_test = PackageModel.model_validate(obj)
msgspec_test = msgspec.convert(obj, type=PackageStruct, dec_hook=dec_hook)

assert pydantic_test.upload_date == msgspec_test.upload_date
assert pydantic_test.version == msgspec_test.version
assert pydantic_test.model_dump_json() == msgspec.json.encode(msgspec_test, enc_hook=enc_hook).decode()

As you can see, even though types like packaging.version.Version and whenever.Instant inherently know how to handle their own (de)serialization, I still had to write specific validators and hooks for Pydantic and Msgspec, respectively. This boilerplate is exactly what I’m hoping to avoid for arbitrary types that already provide their own constructor methods.

You’ll also notice how whenever.Instant happens to implement Pydantic support (which is identical to calling Instant.parse_common_iso(obj)). This is great if you’re using Pydantic; otherwise you’re out of luck.
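
To make that concrete, here is a rough sketch (not a finalized signature) of what the proposed dunders might look like if a type author implemented them; any framework that recognised the protocol could then round-trip the type without the per-framework hooks above. The SerializableVersion subclass is purely illustrative:

from packaging.version import Version

# Hypothetical illustration of the proposed protocol; the exact names and
# signatures are still open for discussion.
class SerializableVersion(Version):
    def __serialize__(self) -> str:
        # Downgrade to a builtin that any target format can represent.
        return str(self)

    @classmethod
    def __deserialize__(cls, data: str) -> "SerializableVersion":
        # Rebuild the rich object from its builtin representation.
        return cls(data)

# A framework aware of the protocol could then do, generically:
restored = SerializableVersion.__deserialize__("1.0")
assert restored.__serialize__() == "1.0"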


A Note on Third-Party Libraries:

I’m not in any way suggesting this is a fault of the authors of any of the third-party types mentioned in this post. They’ve done an amazing job already, and it’s up to them which frameworks to support (if any). It’s also not practical to ask every library author to implement support for every popular framework.

The initial proposal in this thread most reminds me of the to_builtins (“serialise”) / convert (“deserialise”) function pair in msgspec.

I use those heavily in a library where the wire format is JSON, but the actual JSON serialisation happens in the underlying HTTP library. Jumping straight from Struct to string would be inconvenient, whereas the Struct-to-dict conversion means the network-facing layers only need to handle dicts containing JSON-compatible builtin types.

A pair of to_stdtypes/from_stdtypes single-dispatch functions somewhere in the standard library (maybe the copy module?) seems plausibly useful, but I share the concerns others have raised about the problem space being too complex for a new dunder protocol to really help.
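
As a rough sketch of what I mean (the names are hypothetical; nothing like this exists in the stdlib today), the serialising direction maps naturally onto functools.singledispatch, while the deserialising direction needs to dispatch on the target type, so a plain registry is closer to how msgspec’s convert/dec_hook behaves:

from datetime import datetime
from functools import singledispatch

@singledispatch
def to_stdtypes(obj: object) -> object:
    raise TypeError(f"cannot convert {type(obj).__name__} to standard types")

@to_stdtypes.register
def _(obj: datetime) -> str:
    return obj.isoformat()

# singledispatch keys off the type of the first *argument*, so the reverse
# direction uses an explicit registry keyed by the target type instead.
_FROM_STDTYPES = {datetime: datetime.fromisoformat}

def from_stdtypes(target: type, data: object) -> object:
    try:
        return _FROM_STDTYPES[target](data)
    except KeyError:
        raise TypeError(f"cannot convert to {target.__name__}") from None

assert from_stdtypes(datetime, to_stdtypes(datetime(2025, 7, 18))) == datetime(2025, 7, 18)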

-1

I would like to make this claim: in general, good serde code needs to know about the implementation details of both the serialised format and the in-memory format. Therefore, an attempt to introduce a “standard way of serialising all objects in a language” will always fail - more specifically, it may “work”, it may well give you a way to serialise and deserialise arbitrary objects, but it will always have the same shortcomings in terms of flexibility. Where a project needs serde, it is better if the developers consider the serialised formats and in-memory formats required as part of their design, and write their own serde code specific to that.

This is why I avoid things like pickle, which serialises Python objects into… a Python-specific binary format that isn’t relevant anywhere else. If all I want is “dump this state and re-load it later, I don’t care about speed or size”, I use JSON - my codebases have plenty of functions to “serialise” things into something that can be passed to json.dumps and “deserialise” things from something returned from json.loads. One huge advantage JSON has over pickle is that I can just open up the serialised state and find out what’s in there, and even fiddle about with it, which is incredibly useful during development. When JSON isn’t suitable, for whatever reason, I’ll write my own serde code to go all the way to binary and back - and then, should I ever wish to use a different language to read/write the same format, I need only port my existing serde code, rather than having to implement the Python-specific pickle format somewhere else.
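
For illustration, a typical hand-rolled pair of that kind is just a few lines and keeps the format decision explicit (the record shape and names here are made up for this example):

import json
from datetime import datetime, timezone

# Illustrative only: an explicit, format-aware serde pair for one record shape.
def job_to_jsonable(job: dict) -> dict:
    return {"name": job["name"], "started": job["started"].isoformat()}

def job_from_jsonable(data: dict) -> dict:
    return {"name": data["name"], "started": datetime.fromisoformat(data["started"])}

job = {"name": "rebuild-index", "started": datetime.now(timezone.utc)}
text = json.dumps(job_to_jsonable(job))  # readable and editable on disk
assert job_from_jsonable(json.loads(text)) == job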


Specifically, the OP’s proposal would only encourage everyone to write all serde code with JSON (and other JSON-like formats) in mind. See, when I’m encoding to JSON and back, I write code that pretty much follows this exact specification - but I’m consciously making the choice to do that each time I do it, and the option to write some other code instead is always open. If Python standardises a protocol or convention to serialise to JSON-like structures, while claiming that that is the way to do serde (rather than correctly presenting it as being specifically tailored to JSON, to invite developers to consider alternatives), it would be horrible for anyone who wants to use some other format.


Probably the best example of an alternative format I can think of is XML. XML, I believe, is widely misused and misunderstood. I have seen many projects out there that claim to support both JSON and XML, and they technically do, but what they’re really doing is converting everything to/from a JSON-like structure, so the JSON looks natural, but XML is treated as a second-class citizen: only a limited subset of XML is used, and it just represents the JSON, so you get things like this:

<dict>
  <element key="some_tag_name">
    <dict>
      <element key="some_string_property_name">
        <string value="hello world"/>
      </element>
      <element key="some_int_property_name">
        <integer value="42">
      </element>
    </dict>
  </element>
</dict>

rather than a sensible use of XML:

<some_tag_name
  some_string_property_name="hello world"
  some_int_property_name="42"
/>

The same thing has happened with YAML: various projects claim to support both YAML and JSON, but here it is JSON that is treated as the second-class citizen. If you want to actually use the JSON offering, you have to put up with a weird, clunky format that just mirrors the YAML, so you end up with a lot of boilerplate and all the disadvantages of YAML with none of the advantages. It would have been better overall if the project had decided to support only YAML, so that users had to use YAML directly.


So, to answer the question of what I think is a better alternative to the proposal, I’d simply say the status quo is better: let projects decide their serde formats individually, let library authors consider this problem, and let library users deal with whatever choices the library author made.

Of course, if I’m wrong, and there actually is some magic silver bullet that I’m missing here, I’d love to know about it, but I’ve not come across one yet.

2 Likes

@Monarch could you please link to the prior discussion in this forum and in the Python mailing list? I’m sure I’ve seen this idea discussed before.


One way to improve flexibility is for the consumer to pass a set of allowed types to the implementer of __serialize__ as an argument: the implementer must then either downgrade its type specificity or raise an error. For example, when serialising to JSON, datetimes are unsuitable, but they’re valid in YAML.

Similarly, the implementer would expose the set of supported input types for __deserialize__ (e.g. as a third attribute); a rough sketch of both halves follows below.
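
A minimal sketch of that negotiation, assuming the consumer passes the builtin types it can encode and the accepted input types are exposed as a class attribute (all names here are hypothetical):

from datetime import datetime

class Event:
    def __init__(self, when: datetime) -> None:
        self.when = when

    # Hypothetical signature: the consumer passes the builtin types it can encode.
    def __serialize__(self, allowed_types: frozenset) -> object:
        if datetime in allowed_types:   # e.g. a YAML encoder
            return self.when
        if str in allowed_types:        # e.g. a JSON encoder
            return self.when.isoformat()
        raise TypeError("no supported representation for Event")

    # Hypothetical third attribute: the input types __deserialize__ accepts.
    __deserialize_types__ = frozenset({str, datetime})

    @classmethod
    def __deserialize__(cls, data: object) -> "Event":
        if isinstance(data, datetime):
            return cls(data)
        if isinstance(data, str):
            return cls(datetime.fromisoformat(data))
        raise TypeError(f"cannot deserialize Event from {type(data).__name__}")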

The only other thread I’m aware of is: Introduce a __json__ magic method

1 Like