Well, this ended up much longer than I initially thought it would! And personally, I don’t have a huge need for this kind of thing, but it’s been a really interesting thought process to explore.
Something very similar to this proposal but more generic is Rust’s serde: “Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.”
Serde defines two traits (think of them as dunder methods) called Serialize and Deserialize. They are a way of mapping every Rust data structure into one of 29 possible types and Serde implements these for most of stdlib’s types. Third-party libraries are free to implement these traits on their structures (classes). This approach has been so popular that practically every Rust library implements these despite Serde not being a part of Rust’s stdlib. Rust also has feature gates, which means you can conditionally implement serde support. The convention in the Rust community is to gate Serde behind a feature (think of it as an extra in Python packaging), so consumers of the library who don’t care about serde won’t have to pull in a library and its dependencies for a feature they won’t use.
Now, serde itself cannot actually (de)serialize from one concrete format to another; it just acts as a unified API for those that want to. Crucially, serde doesn’t need to concern itself with the specific types that each format supports, as that is the responsibility of the client libraries.
For example, if you want to convert a struct to JSON, you’ll have to install serde_json:
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize, Debug)]
struct Point {
x: i32,
y: i32,
}
fn main() {
let point = Point { x: 1, y: 2 };
// Convert the Point to a JSON string.
let serialized = serde_json::to_string(&point).unwrap();
// Prints serialized = {"x":1,"y":2}
println!("serialized = {}", serialized);
// Convert the JSON string back to a Point.
let deserialized: Point = serde_json::from_str(&serialized).unwrap();
// Prints deserialized = Point { x: 1, y: 2 }
println!("deserialized = {:?}", deserialized);
}
This approach means “client” libraries that are tailored to specific formats (TOML, YAML, BSON, etc.) work with serde to do the (de)serializing.
Unlike pydantic or msgspec, serde’s scope is strictly limited to serialization and deserialization; it doesn’t automatically generate other methods like __init__, __repr__, or validation logic. This means that if a Rust library chooses not to enable the serde feature, the definition of its structs remains completely unchanged for users who don’t need serialization or deserialization. In contrast, the absence of inheriting from msgspec.Struct or pydantic.BaseModel in Python would drastically alter the class definition and its capabilities. Trying to achieve a similar optionality in Python by writing two separate class definitions, one inheriting from pydantic.BaseModel (or msgspec.Struct) and another without, is highly impractical due to code duplication and the complexity of maintaining two distinct versions of the same data structure.
To achieve something similar to serde in Python without these limitations, we could propose two new dunder methods:
def __serialize__(self) -> Any
def __deserialize__(cls, data: Any) -> Self
The __serialize__ method in Python would act similarly to a Rust type implementing the Serialize trait. Instead of mapping to one of serde’s 29 possible types, it would aim to map the Python object’s data into a restricted set of standard Python types like dictionaries, lists, strings, numbers, booleans, etc. This set would form an intermediate, format-agnostic representation of the object.
Then, similar to how serde_json or serde_yaml work in Rust, external Python libraries would be responsible for taking this intermediate representation and converting it into a specific format (like JSON, YAML, etc.). These libraries would understand the structure of the standard Python types and how to encode them according to the rules of their target format.
Conversely, the __deserialize__ class method would mirror the Deserialize trait. A format-specific library (e.g., one handling JSON) would first parse the incoming data from its format into a Python object. It would then call the __deserialize__ method of the target class, passing this object as the data argument. The __deserialize__ method would then be responsible for creating a new instance of the class and populating its attributes based on the data.
This separation of concerns, where the core class only needs to know how to represent itself in a standard Python way and the format-specific libraries handle the actual encoding and decoding, is a key aspect of serde’s design that contributes to its flexibility and wide adoption. It allows new formats to be supported simply by creating a new “client” library that understands the intermediate representation, without requiring changes to the core classes themselves. This approach also ensures that the class doesn’t become tightly coupled to any particular serialization format.
Here’s a crude implementation:
# --- User code ---
import json
from typing import Any, Self, Sequence
import base64
class Point:
def __init__(self, x: float, y: float):
self.x = x
self.y = y
def __serialize__(self) -> dict[str, int]:
return {"x": self.x, "y": self.y}
@classmethod
def __deserialize__(cls, data: Any) -> Self:
if isinstance(data, dict):
return cls(float(data["x"]), float(data["y"]))
elif isinstance(data, Sequence):
return cls(float(data[0]), float(data[1]))
else:
raise TypeError(
f"Unsupported data type for deserializing Point: {type(data).__name__}. "
"Expected a dictionary or a sequence."
)
# --- Hypothetical third-party JSON (de)serializing library ---
def default(obj: Any) -> Any:
"""
Example handler for the limited set of types returned by __serialize__.
"""
if isinstance(obj, bytes):
return base64.b64encode(obj).decode("utf-8")
if isinstance(obj, ...): # Some other type
return ...
if isinstance(obj, ...): # Some other type
return ...
raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable.")
def to_json_string(obj: Any) -> str:
if hasattr(obj, "__serialize__"):
return json.dumps(obj.__serialize__(), default=default)
raise TypeError(f"Object of type {type(obj).__name__} is not serializable.")
def from_json_string(json_str: str, cls: type[Self]) -> Self:
try:
data = json.loads(json_str)
if hasattr(cls, "__deserialize__"):
return cls.__deserialize__(data)
raise TypeError(f"Class {cls.__name__} is not deserializable.")
except json.JSONDecodeError as e:
raise ValueError(f"Invalid JSON string: {e}")
except (TypeError, ValueError) as e:
raise ValueError(f"Deserialization error for {cls.__name__}: {e}")
# --- Usage Example ---
point = Point(5, 10)
# Serialization
try:
json_output = to_json_string(point)
print(f"Serialized to JSON: {json_output}")
except TypeError as e:
print(f"Serialization Error: {e}")
# Deserialization
try:
deserialized_point = from_json_string(json_output, Point)
print(f"Deserialized Point: x={deserialized_point.x}, y={deserialized_point.y}")
except (TypeError, ValueError) as e:
print(f"Deserialization Error: {e}")
For more convenience, the Python standard library could potentially include @serializable and @deserializable decorators. When applied to a class, these decorators could automatically generate the __serialize__ and __deserialize__ methods. This functionality would be applicable if all attributes of the class are types from the standard library (such as integers, strings, lists, datetime, uuid, dictionaries, etc). The @serializable decorator would generate a __serialize__ method returning a dictionary of the object’s attributes, and the @deserializable decorator would generate a __deserialize__ class method to reconstruct the object from a dictionary. This could simplify the process for making data classes serializable and deserializable in many common cases.