Nested dataclasses: reducing structural boilerplate

When working with deeply nested data structures, @dataclass requires each level to be defined separately and wired together manually:

from dataclasses import dataclass, field

@dataclass
class GrandChild:
    grandchild_str: str = "grandchild1"
    grandchild_num: int = 1

@dataclass
class Child:
    grandchild: GrandChild = field(default_factory=GrandChild)
    child_str: str = "child"

@dataclass
class Parent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"

The nesting structure is only implicit — you have to read all three classes to understand the shape of the data. The field(default_factory=...) wiring is pure boilerplate.

PEP 712’s field(converter=...) helps at the field level, but the structural verbosity remains.

This gap became concrete while working on fargv, an argument parser preferring dataclass definitions. Nested subcommands have no ergonomic definition syntax in vanilla dataclasses: the hierarchy is real, but expressing it readably requires either significant boilerplate or giving up on dataclasses altogether.


I’ve been experimenting with a decorator that lets you express the same hierarchy inline:

@deep_dataclass
class Parent:
    class child:
        class grandchild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1
        child_str: str = "child"
    parent_str: str = "parent"

The decorator recursively converts nested class blocks into proper @dataclass types and wires field(default_factory=...) automatically. The result is fully compatible with asdict(), ==, and all other stdlib dataclass tooling.

Notably, the motivation here is readability and expressiveness — not serialization. Libraries like dacite and pydantic solve dict-to-dataclass coercion well, but the problem of defining a nested hierarchy cleanly is separate and, I think, underserved.

I’ve published an early version on PyPI as deep-dataclasses, and the source is on GitHub at anguelos/deep_dataclasses (a decorator to create nested dataclasses from nested class definitions).

Questions I’d like community input on:

  1. Is the nested class syntax a natural fit, or does it feel like it takes too many liberties with class definition conventions?
  2. Is there a better way to express nested hierarchies that I’m missing?
  3. Is there appetite for something like this in the stdlib, perhaps as an addition to the dataclasses module?

Happy to discuss tradeoffs — this is early and I’m genuinely uncertain about the right direction.

8 Likes

It looks like a fairly reasonable way of expressing nested dataclasses to me (although I wonder how well it would work if multiple fields all had the same type/structure). But it’s not a problem I’ve seen occur often, and I suspect it’s sufficiently rare that it would be better remaining a 3rd party package extending dataclasses, rather than trying to get it into the stdlib. If it turns out to be popular, it can always be added to the stdlib later.

5 Likes

At first reading this looks nice. The proposed syntax makes the definition look more like the resulting JSON.
Building classes that fit together like this is something I do (though not particularly often), and the current way of defining them does end up looking like a sprawl.

I don’t expect this would go into the standard library, certainly not immediately.

Professionally I could only use your library if it was included in fastapi, and that’s a steep hill to climb. I also suspect your work wouldn’t feel ‘nice’ anymore after it got mixed with pydantic.

2 Likes

How would this be different from

  • having the nested structure
  • add @dataclass before each class
  • and add a member field (e.g. child: Child) for each nested class?

That’s admittedly somewhat longer, but it’s also more flexible. I.e., it allows for different field types (e.g. list[Child], which better reflects reality). And that’s possible right now, unless I misunderstand your intent.
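For reference, the more flexible spelling described above, including a list[Child] field, might look like the following. This is a sketch using only stdlib dataclasses; the Family, Child, children, and surname names are made up for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Family:
    @dataclass
    class Child:
        name: str = "child"

    # A sequence-valued field: any number of Child instances.
    children: list[Child] = field(default_factory=list)
    surname: str = "parent"


family = Family(children=[Family.Child("a"), Family.Child("b")])
print(asdict(family))
```

The nested class is only a type here; the field that uses it is declared explicitly, which is what makes shapes like list[Child] possible.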

4 Likes

I’d like to note that PEP 712 was rejected, and that converters like this don’t actually exist in dataclasses.

You’re also currently getting __annotations__ from the class dictionary in order to change how the dataclass is constructed, and this won’t work in Python 3.14 or later under PEP 649 annotations.

1 Like

I can see the appeal of structuring a dataclass more like the target JSON, but I also feel like this definitely has some footguns (specifically with indentation as the complexity grows).

Are the inner definitions available on the module scope? Say for access in type hints? How do you handle sequences? If I was to add a Sibling class to your example, would I need to define that outside the nested structure and use a ..default_factory=list[Sibling])? What if I wanted to add a GrandParent class? Since that would now be the outermost scope, do I need to adjust the hierarchy for it?

I do agree that there’s a lot of boilerplate with structures like TypedDict and dataclass, but being able to compose those parts freely without having to track tons of levels of indentation and scope is the tradeoff you get.

@dataclass
class DeepParent:
    @dataclass
    class Child:
        @dataclass
        class GrandChild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1

        grandchild: GrandChild = field(default_factory=GrandChild)
        child_str: str = "child"

    child: Child = field(default_factory=Child)
    parent_str: str = "parent"
Test
from dataclasses import dataclass, field, asdict


@dataclass
class GrandChild:
    grandchild_str: str = "grandchild1"
    grandchild_num: int = 1


@dataclass
class Child:
    grandchild: GrandChild = field(default_factory=GrandChild)
    child_str: str = "child"


@dataclass
class Parent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"


@dataclass
class DeepParent:
    @dataclass
    class Child:
        @dataclass
        class GrandChild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1

        grandchild: GrandChild = field(default_factory=GrandChild)
        child_str: str = "child"

    child: Child = field(default_factory=Child)
    parent_str: str = "parent"


print(DeepParent().child)
print(DeepParent().child.grandchild)

assert asdict(DeepParent()) == asdict(Parent())

The fact that @deep_dataclass applies transformations to nested classes makes its behavior less transparent. Why is it limited to classes inside DeepParent instead of all callables? A function-based API, or a stateful class with configurable transformations, would be easier to understand and reason about.

3 Likes

@hwelch raises some good points about typing.

It would be rather critical for this being useful (in my code-base at least) that it would be possible to type a function as

def f(p: GrandChild): ...

or something in that direction.

I think actually the code / typing structure

@deep_dataclass
class Person:
    class Child:
        class Child:
            name: str = "grandchild1"
            num: int = 1
        name: str = "child"
    name: str = "parent"

def f(p: Person.Child.Child): ...

q = Person(
  child = Person.Child(
    child = Person.Child.Child(
      name="a",
      num=2,
    ),
    name="c",
  ),
  name="d",
)

f(q.child.child)

would be nicest.


I wouldn’t use this deep_dataclass when I need flexibility, but the structure it allows is precisely what I use for settings JSON files (which need to be parsed and validated before being used). So I think it does have potential to be a valuable tool in the toolkit. Even if 10% of fields couldn’t be described in this nested class definition (due to being lists of custom classes, for example), I think it would still be valuable.

1 Like

Thank you all for the thoughtful feedback — it has directly shaped the project in the days since the opening post. I want to address each point and share how things have evolved.


On the core motivation (stepping back from the boilerplate argument)

Reading your replies made me realise I underemphasised the real motivation. The boilerplate is annoying, but the deeper issue is that Python has no clean, idiomatic way to express a pure hierarchical data structure as a single coherent definition. @dataclass handles flat records beautifully. TypedDict, NamedTuple, attrs — all flat or one level deep. For genuinely nested pure data schemas, there is no first-class answer. @deep_dataclass is my attempt at one.


@DavidCEllis — PEP 712 and PEP 649

Thank you for the PEP 712 correction — I was wrong to cite it as landed and have fixed this. More importantly, the PEP 649 concern about __annotations__ was a real bug. I have addressed it by switching to inspect.get_annotations(), which is the stdlib-blessed solution for both PEP 563 and PEP 649 compatibility. Streamlining the handling of both is still ongoing.


@Dutcho and @elis.byberi — why not just nest @dataclass manually?

Elis’s example was valuable — thank you for writing it out. It gave me the opportunity to make explicit something I had underemphasised in the opening post: the round-trip via asdict() is broken for vanilla nested dataclasses, and was in fact a founding motivation for this project:

NestedParent(**asdict(NestedParent())) == NestedParent()  # False

This is because asdict() serialises nested instances to plain dicts, but @dataclass does not coerce them back on construction. @deep_dataclass fixes this — dict coercion at construction time works at all depths. This is arguably more important than the definition ergonomics, and reflects a broader view that @dataclass is mostly designed for flat representations.
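The broken round-trip can be reproduced with the stdlib alone. The second half of the sketch shows one way dict coercion could be done by hand in __post_init__; CoercingParent is a hypothetical class, it handles a single level of nesting only, and it is not how deep_dataclasses is implemented.

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass


@dataclass
class Child:
    child_str: str = "child"


@dataclass
class Parent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"


rebuilt = Parent(**asdict(Parent()))
print(type(rebuilt.child))   # <class 'dict'>
print(rebuilt == Parent())   # False


@dataclass
class CoercingParent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"

    def __post_init__(self):
        # Hypothetical coercion step: rebuild nested dataclasses from dicts.
        for f in fields(self):
            value = getattr(self, f.name)
            if is_dataclass(f.type) and isinstance(value, dict):
                setattr(self, f.name, f.type(**value))


print(CoercingParent(**asdict(CoercingParent())) == CoercingParent())  # True
```

The check on f.type assumes annotations are real classes; with string annotations the type would have to be resolved first.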


@hwelch — sequences, Union, and scope

These were the right questions to ask. The project now supports List, Tuple, Optional, Union, and Literal annotations via an @auxiliary decorator that marks an inner class as a type-only helper rather than a standalone field:

@deep_dataclass
class Config:
    @auxiliary
    class TrainMode:
        lr: float = 0.001
    @auxiliary
    class TestMode:
        metric: str = "accuracy"
    mode: Union[TrainMode, TestMode] = field(default_factory=TrainMode)
    device: Literal["cpu", "cuda"] = "cpu"
    images: List[str] = field(default_factory=list)

When constructing from a dict, the correct Union variant is selected by matching field names.
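One plausible reading of "matching field names" is sketched below with plain dataclasses. pick_variant is a hypothetical helper, not the library's API, and the rule it applies (first Union member whose fields cover all keys of the dict) is an assumption about the selection strategy.

```python
from dataclasses import dataclass, fields
from typing import Union, get_args


@dataclass
class TrainMode:
    lr: float = 0.001


@dataclass
class TestMode:
    metric: str = "accuracy"


def pick_variant(union_type, data):
    """Build the first Union member whose field names cover the dict's keys."""
    for candidate in get_args(union_type):
        names = {f.name for f in fields(candidate)}
        if set(data) <= names:
            return candidate(**data)
    raise TypeError(f"no variant of {union_type} accepts keys {sorted(data)}")


print(pick_variant(Union[TrainMode, TestMode], {"metric": "f1"}))
print(pick_variant(Union[TrainMode, TestMode], {"lr": 0.01}))
```

Under this rule an empty dict, or variants that share field names, resolve to whichever member is listed first, so the order of the Union members matters.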


@peterc — JSON Schema and broader utility

The project now includes to_json_schema(), which exports the full schema including Literal, Union, and List constraints for use with jsonschema or any other validator. This also addresses the config file use case more concretely — @deep_dataclass + tomllib + to_json_schema is a fully stdlib-friendly validated config stack with no heavy dependencies.
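To give a rough idea of what such an export involves, here is a minimal sketch that maps plain stdlib dataclasses to a JSON Schema fragment. sketch_json_schema is a hypothetical function and does not reflect the output or signature of the project's to_json_schema(); it only covers scalar types, Literal, and nested dataclasses.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Literal, get_args, get_origin

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_json_schema(cls):
    """Describe a dataclass as a JSON Schema object (illustrative subset)."""
    properties = {}
    for f in fields(cls):
        if is_dataclass(f.type):
            # Nested dataclass: recurse into a nested object schema.
            properties[f.name] = sketch_json_schema(f.type)
        elif get_origin(f.type) is Literal:
            # Literal["a", "b"] becomes an enum constraint.
            properties[f.name] = {"enum": list(get_args(f.type))}
        else:
            properties[f.name] = {"type": _JSON_TYPES[f.type]}
    return {"type": "object", "properties": properties}


@dataclass
class Optimizer:
    lr: float = 0.001


@dataclass
class Config:
    optimizer: Optimizer = field(default_factory=Optimizer)
    device: Literal["cpu", "cuda"] = "cpu"
    epochs: int = 10


print(sketch_json_schema(Config))
```

Lists and Unions are left out to keep the sketch short; any annotation outside the supported subset raises a KeyError.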


@pf_moore — on the path forward

I’m following exactly the path you suggested: staying as a third-party package and letting real-world use determine whether stdlib inclusion is ever warranted. My honest concern is that I’m not active on social media and don’t have the means to reach a wider audience independently. If anyone has suggestions for how a small stdlib-adjacent library finds its audience organically, I’d genuinely welcome them.


The project is at GitHub and PyPI. On ecosystem compatibility — naive but passing test cases for dacite, dataclass-wizard, and pydantic.dataclasses are in the test suite. Because @deep_dataclass produces standard dataclass instances, compatibility is largely automatic — these tests exist to catch regressions rather than bridge any fundamental gap. Feedback on the @auxiliary API and the autosnake naming option would be especially welcome — those feel like the parts most likely to benefit from community input before stabilising.

2 Likes

With respect, this reads strongly like ChatGPT output. It’s worth noting that I’ve had ChatGPT reference the converter PEP as though it were an accepted language feature, and seeing that in your original post made me wonder a little even before reading this most recent reply.

Could you confirm if you’re using ChatGPT to generate your responses / posts here? I understand those tools feel helpful, but ultimately it doesn’t make for a good approach to language design discussions where first-hand experience of hitting real problems and recognising appropriate solutions really matters.

I have to admit, I did use an LLM for editing my posts here; I mostly did that because I am unfamiliar with the tone and etiquette appreciated here, and to avoid English mistakes, typos in the source blocks, etc. But for each of my posts I did spend good time and thought writing it. As for citing PEP 712, I did not express myself clearly: I was aware that PEP 712 was rejected when I mentioned it, and I did so to indicate that I am aware there is a decision limiting what can be done with @dataclass coercion. I thought that even rejected PEPs form key documentation of the evolution of the language. I guess at the same time I should disclose that I employed LLMs for the docstrings and for increasing the test-case coverage in @deep_dataclass.

1 Like

I believe that while these are legitimate concerns, there are fairly simple ways to express each of these, even without the need for new decorators or changing how the O.P. decorator will work in most cases:

Note that any time the nesting level becomes overwhelming, the suggested approach doesn’t prevent one from pre-defining the child classes elsewhere and just declaring the field as is done in the “current way to do it” example. Maybe the decorator could be a bit smarter and remove the need for the explicit `Field` with a default class altogether for dataclass members.

No. And they wouldn’t need to be, not even for type hints. If at any point you want to instantiate a separate GrandChild, it is valid Python (and introspectable for tools) to write:

`Parent.Child.GrandChild`. No changes are needed to the OP’s proposed decorator behavior, or to how Python treats this.
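The attribute-path spelling can be checked with vanilla nested dataclasses. The sketch below uses only the stdlib; describe is a made-up function, and whether @deep_dataclass exposes its generated classes the same way is an assumption not verified here.

```python
from dataclasses import dataclass, field


@dataclass
class Parent:
    @dataclass
    class Child:
        @dataclass
        class GrandChild:
            num: int = 1

        grandchild: GrandChild = field(default_factory=GrandChild)

    child: Child = field(default_factory=Child)


# The inner class is reachable as an attribute path, so it works in type hints
# and can be instantiated on its own without being in module scope.
def describe(g: Parent.Child.GrandChild) -> str:
    return f"grandchild #{g.num}"


print(describe(Parent().child.grandchild))
print(describe(Parent.Child.GrandChild(num=7)))
```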

Sequences are something the OP is not concerned about, but I agree they could be useful, and if the idea were developed further, they would deserve a mechanism of their own.
Then, it would be as simple as having a second decorator, applied to each inner class, that could specify metadata about the field itself, including that it should behave as a sequence.

Yes. As usual in Python: you need no new rules, or changes for things to work as expected, following the existing rules.


@deep_dataclass
class A:
    class B:
        c: str
    d: B

is just plain Python and would simply work. A decorator to adjust some field metadata could make it even better (So, the field that exposes the first `B` sibling could have another name than the class, for example)

Why would you?
You can either indent the whole construct, or just add the GrandParent at the end and define the Parent class as its field, avoiding indentation levels that impair reading rather than helping it. Both approaches look straightforward to me.

The current requested approach doesn’t require any changes for places where the current approach makes more sense, while adding a lot of flexibility and conciseness to the way they are written, where feasible.

1 Like

Thanks for going point by point, I’m satisfied with those answers and it seems like if this was to end up somewhere it would just allow for a simpler representation of simple constructs.

Not unlike how dataclasses were meant to reduce boilerplate for regular classes, but can still be used alongside them.

I think making sure that those points are included in the documentation for any deep_dataclass module would help with adoption since those are all pretty readily apparent questions anyone who sees one would have.

1 Like

I have improved the original 0.1.0 functionality, trying to address what has been pointed out here.

For now the only documentation is the docstrings and the README.md. The current version is 0.3.3; I will try to add better documentation over the weekend.

I hope I am not being too minute in my questions/answers:

Dropping the explicit field(default_factory=...) is on my to-do list.

Should it be limited to classes defined in the @deep_dataclass?

Should it be limited to any @dataclass?

Or should any class be allowed?

I suspect any @dataclass/@deep_dataclass would be appropriate, as it guarantees dict coercion will always work.

I got interested in sequences the moment they were mentioned.

For the moment, the design I have implemented covers most type-hint forms, including Union, List, etc.

from dataclasses import field, asdict
from typing import Literal, List, Union
from deep_dataclasses import deep_dataclass, auxiliary, to_json_schema


@deep_dataclass
class Config:
    @auxiliary
    class TrainMode:
        lr: float = 0.001
        pseudo_batch_size: int = 32
    @auxiliary
    class TestMode:
        metric: str = "accuracy"
        folds: int = 5
    mode: Union[TrainMode, TestMode] = field(default_factory=TrainMode)
    device: Literal["cpu", "cuda"] = "cpu"
    images: List[str] = field(default_factory=list)

At the moment, a simple nested class is by default interpreted as defining a field as well. If decorated with @auxiliary, it only defines the type without a data field; it can then be explicitly used for collection fields etc.

Is the @auxiliary decorator appropriately named?

Would it be worth automating the field wrapper for deep_dataclass classes? Is there any problem with automating it for all classes?

While testing this, I found that defining a GrandParent class that refers to an independently defined Parent was a bit problematic in my implementation when both are defined inside a function and future annotations (PEP 563) are in use: the localns seen by the decorator applied to GrandParent was not aware of the definition of Parent. This happens whether Parent is a @deep_dataclass or a @dataclass.
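The scoping problem can be reproduced with the stdlib alone. In the sketch below a string annotation stands in for what PEP 563 does to every annotation, and build_classes is a made-up name; passing localns explicitly is one possible workaround, not necessarily the one the project uses.

```python
import typing
from dataclasses import dataclass


def build_classes():
    @dataclass
    class Parent:
        name: str = "parent"

    @dataclass
    class GrandParent:
        # A string annotation, which is what PEP 563 turns every annotation into.
        parent: "Parent"

    return Parent, GrandParent


parent_cls, grandparent_cls = build_classes()

try:
    typing.get_type_hints(grandparent_cls)
    resolved_without_localns = True
except NameError:
    # "Parent" lives in build_classes' local scope, not in module globals.
    resolved_without_localns = False

# Supplying the local name explicitly lets the string resolve.
hints = typing.get_type_hints(grandparent_cls, localns={"Parent": parent_cls})
print(resolved_without_localns, hints)
```

A decorator runs while the enclosing function's locals are still alive, so it could in principle capture them from the caller's frame, but that is a design choice with its own tradeoffs.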

Have you tested unions of the form A | B, or just Union[A, B]? Those constructs have different origin types IIRC, so you may need to handle them separately.

Yes, they are different types, but in practice they behave the same, so their differences can be contained in a few lines of code. Both variants are now supported (as of 0.3.4). Thanks for reminding me. Restructuring the code to unify type-hint handling reduced deep_dataclass.py from 516 lines to 392. More extensive testing may be required.

1 Like