Add `dataclass_factory` argument to `dataclasses.make_dataclass` for custom dataclass transformation support

Forward GitHub issue: python/cpython#118974

Feature or enhancement


typing.dataclass_transform (PEP 681 – Data Class Transforms) allows users to define their own dataclass decorators that type checkers can recognize.
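For readers unfamiliar with PEP 681, here is a minimal sketch of such a decorator. The name `frozen_dataclass` and the `Point` class are made up for illustration and do not come from any real package; the import fallback assumes `typing_extensions` is installed on Pythons older than 3.11.

```python
import dataclasses

try:
    from typing import dataclass_transform  # Python 3.11+
except ImportError:
    from typing_extensions import dataclass_transform  # backport (assumed installed)


@dataclass_transform()
def frozen_dataclass(cls):
    """Hypothetical custom decorator: always builds a frozen dataclass."""
    return dataclasses.dataclass(cls, frozen=True)


@frozen_dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)  # type checkers treat Point like a regular dataclass
```

At runtime `dataclass_transform` only marks the decorator for type checkers; the dataclass behaviour itself still comes from `dataclasses.dataclass`.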

Here is a real-world example use case:

Also, dataclasses.asdict and dataclasses.astuple let users pass an extra argument (dict_factory and tuple_factory, respectively) that sets the factory for the returned instance.
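As a quick illustration of those existing factory arguments (the `Point` class is only an example):

```python
import dataclasses
from collections import OrderedDict


@dataclasses.dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)

# dict_factory receives a list of (name, value) pairs.
as_ordered = dataclasses.asdict(p, dict_factory=OrderedDict)

# tuple_factory receives a list of field values.
as_list = dataclasses.astuple(p, tuple_factory=list)
```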

However, the make_dataclass function does not support third-party dataclass factories (e.g., flax.struct.dataclass):

It can only apply dataclasses.dataclass, which is hard-coded in its return statement.

This feature request is meant to discuss adding a new dataclass_factory argument to dataclasses.make_dataclass to support third-party dataclass transformations, similar to dict_factory for dataclasses.asdict.


def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
                   repr=True, eq=True, order=False, unsafe_hash=False,
                   frozen=False, match_args=True, kw_only=False, slots=False,
                   weakref_slot=False, module=None,
                   dataclass_factory=dataclass):
    ...

    # Apply the normal decorator.
    return dataclass_factory(cls, init=init, repr=repr, eq=eq, order=order,
                             unsafe_hash=unsafe_hash, frozen=frozen,
                             match_args=match_args, kw_only=kw_only, slots=slots,
                             weakref_slot=weakref_slot)

Can you please show an example? How would you want to use this new param?

I want to re-export the dataclasses functionality in my own package. Here is a snippet illustrating my use case:

# mypkg/
# ├──
# └──

import dataclasses

from typing_extensions import dataclass_transform  # Python 3.11+

from mypkg import xxx, yyy, zzz

__all__ = ['dataclass', 'field', 'make_dataclass']

def dataclass(cls=None, /, **kwargs):
    xxx(kwargs)             # do something

    if cls is not None:
        klass = dataclasses.dataclass(cls, **kwargs)
        yyy(klass, kwargs)  # do something else
        return klass

    def wrapper(cls):
        klass = dataclasses.dataclass(cls, **kwargs)
        yyy(klass, kwargs)  # do something else
        return klass

    return wrapper

def field(**kwargs):
    zzz(kwargs)  # do something
    return dataclasses.field(**kwargs)

def make_dataclass(cls_name, fields, **kwargs):
    return dataclasses.make_dataclass(
        cls_name,
        fields,
        dataclass_factory=dataclass,  # my own dataclass() above
        **kwargs,
    )

Users can then write:

import mypkg

@mypkg.dataclasses.dataclass
class Foo:
    x: int
    y: int

Bar = mypkg.dataclasses.make_dataclass('Bar', [('a', float), ('b', int)])

Do you really find Bar as nice as Foo? Seems significantly worse.

Can you not implement make_dataclass in your package by creating a custom type, adding in the annotations you want, and finally applying your dataclass function?
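A rough sketch of what this suggestion could look like, for reference. The helper name `make_custom_dataclass` and its `decorator` parameter are hypothetical and not part of the dataclasses module. It assumes every field is a `(name, type)` or `(name, type, default)` tuple, and it skips the name validation, `bases`, `namespace`, and `module` handling that the stdlib function performs.

```python
import dataclasses
import types


def make_custom_dataclass(cls_name, fields, decorator, **kwargs):
    """Hypothetical helper: build a plain class, then apply `decorator`."""
    annotations = {}
    defaults = {}
    for item in fields:
        if len(item) == 2:
            name, tp = item
        else:
            name, tp, default = item
            defaults[name] = default
        annotations[name] = tp

    def exec_body(ns):
        # Defaults become class attributes; annotations define the fields.
        ns.update(defaults)
        ns['__annotations__'] = annotations

    cls = types.new_class(cls_name, (), {}, exec_body)
    return decorator(cls, **kwargs)


# Usage with the stdlib decorator standing in for a third-party one.
Bar = make_custom_dataclass(
    'Bar', [('a', float), ('b', int, 0)], dataclasses.dataclass, frozen=True
)
bar = Bar(1.5)
```

Any callable that accepts a class plus keyword options could be passed as `decorator`, which is roughly what the proposed dataclass_factory argument would allow without reimplementing the class-building part.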

Yes, the normal use case of the @dataclass decorator is more elegant and readable. But sometimes there are use cases for dynamic class creation, just like subclassing typing.NamedTuple vs. calling collections.namedtuple.


MyNetwork = dataclasses.make_dataclass('MyNetwork', [(f'layer{i}', Layer) for i in range(NUM_LAYERS)])
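To make the comparison concrete, here is a small sketch of the class syntax next to the functional form for both named tuples and dataclasses. All class and field names are placeholders.

```python
import collections
import dataclasses
import typing


# Class syntax: readable when the fields are known up front.
class PointA(typing.NamedTuple):
    x: int
    y: int


@dataclasses.dataclass
class PairA:
    a: float
    b: int


# Functional form: the same shapes built from data at runtime.
PointB = collections.namedtuple('PointB', ['x', 'y'])
PairB = dataclasses.make_dataclass('PairB', [('a', float), ('b', int)])

# The functional form is what allows computed field lists.
names = [f'layer{i}' for i in range(3)]
Layers = dataclasses.make_dataclass('Layers', [(n, int) for n in names])
```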

I understand, but can you not write a make_dataclass function of your own, without delegating to dataclasses.make_dataclass, using the instructions in my last comment?

Oh, I see, but you want the annotations to be right. Got it.

I can do this, but I don’t think ordinary users would understand it, and it is not easy to use either. I want to re-export the dataclasses functionality in my package and then ship it to PyPI.


MyNetwork1 = dataclasses.make_dataclass('MyNetwork1', [(f'layer{i}', Layer) for i in range(NUM_LAYERS)])

MyNetwork2 = type('MyNetwork2', (object,), {'__annotations__': {f'layer{i}': Layer for i in range(NUM_LAYERS)}})
MyNetwork2 = dataclasses.dataclass(MyNetwork2)

Also, I do not want to copy-paste the code of dataclasses.make_dataclass into my package. I want it to always stay in sync with the stdlib.
