Add `dataclass_factory` argument to `dataclasses.make_dataclass` for custom dataclass transformation support

Forwarded from GitHub issue: python/cpython#118974

Feature or enhancement

Proposal:

typing.dataclass_transform (PEP 681 – Data Class Transforms) allows users to define their own dataclass decorator that type checkers can recognize.
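For readers unfamiliar with PEP 681, here is a minimal sketch of such a decorator. The name `my_dataclass` is hypothetical. At runtime `dataclass_transform` only records its arguments on the decorated function, so the actual dataclass behavior comes from delegating to `dataclasses.dataclass`.

```python
import dataclasses

try:
    from typing import dataclass_transform  # Python 3.11+
except ImportError:  # older versions: the typing_extensions backport
    from typing_extensions import dataclass_transform


@dataclass_transform()
def my_dataclass(cls=None, /, **kwargs):
    """Hypothetical custom decorator that delegates to dataclasses.dataclass."""
    def wrap(c):
        return dataclasses.dataclass(c, **kwargs)

    # Support both the bare @my_dataclass and the @my_dataclass(...) forms.
    return wrap if cls is None else wrap(cls)


@my_dataclass(frozen=True)
class Point:
    x: int
    y: int = 0
```

A type checker that understands PEP 681 will treat `Point` as a frozen dataclass with the synthesized `__init__(self, x: int, y: int = 0)`.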

Here is a real-world example use case:

Also, dataclasses.asdict and dataclasses.astuple let users pass an extra argument (dict_factory and tuple_factory, respectively) that builds the returned object.
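For reference, the existing factory arguments work like this: `dict_factory` is called with a list of `(name, value)` pairs and `tuple_factory` with a list of field values. The `Point` class here is only an illustration.

```python
import collections
import dataclasses


@dataclasses.dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)

# dict_factory receives [('x', 1), ('y', 2)] and builds the mapping.
as_ordered = dataclasses.asdict(p, dict_factory=collections.OrderedDict)

# tuple_factory receives [1, 2] and builds the sequence.
as_list = dataclasses.astuple(p, tuple_factory=list)
```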

However, the make_dataclass function does not support a third-party dataclass factory (e.g., flax.struct.dataclass).

It can only apply dataclasses.dataclass: its final return statement calls dataclasses.dataclass directly.

This feature request discusses adding a new dataclass_factory argument to dataclasses.make_dataclass to support third-party dataclass transformations, similar to dict_factory for dataclasses.asdict.

# dataclasses.py

def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
                   repr=True, eq=True, order=False, unsafe_hash=False,
                   frozen=False, match_args=True, kw_only=False, slots=False,
                   weakref_slot=False, module=None,
                   dataclass_factory=dataclass):
    ...

    # Apply the normal decorator.
    return dataclass_factory(cls, init=init, repr=repr, eq=eq, order=order,
                             unsafe_hash=unsafe_hash, frozen=frozen,
                             match_args=match_args, kw_only=kw_only, slots=slots,
                             weakref_slot=weakref_slot)

Can you please show an example? How would you want to use this new param?

I want to re-export the dataclasses functionality in my own package. Here is a snippet illustrating my use case:

# mypkg/
# ├── __init__.py
# └── dataclasses.py

import dataclasses

from typing_extensions import dataclass_transform  # also typing.dataclass_transform on Python 3.11+

from mypkg import xxx, yyy, zzz

__all__ = ['dataclass', 'field', 'make_dataclass']

# field() must be defined before it is referenced in field_specifiers below.
def field(**kwargs):
    zzz(kwargs)  # do something
    return dataclasses.field(**kwargs)

@dataclass_transform(field_specifiers=(field,))
def dataclass(cls=None, /, **kwargs):
    xxx(kwargs)             # do something

    if cls is not None:     # used as @dataclass
        klass = dataclasses.dataclass(cls, **kwargs)
        yyy(klass, kwargs)  # do something else
        return klass

    def wrapper(cls):       # used as @dataclass(...)
        klass = dataclasses.dataclass(cls, **kwargs)
        yyy(klass, kwargs)  # do something else
        return klass

    return wrapper

def make_dataclass(cls_name, fields, **kwargs):
    return dataclasses.make_dataclass(
        cls_name, fields,
        dataclass_factory=dataclass,  # my own dataclass() above (proposed parameter)
        **kwargs,
    )

Users can then do:

import mypkg.dataclasses  # import the submodule so mypkg.dataclasses is bound


@mypkg.dataclasses.dataclass
class Foo:
    x: int
    y: int


Bar = mypkg.dataclasses.make_dataclass('Bar', [('a', float), ('b', int)])

Do you really find Bar as nice as Foo? It seems significantly worse.

Can you not implement make_dataclass in your package by creating a custom type, adding in the annotations you want, and finally applying your dataclass function?
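The approach suggested here can be sketched as follows. Both `my_dataclass` (a stand-in for the package's own decorator) and `make_my_dataclass` are hypothetical names. The helper handles only `(name, type)` and `(name, type, default_or_field)` entries; it skips the stdlib's name validation and bare-name support, which is exactly the kind of logic that would drift from make_dataclass over time.

```python
import dataclasses


def my_dataclass(cls=None, /, **kwargs):
    # Stand-in for a package's own dataclass_transform decorator (hypothetical).
    def wrap(c):
        return dataclasses.dataclass(c, **kwargs)
    return wrap if cls is None else wrap(cls)


def make_my_dataclass(cls_name, fields, *, bases=(), namespace=None, **kwargs):
    """Hypothetical helper: build the class by hand, then apply my_dataclass."""
    ns = dict(namespace or {})
    annotations = {}
    for item in fields:
        if len(item) == 2:
            name, tp = item
        else:
            name, tp, default = item
            ns[name] = default  # a plain default or a dataclasses.field(...)
        annotations[name] = tp
    ns['__annotations__'] = annotations
    cls = type(cls_name, bases, ns)
    return my_dataclass(cls, **kwargs)
```

Usage: `make_my_dataclass('P', [('x', int), ('y', int, 0)], frozen=True)` builds a frozen dataclass through the custom decorator.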

Yes, the normal use case of the @dataclass decorator is more elegant and readable. But sometimes there are use cases for dynamic class creation, just like subclassing typing.NamedTuple vs. calling collections.namedtuple.
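To make the analogy concrete: typing.NamedTuple offers both a class syntax and a functional form that keeps the field types, roughly the role make_dataclass plays next to @dataclass. The class names are illustrative.

```python
import collections
import typing


class PointA(typing.NamedTuple):  # class syntax, like @dataclass
    x: int
    y: int


# Functional form without types.
PointB = collections.namedtuple('PointB', ['x', 'y'])

# Functional form that keeps the annotations, like make_dataclass.
PointC = typing.NamedTuple('PointC', [('x', int), ('y', int)])
```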

NUM_LAYERS = 32

MyNetwork = dataclasses.make_dataclass('MyNetwork', [(f'layer{i}', Layer) for i in range(NUM_LAYERS)])

I understand, but can you not write a make_dataclass function of your own, without delegating to dataclasses.make_dataclass, using the approach in my last comment?

Oh, I see, but you want the annotations to be right. Got it.

I can do this, but I don’t think ordinary users would understand it, and it is not easy to use. I want to re-export the dataclasses functionality in my package and then ship it to PyPI.

NUM_LAYERS = 32

# Functional API:
MyNetwork1 = dataclasses.make_dataclass('MyNetwork1', [(f'layer{i}', Layer) for i in range(NUM_LAYERS)])

# Manual equivalent: build the class with type() and __annotations__, then decorate it.
MyNetwork2 = type('MyNetwork2', (object,), {'__annotations__': {f'layer{i}': Layer for i in range(NUM_LAYERS)}})
MyNetwork2 = dataclasses.dataclass(MyNetwork2)
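A self-contained check that the two spellings produce the same fields. `Layer` is a placeholder class (an assumption, since the thread does not define it), and `field_spec` is a hypothetical helper.

```python
import dataclasses


class Layer:
    """Placeholder for the thread's Layer type (assumption)."""


NUM_LAYERS = 32

MyNetwork1 = dataclasses.make_dataclass(
    'MyNetwork1', [(f'layer{i}', Layer) for i in range(NUM_LAYERS)])

MyNetwork2 = type('MyNetwork2', (object,),
                  {'__annotations__': {f'layer{i}': Layer for i in range(NUM_LAYERS)}})
MyNetwork2 = dataclasses.dataclass(MyNetwork2)


def field_spec(cls):
    # (name, type) pairs in declaration order.
    return [(f.name, f.type) for f in dataclasses.fields(cls)]
```

The manual route works, but users must know to build `__annotations__` themselves, which is the usability gap the proposal targets.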

Also, I do not want to copy-paste the code of dataclasses.make_dataclass into my package. I want it to always stay in sync with the stdlib.