`dataclasses.asdicttype(type)`

From python/typing Gitter:

martin ➬: I have a couple of dataclasses and matching TypedDicts and it pains me so much to have all this redundancy. Why? Why doesn’t there seem to be a way to create a TypedDict from a dataclass, like I can make a TypeAdapter from a Pydantic model?

or at least let me use a TypedDict as the basis for the dataclasses. But no, right now it seems I have to define two classes with identical fields and god forbid I make a mistake and one has a:int and the other a:str, which is a function of redundancy and never a question of “if” but only ever “when”.

I agree with the general sentiment, and have already stumbled upon such redundancy in my own work.

Would dataclasses.asdicttype(type) be accepted into the dataclasses stdlib module?

Alternatively it could be dataclasses.astypeddict(type), but that could be confusing by similarity to dataclasses.asdict(obj).

Similar functionality is currently provided as ubertyped (MIT-licensed).


Might you be able to briefly describe the use-case?

I probably just don’t use dataclasses or TypedDicts enough, but after reading the README for the ubertyped project you linked, which certainly describes the features very well, I don’t understand what the reason to use that library/this function would be.

Are dataclasses missing some first-class support in typing that TypedDicts have?

Data serialisation is the main use case. We can have adapters that convert types to accept external data, or to output JSON through a dict, between some codebase layers. It can also be used while refactoring, to switch gradually from a dict structure to a dataclass.

Then asdicttype would serve as a convenience function, to avoid redundancy.

While support for dataclasses.asdict(obj) returning TypedDict is blocked, it could serve as a way to enable semi-manually specifying the type:

from dataclasses import asdict, asdicttype, dataclass
from typing import cast

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0 


i = InventoryItem(name='a', unit_price=1.0)

id1 = asdict(i)
# reveal_type(id1)  # Revealed type is "builtins.dict[builtins.str, Any]"

id2 = cast(asdicttype(InventoryItem), id1)
# reveal_type(id2)  # Revealed type is [anonymous typed dict derived from InventoryItem]
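
For illustration only, here is a minimal runtime sketch of what such a function might do. asdicttype does not exist in the stdlib; the name, the "Dict" suffix and the shallow (non-recursive) behaviour are all assumptions, and static type checkers would not understand a TypedDict built dynamically like this:

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import TypedDict, get_type_hints


def asdicttype(cls):
    """Hypothetical helper: build a TypedDict class mirroring a dataclass type."""
    if not (isinstance(cls, type) and is_dataclass(cls)):
        raise TypeError("asdicttype() expects a dataclass type")
    # Resolve string annotations so the TypedDict holds real types.
    hints = get_type_hints(cls)
    field_types = {f.name: hints[f.name] for f in fields(cls)}
    # Functional TypedDict syntax; this works at runtime, but type checkers
    # only accept a literal dict here. Nested dataclasses are NOT converted.
    return TypedDict(cls.__name__ + "Dict", field_types)


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


InventoryItemDict = asdicttype(InventoryItem)
```

The sketch marks every key as required, which matches what asdict(obj) produces: defaults are always filled in on the instance, so every field shows up in the resulting dict.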

Can you speak more to the use cases? I still don’t understand what the point is of this intermediate typeddict.

If I understand correctly, a minimal example (based on yours) would be:

from dataclasses import asdict, asdicttype, dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0 


i = InventoryItem(name='a', unit_price=1.0)

# Then later on you want to serialise it
to_serialise: asdicttype(InventoryItem) = asdict(i)

And this would work without needing a cast?


Actually, it turns out that dict is not compatible with TypedDict, which is a more general type (it also covers defaultdict and some other mapping classes). So it would still require a cast for now (or at least until mypy#10104 is implemented/merged). The minimal example would be:

from dataclasses import asdict, asdicttype, dataclass
from typing import cast

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0 


i = InventoryItem(name='a', unit_price=1.0)

to_serialize = cast(asdicttype(InventoryItem), asdict(i))
# reveal_type(to_serialize)  # Revealed type is [anonymous typed dict derived from InventoryItem]

Converting dataclass-like objects to a dictionary is a very common pattern:

As you can notice, all of these essentially lose all the types with dict[str, Any] despite the fact that the resulting dictionary has the same keys as the fields on the original dataclass-like object.

I could imagine a typing construct like:

def asdict[T: DataClassLike](obj: T) -> AsTypedDict[T]: ...

To allow typing the above pattern.

The devil, however, is in the details so:

  • Should AsTypedDict[T] be recursive and represent nested dataclass-like objects as nested typed dicts? (Probably yes given that’s what common runtime behaviour seems to be)
  • How should AsTypedDict[T] handle fields that are typed as field: T | None? As Required[T | None] or NotRequired[T]?
  • Probably more edge cases I can’t think of.
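
For reference, the runtime behaviour behind the first bullet can be seen with the stdlib alone. The Supplier class and the field names below are made up for illustration; the point is that dataclasses.asdict recurses into nested dataclasses and containers, so a faithful AsTypedDict[T] would presumably have to be recursive as well:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Supplier:
    name: str


@dataclass
class InventoryItem:
    name: str
    supplier: Supplier
    tags: list[str] = field(default_factory=list)


item = InventoryItem("widget", Supplier("ACME"), ["new"])

# asdict() converts the nested Supplier instance into a plain dict too,
# and copies the list rather than returning the original object.
as_dict = asdict(item)
```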

  1. cattrs has customizable unstructure strategies, so it can probably never be typed accurately. However, the default strategy is similar to asdict.


It’s not that simple. In cattrs at least, the result may only have a subset of the attribute names, the keys may be renamed (either to their alias or an arbitrary string), and the result may not actually be a dict at all (it may be a tuple, if that’s the configuration applied, or anything else really).

I still don’t really understand what it is that folks actually want to do with this resulting dictionary that requires exact typing. Usually it’s immediately passed over to a serialization library, which doesn’t really care about the type (it’s very dynamic).


Thank you @maciek for forwarding my message from Gitter/Matrix!

I welcome the opportunity to respond to @Tinche’s request for context, which I’ll shamelessly duplicate from this discussion over in the Pydantic forum. The following talks about Pydantic’s BaseModel, but the exact same idea applies to dataclasses.

One feature of FastAPI I’m enamoured with is the idea of query parameter models, and it works like this: Say you have a model with a:str and b:int, then you can tell FastAPI that an API endpoint takes e.g. GET query parameters a:str and b:int simply by telling FastAPI that the model describes valid query string parameters. The same works for cookies and headers (I cannot link to the docs for these, as I am a new user here who may only include two links per post… :().

This got me thinking: say I have two models (or dataclasses) describing parameters:

class One(BaseModel):
    a: str

class Two(BaseModel):
    b: int

FastAPI can form a union of those two and my API endpoint suddenly knows how to handle a and b.

Now, assume I want to define a function that takes a and b as parameters, is there a way to reuse those BaseModel derivatives for typing a **kwargs dictionary, similar to if the above two were TypedDict definitions? Basically something like Unpack that works with Pydantic’s BaseModel? Like so:

def some_function(**kwargs: Unpack[One | Two]) -> None:
    ...

and that would mean that type checkers would type-check kwargs in the function call and warn if anything other than a:str and b:int was passed?

Looking forward to any insights or ideas how to achieve the above. Thank you!

@madduck I think this boils down to first having a way to convert a dataclass-like type to a TypedDict, that can then be used in Unpack[]. I’m not sure the union is the right way to express this. You’d probably need to first create a subclass of One and Two or have a way to create intersection types (in the TypeScript sense).

I’m currently in the process of trying to spec out a way to create TypedDicts dynamically using a comprehension syntax (Inlined typed dicts and typed dict comprehensions - #12 by Viicos). While it is hard to keep the scope limited, such TypedDict comprehensions could be created from a dataclass to express the requested feature here.

It’s clear there are some use cases for this, otherwise there wouldn’t be multiple people requesting this feature (e.g. related requests [1], [2]).

Using dataclasses as **kwargs arguments is the most common one.

Neither of the issues you mention is about serialization (although serialization gets mentioned early on in the first one), but about wrapping a (data)class’s __init__ in a type-safe way.

That’s an interesting use case but could probably be solved more elegantly and generally than converting a dataclass type into a typed dict to be used with Unpack. Those use cases also need the typeddict conversion to be shallow, whereas the serialization use case would probably need it to be deep (but I still don’t understand the serialization use case fully).

For example, it’d be cool if we could annotate kwargs (maybe *args too) as having the same type as a different callable’s kwargs, or the return value as having the same type as a different callable’s return value. It would probably solve these issues and help with other cases of function composition, too.

Personally, I’ve run into two main cases where this is useful:

  1. Building an immutable object from a dictionary

    @dataclass(frozen=True)
    class UserProfile:
        username: str
        email: str
        age: int
        is_admin: bool = False
        newsletter: bool = False
        theme: str = "light"
    
    
    def build_user_profile(defaults: bool, admin: bool, newsletter: bool, dark_mode: bool) -> UserProfile:
        kwargs = {
            "username": "guest" if defaults else "alice",
            "email": "guest@example.com" if defaults else "alice@example.com",
            "age": 0 if defaults else 30,
        }
    
        if admin:
            kwargs["is_admin"] = True
        if newsletter:
            kwargs["newsletter"] = True
        if dark_mode:
            kwargs["theme"] = "dark"
    
        return UserProfile(**kwargs)  
        # Argument 1 to "UserProfile" has incompatible type "**dict[str, str | int]"; expected "str"  [arg-type]
    

    In this case, I lose all the benefits of a typed dataclass, so I usually avoid this pattern and instead assign variables directly before passing them manually into the dataclass constructor. Something like kwargs: AsTypedDict[UserProfile] would make this approach type-safe and usable.

  2. Type hinting from_dict and to_dict methods on my dataclass-like objects

    Currently, I just fall back to dict[str, Any], which isn’t type-safe.

    @dataclass
    class Foo:
        bar: int
        baz: str
        @classmethod
        def from_dict(cls, data: AsTypedDict[Foo]) -> Foo: ...
        def to_dict(self) -> AsTypedDict[Foo]: ...
    

Got it. My followup question is: what do you then do with the result of to_dict() that you can’t do with just an instance of Foo?

To sum up from my side, I hope that support for asdict returning a TypedDict will be enabled by the intersection type and merged into mypy in the future. That would be a great improvement.

I think once the above is working, we can get back to this thread (at least I will, personally :) ).