martin ➬: I have a couple of dataclasses and matching TypedDicts and it pains me so much to have all this redundancy. Why? Why doesn’t there seem to be a way to create a TypedDict from a dataclass, like I can make a TypeAdapter from a Pydantic model?
or at least let me use a TypedDict as the basis for a dataclass. But no, right now it seems I have to define two classes with identical fields, and god forbid I make a mistake so that one has a: int and the other a: str; with redundancy like that, a mistake is never a question of "if" but only ever of "when".
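To make the redundancy concrete, here is a minimal sketch (names are my own) of the duplication being described:

```python
from dataclasses import dataclass
from typing import TypedDict

# The same three fields must be declared twice, and nothing
# stops the two definitions from silently drifting apart.
@dataclass
class Point:
    x: int
    y: int
    label: str

class PointDict(TypedDict):
    x: int
    y: int
    label: str

def serialize(p: Point) -> PointDict:
    # Field names are repeated a third time here.
    return {"x": p.x, "y": p.y, "label": p.label}
```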
I agree with the general sentiment, and already stumbled upon such redundancy in my work.
Would dataclasses.asdicttype(type) be accepted in dataclasses stdlib module?
Alternatively it could be dataclasses.astypeddict(type), but that could be confusing by similarity to dataclasses.asdict(obj).
Similar functionality is currently provided by the ubertyped package (MIT-licensed).
Might you be able to briefly describe the use-case?
I probably just don’t use dataclasses or TypedDicts enough, but after reading the README of the ubertyped project you linked (which certainly describes the features very well), I still don’t understand what the reason to use that library/this function would be.
Are dataclasses missing some first-class support in typing that TypedDicts have?
Data serialisation is the main use case. We can have adapters that convert types to accept external data, or that output JSON via a dict passed between codebase layers. It can also be used while refactoring, to allow gradually switching from a dict structure to a dataclass.
Then asdicttype would serve as a convenience function, to avoid redundancy.
If I understand correctly, a minimal example (based on yours) would be:
```python
from dataclasses import asdict, asdicttype, dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

i = InventoryItem(name='a', unit_price=1.0)

# Then later on you want to serialise it
to_serialise: asdicttype(InventoryItem) = asdict(i)
```
Actually, it occurs to me that dict is not compatible with TypedDict, which is a more general type (it also covers defaultdict and some other mapping classes). So it would still require a cast for now (at least until mypy#10104 is implemented/merged). The minimal example would be:
```python
from dataclasses import asdict, asdicttype, dataclass
from typing import cast

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

i = InventoryItem(name='a', unit_price=1.0)
to_serialize = cast(asdicttype(InventoryItem), asdict(i))
# reveal_type(to_serialize)  # Revealed type is [anonymous typed dict derived from InventoryItem]
```
As you can see, all of these approaches essentially lose the types to dict[str, Any], despite the fact that the resulting dictionary has the same keys as the fields of the original dataclass-like object.
- Should AsTypedDict[T] be recursive and represent nested dataclass-like objects as nested typed dicts? (Probably yes, given that’s what common runtime behaviour seems to be.)
- How should AsTypedDict[T] handle fields that are typed as field: T | None – Required[T | None] or NotRequired[T]?
- Probably more edge cases I can’t think of.
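For reference, the existing runtime behaviour of dataclasses.asdict is already recursive for nested dataclasses, while its static return type is just dict[str, Any]:

```python
from dataclasses import asdict, dataclass

@dataclass
class Inner:
    value: int

@dataclass
class Outer:
    name: str
    inner: Inner

o = Outer(name="a", inner=Inner(value=1))
d = asdict(o)
# At runtime the nested dataclass is converted too:
# d == {'name': 'a', 'inner': {'value': 1}}
# Statically, though, d is typed as plain dict[str, Any].
```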
cattrs has customizable unstructure strategies, so it can probably never be typed accurately. However, the default strategy is similar to asdict.
It’s not that simple. In cattrs at least, the result may only have a subset of the attribute names, the keys may be renamed (either to their alias or an arbitrary string), and the result may not actually be a dict at all (it may be a tuple, if that’s the configuration applied, or anything else really).
I still don’t really understand what it is that folks actually want to do with this resulting dictionary that requires exact typing. Usually it’s immediately passed over to a serialization library, which doesn’t really care about the type (it’s very dynamic).
Thank you @maciek for forwarding my message from Gitter/Matrix!
I welcome the opportunity to respond to @Tinche’s request for context, which I’ll shamelessly duplicate from this discussion over in the Pydantic forum. The following talks about Pydantic’s BaseModel, but the exact same idea applies to dataclasses.
One feature of FastAPI I’m enamoured with is the idea of query parameter models, which works like this: say you have a model with fields a: str and b: int; you can then tell FastAPI that an API endpoint takes GET query parameters a and b simply by declaring that the model describes valid query string parameters. The same works for cookies and headers (I cannot link to the docs for these, as I am a new user here who may only include two links per post… :().
This got me thinking: say I have two models (or dataclasses) describing parameters:
```python
from pydantic import BaseModel

class One(BaseModel):
    a: str

class Two(BaseModel):
    b: int
```
FastAPI can form a union of those two and my API endpoint suddenly knows how to handle a and b.
Now, assume I want to define a function that takes a and b as parameters, is there a way to reuse those BaseModel derivatives for typing a **kwargs dictionary, similar to if the above two were TypedDict definitions? Basically something like Unpack that works with Pydantic’s BaseModel? Like so:
@madduck I think this boils down to first having a way to convert a dataclass-like type to a TypedDict, that can then be used in Unpack[]. I’m not sure the union is the right way to express this. You’d probably need to first create a subclass of One and Two or have a way to create intersection types (in the TypeScript sense).
I’m currently in the process of trying to spec out a way to create TypedDicts dynamically using a comprehension syntax (Inlined typed dicts and typed dict comprehensions - #12 by Viicos). While it is hard to keep the scope limited, such TypedDict comprehensions could be created from a dataclass to express the requested feature here.
Neither of the issues you mention are about serialization (although serialization gets mentioned early on in the first one), but about wrapping a (data)class’s __init__ in a type-safe way.
That’s an interesting use case, but it could probably be solved more elegantly and generally than by converting a dataclass type into a typed dict to be used with Unpack. Those use cases also need the TypedDict conversion to be shallow, whereas the serialization use case would probably need it to be deep (but I still don’t understand the serialization use case fully).
For example, it’d be cool if we could annotate kwargs (maybe *args too) as having the same type as a different callable’s kwargs, or the return value as having the same type as a different callable’s return value. It would probably solve these issues and help with other cases of function composition, too.
Personally, I’ve run into two main cases where this is useful:
Building an immutable object from a dictionary
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserProfile:
    username: str
    email: str
    age: int
    is_admin: bool = False
    newsletter: bool = False
    theme: str = "light"

def build_user_profile(defaults: bool, admin: bool, newsletter: bool, dark_mode: bool) -> UserProfile:
    kwargs = {
        "username": "guest" if defaults else "alice",
        "email": "guest@example.com" if defaults else "alice@example.com",
        "age": 0 if defaults else 30,
    }
    if admin:
        kwargs["is_admin"] = True
    if newsletter:
        kwargs["newsletter"] = True
    if dark_mode:
        kwargs["theme"] = "dark"
    return UserProfile(**kwargs)
    # Argument 1 to "UserProfile" has incompatible type "**dict[str, str | int]"; expected "str" [arg-type]
```
In this case, I lose all the benefits of a typed dataclass, so I usually avoid this pattern and instead assign variables directly before passing them manually into the dataclass constructor. Something like kwargs: AsTypedDict[UserProfile] would make this approach type-safe and usable.
Type hinting from_dict and to_dict methods on my dataclass-like objects
Currently, I just fall back to dict[str, Any], which isn’t type-safe.
```python
from dataclasses import dataclass

@dataclass
class Foo:
    bar: int
    baz: str

    @classmethod
    def from_dict(cls, data: AsTypedDict[Foo]) -> Foo: ...

    def to_dict(self) -> AsTypedDict[Foo]: ...
```
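Today the same thing can be approximated by hand-writing the TypedDict, at the cost of exactly the duplication this thread is about (FooDict is my own name):

```python
from dataclasses import dataclass
from typing import TypedDict

# Must be kept in sync with Foo's fields by hand.
class FooDict(TypedDict):
    bar: int
    baz: str

@dataclass
class Foo:
    bar: int
    baz: str

    @classmethod
    def from_dict(cls, data: FooDict) -> "Foo":
        return cls(**data)

    def to_dict(self) -> FooDict:
        # Building the TypedDict explicitly avoids a cast.
        return FooDict(bar=self.bar, baz=self.baz)
```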