I’ve been combining two of my favorite features, dataclasses and descriptors, and whoo boy has it been a ride!
The documentation does include a section on this combo; however, I’d argue it’s either misleading or incomplete.
Let me show you.
1. Slotted dataclasses don’t call the descriptor’s __set__
This is captured in “slotted dataclasses not calling descriptor field `__set__` · Issue #132946 · python/cpython · GitHub”, but TL;DR: it’s not an issue specifically with `dataclasses`, but with descriptors and slotted classes. However, the way `dataclasses` plumbs slotted classes doesn’t make this obvious.
from dataclasses import dataclass

# `Descriptor` is any class that defines `__get__`/`__set__`.

# This declaration doesn't error
@dataclass(slots=True)
class SlottedDataclass:
    name: Descriptor = Descriptor()

# This declaration errors
# `ValueError: 'name' in __slots__ conflicts with class variable`
class Slotted:
    __slots__ = ("name",)
    name = Descriptor()
2. Can’t provide mutable defaults
This is captured in “Dataclasses erroring when descriptor has mutable default · Issue #132559 · python/cpython · GitHub”, but TL;DR: there’s no way to make a “mutable” default (or use `default_factory`, see below).
from dataclasses import dataclass

class Descriptor:
    def __get__(self, instance, owner):
        if instance is None:
            # This is `dataclasses` probing for the default value
            return []
        return ...

@dataclass
class Dataclass:
    name: Descriptor = Descriptor()
# `ValueError: mutable default <class 'list'> for field name is not allowed: use default_factory`
3. Can’t use `field(default_factory=...)` either
You might think that, to solve 2, we should do:
from dataclasses import field

class Descriptor:
    def __get__(self, instance, owner):
        if instance is None:
            return field(default_factory=list)
        return ...
However, `dataclasses` removes the class attribute (here, the descriptor instance) if the class value is a `dataclasses.Field` without a `.default`. This means `__set__` won’t get called, since the class no longer has a descriptor attribute.
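A small sketch of that removal, assuming a hypothetical `Descriptor` that records its `__set__` calls in a `calls` list (that bookkeeping is mine, purely for illustration):

```python
from dataclasses import dataclass, field


class Descriptor:
    """Hypothetical descriptor that hands `dataclasses` a Field."""

    calls = []

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access: what `dataclasses` sees as the default.
            return field(default_factory=list)
        return instance.__dict__["_name"]

    def __set__(self, instance, value):
        Descriptor.calls.append(value)
        instance.__dict__["_name"] = value


@dataclass
class Dataclass:
    name: Descriptor = Descriptor()


obj = Dataclass()
descriptor_survived = "name" in Dataclass.__dict__
```

Here `descriptor_survived` comes out `False` and `Descriptor.calls` stays empty: the factory-built list lands in `obj.__dict__["name"]` as an ordinary instance attribute.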
3.1 Or any `field(...)` (e.g. `hash=`, or `metadata=`)
This also means you can’t do `return field(metadata={"foo": self.bar})`, or pass any other `field(...)` params, since that linked block either overrides the descriptor with the default value or deletes it altogether.
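The “overrides the descriptor with the default value” half can be sketched like so. The `bar` constructor argument and the `default=0` are assumptions I’m making to have something concrete to inspect:

```python
from dataclasses import dataclass, field, fields


class Descriptor:
    """Hypothetical descriptor returning a Field with a default and metadata."""

    def __init__(self, bar):
        self.bar = bar

    def __get__(self, instance, owner=None):
        if instance is None:
            return field(default=0, metadata={"foo": self.bar})
        return instance.__dict__["_name"]

    def __set__(self, instance, value):
        instance.__dict__["_name"] = value


@dataclass
class Dataclass:
    name: Descriptor = Descriptor(bar="baz")


class_attr = Dataclass.__dict__["name"]
metadata = fields(Dataclass)[0].metadata
```

The metadata does make it onto the field, but `class_attr` is now the plain `0` default instead of the descriptor, so `__set__` is out of the picture again.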
Solutions
I see two solutions (happy to hear more though)
1. Maybe `dataclasses` + descriptors is just cursed
So, instead of advertising how it works in the documentation, maybe just mention something along the lines of “your mileage may vary”. This would mean that `dataclasses` can remain (mostly) descriptor-agnostic.
2. Lean in
Beef up the descriptor-usage in the tests and start finding and fixing bugs. I think overall the changes would likely be isolated to:
- Erroring for slots + descriptors
- Special handling of descriptors in the class attribute replacement/deletion code
…and ideally that’s it?
Anywho, happy to hear thoughts. I lean 2 (obviously, and would also love to make the PR) but would love to hear from @ericvsmith @DavidCEllis @sobolevn who have all chimed in on GitHub. (I’ll also be at PyCon if you wanna pick my brain on WTF I’m doing)