`dataclasses` and descriptors

:waving_hand: I’ve been combining two of my favorite features, dataclasses and descriptors, and whoo boy has it been a ride!

The documentation does include a section on this combo; however, I’d argue it’s either misleading or incomplete.

Let me show you.

1. Slotted dataclasses don’t call the descriptor’s __set__

This is captured in python/cpython issue #132946 on GitHub (“slotted dataclasses not calling descriptor field `__set__`”), but TL;DR: it’s not an issue specifically with dataclasses, but with descriptors and slotted classes in general. However, the way dataclasses plumbs slotted classes doesn’t make this obvious.

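from dataclasses import dataclass

# `Descriptor` here is any class that defines `__get__`/`__set__`
# This declaration doesn't error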
@dataclass(slots=True)
class SlottedDataclass:
    name: Descriptor = Descriptor()

# This declaration errors
# `ValueError: 'name' in __slots__ conflicts with class variable`
class Slotted:
    __slots__ = ("name",)
    name = Descriptor()

2. Can’t provide mutable defaults

This is captured in python/cpython issue #132559 on GitHub (“Dataclasses erroring when descriptor has mutable default”), but TL;DR: there’s no way to make a “mutable” default (or use default_factory, see below).

class Descriptor:
    def __get__(self, instance, owner):
        if instance is None:
            # This is `dataclasses` probing for the default value
            return []
        return ...

@dataclass
class Dataclass:
    name: Descriptor = Descriptor()
# `ValueError: mutable default <class 'list'> for field name is not allowed: use default_factory`

3. Can’t use field(default_factory=...) either

You might think that, to solve 2, we should do:

class Descriptor:
    def __get__(self, instance, owner):
        if instance is None:
            return field(default_factory=list)
        return ...

However, dataclasses removes the class attribute (here, the descriptor instance) if the class value is a dataclasses.Field without a .default, which means __set__ won’t get called, since the class no longer has a descriptor attribute.
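
A small sketch of that deletion, assuming a hypothetical `set_calls` log and a descriptor that stores its value in the instance `__dict__` (neither is part of the original snippet):

```python
from dataclasses import dataclass, field

set_calls = []  # hypothetical log of every value that reaches __set__


class Descriptor:
    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access: hand dataclasses a Field with a factory.
            return field(default_factory=list)
        return instance.__dict__["name"]

    def __set__(self, instance, value):
        set_calls.append(value)
        instance.__dict__["name"] = value


@dataclass
class Dataclass:
    name: Descriptor = Descriptor()


obj = Dataclass()
descriptor_survived = "name" in Dataclass.__dict__  # the class attribute is gone
```

The factory still produces the default, so `obj.name` is an empty list, but it is stored as a plain instance attribute and `set_calls` stays empty.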

3.1 Or any field(...) (e.g. hash=, or metadata=)

This also means you can’t do return field(metadata={"foo": self.bar}), or pass any other field(...) params, since that same block of dataclasses code either overrides the descriptor with the default value or deletes it altogether.
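
The “overrides” branch can be sketched the same way. The default string, the metadata values, and the `set_calls` log below are placeholders chosen for illustration:

```python
from dataclasses import dataclass, field, fields

set_calls = []  # hypothetical log of every value that reaches __set__


class Descriptor:
    def __get__(self, instance, owner=None):
        if instance is None:
            return field(default="anonymous", metadata={"foo": "bar"})
        return instance.__dict__["name"]

    def __set__(self, instance, value):
        set_calls.append(value)
        instance.__dict__["name"] = value


@dataclass
class Dataclass:
    name: Descriptor = Descriptor()


obj = Dataclass(name="x")
class_attribute = Dataclass.__dict__["name"]  # now the plain default value
metadata = fields(Dataclass)[0].metadata      # the metadata itself did arrive
```

So the metadata is picked up, but the price is that the class attribute becomes the string default and `__set__` is never called.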


Solutions

I see two solutions (happy to hear more, though).

1. Maybe dataclasses + descriptors is just cursed

So, instead of advertising how it works in the documentation, maybe just mention something along the lines of “your mileage may vary”. This would mean that dataclasses can remain (mostly) descriptor-agnostic.

2. Lean in

Beef up the descriptor-usage in the tests and start finding and fixing bugs. I think overall the changes would likely be isolated to:

  1. Erroring for slots + descriptors
  2. Special handling of descriptors in the class attribute replacement/deletion code

…and ideally that’s it?


Anywho, happy to hear thoughts. I lean 2 (obviously, and would also love to make the PR) but would love to hear from @ericvsmith @DavidCEllis @sobolevn who have all chimed in on GitHub. (I’ll also be at PyCon if you wanna pick my brain on WTF I’m doing)

I’m sure everyone reading is super worried, so I thought I’d reassure you I’m not blocked or anything, since my use case ALSO combines another couple of favorite (very cursed) features: metaclasses and dataclass_transform.

So in my library I can just override the metaclass’s __setattr__ and __delattr__ to work around the quirks. But, uh… I don’t think your average Joe should have to study the dark arts and commit the atrocities I have just to combine these two things :sweat_smile:
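
For the curious, a rough sketch of what such a workaround could look like. This is not the library’s actual code: `KeepDescriptorsMeta`, `_holds_data_descriptor`, and `set_calls` are hypothetical names, and it leans on the fact that dataclasses replaces or deletes the class attribute via `setattr`/`delattr` on the class.

```python
from dataclasses import dataclass, field

set_calls = []  # hypothetical log of every value that reaches __set__


class KeepDescriptorsMeta(type):
    """Hypothetical metaclass: ignore attempts to replace or delete a data descriptor."""

    def _holds_data_descriptor(cls, name):
        return hasattr(type(cls.__dict__.get(name)), "__set__")

    def __setattr__(cls, name, value):
        if cls._holds_data_descriptor(name):
            return  # keep the descriptor instead of the plain default
        super().__setattr__(name, value)

    def __delattr__(cls, name):
        if cls._holds_data_descriptor(name):
            return  # keep the descriptor instead of deleting it
        super().__delattr__(name)


class Descriptor:
    def __get__(self, instance, owner=None):
        if instance is None:
            return field(default_factory=list)
        return instance.__dict__["name"]

    def __set__(self, instance, value):
        set_calls.append(value)
        instance.__dict__["name"] = value


@dataclass
class Dataclass(metaclass=KeepDescriptorsMeta):
    name: Descriptor = Descriptor()


obj = Dataclass()  # the factory value is now routed through __set__
```

The obvious downside is that the metaclass silently swallows every later attempt to reassign or delete a data descriptor on the class, which is a fairly blunt instrument.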

I’m not sure I’d want to suggest additional checking and errors to try to handle all of the sharp edges you can encounter with descriptors and dataclasses, but I do think the slots behaviour should at least be documented.

They don’t work with slotted classes because the class attribute that would be used for the descriptor is needed for the slot, but I don’t think this is mentioned in the documentation on descriptor-typed fields.

The mutable defaults (and technically the field deletion) behaviour does follow from the documented behaviour: the value returned by __get__ is treated as the default, and here that value is mutable, so it errors as a mutable default. The strange behaviour when you use field(...) can also be traced to this.
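
A short sketch of that rule in action. The `class_level_value` parameter and the `Works`/`Fails` class names are made up for the demonstration; the point is only that whatever `__get__` returns for class-level access becomes the field default.

```python
from dataclasses import dataclass, fields


class Descriptor:
    def __init__(self, class_level_value):
        # Hypothetical knob: the value handed back for class-level access.
        self._class_level_value = class_level_value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self._class_level_value
        return instance.__dict__["name"]

    def __set__(self, instance, value):
        instance.__dict__["name"] = value


@dataclass
class Works:
    name: Descriptor = Descriptor(class_level_value=())  # immutable default


try:
    @dataclass
    class Fails:
        name: Descriptor = Descriptor(class_level_value=[])  # mutable default
except ValueError as exc:
    error = str(exc)
else:
    error = None

works_default = fields(Works)[0].default  # the tuple returned by __get__
```

With the tuple, the class is created and the default flows through `__set__` on construction; with the list, class creation raises the “mutable default” `ValueError`.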


I’ll note that attrs[1] always removes the default values from the class attributes, even with slots=False. It also uses cls.__dict__.get(...) instead of getattr(cls, ...) when collecting defaults, which means your default value is the descriptor itself instead of the return value of __get__. That actually feels more consistent, even if it might not be useful.

>>> from attrs import define
>>> class Desc:
...     def __get__(self, inst, cls=None):
...         return "Descriptor Output"
...         
>>> @define(slots=False)
... class X:
...     a: str = Desc()
...     
>>> X.a
Traceback (most recent call last):
  File "<python-input-12>", line 1, in <module>
    X.a
AttributeError: type object 'X' has no attribute 'a'
>>> X().a
<__main__.Desc object at 0x7a08a0424050>
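
The two lookup strategies can be compared without attrs or dataclasses at all. A tiny stdlib-only sketch, reusing the `Desc` class from the session above:

```python
class Desc:
    def __get__(self, inst, cls=None):
        return "Descriptor Output"


class X:
    a = Desc()


# getattr goes through the descriptor protocol and returns what __get__ returns.
via_getattr = getattr(X, "a", None)

# Reading the class __dict__ bypasses the protocol and returns the descriptor object.
via_dict = X.__dict__.get("a")
```

So a library that collects defaults with `getattr` sees the output of `__get__`, while one that reads `cls.__dict__` sees the `Desc` instance.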

  1. (and my own dataclass-like implementation.)

You didn’t talk about what happens when you use the typing.ClassVar annotation.

@dataclass
class Dataclass:
    name: ClassVar[Descriptor] = Descriptor()

dataclass behaves differently depending on whether ClassVar is in the annotation.
For example, this one doesn’t crash with your “mutable defaults” example, but does crash without the ClassVar.
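
A sketch of that difference, using the list-returning descriptor from the “mutable defaults” example. The class names `WithClassVar` and `WithoutClassVar` and the `"instance value"` string are placeholders:

```python
from dataclasses import dataclass, fields
from typing import ClassVar


class Descriptor:
    def __get__(self, instance, owner=None):
        if instance is None:
            return []  # mutable value for class-level access
        return "instance value"


@dataclass
class WithClassVar:
    name: ClassVar[Descriptor] = Descriptor()  # accepted: not treated as a field


try:
    @dataclass
    class WithoutClassVar:
        name: Descriptor = Descriptor()  # rejected: mutable field default
except ValueError as exc:
    without_classvar_error = str(exc)
else:
    without_classvar_error = None

classvar_fields = fields(WithClassVar)  # empty: ClassVar attributes are excluded
```

With `ClassVar`, the descriptor stays on the class and `WithClassVar().name` goes through `__get__`, but `name` is no longer an `__init__` parameter or a dataclass field.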


You also didn’t talk about example usages of these classes and their instances.
(How they’re used affects whether ClassVar is appropriate.)

The OP is about “Descriptor-typed fields” (the title of the heading straight from the documentation).

So using ClassVar means they are “Class Variables” and therefore not fields.