Dataclasses `has_explicit_hash` broken?

We’re having problems implementing __hash__ for a dataclass that has subclasses. Investigation led to the following lines in dataclasses.py:

    class_hash = cls.__dict__.get('__hash__', MISSING)
    has_explicit_hash = not (class_hash is MISSING or
                             (class_hash is None and '__eq__' in cls.__dict__))

It seems dataclasses decides whether a class has an explicit hash by examining only the immediate class, and not its MRO. This is at the very least inconsistent, as in other places dataclasses.py does examine the MRO to reach similar decisions (e.g., slots).
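
For illustration, the check can be reproduced outside dataclasses.py. This is only a sketch: the helper `looks_explicit` is a hypothetical name that copies the two quoted lines, and `Base` and `Child` are made-up plain classes. Only `dataclasses.MISSING` comes from the library.

```python
from dataclasses import MISSING


def looks_explicit(cls):
    # Hypothetical helper: same logic as the quoted lines. Like them,
    # it reads cls.__dict__ only and never walks the MRO.
    class_hash = cls.__dict__.get('__hash__', MISSING)
    return not (class_hash is MISSING or
                (class_hash is None and '__eq__' in cls.__dict__))


class Base:
    def __hash__(self):
        return 0


class Child(Base):
    pass


print(looks_explicit(Base))             # True
print(looks_explicit(Child))            # False
print(Child.__hash__ is Base.__hash__)  # True: found through the MRO
```

So the inherited __hash__ is reachable by normal attribute lookup, but the check does not count it as explicit.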

Would this be considered a bug? Would a PR that changes it be considered?

@ericvsmith ?

No, this is expected behavior - a subclass can change whether or not it’s safe to hash instances of that type with zero relation to what superclasses implement or don’t implement.

If this check doesn’t do what you want because you are in some interesting edge cases, you should manually provide the correct parameters.

But also, what is your situation? Can you provide a small example where it behaves incorrectly?

Here’s an example demonstrating the unexpected behavior:

    from dataclasses import dataclass

    @dataclass
    class A:
        a: int = 1
        b: int = 2

        def __hash__(self) -> int:
            return hash((self.a, self.b))

    @dataclass
    class B(A):
        pass

    ss = set([A(), B()])   # TypeError: unhashable type: 'B'

This is a toy example of course, and suggesting fixes for it is not the point (here there’s no actual need for a custom __hash__).

“a subclass can change whether or not it’s safe to hash instances of that type with zero relation to what superclasses implement or don’t implement.”

That’s akin to saying dataclass doesn’t support inheritance. Doesn’t it? If it does, why should the support be any different for __hash__?

Ok, but this is so much of a toy example that I still don’t understand why you want this.

This doesn’t work because, despite there not being any new attributes, the @dataclass decorator on B generates a new __eq__. Is that the behavior you want to see changed?

Also, don’t forget that without frozen=True, it is strictly unsafe to support hashing.

Feel free to add frozen to the example, it doesn’t change the issue.
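
For reference, a sketch of the frozen variant as I understand it. The constant 42 is only an illustrative value chosen so the two hashes are easy to tell apart. With frozen=True and the default eq=True, B becomes hashable, but through a __hash__ that dataclass generates from the fields, so A’s custom __hash__ is still not the one B ends up with:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class A:
    a: int = 1
    b: int = 2

    def __hash__(self) -> int:
        # Deliberately artificial so it is easy to recognise.
        return 42


@dataclass(frozen=True)
class B(A):
    pass


print(hash(A()))                 # 42, the hand-written __hash__ is kept on A
print('__hash__' in B.__dict__)  # True: dataclass added one to B itself
print(hash(B()) == 42)           # False: B does not use A.__hash__
```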

When you say ‘this is expected behavior’ (== dataclass ignoring inheritance for a particular method but not all others) is this because it was actually discussed? Was there any agreement that this is the right behavior?

Here is the only discussion I could find leading to this code: “Change dataclasses hashing to use unsafe_hash boolean (default to False)”, python/cpython issue #77110 on GitHub.
There are no mentions of any inheritance considerations for has_explicit_hash.

It is not dataclasses “ignoring inheritance”, dataclasses is just respecting the behavior of python here. Remove @dataclass from the example and instead manually define __eq__ in both A and B and you will observe the exact same behavior.

See section 3, “Data model”, of the Python 3.13.3 documentation.

Note if you use @dataclass(eq=False) on B then it will inherit the __hash__ method.
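
A minimal sketch of that, reusing the classes from the original example. Only the eq=False argument on B is changed:

```python
from dataclasses import dataclass


@dataclass
class A:
    a: int = 1
    b: int = 2

    def __hash__(self) -> int:
        return hash((self.a, self.b))


@dataclass(eq=False)  # no new __eq__ on B, so __hash__ is left alone
class B(A):
    pass


print(B.__hash__ is A.__hash__)  # True: inherited from A
ss = {A(), B()}                  # no TypeError this time
```

The trade-off is that B also inherits A’s __eq__ rather than getting its own, which makes no difference here because B adds no fields.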

The dataclasses behaviour emulates what Python does when the __eq__ method is written in the class body, even though dataclasses actually adds the method after the class has been defined.

    from dataclasses import dataclass

    def eq_method(self, other):
        if type(self) is type(other):
            return self.a == other.a and self.b == other.b
        return NotImplemented

    @dataclass
    class A:
        a: int = 1
        b: int = 2

        def __hash__(self) -> int:
            return hash((self.a, self.b))

    class B(A):
        __eq__ = eq_method

    class C(A):
        pass

    C.__eq__ = eq_method  # This is roughly how dataclass adds the __eq__ method

    print(B.__hash__)  # None
    print(C.__hash__)  # <function A.__hash__ at 0x76deda1a0220>

Dataclasses has to do the check and clear out the hash itself because it wants to look like you’ve written class B and not class C even though the way it works is more like class C.
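
If the goal is to keep both the generated __eq__ and the parent’s hash, one option that follows from the above is to restate the hash in the subclass body, so that it is in the class’s own __dict__ when the check runs. A sketch under that assumption, not an officially recommended pattern:

```python
from dataclasses import dataclass


@dataclass
class A:
    a: int = 1
    b: int = 2

    def __hash__(self) -> int:
        return hash((self.a, self.b))


@dataclass
class B(A):
    # Explicitly re-declared, so dataclass sees it in B.__dict__
    # and leaves it in place. No annotation, so it is not a field.
    __hash__ = A.__hash__


print('__hash__' in B.__dict__)  # True
print(hash(B()) == hash(A()))    # True: same function, same field values
ss = {A(), B()}                  # no TypeError
```

This is the same thing you would have to write by hand in a plain class that defines __eq__ and wants to keep the inherited __hash__.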

I see now. Thank you!