Why is a dataclass unhashable?

For example:

from dataclasses import dataclass

@dataclass
class Node:
    val: int
    next: "Node" = None
    random: "Node" = None

f1 = Node(3)
f2 = Node(4)
print("__hash__" in dir(f1), "__eq__" in dir(f1))  # True True
d = {f1: 0, f2: 1}  # TypeError: unhashable type: 'Node'

I checked that the dataclass instance already has a __hash__ attribute, yet it is unhashable. Why?

A plain class, on the other hand, is hashable:

class Node:
    def __init__(self, x: int, next: 'Node' = None, random: 'Node' = None):
        self.val = int(x)
        self.next = next
        self.random = random

BTW, what’s the default __hash__ method of a plain class?

There’s a paragraph in the docs that mentions this:

  • If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).

In general, hashing doesn’t play nicely with mutable objects, so it was safer for dataclasses to keep things unhashable by default.
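
The three cases in that paragraph can be checked directly. Below is a minimal sketch; the class names `Default`, `Frozen`, and `NoEq` are made up for illustration and are not part of the original question.

```python
from dataclasses import dataclass

@dataclass                  # eq=True, frozen=False: __hash__ is set to None
class Default:
    val: int

@dataclass(frozen=True)     # eq=True, frozen=True: __hash__ is generated from the fields
class Frozen:
    val: int

@dataclass(eq=False)        # eq=False: __hash__ is left untouched (inherited from object)
class NoEq:
    val: int

# The attribute exists, so it shows up in dir(), but its value is None.
print("__hash__" in dir(Default(3)), Default.__hash__ is None)

try:
    hash(Default(3))
except TypeError as exc:
    print("TypeError:", exc)

# Frozen instances with equal fields produce equal hashes.
print(hash(Frozen(3)) == hash(Frozen(3)))

# NoEq still uses the identity-based hash of object.
print(NoEq.__hash__ is object.__hash__)
```

This is also why the `"__hash__" in dir(f1)` check in the question is misleading: the name is present, but it is bound to `None` rather than to a method.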


Different instances of a Python class by default always compare unequal to each other: they will only compare equal if they are actually the same object.

>>> class A:
...     pass
... 
>>> a = A()
>>> a == a
True
>>> b = A()
>>> a == b
False

Hashing of plain Python classes just needs to be consistent with this: it should always give the same hash for the same object, and it should ideally give mostly unique hashes for different objects. To accomplish this, CPython bases the hash on the object’s memory address, i.e., where it lives in your computer’s memory.

>>> bin(id(a))  # the memory address of a
'0b100110110100110010001110010010111010010000'
>>> bin(hash(a))  # the hash: trim off some zero-bits on the right.
'0b10011011010011001000111001001011101001'
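
To see that a plain class simply inherits this behaviour from `object`, here is a small check. It deliberately avoids asserting the exact relationship between `id()` and `hash()`, since that is a CPython implementation detail; the attribute name `payload` is only an illustration.

```python
class A:
    pass

a = A()
b = A()

# A plain class defines neither __hash__ nor __eq__ itself; both come from object.
print(A.__hash__ is object.__hash__)
print(A.__eq__ is object.__eq__)

# The hash depends on identity, not on attribute values,
# so mutating the object does not change it.
before = hash(a)
a.payload = [1, 2, 3]
print(hash(a) == before)

# Two distinct objects compare unequal, so they are separate dict keys.
d = {a: "first", b: "second"}
print(len(d))
```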

As @sweeneyde pointed out, you can set the eq parameter to False so that the class keeps the default __hash__ method inherited from its parent (the object class):

@dataclass(eq=False)
class Node:
    val: int
    next: "Node" = None
    random: "Node" = None

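The instances are now hashable, but both hashing and equality are identity-based, so two dicts built from separately created nodes with the same values are not equal: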

>>> d1 = {Node(3): 0, Node(4): 1}
>>> d2 = {Node(3): 0, Node(4): 1}
>>> d1 == d2
False
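
The reason is that every `Node(3)` call creates a distinct key. A short sketch of what that means for lookups (the variable names are only for illustration):

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Node:
    val: int
    next: "Node" = None
    random: "Node" = None

n = Node(3)
d = {n: 0}

# The very same object is found again.
print(n in d)

# A new object with identical field values is a different key.
print(Node(3) in d)

# With eq=False no __eq__ is generated, so comparison falls back to identity.
print(Node(3) == Node(3))
```

This is usually what you want for linked-list or graph nodes, where two nodes holding the same value are still different nodes.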

However, digging deeper into the docs, you can find the following explanation:

Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.

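With this option the hash is computed from the field values, so instances that compare equal also hash equally and the two dicts compare equal: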

@dataclass(unsafe_hash=True)
class Node:
    val: int
    next: "Node" = None
    random: "Node" = None

d1 = {Node(3): 0, Node(4): 1}
d2 = {Node(3): 0, Node(4): 1}
assert d1 == d2
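
The "considered carefully" warning is about mutating an instance while it is used as a key. A minimal sketch of the failure mode, assuming the same `Node` definition with `unsafe_hash=True`:

```python
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Node:
    val: int
    next: "Node" = None
    random: "Node" = None

n = Node(3)
d = {n: "payload"}

# Before mutation, an equal node finds the entry.
found_before = Node(3) in d

# Mutate the key while it is stored in the dict.
n.val = 99

# The dict remembered the hash computed from val=3.
# Node(3) has that hash but no longer compares equal to the key;
# Node(99) compares equal to the key but has a different hash.
found_old_value = Node(3) in d
found_new_value = Node(99) in d

print(found_before, found_old_value, found_new_value, len(d))
```

The entry is still in the dict, but neither the old nor the new value can look it up. If the fields never need to change after construction, `@dataclass(frozen=True)` gives the same value-based hashing without this risk.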