Need help with __hash__ and __eq__

Hello,

I am not sure If I understand something correctly and need help.

I have a small class that holds three values.
For this class I implement the __eq__.
To provide more convenience I also provide equality for comparing against list and tuple since from the comparing context it’s clear that these values mean the same.

class MyObj:
    def __init__(self, a: int, b: int, c: int | None):
        self._a: Final = a
        self._b: Final = b
        self._c: Final = c

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self._a == other._a and self._b == other._b and self._c == other._c

        # to provide convenience also allow comparison with tuples/lists
        if isinstance(other, (list, tuple)):
            if len(other) == 2:
                return self._a == other[0] and self._b == other[1]
            if len(other) == 3:
                return self._a == other[0] and self._b == other[1] and self._c == other[2]
            return False

        return NotImplemented

    def __hash__(self):
        # has collisions with tuple
        # __eq__ can also be true without having the same hash
        return hash((self._a, self._b, self._c))

        # Is this one better?
        return hash((self.__class__, self._a, self._b, self._c))


Now I am not sure how I should correctly implement the __hash__.

The docs clearly state:

The only required property is that objects which compare equal have the same hash value

My __hash__ function is thus obviously not correct since __eq__ can be true without the same hash.
I would also expect that len({(1, 2, 3), MyObj(1, 2, 3)}) == 2.

How can I provide convenience in the __eq__ while still implementing a valid __hash__?
It seems I am not understanding everything correctly.

Thank you for your help

hash((self._a, self._b, self._c))
Seems correct

__eq__ can be true without the same hash.

How? Or do you mean?

__eq__ can be false with the same hash.

hash((1,2,3)) == MyObj(1,2,3).__hash__()
should be true

The docs state that if __eq__is true, __hash__ should be the same.
In my example __eq__ can be true, but __hash__ returns not the same hash (compare with len == 2).

It’s unclear to me how I can provide the convenience while still respecting the docs

If I’m understanding you correctly, your class (a) compares equal to a tuple of the same values, and (b) has the same hash as a tuple of the same values. That seems correct, and not at all problematic. (Including the class in the hash isn’t an improvement, stick to the first one.) Where things get a little more confusing is that it can also compare equal to a shorter tuple. Presumably this makes sense in context, but it does mean that you have non-transitive equality: that a MyObj instance may be equal to two things that are not equal to each other.

Why would you? The two are equal, you made that definition yourself. They are and should be considered equivalent. When you put two equal values into a set, only one of them is retained. Notably, len({42, 42.0}) is 1, not 2.

So I guess the question is, which aspect do you want to retain? If “from the comparing context it’s clear that these values mean the same”, then MyObj should compare equal to a tuple, should have the same hash as a tuple, and each should be treated as present in a set/dict when the other is. (If x == y, then stuff[x] is stuff[y] in a normal dictionary.) But as written here, there are several things that will be a bit confusing.

  1. MyObj(1, 2, 3) will be equal to (1, 2, 3) and also to (1, 2) but those are not equal to each other. This will cause strange behaviour in sets and dicts, depending on which one gets in first.
  2. MyObj(1, 2, 3) won’t have the same hash as the short tuple (1, 2). But they compare equal. This will have problematic effects in sets and dicts.
  3. Your objects compare equal to lists, same as they do tuples, but the lists and tuples won’t themselves compare equal. Probably less problematic than the other nontransitivity but still worth keeping in mind.

So this is going to be a very quirky class. I think the safest option would be to make them non-hashable (__hash__ = None), but you may find that things work acceptably well even with them being hashable, just as long as you accept that there will be certain behaviours that are unsupported (set({1, 2}, MyObj(1, 2, 3)) is likely to misbehave).

2 Likes

This an example from a class the saves coordinates.
It has longitude, latitude and an optional height.

I want the user to be able to do

  • if coordinate == (lat, long)
  • if coordinate == (lat, long, height)

because in this context it’s clear that user wants to compare coordinates.

Since this class is pseudo-immutable I thought it would be nice if it would be hashable.
That’s where I am stuck at the moment.

If I include the class in the hash then

  • len({(1, 2, 3), MyObj(1, 2, 3)}) == 2
    
  • No strange behavior when inserting the object in a set/dict because there is no collision

But then there are even more cases where equality is true but the hash is not the same.

Maybe add if c is None, don’t add it to the hash, or eq. I think None has a hash value which might confuse things.
Ideally, you would have two classes Coordinate2d, Coordinate3d and a cast or conversion method, assume height is always Z axis, use 0 instead of None

An option is to give your class a method called something like is_equivalent_to or loosely_equal and use that to handle these comparisons. Then instead of doing coordinate == (lat, long) someone would call coordinate.loosely_equal(lat, long).

It kind of depends on what the use cases are for objects of this class. As you’re finding, defining __eq__ has additional consequences beyond just the behavior of explicit comparisons like coordinate == (lat, long). It also has consequences for hashing which has consequences for set/dict membership and so on.

You can’t separate these if you rely on __eq__. But if you define your own method you can target just the contexts you care about and not “corrupt” other contexts.

The fact that you want your objects to behave as equal to tuples in one context (explicit == comparison) but not in another (a set should be able to contain one of each) suggests to me that perhaps relying on __eq__ isn’t the best choice. A custom method may be more clear.

1 Like

If your __eq__ can return True without considering the value of the third attribute then that attribute should not affect the hash:

def __hash__(self):
    return hash((self._a, self._b))

I recommend not defining __eq__ the way that you have though. It is confusing and not useful to have completely different kinds of objects (MyObj and list/tuple) compare equal with == when they don’t have any of the same attributes, methods and behaviour.

Then they should not compare equal with ==.

1 Like

Except that this is by coincidence. You’re seeing a length of 2 because the two objects landed into different buckets. If, however, they had happened to land in the same bucket, they would have been checked for equality, and one of them would be suppressed. This will lead to EXTREMELY bizarre bugs where the set contains both items if you only have those two, but if you add precisely N other items first (for some value of N), the MyObj will be dropped from the set. Mismatching hash and eq results in weird, data-dependent issues.

This is a problem, then.

The way sets and dictionaries work, very roughly speaking, is:

  1. Take the object’s hash
  2. Divide that by some bucket count and look at the remainder. That tells you which bucket the thing goes into.
  3. See if anything in that bucket is equal to the thing.

So if you have different hashes for things that compare equal, they will only be checked if they HAPPEN to land in the same bucket, purely by coincidence. You don’t control the bucket count, so it might vary based on other (unrelated) actions in the set/dict.

I second this recommendation. This completely bypasses the issues of non-transitive equality and broken hashing. Your objects would compare equal only to other objects of the same type, but can be considered “loosely equal” or “equivalent” to whatever others you like.

1 Like

At least in CPython the hash is stored as well and the implementation will compare the hashes before calling __eq__ so even if they hit the same bucket if the hashes are different they will be considered different:

However the full hash is not stored and only a fixed size integer derived from that hash is used so it is possible that __hash__ returns different integers but the implementation stores the same value for them. I don’t think any of this is “guaranteed behaviour” though.

Thank you all for your valuable input - I think I gained a deeper understanding thanks to your answers. I’ll also keep a dedicated equal function in mind.

However since I want to keep the equality implementation for different reasons (for now) I’ll set __hash__ = None .

Thank you all for your help!

Yeah, there’s a lot that is very much implementation-defined. For another example, the way that a tuple calculates its hash could change at any time, so the “include the class in the hash” trick might in a future version happen to not actually make the hash any different, or only different in the high bits, or something.

This is why I try to avoid using any sort of empirical evidence for arguments about hashing, instead preferring to stick to the actual definitions. Much easier to reason about hashes than to get reliable experimental data.

That’s fine. The nontransitive equality may cause some confusion, but it’s not going to cause bizarre behaviour if the objects aren’t hashable. I think this is a good solution.