How does CPython calculate the hashes of types?

I’m trying to trace through the CPython source code to understand how the hashes of Python types get generated. For example:

Python 3.11.6 (main, Nov 27 2023, 12:13:52) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> hash(object)
-9223372036582494240
>>> hash(list)
-9223372036582496135

I think these are calculated by PyType_Type.tp_hash, but I can’t seem to find the implementation anywhere. Am I on the right track or is there something else happening?

I think type.__hash__ is simply inherited from object.__hash__, so the hash is derived from the object's id (its memory address in CPython).

>>> hash(object)
270842856
>>> object.__hash__(object)
270842856
>>> hash(list)
270840586
>>> object.__hash__(list)
270840586

And here’s a clearer proof:

>>> type.__hash__
<slot wrapper '__hash__' of 'object' objects>
>>> object.__hash__
<slot wrapper '__hash__' of 'object' objects>
>>> type.__hash__ is object.__hash__
True

The relevant C code has moved around a bit recently. In the main branch, see cpython/Python/pyhash.c (python/cpython at commit fb0cf7d1408c904e40142a74cd7a53eb52a8e568),

which eventually lands in cpython/Include/internal/pycore_pyhash.h (same commit).
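
For anyone who wants to check this from Python rather than C, here is a rough sketch of what the pointer hash does in the CPython versions I have looked at: it rotates the object's address right by 4 bits and reinterprets the result as a signed integer. The helper name pointer_hash is made up for illustration, and the whole thing is a CPython implementation detail that could change. It relies on id() returning the address, which is only true on CPython.

```python
import struct

# Width of a C pointer on this build (64 on most current platforms).
POINTER_BITS = struct.calcsize("P") * 8


def pointer_hash(obj: object) -> int:
    """Hypothetical helper: recompute the default object hash from id(obj).

    Assumes CPython, where id() is the object's memory address.
    """
    address = id(obj)
    mask = (1 << POINTER_BITS) - 1
    # Rotate right by 4 bits. The low bits of an address are mostly zero
    # because of alignment, so they are moved to the top of the word.
    rotated = ((address >> 4) | (address << (POINTER_BITS - 4))) & mask
    # Reinterpret the unsigned word as a signed integer, like the C cast.
    if rotated >= 1 << (POINTER_BITS - 1):
        rotated -= 1 << POINTER_BITS
    # -1 is reserved as the error return of hash functions, so it becomes -2.
    return -2 if rotated == -1 else rotated


# Compare against the built-in hash for a few types.
matches = all(pointer_hash(t) == hash(t) for t in (object, list, type))
print(matches)
```

If that holds on your build, it would also explain the huge negative values in the first post: an address whose low four bits are 8 puts a 1 in the sign bit after the rotation.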

And for completeness (sorry, should have included this in the last message), see also:

which is where the tp_hash slot for the object type is specified.

Thanks! NumPy is definitely doing something fishy then: see numpy/numpy issue #26446, "BUG: Some DTypes have identical hashes but distinct ids".

Interesting. I’m not quite seeing the connection with types, though. Despite the name, dtypes aren’t (Python) types, right?

>>> isinstance(int, type)
True
>>> isinstance(np.dtype("uint64"), type)
False

Though I note that dtype instances sometimes compare equal to types, which makes things a bit messy (it breaks transitivity of equality, as well as the rule that objects that compare equal should have equal hashes):

>>> np.float64 == np.dtype("float64") == float
True
>>> np.float64 == float
False
>>> hash(np.float64) == hash(np.dtype("float64"))
False

Oh, that is definitely a broken implementation of __eq__ and __hash__ in NumPy, and will cause issues if someone tries to use dtypes in a hash table.
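
To show the kind of issue I mean without involving NumPy at all, here is a minimal sketch. LooseKey is a hypothetical class invented for this example: it compares equal to a plain int but returns a different hash, which is the same sort of mismatch between __eq__ and __hash__.

```python
class LooseKey:
    """Hypothetical key type: equal to an int, but with a different hash."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        if isinstance(other, LooseKey):
            return self.value == other.value
        if isinstance(other, int):
            return self.value == other
        return NotImplemented

    def __hash__(self) -> int:
        # Deliberately inconsistent with __eq__: hash(7) is 7,
        # but LooseKey(7) hashes to 1007.
        return self.value + 1000


table = {7: "seven"}
key = LooseKey(7)

equal_to_int = key == 7       # True: __eq__ says they are the same
found_in_dict = key in table  # False: the dict checks the hash before __eq__

print(equal_to_int, found_in_dict)
```

So two objects that are "equal" can still miss each other in a dict or set, because the lookup never gets as far as calling __eq__ when the hashes differ.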