In some cases, like hash(-1), the hash function returns -2 instead of the correct result -1, because in CPython a return value of -1 means an exception occurred. This problem has already been mentioned in Issue 127814, but that issue was closed for compatibility reasons. I have now fixed it in my local environment by passing the hash result through a pointer parameter instead of the return value, and it runs perfectly. Can I push my code to the CPython repository? Should I edit the relevant notes and documentation? Which documentation should I edit?
Wouldn't this still introduce backward incompatibility, as stated in the linked issue? (I'm assuming you're returning a pointer to be able to have hash(-1) == -1.)
Why do you care what hash(-1) returns?
The "correct result" according to the documentation is hash(-1) == -2.
- If x = m / n is a nonnegative rational number and n is not divisible by P, define hash(x) as m * invmod(n, P) % P, where invmod(n, P) gives the inverse of n modulo P.
…
- If x = m / n is a negative rational number, define hash(x) as -hash(-x). If the resulting hash is -1, replace it with -2.
Built-in Types - Python 3.13.1 documentation
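For concreteness, here is a rough Python sketch of the rule quoted above; hash_rational and P are illustrative names only, and the real implementation lives in C:

import sys

P = sys.hash_info.modulus   # the prime 2**61 - 1 on 64-bit builds

def hash_rational(m, n):
    # Sketch of the documented rule for hash(m / n), assuming n > 0 and n % P != 0.
    # invmod(n, P): since P is prime, pow(n, -1, P) computes the inverse of n modulo P.
    result = (abs(m) % P) * pow(n, -1, P) % P
    if m < 0:
        result = -result        # negative rationals: -hash(-x)
    if result == -1:
        result = -2             # the substitution under discussion
    return result

For example, hash_rational(-1, 1) follows the same path as hash(-1) and ends up at -2.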
What hash actually does should be regarded as an implementation detail in any case. What inevitably-brittle code do you wish to write, that uses hash for your own purposes, that for some reason mustn't use a simple wrapper function that checks if x < 0 and returns whatever you want?
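For the record, a minimal sketch of the kind of wrapper meant here (my_hash is a made-up name, and restricting it to ints is just one possible policy):

def my_hash(x):
    # Hypothetical wrapper: report a negative int's "raw" hash, skipping the -1 -> -2 substitution.
    if isinstance(x, int) and x < 0:
        return -hash(-x)    # hash(-x) is nonnegative for ints, so this can legitimately be -1
    return hash(x)

With this, my_hash(-1) == -1 without touching the interpreter.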
How did you "fix" that?
It doesn't make any sense to discuss unknown code. "Talk is cheap. Show me the code."
What will hash(-1) be in your version?
@JamesParrott @skirpichev Thanks for your replies; let me explain my thoughts.
What is the problem, actually?
Not only -1: any object whose hash value works out to -1 gets -2 returned by the hash function instead. hash(-1) is just an example.
Why do I care about it?
hash(-1) == hash(-2) is an anomaly. Python uses -1 as a function's return value when an error occurs, but sometimes the hash value of an object is -1, so the object's hash function has to return -2 in order for the original error mechanism to work. Although this "solution" has been documented and -2 has been declared the "correct result", this issue still leads to some problems and confusion, so I want to fix it.
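To make the substitution concrete, here is a tiny illustration with a toy class (WeirdHash is just a made-up name); the same rewrite is applied to any __hash__ that tries to return -1:

>>> class WeirdHash:
...     def __hash__(self):
...         return -1
...
>>> hash(WeirdHash())   # CPython reserves -1 for "an error occurred"
-2
>>> hash(-1), hash(-2)
(-2, -2)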
What's my solution?
My solution is to use a pointer parameter to pass the real hash value and use the return value only to report errors.
For example, this is the definition of long_hash, which is used to calculate the hash value of a long object:
static Py_hash_t
long_hash(PyObject *obj)
{
    PyLongObject *v = (PyLongObject *)obj;
    Py_uhash_t x;
    Py_ssize_t i;
    int sign;

    if (_PyLong_IsCompact(v)) {
        x = (Py_uhash_t)_PyLong_CompactValue(v);
        if (x == (Py_uhash_t)-1) {
            x = (Py_uhash_t)-2;
        }
        return x;
    }
    i = _PyLong_DigitCount(v);
    sign = _PyLong_NonCompactSign(v);
    x = 0;
    while (--i >= 0) {
        x = ((x << PyLong_SHIFT) & _PyHASH_MODULUS) |
            (x >> (_PyHASH_BITS - PyLong_SHIFT));
        x += v->long_value.ob_digit[i];
        if (x >= _PyHASH_MODULUS)
            x -= _PyHASH_MODULUS;
    }
    x = x * sign;
    if (x == (Py_uhash_t)-1)
        x = (Py_uhash_t)-2;
    return (Py_hash_t)x;
}
Notice that when the real hash value x is -1, it will be changed into -2. I changed this code to something like this:
static int
long_hash(PyObject *obj, Py_hash_t *result)
{
    PyLongObject *v = (PyLongObject *)obj;
    Py_uhash_t x;
    Py_ssize_t i;
    int sign;

    if (_PyLong_IsCompact(v)) {
        x = (Py_uhash_t)_PyLong_CompactValue(v);
        *result = x;
        return 0;
    }
    i = _PyLong_DigitCount(v);
    sign = _PyLong_NonCompactSign(v);
    x = 0;
    while (--i >= 0) {
        x = ((x << PyLong_SHIFT) & _PyHASH_MODULUS) |
            (x >> (_PyHASH_BITS - PyLong_SHIFT));
        x += v->long_value.ob_digit[i];
        if (x >= _PyHASH_MODULUS)
            x -= _PyHASH_MODULUS;
    }
    x = x * sign;
    *result = x;
    return 0;
}
It will pass the result through the pointer parameter instead of the return value.
And the function which receives its result is edited like this:
static PyObject *
builtin_hash(PyObject *module, PyObject *obj)
{
    Py_hash_t x;

    if (PyObject_Hash(obj, &x) == -1)
        return NULL;
    return PyLong_FromSsize_t(x);
}
So hash will return the real hash value instead of a tampered one.
After this change, what will hash return?
It won't return "whatever I want"; it will simply return the object's real hash value as calculated by Python's hash algorithm. I just pass that result another way, without any modification. For example, hash(-1) will return -1, which is the real value calculated by the hash algorithm, whereas in the original code it returns -2, which is modified to accommodate the error-handling mechanism.
If you have a better solution, please share your idea!
I won't "return" a pointer; instead, I pass the value out through a pointer parameter to get better compatibility.
I'm unclear on what problems this leads to. Yes, people are surprised by it, but what actual problem does it cause? The hash() function isn't required to produce unique values, and cannot in any case, because its input space is far larger than its output space:
>>> hash(1)
1
>>> hash(2**61)
1
FWIW, hash(-1) was how I learned this, because I had written code which assumed the outputs were unique, and someone showed me how it failed on -1.
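For illustration (this is a made-up example, not the actual code in question), a naive deduplication that trusts hash() to be injective breaks on exactly this pair:

def dedupe_by_hash(items):
    # Buggy: assumes distinct objects always have distinct hash() values.
    seen = set()
    result = []
    for item in items:
        h = hash(item)
        if h not in seen:        # wrongly treats any hash collision as a duplicate
            seen.add(h)
            result.append(item)
    return result

print(dedupe_by_hash([-1, -2]))     # [-1]  -- -2 is silently dropped
print(dedupe_by_hash([1, 2**61]))   # [1]   -- a different collision, same bug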
I think having an easy collision case to point at is very valuable for pedagogical purposes.
If you have a better solution, please share your idea!
If you demonstrate that your "solution" solves an actual problem, not just an artificial non-problem, I'll gladly give it more consideration. You still haven't given an example of good code that depends on the values returned by hash, or explained why those of you who care can't simply write a wrapper function. That would be a good start. Not to mention that exposing pointers can lead to all kinds of security issues. It's much easier simply not to do so, and not have to think about that risk any further.
hash is not intended to be a cryptographically secure hash function (see hashlib for those). It's pretty much just an implementation detail for set and dict.
What is the problem, actually?
Not only -1: any object whose hash value works out to -1 gets -2 returned by the hash function instead. hash(-1) is just an example.
To put it bluntly, so what?
hash(-1) == hash(-2) is an anomaly.
It's merely a curiosity, with educational value, as Ned points out. It resulted from a design decision to avoid the error return value (another implementation detail).
this issue still leads to some problems and confusion,
What problems? Is there a bug or a performance penalty? If so, please provide an MRX or benchmarks respectively.
And confusion amongst whom? Among anyone other than those who are already confused enough to think hash must return anything meaningful?
it will just return the object's real hash value
This is meaningless. Hash values are arbitrary, not real. Hash values for hash tables (such as dict and set) just need to produce a wide enough spread across the buckets to achieve O(1) lookup on average.
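As a quick sketch of why a collision is harmless here: colliding keys still work, because the table falls back to comparing keys with == (the values below are what current CPython produces, though none of this is guaranteed):

>>> hash(-1) == hash(-2)
True
>>> d = {-1: "minus one", -2: "minus two"}   # two keys with colliding hashes
>>> d[-1], d[-2]                             # both lookups still resolve correctly
('minus one', 'minus two')
>>> len({-1, -2, 1, 2**61})                  # collisions never merge distinct elements
4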
This "problem" looks rather artificial to me. Hash collisions by themselves aren't a problem, as people have already told you.
Please describe these "some problems". The rest doesn't make sense without this, as you are suggesting a huge API break. You should have good arguments to show that this cure is really better than the disease.
Well, you break the tp_hash API and PyObject_Hash. Finally, you break the algorithm for hashing integers, and the projects that reproduce it for their own integer types (like gmpy2).
I'm not sure that there is something to be solved. The "problem" might be obvious to you, but clearly not to many people in this thread.
What you propose is a major breaking change. long_hash is an implementation of the tp_hash protocol; it should have a signature compatible with Py_hash_t (*)(PyObject *). Changing the tp_hash protocol in such a way will break any extension class that implements hashing. PyObject_Hash() is part of the public C API and ABI. Changing it will break virtually every extension and program that uses embedded Python. This will not happen. Not until we abandon the existing C API and a large part of the Python ecosystem.
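To see the contract that would change, you can poke at the existing C API from Python with ctypes (CPython-specific, and only a demonstration; real extensions call this from C):

import ctypes

# Public C API: Py_hash_t PyObject_Hash(PyObject *), with -1 meaning "error".
PyObject_Hash = ctypes.pythonapi.PyObject_Hash
PyObject_Hash.restype = ctypes.c_ssize_t      # Py_hash_t has the size of Py_ssize_t
PyObject_Hash.argtypes = [ctypes.py_object]

print(PyObject_Hash(-1))       # -2 today; every caller compiled against this ABI
print(PyObject_Hash("spam"))   # expects the hash back as the return value

Any binary built around that signature would stop working if the hash started arriving through an output parameter instead.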
There is a saner way to make the transition. We could introduce a tp_hash_new slot and a new PyObject_HashNew() API, have the CPython core maintain synchronization between tp_hash_new and tp_hash, and use tp_hash_new first with tp_hash as a fallback. This would add code to handle special cases in some already complicated places, and it would be left there for decades (tp_getattr is still here). It would still not be completely backward compatible, because third-party code can rely on details of the current behavior. All this hassle to solve a minor problem that may not exist, and with no guarantee of a solution.
Personally, if I were doing it from scratch, I would make zero the special value, not -1, because many formulas that mix hashes degrade for zero (h1 + h2, h1 ^ h2, (h << n) | (h >> m), etc.). It would also mean always using an unsigned type for the hash. But what's done is done.
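A small illustration of that degradation, using two arbitrary example values (the constant is just a stand-in for "some other hash"):

>>> h1, h2 = 0, 0x9E3779B97F4A7C15   # pretend these are two hash values, one of them zero
>>> h1 + h2 == h2                    # adding a zero hash contributes nothing
True
>>> h1 ^ h2 == h2                    # XOR with zero is the identity
True
>>> (h1 << 7) | (h1 >> 57)           # rotating a zero hash still yields zero
0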
I completely agree with the rest of the post.
Question from a newbie: isn't zero a more common hash value? For example, it's the hash of False.
About the unsigned type: negative integers currently have a negative hash. If they had a positive hash, wouldn't their hashes collide with those of positive integers? I think it could also impose a restriction on hash algorithms.
Returning -hash(-x+1) for negative x seems like a much less severe change.
My question above is still unanswered:
No reason has been given to change anything, so there is no need to discuss what the changes would be.
And hash(x) == 0 for any x such that x == 0. It doesn't matter to this whether type(x) is int, bool, float, complex, fractions.Fraction, or decimal.Decimal. If x == y, then hash(x) == hash(y) regardless of the numeric types involved. This isn't obvious, or "free". A lot of work was done to make it true.
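A quick check of that cross-type coordination in current CPython (the specific hash values aren't guaranteed, but the equalities are):

>>> from fractions import Fraction
>>> from decimal import Decimal
>>> hash(1) == hash(True) == hash(1.0) == hash(1 + 0j) == hash(Fraction(1)) == hash(Decimal(1))
True
>>> hash(0.5) == hash(Fraction(1, 2)) == hash(Decimal("0.5"))
True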
True! They will not. But I've never seen code that relies on that. And I hope I never will: hash values are mostly accidents of implementation, and not guaranteed across implementations, or releases, or in some cases not even across two runs of the same program using a single release (think "hash randomization" for strings).
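For example, string hashes really can differ between two runs of the same interpreter. A sketch; it forces randomization explicitly, since PYTHONHASHSEED may be pinned in your environment:

import os, subprocess, sys

env = dict(os.environ, PYTHONHASHSEED="random")
cmd = [sys.executable, "-c", "print(hash('spam'))"]
a = subprocess.run(cmd, capture_output=True, text=True, env=env).stdout.strip()
b = subprocess.run(cmd, capture_output=True, text=True, env=env).stdout.strip()
print(a, b, a != b)   # the two hashes differ, with overwhelming probability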
Oh, I'm not scared about the randomness of hash values. My doubt was about uniformity, but I was not taking into consideration that int is unbounded in Python, so you have nearly infinitely many values you can return from a hash function, even if you limit it to "only" positive integers. Am I right?

Am I right?
No. Hash function return values need to fit in 64 bits (32 bits on 32-bit systems), since they might be passed through C code as non-PyObjects. But you are correct that Python ints are unbounded, so some int values have to share hash values.
But uniformity is not broken by a single value being shifted around. O(1) "mistakes" don't matter relative to O(2^32) or O(2^64).

My doubt was about uniformity, but I was not taking into consideration that int is unbounded in Python, so you have nearly infinitely many values you can return from a hash function, even if you limit it to "only" positive integers. Am I right?
@MegaIng is right. While nothing about this is guaranteed across releases, for an int i the hash value is actually abs(i) % sys.hash_info.modulus, with the sign of i attached.
>>> import sys
>>> sys.hash_info.modulus
2305843009213693951
>>> _ == 2**61 - 1
True
>>> hash(sys.hash_info.modulus)
0
>>> hash(-sys.hash_info.modulus) # so there are negative ints with 0 hash
0
>>> hash(-sys.hash_info.modulus-1)
-2
>>> hash(-sys.hash_info.modulus-2)
-2
The modulus is both a prime and a solid string of 1 bits. The latter allows fast computation of the mod without using division (akin to "casting out 9s" in decimal via addition). "Prime" is valuable for more than one reason. The subtle one is that it allows computing a hash for rational values that's the same regardless of the base their numerator and denominator are expressed in, and without dividing by their gcd first.
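As a rough sketch of the "no division" trick (mersenne_mod is an illustrative name; CPython does the equivalent folding in C, as in the shift-and-mask loop quoted earlier):

import sys

M = sys.hash_info.modulus    # 2**61 - 1 on 64-bit builds: a Mersenne prime
BITS = M.bit_length()

def mersenne_mod(x):
    # Reduce a nonnegative int modulo M using only shifts, masks and adds.
    # Folding the high bits onto the low bits works because 2**BITS ≡ 1 (mod M),
    # just as "casting out 9s" works because 10 ≡ 1 (mod 9).
    while x > M:
        x = (x & M) + (x >> BITS)
    return 0 if x == M else x

print(mersenne_mod(2**61) == hash(2**61) == 1)     # True (on a 64-bit build)
print(mersenne_mod(123456789) == 123456789 % M)    # True

And the "prime" part is what the next snippet demonstrates: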
>>> import fractions, decimal
>>> hash(fractions.Fraction(15, 1000))
1694794611772065054
>>> hash(decimal.Decimal("0.015"))
1694794611772065054
>>> fractions.Fraction(15, 1000)
Fraction(3, 200)
>>> m = sys.hash_info.modulus
>>> 3 * pow(200, -1, m) % m
1694794611772065054
>>> 15 * pow(1000, -1, m) % m
1694794611772065054
This breaks down if the denominator is a multiple of the modulus, because there is no multiplicative inverse then. That's a special case.
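That special case falls back to a fixed sentinel; if I read the docs right, it is sys.hash_info.inf, negated for negative values:

>>> import sys, fractions
>>> m = sys.hash_info.modulus
>>> hash(fractions.Fraction(1, m))   # denominator divisible by the modulus
314159
>>> sys.hash_info.inf                # the documented fallback value
314159
>>> hash(fractions.Fraction(-1, m))
-314159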