I don’t mean to spam the forum, but I was asked to post about it here again.
[NOTE: I sent the exact message below to the python-dev mailing list]
A proposal to modify `None` so that it hashes to a constant
I wrote a doc stating my case here:
Briefly,
- The main motivation for it is to allow users to get a predictable result on a given input (for programs that are doing pure compute, in domains like operations research / compilation), any time they run their program. Having stable repro is important for debugging. Notebooks with statistical analysis are another similar case where this is needed: you might want other people to run your notebook and get the same result you did.
- The reason the hash non-determinism of `None` matters in practice is that it can infect commonly used mapping key types, such as frozen dataclasses containing `Optional[T]` fields.
- Non-determinism emerging from other value types like `str` can be disabled by the user using `PYTHONHASHSEED`, but there’s no such protection against `None`.
All it takes is for your program to compute a set somewhere with affected keys, and iterate on it - and determinism is lost.
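To make this concrete, here is a minimal sketch of how it can happen. The `Node` class and the `iteration_order` helper are hypothetical names made up for illustration, and the comment about the generated `__hash__` describes what CPython's `dataclasses` does as far as I know. On an interpreter where `hash(None)` is derived from the object's address, the order printed at the end may differ between runs even with `PYTHONHASHSEED=0`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical key type, for illustration only.
    name: str
    parent: Optional[str] = None


def iteration_order(keys):
    """Return the names in whatever order a set of the keys yields them."""
    return [k.name for k in set(keys)]


keys = [Node("a"), Node("b", "a"), Node("c")]

# The generated __hash__ hashes the tuple of fields, so hash(None) feeds
# into the hash of every Node whose parent is None.
print(hash(Node("a")) == hash(("a", None)))

# Set iteration order depends on those hashes, so it is only as stable
# across runs as hash(str) and hash(None) are.
print(iteration_order(keys))
```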
The need to modify `None` itself is caused by two factors:
- `Optional` being implemented effectively as `T | None` in Python, as a strongly established practice.
- The fact that `__hash__` is an intrinsic property of a type in Python: the hashing function cannot be externally supplied to the builtin container types. So we have to modify the type of `None` itself, rather than write some alternative hasher that we could use when we care about deterministic behavior across runs.
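As a rough illustration of the second point: since `set` and `dict` take no hasher argument, the only user-side option I can see is to wrap every key in a type that has its own `__hash__`. `StableKey` below is a hypothetical wrapper written for this post, not an existing API, and the constant `0` is an arbitrary choice.

```python
class StableKey:
    """Hypothetical wrapper that gives None a fixed, run-independent hash."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, StableKey) and self.value == other.value

    def __hash__(self):
        # Arbitrary constant for None; every other value defers to its own
        # hash, so str values still need PYTHONHASHSEED to be stable.
        return 0 if self.value is None else hash(self.value)


# Every key has to be wrapped by hand, because the container itself
# cannot be told to use a different hash function.
seen = {StableKey(None), StableKey(None), StableKey(1)}
print(len(seen))
```

This has to be repeated for every field and every container that might hold an optional value, which is why it does not scale to types like frozen dataclasses with `Optional[T]` fields.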
This was debated at length on the forum and on Discord.
I also posted a PR for it, and it was closed, see:
I’m asking for opinions, and for the PR to be re-opened, provided there is enough support for such a change to take place.