I don’t mean to spam the forum, but I was asked to post about it here again.
[NOTE: I sent this exact message below to the python-dev mailing list]
I wrote a doc stating my case here:
The main motivation for the change is to let users get a predictable result for a given input every time they run their program, particularly for programs doing pure compute in domains such as operations research or compilation. Stable reproduction is important for debugging. Notebooks with statistical analysis are a similar case: you might want other people to run your notebook and get the same result you did.
The reason the hash non-determinism of `None` matters in practice is that it can infect commonly used mapping key types, such as frozen dataclasses containing `Optional` fields.
Non-determinism emerging from other value types like `str` can be disabled by the user using `PYTHONHASHSEED`, but there’s no such protection against `None`.
All it takes is for your program to compute a set somewhere with affected keys and iterate over it, and determinism is lost.
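A minimal sketch of how one might check this on a given interpreter. It runs the same small program in several fresh processes with `PYTHONHASHSEED=0`, so `str` hashing is pinned and any remaining variation comes from elsewhere. The `Key` class, the `CHILD` program and the helper names are hypothetical, made up for illustration. Whether the outputs actually differ depends on the Python version and platform, so no expected output is shown.

```python
import os
import subprocess
import sys

# Hypothetical child program: frozen dataclass keys with an Optional field,
# collected into a set and iterated.
CHILD = r'''
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Key:
    name: str
    parent: Optional[str] = None  # None takes part in the generated __hash__

keys = {Key("a"), Key("b"), Key("c"), Key("d")}
print(hash(None), [k.name for k in keys])
'''


def run_once(seed: str = "0") -> str:
    """Run CHILD in a fresh interpreter with str hashing pinned by PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def distinct_outputs(runs: int = 5) -> set:
    """Collect the distinct outputs seen across several fresh runs."""
    return {run_once() for _ in range(runs)}


if __name__ == "__main__":
    # More than one line printed here means iteration order (or hash(None))
    # changed between runs even though PYTHONHASHSEED was fixed.
    for line in sorted(distinct_outputs()):
        print(line)
```

If `distinct_outputs()` returns a single element on your interpreter, that interpreter is giving stable results for this particular program; more than one element means the run-to-run variation described above is present.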
The need to modify `None` itself is caused by two factors:
1. `Optional` being implemented effectively as `T | None` in Python, as a strongly established practice.
2. The fact that `__hash__` is an intrinsic property of a type in Python: the hashing function cannot be externally supplied to its builtin container types. So we have to modify `None` itself, rather than write some alternative hasher that we could use if we care about deterministic behavior across runs.
This was debated at length on the forum and on Discord.
I also posted a PR for it, and it was closed, see:
I'm asking for opinions, and for the PR to be re-opened, provided there is enough support for such a change.