While that would be a useful feature to have in Jupyter, it would still be useful outside of Jupyter as well. Reproducibility of scientific code is important, but not all scientific code is run in Jupyter, and in any case there are other reasons for wanting reproducibility.
Another common case is reproducing bugs. You need to be able to reproduce something deterministically before you can use e.g. git bisect:
In that issue what happened was that somewhere something iterated over a set, and hash randomisation made the iteration order non-deterministic. That should be fine, because the SymPy code in question should compute the same result regardless of the iteration order of whichever set was being iterated over. It was not fine, though, because apparently, depending on that iteration order, the optimiser might or might not be triggered, exposing a bug in the optimiser’s rewrite rules. Happily I could control the non-determinism in that case with PYTHONHASHSEED, but it would otherwise have been much more painful to debug.
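To make the mechanism concrete, here is a minimal sketch (not from that issue; the strings are made up) showing how hash randomisation changes set iteration order between interpreter runs, and how PYTHONHASHSEED pins it down:

```python
# Run a child interpreter several times and compare set iteration order.
# With hash randomisation on (the default), the order can differ between
# runs; with PYTHONHASHSEED fixed, it is stable.
import os
import subprocess
import sys

code = "print(list({'spam', 'ham', 'eggs', 'bacon'}))"

def run(env):
    result = subprocess.run([sys.executable, "-c", code],
                            env=env, capture_output=True, text=True)
    return result.stdout.strip()

randomised = {run(dict(os.environ, PYTHONHASHSEED="random")) for _ in range(10)}
pinned = {run(dict(os.environ, PYTHONHASHSEED="0")) for _ in range(10)}
print(len(randomised) > 1)  # usually True: iteration order varies per run
print(len(pinned) == 1)     # True: a fixed seed gives a stable order
```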
That’s precisely the OP’s use case, and I agree it would be good to have better reproducibility for that case.
Making the hash of None constant is a step in that direction, but it’s not clear to me (a) if it’s the right step, or (b) if it’s sufficient. And in the absence of clear answers to those questions, I’d rather look at the bigger picture before diving in and changing things.
You ask if it’s “sufficient” but then the question is: sufficient for what?
There are some cases where this would make the difference between something being reproducible and not being reproducible, and for those cases it is sufficient. Obviously there are other cases where this would not be enough, but nothing can ever guarantee complete reproducibility in all cases, so that’s too high a bar to set.
I tend to see the situation with reproducibility as one where every little helps. Or, to put it the other way round, it only takes one bad apple to ruin determinism, so why make something non-deterministic if there is no particular reason to do so?
In the case of SymPy, sets are used extensively and SymPy expressions have structural hash functions. Those structural hash functions would be deterministic if it weren’t for hash randomisation, but that is at least controllable. Mostly it is sets of SymPy expressions that are used, but there are also things like sets of integers and sometimes sets like {None, False, True}. As far as I know, the only object used in sets throughout the codebase whose hash is based on id is None. What this means is that it is possible to have a large codebase where hash(None) is potentially the only source of uncontrollable non-determinism.
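As a rough illustration of what “structural hash” means here, a toy sketch (not SymPy’s actual implementation): the hash is computed from the expression’s contents rather than its identity, so it is deterministic once string hash randomisation is controlled:

```python
class Expr:
    """Toy structural-hash expression: equal structure implies equal hash."""
    def __init__(self, head, *args):
        self.head = head
        self.args = args
    def __eq__(self, other):
        return (isinstance(other, Expr)
                and self.head == other.head
                and self.args == other.args)
    def __hash__(self):
        # computed from contents, not id(); deterministic under a fixed
        # PYTHONHASHSEED, provided no argument (e.g. None) hashes by id
        return hash((self.head, self.args))

assert Expr("Add", 1, 2) == Expr("Add", 1, 2)
assert hash(Expr("Add", 1, 2)) == hash(Expr("Add", 1, 2))
```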
This is great, thank you, but perhaps there is another point to consider here.
From what I see, at least among the researchers in my own org, it is relatively well known that (assuming you want deterministic runs in your compute workload; a sketch of both points follows the list):
1. identity hashing has to be avoided (in any language)
2. random seeds need to be fixed (in any language)
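A minimal sketch of those two points (the `Node` class and names are hypothetical):

```python
import random

random.seed(0)  # (2) fix the random seed so runs repeat exactly

class Node:
    """(1) avoid identity hashing: hash by value rather than by id()."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Node) and self.name == other.name
    def __hash__(self):
        return hash(self.name)  # value-based, unlike default object.__hash__

# two distinct instances compare and hash equal, so set/dict behaviour
# does not depend on allocation addresses
assert Node("a") == Node("a") and hash(Node("a")) == hash(Node("a"))
```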
The issue with bytes/str hash randomization and the PYTHONHASHSEED fix is far less known than (1) and (2); I knew about it when I joined, and I remember at least some of those people being surprised when I told them about it.
Some of them had the misconception that it was the set implementation that was causing the non-determinism (as I imagine most people do when they see non-deterministic behavior from keys with optional fields; they won’t realize it’s None causing it).
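A sketch of how that misdiagnosis arises (made-up records):

```python
# Records keyed by (user, optional_score).  The strings hash
# deterministically under a fixed PYTHONHASHSEED, but hash(None) has
# historically been derived from the object's address, so the set's
# iteration order can still vary between runs on systems with address
# space randomisation -- and the set implementation gets the blame.
records = {("alice", None), ("bob", 42), ("carol", None)}
for user, score in records:
    print(user, score)  # order not guaranteed stable across runs
```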
At least one person there did huge refactors in which graphs (whose nodes were hashed as strings) were reconstructed in various places using dicts into which nodes were inserted in some manually dictated order, and similar complications that should never have existed.
You don’t hear from such people here; the majority of programmers and scientists aren’t going to get to the bottom of such problems and then fight the holy wars in open-source forums needed to make it less of a hazard for the next person.
I don’t know. But we’re not talking about “making something non-deterministic”, we’re talking about making it deterministic when it’s currently not guaranteed to be deterministic, even though it is in a lot of cases. The hash of None is always -9223363242512554292 on my 64-bit Windows installation of Python 3.11.0. As far as I know, the only time it’s not deterministic is on Unix systems with address space randomisation switched on. A non-trivial proportion of systems, yes. But far from all of them.
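That is easy to check empirically on a given machine; a small sketch (the outcome depends on platform, ASLR settings, and Python version):

```python
# Collect hash(None) from several fresh interpreter runs.
import subprocess
import sys

values = {subprocess.run([sys.executable, "-c", "print(hash(None))"],
                         capture_output=True, text=True).stdout.strip()
          for _ in range(5)}
print(values)  # a single element means hash(None) was stable here
```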
I really don’t care an awful lot here. There was a PR. It was rejected. A core dev needs to care enough to override the rejection. That core dev won’t be me. If this was part of a set of changes that “made deterministic reproduction of bugs significantly easier”, then maybe it would be me. I can’t give you a precise set of criteria for what would persuade me. All I can say is that changing the hash of None by itself isn’t enough (for me).