Why doesn’t equality comparison check for identity first?

I was curious, so I ran this little test:

(venv_pyperf_3_11_2) marco@buzz:~/sources/venv$ pyperf timeit --rigorous -s "a = object()" "a == a"
.........................................
Mean +- std dev: 20.0 ns +- 0.2 ns
(venv_pyperf_3_11_2) marco@buzz:~/sources/venv$ pyperf timeit --rigorous -s "a = object()" "a is a"
.........................................
Mean +- std dev: 14.6 ns +- 0.4 ns

So a is a is faster than a == a, even if a is a really simple object. Not a big surprise, since is checks for identity and == checks for equality.

The real question is: why doesn’t equality comparison check for identity first?

Because not all objects are equal to themselves. If you want to know if two objects are equal, you use the == operator; if you want to know whether they are identical-or-equal, there’s no easy way to spell it, but that’s two separate checks so it’s x is y or x == y. It’s worth noting that the identical-or-equal check is what’s used for object containment (eg x in [x] will always be true even if x isn’t equal to itself), so maybe it should have an operator, but most of the time == is fine.

Here’s an example of a class where instances are never equal to themselves.

class A:
    def __eq__(self, other):
        return False
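
A quick sketch of what that class does in practice (the instance name `a` is only for illustration). `==` reports False, yet containers still find the object, because list containment and list comparison try identity before equality:

```python
class A:
    def __eq__(self, other):
        return False

a = A()

print(a == a)      # False: __eq__ always says no
print(a is a)      # True: identity is unaffected
print(a in [a])    # True: containment checks identity first
print([a] == [a])  # True: element comparison also short-circuits on identity
```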

…So the problem is NaN?

And things like it.

For example, I can imagine a case for making sentinel values
not-equal-to-self.

But Chris’ post is basically saying to you: write what you mean. If you
mean equality, use ==. If you mean identity, use is. Don’t seek some
pointless optimisation of a well defined idea (== in this case).

The flip side of Aohan’s example of __eq__ returning False is that
if you have a class where a full-on value-based equality test is
expensive and you know that identity implies equality, you can
implement __eq__ as:

 def __eq__(self, other):
     return self is other or all-the-salient-internal-values-are-equal

in the class.

The point here being that this optimisation belongs in the class, which
knows its own semantics, not in the caller’s mind, who may have a
simplistic notion of those semantics.

Say what you mean at the calling end. Rely on the object itself to
implement that meaning.
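
As a minimal sketch of putting the shortcut inside the class: `Matrix` and its `rows` attribute are made up for this example, and the sketch assumes identity really does imply equality for this type.

```python
class Matrix:
    """Hypothetical class whose value comparison may be expensive."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __eq__(self, other):
        # Cheap identity check first: the same object is trivially equal.
        if self is other:
            return True
        # Let Python try the other operand for unrelated types.
        if not isinstance(other, Matrix):
            return NotImplemented
        # Fall back to the full value-based comparison.
        return self.rows == other.rows


m = Matrix([[1, 2], [3, 4]])
same_values = Matrix([[1, 2], [3, 4]])

print(m == m)            # True, via the identity shortcut
print(m == same_values)  # True, via the full comparison
print(m == 3)            # False
```

The caller still just writes `==`; whether the shortcut is valid stays a decision of the class.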

And you can, if you truly know the domain sufficiently, write:

 a is b or a == b

if you know it is (a) reliable and (b) a useful performance hack.
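
If that expression turns up often, it could be wrapped in a small helper. This is only a sketch, and `identical_or_equal` is a made-up name, not something from the standard library:

```python
def identical_or_equal(a, b):
    """Return True if a and b are the same object or compare equal."""
    return a is b or a == b


nan = float("nan")

print(identical_or_equal(nan, nan))                    # True: same object
print(identical_or_equal(float("nan"), float("nan")))  # False: two distinct NaNs
print(identical_or_equal(1, 1.0))                      # True: equal values
```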

Cheers,
Cameron Simpson cs@cskk.id.au

>>> nan = float('nan')
>>> nan == nan
False
>>> nan is nan
True

? What’s the use case? Yes, I use sentinels to check their identity, but I’m quite sure they are also equal.

I already do this. I just feel I’m reinventing the wheel.

Yes, you’re right. I know it, just forgot.

In my humble opinion, this behavior is wrong. I’m not a mathematician, but I’ll try my best.

NaN mathematically is an indeterminate form:

https://en.wikipedia.org/wiki/Indeterminate_form

For example:

0 / 0 = nan

means that nan can have any value, since

0 * nan = 0

is true for any nan. Of course, moving the zero to the other side of the equation is not permitted in mathematics, but I leave a more sophisticated explanation of why nan is indeterminate to the reader.

So, the fact that nan == nan is False is… false. Or better, it can be false, but it can also be true! You can’t know, since nans are indeterminate, so they could also have the same value.

IEEE 754 says that “The equality and inequality predicates are non-signaling.”:

https://en.wikipedia.org/wiki/NaN#Comparison_with_NaN

This, in my triple humble opinion, is incorrect, especially for a language that can raise exceptions.

Is there any chance to change the behavior of comparison with NaN so they will raise an exception instead?

PS: sorry for the way I’m posting references, but suddenly I can’t post links…

Yes, there is! A snowball’s chance in a bonfire is greater, but there is a chance! All you have to do is convince the IEEE engineers that they utterly and totally blooped, and that they should create a backward-incompatible change to an otherwise-working system that’s deployed across virtually every computer in the world.

? What’s the use case? Yes, I use sentinels to check their identity, but I’m quite sure they are also equal.

I’m thinking about making it harder to accidentally write code which
does an equality test and doesn’t notice a sentinel. My use case is
nebulous, but I feel like I’ve seen stuff in the past where code naively
traverses something with an equality test. Having sentinels
not-self-equal and also not-other-equal might be of benefit here in
terms of catching accidents.
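
A rough sketch of what such a sentinel might look like. The `_Missing` class and `MISSING` name are invented for this example, and whether this really helps catch accidents is an open question:

```python
class _Missing:
    """Hypothetical sentinel type that never compares equal to anything."""

    # Defining __eq__ would otherwise make instances unhashable.
    __hash__ = object.__hash__

    def __eq__(self, other):
        return False

    def __repr__(self):
        return "MISSING"


MISSING = _Missing()
values = [1, MISSING, 3]

# A naive equality scan never matches the sentinel...
by_equality = [v for v in values if v == MISSING]
# ...while an identity test finds it.
by_identity = [v for v in values if v is MISSING]

print(by_equality)  # []
print(by_identity)  # [MISSING]
```

Note that container operations such as `MISSING in values` would still succeed, since they check identity before equality.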

I already do this. I just feel I’m reinventing the wheel.

You are, but some wheels are so small that they’re not worth
generalising. Also, for simple classes (eg one with just 2 fields or
something) the identity test brings a low speed benefit and you might be
better with simpler code rather than as-fast-as-possible code.

Cheers,
Cameron Simpson cs@cskk.id.au

[quote="Serhiy Storchaka, post:6, topic:24656, full:true, username:storchaka"]

>>> nan = float('nan')
>>> nan == nan
False
>>> nan is nan
True

[/quote]

Yes, you’re right. I know it, just forgot.
In my humble opinion, this behavior is wrong. I’m not a mathematician, but
I’ll try my best.

It’s an expediency thing. The idea is to make tests return false to
reduce the likelihood of mistaking a NaN for something else.

[…]

IEEE 754 says that “The equality and inequality predicates are
non-signaling.”:
[…]
This, in my triple humble opinion, is incorrect, especially for a
language that can raise exceptions.

This is to allow bulk computations to run to completion, with NaNs from
errors propagating to their dependent results without aborting the
rest of the suite. Think huge matrices of values or vectorised stuff
passed to a coprocessor eg a modern graphics card.

The objective is to let it all run and sift the NaNs from the results in
whatever fashion is sensible. So you might:

 df['scaled'] = df['raw'] * 900

and get a nice dataframe series 'scaled' with NaNs in the slots where
there were NaNs in df['raw'], but sensible results for the other
slots. If this raised an exception this would be a nightmare.
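
The same run-to-completion-then-sift pattern can be sketched in plain Python, without pandas. The `raw` data here is invented for the example:

```python
import math

raw = [1.5, math.nan, 2.0]

# The NaN propagates through the arithmetic without raising anything.
scaled = [x * 900 for x in raw]

# Sift the NaNs out afterwards, in whatever way suits the problem.
clean = [x for x in scaled if not math.isnan(x)]

print(scaled)  # [1350.0, nan, 1800.0]
print(clean)   # [1350.0, 1800.0]
```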

The same applies across all “high level” bulk operations, particularly
if offloaded in bulk to some fast hardware. Which is very common.

Is there any chance to change the behavior of comparison with NaN so they will raise an exception instead?

I think there might be some kind of global setting, saw it mentioned in
passing somewhere. But that affects the entire Python process.

Cheers,
Cameron Simpson cs@cskk.id.au

NANs were not designed with object-oriented coding in mind. They were designed with low-level languages in mind where there is no concept of “object identity”.

Under those circumstances, it is highly desirable that NANs compare unequal to every value, including other NANs, even other NANs with the same bit-pattern.

(Remember that there are many different NAN bit-patterns. NANs can differ by a 51 bit payload, a 1 bit quiet vs signalling bit, and a sign bit.)
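
To illustrate, here is a sketch that builds a few NaNs with different bit-patterns using `struct`. The helper name `float_from_bits` is made up, and the sketch assumes the platform uses IEEE-754 binary64 for Python floats (which is the usual case):

```python
import math
import struct


def float_from_bits(bits):
    """Reinterpret a 64-bit integer as an IEEE-754 binary64 float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


quiet_nan = float_from_bits(0x7FF8000000000000)     # quiet bit set, empty payload
payload_nan = float_from_bits(0x7FF8000000000001)   # same, but payload of 1
negative_nan = float_from_bits(0xFFF8000000000000)  # sign bit set

for value in (quiet_nan, payload_nan, negative_nan):
    # Each one is a NaN, and none is equal to itself.
    print(math.isnan(value), value == value)  # True False

print(quiet_nan == payload_nan)  # False
```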

Otherwise we would have very bad consequences. Quoting Stephen Canon, who was on the IEEE committee that chose the behaviour:

“NaN being equal to itself would be extremely problematic. It would lead to completely nonsensical “true” statements like 0.0/0.0 == infinity - infinity or acos(1.5) == log(-1.0). You really don’t want those to be identities, much more so than you want == to be reflexive.”
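
The same point can be sketched in Python. Note that Python raises exceptions for some of the expressions in the quote (`0.0 / 0.0` raises ZeroDivisionError and `math.acos(1.5)` raises ValueError), so this example uses two other NaN-producing expressions instead:

```python
import math

a = math.inf - math.inf  # NaN from subtracting infinities
b = math.inf * 0.0       # NaN from a different invalid operation

print(math.isnan(a), math.isnan(b))  # True True
print(a == b)                        # False: unrelated NaNs are not "the same answer"
```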

The source of the quote was this Stack Overflow post a few years ago, but unfortunately it looks like somebody has deleted that specific comment, or at least I can’t find it now. (One of the bad things about Stack Overflow is that people can edit other people’s comments.)

When IEEE-754 maths was moved into object-oriented languages like Python, folks decided that the way to honour the spirit of the standard was for NANs to compare unequal to all other floats, including:

  • other NANs;
  • even if they have the same bit-pattern;
  • even if they are the same object.

Alas, that implies that the interpreter cannot assume that two object references pointing to the same object are equal.

Yes for Decimal. For floats: no, but yes, but mostly no.

Signalling NANs will signal (raise an exception) on comparisons:

>>> from decimal import Decimal
>>> s = Decimal('snan')
>>> s == 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]

The Decimal module has a flag that treats all NANs as signalling NANs. That is fully supported.
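
For comparison, here is a small sketch of how quiet and signalling Decimal NANs behave when the InvalidOperation trap is enabled. It explicitly sets the trap inside a local context rather than relying on the defaults, and it does not attempt to show the flag mentioned above:

```python
from decimal import Decimal, InvalidOperation, localcontext

with localcontext() as ctx:
    ctx.traps[InvalidOperation] = True

    quiet = Decimal("nan")
    signalling = Decimal("snan")

    # Equality with a quiet NAN quietly returns False.
    quiet_eq = quiet == quiet

    # Equality with a signalling NAN raises.
    try:
        signalling == 1
        snan_eq_raised = False
    except InvalidOperation:
        snan_eq_raised = True

    # Ordering comparisons raise even for a quiet NAN.
    try:
        quiet < 1
        qnan_lt_raised = False
    except InvalidOperation:
        qnan_lt_raised = True

print(quiet_eq, snan_eq_raised, qnan_lt_raised)  # False True True
```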

In theory, we can do the same with binary floats. Most (all?) CPUs that implement IEEE-754 allow you to set the floating point flags to treat all NANs as signalling.

Unfortunately this is platform specific, and Python’s support for float exception control was fragile and unmaintained and is now removed.