After noticing that the
is not operator has inadvertently been used thousands of times over the decades throughout our code base at work in lieu of
!= comparisons to string literals and numbers… Could we address this wart in the language itself or within the CPython VM?
I call it a wart because the code reads and writes wonderfully.
if value is 'thing': just sounds logical. That it does not do what a reader blissfully unaware of identity vs equality might expect is unfortunate, because it is so easy to read and write as English without realizing there is something to think about.
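To make the gap between the two operators concrete, here is a minimal sketch. The names `literal` and `built` are made up for illustration, and the comparison goes through names rather than writing `is 'thing'` directly, since CPython 3.8 and later flag `is` against a literal with a SyntaxWarning. Whether the identity check succeeds is up to the VM, so no particular result is promised for it.

```python
# Build an equal string at runtime so it need not be the same object
# as the literal (hypothetical names, for illustration only).
literal = "thing"
built = "".join(["th", "ing"])

# Equality compares contents, so this is True on any implementation.
print(built == literal)

# Identity asks whether both names refer to one object. The answer
# depends on whether the VM happened to reuse a single string object.
print(built is literal)
```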
Object identity is an important concept. But it is not normally something someone needs to use. The common valid use cases in Python are
is None and
is not None, where it can be important to avoid triggering an object's
__ne__ methods which can (and often do) do the “wrong” thing. The other common but infrequent use case for an identity check is comparing against custom singletons, usually a module’s named instance of
object() or a dummy type or similar.
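A small sketch of both valid use cases, under stated assumptions: `AlwaysEqual`, `_MISSING`, and `lookup` are hypothetical names invented here, not taken from any real code base. The first half shows a comparison method doing the "wrong" thing where an identity check stays reliable; the second shows a module-level `object()` sentinel.

```python
class AlwaysEqual:
    """Hypothetical class whose comparison methods claim equality with anything."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False


value = AlwaysEqual()
print(value != None)      # False: __ne__ was called and gave a misleading answer
print(value is not None)  # True: identity never calls the object's methods

# A module's named instance of object(), used as a custom singleton so that
# None remains a legitimate value for callers to store or pass.
_MISSING = object()


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key], or default if given, else raise KeyError."""
    result = mapping.get(key, _MISSING)
    if result is _MISSING:
        if default is _MISSING:
            raise KeyError(key)
        return default
    return result
```

Here `lookup({"a": None}, "a")` returns `None` rather than treating the stored `None` as "missing", which is the reason to compare against the sentinel by identity.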
I don’t want a language breaking change! We can’t remove the
is operator or have it blindly start behaving like ==.
An interesting approach pointed out by a colleague is that PyPy gave up on
is being only for identity in all situations, because so much existing code failed otherwise. That code used it to compare against immutable basic types and “worked” despite itself, by virtue of CPython’s implementation detail of having singletons for widely used values. Their choice effectively normalized the CPython implementation detail practice of
is working for a known subset of comparisons. (TODO: dig into their code and see what logic they chose for this situation)
So has the ship sailed? I’m not convinced. We could alter
is to behave differently when both sides are known basic immutable types, triggering an actual equality check. This could break some code, but that should be rare and within reason for a normal feature release. What I think would be bad is ever triggering dunder method calls. I believe we’d only want to do this for our own known built-in basic immutable types, not offer it to user-defined types or C extensions. The goal would be to work around the “identity crisis” that arises whenever a VM happens to decide to use singletons some or all of the time for some set of values, as we do for bytes, str, and int.
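The idea can be modeled in pure Python as a rough sketch. This is not how CPython or PyPy implement anything; `proposed_is` and `_BASIC_IMMUTABLE_TYPES` are hypothetical names, and requiring both operands to have the same exact type is an assumption made here to keep the sketch conservative. Exact type checks, rather than `isinstance`, keep subclasses out so that no user-defined dunder method ever runs.

```python
# Assumption: the "known basic immutable types" are the three named above.
_BASIC_IMMUTABLE_TYPES = (int, str, bytes)


def proposed_is(lhs, rhs):
    """Model of the proposed semantics for `lhs is rhs` (illustrative only)."""
    # Identical objects are always "is"-equal, exactly as today.
    if lhs is rhs:
        return True

    lhs_type = type(lhs)
    # Only exact built-in types qualify. Subclasses of str/int/bytes and all
    # other types fall through to the plain identity result.
    if lhs_type is type(rhs) and any(lhs_type is t for t in _BASIC_IMMUTABLE_TYPES):
        # Call the built-in type's own comparison directly, so a user-defined
        # __eq__ can never be invoked from here.
        return lhs_type.__eq__(lhs, rhs)

    return False
```

With this model, a string built at runtime compares true against an equal literal regardless of interning, while lists, floats, and `str` subclasses keep today's pure identity behavior.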
id(LHS) == id(RHS) is effectively a slow replacement for
is. It could be used in the rare case where someone actually wants to know whether
"foo" was returned from a C API string building function and thus is a different object, or if it is the same instance of literal
"foo" because it was generated by Python code within their module (today’s CPython VM implementation detail).
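For completeness, a minimal sketch of that explicit identity check. `same_object` is a hypothetical helper name. Comparing `id()` values is only meaningful while both objects are alive, which holds here because the function's arguments keep them referenced for the duration of the call.

```python
def same_object(lhs, rhs):
    """Explicit identity test that does not use the `is` operator."""
    # Both objects stay alive while they are bound as arguments, so their
    # id() values cannot be recycled during the comparison.
    return id(lhs) == id(rhs)


a = object()
b = a          # second name for the same object
c = object()   # a distinct object

print(same_object(a, b))  # True
print(same_object(a, c))  # False
```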