I think the in operator in Python would be better if it did not force the result to bool

The in operator in Python currently forces the result of __contains__ to be of type bool:

> annotation.object[0].bndbox.__contains__(("xmin","ymin","xmax","ymax"))
[287,428,351,662]
> ("xmin","ymin","xmax","ymax") in annotation.object[0].bndbox
True

Without that coercion, I could fetch the properties of an object in bulk like this, which is cool, while in would keep its original meaning: the result is truthy if any values are found and falsy if it is empty.

That’s not a containment check any more; it’s really a set intersection. Would you consider using the & operator instead?
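As a rough sketch of that suggestion: a hypothetical BndBox class (the class name and its fields are assumptions for illustration, not the original poster's code) could keep in as a plain membership test and expose the bulk lookup through &:

```python
class BndBox:
    """Hypothetical stand-in for the bounding-box object in the question."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __contains__(self, name):
        # Plain containment: is this single field present?
        return name in self._fields

    def __and__(self, names):
        # "Intersection": values of the requested fields that exist.
        return [self._fields[n] for n in names if n in self._fields]


box = BndBox(xmin=287, ymin=428, xmax=351, ymax=662)
print("xmin" in box)                           # True
print(box & ("xmin", "ymin", "xmax", "ymax"))  # [287, 428, 351, 662]
```

This way in keeps answering a yes/no question, and the operator that returns data is one that Python never coerces.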


Indeed. I suppose my only option is to use a regular method.

This is something that has been awkward for sympy which has symbolic sets:

In [7]: from sympy import *

In [8]: x = symbols('x')

In [9]: Contains(2, Integers)
Out[9]: True

In [10]: Contains(x, Integers)
Out[10]: x ∈ ℤ

In [11]: 2 in Integers
Out[11]: True

In [12]: x in Integers
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 x in Integers

File ~/current/sympy/sympy.git/sympy/sets/sets.py:795, in Set.__contains__(self, other)
    791 b = tfn[c]
    792 if b is None:
    793     # x in y must evaluate to T or F; to entertain a None
    794     # result with Set use y.contains(x)
--> 795     raise TypeError('did not evaluate to a bool: %r' % c)
    796 return b

TypeError: did not evaluate to a bool: None

It isn’t possible to overload __contains__ so that in returns a non-bool, because the result will always be coerced to a bool:

class Set:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __lt__(self, other):
        return StrictSubset(self, other)
    def __contains__(self, element):
        return Contains(element, self)

class StrictSubset:
    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs
    def __repr__(self):
        return f'{self.lhs} ⊂ {self.rhs}'

class Contains:
    def __init__(self, element, set_):
        self.element = element
        self.set_ = set_
    def __repr__(self):
        return f'{self.element} ∈ {self.set_}'

Then we have:

In [24]: A = Set('A')

In [25]: B = Set('B')

In [26]: A < B
Out[26]: A ⊂ B

In [27]: Contains(A, B)
Out[27]: A ∈ B

In [28]: A in B
Out[28]: True

In [29]: B.__contains__(A)
Out[29]: A ∈ B

In [30]: bool(B.__contains__(A))
Out[30]: True

Returning True here is not correct, so sympy chooses to raise an error if __contains__ would otherwise return anything other than True or False.
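A minimal sketch of that workaround, using a hypothetical Evens class rather than sympy's actual implementation: a separate contains() method is allowed to answer "unknown", and __contains__ refuses to let in turn that into a bool.

```python
class Evens:
    """Hypothetical set of even integers with a three-valued membership test."""

    def contains(self, value):
        # True/False when the answer is decidable, None when it is unknown
        # (for example, when value stands for a symbol).
        if isinstance(value, int):
            return value % 2 == 0
        return None

    def __contains__(self, value):
        result = self.contains(value)
        if result is None:
            # Raise instead of letting `in` coerce "unknown" to False.
            raise TypeError(f"did not evaluate to a bool: {result!r}")
        return result


evens = Evens()
print(4 in evens)           # True
print(evens.contains("x"))  # None
try:
    "x" in evens
except TypeError as exc:
    print(exc)              # did not evaluate to a bool: None
```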

For this use case it would be better if __contains__ could return a non-bool and Python respected that instead of coercing to bool, just as it does for other operators such as < and &. My guess is that __contains__ is treated differently because of syntactic constructs like in and not in, but I’m not sure. It is also possible that this case was simply overlooked because the likes of numpy did not need it for array operations.

Changing Python so that the interpreter does not coerce would not be backwards compatible: although the vast majority of __contains__ methods probably already return bool, there is almost certainly some code somewhere that relies on the interpreter to do this coercion. I think the change would be worth it anyway, and I have been meaning to propose it for some time. I haven’t gotten round to it because I haven’t found the time to prepare a patch, and I thought it better to do that first than to propose an idea I don’t yet have time to implement.


The data model description says that __contains__ should return true or false (lowercase), which usually means anything truthy or falsy, right? So a custom __contains__ does not need to return a bool; it is the in operator that coerces the truthy/falsy value to a bool.
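A small sketch of that distinction, with a hypothetical Tagged class: calling __contains__ directly gives back whatever the method returns, while in (and operator.contains) hand back a bool.

```python
import operator


class Tagged:
    """Hypothetical container whose __contains__ returns a non-bool."""

    def __contains__(self, item):
        # A list: truthy when non-empty, falsy when empty.
        return [item] if item else []


t = Tagged()
direct = t.__contains__("a")              # raw return value
via_in = "a" in t                         # coerced by the in operator
via_operator = operator.contains(t, "a")  # coerced as well

print(direct)   # ['a']
print(via_in)   # True
print("" in t)  # False
```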

I’m not used to sympy so your examples confuse me a little, and I don’t see exactly how they relate to in coercing the output of __contains__ to bool.

That is correct. A __contains__ method does not need to return a bool. However, it would be a backward compatibility violation to change the promise of the in operator. For example:

import random

class Container:
    # This contains every integer that is 2 or 3 above
    # a multiple of four
    def __contains__(self, item):
        return item & 2

numbers = random.sample(range(100), 50)
count = sum(x in Container() for x in numbers)

Currently, this will, as the names suggest, count how many of the numbers are in that container, even though the container returns other truthy/falsy values rather than True/False. Both sides of this are completely valid under the current definition of the in operator.

And this is a documented feature. See, for instance, "6. Expressions" in the Python 3.12.1 documentation:

"""
For user-defined classes which define the __contains__() method, x in y returns True if y.__contains__(x) returns a true value, and False otherwise.
"""

So this is a backward-incompatible change and a language definition change. Obviously that’s not an instant fail, since backward-incompatible changes do happen, but it’s going to have consequences in unexpected places. This would be a fairly subtle bug if it ever shows up (since the vast majority of code won’t notice the difference), making it hard to debug.
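To make the subtlety concrete, here is a sketch with a fixed list in place of the random sample (the list and the variable names are chosen for illustration). The second sum calls __contains__ directly to imitate what the same expression would add up if in stopped coercing:

```python
class Container:
    # Contains every integer that is 2 or 3 above a multiple of four.
    def __contains__(self, item):
        return item & 2


numbers = [2, 3, 4, 6]
container = Container()

# Today: every `in` result is a bool, so sum() counts the members.
count_today = sum(x in container for x in numbers)

# Imitation of the uncoerced behaviour: the raw return values are summed.
count_uncoerced = sum(container.__contains__(x) for x in numbers)

print(count_today)      # 3
print(count_uncoerced)  # 6
```

No exception is raised in either case; the total just changes, which is what would make such a bug hard to track down.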


You can already get properties in bulk, using operator.itemgetter or operator.attrgetter.

>>> from operator import itemgetter, attrgetter
>>> itemgetter('a', 'b')({'a': 1, 'b': 2, 'c': 3})
(1, 2)
>>> attrgetter('real', 'imag')(5+3j)
(5.0, 3.0)

Trying to overload what __contains__ does based on whether it is called directly or invoked via the in operator seems like it would be confusing.
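Applied to the bounding-box case from the first post, that might look like the sketch below. The SimpleNamespace object is only a hypothetical stand-in for annotation.object[0].bndbox, whose real type is not shown in the thread.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical stand-in for annotation.object[0].bndbox.
bndbox = SimpleNamespace(xmin=287, ymin=428, xmax=351, ymax=662)

# Build the getter once, then reuse it for every box.
get_corners = attrgetter("xmin", "ymin", "xmax", "ymax")

print(get_corners(bndbox))      # (287, 428, 351, 662)
print(hasattr(bndbox, "xmin"))  # True
```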
