Since I was referred here from Float contained in range - #15 by jeanas:
The use case is polymorphism, where we don’t necessarily know that the left-hand side is an integral `float`, or that the right-hand side is a `range` instance, or both. It also affects things that aren’t `float`s.
There are obvious specific objections to the following example off the top of my head, but the point is that the situation could arise through more complex design considerations.
```python
from collections.abc import Container

# API:
class Example:
    def __init__(self, value: int):
        self._value = value

    def use(self):
        if _internal_logic():  # temporarily pretend the value isn't found anywhere
            self._value = None

    def value_in(self, candidates: Container[int]):
        # surely the calling code will use a `set` for performance, right?
        # but the logic doesn't require that, so we specify an opaque interface.
        return self._value in candidates

# Client:
e = Example(1)
e.use()
# surely this will be efficient, right? `range` has a fast path,
# and we set the initial value to an integer, and MyPy didn't report
# any problem in OUR code.
e.value_in(range(big_number))
```
If instances of a type can ever sensibly compare equal to an integer, then the type should also be expected to supply a conversion to integer - in the same way that it should be expected to hash to the same value as the integer it’s equal to (so that dicts and sets can work properly). We’re already okay with weird gotchas for user-defined types that define `__eq__` and `__hash__` inconsistently; conversions to built-in types seem to me like the same kind of thing.
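As a concrete sketch of that contract (the class here is hypothetical, not from the thread), a well-behaved user-defined type that compares equal to integers would keep `__eq__`, `__hash__` and the conversion consistent:

```python
class Count:
    """Hypothetical type whose instances sensibly compare equal to integers."""

    def __init__(self, n: int):
        self._n = n

    def __eq__(self, other):
        return self._n == other

    def __hash__(self):
        # Must agree with __eq__, so dict/set lookups behave.
        return hash(self._n)

    def __int__(self):
        # The conversion that the equality behaviour implies should exist.
        return self._n
```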
Then we only need to worry about things that are convertible to integer even though they don’t compare equal. There’s only one built-in case for that - `str` - and it otherwise seems reasonable to expect the relationship to work in both directions for user-defined types. (The purpose of `str` being convertible to `int` is to prescribe a conversion from textual input; user-defined types can’t realistically lay claim to such an important piece of functionality.) But even the `str` case can be caught by checking that the converted result is “equal to” the original (which is necessary to handle non-integer `float`s anyway).
So we could have the C equivalent of:
```python
class range:
    def __contains__(self, ob):
        try:
            iob = int(ob)
        except TypeError:
            return False
        if iob != ob:
            return False  # e.g. for non-integer floats
        return self._contains_long(iob)
```
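Under that sketch, the behaviour I’d expect (illustrative only):

```python
3.0 in range(10)       # True:  int(3.0) == 3.0, and 3 is in the range
0.5 in range(10)       # False: int(0.5) == 0, but 0 != 0.5
"3" in range(10)       # False: int("3") == 3, but 3 != "3"
object() in range(10)  # False: int(object()) raises TypeError
```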
With this approach, there is never a “slow path”, and it could be tweaked to ensure the “fast path” for `int` and `bool` doesn’t add overhead. This is only breaking for specific pathological user-defined types, whose semantics I doubt can reasonably be justified.
(The relevant source has moved further down the file since; it is now at line 482.)
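One way that tweak could look (again just a sketch; `_contains_long` stands in for the C-level helper as above):

```python
class range:
    def __contains__(self, ob):
        # Exact-type fast path: plain ints and bools avoid the
        # conversion machinery entirely.
        if type(ob) is int or type(ob) is bool:
            return self._contains_long(ob)
        try:
            iob = int(ob)
        except TypeError:
            return False
        if iob != ob:
            return False
        return self._contains_long(iob)
```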
Another consequence of this implementation is that user-defined subclasses of `int` take the slow path. This violates my expectations: if I’m not supposed to rely on `myint(x) == int(myint(x))`, then I shouldn’t expect to find it in a `range` at all (and I am also seriously violating the Liskov Substitution Principle); and if I’m expecting `myint(x) in range(y)` to make any sense at all, then the only sensible semantic (at least to me) is “the same result as `int(myint(x)) in range(y)`”.
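To make “pathological” concrete, here’s a hypothetical subclass that breaks that invariant, and what the current slow path does with it:

```python
class myint(int):
    """Deliberately breaks myint(x) == int(myint(x))."""

    def __eq__(self, other):
        return False  # never equal to anything, not even itself

    __hash__ = int.__hash__

# The subclass misses the exact-int fast path, so membership falls back
# to iterating the range and comparing with __eq__:
print(myint(3) in range(10))       # False
print(int(myint(3)) in range(10))  # True
```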