Fast path for "float in range" checks

Lots of objects are equal to ints without being ints, and they should still work with range objects.

The behaviour of range in tests is fine and does not need fixing. It works exactly as it should:

  • it has a fast path for int arguments as an optimization;
  • everything else falls back on the default Sequence in test, which is only O(N).

This thread asked to introduce a fast path for floats. We explained why it would be bad to encourage the use of floats. But that doesn’t mean that it should be prohibited.

There are two schools of thought in programming models:

  • the language should be as restrictive as possible: anything not explicitly permitted should be forbidden (“bondage and domination” languages with extremely strict and restrictive typing rules).
  • the language should be as open as reasonable: anything not explicitly forbidden should be permitted.

Python is squarely in the second design school. That’s why we use it, and why it is so popular.

If you are worried about accidently using a float instead of an int for range tests, use a type-checker or a linter that will warn you of the issue.

2 Likes

Actually, I have written millions of code like for x in range..., but never if x in range... except it was an accidental typo. I think everyone here too. :slight_smile:

2 Likes

Since I was referred here from Float contained in range - #15 by jeanas :

The use case is polymorphism, where we don’t necessarily either know that the left-hand side is an integral float, or that the right-hand side is a range instance, or both.

It also affects things that aren’t floats.

There are obvious specific objections to the following example off the top of my head, but the point is that the situation could potentially arise through more complex design considerations.

# API:
class Example:
    def __init__(self, value:int):
        self._value = value
    def use(self):
        if _internal_logic(): # temporarily pretend the value isn't found anywhere
            self._value = None
    def value_in(self, candidates:Container[int]):
        # surely the calling code will use a `set` for performance, right?
        # but the logic doesn't require that, so we specify an opaque interface.
        return self._value in candidates

# Client:
e = Example(1)
e.use()

# surely this will be efficient, right? `range` has a fast path,
# and we set the initial value to an integer, and MyPy didn't report
# any problem in OUR code.
e.value_in(range(big_number))

If instances of type can ever sensibly compare equal to any integer, then the type should also be expected to supply a conversion to integer - in the same way that it should be expected to hash to the same value as the integer it’s equal to (so that dicts and sets can work properly). We’re already okay with weird gotchas for user-defined types that define __eq__ and __hash__ inconsistently; conversions to built-in types seem to me like the same kind of thing.

Then we only need to worry about things that are convertible to integer even though they don’t compare equal. There’s only one built-in case for that - str - and it otherwise seems reasonable to expect the relationship to work on both directions for user-defined types. (The purpose of str being convertible to int is to prescribe a conversion from textual input; user-defined types can’t realistically lay claim to such an important piece of functionality.) But even for the str case, it can be caught by checking that the converted result is “equal to” the original (which is necessary to handle non-integer floats anyway).

So we could have the C equivalent of:

class range:
    def __contains__(self, ob):
        try:
            iob = int(ob)
        except TypeError:
            return False
        if iob != ob:
            return False # e.g. for non-integer floats
        return self._contains_long(iob)

With this approach, there is never a “slow path”, and it could be tweaked to ensure the “fast path” for int and bool doesn’t add overhead. This is only breaking for specific pathological user-defined types, whose semantics I doubt can be reasonably justified.

(the relevant source has moved further down the file since; it is now at line 482)

Another consequence of this implementation is that user-defined subclasses of int take the slow path. This violates my expectations; if I’m not supposed to rely on myint(x) == int(myint(x)) then I shouldn’t expect to find it in a range at all (and I am also seriously violating the Liskov Substitution Principle), and if I’m expecting myint(x) in range(y) to make any sense at all, then the only sensible semantic (at least to me) is “the same result as int(myint(x)) in range(y)”.