Fast path for "float in range" checks

Proposal:
Add a fast arithmetic path for __contains__ check of floats on range objects.

Why:
Currently the range() object already performs an arithmetic-based fast path for __contains__ checks of int values, but not float and integral float values. This results in very long comparisons for checks of floats.

This situation can arise in many cases when a list of floats (that may or may not be integral) are being compared to a range in a loop. Especially since the O(1) membership check of range() is well known (but not the fact that this does not work on integral floats), this can unexpectedly cause certain large-size checks to hang indefinitely.

90000000.0 in range(100000000)
>> True, 1.87 s

90000000 in range(100000000)
>> True, 3.81 µs

A new check can be added to the existing fast path branching to attempt to convert float values to long, and either fast returning False if unable, or using the arithmetic-based check if able.

Relevant source portions:
cpython/rangeobject.c at main · python/cpython (github.com)

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

Curious: what is your use case for <integral float> in <range>?

2 Likes

I am not a fan of the inconsistent performance this proposal would cause. Certain values of float would be O(1), while other values would be O(n). I can’t think of a way to make this work for all values of float. But it can be made to work with a lot more values than just integral values. Where would we draw the line?

1 Like

Yes, it can be done. But it would add 10-20 lines of code to a function that currently includes only 4 non-empty lines of code. It will have non-zero implementation cost and some non-zero maintenance cost, and tiny, but non-zero execution cost. You have to demonstrate pretty good reasons of adding such code.

7 Likes

Code like this, 90000000.0 in range(100000000), is only supported for historical reasons, not because it is a good thing to do. It is inherently fragile because of rounding error and because floats have limited precision. This is an anti pattern and a likely source of hard to find bugs.

In general, a good rule of language implementation is that to optimize things that are good for you (making them more attractive) and not optimize things that are bad for you (making them less attractive).

The incentives should be aligned with encouraging users to make clear and robust designs. If the programmer intent is to match an integer in the range 0 to 100000000, the code should say so:

>>> x = 90000000.0
>>> x.is_integer() and 0 <= x < 100000000
True

Raymond

9 Likes

Wouldn’t all floats be O(1)? Ones where f.is_integer() is True would hit the O(1) range_contains_long path, and non integer ones would immediately return false.

Any optimization here would need to preserve the existing behavior. Is there a case where f in range(...) currently returns True for some float and f.is_integer() for that float does not?

You are correct. My understanding on how this worked was completely wrong.

We should not encourage people to perform float checks on range objects. This can cause surprising results.

n = 10**53
r = range(n-100, n+100)
print(1e53 in r)

This prints False.

If we add a fast path that encourages the use of checks like 31.0 in range(100), this will surely burn users when they hit larger values.

Maybe we should just deprecate ‘float in range’? It seems an attractive nuisance.

5 Likes

I agree this could be a possible option. Either Type Error or fast returning False. The initial issue this was about is the inadvertent usage of <float> in range() causing a large time-complexity change that could indefinitely hang some calculations that would otherwise be instant with int.

I don’t see how this issue is related to range, but rather floating point precision. The same calculation would still be false with regular equalities. As int(1e53) does not equal 10**53

n = 10**53

print(1e53 == n)
>> False
print(n-100 <= 1e53 < n+100)
>> False
print(1e53 in range(n-100, n+100))
>> False

This behavior looks pretty consistent. Adding the integral fast path would not change this behavior, just make it faster.

1 Like

Exactly! That is my point.

This is why we should not encourage people to use float in range.

Why should we take the time and energy to write an optimized fast path for something that we know people shouldn’t do, because if they do it, it will just lead them into bad habits and hard to solve bugs?

The fact that float in range works at all is a consequence of how containment testing and equality works in Python, and I’m not suggesting we remove that. (There are languages that consider automatic coercion between float and int a mistake, but Python isn’t that strict.)

But to spend time writing a fast path for something which does not do what people expect is not wise.

1 Like

float in range sounds really weird to me just like int in string.
BTW, the latter raises an error. Why not the former?

>>> 0 in '012'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not int

It’s a consequence of the fact that things like 1 == 1.0 is True, and most sequences simply use equality when checking for containment. As far as I know, there’s no existing code for range objects that makes it happen.

You can see similar sorts of things with other containers. For example:

>>> nums = 1,2,3
>>> 1.0 in nums
True
>>> d = {1:'a'}
>>> d[1.0]
'a'
1 Like

Strings perform substring checks, not element by element comparisons. So strings require the left operand to be a string.

Other sequence types implement element comparisons which are logically equivalent to:

# obj in seq
any(obj == x for x in seq)

To me, it would be odd for the sequence to care about the type of obj.

>>> [] in range(10)
False

Well, yes, that is correct: it is false that [] appears in the sequence 0 through 9.

By the way:

>>> from fractions import Fraction
>>> Fraction(3) in range(10)
True

Thanks, Kevin and Steven. I forgot the point about string comparison.

I don’t feel the following code is odd whatever type of obj is:

>>> obj in [0,1,2,3,4,5,6,7,8,9,]

But, I feel weird about the following code, for example:

>>> 'x' in range(10)

because we know that range never yields char.

Sure, we know that range objects don’t contain chars. But that’s the wrong perspective. What does the range class itself know?

How does that range class know that the target object doesn’t compare equal to some int? Should it keep a blacklist of types that will immediately raise?

if isinstance(target, (str, list, tuple, dict, set)):
    raise TypeError
return any(target == x for x in self)

And then we will be forever dealing with complaints that we forgot to exclude this class, or that class. “Why doesn’t None trigger the TypeError?” So we add a test for None.

What about bytes, frozensets, Ellipsis, collections.deque, functions, modules, etc etc etc? It is a never-ending list.

Range objects are sequences. Sequences fall back on iteration and element by element equality tests for the in operator, because the sequence cannot predict what the target object will do:

class StrangeSet(set):
    def __eq__(self, other):
        if len(self) == 1 and isinstance(other, int):
            return next(iter(self)) == other
        return super().__eq__(other)

StrangeSet([5]) in range(10)  # Returns True

How about the “range” class acting like this?

class Range:
    def __init__(self, *v):
        self._range = range(*v)
    def __contains__(self, x):
        if isinstance(x, int):
            return x in self._range
        raise TypeError

Reading the documentation for the range class it specifies a specific arithmetic sequence of numbers (integers), never floats. So making the in operator return an error for not integer types seems sensible. This avoids all the issue with IEEE or other formats and makes the inclusion of a floating point number in a set of integers always false.

If there is demand, an frange (like numpy.arange) class/function could be defined as a buitin or in maths or itertools. Its not that complicated and given a value $\epsilon$ inclusion or not is easy to calculate.
John
Excuse the latex.

making the in operator return an error for not integer types seems sensible

Ideally Python does not switch on type. Though you coul duse the index dunder.

However, the goal of the range object is be a Sequence. and in other Sequence eq is used:

Out[3]: True

So that should work for range objects, too.

Frankly, I really don’t see why a fast path for a_float in a_range is the least bit important :slight_smile:

1 Like