Proposal:
Add a fast arithmetic path for __contains__ checks of floats on range objects.
Why:
Currently, range() objects already use an arithmetic-based O(1) fast path for __contains__ checks on int values, but not on float values, even integral ones. Membership checks on floats therefore fall back to an O(n) element-by-element comparison.
This situation commonly arises when a list of floats (which may or may not be integral) is compared against a range in a loop. The O(1) membership check of range() is well known, but the fact that it does not apply to integral floats is not, so certain large-size checks can unexpectedly appear to hang indefinitely.
90000000.0 in range(100000000)
>> True, 1.87 s
90000000 in range(100000000)
>> True, 3.81 µs
A new check could be added to the existing fast-path branching: attempt to convert the float value to an int, returning False immediately if the float is not integral, and otherwise using the existing arithmetic check.
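A pure-Python sketch of what that branch might look like (hypothetical model only; the real fast path is implemented in C, in range_contains_long in Objects/rangeobject.c):

```python
def range_contains(r, value):
    """Pure-Python model of the proposed float-aware fast path.

    Hypothetical sketch; not the actual CPython implementation.
    """
    if isinstance(value, float):
        if not value.is_integer():
            return False            # a non-integral float can never match
        value = int(value)          # an integral float converts exactly
    if isinstance(value, int):      # covers bool as well
        # the existing O(1) arithmetic membership test
        if r.step > 0:
            in_bounds = r.start <= value < r.stop
        else:
            in_bounds = r.stop < value <= r.start
        return in_bounds and (value - r.start) % r.step == 0
    # anything else: slow element-by-element equality fallback
    return any(value == x for x in r)
```

With this sketch, 90000000.0 in range(100000000) takes the same O(1) path as the int check, and non-integral floats (including inf and nan, whose is_integer() is False) return False immediately.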
I am not a fan of the inconsistent performance this proposal would cause: certain float values would be O(1), while other values would be O(n). I can't think of a way to make this work for all values of float, though it could be made to work for many more values than just integral ones. Where would we draw the line?
Yes, it can be done. But it would add 10-20 lines of code to a function that currently contains only 4 non-empty lines. It has a non-zero implementation cost, a non-zero maintenance cost, and a tiny but non-zero execution cost. You would have to demonstrate pretty good reasons for adding such code.
Code like 90000000.0 in range(100000000) is only supported for historical reasons, not because it is a good thing to do. It is inherently fragile because of rounding error and because floats have limited precision. It is an anti-pattern and a likely source of hard-to-find bugs.
In general, a good rule of language implementation is to optimize things that are good for users (making them more attractive) and not to optimize things that are bad for them (making them less attractive).
The incentives should be aligned with encouraging users to write clear and robust designs. If the programmer's intent is to match an integer in the range 0 to 100000000, the code should say so:
>>> x = 90000000.0
>>> x.is_integer() and 0 <= x < 100000000
True
Wouldn't all floats be O(1)? Ones where f.is_integer() is True would hit the O(1) range_contains_long path, and non-integral ones would immediately return False.
Any optimization here would need to preserve the existing behavior. Is there a case where f in range(...) currently returns True for some float, but f.is_integer() is False for that float?
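As far as I can tell, no: mixed int/float equality in Python is exact (neither operand is rounded before comparing), so f == n for an int n implies f.is_integer(). A quick property check comparing the slow fallback against the proposed shortcut (helper names are mine, not existing APIs):

```python
def contains_slow(f, r):
    # element-by-element equality: what range.__contains__ falls back to today
    return any(f == n for n in r)

def contains_fast(f, r):
    # the proposed is_integer()-based shortcut
    return f.is_integer() and int(f) in r

r = range(0, 100, 7)
for f in (-7.0, 0.0, 3.5, 49.0, 49.000001, 98.0, 105.0,
          float("inf"), float("nan")):
    assert contains_slow(f, r) == contains_fast(f, r), f
```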
I agree this could be a possible option: either raising TypeError or fast-returning False. The original issue here was the inadvertent use of <float> in range() causing a large time-complexity change that could indefinitely hang calculations that would otherwise be instant with int.
I don't see how this issue is related to range; it is about floating-point precision. The same comparison would still be false with regular equality, since int(1e53) does not equal 10**53.
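For the record, the precision limit is easy to demonstrate:

```python
# 10**53 needs about 177 bits of precision; a C double stores 53,
# so the literal 1e53 cannot hold the exact value of 10**53.
assert 1e53 != 10 ** 53
assert int(1e53) != 10 ** 53

# The trouble starts at 2**53, the first point where a double can no
# longer represent every integer:
assert float(2 ** 53) == 2 ** 53
assert float(2 ** 53 + 1) == 2 ** 53   # 2**53 + 1 rounds down on conversion
```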
This is why we should not encourage people to use float in range.
Why should we take the time and energy to write an optimized fast path for something that we know people shouldnât do, because if they do it, it will just lead them into bad habits and hard to solve bugs?
The fact that float in range works at all is a consequence of how containment testing and equality work in Python, and I'm not suggesting we remove that. (There are languages that consider automatic coercion between float and int a mistake, but Python isn't that strict.)
But to spend time writing a fast path for something which does not do what people expect is not wise.
float in range sounds really weird to me, just like int in string.
BTW, the latter raises an error. Why not the former?
>>> 0 in '012'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not int
It's a consequence of the fact that things like 1 == 1.0 are True, and most sequences simply use equality when checking for containment. As far as I know, there's no code specific to range objects that makes it happen.
You can see similar sorts of things with other containers. For example:
>>> nums = 1,2,3
>>> 1.0 in nums
True
>>> d = {1:'a'}
>>> d[1.0]
'a'
Sure, we know that range objects don't contain chars. But that's the wrong perspective. What does the range class itself know?
How does the range class know that the target object doesn't compare equal to some int? Should it keep a blacklist of types that will immediately raise?
if isinstance(target, (str, list, tuple, dict, set)):
    raise TypeError
return any(target == x for x in self)
And then we will be forever dealing with complaints that we forgot to exclude this class or that class. "Why doesn't None trigger the TypeError?" So we add a test for None.
What about bytes, frozensets, Ellipsis, collections.deque, functions, modules, etc etc etc? It is a never-ending list.
Range objects are sequences. Sequences fall back on iteration and element-by-element equality tests for the in operator, because the sequence cannot predict what the target object will do:
class StrangeSet(set):
    def __eq__(self, other):
        if len(self) == 1 and isinstance(other, int):
            return next(iter(self)) == other
        return super().__eq__(other)

StrangeSet([5]) in range(10)  # Returns True
Reading the documentation for the range class, it specifies an arithmetic sequence of numbers (integers), never floats. So making the in operator raise an error for non-integer types seems sensible. This avoids all the issues with IEEE or other formats and makes the inclusion of a floating-point number in a sequence of integers always false.
If there is demand, a frange (like numpy.arange) class/function could be defined as a builtin or in math or itertools. It's not that complicated, and given a tolerance $\epsilon$, inclusion or not is easy to calculate.
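A minimal sketch of the membership test such a frange could offer (frange_contains and its eps tolerance are hypothetical, loosely modeled on numpy.arange semantics, not an existing API):

```python
def frange_contains(value, start, stop, step, eps=1e-9):
    """Approximate membership in start, start+step, start+2*step, ...
    with stop excluded. Hypothetical helper; eps absorbs rounding error.
    """
    if step == 0:
        raise ValueError("step must be nonzero")
    k = (value - start) / step        # fractional index of value
    n = round(k)
    if n < 0 or abs(k - n) > eps:     # before the start, or off the grid
        return False
    candidate = start + n * step
    if abs(candidate - stop) <= eps:  # lands (within eps) on the excluded stop
        return False
    return candidate < stop if step > 0 else candidate > stop
```

Note that 0.3 is correctly reported as a member of frange(0.0, 1.0, 0.1) even though 3 * 0.1 != 0.3 in binary floating point; the tolerance is what makes the check robust, and choosing eps is the design decision a stdlib version would have to settle.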
John
Excuse the latex.