Bisect algorithm is slow

mike-neergaard · October 21, 2024, 1:25pm

In Lib/bisect.py, after boundary setting, the default bisection algorithm is

while lo < hi:
    mid = (lo + hi) // 2
    if x < a[mid]:
        hi = mid
    else:
        lo = mid + 1
return lo

An alternative algorithm eliminates the addition and gives the same results but runs significantly faster:

mid = (lo + hi) // 2
while lo < mid:
    if x < a[mid]:
        hi = mid
    else:
        lo = mid
    mid = (lo + hi) // 2
 return hi

I have done performance testing using both floats and ints. In my testing, the new algorithm is a few percent faster on array lengths up to 512, and then gains increase. On arrays of 2**20 or longer, I measure performance gains of better than 20%.

mike-neergaard · October 21, 2024, 1:41pm

This is the code I wrote to check. I redirect stdout to a file to record the results.

import time
import timeit
import random
import sys

def binary_comp(a: list, x: float, stock: bool):
    lo = 0
    hi = len(a)-1

    """ For testing, x will be valid
    # Could be that x is smaller than everything on the list
    if a[lo] > x:
        return [None, lo]
    # Could be that x is larger than everything on the list
    if a[hi] < x:
        return [hi, None]
    """

    if stock:
        while lo < hi:
            mid = (lo + hi) // 2
            if x < a[mid]:
                hi = mid
            else:
                lo = mid + 1
        # We have arranged that x < a[hi]
        # Either a[hi-1] is an exact match or
        # x not in items

        # lo and hi are equal in this case, but to match the existing code,
        # we return lo
        return lo
        """ We are not finding a range in this case, so no need to reset lo
        lo = hi-1
        """
    else:
        mid = (lo + hi) // 2
        while lo < mid:
            if x < a[mid]:
                hi = mid
            else:
                lo = mid 
            mid = (lo + hi) // 2
        # We have arranged that x < a[hi]
        # Either a[lo] is an exact match or
        # x not in items

    return hi
    """
    if items[lo] == x:
        return [lo]
    else:
        return [lo, hi]
        """


if __name__ == "__main__":
    # 100123456789 is prime!  I love this number
    random.seed(100123456789)
    total_time_stock = 0
    total_time_new = 0
    num_trials = 8192
    timeit_trial_base = 8192
    # Set div_num > 1 to test floats
    div_num = 11
    largest_array_log_len = 25
    print("Array Length, Stock Time, My Time, Percentage Improvement")
    for i in range(2,largest_array_log_len): 
        begin = time.time()
        print(i, "...", flush=True, end="", file = sys.stderr)
        time_stock = 0.0
        time_new = 0.0
        timeit_num = int(timeit_trial_base*(largest_array_log_len+1-i)*2/i)
        # Performance really degrades for me on for larger arrays.  
        # Probably paging.  Anyway, reduce the timeit_num to save time
        if i > 17: timeit_num = timeit_num // (1<<(i-14))

        for j in range(num_trials):
            items = sorted([random.randint(0, 1<<(i-1)) / div_num \
                    for j in range(1<<i)])
            value = random.randint(0, 1 << (i-1))/div_num
            while value < items[0] or value > items[-1]:
                value = random.randint(0, 1 << (i-1))/div_num

            """
            # Uncomment this to check that the two algorithms return 
            # the same results
            s =  binary_comp(items, value, True)
            m = binary_comp(items, value, False)
            if s != m:
                print(s, "!=", m, "for array", items,"and value",value)
                exit(1)
                """
            time_stock += timeit.timeit("binary_comp(items, value, True)",  \
                globals=globals(), \
                number=timeit_num)
            time_new += timeit.timeit("binary_comp(items, value, False)",  \
                globals=globals(), \
                number=timeit_num)

        print(str(1 << i)+",",\
                f"{time_stock: 0.4},",\
                f"{time_new: 0.4},",\
                f"{100*(time_stock - time_new)/time_stock: .4}")

        print(" done in", time.time()-begin,"seconds", flush=True, file=sys.stderr)
        total_time_stock += time_stock
        total_time_new += time_new

    print("all,",
            f"{total_time_stock: 0.4},",\
            f"{total_time_new: 0.4},",\
            f"{100*(total_time_stock - total_time_new)/total_time_stock: .4}",\
            flush=True)

Stefan2 · October 21, 2024, 4:00pm

Can you post the full actually equivalent function (at least for calls with just a and x)? And which bisect function is it equivalent to?

Stefan2 · October 21, 2024, 4:17pm

At least in CPython, the library code you showed isn’t actually used. The C code version is used instead. Where the indexes aren’t Python int objects, so the +1 is cheap.

JamesParrott · October 21, 2024, 4:55pm

It seems to be right there in bisect.bisect_right?

github.com

python/cpython/blob/65e43ca6d936304622ec8b3cd2a9b264b3a3755b/Lib/bisect.py#L21


      
          
              A custom key function can be supplied to customize the sort order.
              """
              if key is None:
                  lo = bisect_right(a, x, lo, hi)
              else:
                  lo = bisect_right(a, key(x), lo, hi, key=key)
              a.insert(lo, x)
          
          
          def bisect_right(a, x, lo=0, hi=None, *, key=None):
              """Return the index where to insert item x in list a, assuming a is sorted.
          
              The return value i is such that all e in a[:i] have e <= x, and all e in
              a[i:] have e > x.  So if x already appears in the list, a.insert(i, x) will
              insert just after the rightmost x already there.
          
              Optional args lo (default 0) and hi (default len(a)) bound the
              slice of a to be searched.
          
              A custom key function can be supplied to customize the sort order.

mike-neergaard · October 21, 2024, 5:00pm

James is correct as to where I found the code. Whether it is actually used is a question I can’t answer.

I can rewrite bisect_right and post it, though not in the next hour.

mike-neergaard · October 21, 2024, 6:07pm

I would replace bisect_right with the following:

def bisect_right(a, x, lo=0, hi=None, *, key=None):
    """Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x.  So if x already appears in the list, a.insert(i, x) will
    insert just after the rightmost x already there.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.

    A custom key function can be supplied to customize the sort order.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    # Note, the comparison uses "<" to match the
    # __lt__() logic in list.sort() and in heapq.
    mid = (lo + hi) // 2
    if key is None:
        while lo < mid:
            if x < a[mid]:
                hi = mid
            else:
                lo = mid
            mid = (lo + hi) // 2
    else:
        while lo < mid:
            if x < key(a[mid]):
                hi = mid
            else:
                lo = mid
            mid = (lo + hi) // 2
    return hi

mike-neergaard · October 21, 2024, 6:20pm

With regard to Stefan’s comment about C code, I think the C implementation is in Modules/_bisectmodule.c

I haven’t tried porting that version to the new algorithm and performance testing, so I can’t yet speak to the cost of addition in C.

Stefan2 · October 21, 2024, 6:23pm

Thanks. I get some wrong results:

a=[1, 1]  x=0  expect=0  result=1
a=[1, 1, 1]  x=0  expect=0  result=1
a=[1, 1, 2]  x=0  expect=0  result=1
a=[1, 2, 2]  x=0  expect=0  result=1
a=[2, 2, 2]  x=0  expect=0  result=1
a=[2, 2, 2]  x=1  expect=0  result=1
a=[1, 1, 1, 1]  x=0  expect=0  result=1
a=[1, 1, 1, 2]  x=0  expect=0  result=1
a=[1, 1, 1, 3]  x=0  expect=0  result=1
a=[1, 1, 2, 2]  x=0  expect=0  result=1
a=[1, 1, 2, 3]  x=0  expect=0  result=1
a=[1, 1, 3, 3]  x=0  expect=0  result=1
a=[1, 2, 2, 2]  x=0  expect=0  result=1
a=[1, 2, 2, 3]  x=0  expect=0  result=1
a=[1, 2, 3, 3]  x=0  expect=0  result=1
a=[1, 3, 3, 3]  x=0  expect=0  result=1
a=[2, 2, 2, 2]  x=0  expect=0  result=1
a=[2, 2, 2, 2]  x=1  expect=0  result=1
a=[2, 2, 2, 3]  x=0  expect=0  result=1
a=[2, 2, 2, 3]  x=1  expect=0  result=1
a=[2, 2, 3, 3]  x=0  expect=0  result=1
a=[2, 2, 3, 3]  x=1  expect=0  result=1
a=[2, 3, 3, 3]  x=0  expect=0  result=1
a=[2, 3, 3, 3]  x=1  expect=0  result=1
a=[3, 3, 3, 3]  x=0  expect=0  result=1
a=[3, 3, 3, 3]  x=1  expect=0  result=1
a=[3, 3, 3, 3]  x=2  expect=0  result=1

test script

from bisect import bisect_right as old
from itertools import combinations_with_replacement

def new(a, x, lo=0, hi=None, *, key=None):
    """Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x.  So if x already appears in the list, a.insert(i, x) will
    insert just after the rightmost x already there.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.

    A custom key function can be supplied to customize the sort order.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    # Note, the comparison uses "<" to match the
    # __lt__() logic in list.sort() and in heapq.
    mid = (lo + hi) // 2
    if key is None:
        while lo < mid:
            if x < a[mid]:
                hi = mid
            else:
                lo = mid
            mid = (lo + hi) // 2
    else:
        while lo < mid:
            if x < key(a[mid]):
                hi = mid
            else:
                lo = mid
            mid = (lo + hi) // 2
    return hi

for n in range(5):
    for a in map(list, combinations_with_replacement(range(n), n)):
        for x in range(n+1):
            expect = old(a, x)
            result = new(a, x)
            if result != expect:
                print(f'{a=}  {x=}  {expect=}  {result=}')
print('done')

Attempt This Online!

mike-neergaard · October 22, 2024, 7:10am

You do indeed. When x is less than every entry of a, the insertion point is incorrect. I can fix that with an initial check. I will think on whether there is a more clever way.

My apologies for the error.

Also, thanks for the cool collapsed code. I quite like that.

I modified my version of bisect_right and your script now reads as follows:

modified test script

from bisect import bisect_right as old
from itertools import combinations_with_replacement

def new(a, x, lo=0, hi=None, *, key=None):
    """Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x.  So if x already appears in the list, a.insert(i, x) will
    insert just after the rightmost x already there.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.

    A custom key function can be supplied to customize the sort order.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    # Note, the comparison uses "<" to match the
    # __lt__() logic in list.sort() and in heapq.
    mid = (lo + hi) // 2
    if key is None:
        if lo == hi or x < a[lo]: return lo
        while lo < mid:
            if x < a[mid]:
                hi = mid
            else:
                lo = mid
            mid = (lo + hi) // 2
    else:
        if lo==hi or x < key(a[lo]): return lo
        while lo < mid:
            if x < key(a[mid]):
                hi = mid
            else:
                lo = mid
            mid = (lo + hi) // 2
    return hi

for n in range(5):
    for a in map(list, combinations_with_replacement(range(n), n)):
        for x in range(n+1):
            expect = old(a, x)
            result = new(a, x)
            if result != expect:
                print(f'{a=}  {x=}  {expect=}  {result=}')
print('done')

mike-neergaard · October 22, 2024, 7:39am

I did some timing with Stefan’s script, and I think it’s safe to say my installation is using the C version of bisect.

I think that leaves me with three things to do:

Write down the corresponding logic for bisect_left
Think about whether there is a cleaner way to deal with an x that is less than every entry of a
Implement in C and do timing checks

mdickinson · October 22, 2024, 9:15am

I think a lo -= 1 before the initial mid = ... should do it.

With that fix, there are straightforward loop invariants. For the case of operating on the entire list a: on entry and exit from the while loop body, the following conditions are true:

lo == -1 or not x < a[lo]
hi == len(a) or x < a[hi]
-1 <= lo < hi <= len(a)

where I’m being a bit pedantic in the first invariant: usually not x < a[lo] is the same condition as a[lo] <= x, but the expressions could differ in theory, and in the case that they do differ it’s the first one that matters.

And then on exit from the loop, the above invariants apply along with the condition that hi == lo + 1, and correctness should follow.

mike-neergaard · October 22, 2024, 10:22am

Hi Mark,

I claim there are two problems. The first arises with an empty array. In that case, both lo and hi default to 0, lo is then decremented to -1 and
mid = (lo + hi) // 2
assigns mid to -1, producing an list index out of range error.

[Edit - this first objection is not well founded]

The second problem will arise in C. The C implementation uses unsigned integers, so 0-1 cannot produce -1.

Other than that, I really like it. It’s clever and it works.

mdickinson · October 22, 2024, 10:28am

In that case the list is never accessed: on entry to the while loop, both lo and mid are -1, so the while condition lo < mid is false. The loop only ever gets entered when hi - lo >= 2, and in that case we always have lo < mid < hi, so 0 <= mid < len(a) and a[mid] is valid.

Agreed with the translation into C problem, though.

mike-neergaard · October 22, 2024, 11:12am

Oh! I withdraw my first objection. Mark was right. I was wrong.

mike-neergaard · October 22, 2024, 11:58am

If I may say so, I think we are making some very reasonable unfounded assumptions, like hi >=0 and lo <= hi and hi <= len(a). At least in the python code, we don’t bother to check any of that. If the user is foolish enough to pass that in, we will cheerfully crash or return incorrect results.

I am not entirely sure why the code bothers to check lo >= 0, and I am also not sure why it returns a ValueError on that vs an IndexError. I’m sure someone wiser knows the answer to that.

Stefan2 · October 22, 2024, 12:53pm

Are you talking about Python’s code or the alternatives discussed here? Can you give example input for this?

mike-neergaard · October 22, 2024, 1:27pm

Both C and Python.

from bisect import bisect_right

print(bisect_right([0,0, 1,1,2], x=1, lo=0, hi=-3))

The correct answer is probably an error. Whatever the correct answer is, it shouldn’t be zero.

mike-neergaard · October 22, 2024, 1:32pm

Perhaps a better example:

from bisect import bisect_right

print(bisect_right([0,0, 1,1,2], x=1, lo=50, hi=-3))

Stefan2 · October 22, 2024, 1:43pm

Trying the latter, both guarantees about the result are true:

from bisect import bisect_right

a = [0,0, 1,1,2]
x = 1
lo = 50
hi = -3

ip = bisect_right(a, x, lo, hi)

print(ip)
print(all(elem <= x for elem in a[lo : ip]))
print(all(elem > x for elem in a[ip : hi]))

Output (Attempt This Online!):

50
True
True

So it looks correct to me. It does what it’s documented to do.