Basic binary search algorithm - wrangling with while loops and their conditions (x2)

Greetings fellow Pythonistas!

I am learning foundations of algorithmic design. I am trying to write a basic binary search function. I have made two separate attempts. They are both broken. I don’t know why. Below are my two specimens along with their corresponding REPL inputs and outputs where I demo the errors and tracebacks. Comments, tips, feedback welcome.

Here is the pseudo code provided by the instructor in the non-credit Udemy course I am taking:

This function accepts a sorted array and a value
Create a left pointer at the start of the array, and a right pointer at the end of the array
While the left pointer comes before the right pointer
Create a pointer in the middle
If you find the value you want, return the index pointer
If the value is too small, move the left pointer down
If the value is too large, move the right pointer up
Else return not present

Here is my first script where I attempt writing the algorithm using recursion:

def binary_seek(ordered_list, value):
    left_pointer = ordered_list[0]
    right_pointer = ordered_list[-1]
    while left_pointer < right_pointer:        
        center_pointer = int(len(ordered_list)/2)
        if center_pointer == value:
            return f"Found at: {center_pointer}"
        elif center_pointer < (ordered_list[left_pointer] < ordered_list[center_pointer]):
            new_list = ordered_list[left_pointer:center_pointer]
            return binary_seek(new_list, value)
        elif center_pointer < (ordered_list[center_pointer] < ordered_list[right_pointer]):
            new_list = ordered_list[center_pointer:right_pointer]
            return binary_seek(new_list, value)
        return "Not present"

Here is the REPL demo:

$ python
<<< from script1 import binary_seek
<<< binary_seek([ 1, 3, 4, 6, 8, 9, 11, 12, 15, 16, 17, 18, 19 ],18)
Traceback (most recent call last):
File “”, line 1, in
File “/home//file location/script1.py”, line 11, in binary_seek
elif center_pointer < (ordered_list[center_pointer] < ordered_list[right_pointer]):
~~~~~~~~~~~~^^^^^^^^^^^^^^^
IndexError: list index out of range
<<<

Please take note: At that line highlighted in the traceback, I tried adding -1 and +1 after ‘right_pointer’ which did not resolve or change any thing. I am grappling at straws there.

Here is my next attempt:

def binary_seek(ordered_list,elem):
    start = 0
    end = ordered_list[-1]
    middle = (start + end) // 2
    while (ordered_list[middle] != elem) and (start <= end):
        print(start, middle, end)
        if elem < ordered_list[middle]:
            end = middle -1
        else:
            start = middle + 1
            middle = (start + end) // 2
    if ordered_list[middle] == elem:
        return middle
    return False

When I run : binary_seek([ 1, 3, 4, 6, 8, 9, 11, 12, 15, 16, 17, 18, 19 ],18) Python begins iterating an infinite loop with the print output of: 0 9 8. The printed output always results in an infinite loop with the same ‘0 9 8’ repeating over and over regardlss which elem variable I pass in.

I’ll offer one suggestion: execute the code by hand, sketching your input list and the positions of your indexes as execution proceeds. Binary search algorithms typically end when the right and left indexes meet or cross. It sounds to me like you’re failing to consider one of those cases. Alternately, you may be failing to adjust one index or the other correctly. (Note: I didn’t look at your code. Much too early and not yet enough coffee in the tank.)

2 Likes

In your code the end (right_pointer) and middle are not pointers (=indices into the array rather than values in the array). So this line

end = orderdered_list[-1]

is incorrect. If you don’t directly see why, then run this

binary_seek([0, 1, 2, 3, 1000], 0)

The current code will crash with that input.

Another problem is that some input will trigger an infinite loop (as you already saw) caused by an indentation issue inside the while-block [1]


  1. middle` is not always being reset ↩︎

Good you have a print at the start of the loop.
I would also add more prints in the if and else branches as well as at the end of the loop. Then you will know a lot more about how you changed start, middle and end.

Thank you for your reply.

As you suggested, I sprinkled some more print statements through out my script. That helped. I am also now using my VSC Python debugger.

When I added middle = (start + end) // 2 within the first nested if conditional statement, that enables my script to behave as intended but this only works when I search for an element that is within the first half of the list of integers.

For example this:

print(binary_seek([ 1, 3, 4, 6, 8, 9, 11, 12, 15, 16, 17, 18, 19 ],6))

…returns: “Found: 3”

So that works.

But when I proceed to search for the value of 18, Python throws “index out of range”.

What might you recommend I try next?

Here is the latest version of my script:

def binary_seek(array,element):
    start = 0
    end = array[-1]
    middle = (start + end) // 2
    print(f'Initialization: {start}, {middle}, {end}')
    while (array[middle] != element) and (start <= end):
        print(start, middle, end)
        if element < array[middle]:
            print(start, middle, end)
            end = middle -1
            middle = (start + end) // 2
            print(start, middle, end)
        else:
            print(start, middle, end)
            start = middle + 1
            middle = (start + end) // 2
            print(start, middle,end)
    if array[middle] == element:
        print(start, middle, end)
        return print(f'Found:{middle}')
    else:
        print(start, middle, end)
        return print(f'Not Found')

print(binary_seek([ 1, 3, 4, 6, 8, 9, 11, 12, 15, 16, 17, 18, 19 ],18))

The output in my shell reads:

It breaks before the second iteration of the while loop can execute.

@barry-scott : Can you or anyone else clarify further and perhaps elaborate on why my while loop breaks there?

What change do I need to make to better make sure the index stays within range?

Hi @enoren5 - Please see my earlier reply? You still have the original bug (mix-up of array indices and array elements) in your latest code [1].


  1. end should be initialized as len(array) - 1 not as array[-1] ↩︎

Hi,

I believe that you want to fix your function header to (missing the asterisk * and fix the order of entry):

def binary_seek(elem, *ordered_list):

This is from Python theory. From a general standpoint, functions should be defined as:

def my_function(positional, *args, **kwargs) # They are always in this order

*args - unmatched arbitray inputs
**kwargs - key valued entries

When you make the function call, if you pass in a list:

this_list = [4,5,6,7,8]
my_function(positional_value, *this_list)

Here is an example:

H = [2,3,4,5] # can be of any length

def my_function(one_value, *x):

    print('The single value entered is: ', one_value)

    j = []
    for i in x:
        g = i**2
        j.append(g)
    return j

print(my_function(25, *H))      # Pass in predefined list (or tuple)

passing the list is what is expected in a sort alogithm not passing the contents of a lists.

This is not required.

1 Like

As @hansgeunsmeyer said, you’re mixing indexes and values at those indexes.

For example:

start = 0
end = array[-1]

What is 0? It’s the index of the first element.

What is array[-1]? It’s the value of the last element.

array[-1] is 19, but array is only 13 elements long.

That’s why it sometimes raises an IndexError.

You are right that you made a similar observation and suggestion earlier but I didn’t understand it. In your previous comment you referred to my completely different code specimen and I didn’t realize your comment was applicable to both.

This is the trivial oversight on my part. This is very clear now and I am not sure how I could have missed this. Thanks for your patience @hansgeunsmeyer.

@MRAB echoes similar advice which is also instructive.

Thank you both for clarifying further.

Let’s revisit my second original code snippet above where I attempt to implement a basic binary search but using recursion.

Here is my latest attempt:

def binary_seek(ordered_list, value):
   left_pointer = 0
   right_pointer = len(ordered_list) - 1
   while left_pointer < right_pointer:
       center_pointer = len(ordered_list) // 2
       if ordered_list[center_pointer] == value:
           return f"Found at: {center_pointer}"
       elif ordered_list[left_pointer] < value < ordered_list[center_pointer]:
           right_pointer = ordered_list[center_pointer]
           new_list = ordered_list[left_pointer:center_pointer]
           return binary_seek(new_list, value)
       elif ordered_list[right_pointer] < value < ordered_list[center_pointer]:
           left_pointer = ordered_list[center_pointer]
           new_list = ordered_list[center_pointer:right_pointer]
           return binary_seek(new_list, value)
   return "Not present"

Where I initialize the end value (I called it the right_pointer), I have wrapped the list variable with the len() method and subtracted 1. That’s squared away.

My script still has issues. The while loop exectutes infinitely. The only thing that works is when I pass in a valid integer to match the exact middle value. For example, this print statement successfully returns the centre point:

print(binary_seek([1, 3, 4, 6, 8, 9, 11, 12, 15, 16, 17, 18, 19,22],12))

So the recursive check condition works. Great. But when searching / seeking other values, it falls apart.

I’ve played around with it in my debugger and I have identified that the problem now involves my comparison operations for both elif conditions.

elif ordered_list[left_pointer] < value < ordered_list[center_pointer]:

This does not work. What I am trying to accomplish here is to test when the value is between the left pointer and the center pointer. If that is the case, then I wish to keep the left pointer where it is and move the right pointer down to center and to then proceed with recursion, to run the function again with a modified new list (sliced accordingly) and with the same value as before. (And then perform a similar elif condition for testing the value if it is to the right of the center instead of left.)

How might you people re-write the elif comparison operations to better make sure the script runs as intended with recursion? What other leads and insight might you people be able to provide? Feedback, hints, comments welcome.

For a recursive version you do not need a loop.
Either you are down to 1 cell to check and return found/not found
Or you recurse on the left or right half.

1 Like

Your code has right_pointer = ordered_list[center_pointer] and left_pointer = ordered_list[center_pointer], but those lines aren’t getting indexes, they’re getting the values at those indexes.

@barry-scott pointed out that you don’t need a while or for loop if you use recursion. It’s best to make a clear choice there: either use recursion (which is actually simplest) or a loop.

You also need to make sure that your conditional statements are correct and cover all cases.
I’ll rename some variables (since long names make the code less readable imo). You then have

if lis[center] == value:
   ...
elif lis[left] < value < lis[center]:
   ...
elif lis[right] < value < lis[center]:
   ...

The conditions here are partially redundant and partially incorrect.
It’s partially redundant, since during the iteration (or recursion) you need to keep the invariant that left <= center <= right, and you assume that the list is sorted so it makes no sense to do the double-checking with two < comparisons.
So, this should be:

if value == lis[center]:  # done
    ...
elif value < lis[center]:  # search left side
    ...
else: # search right side
   ....

This makes it clear you indeed cover all cases, and that you are not introducing incorrect cases (in your last code fragment the last condition was incorrect).

Then inside those clauses, you are still mixing up pointers and values. For instance:
right_pointer = ordered_list[center_pointer]
Ask yourself, once more, after this assignment, is right_pointer a pointer (index) or a value?

I was thinking about how you might better have found this mistake
(having made the mistake, we all have a tendency to see “what the code
should have done” when we reread our code).

This kind of thing can be hard to debug, particularly when your list
values are similar in type and size to the list length.

Maybe consider testing with different types, eg a list like:

 ['abc', 'def', 'g', 'hij']

Then it might be glaringly obvious that you’d picked up a value instead
of an index because they’re different types.

This appraoch isn’t always applicable, but might help in a future
scenario.

Also, if you are dealing with the same type (int indices and int
values) making the values very different from the indicaes might
similarly help.

Also of this supposes you already suspect this kind of mistake, but the
principle of having the domain and range be obviously different feels
like a general approach.

1 Like

Your conditions are:

elif ordered_list[left_pointer] < value < ordered_list[center_pointer]:

and

elif ordered_list[right_pointer] < value < ordered_list[center_pointer]:

Shouldn’t the second one be something like:

elif ordered_list[right_pointer] > value > ordered_list[center_pointer]:

if you’re looking for a value between center and right?

ed: Hans’ is better, should’ve kept reading

Based on all of your feedback since my last appearance, here are some changes I have made to my script:

  • As per @barry-scott, I flattened my control flow by removing the while loop.
  • As @cameron noted, since my ordered list values are similar in type and size as well as list length which makes it confusing to debug, my new list that I am experimenting with is: [101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234].
  • Reworked the three conditional cases as @hansgeunsmeyer suggested.
  • Refactored center_pointer, right_pointer, left_pointer to: center, right, left

Having said all of that, here is my latest iteration of my script:

def binary_seek(ordered_list, value):
    left = 0
    right = len(ordered_list) - 1
    center = len(ordered_list) // 2
    if value == ordered_list[center]:
        return f"Found at: {center}"
    elif value < ordered_list[center]:
        right = ordered_list[center]
        new_list = ordered_list[left:center]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    else:
        left = ordered_list[center]
        new_list = ordered_list[center:right]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    return "Not present"

When I call the function passing in the list [101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234] and also pass in the value to search for 567, then it successfully returns "Found at 2. So that is progress. But when I pass in 812 as the value to seek, the function returns 1. Something still is not right.

@MRAB identified this line in my prior code snippet: right_pointer = ordered_list[center_pointer] and left_pointer = ordered_list[center_pointer]. The problem there is that I was confusing pointer positioning by pulling the list values rather than list indexes. Other forum members like @hansgeunsmeyer reinforced similar messaging. Therefore, here is the same script but with (as far as I can tell) correct index assignment / reassignment at those lines:

def binary_seek(ordered_list, value):
    left = 0
    right = len(ordered_list) - 1
    center = len(ordered_list) // 2
    if value == ordered_list[center]:
        return f"Found at: {center}"
    elif value < ordered_list[center]:
        right = center
        new_list = ordered_list[left:center]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    else:
        left = center
        new_list = ordered_list[center:right]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    return "Not present"

When I pass in 999, as I follow along in my debugger as I step through each line, the first pass works as intended where the first conditional is triggered, a new list is created which is sliced perfectly and then the script recursively calls the function again with the new list and same value. But on the next routine, the second conditional is not triggered and proceeds to the third conditional and then slices a new list which excludes the 999 value. This means my function is still not behaving as I’d expect. When I let the script run until it’s full completion, the rightpoint ends up being declared as a -1 integer and Python throws ‘index out of range’ again.

The same issue occurs when I test a value such as 9990000 that is clearly not present in the list. In that case my intention is to have the function return "Not present". I figure the reason why that doesn’t work is that the else statement finalizes the routine which prevents return "Not present" from ever being invoked. One solution I explored was to change the existing else line back into elif and test with a comparison operation like I had before and then use a fourth else case to account for situations where the item is not found. So here is that change:

   ...
    elif value > ordered_list[center]:
        left = center
        new_list = ordered_list[center:right]
        return binary_seek(new_list, value)
    else:
        return "Not present"

The “Not present” return statement is still never triggered even when I test a value not present in the list I am evaluating.

First I’m simplifing your code as left and right are not interesting as you have coded them.

def binary_seek(ordered_list, value):
    center = len(ordered_list) // 2
    if value == ordered_list[center]:
        return f"Found at: {center}"
    elif value < ordered_list[center]:
        new_list = ordered_list[:center]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    else:
        new_list = ordered_list[center:]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    return "Not present"


data = [101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234]

for value in data:
    print('Seeking %d: %r' % (value, binary_seek(data, value)))

Which runs like this:

Seeking 101: 'Found at: 0'
Seeking 444: 'Found at: 1'
Seeking 567: 'Found at: 2'
Seeking 812: 'Found at: 1'
Seeking 999: 'Found at: 1'
Seeking 1101: 'Found at: 5'
Seeking 1232: 'Found at: 1'
Seeking 1522: 'Found at: 2'
Seeking 1684: 'Found at: 1'
Seeking 222234: 'Found at: 1'

Ok so why is this?
Lets look at what each recursive call is passed as data.

def binary_seek(ordered_list, value):
    print('binary_seek(%r, %r)' % (ordered_list, value))
    center = len(ordered_list) // 2
    if value == ordered_list[center]:
        return f"Found at: {center}"
    elif value < ordered_list[center]:
        new_list = ordered_list[:center]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    else:
        new_list = ordered_list[center:]
        # center = len(new_list) // 2
        return binary_seek(new_list, value)
    return "Not present"


data = [101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234]
value = 999
result = binary_seek(data, value)
print('Seeking %d: %r' % (value, result))

Which outputs this:

binary_seek([101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234], 999)
binary_seek([101, 444, 567, 812, 999], 999)
binary_seek([567, 812, 999], 999)
binary_seek([812, 999], 999)
Seeking 999: 'Found at: 1'

As you can see the list is smaller each time and you do not keep track of the position that the list starts at.

You could pass in the index of the left edge and add that the result.

But you could also consider that you do not need to create new lists on each recursive call.
What if you pass in the indexes instead and never created a new list?

def binary_seek(ordered_list, value, left, right):
1 Like

Think about this:

    elif value < ordered_list[center]:
        right = center
        new_list = ordered_list[left:center]

At this point, you know that the element at center isn’t the one you want, yet you’re including it when you’re slicing ordered_list to make new_list.

Similarly for the else part.

You’re make shorter and shorter slices until either you find the element or the slice is empty, in which case it’s not found, but how does your function handle an empty slice? Badly!

Incidentally, when it says “Found at …”, the position it gives is the position in that list, which is usually a slice of a list that its caller had.

Personally, I’d forget about recursion and just use a while loop and a half-open range.

Bare in mind that @enoren5 is exploring coding ideas, not looking for a production answer.

My advice is to study the code step by step; have a clear expectation of what should happen at each step, check what does happen, and look for a bug.

Some useful resources:

(You may be able to find similar guides that are specific to the IDE you’re using, if you’re using one)

For example: when you call binary_seek([101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234], 812), what do you think should be the values set for left, right and center at the top? Will the value be found immediately? If not, which half of the list should be searched? What should the new_list be? (Try also using a value that’s in the second half of the original ordered_list, and at the middle, so that you test each branch of the condition.) Does all of that work as you expect?

Now, think about the recursive step. What do you expect to happen at each recursion? (Try testing the ordered_list value at the start of each call.) What values do you expect to get returned? What should be the relationship between the value you get from the recursive call, and the value that you return from the current call?

Solving problems like this is a matter of reasoning and careful analysis - you already know everything you need to know about Python to figure it out.