Basic binary search algorithm - wrangling with while loops and their conditions (x2)

Hi @kknechtel! Thank you for these resources. I especially liked your first link. The two main takeaways I got from reading that older but still instructive Wordpress article is to (1) ‘obtain a rubber ducky’ and (2) write technical specifications for each method (which I think are similar to docstrings) which includes stating preconditions and postconditions. I’ve attempted to do this below. But I still struggle. It is incomplete.

I begin with practicing Rubber Ducky Debugging as I analyze @barry-scott’s code first since it is ‘low hanging fruit’ and easier for me to understand and describe for starters. My self-narrative I have inserted in-line and then discuss what I learned and other considerations afterwards.

Here is Barry’s code with my annotations:

def binary_seek(ordered_list, value):
    print('binary_seek(%r, %r)' % (ordered_list, value))
    # Initialize center index:
    center = len(ordered_list) // 2
    # Declare recursive base case (correctly positioned):
    if value == ordered_list[center]:
        return f"Found at: {center}"
    # In the case where the target value is less than the center value, proceed:
    elif value < ordered_list[center]:
        # Slice a new list based on the first half of the original list to process on the next recursive pass:
        new_list = ordered_list[:center]
        return binary_seek(new_list, value)
    # In the case where the target value is to the right of the original center value, proceed:
    else:
        # Slice new list, but on the opposite side
        new_list = ordered_list[center:]
        # Seek value again this time in sliced list
        return binary_seek(new_list, value)

data = [101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234]
value = 999
result = binary_seek(data, value)
print('Seeking %d: %r' % (value, result))

Which outputs this:

binary_seek([101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234], 999)
binary_seek([101, 444, 567, 812, 999], 999)
binary_seek([567, 812, 999], 999)
binary_seek([812, 999], 999)
Seeking 999: 'Found at: 1'

Barry’s testing methodology is very helpful because the problem is now clear: Every time the recursive function is called, the list being processed becomes smaller and smaller. Once the base case is hit, the index being “Found” is the index of the reduced list, not the original list. Before using Barry’s output technique with print statements, using my debugger I could see a similar pattern. I will try to combine rubber ducky annotations with my debugger in this way going forward.

In Barry’s script, there are no left/right mechanisms any more. I think Barry may have removed them just to make it easier to teach and highlight the error I was making. But I am pretty sure right / left are still required to achieve a working binary search algorithm.

The problem now is I can’t Rubber Ducky my way to a different solution yet.

What I can do is use Karl’s line of questioning as guidance and to persevere as far as I can.

Before I get started, I think it is safe to toss out the idea of recursively processing the slicing of smaller and smaller temporary lists. That’s gone.

Here goes my feeble attempt at answering Karl’s questions:

Based on these questions, if I were to rewrite my script from scratch, here is a docstring written in free-hand (which after writing it, seems to more so resemble pseudo code than a formal docstring).

When the function is called the first time, these variables need to be initialized:

  • The left variable start at 0,
  • The right must start at the total length of the list minus 1 (so: len(ordered_list)-1),
  • The center value needs to be an integer value half way between the farthest left point and right point. This can be accomplished with floor division (in this way: (left+right)// 2 or (len(ordered_list)-1)//2.
  • Shortly after initializing the above variables, the base case needs to be promptly established (this is always true for recursion in general where the base case appears at the top and subsequent conditions come later).
  • If the base case is not triggered, the script should proceed to define the next condition which is to evaluate if the target value is smaller than the value at the center index. If this condition tests true, then the function is called recursively to seek the target value again but this time the center index becomes the new right index and the new center is re-calculated using a similar floor division operation.
  • In the only other possible case (else:) in the event where the value is greater than the item at the original center, proceed to call the function recursively until the base case is hit (in the opposite direction to the right instead of to the left).

I’ve scrapped the approach where I am slicing the ordered_list into new_lists at each recursive function call. That doesn’t work as expected as Barry helped explain and as I then demonstrated.

At each recursive call, what I need to do is slightly shift left and right pointers as well as the center index values rather than slicing the list. I get that much. While this seems to be the key crucial change I need to make now, I really don’t know how to model this variation in the next iteration of my script. I can’t make the leap from this English written quasi-pseudo code understanding to Python code. I am all frazzled.

At each recursive call the list and target value must always remain the same. Those are static variables. Previously my list was not static and that was the problem. The only variables which should vary at each recursive function call I would think are center, left, and right. But again, I don’t know how to model these variable re-assignments in Python within the two conditional case declarations.

That’s all I got. I only feel marginally closer to understanding a basic binary search algorithm compared to before.

Special thanks to @kknechtel and @barry-scott for their help and insight. Further tips and suggestions on how to improve my pseudo code welcome. I am open to new ideas on how to better approach the next iteration of my binary seek function in Python.

Perhaps you are right that the original while loop model would work best in the real world but in my situation I am exploring algorithmic software design principals. I am interested in different Python implementations and techniques just for the sake of comparison and learning.

In recursive algorithms, instead of “re-assigning”, you pass the new value as an argument, so that the recursive call knows what value to use.

(It’s also possible to make the algorithm work by slicing the list; but it’s both more complex overall and less efficient, as a bunch of slices will remain in memory while tracing through the recursion. The trick is that when the second half is sliced, you need to account for how many elements were taken off the front, in order to translate an index from the “next” recursive call into an index for the “current” one.)

Yes that was the idea :wink:

Personally I rarely use debuggers in any language.
Instead I implement logging that answers questions about behaviour of code. The act of figuring out the questions is that rubber ducky thing.

1 Like

Despite my valiant attempt at completing this algorithm using recursion on own and even after doing exactly what other forum members suggested (by using the Rubber Ducky Technique to talk myself through my previously attempted iteration of my script), now that I have exhausted all avenues and don’t see any other tips or hints coming in, I made the decision to “break the seal” so to speak by asking ChatGPT. See below for the code snippet. It works. But there is an interesting mechanic I noticed which I only partially understand. I was wondering if some of you might be able to provide some additional insight into it.

As you can see in the two conditional statements in the code below, when the function is returning the function for another pass, the mid (center) index is either being decremented 1 index position to left or being incremented 1 index position to the right. However with my understanding of binary search, the center index should be divided by 2 again at every pass, not traversing each position in the list one at a time. With a true binary search, at every pass, a new comparison should be made with the new left and new right. In the context of Big-O notation, with really large ordered lists (say with one million or more items), after the first center position determination (the 500,000 index position), it would be extremely inefficient to begin incrementing 1 index position to reach the final destination if the target were found at the 750,000 mark. The beauty of the way binary search (and eventually binary search trees) should work is that you’d keep cutting the numbers in half meaning that you’d reach the 750,000 destination in only 2 operations! That is the whole point and advantage of binary search compared to linear search algorithms which would otherwise take 750,000 operations to find the same item. But with the code snippet provided by ChatGPT, the mid index is only calculated once with floor division and then it effectively becomes a linear search algorithm by having to traverse each position by 1 value sequentially until it reaches 750,000 which therefore would take 250,000 operations. That’s not a well designed binary search algorithm. For small lists with 10 items like I have been playing with, the current state is fine. But there is probably a better alternative here.

Here is the code:

def binary_search_recursive(arr, target, low, high):
   # Base case: element not found
   if low > high:
       return -1 

   mid = (low + high) // 2

   if arr[mid] == target:
       # Base case: element found at mid index
       return mid 
   elif arr[mid] > target:
       # Search left half
       return binary_search_recursive(arr, target, low, mid - 1) 
   else:
       # Search right half
       return binary_search_recursive(arr, target, mid + 1, high) 

# Example usage:
arr = [101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234]
target = 222234
result = binary_search_recursive(arr, target, 0, len(arr) - 1)

if result != -1:
   print(f"Element: {target} which also can be referred to as: {arr[result]} found at index: {result}")
else:
   print("Element not found.")

Here is the successful output:

"Element: 222234 which also can be referred to as: 222234 found at index: 9"

To improve the script to perform more like an actual binary search as discussed above, I tried replacing mid - 1 and mid + 1 paramaters for each recursive call under the conditionals while also including mid = (low + high) // 2 right above like this:

def binary_search_recursive(arr, target, low, high):
    # Base case: element not found
    if low > high:
        return -1  

    mid = (low + high) // 2

    if arr[mid] == target:
        # Base case: element found at mid index
        return mid  
    elif arr[mid] > target:
        # Search left half
        mid = (low + high) // 2
        return binary_search_recursive(arr, target, low, mid)  
    else:
        mid = (low + high) // 2
        return binary_search_recursive(arr, target, mid, high)  


# Example usage:
arr = [101, 444, 567, 812, 999, 1101, 1232, 1522, 1684, 222234]
target = 222234
result = binary_search_recursive(arr, target, 0, len(arr) - 1)

if result != -1:
    print(f"Element: {target} which also can be referred to as: {arr[result]} found at index: {result}")
else:
    print("Element not found.")

…but then in my debugger it gets stuck in a perpetual recursive loop.

If anyone could shed some more light on this, that would be great. Thanks.

It doesn’t become a linear search.

How does a binary search work?

It does this:

Are there any items? If no, the desired item isn’t there (obviously!).

Otherwise:

Is the desired item in the middle? If yes, found it.

Otherwise:

If the desired item is below the middle item, look in the bottom half, else look in the top half.

Any how does it look in the bottom or top half? By using a binary search.

So why the - 1 and + 1?

Well, at that point it knows that the middle item isn’t the desired one, so there’s no point in including it when looking in the bottom/top half. Why keeping looking where you know it isn’t?

And if you include it anyway? You’ll be including an item, that you already know isn’t that one you want, in both halves, and at some point you’ll searching a list of 1 item.

Half of a list containing 1 item will be a list of 1 item - the same list - if you include that 1 item.

1 Like

Up to this point I have been playing with a list of ~10 items. And by ‘playing’ I mean I have been substituting different search values and then watching each variable change using my debugger as the pointer traverses each line, one at time. In the name space box in my code editor, every mid value kept decrementing by one. So I set out to prove you wrong, @MRAB, using the technique below and I ended up proving you right. Ha! :slightly_smiling_face:

For my future reference, I’ve “rubber duckied” myself through the solution at the end of this post. While I understand most of it now, there is one minor quirk which I struggle to explain. Anyways, all of that is at below my code snippets. Have a read. Thanks.

To better model the issue, I began using a much longer list as a test case. I generated a list of 400 random integers (varying within the range of 100 to 10,000), sorted it, and printed it in my shell like so:

import random

arr = [random.randint(100, 10000) for _ in range(400)]
arr.sort()
print(arr)

Next I copied the list into my script like this:

def binary_search_recursive(arr, target, low, high):
    # Base case: element not found
    if low > high:
        return -1  

    mid = (low + high) // 2

    if arr[mid] == target:
        # Base case: element found at mid index
        return mid  
    elif arr[mid] > target:
        # Search left half
        return binary_search_recursive(arr, target, low, mid - 1)  
    else:
        # Search right half
        return binary_search_recursive(arr, target, mid + 1, high)  

arr = [184, 193, 204, 211, 213, 222, 231, 250, 272, 357, 448, 541, 579, 608, 620, 630, 641, 674, 677, 714, 728, 746, 765, 781, 836, 846, 868, 871, 911, 985, 1074, 1096, 1124, 1133, 1142, 1176, 1178, 1204, 1208, 1239, 1245, 1246, 1304, 1314, 1356, 1372, 1377, 1424, 1426, 1452, 1454, 1480, 1494, 1548, 1561, 1584, 1668, 1687, 1689, 1691, 1745, 1750, 1756, 1812, 1819, 1826, 1900, 1912, 1940, 1963, 1971, 2035, 2070, 2080, 2100, 2160, 2291, 2301, 2301, 2311, 2322, 2342, 2348, 2401, 2409, 2419, 2419, 2423, 2523, 2567, 2571, 2582, 2614, 2620, 2640, 2651, 2652, 2705, 2709, 2743, 2747, 2766, 2782, 2793, 2797, 2831, 2867, 2870, 2896, 2928, 2959, 2979, 3031, 3038, 3040, 3064, 3101, 3103, 3117, 3170, 3228, 3234, 3297, 3322, 3327, 3340, 3372, 3398, 3424, 3427, 3458, 3464, 3481, 3496, 3501, 3575, 3581, 3616, 3617, 3631, 3672, 3693, 3693, 3706, 3707, 3724, 3738, 3806, 3903, 3910, 3934, 3953, 3968, 4017, 4033, 4117, 4151, 4191, 4199, 4202, 4204, 4211, 4256, 4325, 4378, 4395, 4422, 4438, 4443, 4481, 4483, 4503, 4606, 4619, 4672, 4691, 4712, 4724, 4741, 4743, 4752, 4808, 4889, 4917, 4924, 4948, 4964, 4974, 5060, 5061, 5074, 5085, 5088, 5089, 5119, 5123, 5131, 5191, 5215, 5221, 5263, 5465, 5568, 5599, 5636, 5658, 5673, 5700, 5723, 5742, 5809, 5811, 5829, 5835, 5858, 5865, 5899, 5910, 5916, 5971, 6037, 6052, 6087, 6089, 6108, 6121, 6136, 6213, 6241, 6252, 6258, 6290, 6338, 6346, 6382, 6401, 6415, 6443, 6451, 6482, 6498, 6531, 6576, 6599, 6617, 6627, 6646, 6676, 6688, 6713, 6724, 6728, 6742, 6777, 6785, 6805, 6874, 6891, 6931, 6965, 6994, 7011, 7018, 7106, 7121, 7127, 7142, 7167, 7179, 7180, 7227, 7285, 7308, 7320, 7323, 7355, 7358, 7361, 7367, 7375, 7390, 7415, 7416, 7464, 7478, 7502, 7513, 7516, 7534, 7543, 7592, 7626, 7636, 7689, 7736, 7767, 7767, 7805, 7812, 7871, 7882, 7904, 7904, 7940, 7944, 7946, 7946, 7953, 8042, 8043, 8068, 8109, 8121, 8149, 8247, 8249, 8253, 8255, 8273, 8280, 8318, 8321, 8355, 8398, 8433, 8443, 8462, 8483, 8484, 8578, 8590, 8611, 8620, 8620, 8630, 8645, 8646, 8662, 8667, 8692, 8703, 8714, 8719, 8780, 8789, 8801, 8817, 8840, 8884, 8899, 8909, 8913, 8918, 8971, 8979, 9007, 9016, 9030, 9104, 9113, 9124, 9126, 9174, 9179, 9187, 9187, 9238, 9250, 9250, 9257, 9278, 9292, 9297, 9300, 9406, 9407, 9410, 9473, 9479, 9512, 9535, 9595, 9599, 9638, 9646, 9676, 9697, 9779, 9779, 9800, 9801, 9815, 9857, 9859, 9877, 9925, 9935, 9940, 9978, 9990]

print(f"The middle of a 400 item long list is the 199th's position which is: {arr[199]}")
print(f"The 99th item (which is the first quarter's exact position) is: {arr[99]}")
target = 2743

result = binary_search_recursive(arr, target, 0, len(arr) - 1)

if result != -1:
    print(f"Element: {target} which also can be referred to as: {arr[result]} found at index: {result}")
else:
    print("Element not found.")

What I am doing above is trying to design a scenario where a binary search algorithm reaches the base case easily on the second pass. This means if there is a list of 400 items, the middle item is 199 and the next saught after item would be at 99 or 299 positions. I figured if my script had to call the recursive function 100 times to get from 199 to 99 for example, then that would show it became a linear search. But to my suprise the function only executed twice. Therefore it is a true binary search. @MRAB is right.

Here is why.

When the function is first executed, when searching for 2743 for example, the high position is initialized as 399, the low is 0, and the mid is first defined as 199. So far so good. Since the resulting 199 item is 5221 and is not equal to the target of 2743, the pointer continues its traversal down to the next conditional where 5221 is greater than 2743. What happens next is my missing puzzle piece: The array, the target, the low are all the same and are passed into the next recursive call, but the new high value is based on the old mid - 1. While high started out as 399, as the next recursive function call is executed, the new high value becomes mid which is 199 - 1 (so 198). Python then continues to declare mid again with low being the same 0 but since new high is 198, the floor division of 2 turns the new mid into 99. Then the base case is reached because position 198 is 2743 which is the same as the target. They are equal. Algorihtm complete.

That addresses pretty much all of my prior confusion. Binary search with recursion makes sense I guess.

As far as I am concerned, that should be the end but then what happens next in my debugger baffles me: After visiting line 10 (return mid) where it should end, the pointer continues by moving down 3 more lines. It doesnn’t even stop at elif . It proceeds to visit return binary_search_recursive(arr, target, low, mid - 1) one more time. According to my editor, high in my name space visualizer miraculously becomes 399 again and mid becomes 199. Only after that does the pointer go back to where I expect it to where the print statements show the desired and expected final output which says: “Element: 2743 which also can be referred to as: 2743 found at index: 99”.

While the output matches what I expect in the end, I don’t understand what my debugger is doing right after the base case is supposedly triggered. If anyone could shed some additional light onto what is happening at the end there, I am all ears.

1 Like

When a function returns, it returns to its caller.

The return mid doesn’t magically return all the way to the top level in one go. It returns to its caller, which happens to be a return statment, so on the next step it returns to its caller, etc. It works its way back up the levels, step by step, each time you single-step.