Basic binary search algorithm - wrangling with while loops and their conditions (x2)

Up to this point I have been playing with a list of ~10 items. And by ‘playing’ I mean I have been substituting different search values and then watching each variable change using my debugger as the pointer traverses each line, one at time. In the name space box in my code editor, every mid value kept decrementing by one. So I set out to prove you wrong, @MRAB, using the technique below and I ended up proving you right. Ha! :slightly_smiling_face:

For my future reference, I’ve “rubber duckied” myself through the solution at the end of this post. While I understand most of it now, there is one minor quirk which I struggle to explain. Anyways, all of that is at below my code snippets. Have a read. Thanks.

To better model the issue, I began using a much longer list as a test case. I generated a list of 400 random integers (varying within the range of 100 to 10,000), sorted it, and printed it in my shell like so:

import random

arr = [random.randint(100, 10000) for _ in range(400)]
arr.sort()
print(arr)

Next I copied the list into my script like this:

def binary_search_recursive(arr, target, low, high):
    # Base case: element not found
    if low > high:
        return -1  

    mid = (low + high) // 2

    if arr[mid] == target:
        # Base case: element found at mid index
        return mid  
    elif arr[mid] > target:
        # Search left half
        return binary_search_recursive(arr, target, low, mid - 1)  
    else:
        # Search right half
        return binary_search_recursive(arr, target, mid + 1, high)  

arr = [184, 193, 204, 211, 213, 222, 231, 250, 272, 357, 448, 541, 579, 608, 620, 630, 641, 674, 677, 714, 728, 746, 765, 781, 836, 846, 868, 871, 911, 985, 1074, 1096, 1124, 1133, 1142, 1176, 1178, 1204, 1208, 1239, 1245, 1246, 1304, 1314, 1356, 1372, 1377, 1424, 1426, 1452, 1454, 1480, 1494, 1548, 1561, 1584, 1668, 1687, 1689, 1691, 1745, 1750, 1756, 1812, 1819, 1826, 1900, 1912, 1940, 1963, 1971, 2035, 2070, 2080, 2100, 2160, 2291, 2301, 2301, 2311, 2322, 2342, 2348, 2401, 2409, 2419, 2419, 2423, 2523, 2567, 2571, 2582, 2614, 2620, 2640, 2651, 2652, 2705, 2709, 2743, 2747, 2766, 2782, 2793, 2797, 2831, 2867, 2870, 2896, 2928, 2959, 2979, 3031, 3038, 3040, 3064, 3101, 3103, 3117, 3170, 3228, 3234, 3297, 3322, 3327, 3340, 3372, 3398, 3424, 3427, 3458, 3464, 3481, 3496, 3501, 3575, 3581, 3616, 3617, 3631, 3672, 3693, 3693, 3706, 3707, 3724, 3738, 3806, 3903, 3910, 3934, 3953, 3968, 4017, 4033, 4117, 4151, 4191, 4199, 4202, 4204, 4211, 4256, 4325, 4378, 4395, 4422, 4438, 4443, 4481, 4483, 4503, 4606, 4619, 4672, 4691, 4712, 4724, 4741, 4743, 4752, 4808, 4889, 4917, 4924, 4948, 4964, 4974, 5060, 5061, 5074, 5085, 5088, 5089, 5119, 5123, 5131, 5191, 5215, 5221, 5263, 5465, 5568, 5599, 5636, 5658, 5673, 5700, 5723, 5742, 5809, 5811, 5829, 5835, 5858, 5865, 5899, 5910, 5916, 5971, 6037, 6052, 6087, 6089, 6108, 6121, 6136, 6213, 6241, 6252, 6258, 6290, 6338, 6346, 6382, 6401, 6415, 6443, 6451, 6482, 6498, 6531, 6576, 6599, 6617, 6627, 6646, 6676, 6688, 6713, 6724, 6728, 6742, 6777, 6785, 6805, 6874, 6891, 6931, 6965, 6994, 7011, 7018, 7106, 7121, 7127, 7142, 7167, 7179, 7180, 7227, 7285, 7308, 7320, 7323, 7355, 7358, 7361, 7367, 7375, 7390, 7415, 7416, 7464, 7478, 7502, 7513, 7516, 7534, 7543, 7592, 7626, 7636, 7689, 7736, 7767, 7767, 7805, 7812, 7871, 7882, 7904, 7904, 7940, 7944, 7946, 7946, 7953, 8042, 8043, 8068, 8109, 8121, 8149, 8247, 8249, 8253, 8255, 8273, 8280, 8318, 8321, 8355, 8398, 8433, 8443, 8462, 8483, 8484, 8578, 8590, 8611, 8620, 8620, 8630, 8645, 8646, 8662, 8667, 8692, 8703, 8714, 8719, 8780, 8789, 8801, 8817, 8840, 8884, 8899, 8909, 8913, 8918, 8971, 8979, 9007, 9016, 9030, 9104, 9113, 9124, 9126, 9174, 9179, 9187, 9187, 9238, 9250, 9250, 9257, 9278, 9292, 9297, 9300, 9406, 9407, 9410, 9473, 9479, 9512, 9535, 9595, 9599, 9638, 9646, 9676, 9697, 9779, 9779, 9800, 9801, 9815, 9857, 9859, 9877, 9925, 9935, 9940, 9978, 9990]

print(f"The middle of a 400 item long list is the 199th's position which is: {arr[199]}")
print(f"The 99th item (which is the first quarter's exact position) is: {arr[99]}")
target = 2743

result = binary_search_recursive(arr, target, 0, len(arr) - 1)

if result != -1:
    print(f"Element: {target} which also can be referred to as: {arr[result]} found at index: {result}")
else:
    print("Element not found.")

What I am doing above is trying to design a scenario where a binary search algorithm reaches the base case easily on the second pass. This means if there is a list of 400 items, the middle item is 199 and the next saught after item would be at 99 or 299 positions. I figured if my script had to call the recursive function 100 times to get from 199 to 99 for example, then that would show it became a linear search. But to my suprise the function only executed twice. Therefore it is a true binary search. @MRAB is right.

Here is why.

When the function is first executed, when searching for 2743 for example, the high position is initialized as 399, the low is 0, and the mid is first defined as 199. So far so good. Since the resulting 199 item is 5221 and is not equal to the target of 2743, the pointer continues its traversal down to the next conditional where 5221 is greater than 2743. What happens next is my missing puzzle piece: The array, the target, the low are all the same and are passed into the next recursive call, but the new high value is based on the old mid - 1. While high started out as 399, as the next recursive function call is executed, the new high value becomes mid which is 199 - 1 (so 198). Python then continues to declare mid again with low being the same 0 but since new high is 198, the floor division of 2 turns the new mid into 99. Then the base case is reached because position 198 is 2743 which is the same as the target. They are equal. Algorihtm complete.

That addresses pretty much all of my prior confusion. Binary search with recursion makes sense I guess.

As far as I am concerned, that should be the end but then what happens next in my debugger baffles me: After visiting line 10 (return mid) where it should end, the pointer continues by moving down 3 more lines. It doesnn’t even stop at elif . It proceeds to visit return binary_search_recursive(arr, target, low, mid - 1) one more time. According to my editor, high in my name space visualizer miraculously becomes 399 again and mid becomes 199. Only after that does the pointer go back to where I expect it to where the print statements show the desired and expected final output which says: “Element: 2743 which also can be referred to as: 2743 found at index: 99”.

While the output matches what I expect in the end, I don’t understand what my debugger is doing right after the base case is supposedly triggered. If anyone could shed some additional light onto what is happening at the end there, I am all ears.