Selection Sort algorithm

enoren5 · August 14, 2024, 4:34pm

I am learning algorithmic design. As I am tackling Bubble Sort in another thread, I am venturing to explore a similar and related algorithm: Selection Sort.

While bubble sort compares each element next to each other and swaps positions until the largest integer value is “bubbled up” or pushed to the right most position and stops when the entire list is sorted, with the selection sort, it is the smallest (minimum) element which is pushed to the front. Based on what I have learned so far in the thread linked to above about bubble sort, I have done my best to re-fashion it for selection sort, including using tuple unpacking.

Below is my selection sort algorithm so far and its output.

Please be advised that while this is not for a course at a college or university, I am pretending that I am so my ask is that for those who reply, please provide hints, tips, and references to documentation around the web instead of providing a complete algorithm for me. This way I am trying to preserve my learning experience.

arr = [5,3,4,1,2]

print(f"Original list: {arr}")


def sorting_by_selection(arr):
    for iteration in range(0,len(arr)):
        for i in range(iteration):
            element1 = arr[i]
            element2 = arr[i+1]
            if element1 < element2:
                minimum = element1
                arr[minimum], arr[i] = arr[minimum], arr[i]

    return arr

sorted_list = sorting_by_selection(arr)

print(f"Final sorted list: {sorted_list}")

$ python Selection-Sort/script1.py
Original list: [5, 3, 4, 1, 2]
Final sorted list: [5, 3, 4, 1, 2]

MRAB · August 14, 2024, 6:21pm

You have element1 = arr[i], so element1 is a value, and you have minimum = element1, so minimum is a value, but then you have arr[minimum], arr[i] = arr[minimum], arr[i], where you’re using minimum, a value, as an index.

bschubert · August 14, 2024, 7:18pm

The idea behind selection sort is that, for each element in the array, you swap the current element with the next minimum element, which is “selected” from all elements that appear after the current element.

In your code, the “current element” is tracked by the index in iteration. You’ll note that nowhere in the current code is there a line that writes to arr[iteration] —that’s probably where things are going kerflooey. In a complete selection sort implementation, you’d expect exactly one write to arr[iteration] in each iteration of the outer loop.

Something that can help with catching issues like that and the one Barry mentioned is giving your variables names that describe their specific semantic meaning in the algorithm. For example, instead of iteration, i, and minimum, you might use something like write_index, search_index, and min_index which would make it more obvious that 1) you’re missing a write to the thing called write_index and 2) you’re assigning something other than an index to a thing called min_index, 3) the inner loop is looking at elements other than the ones as min_index and search_index

enoren5 · August 18, 2024, 6:17am

This is coming in very clear. I created an index but I never referred to it. Thank you for poiting this out.

Thank you for this suggestion. I have since replaced my misnamed index variables with the ones you recommended. I have reworked my script.

Below are my next two attempts. While my algorithms don’t sort by selection as intended, I am making progress because the outputs are partially sorted although still flawed. Feedback, tips, and hints welcome.

Below each script are the expected output in my shell and the actual output.

def sorting_by_selection(arr):
    for write_index in range(0,len(arr)):
        for search_index in range(write_index):
            element1 = arr[search_index]
            element2 = arr[search_index+1]
            if element1 < element2:
                min_index = element1
                arr[min_index], arr[search_index] =  arr[search_index], arr[min_index]
            else: 
                continue
    return arr

$ python Selection-Sort/script2.py
Original list: [5, 3, 4, 1, 2]
Final sorted list: [5, 1, 4, 3, 2]

def sorting_by_selection(arr):
    for write_index in range(0,len(arr)):
        for search_index in range(write_index): #,0):
            element1 = arr[search_index]
            element2 = arr[search_index+1]
            if element1 < element2:
                arr[write_index] = element1
                element1, element2 =  element2, element1
            else: 
                continue
    return arr

$ python Selection-Sort/script3.py
Original list: [5, 3, 4, 1, 2]
Final sorted list: [5, 3, 3, 1, 1]

bschubert · August 20, 2024, 7:52pm

Making progress! Here’s some miscellaneous tips and resources to help you through the next steps.

Thinking about algorithms

Sorting algorithms work by establishing some invariant and then gradually working through the data while maintaining that invariant until the invariant becomes equivalent to the statement “the list is fully sorted”.

As you’re writing your code, think carefully about what invariant the algorithm is trying to maintain. For each line in your program, ask yourself “what is this code doing?” and “are there under conditions under which this code could break my invariant?”. If the answer to the second question is yes, make sure that a) your answer to “what is this code doing?” fits your algorithm (and not a different one) and b) that your code guards against the condition that can break your invariant.

For selection sort, the invariant that you want to maintain is ^{(hidden as spoiler - try to answer this for yourself, then click to reveal)} that the entries to the left of write_index are in their fully sorted positions.

Every time you assign to the original list (arr[abc] = xyz), make sure that you can explain to yourself why the assignment is guaranteed to maintain the invariant (think about where abc and xyz are coming from, what had to be true to reach the current state, etc.). If the assignment isn’t guaranteed to maintain the invariant, you haven’t implemented the algorithm correctly.

Debugging

You have some code, and it’s not doing what you want it to do. This is a situation you’ll find yourself in for 90%^[1] of your programming career. This is a great time to invest in learning one of the more critical skills in programming: debugging.

Good debugging is huge topic (there’s entire books written on it!). Below are a few general strategies you can try out. You can find much for information about each one by searching online.

1. Rubber Duck Debugging

Go through your program line-by-line and explain (optionally to an inanimate of your choice) what each line is doing. This can help force you to think about your code from a different perspective, and make it apparent when what the code is supposed to do and what it actually does aren’t the same thing.

2. `print` logging

Add a bunch of print statements to your code. Focus on points where key state transitions occur. You don’t want to add too many, or the output will become unreadable. Good locations include beginning/end of functions, the beginning/end of loop bodies, the branches of an if/elif/else statement, etc.

Run your code. See how the state of your program evolved during execution and if it matches your expectations. Oftentimes there will be something very obviously suspicious. Focus on solving that one issue before moving on to the next. Rinse and repeat until the output is what you expected.

Here’s one of your attempts with some debugging print statements added, along with the corresponding output. You should notice a few things that seem off right away. For example, during the first iteration, the search_index never advances through the list.

sorting_by_selection with prints

def sorting_by_selection(arr):
    print(f"initial array: {arr=}")
    for write_index in range(0,len(arr)):
        print(f"{write_index=}")
        for search_index in range(write_index):
            element1 = arr[search_index]
            element2 = arr[search_index+1]
            print(f"  {search_index=}   {element1=} {element2=}")
            if element1 < element2:
                min_index = element1
                print(f"    swapping entries at {min_index=} and {search_index=}")
                arr[min_index], arr[search_index] =  arr[search_index], arr[min_index]
                print(f"    array now {arr=}")
            else:
                continue
    return arr

output

initial array: arr=[5, 3, 4, 1, 2]
write_index=0
write_index=1
  search_index=0   element1=5 element2=3
write_index=2
  search_index=0   element1=5 element2=3
  search_index=1   element1=3 element2=4
    swapping entries at min_index=3 and search_index=1
    array now arr=[5, 1, 4, 3, 2]
write_index=3
  search_index=0   element1=5 element2=1
  search_index=1   element1=1 element2=4
    swapping entries at min_index=1 and search_index=1
    array now arr=[5, 1, 4, 3, 2]
  search_index=2   element1=4 element2=3
write_index=4
  search_index=0   element1=5 element2=1
  search_index=1   element1=1 element2=4
    swapping entries at min_index=1 and search_index=1
    array now arr=[5, 1, 4, 3, 2]
  search_index=2   element1=4 element2=3
  search_index=3   element1=3 element2=2

3. Write reusable tests

Often times, in the course of debugging, you’ll fix one issue only to introduce a new one. A helpful way to catch that is by maintaining a suite of reusable tests that you can run every time you make changes to a piece of code.

Writing a representative test suite also has the secondary benefit of forcing you to think critically about the intended behavior of your code (e.g. exactly what combinations of inputs are allowable, what should happen for bad inputs, what relevant edge cases might exist…).

Some people even recommend writing a test suite before you even start writing the actual code (so-called test driven development).

For short scripts like this. it’s easy to your own ad-hoc test code. For example:

Example 1: Custom test runner

Code:

def sorting_by_selection(arr): 
    ...

if __name__ == "__main__":
    test_cases = [
        [1, 2, 3, 4, 5],  # sorted ascending
        [5, 4, 3, 2, 1],  # sorted descending
        [5, 3, 4, 1, 2],  # unsorted
        [],               # empty
        [1],              # single entry
        [2, 1],           # two entries
        [1, 2, 1, 2, 1]   # repeat entries
    ]

    for case in test_cases:
        expected = sorted(case)
        actual = sorting_by_selection(case)
        if actual != expected:
            print(f"FAILURE: {case=} {expected=} {actual=}")
        else:
            print(f"OK: {case=}")

Output:

$ python3 ./sorting_by_selection_demo.py
FAILURE: case=[2, 1, 4, 3, 5] expected=[1, 2, 3, 4, 5] actual=[2, 1, 4, 3, 5]
FAILURE: case=[5, 4, 3, 2, 1] expected=[1, 2, 3, 4, 5] actual=[5, 4, 3, 2, 1]
FAILURE: case=[5, 1, 4, 3, 2] expected=[1, 2, 3, 4, 5] actual=[5, 1, 4, 3, 2]
OK: case=[]
OK: case=[1]
FAILURE: case=[2, 1] expected=[1, 2] actual=[2, 1]
FAILURE: case=[2, 1, 1, 2, 1] expected=[1, 1, 1, 2, 2] actual=[2, 1, 1, 2, 1]

For more complicated code with multiple independently testable units, it can be helpful to use a test framework. The standard library has doctest (demonstrated below) and unittest. There’s also 3rd party options like pytest. Personally, I use a combination of doctest and pytest in most of my projects.

Example 2: Using doctests

Code:

def sorting_by_selection(arr):
    """
    >>> sorting_by_selection([1, 2, 3, 4, 5])  # sorted ascending
    [1, 2, 3, 4, 5]
    >>> sorting_by_selection([5, 4, 3, 2, 1])  # sorted descending
    [1, 2, 3, 4, 5]
    >>> sorting_by_selection([5, 3, 4, 1, 2])  # unsorted
    [1, 2, 3, 4, 5]
    >>> sorting_by_selection([])               # empty
    []
    >>> sorting_by_selection([1])              # single entry
    [1]
    >>> sorting_by_selection([2, 1])           # two entries
    [1, 2]
    >>> sorting_by_selection([1, 2, 1, 2, 1])  # repeat entries
    [1, 1, 1, 2, 2]
    """
    ...

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Output:

$ python3 ./sorting_by_selection_demo.py
**********************************************************************
File "sorting_by_selection_demo.py", line 17, in __main__.sorting_by_selection
Failed example:
    sorting_by_selection([1, 2, 3, 4, 5])  # sorted ascending
Expected:
    [1, 2, 3, 4, 5]
Got:
    [2, 1, 4, 3, 5]
**********************************************************************
File "sorting_by_selection_demo.py", line 19, in __main__.sorting_by_selection
Failed example:
    sorting_by_selection([5, 4, 3, 2, 1])  # sorted descending
Expected:
    [1, 2, 3, 4, 5]
Got:
    [5, 4, 3, 2, 1]
**********************************************************************
File "sorting_by_selection_demo.py", line 21, in __main__.sorting_by_selection
Failed example:
    sorting_by_selection([5, 3, 4, 1, 2])  # unsorted
Expected:
    [1, 2, 3, 4, 5]
Got:
    [5, 1, 4, 3, 2]
**********************************************************************
File "sorting_by_selection_demo.py", line 27, in __main__.sorting_by_selection
Failed example:
    sorting_by_selection([2, 1])           # two entries
Expected:
    [1, 2]
Got:
    [2, 1]
**********************************************************************
File "sorting_by_selection_demo.py", line 29, in __main__.sorting_by_selection
Failed example:
    sorting_by_selection([1, 2, 1, 2, 1])  # repeat entries
Expected:
    [1, 1, 1, 2, 2]
Got:
    [2, 1, 1, 2, 1]
**********************************************************************
1 items had failures:
   5 of   7 in __main__.sorting_by_selection
***Test Failed*** 5 failures.

4. Use a debugger

Debuggers allow your to run your code line-by-line and see the exact state of every variable as your program executes. Typically, your IDE will have one already integrated. See the links below for more on what they can do and how to use one:

Specific Hints

Selection Sort Animation

Here’s an animation of how selection sort sorts a list. Note in particular how the relevant indices advance through the list and at what points swaps occur.

(Entries in pink denote the current minimum and search indices)

Hint 1

for search_index in range(write_index):

This iterates over the range [0, write_index), i.e. the left sorted sublist. You want the search index to iterate over the right unsorted sublist [write_index, end].

Hint 2

element1 = arr[i]
element2 = arr[i+1]

Think carefully about which elements need to be compared inside the inner loop. They generally won’t be adjacent elements, like in bubble sort.

the other 10% is spent drinking your preferred caffeinated beverage and removing the cat from your keyboard ↩︎

enoren5 · August 31, 2024, 10:06am

@bschubert! Thank you for your detailed and throughful reply. I see you went to great lengths to support my effort to learn how to write a selection sort algorithm. I really appreciate the time and care you put into your reply. I wish I could have responded sooner. I apologize for the delay. Anyways, I am back!

All sorting algorithms use invariants. As you pointed out, invariants are usually the marker which designate the sorted elements to the left and remaining unsorted items to its right. In my previous code snippet, the name of my invariant was write_index but a more appropriate name might be min_index as you actually originally suggested. As you can see in my next attempt at this algorithm below, I’ve renamed the invariant to: smallest.

I’ve used pytest extensively in years previous on the PyBites and exercism.or platforms. They provide the unit tests to check whether the student’s Python script runs as expected. If there are 15 assertions in a unit test and I spend hours wrangling in my Python interpreter, when I finally reach a point where all the unit tests light up in green - -it’s a head rush!

While writing unit tests are best practices, for now I am exploring your other recommendations - - using debugging, print logging, and rubber ducky programming.

Here is my latest attempt with a discussion afterwards about how I understand the code should work vs how it actually works:

arr = [5,3,4,1,2]
def sorting_by_selection(arr):
    for i in range(len(arr)-1):
        smallest = arr[i]
        for j in range(i+1, len(arr)):
            if arr[j] < smallest:
                arr[smallest], arr[j] = arr[j], arr[smallest]
        arr[i] = arr[smallest]
    return arr
sorted_list = sorting_by_selection(arr)

Here is the same code snippet but with inline print statements:

arr = [5,3,4,1,2]
print(f"Original list: {arr}")
def sorting_by_selection(arr):
    print(f"Initial array: {arr=}")
    for i in range(len(arr)-1):
        print(f"    Outer loop iteration: {i=}")
        smallest = arr[i]
        print(f"    Invariant: {smallest=}")        
        for j in range(i+1, len(arr)):
            print(f"        Inner loop iteration: {j=}, {smallest=}, {arr=}")
            if arr[j] < smallest:
                print(f'        Swapping entries: {arr[j]=}, {smallest=}')
                arr[smallest], arr[j] = arr[j], arr[smallest]
        arr[i] = arr[smallest]
                print(f'        New: {arr=}')
    return arr
sorted_list = sorting_by_selection(arr)
print(f"Final sorted list: {sorted_list}")

Here is the traceback:

$ python Selection-Sort/script5.py
Original list: [5, 3, 4, 1, 2]
Initial array: arr=[5, 3, 4, 1, 2]
    Outer loop iteration: i=0
    Invariant: smallest=5
        Inner loop iteration: j=1, smallest=5, arr=[5, 3, 4, 1, 2]
        Swapping entries: arr[j]=3, smallest=5
Traceback (most recent call last):
  File "/home/<user>/dev/projects/python/2018-and-2020/algos/Selection-Sort/script5.py", line 20, in <module>
    sorted_list = sorting_by_selection(arr)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<user>/dev/projects/python/2018-and-2020/algos/Selection-Sort/script5.py", line 16, in sorting_by_selection
    arr[smallest], arr[j] = arr[j], arr[smallest]
                                    ~~~^^^^^^^^^^
IndexError: list index out of range

There are quite a few observations to be had. Here is me rubber ducky’ing my way through the algorithm above:

The invariant is established up front in the outer loop as smallest. Since it is the first iteration, it is in the left most index position which is the integer of 5 which is the largest value. The invariant has to start somewhere and in this context, it is the left most position which just happens to be the largest integer in the array. This means that it will be the first to be swapped.
In the iteration of the next inner loop, that invariant is checked against the second index position of the array which is the integer value of 3.
At the next line is a condition which compares these two values. Since the case determins that 3 is indeed less than 5, the conditional is triggered and proceeds to the next line and then is supposed to complete the swap via tuple unpacking.
At this point I am expecting Python to reformat the array from this: [5,3,4,1,2] to this: [3,5,4,1,2].
Instead the traceback throws an IndexError telling me that the “list index is out of range”. Not sure why.

Using my debugger I step through each line above which all seems to confirm my rubber ducky explanation.

UPDATE: Having spent a considerable amount of time writing the above analysis, I am now noticing that the ‘swapping’ of the first two list items because “5 is larger than 3” - - such behaviour is for a bubble sort algorithm which is not the same thing as selection sort. So the analysis above is (partially) wrong / misguided.

Arriving at this updated realization is proof that rubber ducky programming is helping!

I think I see more clearly when @bschubert says that the invariant needs to remain “guaranteed” as it proceeds to check every list item in a selection sort. My current script above doesn’t do that yet.

Having said all that, this is what I need to do next to complete my selection sort algorithm:

First declare arr[i] to be the smallest element (invariant) seen in the subarray so far (which I have done), and then go through the rest of the subarray (which I tried to fashion with the second loop) updating the index of the smallest element every time I find an element less than the current smallest (I really think I am still kind of already doing most of that as-is).

I am on the right track although I am still off slightly. Any additional hints / tips / insight that you folks can provide would be greatly appreciated. Thank you.