What is the purpose of this recursive function?

Hi all,

I’ve been following a Python course from thegreatcourses.com and towards the middle of the 21st lecture, an example of a recursive function is given, and I cannot for the life of me work out why it calls itself or even how it’s supposed to achieve what it sets out to do.

The course is paid so I can’t link the video, but I’ll do my best to describe what the function is supposed to do, and then I’ll post the code.

Basically the function takes a list as its only parameter and then divides it in two, storing the leading half of its elements in one variable and the rest in another. It then recursively calls itself twice, passing the first half of the list to the first call and the second half to the second call. Finally it calls a separately defined merge() function, passing in each half of the list.

merge() merges the two halves of the list (which at this point are supposedly sorted, although I can’t quite work out how this is meant to have happened). There’s an accompanying materials pack with the course that contains a list of any errors that the producers noticed after the video was made, but they don’t mention anything about it in there. Anyway, here is the code.

def mergeSort(l):

    n = len(l)
    # if the input list has only one element, or is empty, there is no need to sort, so return
    if n <= 1:
        return

    # divide the input list into two lists
    l1 = l[:n//2]
    l2 = [n//2:]

    # recursive calls, passing in the two halves of the divided list
    mergeSort(l1)
    mergeSort(l2)

    # merge the elements of the two halves of the divided list into one list
    merge(l1, l2, l)

def merge(a, b, c):

    '''merge list a and list b into list c'''

    i = 0 # counter to track the index of the element in the merged list c whose value is
          # being calculated for a given iteration of the while loop
    j = 0 # counter to track the index of the element in a, being considered for c, in a given
          # iteration of the loop
    k = 0 # counter to track the index of the element in b, being considered for c, in a given
          # iteration of the loop


    # while at least one of j or k is a valid index into its respective list, run the loop
    while j < len(a) or k < len(b):

        if j < len(a):

            if k < len(b):
                # both indices are valid - we haven't consumed all the elements of either.
                # Compare a[j] and b[k], adding the element with the smallest value to
                # the final list
                if a[j] < b[k]:
                    c[i] = a[j]
                    j += 1
                else:
                    c[i] = b[k]
                    k += 1

            else:
                # we've consumed all elements of b, add the next element of a to the final list
                c[i] = a[j]
                j += 1

        else:
            # we've consumed all elements of a, add the next element of b to the final list
            c[i] = b[k]
            k += 1

        # advance to the next slot of c on every iteration of the loop
        i += 1

I know merge() isn’t very pythonic; I think the reason for that is that the particular section of the course in question addresses algorithms, specifically writing them out using pseudo-code and then implementing them. It’s not stated explicitly, but I get the impression that during this section of the course the focus shifts slightly from pure Python to teaching a general computer science concept, using Python as a vehicle to do so.

Anyway, I fully understand the merge() routine; what I don’t understand is how the lists given to merge() become sorted. I can only imagine it’s got something to do with the mutability of l (the initial list passed into the first call to mergeSort()). Either way, the course material doesn’t go very far out of its way to explain what’s actually going on here.

I’ve copied the example from the course perfectly, save for changing the name of a few variables to something that I felt was easier to follow, but if anyone wants to see a screenshot of the code in the lecture video I’m happy to provide.

Any help is appreciated as always.

Recursive merge sort! Classic divide and conquer algorithm. I would agree with you that this is more about algorithms in general than Python in particular. Universities might teach this in a second year general computer science or algorithms course. I also agree with your impression of the merge routine - that is a version of the merge function that could be written in any imperative programming language [1], which as you noted tends to be how algorithms are taught (implementing a “pseudocode” version) since it keeps the focus on the structure of the algorithm instead of the details of a particular language.

Personally I understand it best either by seeing a diagram or other visual representation, or by working through small examples by hand. The diagram here is the kind of thing I mean: Merge sort algorithm overview (article) | Khan Academy

Have you encountered other recursive functions? If not, there are more intuitive ones that might help the concept “click”, but the key thing with recursion is always the “base case” - what your data looks like when the function stops calling itself.


  1. Roughly speaking, languages that work primarily by writing out the steps to execute and running them in order to change variables. More formally: Imperative programming - Wikipedia ↩︎

2 Likes

Do you understand why, if you take the elements of a list, split them into two other lists, and sort each, then you have two sorted lists?

Do you understand why, if you have those two sorted lists and then merge their contents together by the algorithm shown, the result will also be sorted?

Do you understand why, if those two sorted lists contain (between them) exactly the elements that you started out with, then the resulting sorted list will also have exactly the elements that you started with?

Therefore, do you see why “split a list into two halves, sort each half, and merge the halves together” is a reasonable way to sort a larger list (as long as “split into two halves” actually made smaller lists)?

Do you understand why the merging step would not care how the sorted halves became sorted?

Then, if we are implementing the logic “sort a list by splitting it into two halves, sorting each half and then merging the results”, why does adding the word “recursively” before “sorting each” cause a conceptual problem?

Do you mean, how merge is sorting the lists? It isn’t. a and b get all of their elements put into c, in a sorted order. The original contents of c are irrelevant to this step of the process. The mergeSort shown here makes new lists from the original, and then uses merge to put the contents of those new lists back into the original storage space (since it is already of the correct length, and so that the sorting appears “in-place” to the user).
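A tiny sketch (my own, not the course code) of that distinction between the sliced copies and the original storage space:

```python
# Slicing a list creates new, independent lists; assigning to indices of
# the original list (as merge() does with c[i] = ...) mutates it in place.
original = [3, 1, 2]

left = original[:1]    # a new list: [3]
right = original[1:]   # a new list: [1, 2]

left.sort()            # sorting the copies...
right.sort()
print(original)        # ...does not touch the original: still [3, 1, 2]

# writing back by index is what makes the sort appear "in-place"
for i, value in enumerate(sorted(left + right)):
    original[i] = value

print(original)        # now [1, 2, 3]
```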

Or do you mean, why are the split half-lists sorted already, before merge is called? That’s because we made two, completely independent half-lists (by slicing), and sorted each of them (by using the mergeSort function).

Well, yes, the function is assigning to elements of that list. What exactly is the conceptual difficulty here? Were you expecting that a list passed to a function can’t be modified this way? It can, because it is the same list object, just like how you can write b = [] and then a = b and then a.append(1), and b is changed.

1 Like

Yep, this is classic and pure merge sort. Python uses a variant of this every time you call the sorted() function or a list’s .sort() method. So what you’re seeing here is exactly the sort of thing that happens inside Python every time you sort things!

While it’s definitely not essential to know this sort of thing, I do find it very helpful to get an idea of the work that’s happening behind the scenes.

1 Like

Thanks for the speedy reply! I actually laughed when I read this because I’d specifically deliberated whether to use the terms “recursive merge sort” and “divide and conquer” but was unsure if these were general terms or just buzzwords that the course writers had bandied around.

The article you linked was very helpful, thank you. One thing that I did get caught up on (which I think I’ve now managed to mentally process) is the flow of execution when the function calls itself recursively - something that doesn’t come across in diagrams of how the algorithm works.

According to the way it’s coded in my example, if one pictures the recursion as an inverted tree, I believe that on each level the flow of execution would prefer the left branch, until a point came where it began to step back up the tree (symbolic of the lowest-level mergeSort() invocations returning), at which point it would begin to take the right-hand branches, while still favouring any left-hand branches it encountered.

I’m not sure if that’s a good way of thinking about it but it helps me visualize how the recursion takes place, much like your method of working through small examples by hand.

The key thing I was missing, which I now understand, is that on each level of recursion the order of the elements in each of the two sub-arrays is not necessarily the same after the calls to mergeSort() as it was before.

As an example, say the lowest level of recursion (i.e. the level on which the base case occurs) is level 0. The two sub-arrays on level one then have a length of one, and the array a length of two. The sub-arrays a and b can be merged into the array c as they are naturally in order, owing to the fact that they only contain a single element each.

When the mergeSort() invocations on level two return, the sub-arrays that were passed to them (each of which was a whole array in the context of level one) have been sorted, meaning they in turn can be merged into the array for level two. This is repeated until execution returns to the top level. At this point the two sub-arrays of the original array have been sorted and can be merged into the original.
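To check my mental model of that left-first order, I wrote a little helper (my own, not from the course) that records the sub-lists in the order the recursive calls would visit them:

```python
def call_order(l, log=None):
    """Record the sublists visited, in the order mergeSort would visit them."""
    if log is None:
        log = []
    log.append(list(l))
    if len(l) <= 1:
        return log           # base case: nothing more to split
    mid = len(l) // 2
    call_order(l[:mid], log)  # left half first, all the way down
    call_order(l[mid:], log)  # right half only after the left returns
    return log

print(call_order([4, 3, 2, 1]))
# [[4, 3, 2, 1], [4, 3], [4], [3], [2, 1], [2], [1]]
```

The output matches the inverted-tree picture: the whole left subtree is visited before the right subtree is touched.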

I’ve used one or two for trivial stuff, and there was one I recall that I think was something to do with implementing a tkinter progress bar. I remember it because I had to work my smooth brain so hard that I felt like I could have moved a bag of flour using thought alone (in reality it was likely not hard).

The problem for me is that I don’t really see it working like that. I could understand if there were two distinct operations, sort then merge, but it seems to me that the merge is the sort.

The sorting is a by-product of the order in which all the merges occur. For example, if I extracted the merge() function from the code in my OP and passed it two unsorted lists plus a third list to merge into, the resulting list wouldn’t be sorted.

But by breaking an array down into tiny sub-arrays two elements in length, merging their elements back into them in order of value and then merging each sub-array, with another sub-array, into a larger array - using the same method - the merge() function ends up only ever working with sorted data, even though it kind of was never explicitly sorted.

When looking at this for the first time my only thought was “but where is the sort?!” As in my limited experience a sort is a bit more… conspicuous? I don’t really know, but there didn’t appear to be anything that resembled a sort in the code, to my eyes anyway.

This was basically my thought yes, but again I feel like there is some nuance to the manner in which the sub-arrays are sorted. There is nothing in the code for mergeSort() or merge() that suggests any sorting is taking place. One has to consider the effects of the recursion and follow it to its logical conclusion to understand how the data gets sorted, which of course often isn’t the case when a learner encounters the code for the first time.

1 Like

There is, but it’s subtle:

# Compare a[j] and b[k], adding the element with the smallest value to
# the final list
if a[j] < b[k]:

The input was two sorted lists. They are merged by iterating over both and taking the smaller element at each step (until one list is empty). So the result is a sorted list.
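A condensed sketch of that same idea (my own rewrite, not the course code): as long as both inputs are already sorted, repeatedly taking the smaller front element produces a sorted result.

```python
def merge_sorted(a, b):
    """Merge two already-sorted lists into a new sorted list."""
    out, j, k = [], 0, 0
    while j < len(a) and k < len(b):
        if a[j] < b[k]:          # the comparison - this is where the "sort" lives
            out.append(a[j]); j += 1
        else:
            out.append(b[k]); k += 1
    out.extend(a[j:])            # at most one of these has leftovers,
    out.extend(b[k:])            # and they are already in order
    return out

print(merge_sorted([1, 4, 7], [2, 3, 9]))   # [1, 2, 3, 4, 7, 9]
```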

1 Like

Maybe let’s try it this way.

Suppose I have a deck of cards - let’s say I write the numbers 1 through 52 on them, so we don’t have to argue about the proper sorting order. I shuffle the deck and give you half, then I sort mine in ascending order (according to the numbers I wrote on them), by any method I like. And I get you to sort the half I gave you, by any method you like, but using the same ascending order.

Now I take the pile back from you. I have two piles of cards, sorted in ascending order - so the smallest card from each pile is on top.

Now I’m going to “merge” the piles. My rule is this: I’m going to look at the top card of each pile, and whichever one is smallest, will be at the “top” of my final result. I’ll take that away, and then look at the top card of each pile again, and choose the second card that way. (Let’s just gloss over the fact that it’s tricky to put each new card at the bottom of the input pile with actual cards; data structures in a computer program don’t care about things like that.)

When I finish this process, I end up with a completely sorted deck. Are you surprised? Why?

I know for a fact that, when I chose the first card, I chose the correct card to put at the top of the deck. Why? Because when I shuffled and split the cards, it had to go into one of those piles; and no matter whether it went in your pile or my pile, we sorted our piles, so that card went to the top of whichever pile it was in. So, when I looked at the top card of each pile, it had to have been one of those two cards. And since I looked at and compared those two cards, I could choose the right one.

Then, after I take that card away, the remainder of that pile is still sorted, and the other pile is still sorted, so the same logic applies: whichever is the lowest-numbered out of the remaining cards, has to be at the top of one of those piles. So I only need to look at the top of each pile again, choose the next card… rinse and repeat. If one pile runs out early, I can take a shortcut and just add the other pile to the end; it’s already dealt with (pun intended).


Recursion is not special. It is just calling a function to do a smaller piece of the work, that happens to be the same one you’re writing. In the example above, the “any method you like” that you could have chosen to sort your cards (as long as I gave you at least two) could have been… to split it in halves, give half to a friend, sort yours some other way, etc. You could just as easily have split that half again by the same rule, and your friends could have too, except for the ones who only ended up with one card (who would just give it back).

The “sort” happens during the merge phase, where you take two candidates and select which one goes first. Since the two sub-arrays are sorted at this point, you know for sure that there are only two possible candidates [1] and one of those must be the next lowest value in the array. So sorting simply means picking from those.

The best way to find the place where sorting happens is to look for the comparison. In bubble sort, there’s a “compare and swap” behaviour. In selection sort, there’s “compare, keep or don’t keep”. In quick sort, the partitioning phase involves “compare to the pivot, then place in the upper or lower half”. Those are the places where the sorting happens.
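For instance, here’s where that “compare and swap” lives in a generic bubble sort sketch (my own, just to illustrate the point):

```python
def bubble_sort(items):
    """Sort by repeatedly comparing adjacent pairs and swapping if out of order."""
    items = list(items)   # work on a copy
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:                       # compare...
                items[i], items[i + 1] = items[i + 1], items[i]  # ...and swap
    return items

print(bubble_sort([3, 1, 2]))   # [1, 2, 3]
```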


  1. note that there are variants of merge sort that could have more than two sub-arrays, which means more candidates, but the same concept applies ↩︎

2 Likes

This is it exactly! With many recursive functions, you’ll see the “work” happening on the way back “up” the tree of recursive calls. (It’s not always the case but it can be a good intuition!) And thinking of the recursion as a tree like you described is a very common and (imo) helpful way of thinking about it.

I very much agree with this as well. There are some algorithms that sort without directly comparing pairs of values, but most of the more common algorithms work using comparisons, so this is a good rule of thumb!

I will tell you the ones that really made it click for me - different for everyone probably, so ymmv. :slightly_smiling_face:

I found the recursive algorithm for finding the length of a list to be something simple enough that I could keep it in my mind while still being a practical recursive function. If you want to try to figure it out without looking it up, it helps to have these parts:

def is_empty(alist):
    return alist == []

def first(alist):
    return alist[0]

def rest(alist):
    return alist[1:]

“first” gets the first item in a list if there is one, while “rest” gets everything in a list after the first item. You can make a function that uses recursion to calculate the length of a list with just those pieces. It’s not the most Pythonic way of doing this task; in some other languages recursion is much more heavily used and it’s more “idiomatic”. And after all, you could just use the len function :wink:

After list length, a nice next level is reversing a list with recursion. A bit more complicated, but you construct the solution in basically the same way! For that, the only other helper piece you need is to be able to add elements to a list, which you can do with just the + operator.

Those are two “linear” recursive algorithms that I find simple but beautiful. When you see them written down it seems so obvious in hindsight, so I’d guess it’s worth trying to figure them out on your own, but depending on how much you want to learn about this (it’s more theoretical than practical!) you can always just look them up too. :slightly_smiling_face:
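For anyone who’d rather see them than derive them, here’s one possible solution to each (spoilers, obviously - restating the helpers from above so it runs on its own):

```python
def is_empty(alist):
    return alist == []

def first(alist):
    return alist[0]

def rest(alist):
    return alist[1:]

def length(alist):
    # base case: the empty list has length 0
    if is_empty(alist):
        return 0
    # otherwise: one (for the first item) plus the length of the rest
    return 1 + length(rest(alist))

def reverse(alist):
    # base case: the empty list reversed is the empty list
    if is_empty(alist):
        return []
    # otherwise: the rest reversed, with the first item moved to the end
    return reverse(rest(alist)) + [first(alist)]

print(length([10, 20, 30]))    # 3
print(reverse([10, 20, 30]))   # [30, 20, 10]
```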

1 Like

I don’t think I’m making myself clear. I understand how the merge works, and I understand that merging two sorted arrays will yield a larger sorted array. What I am saying is that it’s not immediately obvious how the sub-arrays passed to merge() get sorted in the first place.

The best way to explain what I mean is that the success of the entire algorithm depends upon breaking the original array down into sub-arrays with a length of two, merging their elements back into them and then working back up the recursion tree from there, merging each sub into the main array on each level.

It would not be guaranteed to work if, for example, the base case was changed to:

n = len(l)

if n <=2:
    return

If that were the case and a merge() on the lowest level (on which a merge occurs) was, for argument’s sake, passed merge([2,1], [3,4], [2,1,3,4]), then the result would still be [2,1,3,4]. The sort mechanism is now broken because, normally, each subsequent merge() relies on the fact that all the calls to merge() that precede it were given sorted data as their input, in order for it to produce sorted data itself.
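To double-check myself I ran exactly that case against a self-contained copy of merge() (condensed from the version in my OP, same logic):

```python
def merge(a, b, c):
    """Merge a and b into c; only correct if a and b are each sorted."""
    i = j = k = 0
    while j < len(a) or k < len(b):
        # take from a when it still has elements and its front is smaller
        # (or b is exhausted); otherwise take from b
        if j < len(a) and (k >= len(b) or a[j] < b[k]):
            c[i] = a[j]; j += 1
        else:
            c[i] = b[k]; k += 1
        i += 1

c = [2, 1, 3, 4]
merge([2, 1], [3, 4], c)     # the left input is NOT sorted
print(c)                     # [2, 1, 3, 4] - still unsorted
```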

The only way to guarantee that is by having each of the lowest level merge()s merge two lists of one element in length, as those lists are inherently sorted.

All that is to say that, in my opinion at least, the mechanism by which the algorithm sorts the array is a little bit opaque.

All I’d say is that the fact one has to ruminate on the dynamics in play at the lowest levels of the recursion tree, to understand how the algorithm works as a whole, makes it something, if not special. It’s not something I’ve encountered writing a plain old non-recursive function.

Your cards analogy is very appropriate, but I feel like it’s more of an illustration of how ‘merge()’ functions. Would you not say it’s also important to consider how the data given to merge() (analogised by the two piles of cards) comes to be sorted in the first place?

Yes. It does indeed depend on going all the way down. If you like, you could replace the “merge each half” step with a different algorithm, and the merging would still be fine; the only reason you don’t see anything like that is that the base case is trivially sorted (a single element cannot be out of order).

1 Like

They get sorted in the first place because they were broken down into smaller arrays, which were themselves sorted and then merged.

Yes. So you do, in fact, understand it perfectly well. I maintain, however, that it’s really not as complicated as you make it out to be.

Yes. That’s because there exist length-2 lists that are not sorted, so failing to sort them would mean passing unsorted data to merge. The recursion would no longer split such a list into two length-1 lists and merge them together, so it wouldn’t get sorted.

No. merge only relies on the fact that its inputs are sorted. It doesn’t care how that happened. It especially doesn’t care about any previous calls to merge. For example, instead of calling itself recursively, you can have mergeSort invoke the standard library sorting routine on each half, and then call merge. This is also a valid (although pointless, since you could just use the standard library sort on the whole thing directly) sorting algorithm.
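A sketch of that variant (my own; merge is condensed from the original post, and half_and_half_sort is a made-up name):

```python
def merge(a, b, c):
    """Merge sorted lists a and b into c (same pattern as the original post)."""
    i = j = k = 0
    while j < len(a) or k < len(b):
        if j < len(a) and (k >= len(b) or a[j] < b[k]):
            c[i] = a[j]; j += 1
        else:
            c[i] = b[k]; k += 1
        i += 1

def half_and_half_sort(l):
    n = len(l)
    if n <= 1:
        return
    l1 = sorted(l[:n // 2])   # standard library sort instead of recursion
    l2 = sorted(l[n // 2:])
    merge(l1, l2, l)          # merge doesn't care how the halves got sorted

data = [5, 2, 4, 1, 3]
half_and_half_sort(data)
print(data)   # [1, 2, 3, 4, 5]
```

Pointless as a real algorithm, but it demonstrates that merge only needs its inputs sorted, by any means at all.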

They may be sorted by any mechanism before merge is called. Recursion is only one possible option. But, yes, “length-1 lists are inherently sorted” is the reason why the recursive splitting process is able to feed sorted data to merge consistently. Recursion needs a base case.

But that’s not anything to do with understanding recursion. It’s a matter of understanding the specific problem being solved.

The general form of recursion involves reaching a base case that can’t be solved by breaking the problem down any further, because it is already “atomic”, yes. But sometimes there is some hard-coded work to do at this point, still. For example, if we recursively modify a tree, then recursion allows us to find each node, but the algorithm still has to include the code for modifying a node.

Because that’s what you specifically appeared to be wondering about at the point when I offered it.

No, because understanding this is completely independent from explaining why the merge step works.

Once you understand the merge, you can understand the overall algorithm, but if you don’t understand the merge then that has to be addressed first, and separately.

That said, after that there is barely anything to understand. As I put it, recursion is not special. If we can sort 52 cards by splitting them into two piles of 26, sorting each, and merging them; and given that merging the sorted piles doesn’t care about how they were sorted[1] ; then obviously we can choose “do the split-sort-merge thing again” as the way that we sort the smaller piles.

The only problem comes up when the pile is too small to split any more, and at that point, “a pile of one card is already sorted” is the only interesting thing to realize. But even if that were somehow not true, it wouldn’t matter - we would just say “if we have a pile of one card, then we’ll sort it with the special one-card-pile sorting algorithm; otherwise we’ll do the split-sort-merge thing”. And in fact, that is exactly what we do. It’s just that “the special one-card-pile sorting algorithm” consists of “just hand the pile back”.

And if we look at it in that framework, then obviously changing the rule to “if we have a pile of two cards, we’ll just hand that back as well” will break the algorithm, because we know that the two cards could be the wrong way around, and that the merge step won’t work properly in that case. We don’t have to think through all the steps or draw a “recursion tree” or talk about “the stack” or anything like that. We only need to reason about the pre-conditions of merge (the inputs need to be sorted) and the post-conditions of splitting a list (each half is necessarily sorted if it’s one card long, and might not be sorted otherwise).


  1. which obviously it doesn’t, because the cards don’t remember anything about that experience, they just… exist in a particular order ↩︎

More practically, it is completely valid - and often correct - to sort the smallest sublists (anything below some predefined threshold) using a different algorithm, such as insertion or selection sort. Those algorithms have bad asymptotic behaviour but can be more efficient than merge sort with small data sets. Python’s own sorting algorithm (Timsort) does this, flipping to insertion sort for anything less than (I think) 64 elements.
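A rough sketch of that hybrid idea (the threshold of 32 here is arbitrary, purely for illustration - it is not Timsort’s actual tuning):

```python
import random

THRESHOLD = 32   # arbitrary cut-off for illustration

def insertion_sort(l):
    """Sort l in place; quadratic, but cheap on small lists."""
    for i in range(1, len(l)):
        x = l[i]
        j = i - 1
        while j >= 0 and l[j] > x:
            l[j + 1] = l[j]   # shift larger elements right
            j -= 1
        l[j + 1] = x

def hybrid_merge_sort(l):
    if len(l) <= THRESHOLD:
        insertion_sort(l)     # fall back below the threshold
        return
    mid = len(l) // 2
    l1, l2 = l[:mid], l[mid:]
    hybrid_merge_sort(l1)
    hybrid_merge_sort(l2)
    # merge back into l (same pattern as the merge() in the original post)
    i = j = k = 0
    while j < len(l1) or k < len(l2):
        if j < len(l1) and (k >= len(l2) or l1[j] < l2[k]):
            l[i] = l1[j]; j += 1
        else:
            l[i] = l2[k]; k += 1
        i += 1

data = random.sample(range(1000), 200)
expected = sorted(data)
hybrid_merge_sort(data)
print(data == expected)   # True
```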

4 Likes

Lots of things seem simple after you understand them :wink:

1 Like

Anyone else notice that line 10 should have been

l2 = l[n//2:]

I’m also going to point out the obvious, never never never use “l” as a variable name.

I mean, this can be true or not true depending on how you frame it. Taken in isolation, a single call to merge() doesn’t care how its inputs were sorted, it just requires that they are.

But, in the broader context of the algorithm, the inputs for each call to merge() are the cumulative result of the merge() calls that occur below them in the recursion tree.

For example, the call to mergeSort() that is the root of the tree contains the merge() that should yield the final, sorted array. But its inputs are the product of lower calls to merge() and will only be sorted if all those calls were given sorted data themselves. If even one was given unsorted data, this will be ‘passed back up’ the tree, poisoning the final result.

So in that sense, I’d say that calls to merge() do rely on those that have come before them. You could argue I’m quibbling over semantics, but I think looking at it that way is important to understanding the system as a whole.

Well spotted, I’m sorry I can’t edit it now (or at least I can’t work out how to).

Out of interest, why should a variable never be named l? I suppose the obvious answer is that it is too easily confused with other characters when working in certain fonts?

‘I’ and ‘l’ are nearly identical on my monitor here; the former is a capital ‘eye’ and the latter a lower case ‘ell’.

Well, first of all, the obvious. Lower case L is too easily mistaken for a one. Depending on the font, they could be indistinguishable. Secondly, the single letter is not descriptive. I try to avoid single letter names except in very specific cases. For example, when the scope is very narrow I may use x and y for coordinates, and w and h for width and height. If the scope is, for example, a small function and the usage is obvious, I would likely still add a comment like “calculate blah using the cursor coordinates and the display width and height”. Single variable names like i (for indexing a collection), again, where the scope is limited to a small loop, are also acceptable.

How it looks on mine:

And PEP 8 talks about it:

Names to Avoid

Never use the characters ‘l’ (lowercase letter el), ‘O’ (uppercase letter oh), or ‘I’ (uppercase letter eye) as single character variable names.

In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use ‘l’, use ‘L’ instead.