Sorting inconsistency

Hello, guys

I am playing around with an exercise where the goal is to find the most frequent character in the sentence.

For some reason when I use “sorted” function the tuples in the list that is being sorted get sorted differently when I run the code multiple times. What is going on?

Here is the code:

sentence = "This is a common interview question"
letters = []
numbers = []
for x in sentence:
    letters.append(x), numbers.append(sentence.count(x))
result = list(zip(letters, numbers))
result_modded = set(result)
answer = sorted(result_modded, key=lambda item: item[1])
# print(f"most frequent letter is: {answer[-1]}")
print(answer)


And this is the result in the terminal after 10 runs:

[('c', 1), ('h', 1), ('a', 1), ('T', 1), ('q', 1), ('v', 1), ('r', 1), ('u', 1), ('w', 1), ('m', 2), ('t', 2), ('e', 3), ('n', 3), ('s', 3), ('o', 3), (' ', 5), ('i', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('c', 1), ('v', 1), ('r', 1), ('w', 1), ('u', 1), ('T', 1), ('h', 1), ('a', 1), ('q', 1), ('m', 2), ('t', 2), ('o', 3), ('n', 3), ('e', 3), ('s', 3), ('i', 5), (' ', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('u', 1), ('h', 1), ('a', 1), ('v', 1), ('w', 1), ('r', 1), ('T', 1), ('c', 1), ('q', 1), ('m', 2), ('t', 2), ('n', 3), ('o', 3), ('e', 3), ('s', 3), ('i', 5), (' ', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('c', 1), ('w', 1), ('a', 1), ('q', 1), ('u', 1), ('r', 1), ('v', 1), ('h', 1), ('T', 1), ('m', 2), ('t', 2), ('s', 3), ('o', 3), ('e', 3), ('n', 3), (' ', 5), ('i', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('w', 1), ('v', 1), ('q', 1), ('u', 1), ('r', 1), ('h', 1), ('a', 1), ('T', 1), ('c', 1), ('t', 2), ('m', 2), ('e', 3), ('o', 3), ('n', 3), ('s', 3), (' ', 5), ('i', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('h', 1), ('r', 1), ('w', 1), ('q', 1), ('u', 1), ('c', 1), ('a', 1), ('v', 1), ('T', 1), ('t', 2), ('m', 2), ('n', 3), ('s', 3), ('o', 3), ('e', 3), ('i', 5), (' ', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('q', 1), ('v', 1), ('T', 1), ('c', 1), ('h', 1), ('u', 1), ('a', 1), ('w', 1), ('r', 1), ('m', 2), ('t', 2), ('n', 3), ('e', 3), ('o', 3), ('s', 3), (' ', 5), ('i', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('v', 1), ('c', 1), ('w', 1), ('u', 1), ('T', 1), ('q', 1), ('r', 1), ('h', 1), ('a', 1), ('m', 2), ('t', 2), ('s', 3), ('n', 3), ('o', 3), ('e', 3), (' ', 5), ('i', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('v', 1), ('w', 1), ('u', 1), ('a', 1), ('r', 1), ('c', 1), ('T', 1), ('h', 1), ('q', 1), ('t', 2), ('m', 2), ('s', 3), ('o', 3), ('e', 3), ('n', 3), (' ', 5), ('i', 5)]
PS C:\Hello World> & C:/Users/kdonc/AppData/Local/Programs/Python/Python310/python.exe "c:/Hello World/app.py"
[('h', 1), ('v', 1), ('c', 1), ('w', 1), ('q', 1), ('u', 1), ('a', 1), ('r', 1), ('T', 1), ('t', 2), ('m', 2), ('o', 3), ('s', 3), ('n', 3), ('e', 3), ('i', 5), (' ', 5)]

When list.sort() or sorted() is given items which compare equal (here because of your key function), it makes sure that they stay in the same order as the the input had them. That allows you to do things like combine sorts. But because sets are unordered, it’s preserving the semi-random order that produces.

As an aside, the way you’re doing this right now is rather inefficient - you’re re-computing the letter-number pair several times, looping over the string with count() each time. You should probably look into using dicts instead. Alternatively if you do want to keep your current algorithm and only want the largest value in the set, max() also supports a key function.

There are two issues here. (1) Sets are not ordered, meaning that there
is no predictable order in the set. So when you form the set:

result_modded = set(result)

it mixes up the order of the tuples according to the hash of the
strings, which is unpredictable. Then you sort the tuples by the
count, which preserves the order of items where the count is a tie.

So although sorted will ensure that all the tuples with count 1 come
first, then tuples with count 2, then count 3, and so on, within each
group, the order of tuples will depend on how the set happened to be
formed. So the order of:

('c', 1), ('h', 1), ('a', 1), ('q', 1), ('v', 1)

etc is unpredictable, although sorted will always keep the count 1
tuples together.

(2) Why does that order change from run to run? Because the hash of
strings changes from run to run. This is deliberate, as there are
security attacks on Python programs that attackers can do if they can
predict the hash of strings. To prevent that, Python uses a different,
unpredictable hash system for strings each time you start up the
interpreter.

The bottom line here is that what you are seeing is perfectly normal and
nothing to worry about. Although the order of the letters will move
around within each group, the groups (count 1, count 2, count 3 etc)
will always be sorted.

If you want to keep a consistent order within each group, you have to
either:

(a) remove the call to set. Remove this line:

result_modded = set(result)

and change this:

# Before
sorted(result_modded, key=lambda item: item[1])

# After
sorted(result, key=lambda item: item[1])

then the results will be sorted firstly by the count, and ties will be
sorted according to the order in the original string.

Alternatively you can:

(b) sort by both the count and the letter:

# Before
sorted(result_modded, key=lambda item: item[1])

# After
sorted(result, key=lambda item: (item[1], item[0]))

which will sort ties alphabetically.

Either way, you will get consistent results.

By the way, does space " " count as a letter?

1 Like

By the way, this is not really what the function is designed for, but
you can also do this:

import statistics
statistics.mode("This is a common interview question")
# returns 'i'

statistics.multimode("This is a common interview question")
# returns ['i', ' ']
1 Like

I did a variant with a dictionary and it worked great. I just wanted to try different things and ran into this. Thank’s.