Found Bug in "Sorted" Built in Function of Python

Hi Everyone,

I was using the Python built-in function sorted and found a bug where it doesn’t sort percentages correctly.
Check it out.

Regards:
Ahsan Aftab (ML Engineer)

This isn’t a bug. Your values are not numbers; they’re strings (text) that just happen to contain digit characters. Strings are compared lexicographically, the same way you’d order words alphabetically: character by character, with the first differing character deciding the result. For your first two ("9%" and "83%"), we compare "9" with "8" and stop there, because they already differ. "9" comes after "8" in Unicode’s character order, so "9%" counts as greater than "83%" and lands after it in an ascending sort (or before it in a descending one).
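
A quick way to see this for yourself is to compare the strings directly. This is only a minimal sketch using the three values mentioned in this thread; the variable names are illustrative.

```python
# Strings are compared character by character, by Unicode code point.
as_strings = sorted(["9%", "83%", "25%"])
print(as_strings)  # ['25%', '83%', '9%']

# The first differing character decides the result: "9" (code point 57)
# comes after "8" (code point 56), so "9%" is greater than "83%".
print(ord("9"), ord("8"))  # 57 56
print("9%" > "83%")        # True

# Real numbers compare by value instead.
as_numbers = sorted([9, 83, 25])
print(as_numbers)  # [9, 25, 83]
```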

What you should do is convert all your strings into numbers first. I’d suggest stripping the % sign, converting to a float, then dividing by 100 so you get values between 0 and 1 (0.09, 0.83, 0.25). That way the values behave correctly when adding, comparing, multiplying, etc. You’d switch to percentage form only when you need to display a value to the user. Note that there’s a % format specifier (f"value={val:%}") that multiplies by 100 and adds the % sign automatically, so the value displays correctly.
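
Roughly, that conversion and the display step could look like the sketch below. The helper name parse_percent is made up for illustration, and it assumes every input is a plain number followed by a % sign.

```python
def parse_percent(text):
    """Convert a string such as '83%' into a fraction such as 0.83.

    Hypothetical helper; assumes the input is a number followed by '%'.
    """
    return float(text.strip().rstrip("%")) / 100


raw = ["9%", "83%", "25%"]
fractions = sorted(parse_percent(p) for p in raw)

# The % format specifier multiplies by 100 and appends the % sign.
# ".0" asks for no decimal places; the default is six.
labels = [f"{value:.0%}" for value in fractions]
print(labels)  # ['9%', '25%', '83%']
```

Keeping the values as fractions until display time means sorting, sums and averages all work on real numbers.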

There’s the natsort library on PyPI which sorts strings the way a human would expect. I can’t remember if it handles percentages but I don’t see why it wouldn’t.

I have solved this issue by prepending “0” to numbers less than 10, and it worked.
for int(percentage) < 10:
percentage = “0” + percentage

If you try to convert the string '83%' to int, you will get a ValueError. Even if you fix that, the for-loop as written will raise a SyntaxError:

>>> int('83%')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '83%'
>>> num = '83'
>>> for int(num) < 10:
  File "<stdin>", line 1
    for int(num) < 10:
                     ^
SyntaxError: invalid syntax
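
For what it’s worth, the zero-padding idea itself can work if the padding is applied to the strings rather than going through int(). A minimal sketch, assuming every value is a non-negative whole-number percentage ending in '%'; the '100%' entry and the variable names are added here purely for illustration:

```python
percentages = ["9%", "83%", "25%", "100%"]  # '100%' added for illustration

# Pad every string on the left with zeros up to the longest length,
# so that character-by-character comparison matches numeric order.
width = max(len(p) for p in percentages)
padded = sorted(p.zfill(width) for p in percentages)
print(padded)  # ['009%', '025%', '083%', '100%']
```

This breaks down as soon as values such as '9.5%' appear, so it only holds under the stated assumptions.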

Python’s sorted supports key functions, so a simple lambda that strips the ‘%’ sign and converts to int or float should be fine:

>>> percentages = ['9%', '83%', '25%']
>>> sorted(percentages, key=lambda s: float(s.strip('%')))
['9%', '25%', '83%']

What you appear to be looking for is called natural sorting. This Stack Overflow question contains some options for this, including a package on PyPI.
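
As a rough idea of what natural sorting does, here is a minimal standard-library sketch. The function name natural_key is made up for illustration, the '100%' entry is added to show a three-digit value, and the sketch deliberately ignores decimals and negative numbers, which a dedicated package would be expected to handle better.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs; digit runs become ints.

    Hypothetical helper: handles whole numbers only.
    """
    return [int(part) if part.isdecimal() else part
            for part in re.split(r"(\d+)", text)]


percentages = ["9%", "83%", "25%", "100%"]  # '100%' added for illustration
result = sorted(percentages, key=natural_key)
print(result)  # ['9%', '25%', '83%', '100%']
```

Because re.split with a capturing group always alternates non-digit and digit runs, the keys compare strings with strings and ints with ints.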