Sorting a list options

I’ve got the following list, which I would like to know how to sort properly…

test = ['chip>>X.1.A1.A2', 'chippy>>X.1.B2.B2', 'cheese>>X.10', 'califlower>>X.12', 'cabbage>>X.2', 'cake>>X.1.B1.A2']
test.sort(key=lambda x: x.split('>>')[1])
print(test)

output…

['cheese>>X.10', 'califlower>>X.12', 'cabbage>>X.2', 'chip>>X.1.A1.A2', 'cake>>X.1.B1.A2', 'chippy>>X1.B2.B2']

Desired output

chip>>X.1.A1.A2
cake>>X.1.B1.A2
chippy>>X.1.B2.B2
cabbage>>X.2
cheese>>X.10
califlower>>X.12

So basically I want the alphanumeric values in ascending order, as in the desired output above, and I can’t convert to an integer. Are there any straightforward methods to achieve this?

Your description does not specify what exactly you mean by “the alphanumeric values” and how you want to sort them.

Do you mean:

  • the characters between >> and first . like X1 and X?
  • from >> to the end of the line with . removed?
  • something else?

“The alphanumeric characters” from the first string could be, for example:

  • chipX1A1A2
  • X1A1A2
  • X1
  • ['X1', 'A1', 'A2']
  • and many more possibilities

What do you mean by: “I can’t convert to an integer”?

Are you going to sort the digits as ordinary characters or as numbers? Etc.

@vbrozik

Hi, I thought the desired output illustrated what I was looking for.

I split the list elements by >>, so I am aiming to sort by the characters in the second element without splitting them up. By “alphanumeric” I mean strings like X.1.A.1 etc.; perhaps it is not a good term.

Currently X.10 gets sorted before X.1.A.1 etc

I would expect it to be

X.1.xxxx
X.2.xxxx
X.10

So not sure why X.10 is before X.1

Not at all. :)

You seem to overlook important details. Except for X.10, there is no sequence X.1 in your strings, but there is X1. The difference is very important.

You are sorting the strings as strings so they are sorted character by character. Try this:

num_list = ['0', '5', '10']
print(sorted(num_list))

I guess this result is not what you want:

['0', '10', '5']

If you want to sort numbers of various lengths (let alone numbers combined with other characters like .), you have to convert them to a numeric type (or to strings left-padded with 0s) for sorting.
Let’s try it:

num_list = ['0', '5', '10']
print(sorted(num_list, key=int))
['0', '5', '10']
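The other option mentioned above, left-padding with 0s, can be sketched as follows. This is only an illustration: the helper name pad_numbers and the width of 8 are assumptions, and it only works while no number has more digits than the chosen width.

```python
import re

def pad_numbers(text, width=8):
    # Left-pad every run of digits with zeros so that plain string
    # comparison gives the same order as numeric comparison.
    # Assumption: no number in the data is longer than `width` digits.
    return re.sub(r'\d+', lambda m: m.group().zfill(width), text)

num_list = ['0', '5', '10']
padded_sorted = sorted(num_list, key=pad_numbers)
print(padded_sorted)  # ['0', '5', '10']

# The same key also copes with numbers embedded in other characters.
mixed = ['X.10', 'X.2', 'X.1.A1']
mixed_sorted = sorted(mixed, key=pad_numbers)
print(mixed_sorted)  # ['X.1.A1', 'X.2', 'X.10']
```

The key is only used for comparing; the original strings come back unchanged.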

Your task is a little bit more complicated because you seem to want to sort by multiple parts. Some are strings, some are numbers. You will need to split the string into parts (maybe removing insignificant parts like .) and convert the numbers… sort is able to sort by multiple values in a list or tuple. You get the idea…

Maybe take a peek at natsort


Thanks I’ll give it a go

@vbrozik

I’ve edited my original post as I made a typo or two. All characters after >> are in the X.1 format and not X1

“So not sure why X.10 is before X.1”

Because you are sorting strings not numbers, and strings are sorted lexicographically.

To be precise: ‘X.1’ actually sorts before ‘X.10’. Try this:

sorted(['X.1', 'X.10'])

But ‘X.10’ will sort before ‘X.2’, for the same reason that Thursday comes before Wednesday in the dictionary, and ‘aardvark’ comes before ‘penguin’.
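A quick way to see this for yourself (the list refs is just made-up sample data in the same format):

```python
refs = ['X.2', 'X.10', 'X.1']

# Plain string sorting compares character by character.
as_strings = sorted(refs)
print(as_strings)  # ['X.1', 'X.10', 'X.2']

# 'X.1' is a prefix of 'X.10', so the shorter string sorts first.
print('X.1' < 'X.10')  # True

# 'X.10' and 'X.2' first differ at the character after the dot,
# and '1' sorts before '2', so the rest of the string is never looked at.
print('X.10' < 'X.2')  # True
```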

To sort this numerically is not an easy task. You need to split each substring into a tuple of fields, with the letters left as strings and the numeric fields converted to actual ints:

'X.10' --> ('X', 10)

and then sort the tuples, then reassemble them back to strings. Natural sorting is tricky to get right, and there are a ton of special cases needed to get it working correctly for arbitrary strings.
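For the simple case in this thread, a minimal sketch of that idea could look like the following. The name natural_key is made up for illustration, and the regular expression assumes the references contain only ASCII letters and digits separated by dots. It also assumes letters and numbers always appear at the same positions, because Python 3 refuses to compare an int with a str.

```python
import re

def natural_key(text):
    # Pull out runs of digits and runs of letters, dropping the dots.
    parts = re.findall(r'\d+|[A-Za-z]+', text)
    # Digit runs become ints so they compare numerically; letters stay strings.
    return tuple(int(p) if p.isdigit() else p for p in parts)

print(natural_key('X.10'))  # ('X', 10)

refs = ['X.10', 'X.12', 'X.2', 'X.1.B1.A2', 'X.1.A1.A2']
natural_sorted = sorted(refs, key=natural_key)
print(natural_sorted)
# ['X.1.A1.A2', 'X.1.B1.A2', 'X.2', 'X.10', 'X.12']
```

When the function is passed as key=, sorted() uses the tuples only for comparing and returns the original strings, so there is nothing to reassemble by hand.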

My recommendation is that if you need anything more than the simplest, most basic natural sorting, use the natsort library.

One approach could be to use a helper function with groupby for sorting.

The helper function splits each item in the list to get the sort value, groups consecutive decimal digits and converts them to integers, and returns a tuple of the groups for sorting.

from itertools import groupby

test = ['chip>>X.1.A1.A2', 'chippy>>X.1.B2.B2', 'cheese>>X.10', 'califlower>>X.12', 'cabbage>>X.2', 'cake>>X.1.B1.A2']

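def separate(item):
    # Sort only by the part after '>>'.
    sort_value = item.split('>>')[1]
    # Group consecutive characters into runs of digits and runs of non-digits.
    stream = ((is_num, ''.join(group)) for is_num, group in groupby(sort_value, str.isdecimal))
    # Convert digit runs to int so they compare numerically; keep other runs as strings.
    return tuple(int(group) if is_num else group for is_num, group in stream)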

print(*sorted(test, key=separate), sep='\n')

chip>>X.1.A1.A2
cake>>X.1.B1.A2
chippy>>X.1.B2.B2
cabbage>>X.2
cheese>>X.10
califlower>>X.12
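One caveat: the key above puts ints and strs in the same tuple. That works here because digits and text line up at the same positions in every item. If they did not (say 'A1' against '1A'), Python 3 would raise a TypeError when it tried to compare an int with a str. A possible variation, sketched below under the hypothetical name separate_tagged, tags each part so that only values of the same type are ever compared. Putting numbers before text is an arbitrary choice made for this sketch.

```python
from itertools import groupby

def separate_tagged(item):
    sort_value = item.split('>>')[1]
    key = []
    for is_num, group in groupby(sort_value, str.isdecimal):
        text = ''.join(group)
        # Tag digit runs with 0 and other runs with 1. The tags are compared
        # first, so an int is never compared directly with a str.
        key.append((0, int(text)) if is_num else (1, text))
    return tuple(key)

# Made-up items where digits and letters do not line up.
items = ['b>>A1', 'a>>1A', 'c>>A10', 'd>>A2']
tagged_sorted = sorted(items, key=separate_tagged)
print(tagged_sorted)  # ['a>>1A', 'b>>A1', 'd>>A2', 'c>>A10']

# The list from the question still sorts into the desired order.
test = ['chip>>X.1.A1.A2', 'chippy>>X.1.B2.B2', 'cheese>>X.10',
        'califlower>>X.12', 'cabbage>>X.2', 'cake>>X.1.B1.A2']
test_sorted = sorted(test, key=separate_tagged)
print(*test_sorted, sep='\n')
```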

@BowlOfRed

Natsort did the job perfectly. Thanks for the tip