Regex for custom string and sort

Hello,
I'm trying to sort a string by value and I'm stuck on the regex.

import re
ls = 'TOTAl: $2 subitem_0 TOTAL: $6 subitem0 subitem1'
regex_splitlines = re.findall(r'TOTAL: \$\d[^\d]*', ls)
print(regex_splitlines)

python3 sort.py
['TOTAL: $6 subitem']

But I need:

['TOTAL: $2 subitem_0', 'TOTAL: $6 subitem0 subitem1']

Then I need to sort it by the $ value, highest first, and get:

['TOTAL: $6 subitem0 subitem1', 'TOTAL: $2 subitem_0']

First of all, ls contains “TOTAl” (with a lowercase letter at the end), so that’s not going to match.

In the pattern, [^\d]* will match until the next digit, so it’ll drop the final digit of subitem0, etc.
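
A quick way to see that behaviour, as a minimal sketch: it assumes the "TOTAl" typo is already fixed so that both totals can match, and the variable names are only for illustration.

```python
import re

# Input with the "TOTAl" typo corrected, so both TOTAL groups are candidates.
text = 'TOTAL: $2 subitem_0 TOTAL: $6 subitem0 subitem1'

# [^\d]* consumes non-digit characters only, so each match ends just before the next digit.
matches = re.findall(r'TOTAL: \$\d[^\d]*', text)
print(matches)  # ['TOTAL: $2 subitem_', 'TOTAL: $6 subitem']
```

Each match is cut off right before the first digit that follows the value, which is why the trailing "0" of subitem_0 and everything from the "0" of subitem0 onward are lost.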

What you want to do is match "TOTAL: ", then a dollar sign, then a digit, then whatever follows lazily until " TOTAL:" (note the leading space) or the end of the string.

That gives you: TOTAL: \$\d.*?(?= TOTAL:|$).

Assuming that the “TOTAl” was a typo:

import re
ls = 'TOTAL: $2 subitem_0 TOTAL: $6 subitem0 subitem1'
regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', ls)
print(regex_splitlines)

Indeed, it was a typo, my bad.
The regex works as expected.
I also added the sorting and got the desired output.

import re
lines = 'Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1'
splitted_lines = lines.splitlines()
liststr = ' '.join([str(elem) for elem in splitted_lines[1:]])

regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', liststr)

# Sort the list using a lambda function as the sorting key in descending order
sorted_ls = sorted(regex_splitlines, key=lambda item: int(re.search(r'\$([0-9]+)', item).group(1)), reverse=True)
formatted_output = []

for item in sorted_ls:
    parts = item.split()
    formatted_output.append(parts[0] + ' ' + parts[1])  # Combine "TOTAL" and value
    formatted_output.extend(parts[2:])  # Add the subitems

formatted_output = '\n'.join(formatted_output)  # Join the lines with newline characters
print(splitted_lines[0])
print(formatted_output)

It outputs:

Title
TOTAL: $6
subitem0
subitem1
TOTAL: $2
subitem_0

My final task has expanded a bit: I need to produce similar output for multiple sections.
The file input.txt contains these sections. How can I treat them individually?

Title
TOTAL: $2
subitem_0
TOTAL: $6
subitem0
subitem1

Title1
TOTAL: $5
subitem_0
TOTAL: $10
subitem0
subitem1

or

lines = 'Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1\n\nTitle1\nTOTAL: $5\nsubitem_0\nTOTAL: $10\nsubitem0\nsubitem1'

Desired output:

Title
TOTAL: $6
subitem0
subitem1
TOTAL: $2
subitem_0

Title1
TOTAL: $10
subitem0
subitem1
TOTAL: $5
subitem_0

The simplest solution would be to read the text, split it into sections with something like text.split('\n\n'), and then process each section.
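
As a rough sketch of that idea: the code below splits on blank lines and sorts each section on its own. Note that it groups the lines under each TOTAL line directly instead of joining them and applying the regex from earlier in the thread. The helper name sort_section is hypothetical, and the sketch assumes every TOTAL line contains a whole-number dollar value and that sections are separated by exactly one blank line.

```python
import re

# Sample text mirroring the input above; in practice it could come from open('input.txt').read().
text = (
    'Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1\n\n'
    'Title1\nTOTAL: $5\nsubitem_0\nTOTAL: $10\nsubitem0\nsubitem1'
)

def sort_section(section):
    """Hypothetical helper: return one section with its TOTAL groups sorted, highest value first."""
    title, *rest = section.splitlines()
    groups = []
    for line in rest:
        if line.startswith('TOTAL:'):
            groups.append([line])        # A TOTAL line starts a new group
        elif groups:
            groups[-1].append(line)      # Subitems belong to the most recent TOTAL
    # Assumes each TOTAL line has a value like $6; re.search would return None otherwise.
    groups.sort(key=lambda g: int(re.search(r'\$(\d+)', g[0]).group(1)), reverse=True)
    return '\n'.join([title] + [line for g in groups for line in g])

# Split into sections on blank lines, process each one, and join them back together.
result = '\n\n'.join(sort_section(s) for s in text.strip().split('\n\n'))
print(result)
```

Keeping the lines grouped means subitems that contain spaces would stay intact, which the join-then-split approach would not guarantee.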

I've added a loop to split the text into sections and then processed each one individually. It may not be the best solution, but I got the desired output.

import re
with open('input.txt', 'r') as file:
    sections = []  # List to store sections

    for line in file:
        line = line.strip()

        if line:  # Check if the line is not empty
            section_lines = []  # List to store lines in the current section
            section_lines.append(line)  # Add the section title

            for line in file:
                line = line.strip()
                if not line:  # Empty line indicates the end of the section
                    break
                section_lines.append(line)  # Add lines to the current section

            sections.append(section_lines)  # Add the section list to the sections list

# Process the sections (the file is no longer needed, so this runs outside the with block)
for section in sections:
    # Join everything after the title into one string for the regex
    liststr = ' '.join(section[1:])
    regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', liststr)
    # Sort the TOTAL groups by their dollar value, highest first
    sorted_ls = sorted(regex_splitlines, key=lambda item: int(re.search(r'\$([0-9]+)', item).group(1)), reverse=True)
    formatted_output = []
    for item in sorted_ls:
        parts = item.split()
        formatted_output.append(parts[0] + ' ' + parts[1])  # Combine "TOTAL:" and the value
        formatted_output.extend(parts[2:])  # Add the subitems
    formatted_output = '\n'.join(formatted_output)
    print(section[0])
    print(formatted_output)
    print("")