Regex for custom string and sort

Feels · October 18, 2023, 3:19pm

Hello,
I’m trying to sort a string by value and I’m stuck on the regex.

import re
ls = 'TOTAl: $2 subitem_0 TOTAL: $6 subitem0 subitem1'
regex_splitlines = re.findall(r'TOTAL: \$\d[^\d]*', ls)
print(regex_splitlines)

python3 sort.py
['TOTAL: $6 subitem']

but I need

['TOTAL: $2 subitem_0', 'TOTAL: $6 subitem0 subitem1']

Then I need to sort it by the dollar value and get

['TOTAL: $6 subitem0 subitem1', 'TOTAL: $2 subitem_0']

MRAB · October 18, 2023, 4:43pm

First of all, ls contains “TOTAl” (with a lowercase letter at the end), so that’s not going to match.

In the pattern, [^\d]* matches only up to the next digit, so it stops before the 0 in subitem0 and drops everything after it (including subitem1).

What you want to do is match "TOTAL: ", then a dollar sign, then a digit, then whatever follows lazily until " TOTAL:" (note the leading space) or the end of the string.

That gives you: TOTAL: \$\d.*?(?= TOTAL:|$).

Assuming that the “TOTAl” was a typo:

import re
ls = 'TOTAL: $2 subitem_0 TOTAL: $6 subitem0 subitem1'
regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', ls)
print(regex_splitlines)
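
For comparison, here is a small sketch that runs both patterns against the corrected string, to show where each one stops. The variable names are only illustrative.

```python
import re

ls = 'TOTAL: $2 subitem_0 TOTAL: $6 subitem0 subitem1'

# Original pattern: [^\d]* stops at the first digit after the amount.
non_digit_run = re.findall(r'TOTAL: \$\d[^\d]*', ls)

# Lazy match plus lookahead: stops before the next " TOTAL:" or at the end.
lazy_lookahead = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', ls)

print(non_digit_run)   # ['TOTAL: $2 subitem_', 'TOTAL: $6 subitem']
print(lazy_lookahead)  # ['TOTAL: $2 subitem_0', 'TOTAL: $6 subitem0 subitem1']
```

Note that \d matches a single digit; multi-digit amounts such as $10 still match because the lazy .*? picks up the remaining digits.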

Feels · October 18, 2023, 7:05pm

Indeed, it was a typo, my bad.
The regex works as expected.
I’ve added the sorting and got the desired output.

import re
lines = 'Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1'
splitted_lines = lines.splitlines()
liststr = ' '.join([str(elem) for elem in splitted_lines[1:]])

regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', liststr)

# Sort the list using a lambda function as the sorting key in descending order
sorted_ls = sorted(regex_splitlines, key=lambda item: int(re.search(r'\$([0-9]+)', item).group(1)), reverse=True)
formatted_output = []

for item in sorted_ls:
    parts = item.split()
    formatted_output.append(parts[0] + ' ' + parts[1])  # Combine "TOTAL" and value
    formatted_output.extend(parts[2:])  # Add the subitems

formatted_output = '\n'.join(formatted_output)  # Join the lines with newline characters
print(splitted_lines[0])
print(formatted_output)

It outputs:

Title
TOTAL: $6
subitem0
subitem1
TOTAL: $2
subitem_0

My final task has expanded a bit: I need to produce similar output for multiple sections.
The file input.txt contains these sections. How can I treat them individually?

Title
TOTAL: $2
subitem_0
TOTAL: $6
subitem0
subitem1

Title1
TOTAL: $5
subitem_0
TOTAL: $10
subitem0
subitem1

or

lines = 'Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1\n\nTitle1\nTOTAL: $5\nsubitem_0\nTOTAL: $10\nsubitem0\nsubitem1'

desired output:

Title
TOTAL: $6
subitem0
subitem1
TOTAL: $2
subitem_0

Title1
TOTAL: $10
subitem0
subitem1
TOTAL: $5
subitem_0

MRAB · October 18, 2023, 7:12pm

The simplest solution would be to read the text, split it into sections with something like text.split('\n\n'), and then process each section.
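
A minimal sketch of that idea, assuming sections are separated by exactly one blank line and each section starts with a title line. The helper name sort_section is hypothetical; the regex and sort key are the ones already used in this thread.

```python
import re

def sort_section(section):
    """Return one section with its TOTAL groups sorted by dollar value, descending."""
    title, *rest = section.splitlines()
    joined = ' '.join(rest)
    groups = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', joined)
    groups.sort(key=lambda g: int(re.search(r'\$(\d+)', g).group(1)), reverse=True)
    out = [title]
    for group in groups:
        parts = group.split()
        out.append(' '.join(parts[:2]))  # "TOTAL:" and the amount on one line
        out.extend(parts[2:])            # each subitem on its own line
    return '\n'.join(out)

text = ('Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1\n\n'
        'Title1\nTOTAL: $5\nsubitem_0\nTOTAL: $10\nsubitem0\nsubitem1')

# Split into sections on blank lines, process each, and join them back.
result = '\n\n'.join(sort_section(s) for s in text.split('\n\n'))
print(result)
```

When reading from a file, text would come from something like open('input.txt').read().strip(), so a trailing newline does not produce an empty final section.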

Feels · October 19, 2023, 1:53pm

I’ve added a loop to split the text into sections, then processed each one individually. Maybe it’s not a very good solution, but I got the desired output.

import re
with open('input.txt', 'r') as file:
    sections = []  # List to store sections

    for line in file:
        line = line.strip()

        if line:  # Check if the line is not empty
            section_lines = []  # List to store lines in the current section
            section_lines.append(line)  # Add the section title

            for line in file:
                line = line.strip()
                if not line:  # Empty line indicates the end of the section
                    break
                section_lines.append(line)  # Add lines to the current section

            sections.append(section_lines)  # Add the section list to the sections list

    # Process the sections
    for section in sections:
        liststr = ' '.join([str(elem) for elem in section[1:]]) 
        regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', liststr)            
        sorted_ls = sorted(regex_splitlines, key=lambda item: int(re.search(r'\$([0-9]+)', item).group(1)), reverse=True)
        formatted_output = []
        for item in sorted_ls:
            parts = item.split()
            formatted_output.append(parts[0] + ' ' + parts[1])
            formatted_output.extend(parts[2:])  
        formatted_output = '\n'.join(formatted_output) 
        print(section[0])
        print(formatted_output)
        print("")
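
One possible variation, sketched under the assumption that every line after the title either starts a group with "TOTAL:" or belongs to the most recent group: grouping the lines directly avoids joining them with spaces and splitting again, so subitems that contain spaces stay intact. The helper sort_section_lines is hypothetical and not part of the code above.

```python
import re

def sort_section_lines(lines):
    """Sort the TOTAL groups of one section by dollar value, highest first.

    Assumes lines[0] is the title and every later line either starts a
    group ("TOTAL: $N") or belongs to the most recent group.
    """
    title, groups = lines[0], []
    for line in lines[1:]:
        if line.startswith('TOTAL:') or not groups:
            groups.append([line])      # start a new group
        else:
            groups[-1].append(line)    # subitem of the current group

    def amount(group):
        # Dollar value from the group's first line; 0 if none is found.
        match = re.search(r'\$(\d+)', group[0])
        return int(match.group(1)) if match else 0

    groups.sort(key=amount, reverse=True)
    return [title] + [line for group in groups for line in group]

section = ['Title1', 'TOTAL: $5', 'sub item 0', 'TOTAL: $10', 'subitem0', 'subitem1']
sorted_lines = sort_section_lines(section)
print('\n'.join(sorted_lines))
```

This takes a list of lines like the section lists built above, so it could replace the body of the processing loop.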
