First of all, ls contains “TOTAl” (with a lowercase letter at the end), so that’s not going to match.
In the pattern, [^\d]* will match until the next digit, so it’ll drop the final digit of subitem0, etc.
What you want to do is match “TOTAL: ”, then a dollar sign, then a digit, then whatever follows lazily until “ TOTAL:” (note the leading space) or the end of the string.
That gives you: TOTAL: \$\d.*?(?= TOTAL:|$).
Assuming that the “TOTAl” was a typo:
import re
ls = 'TOTAL: $2 subitem_0 TOTAL: $6 subitem0 subitem1'
regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', ls)
print(regex_splitlines)
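To see the difference between the two approaches side by side, here is a small sketch. The first pattern is only a hypothetical stand-in for the original one (which is not quoted in full here); it just assumes `[^\d]*` was used after the amount.

```python
import re

ls = 'TOTAL: $2 subitem_0 TOTAL: $6 subitem0 subitem1'

# Hypothetical stand-in for the original pattern: [^\d]* stops at the next digit,
# so the trailing digit of "subitem_0" / "subitem0" is lost.
negated_class = re.findall(r'TOTAL: \$\d[^\d]*', ls)
print(negated_class)   # ['TOTAL: $2 subitem_', 'TOTAL: $6 subitem']

# Lazy match that stops only before the next " TOTAL:" or at the end of the string.
lazy_lookahead = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', ls)
print(lazy_lookahead)  # ['TOTAL: $2 subitem_0', 'TOTAL: $6 subitem0 subitem1']
```

The lookahead `(?= TOTAL:|$)` does not consume anything, so the next match can still start at the following “TOTAL:”.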
Indeed, it was a typo, my bad.
The regex works as expected.
I also added sorting and got the desired output.
import re
lines = 'Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1'
splitted_lines = lines.splitlines()
liststr = ' '.join([str(elem) for elem in splitted_lines[1:]])
regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', liststr)
# Sort the list using a lambda function as the sorting key in descending order
sorted_ls = sorted(regex_splitlines, key=lambda item: int(re.search(r'\$([0-9]+)', item).group(1)), reverse=True)
formatted_output = []
for item in sorted_ls:
    parts = item.split()
    formatted_output.append(parts[0] + ' ' + parts[1]) # Combine "TOTAL" and value
    formatted_output.extend(parts[2:]) # Add the subitems
formatted_output = '\n'.join(formatted_output) # Join the lines with newline characters
print(splitted_lines[0])
print(formatted_output)
It outputs:
Title
TOTAL: $6
subitem0
subitem1
TOTAL: $2
subitem_0
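One caveat with joining everything into a single string and calling `split()` is that a subitem containing a space would be broken into several lines. A possible alternative, sketched below, keeps the lines as they are and groups them under each “TOTAL:” line before sorting. The helper name `sort_total_blocks` is made up for this illustration, and it assumes every block starts with a line beginning with “TOTAL:”.

```python
import re

def sort_total_blocks(lines):
    """Group lines under their "TOTAL:" header and sort the groups by amount, descending."""
    blocks = []
    for line in lines:
        if line.startswith('TOTAL:'):
            blocks.append([line])       # start a new block at each TOTAL line
        elif blocks:
            blocks[-1].append(line)     # attach the subitem to the current block

    def amount(block):
        # Assumption: the amount is a whole number after a dollar sign.
        match = re.search(r'\$(\d+)', block[0])
        return int(match.group(1)) if match else 0

    blocks.sort(key=amount, reverse=True)
    return [line for block in blocks for line in block]

lines = 'Title\nTOTAL: $2\nsubitem_0\nTOTAL: $6\nsubitem0\nsubitem1'.splitlines()
print(lines[0])
print('\n'.join(sort_total_blocks(lines[1:])))
```

For the sample input this prints the same six lines as the regex version.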
My final task has expanded a bit: I need to get similar output for multiple sections.
The input.txt file contains these sections. How can I treat them individually?
I’ve added a loop to split the text into sections and then processed each one individually. Maybe it’s not the best solution, but I got the desired output.
import re
with open('input.txt', 'r') as file:
    sections = [] # List to store sections
    for line in file:
        line = line.strip()
        if line: # Check if the line is not empty
            section_lines = [] # List to store lines in the current section
            section_lines.append(line) # Add the section title
            for line in file:
                line = line.strip()
                if not line: # Empty line indicates the end of the section
                    break
                section_lines.append(line) # Add lines to the current section
            sections.append(section_lines) # Add the section list to the sections list
# Process the sections
for section in sections:
    liststr = ' '.join([str(elem) for elem in section[1:]])
    regex_splitlines = re.findall(r'TOTAL: \$\d.*?(?= TOTAL:|$)', liststr)
    sorted_ls = sorted(regex_splitlines, key=lambda item: int(re.search(r'\$([0-9]+)', item).group(1)), reverse=True)
    formatted_output = []
    for item in sorted_ls:
        parts = item.split()
        formatted_output.append(parts[0] + ' ' + parts[1])
        formatted_output.extend(parts[2:])
    formatted_output = '\n'.join(formatted_output)
    print(section[0])
    print(formatted_output)
    print("")
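If the nested loop over the file feels awkward, the section splitting could perhaps be done on the whole text at once. This is only a sketch: `split_sections` is a made-up helper name, the sample text is invented, and it assumes sections are separated by one or more blank lines with the title on the first line of each section.

```python
import re

def split_sections(text):
    """Split text into sections on blank lines; each section is a list of stripped lines."""
    sections = []
    for chunk in re.split(r'\n\s*\n', text.strip()):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if lines:
            sections.append(lines)
    return sections

# Invented sample standing in for the contents of input.txt.
text = 'Title A\nTOTAL: $2\nsubitem_0\n\nTitle B\nTOTAL: $5\nsubitem0\n'
for section in split_sections(text):
    print(section[0], section[1:])
```

With a real file, the text could be read with `open('input.txt').read()` and each resulting section processed exactly as in the loop above.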