I’d read the lines, strip off the trailing newlines, split each line on runs of two or more whitespace characters, and then remove those lines that contain only repeated ‘=’. That would result in a list where one of the entries was just [''], which corresponds to the blank line that separated the two tables. I’d then use the first part (excluding the table’s heading) to make a dict, and the second part to make some other convenient structure for a table.
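A rough sketch of that approach, in case it helps. The sample text, the function name parse_two_tables, and the exact layout (a heading, a line of '=', then rows with columns separated by two or more spaces) are my assumptions, not the real file:

```python
import re

# Hypothetical sample; the real file's layout is an assumption.
sample = """\
Lot Summary
====================
Lot ID      A123
Operator    Smith

Inspection Summary    TOTAL    TRACK 1
====================
Units Inspected       2654     516
Units Passed          2621     508
"""


def parse_two_tables(text):
    # Strip trailing whitespace and split each line on runs of 2+ whitespace.
    rows = [re.split(r"\s{2,}", line.rstrip()) for line in text.splitlines()]
    # Drop lines that consist only of repeated '='.
    rows = [row for row in rows if not (row[0] and not row[0].strip("="))]
    # The blank line survives as [''] and separates the two tables.
    blank = rows.index([""])
    first, second = rows[:blank], rows[blank + 1:]
    # First table: skip its heading, then map key -> value.
    info = {row[0]: row[1] for row in first[1:]}
    # Second table: header row, then one dict per data row keyed by column name.
    header, *body = second
    table = {row[0]: dict(zip(header[1:], row[1:])) for row in body}
    return info, table


info, table = parse_two_tables(sample)
print(info)
print(table)
```

All values stay as strings here; converting them to numbers would be a separate step.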
Does the file have to come that way? Can it come as a CSV?
Yes, it can come as a CSV.
Can you please send the code for parsing?
Actually, it is in plain text format:
Inspection Summary    TOTAL    TRACK 1    TRACK 2    TRACK 3    TRACK 4    TRACK 5
Units Inspected       2654     516        533        538        538        529
Units Passed          2621     508        524        530        533        526
Units Yield           98.76    98.5       98.4       98.6       99.1       99.5
Units Failed          33       8          9          8          5          3

How can I automate the parsing of this text file in Python?
Here’s a simple example:
import re

# Path to the input file.
path = '/path/to/input'

# Pattern for splitting a line into columns (two or more whitespace characters).
pattern = re.compile(r'\s{2,}')

rows = []

with open(path) as file:
    for line in file:
        # Remove trailing whitespace.
        line = line.rstrip()

        # Ignore the line if it's empty or only repeated '-' or '='.
        if not line.strip('-='):
            continue

        # Split into columns.
        columns = pattern.split(line)

        # New row, with leading and trailing whitespace stripped off.
        rows.append([cell.strip() for cell in columns])

print(rows)
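If you want something easier to query than a list of lists, one option is to turn rows into a dict of dicts keyed by row label and column name. This is only a sketch: the rows below are typed in by hand to match the table in the question (assuming its columns are separated by two or more spaces), and to_number and summary are names I made up for illustration.

```python
# Rows as the parsing loop would produce them for the table in the question.
rows = [
    ['Inspection Summary', 'TOTAL', 'TRACK 1', 'TRACK 2', 'TRACK 3', 'TRACK 4', 'TRACK 5'],
    ['Units Inspected', '2654', '516', '533', '538', '538', '529'],
    ['Units Passed', '2621', '508', '524', '530', '533', '526'],
    ['Units Yield', '98.76', '98.5', '98.4', '98.6', '99.1', '99.5'],
    ['Units Failed', '33', '8', '9', '8', '5', '3'],
]


def to_number(text):
    # Prefer int; fall back to float for values such as '98.76'.
    try:
        return int(text)
    except ValueError:
        return float(text)


# The first row holds the column names; the rest are data rows.
header, *body = rows

# Map each row label to a {column name: number} dict.
summary = {
    label: dict(zip(header[1:], map(to_number, values)))
    for label, *values in body
}

print(summary['Units Failed']['TRACK 3'])  # prints 8
```

After that, a lookup such as summary['Units Yield']['TOTAL'] gives 98.76 as a float rather than a string.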