How do I parse this file? I need to extract the values marked in red. The file is in plain-text format. I have many files like this, and I need to combine them by summing the numbers that appear at the same location in each file.

Hi abinaya!

That’s an interesting problem you have. I’m assuming that output is from some kind of PLC/HMI/SCADA?

Couple questions:

  • Are you allowed to post an example txt file?
  • Are all the files from the same source? That is, will they all have the same tabular data, row/column labels, “————-” text border at the top of the table, etc.? Even better if they are basically identical apart from the cell values.

If the answers to those questions are “yes”, post an example txt file, and I’m sure we can try writing a specific parsing script for that file together.


Yes, every file has the same format: the same table with the same rows and columns. The output should be in xlsx format. However, I am not allowed to post the text file.

Unfortunately it’ll be hard to verify whether the following works without an example. However, this should work if all your files have ‘Inspection Summary’ as the table heading, and columns are separated by two or more spaces (not tabs):

import pandas as pd
import re

# Read the plain text file
with open('data.txt', 'r') as file:
    lines = file.readlines()

# Find the start and end indices of the Inspection Summary section.
# The section ends at the first blank line, or at the end of the file if there is none.
start_index = next(i for i, line in enumerate(lines) if line.startswith('Inspection Summary'))
end_index = next((i for i, line in enumerate(lines[start_index:]) if line.strip() == ''), len(lines) - start_index) + start_index

# Extract column headings
columns = re.split(r' {2,}', lines[start_index].strip())

# Extract summary lines and split based on multiple spaces
summary_lines = [re.split(r' {2,}', line.strip()) for line in lines[start_index+2:end_index]]

# Create a DataFrame from the list of lists
df = pd.DataFrame(summary_lines, columns=columns)

# Remove the 'TOTAL' column (it would otherwise become a row after transposing)
df = df.drop('TOTAL', axis=1)

# Transpose the DataFrame, set the first row as column headers, and reset the index
df_transposed = df.transpose().reset_index()
df_transposed.columns = df_transposed.iloc[0]
df_transposed = df_transposed[1:].reset_index(drop=True)

# Print the resulting DataFrame
print(df)
# Print the transposed DataFrame
print(df_transposed)

# Get the sum of the 'Units Inspected' column
column_sum = df_transposed['Units Inspected'].astype(float).sum()
print(column_sum)

I once did a lot of processing of output files. My generic procedure: Use a loop to find a ‘landmark’ line; here, the one beginning with ‘Inspection Summary’. Read and ignore as many more non-data lines as needed; here, 1. Read and process data lines. Ideally they are followed by a blank line, as appears to be the case here.
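That procedure could be sketched roughly as follows, using only the standard library. The function name `read_table`, its parameters, and the sample lines are made up for illustration, since the real file was not posted; it also assumes columns are separated by two or more spaces.

```python
import re

def read_table(lines, landmark="Inspection Summary", skip=1):
    """Return the data rows after the landmark line, each as a list of strings."""
    it = iter(lines)
    # 1. Loop until the landmark line is found.
    for line in it:
        if line.startswith(landmark):
            break
    else:
        raise ValueError(f"no line starts with {landmark!r}")
    # 2. Read and ignore the non-data lines (assumed: one border line).
    for _ in range(skip):
        next(it, None)
    # 3. Read data lines until a blank line or the end of the input.
    rows = []
    for line in it:
        if not line.strip():
            break
        rows.append(re.split(r" {2,}", line.strip()))
    return rows

# Invented sample, standing in for the lines of a real file.
sample = [
    "Some header text\n",
    "Inspection Summary  A  B\n",
    "------------------------\n",
    "Units Inspected  10  20\n",
    "Units Passed  8  15\n",
    "\n",
    "Trailing text\n",
]
rows = read_table(sample)
print(rows)
```

With a real file you would pass the open file object (or `file.readlines()`) instead of `sample`, and adjust `skip` to however many non-data lines follow the heading.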


Can you please tell me how to use a for loop to loop over this?

Read the tutorial to learn how to use while and for loops, and practice with variations of examples therein.
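As a starting point, here is a minimal sketch of a for loop over many files that sums the numbers found at the same position in each one. The helper names `parse_file` and `sum_files`, the `*.txt` pattern, and the demo table are assumptions for illustration; it assumes every file has the same row labels, a single border line after the heading, and columns separated by two or more spaces.

```python
import re
import tempfile
from pathlib import Path

def parse_file(path, landmark="Inspection Summary"):
    """Return {row_label: [numbers, ...]} for the table after the landmark line."""
    table = {}
    with open(path) as f:
        lines = iter(f)
        for line in lines:              # find the landmark line
            if line.startswith(landmark):
                break
        next(lines, None)               # skip the border line (assumed: one)
        for line in lines:              # read data lines up to a blank line
            if not line.strip():
                break
            label, *cells = re.split(r" {2,}", line.strip())
            table[label] = [float(c) for c in cells]
    return table

def sum_files(paths):
    """Add up, cell by cell, the tables parsed from every path."""
    totals = {}
    for path in paths:                  # the for loop over all files
        for label, values in parse_file(path).items():
            if label not in totals:
                totals[label] = list(values)
            else:
                totals[label] = [a + b for a, b in zip(totals[label], values)]
    return totals

# Demo with two invented files written to a temporary directory.
demo = (
    "Inspection Summary  A  B\n"
    "------------------------\n"
    "Units Inspected  {0}  {1}\n"
    "Units Passed  {2}  {3}\n"
    "\n"
)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "run1.txt").write_text(demo.format(10, 20, 8, 15))
    Path(tmp, "run2.txt").write_text(demo.format(5, 5, 4, 3))
    totals = sum_files(sorted(Path(tmp).glob("*.txt")))
print(totals)
```

For your own data, replace the temporary directory with the folder that holds your text files. To get xlsx output, one option is to build a pandas DataFrame from `totals` (for example with `pd.DataFrame.from_dict(totals, orient='index')`) and call its `to_excel` method, which needs an Excel writer package such as openpyxl installed.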