How do I parse this file? I need to parse the content marked in red. The file is in plain text format. I have many files like this, and I have to combine them all by summing the numbers that are placed at the same location in each file.

_Abinaya · July 6, 2023, 11:34am

ssweber · July 6, 2023, 12:29pm

Hi abinaya!

That’s an interesting problem you have. I’m assuming that output is from some kind of PLC/HMI/SCADA?

Couple questions:

Are you allowed to post an example txt file?
Are all the files from the same source, meaning, will they all have the same tabular data, row/column labels, “————-“ text border at the top of the table, etc? Even better if it’ll be basically exactly the same (other than the cell values)

If the answers to those questions are “yes”, post an example txt file, and I’m sure we can try writing a specific parsing script for that file together.

_Abinaya · July 7, 2023, 9:58am

Yes, every file has the same format: the same table with the same rows and columns. The output should be in xlsx format. I am not supposed to post the text file, but every text file has the same format.

ssweber · July 7, 2023, 5:09pm

Unfortunately it’ll be hard to verify whether the following works without an example. However, this should work if all your files have ‘Inspection Summary’ as the table heading, and columns are separated by spaces (not tabs):

import pandas as pd
import re

# Read the plain text file
with open('data.txt', 'r') as file:
    lines = file.readlines()

# Find the start and end indices of the Inspection Summary section.
# The section ends at the first blank line, or at the end of the file if there is none.
start_index = next(i for i, line in enumerate(lines) if line.startswith('Inspection Summary'))
end_index = next((i for i, line in enumerate(lines[start_index:]) if line.strip() == ''), len(lines) - start_index) + start_index

# Extract column headings
columns = re.split(r' {2,}', lines[start_index].strip())

# Extract summary lines and split based on multiple spaces
summary_lines = [re.split(r' {2,}', line.strip()) for line in lines[start_index+2:end_index]]

# Create a DataFrame from the list of lists
df = pd.DataFrame(summary_lines, columns=columns)

# Remove the 'TOTAL' column (axis=1 drops a column, not a row)
df = df.drop('TOTAL', axis=1)

# Transpose the DataFrame, set the first row as column headers, and reset the index
df_transposed = df.transpose().reset_index()
df_transposed.columns = df_transposed.iloc[0]
df_transposed = df_transposed[1:].reset_index(drop=True)

# Print the resulting DataFrame
print(df)
# Print the transposed DataFrame
print(df_transposed)

# Get the sum of the 'Units Inspected' column
column_sum = df_transposed['Units Inspected'].astype(float).sum()
print(column_sum)
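
To sum the numbers at the same location across many files, one option is to wrap the parsing in a function and add the resulting DataFrames together. This is only a sketch under the same layout assumptions as above (heading line starting with 'Inspection Summary', one border line, columns separated by two or more spaces, every cell numeric). The names parse_summary, sum_summaries and sum_folder are ones I made up for illustration, and writing xlsx with to_excel needs the openpyxl package installed.

```python
import re
from pathlib import Path

import pandas as pd


def parse_summary(text):
    """Parse one report's 'Inspection Summary' table into a numeric DataFrame."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.startswith('Inspection Summary'))
    # The table ends at the first blank line, or at end of file if there is none
    end = next((i for i in range(start, len(lines)) if not lines[i].strip()),
               len(lines))
    columns = re.split(r' {2,}', lines[start].strip())
    # Assumes exactly one border line sits between the heading and the data
    rows = [re.split(r' {2,}', line.strip()) for line in lines[start + 2:end]]
    df = pd.DataFrame(rows, columns=columns).set_index(columns[0])
    return df.astype(float)


def sum_summaries(tables):
    """Add tables cell by cell; rows and columns are matched by label."""
    total = None
    for table in tables:
        total = table if total is None else total.add(table, fill_value=0)
    return total


def sum_folder(folder, output):
    """Sum every *.txt report in `folder` and write the totals to an xlsx file."""
    paths = sorted(Path(folder).glob('*.txt'))
    if not paths:
        raise ValueError(f'no .txt files found in {folder!r}')
    total = sum_summaries(parse_summary(p.read_text()) for p in paths)
    total.to_excel(output)  # needs the openpyxl package installed
    return total
```

Calling something like sum_folder('reports', 'totals.xlsx') would then produce one sheet with the summed table; both names are placeholders. Because the addition matches cells by row and column label, it still works if a file lists its rows in a different order.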

tjreedy · July 7, 2023, 9:14pm

I once did a lot of processing of output files. My generic procedure: Use a loop to find a ‘landmark’ line; here the one beginning with ‘Inspection Summary’. Read and ignore as many more non-data lines as needed; here 1. Read and process data lines. Ideally they are followed by a blank line, as it appears here.
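
A rough sketch of that procedure in plain Python, in case it helps. The function name read_table and its parameters are just illustrative, and splitting on runs of two or more spaces is an assumption about the file layout, since no sample file was posted.

```python
import re


def read_table(lines, landmark='Inspection Summary', skip=1):
    """Return the data rows that follow the landmark line, one list per row."""
    lines = iter(lines)
    # 1. Loop until the landmark line is found
    for line in lines:
        if line.startswith(landmark):
            break
    else:
        raise ValueError(f'no line starts with {landmark!r}')
    # 2. Read and ignore the non-data lines (here one border line)
    for _ in range(skip):
        next(lines, None)
    # 3. Read and process data lines until a blank line or the end of input
    rows = []
    for line in lines:
        if not line.strip():
            break
        rows.append(re.split(r' {2,}', line.strip()))
    return rows
```

An open file can be passed directly, since iterating over a file yields its lines: with open('data.txt') as f: rows = read_table(f). Each row comes back as a list of strings, so the numbers still need converting with int() or float() before summing.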

_Abinaya · July 10, 2023, 5:38am

Can you please tell me how to use a for loop to loop over this?

tjreedy · July 10, 2023, 5:50pm

Read the tutorial to learn how to use while and for loops, and practice with variations of examples therein.
