Automate extracting specific values from text files to an existing Excel file


Is there any way to automate extracting specific values from a few text files into an existing Excel file?

The input file is a log file of a process that contains process events, timestamps, and more. I need to extract specific data such as Test Number, TX Time, RX Time, and RX Time - TX Time. This is an example of how the log file looks, and this is the specific information needed; it is only one test from one text file. There will be a few tests in each text file, and there are a few text files.

The existing Excel file has a table that will store the values extracted from the text files. I initially tried to do this with simple shell scripting, but it only creates a new file rather than writing to the existing Excel file. Is it possible to do this with Python? I am really stuck here and hope someone can help me.

I will also include the bash script I had, which deletes the existing table and creates a new sheet. Please do help me if there is any alternative.

#!/bin/bash
##################################################################################################
CWD=$(pwd)

for j in {1..5}  # number of reset iterations
do
    awk '$0~/^TX Time/ {print $5}' log_file_$j.txt >> 1_val.txt
    awk '$0~/^RX Time/ {print $5}' log_file_$j.txt >> 2_val.txt

    grep -irns "RX_Time - TX_Time" log_file_$j.txt >> test1.csv

    awk '{print $5}' test1.csv > 1.txt

    grep -o "/.*" 1.txt > 2.txt

    sed "s/^.//g" 2.txt > 3_val.txt
done

paste -d' ' 1_val.txt 2_val.txt 3_val.txt > values.txt
tr " " "," < values.txt > test1.csv

Welcome, @data. It should certainly be quite straightforward to do this in Python. Just to note, at least based on your bash script above, you’re not writing to “an Excel file” but rather a CSV, which is a quite different entity (and a lot simpler and easier to work with from a programming perspective, though it is possible to read and write Excel files as well). Also, it would have been very helpful to include your sample logs and CSV as verbatim text here (inside code blocks), or via a service like GitHub Gist: it is not possible to copy text from screenshots (in order to experiment with approaches), and they force users replying by email to navigate here to see your question, among other downsides.

In any case, this should be pretty straightforward; if you can do it in bash/awk/grep/sed, you’ll find Python much easier, simpler, and less cryptic once you’re familiar with the basics. You could use the built-in csv module, as I do below, but in practice it’s much easier to use the popular pandas library.

Basically, you can just loop over each file, open it, and loop over each line in it, checking for the lines that contain the data you need and storing them in a list of dicts. Here’s a quick example, untested (since you didn’t share your data) and without error handling and such, but it should give you the idea.

log_data = []
# START_NUMBER and STOP_NUMBER are the first and last file numbers (1 and 5 in your script)
for file_number in range(START_NUMBER, STOP_NUMBER + 1):
    with open(f"log_file_{file_number}.txt", encoding="utf-8") as log_file:
        for line in log_file:
            # For a new test, create a new dict (row) and get the test number
            if line.startswith("Timestamps #"):
                test_number = line.split("#")[1].strip()
                test_data = {
                    "Text File Number": str(file_number),
                    "Test Number": test_number,
                }
            # For each data item, extract it from the line and strip whitespace
            elif line.startswith("TX Egress Timestamp"):
                test_data["TX Time"] = line.split(":")[-1].strip()
            elif line.startswith("RX Ingress Timestamp"):
                test_data["RX Time"] = line.split(":")[-1].strip()
            # Etc for other things you want to record
            # For the last item, append the complete data dict to the list
            elif line.startswith("RX_ITS - TX_ETS"):
                test_data["RX Time - TX Time"] = line.split(":")[-1].strip()
                log_data.append(test_data)

str.split is used to split the line on a certain character, [-1] gets the last (i.e. second) item in the list of splits, i.e. the data you want, and strip removes any whitespace around it. These same basic tools, as well as replace, can be used to do additional processing, like stripping ns from the end, and you can also convert the hex strings to ints with e.g. int("0x10", base=16).
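For example, here’s a quick sketch of those tools on a hypothetical timestamp line (the exact line format is assumed, since the real logs weren’t shared):

```python
# Hypothetical log line; the real format in your logs may differ
line = "TX Egress Timestamp : 0x10 ns\n"

value = line.split(":")[-1].strip()      # take the part after the colon -> "0x10 ns"
value = value.replace("ns", "").strip()  # drop the "ns" unit suffix -> "0x10"
as_int = int(value, base=16)             # parse the hex string as an integer -> 16
```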

Then, you have a list of dicts, each dict containing one row of your data. You can simply open your existing CSV file in append (a) mode and use csv.DictWriter to write the new rows. You can also use csv.Sniffer to ensure you pick the correct CSV dialect for your existing file (line ending, quoting mode, etc.). You should also check whether the file exists, and if not, create it with w instead of a and write the column headers with csv_writer.writeheader() first. Example:

import csv
import os.path

COLUMNS = [
    "Text File Number", "Test Number", "TX Time", "RX Time", "RX Time - TX Time"]

# Write the header row only if the file doesn't exist yet
write_header = not os.path.exists("test1.csv")
with open("test1.csv", mode="a", encoding="utf-8", newline="") as csv_file:
    csv_writer = csv.DictWriter(csv_file, fieldnames=COLUMNS)
    if write_header:
        csv_writer.writeheader()
    csv_writer.writerows(log_data)
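And a hedged sketch of the csv.Sniffer idea; the sample file and its semicolon delimiter here are made up purely for illustration:

```python
import csv

# Create a small sample CSV just for demonstration (hypothetical content)
with open("sample.csv", "w", encoding="utf-8", newline="") as f:
    f.write("a;b;c\n1;2;3\n")

# Sniff the dialect (delimiter, quoting, etc.) from the start of the file
with open("sample.csv", encoding="utf-8", newline="") as f:
    dialect = csv.Sniffer().sniff(f.read(1024))

# The detected dialect can then be passed along, e.g.
# csv.DictWriter(csv_file, fieldnames=COLUMNS, dialect=dialect)
```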

If you want to write to an actual Excel file, you’ll need pandas: read in the existing file, add one new row to the dataframe for each dict in your list, and write it back out with to_excel().
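A minimal sketch of that, assuming pandas (plus the openpyxl package for .xlsx support) is installed and a hypothetical output filename of results.xlsx; the log_data list here is a stand-in for the one built above:

```python
import pandas as pd

# Stand-in sample data; in practice, use the log_data list built earlier
log_data = [
    {"Text File Number": "1", "Test Number": "1",
     "TX Time": "0x10", "RX Time": "0x18", "RX Time - TX Time": "0x8"},
]

new_rows = pd.DataFrame(log_data)
try:
    # If the file already exists, append the new rows to its table
    existing = pd.read_excel("results.xlsx")
    combined = pd.concat([existing, new_rows], ignore_index=True)
except FileNotFoundError:
    combined = new_rows
combined.to_excel("results.xlsx", index=False)
```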

Best of luck!
