Welcome, @data . It should certainly be quite straightforward to do this in Python. Just to note, at least based on your bash script above, you’re not writing to “an excel file”, but rather a CSV, which is a quite different entity (and a lot simpler and easier to work with from a programming perspective, though it is possible to read and write excel files as well). Also, it would have been very helpful to include your sample logs and CSV as verbatim text here (inside code blocks), or via a service like GitHub Gist, as it is not possible to copy text from screenshots (in order to experiment with approaches), and requires email users navigate here to answer your question, among other downsides.
In any case, this should be pretty straightforward to do; if you can do it in bash/awk/grep/sed, you’ll find Python to be much easier and simpler (and less cryptic) once you’re familiar with the basics. You could use the built-in csv module, as I do below, but in practice its much easier to use the popular Pandas library.
Basically, you can just create loop with each file, in which you open the file and loop over each line in it, checking for the lines that contain the data you need, and storing them in a list of dicts. Here’s a quick example, untested (since you didn’t share your data) and without error handling and such, but should give you the idea.
log_data = []
for file_number in range(START_NUMBER, STOP_NUMBER + 1):
with open(f"log_file_{file_number}.txt", encoding="utf-8") as log_file:
for line in log_file:
# For a new test, create a new dict (line) and get the test number
if line.startswith("Timestamps #"):
test_number = line.split("#")[1].strip()
test_data = {
"Text File Number": str(file_number),
"Test Number": test_number,
}
# For each data item, extract it from the line and strip whitespace
elif line.startswith("TX Egress Timestamp"):
test_data["TX Time"] = line.split(":")[-1].strip()
elif line.startswith("RX Ingress Timestamp"):
test_data["RX Time"] = line.split(":")[-1].strip()
# Etc for other things you want to record
# For the last item, append the complete data dict to the list
elif line.startswith("RX_ITS - TX_ETS"):
test_data["RX Time - TX Time"] = line.split(":")[-1].strip()
log_data.append(test_data)
str.split
is used to split the line by a certain character, [-1]
is used to get the last (i.e. second) item in the list of splits, i.e. the data you want, and strip
is used to strip any whitespace around the character. These same basic tools, as well as replace
, can be used to do additional processing, like stripping ns
from the end, and you can also convert the hex to ints with e.g. int("0x10", base=16)
.
Then, you have a list of dicts, each dict containing one line of your data. You then simply open your existing CSV file in append (a
) mode and use csv.DictWriter
to write the new lines in append
mode. You can also use csv.Sniffer
(not shown) to ensure you pick the correct CSV dialect for your existing file (line ending, quoting mode, etc). You should also check that the file exists, and if not, create it with w
instead of a
and write the column headers with csv_writer.writeheader()
first. Example:
import csv
COLUMNS = [
"Text File Number", "Test Number", "TX Time", "RX Time", "RX Time - TX Time"]
with open("test1.csv", mode="a", encoding="utf-8", newline="") as csv_file:
csv_writer = csv.DictWriter(csv_file, fieldnames=COLUMNS)
csv_writer.writerows(log_data)
If you want to write to an actual Excel file, you’ll need pandas; read in the file and then add new rows to the dataframe, one each for your list of dicts, and output it with to_excel()
.
Best of luck!