Parsing Text File

#11688 Data ; Jmax 90 ; St Dev 0.159
5. 5. 2. 3. 3. .5 Spin Statistics , Spin Y
P1 D66 0 1 0 1 P0 D6 0 0 0 0 D1 dip
801.0 264.0 1031.5 388.4 0.13778094357E+00 0.42182248646E-07
72 0.d+00 0 Para Number ; Model Accuracy Parameters
28SiF4
Dim 21 fév 2021 16:09:29 CET  Hmn  Frdm         Value/cm-1  St.Dev./cm-1
   1  2(0,0A1) 0000A1 0000A1 A1 02   224  0.13778023448E+00 0.3915693E-06
   2  4(0,0A1) 0000A1 0000A1 A1 04   139 -0.41039338392E-07 0.6560125E-10
• • •
   3  4(4,0A1) 0000A1 0000A1 A1 04   536 -0.33591716068E-08 0.4290270E-11
   4  6(0,0A1) 0000A1 0000A1 A1 06     0  0.00000000000E+00 0.0000000E+00
   5  6(4,0A1) 0000A1 0000A1 A1 06     0  0.00000000000E+00 0.0000000E+00
   6  6(6,0A1) 0000A1 0000A1 A1 06     0  0.00000000000E+00 0.0000000E+00
   7  8(0,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
   8  8(4,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
   9  8(6,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
  10  8(8,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
  11  0(0,0A1)  0100E 0100E  A1 20   330  0.26421941002E+03 0.3967863E-04
  12  2(0,0A1)  0100E 0100E  A1 22   130 -0.14303321917E-03 0.3393096E-07
  13  2(2,0E )  0100E 0100E  E  22   248 -0.46790609420E-04 0.2657215E-07
  14  3(3,0A2)  0100E 0100E  A2 23   197  0.14085216624E-06 0.2969422E-09
  15  4(0,0A1)  0100E 0100E  A1 24   152  0.38404874052E-09 0.6656298E-11
  16  4(2,0E )  0100E 0100E  E  24   204 -0.10234422562E-09 0.3485302E-11

Applying the K.I.S.S. principle (“Keep It Super-Simple”)…
…this data is separated by runs of spaces, and the only space inside a column value is meaningless padding (the ‘0E )’ in the 2nd column), so it can be made to behave like a ‘space-delimited’ text file. After removing that padding, we can break each line at the spaces and find the columns that way.
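
To make the idea concrete, here is a minimal sketch using one row copied from the sample above (row 13, which contains the padded ‘0E )’). The variable names are only illustrative. It shows why the padding has to go before splitting: with the padding left in, every later column shifts one position to the right.

```python
# One data row copied from the sample above (row 13, with the padded "0E )").
sample_row = "  13  2(2,0E )  0100E 0100E  E  22   248 -0.46790609420E-04 0.2657215E-07"

# Splitting as-is yields an extra field because of the space inside "0E )".
raw_fields = sample_row.split()

# Removing the padding first gives the expected nine columns.
clean_fields = sample_row.replace(" )", ")").split()

print(len(raw_fields), raw_fields[7])      # 10 fields; index 7 is the Frdm value here
print(len(clean_fields), clean_fields[7])  # 9 fields; index 7 is the Value/cm-1 entry
```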

The process is:

  1. Read each line from the CSV file.
  2. Find the first line of data.
    • As Václav pointed out, the ‘Dim’ is not reliable since it’s a day of the week and is almost certain to change.
    • The numeral ‘1’ followed by a space appears to be reliable, but this should be thoroughly investigated.
    • The code below assumes that ‘1 ’ (‘1’ + <space>) is reliable. If that turns out not to be reliable, we could assume that the data consistently starts on the 8th line, OR look for two ‘:’ that are two characters apart (the timestamp in the column-header line), OR use any number of other methods. The programmer needs to decide what method works best for this file format. (Ideally, the file has a firm specification that gives some certainty on how the header is structured, like “data ALWAYS starts on line 8”.)
  3. Read each line of data. Remove the padded space from the 2nd Column.
  4. Find the location of each string of space characters and pull that column’s data into the corresponding column of a two-dimensional list. BONUS: this is exactly what the Python split() function does!
  5. Read the target column with a for: loop and list[row][col] reference.
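
As an illustration of one alternative from step 2, the sketch below looks for the two ‘:’ that are two characters apart, i.e. the timestamp in the column-header line, and treats the following line as the first data row. The helper name `find_data_start` and the pattern are assumptions made for this example, and it only works if no earlier header line contains a time of day.

```python
import re

# Assumed pattern: a time of day such as 16:09:29 (two ':' two characters apart).
TIMESTAMP = re.compile(r"\d\d:\d\d:\d\d")

def find_data_start(lines):
    """Return the index of the line after the timestamped header, or None."""
    for line_num, line in enumerate(lines):
        if TIMESTAMP.search(line):
            return line_num + 1
    return None

# A few lines copied from the sample file above.
header_and_data = [
    "72 0.d+00 0 Para Number ; Model Accuracy Parameters",
    "28SiF4",
    "Dim 21 fév 2021 16:09:29 CET  Hmn  Frdm         Value/cm-1  St.Dev./cm-1",
    "   1  2(0,0A1) 0000A1 0000A1 A1 02   224  0.13778023448E+00 0.3915693E-06",
]
print(find_data_start(header_and_data))  # 3
```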

NOTE: Step 3 can just read the target column if none of the other column data are needed. The code below reads all columns and is probably more useful.
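
For comparison, the single-column variant mentioned in the note might look like the sketch below; the version that reads all columns follows it. The function name `read_column` is made up for this example. It accepts any iterable of lines, so an open file object can be passed in directly and the file never has to be held in memory all at once. It also assumes the data rows may be indented, as they are in the sample, so leading spaces are stripped before checking for the marker.

```python
def read_column(lines, col_num, data_start_marker="1 "):
    """Collect one column from an iterable of lines (e.g. an open file)."""
    column = []
    in_data = False
    for line in lines:
        # Switch on once the first data row is seen (leading padding ignored).
        if not in_data and line.lstrip().startswith(data_start_marker):
            in_data = True
        # Skip blank lines so that indexing a row cannot fail on them.
        if in_data and line.strip():
            fields = line.replace(" )", ")").split()
            column.append(fields[col_num])
    return column

# A few lines copied from the sample file above.
sample = [
    "28SiF4\n",
    "Dim 21 fév 2021 16:09:29 CET  Hmn  Frdm         Value/cm-1  St.Dev./cm-1\n",
    "   1  2(0,0A1) 0000A1 0000A1 A1 02   224  0.13778023448E+00 0.3915693E-06\n",
    "  13  2(2,0E )  0100E 0100E  E  22   248 -0.46790609420E-04 0.2657215E-07\n",
]
print(read_column(sample, 7))
```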

data_start_marker = '1 '    #the first data row begins with the numeral 1 followed by a space

#readlines() loads the whole file into memory; if the file is too large for that,
#loop over the file object itself instead (for row in csv_file:)
with open("KikiData.csv",'r') as csv_file:
    csv_rows = csv_file.readlines()

data_start = None
for line_num,row in enumerate(csv_rows):
    if row.lstrip().startswith(data_start_marker):   #find the first data line; lstrip() ignores the leading padding
        data_start = line_num
        break                               #stop looping; go to the next line of code after the loop
if data_start is None:
    raise ValueError("no line starting with '1 ' was found")

col_num = 7     # ←←this is the column you asked for (first item in a list is position 0)
data_table = [row.replace(' )', ')') for row in csv_rows]     #remove the padded space from the 2nd column
data_table1 = [row.split() for row in data_table[data_start:] if row.strip()]   #split each row; skip blank lines
data_col = [row[col_num] for row in data_table1]   #collect the target column
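
The column values come out of split() as strings. If numbers are needed, a sketch like the following converts them. The helper `to_float` is hypothetical; it also accepts the Fortran-style ‘d’ exponent that appears in the file header (‘0.d+00’), on the assumption that such values could show up in the data as well.

```python
def to_float(text):
    """Convert '0.13778023448E+00' or Fortran-style '0.d+00' to a float."""
    return float(text.lower().replace("d", "e"))

# Strings as they would come out of the column list (copied from the sample).
data_col = ["0.13778023448E+00", "-0.41039338392E-07", "0.00000000000E+00"]
values = [to_float(text) for text in data_col]

print(values)
print(max(values))
```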

 
The code below has print() loops to print the columns vertically AND also has a for: loop that shows what the data_table1 = [row.split() for row… line does.

data_table2 = []
data_start_marker = '1 '    #the first data row begins with the numeral 1 followed by a space

#readlines() loads the whole file into memory; if the file is too large for that,
#loop over the file object itself instead (for row in csv_file:)
with open("KikiData.csv",'r') as csv_file:
    csv_rows = csv_file.readlines()

data_start = None
for line_num,row in enumerate(csv_rows):
    if row.lstrip().startswith(data_start_marker):   #find the first data line; lstrip() ignores the leading padding
        data_start = line_num
        break                               #stop looping; go to the next line of code after the loop
if data_start is None:
    raise ValueError("no line starting with '1 ' was found")

col_num = 7     #this is the column you asked for (first item in a list is position 0)
data_table = [row.replace(' )', ')') for row in csv_rows]     #remove the padded space from the 2nd column
data_table1 = [row.split() for row in data_table[data_start:] if row.strip()]   #split each row; skip blank lines
data_col = [row[col_num] for row in data_table1]   #collect the target column
for item in data_col:                       #print the column
    print(item)

#THIS LOOP ↓↓↓ DOES THE SAME THING AS 'data_table1 =' ABOVE ↑↑↑  Use the one that is clearest to you.
for row in data_table[data_start:]:         #process the cleaned rows from data_start row to end of the list
    if not row.strip():                     #skip blank lines
        continue
    new_row = row.split()                   #break the columns on this row into a list; with no argument, split() breaks at any run of whitespace
    data_table2.append(new_row)

data_col = [row[col_num] for row in data_table2]   #collect the target column
for item in data_col:                       #print the column
    print(item)
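
One detail about split() is worth seeing in isolation: calling it with no argument is not the same as calling it with a single space. The short sketch below uses a fragment of row 1 from the sample to show the difference. With no argument, runs of whitespace count as one separator and leading padding is ignored, which is what this file needs.

```python
row = "   1  2(0,0A1) 0000A1"

# No argument: any run of whitespace is one separator, leading padding is dropped.
print(row.split())      # ['1', '2(0,0A1)', '0000A1']

# Explicit single space: every space is a separator, so empty strings appear.
print(row.split(" "))   # ['', '', '', '1', '', '2(0,0A1)', '0000A1']
```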