Parsing Text File

mlgtechuser · June 25, 2022, 1:04pm

#11688 Data ; Jmax 90 ; St Dev 0.159
5. 5. 2. 3. 3. .5 Spin Statistics , Spin Y
P1 D66 0 1 0 1 P0 D6 0 0 0 0 D1 dip
801.0 264.0 1031.5 388.4 0.13778094357E+00 0.42182248646E-07
72 0.d+00 0 Para Number ; Model Accuracy Parameters
28SiF4
Dim 21 fév 2021 16:09:29 CET  Hmn  Frdm         Value/cm-1  St.Dev./cm-1
   1  2(0,0A1) 0000A1 0000A1 A1 02   224  0.13778023448E+00 0.3915693E-06
   2  4(0,0A1) 0000A1 0000A1 A1 04   139 -0.41039338392E-07 0.6560125E-10

   3  4(4,0A1) 0000A1 0000A1 A1 04   536 -0.33591716068E-08 0.4290270E-11
   4  6(0,0A1) 0000A1 0000A1 A1 06     0  0.00000000000E+00 0.0000000E+00
   5  6(4,0A1) 0000A1 0000A1 A1 06     0  0.00000000000E+00 0.0000000E+00
   6  6(6,0A1) 0000A1 0000A1 A1 06     0  0.00000000000E+00 0.0000000E+00
   7  8(0,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
   8  8(4,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
   9  8(6,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
  10  8(8,0A1) 0000A1 0000A1 A1 08     0  0.00000000000E+00 0.0000000E+00
  11  0(0,0A1)  0100E 0100E  A1 20   330  0.26421941002E+03 0.3967863E-04
  12  2(0,0A1)  0100E 0100E  A1 22   130 -0.14303321917E-03 0.3393096E-07

  13  2(2,0E )  0100E 0100E  E  22   248 -0.46790609420E-04 0.2657215E-07
  14  3(3,0A2)  0100E 0100E  A2 23   197  0.14085216624E-06 0.2969422E-09
  15  4(0,0A1)  0100E 0100E  A1 24   152  0.38404874052E-09 0.6656298E-11
  16  4(2,0E )  0100E 0100E  E  24   204 -0.10234422562E-09 0.3485302E-11

Applying the K.I.S.S. principle (“Keep It Super-Simple”)…
…this data is separated by spaces with no meaningful spaces inside the column values, so it can be made to behave like a ‘space-delimited’ text file. After removing the meaningless space characters, we can break each line at the spaces and find the columns that way.
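As a quick illustration of that idea, here is one of the sample rows run through the two operations, replace() and split(). The row text is copied from the data above; the variable names are made up for this sketch only.

```python
# One data row copied from the sample above (row 13), including its padding.
sample_row = "  13  2(2,0E )  0100E 0100E  E  22   248 -0.46790609420E-04 0.2657215E-07"

# Without cleanup, the space inside '(2,0E )' splits that value into two pieces.
raw_columns = sample_row.split()

# Remove the padding space first, then split at runs of whitespace.
clean_columns = sample_row.replace(" )", ")").split()

print(len(raw_columns))     # 10 pieces: one too many
print(len(clean_columns))   # 9 columns
print(clean_columns[7])     # the Value/cm-1 column for this row
```

Rows without the padded space, such as 2(0,0A1), pass through replace() unchanged, so the same two calls can be applied to every data line.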

The process is:

1. Read each line from the file.
2. Find the first line of data.
   - As Václav pointed out, the ‘Dim’ is not reliable, since it’s a day of the week and is almost certain to change.
   - The numeral ‘1’ followed by a space appears to be reliable, but this should be thoroughly investigated.
   - The code below assumes that '1 ' (‘1’ + <space>) is reliable. If that turns out not to be the case, we could assume that the data consistently starts on the 8th line, OR look for two ‘:’ that are two characters apart, OR use any number of other methods. The programmer needs to decide which method works best for this file format. (Ideally, the file has a firm specification that gives some certainty about how the header is structured, like “data ALWAYS starts on line 8”.)
3. Read each line of data. Remove the padded space from the 2nd column.
4. Find the location of each string of space characters and pull that column’s data into the corresponding column of a two-dimensional list. BONUS: this is exactly what the Python split() function does!
5. Read the target column with a for: loop and a list[row][col] reference.

NOTE: The splitting step can keep just the target column if none of the other column data are needed. The code below reads all columns and is probably more useful.
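If the '1 ' marker turns out to be unreliable, the fallback ideas above might be sketched like this. The function name find_data_start, the regular expression, and the short list of sample lines are assumptions made for illustration; none of them comes from a file specification.

```python
import re

# Hypothetical pattern: a clock time such as 16:09:29
# (two ':' that are two characters apart).
TIME_PATTERN = re.compile(r"\d{2}:\d{2}:\d{2}")

def find_data_start(lines, marker="1 "):
    """Return the index of the first data line, or None if it cannot be found."""
    # Method 1: first line that starts with the marker, ignoring leading padding.
    for index, line in enumerate(lines):
        if line.lstrip().startswith(marker):
            return index
    # Method 2: the line right after the header line that carries the timestamp.
    for index, line in enumerate(lines):
        if TIME_PATTERN.search(line) and index + 1 < len(lines):
            return index + 1
    return None

# A few lines copied from the sample file above.
sample_lines = [
    "28SiF4",
    "Dim 21 fév 2021 16:09:29 CET  Hmn  Frdm         Value/cm-1  St.Dev./cm-1",
    "   1  2(0,0A1) 0000A1 0000A1 A1 02   224  0.13778023448E+00 0.3915693E-06",
    "   2  4(0,0A1) 0000A1 0000A1 A1 04   139 -0.41039338392E-07 0.6560125E-10",
]

print(find_data_start(sample_lines))                # found by the '1 ' marker
print(find_data_start(sample_lines, marker="???"))  # falls back to the timestamp line
```

Returning None when nothing matches lets the calling code decide what to do, rather than silently reading from the wrong line.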

data_start_marker = '1 '

with open("KikiData.csv", 'r') as csv_file:     #'with' closes the file automatically
    csv_rows = csv_file.readlines()             #reads the whole file into memory; for a very large file, loop over csv_file directly instead

for line_num, row in enumerate(csv_rows):
    if row.lstrip().startswith(data_start_marker):   #find the first data line (lstrip() skips the leading padding)
        data_start = line_num
        break                                   #stop looping; go to the next line of code after the loop
else:
    raise ValueError("data start marker not found")  #runs only if the loop finished without 'break'

col_num = 7     # ←←this is the column you asked for (first item in a list is position 0)
data_table = [row.replace(' )', ')') for row in csv_rows]   #remove the padded space, e.g. '(2,0E )' => '(2,0E)'
data_table1 = [row.split() for row in data_table[data_start:] if row.strip()]   #split into columns, skipping blank lines
data_col = [row[col_num] for row in data_table1]   #collect the target column

The code below has print() loops to print the columns vertically AND also has a for: loop that shows what the data_table1 = [row.split() for row… line does.

data_table2 = []
data_start_marker = '1 '

with open("KikiData.csv", 'r') as csv_file:     #'with' closes the file automatically
    csv_rows = csv_file.readlines()             #reads the whole file into memory; for a very large file, loop over csv_file directly instead

for line_num, row in enumerate(csv_rows):
    if row.lstrip().startswith(data_start_marker):   #find the first data line (lstrip() skips the leading padding)
        data_start = line_num
        break                                   #stop looping; go to the next line of code after the loop
else:
    raise ValueError("data start marker not found")  #runs only if the loop finished without 'break'

col_num = 7     #this is the column you asked for (first item in a list is position 0)
data_table = [row.replace(' )', ')') for row in csv_rows]   #remove the padded space, e.g. '(2,0E )' => '(2,0E)'
data_table1 = [row.split() for row in data_table[data_start:] if row.strip()]   #split into columns, skipping blank lines
data_col = [row[col_num] for row in data_table1]   #collect the target column
for item in data_col:                           #print the column
    print(item)
#THIS LOOP ↓↓↓ DOES THE SAME THING AS 'data_table1 =' ABOVE ↑↑↑  Use the one that is clearest to you.
for row in data_table[data_start:]:             #process the cleaned data rows from data_start row to end of the list
    new_row = row.split()                       #break the columns on this row into a list; with no argument, split() breaks at any run of whitespace
    if new_row:                                 #skip blank lines (they split into an empty list)
        data_table2.append(new_row)

data_col = [row[col_num] for row in data_table2]   #collect the target column
for item in data_col:                           #print the column
    print(item)
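
The column comes back as a list of strings. If numbers are needed for calculations, Python's float() reads this exponent notation directly. A minimal sketch, using three values copied from the sample rows as a stand-in for the real data_col:

```python
# Stand-in for data_col: three Value/cm-1 strings copied from the sample rows.
data_col = ["0.13778023448E+00", "-0.41039338392E-07", "0.26421941002E+03"]

# float() understands the 'E+00' style exponent without any extra parsing.
values = [float(item) for item in data_col]

print(values[0])
print(max(values))
```

Note that this applies to the 'E' notation used in the data rows; whether other fields in the header need conversion is a separate question.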