Help please: Reading a .txt file with multiple arrays of numbers the right way

cubicmagnet · October 24, 2022, 8:45am

Hello there, I would like to read the following file:

After that, I would like to convert each line from the array format to a tab-separated one and write it to an output file. It should look like this:

4   6   9   21  37  42
13  15  16  34  37  39
4   9   13  21  28  38
1   9   13  27  30  36  
2   5   9   13  29  43 
2   14  17  33  36  42
3   7   8   18  24  42
6   25  35  38  42  44
2   26  32  36  37  44
5   10  25  34  35  42

However, I am totally stuck with my code here:

import argparse

def Input():
    parser = argparse.ArgumentParser(description='Read a file of number arrays and output it tab separated')
    parser.add_argument('--infile', '-i', type=str, required=True, help='specify input file name')
    parser.add_argument('--outfile', '-o', type = str, default='numbers.txt', required = False, help='specify output file name')
    return parser.parse_args()


if __name__ == '__main__':
    args = Input()

    # Check that the files are .txt files so we have at least some basic checks
    if ('.txt' in args.infile and '.txt' in args.outfile):

        # open() returns an iterable file object we can loop over with a for loop
        # .strip() is used to remove the \n (newline) character at the end of each line
        # The with statement closes the file when the block finishes, so we don't need to do it ourselves
        with open(args.infile, encoding="utf-8") as f:
            for line in f:
                curr_line = line.strip()
                print(curr_line.split(','))

    else:
        print('Files must be valid .txt files...')
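A minimal sketch of what the loop above produces, assuming each input line looks like `[4,6,9,21,37,42]` (the actual file was only posted as an image, so this sample line is an assumption): `split(',')` leaves the brackets attached to the first and last items, and every item is still a `str`. Stripping the brackets first and converting with a list comprehension gives real integers.

```python
# Assumed sample line; the real file contents were not posted as text
line = "[4,6,9,21,37,42]\n"

curr_line = line.strip()            # remove the trailing newline
raw_parts = curr_line.split(",")    # brackets stay glued to the first and last items
print(raw_parts)                    # ['[4', '6', '9', '21', '37', '42]']

# Strip the surrounding brackets first, then convert each piece to int
numbers = [int(part) for part in curr_line.strip("[]").split(",")]
print(numbers)                      # [4, 6, 9, 21, 37, 42]
```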

I can only get a string array from the file, and it doesn’t really work when I then try to split it with ‘\t’. I have to do this using a list comprehension, by the way. I am fairly new to Python and hope you can help me. For printing or writing with a list comprehension, I would use the following code:

print('\t'.join([str(x) for x in curr_line]))
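The join itself is fine; the catch is what `curr_line` holds. In the code above it is a string, and iterating over a string yields single characters, so every character (brackets and commas included) gets its own tab. A small sketch with an assumed sample line:

```python
curr_line = "[4,6,9]"   # assumed sample: still a plain string

# Iterating over a str yields characters, so each one is tab-separated
char_joined = "\t".join([str(x) for x in curr_line])
print(repr(char_joined))   # '[\t4\t,\t6\t,\t9\t]'

# After parsing into a list of ints, the same comprehension gives the wanted row
numbers = [int(part) for part in curr_line.strip("[]").split(",")]
row = "\t".join([str(x) for x in numbers])
print(row)
```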

Issue: My key issue is that I can’t parse the lines properly, because each one is still a string like “[1,2,3,4,5]”. I hope you can explain it and help me!

Thank you very much!

abessman · October 24, 2022, 9:18am

Using pandas:

import pandas as pd

df = pd.read_csv("input.txt", header=None)
df.iloc[:, 0] = df.iloc[:, 0].str.strip("[").astype(int)
df.iloc[:, -1] = df.iloc[:, -1].str.strip("]").astype(int)
df.to_csv("output.txt", sep="\t", index=False, header=False)

Note that there is probably a way to strip out the ‘[’ and ‘]’ characters while reading the file by passing an appropriate regex as the sep argument. The two lines in the middle (starting with df.iloc) could then be omitted. My regex-fu is too weak to figure it out at the moment.
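For the regex idea, here is a sketch using Python's standard `re` module rather than pandas (I have not verified the equivalent `sep` value for `read_csv`): treat any run of brackets, commas, and whitespace as one separator, then drop the empty strings that `re.split` leaves at both ends. The names `SEPARATOR` and `parse_line` are hypothetical.

```python
import re

# Any run of '[', ']', ',' or whitespace counts as a single separator
SEPARATOR = re.compile(r"[\[\],\s]+")

def parse_line(line):
    # The line starts and ends with a separator, so re.split yields '' at both ends;
    # the "if tok" filter drops those before converting to int
    return [int(tok) for tok in SEPARATOR.split(line) if tok]

print(parse_line("[4, 6, 9, 21, 37, 42]\n"))  # [4, 6, 9, 21, 37, 42]
```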

aivarpaalberg · October 24, 2022, 9:37am

One way of doing it is to use literal_eval from the ast module:

from ast import literal_eval

with open("source.txt", "r") as source, open("out.txt", "w") as dest:
    for line in source:
        print(*literal_eval(line), sep="\t", file=dest)
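To see what each piece contributes: `literal_eval` turns the text `[4,6,9,21,37,42]` into an actual list, and `print(*values, sep="\t")` unpacks that list into separate arguments joined by tabs. A self-contained sketch, with `io.StringIO` standing in for the two files (an assumption so it runs without files on disk); the blank-line check is an extra I added, since `literal_eval` raises on an empty line.

```python
import io
from ast import literal_eval

# Stand-ins for the input and output files (assumed sample data)
source = io.StringIO("[4,6,9,21,37,42]\n[13,15,16,34,37,39]\n")
dest = io.StringIO()

for line in source:
    if not line.strip():          # skip blank lines, e.g. a trailing empty line
        continue
    values = literal_eval(line)   # str -> list of ints
    print(*values, sep="\t", file=dest)

print(dest.getvalue(), end="")
```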

This is also a good example of why one should never post screenshots…

John_Carter · October 24, 2022, 10:28am

Try this. It only uses the standard library and built-in functions.

import csv                                     # Python has a csv module, see the docs
outfile = 'x.csv'
infile = 'x.data'
with open(outfile, 'w', newline='') as fout:   # newline='' is needed because csv handles line endings itself, see the docs
    write = csv.writer(fout)                   # get a csv writer to output formatted data
    with open(infile) as fin:                  # files are closed at the end of the with block
        for line in fin:                       # looping over a text file gives you one line at a time
            line = line.strip()                # remove surrounding whitespace and the newline
            lst = eval(line)                   # works because each line of the input looks like a Python list
            write.writerow(lst)                # writes the list items separated by commas

It's not the shortest way to do it, but I think it's clear.
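Since the question asks for tab-separated output, note that `csv.writer` accepts a `delimiter` argument. A sketch of that variation, with `ast.literal_eval` swapped in for `eval` and `io.StringIO` standing in for the output file (both are my substitutions, not part of the post above):

```python
import csv
import io
from ast import literal_eval

# Assumed input lines; in practice these come from iterating over the open input file
lines = ["[4,6,9,21,37,42]\n", "[13,15,16,34,37,39]\n"]

# io.StringIO stands in for open(outfile, "w", newline="")
fout = io.StringIO(newline="")
# delimiter="\t" gives tab-separated rows; lineterminator="\n" avoids the default "\r\n"
writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
for line in lines:
    writer.writerow(literal_eval(line.strip()))

print(fout.getvalue(), end="")
```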

vbrozik · October 24, 2022, 10:36am

This would be very dangerous in production code. Never call eval() on unsanitized text coming from an external source! You are opening your program to code-injection vulnerabilities.

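ast.literal_eval(), shown by Aivar, is safer because it only accepts Python literals, but it is still not completely safe: malformed or deeply nested input can raise errors such as MemoryError or RecursionError, or even crash the interpreter.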
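Because each line in the assumed input format (`[4,6,9,...]`) is also valid JSON, one data-only alternative (my suggestion, not from the posts above) is `json.loads`, which parses values and never evaluates Python code:

```python
import json

line = "[4,6,9,21,37,42]\n"   # assumed input line
numbers = json.loads(line)    # surrounding whitespace, including "\n", is allowed
print("\t".join(str(n) for n in numbers))

# Text that is not JSON is rejected with an error instead of being executed
payload = "__import__('os').getcwd()"
try:
    json.loads(payload)
    rejected = False
except json.JSONDecodeError as exc:
    rejected = True
    print("rejected:", exc.msg)
```

Unlike `literal_eval`, this also rejects a line without the surrounding brackets, such as `1, 2, 3`, so it only fits if the brackets are guaranteed.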


cubicmagnet · October 24, 2022, 11:14am

Thank you for that answer! It works nicely, but I have never used pandas before, and since it relies on read_csv it may not be allowed for the task I was given. Still, it is a really nice solution. I am curious whether it is possible without any libraries.
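It can be done with built-ins only. A sketch assuming every non-blank line looks like `[n,n,...]`; the helper names `to_tab_rows` and `convert` are hypothetical, and the file paths would come from the asker's argparse arguments.

```python
def to_tab_rows(lines):
    # Nested list comprehension: drop brackets, split on commas, convert to int;
    # blank lines are skipped
    rows = [[int(x) for x in line.strip().strip("[]").split(",")]
            for line in lines if line.strip()]
    # Join each row's numbers with tabs
    return ["\t".join([str(n) for n in row]) for row in rows]

def convert(infile, outfile):
    # Read all input lines, then write one tab-separated row per line
    with open(infile, encoding="utf-8") as fin:
        out_lines = to_tab_rows(fin)
    with open(outfile, "w", encoding="utf-8") as fout:
        fout.writelines(row + "\n" for row in out_lines)

sample = ["[4,6,9,21,37,42]\n", "[13,15,16,34,37,39]\n", "\n"]
print(to_tab_rows(sample))
```

With the asker's script, the call would be `convert(args.infile, args.outfile)` inside the `.txt` check.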

cubicmagnet · October 24, 2022, 11:14am

Thank you for that answer! It works great, although I don’t really know the “ast” module or understand it yet. What would happen if a line contains “1, 2, 3, 4, 5” without the “[]” enclosing it? I guess I have to assume it always has them.

cubicmagnet · October 24, 2022, 11:15am

That is great! I was looking for something using only the base modules, but it seems I also need the “csv” module.

aivarpaalberg · October 24, 2022, 12:03pm

It’s quite simple to find out: add such a row to the file and observe what happens.
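Following that suggestion, a quick check with made-up sample strings: without brackets, `literal_eval` returns a tuple instead of a list, and a tuple unpacks with `*` just like a list does.

```python
from ast import literal_eval

bracketed = literal_eval("[1, 2, 3, 4, 5]")
bare = literal_eval("1, 2, 3, 4, 5")

print(type(bracketed).__name__, bracketed)  # list [1, 2, 3, 4, 5]
print(type(bare).__name__, bare)            # tuple (1, 2, 3, 4, 5)

# Both unpack the same way, so print(*values, sep="\t") works for either
print(*bare, sep="\t")
```

One edge case: a line holding a single number such as `7` comes back as a plain `int`, and `print(*7)` would raise a `TypeError`.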