Help please: Reading .txt file with multiple arrays of numbers the right way

Hello there, I would like to read the following file:
[image: screenshot of the input file, not reproduced here]

After that I would like to convert each line to a tab-separated one, without the array brackets, and write it to an output file. It should look like this:

4   6   9   21  37  42
13  15  16  34  37  39
4   9   13  21  28  38
1   9   13  27  30  36  
2   5   9   13  29  43 
2   14  17  33  36  42
3   7   8   18  24  42
6   25  35  38  42  44
2   26  32  36  37  44
5   10  25  34  35  42

However I am totally stuck with my code here:

import argparse

def Input():
    parser = argparse.ArgumentParser(description = 'Read a file of number arrays and output it tab separated')
    parser.add_argument('--infile', '-i', type = str, required = True, help='specify input file name')
    parser.add_argument('--outfile', '-o', type = str, default='numbers.txt', required = False, help='specify output file name')
    return parser.parse_args()


if __name__ == '__main__':
    args = Input()

    # Check if the files are txt files so we have some security checks at least
    if ('.txt' in args.infile and '.txt' in args.outfile):

        # The open function returns an iterable file object we can iterate over with a for loop
        # The .strip() is used to remove the \n (newline) character after each line
        # The with is used to close the file when the code block finishes, so we don't need to do this ourselves
        with open(args.infile, encoding="utf-8") as f:
            for line in f:
                curr_line = line.strip()
                print(curr_line.split(','))

    else:
        print('Files must be valid .txt files...')

I can only get a list of strings from the file, and it doesn’t really work when I then try to split it with ‘\t’. I have to do this using a list comprehension, by the way. I am fairly new to Python and hope you can help me. For printing or writing with a list comprehension I would use the following code:

print('\t'.join([str(x) for x in curr_line]))

Issue: My key issue here is that I can’t read the lines properly as they are somehow still strings like “[1,2,3,4,5]” or so. Hope you can explain it and help me please!

Thank you very much!
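
A short sketch of what is going wrong, assuming each input line looks like [4, 6, 9, 21, 37, 42] (the screenshot is not reproduced here, so the sample line and its spacing are assumptions). A file always yields strings, so splitting on commas leaves the brackets attached to the outer pieces, and iterating over the string itself yields single characters:

```python
line = "[4, 6, 9, 21, 37, 42]\n"  # hypothetical sample line, not taken from the real file

curr_line = line.strip()  # removes the trailing newline only

# Splitting on commas keeps the brackets on the first and last pieces.
pieces = curr_line.split(",")
print(pieces)

# Iterating over a string goes character by character, so this joins
# every single character (brackets, commas and spaces included) with tabs.
chars = "\t".join([str(x) for x in curr_line])

# Stripping the brackets first, then splitting, gives clean number strings.
numbers = [piece.strip() for piece in curr_line.strip("[]").split(",")]
print("\t".join(numbers))
```

The last two lines are the part that matters: once the brackets are gone, a list comprehension over the comma-separated pieces is enough.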

Using pandas:

import pandas as pd

df = pd.read_csv("input.txt", header=None)
df.iloc[:, 0] = df.iloc[:, 0].str.strip("[").astype(int)
df.iloc[:, -1] = df.iloc[:, -1].str.strip("]").astype(int)
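# index=False and header=False keep the row labels and column names out of the file
df.to_csv("output.txt", sep="\t", index=False, header=False)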

Note that there is probably a way to strip the ‘[’ and ‘]’ characters while reading the file, by passing a suitable regex as the sep argument. The two lines in the middle (starting with df.iloc) could then be omitted. My regex-fu is too weak to figure it out at the moment.

One way of doing it is to use literal_eval from the ast module:

from ast import literal_eval

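with open("source.txt", "r") as source, open("out.txt", "w") as dest:
    for line in source:  # a file object can be iterated line by line directly
        # literal_eval turns the text "[1, 2, 3]" into a real list; * unpacks it for print
        print(*literal_eval(line), sep="\t", file=dest)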

This is also a good example of why one should never post screenshots…

Try this. It only uses standard library and built in functions.

import csv                                     # Python has a csv module, see the docs
outfile = 'x.csv'
infile = 'x.data'
with open(outfile, 'w', newline='') as fout:   # newline='' is needed because csv handles line endings itself, see the docs
    write = csv.writer(fout)                   # get a csv writer to output formatted data (pass delimiter='\t' for tabs)
    with open(infile) as fin:                  # files are closed at the end of the with block
        for line in fin:                       # looping over a text file gives you one line at a time
            line = line.strip()                # get rid of excess space and the end of line
            lst = eval(line)                   # this works because each line of the input file looks like a list
            write.writerow(lst)                # writes the list items separated by commas

It's not the shortest way to do it, but I think it's clear.

This would be very dangerous in production code. Never call eval() on unsanitized text coming from an external source! You are opening your program to injection vulnerabilities.

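ast.literal_eval(), shown by Aivar, is safer but still not completely safe: its documentation warns that a sufficiently large or complex input can exhaust memory or the C stack and crash the interpreter.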
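
If evaluation is to be avoided altogether, one option is to validate each line against a strict pattern and convert the numbers with int(). This is only a sketch: the names LINE_PATTERN and parse_int_list are made up for illustration, and it assumes every line is a bracketed, comma-separated list of integers.

```python
import re

# Accepts only text such as "[4, 6, 9]": brackets around comma-separated integers.
LINE_PATTERN = re.compile(r"\[\s*-?\d+(?:\s*,\s*-?\d+)*\s*\]")


def parse_int_list(line):
    """Return the integers in a bracketed line, or raise ValueError."""
    text = line.strip()
    if LINE_PATTERN.fullmatch(text) is None:
        raise ValueError(f"not a bracketed list of integers: {text!r}")
    # The line has been validated, so every match here is a complete integer.
    return [int(token) for token in re.findall(r"-?\d+", text)]


print(*parse_int_list("[4, 6, 9, 21, 37, 42]\n"), sep="\t")
```

Nothing from the file is ever executed here; anything that does not match the pattern is rejected with a ValueError.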

Thank you for that answer! It works nicely but I never used pandas before and it says read_csv so this may not be ok for the task I was given. But it is a really nice solution. I am curious if it is possible without any libraries.
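
It does seem possible with no imports at all. A minimal sketch, assuming every non-empty line looks like [4, 6, 9, 21, 37, 42]; the function names to_tab_separated and convert_file are invented for this example:

```python
def to_tab_separated(line):
    """Turn a line like '[4, 6, 9]' into its items joined by tab characters."""
    inner = line.strip().strip("[]")  # drop the newline, then the brackets
    return "\t".join([item.strip() for item in inner.split(",")])


def convert_file(infile, outfile):
    """Write one tab-separated row to outfile per non-empty line of infile."""
    with open(infile, encoding="utf-8") as src, open(outfile, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(to_tab_separated(line) + "\n")


print(to_tab_separated("[13, 15, 16, 34, 37, 39]\n"))
```

With the argparse code from the question, the call would be convert_file(args.infile, args.outfile). The numbers stay strings throughout, which is fine here because they are only written back out.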

Thank you for that answer! It works great, although I don’t really know that “ast” module or understand it yet. What would happen if the file contains “1, 2, 3, 4, 5” without a “[]” enclosing it? I guess I have to assume it always does.

That is great! I was looking for something with the base modules, but it seems I also need the “csv” module.

It’s quite simple to find out - add such a row to the file and observe what will happen.
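
The same experiment can be run without touching the file. A small sketch with made-up sample strings:

```python
from ast import literal_eval

bracketed = literal_eval("[1, 2, 3, 4, 5]\n")
bare = literal_eval("1, 2, 3, 4, 5\n")

print(type(bracketed).__name__, bracketed)  # a list
print(type(bare).__name__, bare)            # a tuple

# Unpacking with * treats a list and a tuple the same way.
print(*bare, sep="\t")
```

Without the brackets the row is parsed as a tuple instead of a list, so the print(*..., sep="\t") call from the earlier answer should behave the same for both kinds of row.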