Hello there, I would like to read the following file:
After that I would like to convert each line to a tab-separated one instead of the array look and write it back to an output file. It should look like this:
import argparse
def Input():
    parser = argparse.ArgumentParser(description = 'Read a file of number arrays and output it tab separated')
    parser.add_argument('--infile', '-i', type = str, required = True, help='specify input file name')
    parser.add_argument('--outfile', '-o', type = str, default='numbers.txt', required = False, help='specify output file name')
    return parser.parse_args()
if __name__ == '__main__':
    args = Input()
    # Check if the files are txt files so we have some security checks at least
    if ('.txt' in args.infile and '.txt' in args.outfile):
        # open() returns an iterable file object we can iterate over with a for loop
        # The .strip() is used to remove the \n (newline) character after each line
        # The with is used to close it when the code block finishes. We don't need to do this ourselves then
        with open(args.infile, encoding="utf-8") as f:
            for line in f:
                curr_line = line.strip()
                print(curr_line.split(','))
    else:
        print('Files must be valid .txt files...')
I can only get a String array from the file and it doesn’t really work when I then try to split it with ‘\t’. I have to do this by using list comprehension by the way. I am fairly new to Python and hope you can help me please. For the printing or writing using list comprehension I would use the following code:
print('\t'.join([str(x) for x in curr_line]))
Issue: My key issue here is that I can’t read the lines properly as they are somehow still strings like “[1,2,3,4,5]” or so. Hope you can explain it and help me please!
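Every line read from a text file is a string, so calling split(',') on “[1,2,3,4,5]” leaves the brackets attached to the first and last pieces, and iterating over the string itself yields single characters. A minimal sketch without any imports, assuming each line holds one flat list such as [1,2,3,4,5]; the helper name to_tab_separated is made up for illustration:

```python
def to_tab_separated(line):
    """Turn a line like '[1,2,3,4,5]' into tab-separated text."""
    # Remove the newline and surrounding whitespace, then the enclosing brackets.
    inner = line.strip().strip('[]')
    # Split on commas and trim each piece with a list comprehension.
    items = [item.strip() for item in inner.split(',')]
    return '\t'.join(items)


print(to_tab_separated('[1,2,3,4,5]\n'))
```

Inside the existing loop this could be called once per line, writing the result plus a newline to the output file.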
Note that there is probably a way to strip out the ‘[’ and ‘]’ characters when reading the file using an appropriate regex for the sep argument. The two lines in the middle (starting with df.iloc) could then be omitted. My regex-fu is too weak to figure it out at the moment.
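The bracket-stripping idea does not depend on pandas. A rough sketch using the standard-library re module rather than the sep argument, assuming the lines contain only plain integers or decimals; NUMBER and numbers_in are illustrative names, not part of any answer here:

```python
import re

# Matches optionally signed integers or decimals; brackets, commas and
# whitespace are simply never matched, so they drop out.
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')


def numbers_in(line):
    """Return the numbers found in the line as a list of strings."""
    return NUMBER.findall(line)


print('\t'.join(numbers_in('[1, -2, 3.5]')))
```

This pattern would also pick digits out of unrelated text, so it only suits input that is known to contain nothing but number lists.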
One way of doing it is using literal_eval from the ast module:
from ast import literal_eval
with open("source.txt", "r") as source, open("out.txt", "w") as dest:
    for line in source:
        print(*literal_eval(line), sep="\t", file=dest)
This is also a good example why one should never post screenshots…
Try this. It only uses the standard library and built-in functions.
import csv  # Python has a csv module, see docs
outfile = 'x.csv'
infile = 'x.data'
with open(outfile, 'w', newline = '') as fout:  # newline = '' is needed as csv does its own thing, see docs
    write = csv.writer(fout)  # get a csv writer to output formatted data
    with open(infile) as fin:  # files are closed at the end of the with block
        for line in fin:  # looping over a text file gets you a new line each time
            line = line.strip()  # get rid of excess space and end of line
            lst = eval(line)  # this works as the data in the input file looks like a list on each line
            write.writerow(lst)  # splits the list and adds the commas
It's not the shortest way to do it but I think it's clear
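Since the original question asks for tabs rather than commas, the same csv approach can be pointed at a tab delimiter. A small sketch, assuming in-memory stand-ins for the two files (the lines list and the io.StringIO buffer are placeholders) and swapping in ast.literal_eval for eval:

```python
import csv
import io
from ast import literal_eval

lines = ['[1,2,3]\n', '[4,5,6]\n']  # stand-in for the input file

buffer = io.StringIO()  # stand-in for the output file
# delimiter='\t' makes the writer emit tabs instead of commas.
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
for line in lines:
    writer.writerow(literal_eval(line.strip()))

result = buffer.getvalue()
print(result, end='')
```

With real files, the buffer would be replaced by open(outfile, 'w', newline='') as in the code above.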
This would be very dangerous in production code. Never call eval() on unsanitized text coming from an external source! You are opening your program to injection vulnerabilities.
ast.literal_eval() shown by Aivar is safer but still not completely safe.
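The Python documentation warns that literal_eval() can still exhaust memory or crash the interpreter on sufficiently large or deeply nested input. One way to narrow the risk, sketched under the assumption that every valid line is a short flat list of numbers; parse_number_list and the MAX_LINE_LENGTH value are arbitrary choices for illustration:

```python
from ast import literal_eval

MAX_LINE_LENGTH = 10_000  # arbitrary limit chosen for this sketch


def parse_number_list(line):
    """Parse one line into a list of numbers or raise ValueError."""
    if len(line) > MAX_LINE_LENGTH:
        raise ValueError('line too long')
    # literal_eval only accepts Python literals, never function calls.
    value = literal_eval(line.strip())
    if not isinstance(value, list):
        raise ValueError('expected a list')
    # bool is a subclass of int, so it is excluded explicitly.
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool)
               for x in value):
        raise ValueError('expected only numbers')
    return value


print(parse_number_list('[1, 2, 3]\n'))
```

Note that literal_eval() itself can raise SyntaxError as well as ValueError on malformed input, so a caller may want to catch both.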
Thank you for that answer! It works nicely but I never used pandas before and it says read_csv so this may not be ok for the task I was given. But it is a really nice solution. I am curious if it is possible without any libraries.
Thank you for that answer! It works great, although I don’t really know the “ast” module or understand it yet. What would happen if the file contains “1, 2, 3, 4, 5” without a “[]” enclosing it? I guess I have to assume it always does.
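On the question about missing brackets: literal_eval() parses the text as a Python literal, and comma-separated values without brackets are a tuple literal, so unpacking with * works the same way. A quick check, with variable names chosen only for illustration:

```python
from ast import literal_eval

with_brackets = literal_eval('[1, 2, 3, 4, 5]')
without_brackets = literal_eval('1, 2, 3, 4, 5')

# The first is a list, the second a tuple; both can be unpacked with *.
print(type(with_brackets).__name__, with_brackets)
print(type(without_brackets).__name__, without_brackets)
print(*without_brackets, sep='\t')
```

A line holding a single number such as “5” would evaluate to a plain int, which cannot be unpacked, so that case would need separate handling.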