Hi, there are several ways of doing this. Which one is best depends on the scale of the data and the time constraints you have (if any).
If file 1 has a strict, fixed format (and no format errors), then the simplest way, and probably the quickest to write, is to use str.replace. Load file 2 and file 3 and combine them into a dictionary mapping source strings to destination strings:
with open(file2) as f:
    src = [line.strip() for line in f]
with open(file3) as f:
    dst = [line.strip() for line in f]
rep = dict(zip(src, dst))  # dictionary mapping src strings to dst strings
Then load file 1 and apply the replacements. If you’re lucky (depending on the exact format and content of file1), you can simply do something like this:
with open(file1) as f:
    s = f.read()
for src, dst in rep.items():
    s = s.replace(src, dst)
with open(final, "w") as f:
    f.write(s)
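To make the "if you’re lucky" part concrete, here is a minimal sketch with made-up data (the strings in `rep` and `s` are hypothetical, not taken from your files). It shows two things a chained str.replace can do that you may not want: it also replaces matches inside longer words, and a later replacement can rewrite the output of an earlier one.

```python
# Hypothetical data, chosen only to show two pitfalls of chained str.replace.
rep = {"cat": "dog", "dog": "bird"}

s = "cat concat dog"
for src, dst in rep.items():
    s = s.replace(src, dst)

# "cat" became "dog" and was then rewritten to "bird" (cascade),
# and the "cat" inside "concat" was replaced too (substring match).
print(s)  # bird conbird bird
```

If neither case can occur in your file1, the simple loop is fine.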
Btw, the way the data is split across files makes data bugs very likely. File2 and file3 should really be a single 2-column csv or tsv file containing both the source strings and the destination strings: as it stands, it’s easy for the two files to get out of sync, or to pick up an off-by-one-line error.
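As a rough sketch of that layout, assuming a tab-separated file with one source/destination pair per line, the dictionary could be loaded like this. The function name `load_replacements` is just an illustration, and rejecting malformed lines is one possible policy, not the only one.

```python
import csv

def load_replacements(path):
    """Read a 2-column TSV (source<TAB>destination) into a dict."""
    rep = {}
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quote characters as literal text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row_number, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {row_number}: expected 2 columns, got {len(row)}"
                )
            src, dst = row
            rep[src] = dst
    return rep
```

Because each pair lives on a single line, a missing or extra entry shows up as an error on that line instead of silently shifting every later pair.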
There are other, faster ways of doing this too, but if your files are rather small, this may be the simplest way. If your file1 is very large and you have thousands of replacements, then the code above is inefficient, because it processes the content of file1 repeatedly, once for every src/dst pair. In that case it is better to load the src/dst strings into a trie and process the content of file1 only once.
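A minimal sketch of the trie idea, assuming a plain nested-dict trie and a "longest match wins" rule (both are my choices for illustration; `build_trie` and `replace_all` are hypothetical names). The text is scanned left to right once, with a lookahead at each position that is bounded by the longest source string.

```python
def build_trie(rep):
    """Build a nested-dict trie; a terminal node stores its replacement under the key None."""
    root = {}
    for src, dst in rep.items():
        if not src:
            continue  # an empty source string would match everywhere
        node = root
        for ch in src:
            node = node.setdefault(ch, {})
        node[None] = dst
    return root

def replace_all(text, trie):
    """Scan text once, replacing the longest source string that starts at each position."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        node = trie
        match_end = -1
        match_dst = None
        j = i
        # Walk the trie as far as the text allows, remembering the last full match.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                match_end, match_dst = j, node[None]
        if match_end != -1:
            out.append(match_dst)
            i = match_end  # continue after the matched source string
        else:
            out.append(text[i])  # no source string starts here; copy the character
            i += 1
    return "".join(out)

# Hypothetical example data.
rep = {"cat": "dog", "dog": "bird", "category": "group"}
result = replace_all("cat category dog concat", build_trie(rep))
print(result)  # dog group bird condog
```

Note that this does not behave exactly like the str.replace loop: replaced text is never scanned again, so "cat" becomes "dog" and stays "dog". It still matches inside longer words, as in "concat".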
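Btw, if your src and dst strings don’t contain white-space, then you could also speed up the str.replace loop by using something like this instead (note that it only replaces whole white-space-separated tokens, and it collapses each run of white-space into a single space):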
with open(file1) as f:
    lines = f.readlines()
lines_new = []
for line in lines:
    line_new = []
    for token in line.split():  # split line on white-space
        line_new.append(rep.get(token, token))  # look up token in the dict and replace it, if found
    lines_new.append(" ".join(line_new) + "\n")
with open(final, "w") as f:
    f.writelines(lines_new)