Find and replace text

arjuna01menon · January 20, 2024, 8:34pm

Hi guys, I am fairly new to Python. I am trying to find and replace text in one file, using two other files: one that lists the text to find and one that lists the text to replace it with.

File 1: the actual file in which the text needs to be replaced.
File 2: the list of texts that need to be replaced.
File 3: the list of texts to use as replacements.

File 1 : Content
object network Obj_24.221
ip address host 172.16.24.221 255.255.255.0
next

object network Obj_42.21
ip address host 172.16.42.21 255.255.255.0
next

File 2 :Content
172.16.24.221 255.255.255.0
172.16.42.21 255.255.255.0

File 3: Content
192.168.24.84 255.255.255.0
192.168.42.56 255.255.255.0

Final Result :
object network Obj_24.221
ip address host 192.168.24.84 255.255.255.0
next

object network Obj_42.21
ip address host 192.168.42.56 255.255.255.0
next

Any help with this would be appreciated.

Thank you,

tjreedy · January 20, 2024, 9:13pm

When manipulating data in files, there are up to three problems: converting file data to Python objects; manipulating the Python objects; and possibly converting the Python objects back to file data. I strongly recommend starting with the middle problem, with example data in in-code strings.

s = """\
object network Obj_24.221
ip address host 172.16.24.221 255.255.255.0
next

object network Obj_42.21
ip address host 172.16.42.21 255.255.255.0
next
"""

olds = "172.16.24.221", "172.16.42.21"
news = "192.168.24.84", "192.168.42.56"
for old, new in zip(olds, news):
    s = s.replace(old, new)  # str.replace returns a new string, so rebind s
print(s)

This prints what you requested starting with the example data.

hansgeunsmeyer · January 20, 2024, 9:15pm

Hi, there are several ways of doing this. Which one is best depends on the scale of the data and the time constraints you have (if any).

If File 1 has a strict, fixed format (and no format errors), then the simplest and probably fastest way is to simply use str.replace. So, load file 2 and file 3, and convert them into a dictionary mapping strings to strings:

# file2 and file3 are the paths of the two list files
with open(file2) as f:
    src = [line.strip() for line in f]
with open(file3) as f:
    dst = [line.strip() for line in f]
rep = dict(zip(src, dst))  # dictionary mapping src strings to dst strings

Then load file 1 and apply the replacements. If you’re lucky (depending on the exact format and content of file1) you can simply do something like this:

with open(file1) as f:
    s = f.read()
for src, dst in rep.items():
    s = s.replace(src, dst)
with open(final, "w") as f:  # final is the path of the output file
    f.write(s)

By the way, the way the data is organized across files makes data bugs very likely. File 2 and file 3 should really be a single 2-column CSV or TSV file containing both the source strings and the destination strings, since two separate files can easily get out of sync or pick up an off-by-one-line error.
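
A minimal sketch of that two-column layout, assuming a comma-separated file with one old,new pair per row. The in-memory string stands in for the file, and `load_mapping` is a hypothetical helper name, not something from the standard library:

```python
import csv
import io

# Stand-in for a two-column mapping file: old text, new text on each row.
mapping_csv = """\
172.16.24.221 255.255.255.0,192.168.24.84 255.255.255.0
172.16.42.21 255.255.255.0,192.168.42.56 255.255.255.0
"""

def load_mapping(fileobj):
    """Read old,new rows into a dict, skipping blank rows."""
    rep = {}
    for row in csv.reader(fileobj):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        old, new = row  # raises ValueError unless the row has exactly two columns
        rep[old.strip()] = new.strip()
    return rep

rep = load_mapping(io.StringIO(mapping_csv))
print(rep)
```

With a real file you would pass `open(path, newline="")` instead of the `io.StringIO` object. Because each pair sits on one row, a missing or extra entry shows up as a malformed row instead of silently shifting every later replacement.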

There are faster ways of doing this too, but if your files are fairly small, this may be the simplest. If file 1 is very large and you have thousands of replacements, the code above is far from optimal, because it scans the content once for every src/dst pair. In that case it is better to load the src/dst strings into a trie and process the content of file 1 only once.
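
A trie is one way to get a single pass. A lighter sketch that also reads the content only once is to combine all source strings into one regular expression. To be clear, this is a swapped-in technique and not a trie, and `replace_all` is a hypothetical helper name:

```python
import re

def replace_all(text, rep):
    """Replace every key of rep found in text with its value, in one pass."""
    if not rep:
        return text  # an empty alternation would match the empty string everywhere
    # Longest keys first, so a short key cannot win over a longer key
    # that starts with the same characters.
    keys = sorted(rep, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The callback looks up the matched text in the dict.
    return pattern.sub(lambda m: rep[m.group(0)], text)

rep = {
    "172.16.24.221 255.255.255.0": "192.168.24.84 255.255.255.0",
    "172.16.42.21 255.255.255.0": "192.168.42.56 255.255.255.0",
}
text = (
    "ip address host 172.16.24.221 255.255.255.0\n"
    "ip address host 172.16.42.21 255.255.255.0\n"
)
print(replace_all(text, rep))
```

One side effect of the single pass is that replaced text is never scanned again, so a destination string that happens to equal another source string is not replaced a second time, which can happen with the loop over `str.replace`.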

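Also, if your src and dst strings don’t contain white-space, you could speed up the code above by using something like the following instead. (In your sample, each entry is an address and a netmask separated by a space, so you would have to key the dictionary on the address alone.)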

with open(file1) as f:
    lines = f.readlines()
lines_new = []
for line in lines:
    line_new = []
    for token in line.split():  # split line on white-space
        line_new.append(rep.get(token, token))  # look up token in the dict and replace it, if found
    lines_new.append(" ".join(line_new) + "\n")
with open(final, "w") as f:
    f.writelines(lines_new)
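
One thing to watch with the token lookup above: it rejoins the tokens with single spaces, so indentation and runs of spaces in file 1 are lost. If that matters, a possible variant is to split with a capturing group so the white-space is kept. This is only a sketch; `replace_tokens` is a hypothetical name, and the dictionary is assumed to be keyed on the bare addresses:

```python
import re

def replace_tokens(line, rep):
    """Replace whole white-space-delimited tokens, keeping the original spacing."""
    # A capturing group makes re.split return the white-space runs as well,
    # so joining the parts reproduces the line exactly apart from replaced tokens.
    parts = re.split(r"(\s+)", line)
    return "".join(rep.get(part, part) for part in parts)

# Assumed mapping: addresses only, without the netmask.
rep = {"172.16.24.221": "192.168.24.84", "172.16.42.21": "192.168.42.56"}
line = "  ip address host 172.16.24.221 255.255.255.0\n"
print(repr(replace_tokens(line, rep)))
```

Because whole tokens are looked up, an address that merely starts with one of the keys is left alone, which is not guaranteed with plain `str.replace`.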