Hi, another possible solution is to implement the replace algorithm using the find method…
For my solution I downloaded a txt version of “El Quijote de la Mancha”, in order to have a string long enough to measure the running time.
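    from random import choices
    from urllib.request import urlopen

    # Download the book and decode the raw bytes as UTF-8
    with urlopen("https://gist.githubusercontent.com/jsdario/6d6c69398cb0c73111e49f1218960f79/raw/8d4fc4548d437e2a7203a5aeeace5477f598827d/el_quijote.txt") as f:
        text = f.read()
    text = str(text, 'utf-8')

    # Sample 4000 random words, keep those longer than 3 characters, drop duplicates
    to_replace = list(set([t for t in choices(text.split(), k=4000) if len(t) > 3]))
    # Pair each word with its replacement string
    replace_map = list(map(lambda x: (x, f'new_string_to_replace_with_{x}'), to_replace))
    print(replace_map)
    print(len(replace_map))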
Then I created a function that chains calls to the replace method:
    def multireplace_v1(s, changes):
        # Apply each replacement in turn to the result of the previous one
        for old, new in changes:
            s = s.replace(old, new)
        return s
And another function that uses the find method to build a list of all possible replacements from the changes, and then applies them in a single pass:
    def multireplace_v2(s, changes):
        # Collect every occurrence of every old string as (start, end, length, new)
        replacements = []
        for old, new in changes:
            if not old:
                continue  # an empty pattern would never advance the search
            i = 0
            l = len(old)
            while True:
                n = s.find(old, i)
                if n == -1:
                    break
                i = n + l
                replacements.append((n, i, l, new))
        # Order by start position; the sort is stable, so ties keep the order of changes
        replacements = sorted(replacements, key=lambda x: x[0])
        i = 0
        new_s = ""
        for b, e, l, t in replacements:
            # Skip any match that overlaps text that was already replaced
            if b < i:
                continue
            new_s += s[i:b] + t
            i = e
        new_s += s[i:]
        return new_s
The call

    result1 = multireplace_v1(text, replace_map)

took 3.06 seconds to finish, and

    result2 = multireplace_v2(text, replace_map)

took 914 ms.
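The post does not show how these timings were taken. A minimal sketch of one way to measure them, assuming `time.perf_counter`; the helper `time_call` and the toy `sample` string are hypothetical and not part of the original code:

```python
import time

def time_call(fn, *args, repeat=3):
    """Run fn(*args) `repeat` times; return (last result, best elapsed seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

# Toy input standing in for the full book text (an assumption for illustration)
sample = "en un lugar de la Mancha " * 1000
result, seconds = time_call(str.replace, sample, "lugar", "sitio")
print(f"{seconds * 1000:.3f} ms")
```

The same helper could be called as `time_call(multireplace_v1, text, replace_map)` and `time_call(multireplace_v2, text, replace_map)`. Taking the best of several runs reduces noise from other processes.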
The proposed solution is faster, and it also avoids replacing text that has already been replaced. When matches overlap, priority goes to whichever string in changes occurs first in the text.
With the v1 function you’ll see things like this:

    Miguel de Cervnew_string_to_replace_with_new_string_to_replace_with_antes Saavedra

because it replaces both ante and antes, the second inside the text inserted by the first.
With the v2 function you’ll get this:

    Miguel de Cervnew_string_to_replace_with_antes Saavedra

Only one replacement per “word”.
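The difference can be reproduced on a single name without downloading the book. This is a self-contained sketch, not the exact code from above: `chained_replace` and `single_pass_replace` are hypothetical names, and the short markers `<antes>` and `<ante>` are assumed here only to keep the output readable.

```python
def chained_replace(s, changes):
    # Each pass rescans the whole string, including text inserted by earlier passes
    for old, new in changes:
        s = s.replace(old, new)
    return s

def single_pass_replace(s, changes):
    # Find all matches in the ORIGINAL string first, then apply them left to right
    matches = []
    for old, new in changes:
        if not old:
            continue  # an empty pattern would match everywhere
        start = s.find(old)
        while start != -1:
            matches.append((start, start + len(old), new))
            start = s.find(old, start + len(old))
    matches.sort(key=lambda m: m[0])  # stable: ties keep the order of `changes`
    out, pos = [], 0
    for begin, end, new in matches:
        if begin < pos:  # overlaps text that was already replaced
            continue
        out.append(s[pos:begin])
        out.append(new)
        pos = end
    out.append(s[pos:])
    return "".join(out)

changes = [("antes", "<antes>"), ("ante", "<ante>")]
name = "Miguel de Cervantes Saavedra"
print(chained_replace(name, changes))      # Miguel de Cerv<<ante>s> Saavedra
print(single_pass_replace(name, changes))  # Miguel de Cerv<antes> Saavedra
```

The chained version finds ante inside the marker it has just inserted, while the single-pass version only ever searches the original string, so inserted text is never matched again.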