Making str.replace() accept lists

Hi, another possible solution would be to implement the replacement algorithm using the find method…

For my solution I downloaded a plain-text version of “El Quijote de la Mancha”, in order to have a string long enough to give meaningful timings.

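from random import choices
from urllib.request import urlopen

# Download the full text of the novel to use as a large test string.
with urlopen("https://gist.githubusercontent.com/jsdario/6d6c69398cb0c73111e49f1218960f79/raw/8d4fc4548d437e2a7203a5aeeace5477f598827d/el_quijote.txt") as f:
    text = f.read().decode("utf-8")

# Sample 4000 words at random, keep those longer than 3 characters, drop duplicates.
to_replace = list({t for t in choices(text.split(), k=4000) if len(t) > 3})
# Pair each word with its replacement string.
replace_map = [(x, f"new_string_to_replace_with_{x}") for x in to_replace]
print(replace_map)
print(len(replace_map))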

Then I created a function that chains calls to the replace method, one per pair in changes:

def multireplace_v1(s, changes):
    for old, new in changes:
        s = s.replace(old, new)
    return s

And another function that uses the find method to collect every possible replacement from changes, then applies them in a single pass:

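def multireplace_v2(s, changes):
    # Collect every match of every pattern in the original string.
    replacements = []
    for old, new in changes:
        if not old:
            continue  # an empty pattern would never advance the search
        size = len(old)
        start = s.find(old)
        while start != -1:
            replacements.append((start, start + size, new))
            start = s.find(old, start + size)
    # Sort by start position; the sort is stable, so ties keep the order of changes.
    replacements.sort(key=lambda r: r[0])

    # Rebuild the string left to right, skipping any match that overlaps
    # a replacement that has already been applied.
    parts = []
    i = 0
    for begin, end, new in replacements:
        if begin < i:
            continue
        parts.append(s[i:begin])
        parts.append(new)
        i = end
    parts.append(s[i:])
    return "".join(parts)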

The call

result1 = multireplace_v1(text, replace_map)

took 3.06 seconds to finish

And

result2 = multireplace_v2(text, replace_map)

took 914 ms
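For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using time.perf_counter. It is not necessarily how the numbers above were taken: time_call is a hypothetical helper, and the sample text and changes are made up for illustration rather than taken from the Quijote data.

```python
import time


def time_call(func, *args):
    # Hypothetical helper: run func once and return (result, elapsed seconds).
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def sequential_replace(s, changes):
    # Same idea as multireplace_v1: one replace() call per (old, new) pair.
    for old, new in changes:
        s = s.replace(old, new)
    return s


# Made-up sample data, only there to give the timer something to measure.
sample_text = "en un lugar de la mancha " * 10_000
sample_changes = [("lugar", "sitio"), ("mancha", "llanura")]

result, elapsed = time_call(sequential_replace, sample_text, sample_changes)
print(f"sequential_replace took {elapsed * 1000:.1f} ms")
```

A single run is noisy, so for a fairer comparison it helps to repeat each call several times (for example with timeit.repeat) and compare the best times.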

The proposed solution is faster, and it also avoids replacing text that has already been replaced. When matches overlap, priority goes to the one that occurs first in the string.

In the v1 function you’ll see things like this:

Miguel de Cervnew_string_to_replace_with_new_string_to_replace_with_antes Saavedra

because it replaces both of the words ante and antes, so the second replacement hits text produced by the first.

In the v2 version you’ll get this:

Miguel de Cervnew_string_to_replace_with_antes Saavedra

Only one replacement per “word”.
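The difference can be reproduced on a tiny input. The sketch below is self-contained: sequential_replace and single_pass_replace are illustrative stand-ins for the v1 and v2 ideas (the names are my own), and the two-entry changes list is an assumption chosen to trigger the ante/antes case.

```python
def sequential_replace(s, changes):
    # Each replace() call sees the output of the previous one,
    # so later patterns can match inside earlier replacements.
    for old, new in changes:
        s = s.replace(old, new)
    return s


def single_pass_replace(s, changes):
    # Find every match in the ORIGINAL string first.
    matches = []
    for old, new in changes:
        if not old:
            continue  # an empty pattern would never advance the search
        pos = s.find(old)
        while pos != -1:
            matches.append((pos, pos + len(old), new))
            pos = s.find(old, pos + len(old))
    # Stable sort by start position: ties keep the order of `changes`.
    matches.sort(key=lambda m: m[0])

    # Rebuild left to right, skipping matches that overlap an applied one.
    parts, cursor = [], 0
    for start, end, new in matches:
        if start < cursor:
            continue
        parts.append(s[cursor:start])
        parts.append(new)
        cursor = end
    parts.append(s[cursor:])
    return "".join(parts)


name = "Miguel de Cervantes Saavedra"
changes = [(w, f"new_string_to_replace_with_{w}") for w in ("antes", "ante")]

cascaded = sequential_replace(name, changes)
single = single_pass_replace(name, changes)
print(cascaded)  # "ante" is replaced again inside the first replacement
print(single)    # exactly one replacement for the word
```

Because both patterns start at the same position in "Cervantes", the stable sort lets the one listed first in changes win, and the other is skipped as an overlap.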
