Making str.replace() accept lists

rangeles89 · August 27, 2021, 6:49am

Hi, another possible solution could be implement the replace algorithm using find method…

For my solution I downloaded a txt version of “El Quijote de la mancha”, in order to have a string long enough to measure time.

with urlopen("https://gist.githubusercontent.com/jsdario/6d6c69398cb0c73111e49f1218960f79/raw/8d4fc4548d437e2a7203a5aeeace5477f598827d/el_quijote.txt") as f:
    text = f.read()
text = str(text, 'utf-8')
to_replace = list(set([t for t in choices(text.split(), k=4000) if len(t)>3 ]))
replace_map =  list(map(lambda x: (x, f'new_string_to_replace_with_{x}'), to_replace))
print(replace_map)
print(len(replace_map))

Then I created a function using the nested calls to replace method

def multireplace_v1(s, changes):
    for old, new in changes:
        s = s.replace(old, new)
    return s

And another function using find method, and creating a list of all possible replacements using the changes

def multireplace_v2(s, changes):
    right = len(s)-1
    replacements = []
    for old, new in changes:
        i = 0
        l = len(old)
        while True:
            n = text_test.find(old, i, right)
            if n == -1:
                break
            i = n + l
            replacements.append((n, i, l, new))
    replacements = sorted(replacements, key= lambda x: x[0])
    
    i = 0
    prev_s = -1
    prev_e = -1
    new_s = ""
    for b, e, l, t in replacements:
        if b >= prev_s and  b+l <= prev_e:
            continue
        prev_s = b
        prev_e = b+l
        new_s += s[i:b] + t
        i = e
    new_s += s[i:]
    return new_s

The call

result1 = multireplace_v1(text, replace_map)

took 3.06 seconds to finish

And

result2 = multireplace_v2(text, replace_map)

took 914ms

The proposed solution is faster, and also prevents from replace the already replaced string, the priority is the occurrence of one of the string in changes

In the v1 function you’ll see things like this

Miguel de Cervnew_string_to_replace_with_new_string_to_replace_with_antes Saavedra

because is replacing the words ante and antes

In the v2 version you’ll get this:

Miguel de Cervnew_string_to_replace_with_antes Saavedra

Only one replacement per “word”.

Topic		Replies	Views
Is there an easier way to replace multiple different things at once Python Help help	4	7551	March 14, 2022
Hi! I am a newbie. Why doesn't list.replace() work? Help! Python Help help	4	1874	March 19, 2022
Str.replace of a set of characters Ideas	10	7773	December 22, 2019
Substring replace using variables as pattern and replace string Python Help help	6	499	January 30, 2023
Python regex issues Python Help	2	688	June 5, 2022

Making str.replace() accept lists

Related Topics