Making str.replace() accept lists

Syntax of the method: str.replace(old, new, count=-1)

What if replace could accept two lists instead of two strings.

Often I find in code:

text.replace(a, b).replace(c, d)

The concatenation of replace calls can cause a unnecessary performance hit if the string is large or the call is the calls are made too many times.

My suggestion:

text.replace([a, c], [b, d])

That would behave like (text.replace(a, b).replace(c, d)) but, b could not capture text from c, that is, old[i] would not capture new[i - 1] replaced text, or from any new[i - k] with K being a positive integer.

This way we can avoid having to reallocating giant strings multiple times.

Also, this makes room for huge replace commands that could be a handful tool for some cases.

Having str.replace support multiple targets (but with a single
replacement) has been suggested many times before, and it has always
foundered over the problem of what to do when the targets overlap.

text.replace(('pepper', 'red pepper', 'green pepper'), 'tomato')

The conclusion was always to recommend that if your replacement needs
were more complex than just changing a single substring at a time, you
should move to using regular expressions.

The example you gave can be trivially moved to regular expressions:

text = 'pepper, red pepper, green pepper'
regex = re.compile(r'(pepper|red pepper|green pepper)')

print(regex.sub('tomato', text))

But, if instead, we consider my example.

text.replace(a, b).replace(c, d)
text.replace([a, c], [b, d])

How can this be replaced with regular expressions?

[João Marcos Bezerra]
“How can this be replaced with regular expressions?”

See the docs for re.sub.

The replacement can be a callback function that takes the match object
as argument, so you can decide what to replace it with.

By the way, if you’re going to still argue in favour of changing
str.replace, the other string methods take tuples, not lists, for
multi-valued arguments. This is a common pattern in Python:

string.startwith(('a', 'b'))
isinstance(obj, (int, float))

I think there may be a good case for a helper function in re that
takes an iterable of (original, replacement) pairs and calls re.sub for

re.multireplace([(a, b), (c, d)])
# replace a with b, c with d

but I don’t think the str.replace method should be changed.

Other problem with the original proposition is that when a function accepts lists, it is rarely accepts only lists. It usually accepts arbitrary iterables. But string is an iterable, so str.replace('ab', 'cd') should be equivalent to str.replace(['a', 'b'], ['c', 'd']).

If we solve the ambiguity by choosing other syntax, we will still need to add a lot of complex code. We will need to implement a part of the regular expression engine in the interpreter core. But we already have a dedicated module for regular expressions.

Interesting! A few days ago I wrote an example for Fluent Python 2ed (unpublished) that looks like this:

from typing import Iterable, Tuple

FromTo = Tuple[str, str]

def zip_replace(text: str, changes: Iterable[FromTo], count:int = -1) -> str:
    for from_, to in changes:
        text = text.replace(from_, to, count)
    return text

It’s an example for one of the chapters about type hints in the book. I do my best to avoid foo/bar examples, and this is a nice little function to discuss a type alias and an Iterable argument.

If same set of replaces happen many times, something like Go’s Replacer is better for performance point of view.

It solves the proposed restriction too; " but, b could not capture text from c , that is, old[i] would not capture new[i - 1] replaced text, or from any new[i - k] with K being a positive integer."

I think there are some TRIE implementations in PyPI.

Oops, thanks for notifying!

Here it is:

text.replace((a, c), (b, d))

I don’t think that solving this restriction is a need because wanting c to capture b looks like a very rare usage case, where a replacement is taken on top of another, for this rare occasion you could just concatenate two uses of replace, also because I created this restriction to make the replacement algorithm faster.

I see how a Trie can solve this problem, but the memory usage would be a problem, and linked lists can not make use of the sequential memory caching (complexity is lower, but runtime can be larger).
Maybe a linear sweep similar to the current implementation of replace is a better solution.

The Idea you cited is very different from mine, so let’s not hastily close this, thinking that is the same thing.

If someone is replacing text of a string, it is very likely that it’s being done more than just once.

Let’s compare the available solutions.

Consider we want to do 3 changes to the string text:

text: str = get_text()

changes: List[Tuple[str, str]] = [
	(a, b),
	(c, d),
	(e, f)
] # Imagine those variables are strings

Common solution (same as what @luciano teaches):

for from_, to in changes:
    text = text.replace(from_, to)

RegEx solution (suggested by @steven.daprano and based on the re.sub() documentation):

import re

changes: dict[str, str] = dict(changes)

def callback_repl(matchobj) -> str:
	replacement: Optional[str] = changes.get(, None)
	if replacement is not None:
		return replacement
	raise Exception('The match object don\'t match!')

re.sub(rf'({a}|{c}|{e})', callback_repl, text)

New suggested solution:

text.replace((a, b), (c, d), (e, f))

I can’t complaint about the first solution, it works, the only reason I’m posting this is because I think that the operation of writing multiple replaces is very common and can be optimized.

The second solution is complicated for the simple job it solves, I can see people copying it from StackOverflow haha, jokes aside, the function call adds an unnecessary overhead to the algorithm.

TRIE is the most performant way to multiple keywords in long string.

We use “Boyer-Moore String Search Algorithm” to search single word. Using the algorithm multiple times (e.g. s.replace(k1, r1).replace(k2, r2)) doesn’t solve the proposed restriction. So we can not use the BM algorithm for replacing multiple keywords.

Bulding TRIE consumes some CPU and memory. But if same replacement happens multiple times, it is fast. That’s why Go has Replacer in its stdlib.