Syntax of the method: str.replace(old, new, count=-1)
What if replace could accept two lists instead of two strings?
Often I find in code:
text.replace(a, b).replace(c, d)
Chaining replace calls can cause an unnecessary performance hit if the string is large or the calls are made too many times.
text.replace([a, c], [b, d])
That would behave like text.replace(a, b).replace(c, d), except that b could not capture text from c; that is, old[i] would not capture text replaced by new[i - 1], or by any new[i - k] with k a positive integer.
This way we avoid reallocating giant strings multiple times. It also makes room for huge replace commands, which could be a handy tool in some cases.
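To make the proposed semantics concrete, here is a hypothetical sketch (the name multi_replace and the use of the re module are mine, not part of the proposal) that performs all substitutions in a single left-to-right pass, so replaced text is never matched again by a later pattern:

```python
import re

def multi_replace(text, olds, news):
    # Sketch of the proposed semantics: each olds[i] is replaced by
    # news[i] in one left-to-right pass, so replaced text is never
    # re-matched by a later pattern.
    mapping = dict(zip(olds, news))
    # Longer alternatives first, so e.g. 'ab' wins over 'a'.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = '|'.join(re.escape(k) for k in keys)
    return re.sub(pattern, lambda m: mapping[m.group(0)], text)
```

Contrast with chained calls: 'a'.replace('a', 'b').replace('b', 'c') yields 'c', because the second call sees the first call's output, while multi_replace('a', ['a', 'b'], ['b', 'c']) stops at 'b'.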
Another problem with the original proposition is that when a function accepts lists, it rarely accepts only lists; it usually accepts arbitrary iterables. But a string is itself an iterable, so str.replace('ab', 'cd') would then have to be equivalent to str.replace(['a', 'b'], ['c', 'd']).
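The ambiguity is easy to demonstrate, since iterating a string yields its characters; for the input "ba" the substring reading and the per-character reading of the same arguments give different results (example mine):

```python
text = "ba"

# Substring reading (today's behavior): "ab" does not occur in "ba".
assert text.replace("ab", "cd") == "ba"

# A string iterates as 1-character strings:
assert list("ab") == ["a", "b"]

# So the per-character reading, replace(['a', 'b'], ['c', 'd']),
# would map a -> c and b -> d, turning "ba" into "dc" --
# a different result from the same call.
assert text.replace("a", "c").replace("b", "d") == "dc"
```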
If we solve the ambiguity by choosing another syntax, we will still need to add a lot of complex code: we would be implementing part of the regular expression engine in the interpreter core. But we already have a dedicated module for regular expressions.
Interesting! A few days ago I wrote an example for Fluent Python 2ed (unpublished) that looks like this:
from typing import Iterable, Tuple

FromTo = Tuple[str, str]

def zip_replace(text: str, changes: Iterable[FromTo], count: int = -1) -> str:
    for from_, to in changes:
        text = text.replace(from_, to, count)
    return text
It’s an example for one of the chapters about type hints in the book. I do my best to avoid foo/bar examples, and this is a nice little function to discuss a type alias and an Iterable argument.
I don’t think solving this restriction is necessary, because wanting c to capture b looks like a very rare use case, where one replacement is applied on top of another. For that rare occasion you could just chain two calls to replace. I also created this restriction to make the replacement algorithm faster.
I see how a trie can solve this problem, but the memory usage would be a problem, and linked nodes cannot take advantage of sequential memory caching (the complexity is lower, but the runtime can be larger).
Maybe a linear sweep similar to the current implementation of replace is a better solution.
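A naive version of such a linear sweep (my sketch, not CPython's implementation) tries each pattern at every position and skips past any match, so replaced text is never rescanned:

```python
def multi_replace_sweep(text, pairs):
    # Naive linear sweep: at each position, try each "old" pattern;
    # on a match, emit the replacement and jump past the match so the
    # replaced text is never rescanned by a later pattern.
    out = []
    i = 0
    while i < len(text):
        for old, new in pairs:
            if old and text.startswith(old, i):
                out.append(new)
                i += len(old)
                break
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)
```

This is O(len(text) × number of patterns) in the worst case; a trie collapses the inner loop over patterns into a single state-machine step per character, at the cost of the memory concerns mentioned above.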
A trie is the most performant way to search for multiple keywords in a long string.
We use the Boyer-Moore string search algorithm to search for a single word. Running it multiple times (e.g. s.replace(k1, r1).replace(k2, r2)) doesn’t honor the proposed restriction, so we cannot use the BM algorithm for replacing multiple keywords.
Building the trie consumes some CPU and memory, but if the same replacement happens many times, it is fast. That’s why Go has strings.Replacer in its stdlib.
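In Python terms, the same amortization can be sketched by compiling the combined pattern once and reusing it across many inputs (a hypothetical illustration using the re module, not Go's actual trie-based implementation):

```python
import re

class Replacer:
    # Hypothetical Python analogue of Go's strings.Replacer: pay the
    # pattern-building cost once, then reuse it for many inputs.
    def __init__(self, pairs):
        self._mapping = dict(pairs)
        # Longer keys first, so e.g. '&lt;' would win over '&'.
        keys = sorted(self._mapping, key=len, reverse=True)
        self._pattern = re.compile('|'.join(map(re.escape, keys)))

    def replace(self, text):
        # Single pass: replaced text is never re-matched.
        return self._pattern.sub(lambda m: self._mapping[m.group(0)], text)

# Build once, reuse many times.
escape = Replacer([('<', '&lt;'), ('>', '&gt;')])
```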