Summary: I would like to propose an extension to the .replace() method to allow multiple substring replacements in a string using a dictionary. Currently, .replace() accepts only two arguments (the value to be replaced and the replacement value), which results in the need for multiple calls to replace different characters or words. With this new functionality, it would be possible to perform all replacements in a single call, making the code more concise.
Examples
Current Situation: To replace multiple substrings in a string, we need to make multiple calls to .replace():
a = "hello"
a = a.replace("l", "X").replace("h", "X")
print(a) # Output: XeXXo
Proposed New Usage: I would like to suggest a new way to use .replace() that allows passing a dictionary, where the keys are the values to be replaced, and the values are the new values:
a = "hello"
a = a.replace({"l": "X", "h": "X"})
print(a) # Output: XeXXo
Alternative Suggestion
If modifying the .replace() method is not feasible, it could be considered to add a new method, such as .replace_multiple(), which would work specifically for this new feature of multiple replacements using a dictionary. This would avoid any conflict with the current usage of the .replace() method.
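As a sketch of the semantics being proposed (the name replace_multiple and the apply-in-dict-order behaviour are my assumptions, not a settled design):

```python
def replace_multiple(s: str, replacements: dict[str, str]) -> str:
    """Hypothetical helper: apply each replacement in dict insertion order."""
    for old, new in replacements.items():
        s = s.replace(old, new)
    return s

print(replace_multiple("hello", {"l": "X", "h": "X"}))  # XeXXo
```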
Conclusion
I believe this feature could enhance the developer experience when working with strings in Python, simplifying code and making it more efficient. I’m open to feedback and suggestions to refine this idea, especially regarding performance drawbacks, replacement conflicts, and other concerns!
If this were merely about making the code more concise, there’s not all that much benefit, but there’s another aspect of this that’s worth noting: enforced single replacement. Consider a “swap” operation, such as exchanging "ab" and "de" in "abcdefg" (Pike’s replace() accepts a mapping for exactly this).
Implemented with two separate replace() calls, the second call also rewrites the output of the first, so both patterns end up the same (either "decdefg" or "abcabfg", depending on the order of the replacements) instead of the intended "decabfg". Having a single dictionary to define the changes will allow this sort of thing to be done without reaching for a regular expression, with all the consequences that using regex entails.
Note that the status quo is to use either re.sub with an ad-hoc replacement function that maps the match to a replacement string, which is clunky to use and requires escaping special characters in the input, or str.translate, which supports only single-character translations.
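To make the contrast concrete, here is a sketch of the swap problem and the re.sub status-quo workaround; the strings and the swap mapping are purely illustrative:

```python
import re

s = "abcdefg"

# Two chained calls: the second replacement also rewrites the output
# of the first, so both patterns collapse into one.
chained = s.replace("ab", "de").replace("de", "ab")

# Status-quo workaround: re.sub with a replacement function performs
# all replacements in a single pass, after escaping each key.
mapping = {"ab": "de", "de": "ab"}
pattern = "|".join(map(re.escape, mapping))
swapped = re.sub(pattern, lambda m: mapping[m.group(0)], s)

print(chained)  # abcabfg
print(swapped)  # decabfg
```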
In theory I don’t love the impact on the method signature. The current signature is
str.replace(old: str, new: str, count: int = -1)
And this would make it something like
str.replace(
    old_or_dict: str | dict[str, str],
    new: str | None = None,  # or some other sentinel value
    count: int = -1,
)
Instead, there could be a new keyword-only argument for the translation dictionary, but I suppose this would still require defaults for old and new and some mutual-exclusion logic in case somebody passed in everything.
Another option is that this is a lot more like str.translate than str.replace, although translate has an idiosyncratic table input[1] and only allows single characters. Adding a new version with this interface might be nicer.
That’s true, but honestly it’s not writing the signature that is the issue. I think that signatures like this are harder to learn and keep track of, in general.
It depends on the nature of the function. If different signatures do completely different things, as with type and iter, then I agree with you that it is better to make them separate functions. But in cases like this one, and in functions like max, where the different signatures do practically the same thing and merely accept different types of input for convenience, a combined signature actually makes the function more intuitive to use and easier to keep track of.
By analogy with .format and .format_map, this could be called .replace_map.
Historically, this couldn’t be done because the order of application in the presence of overlapping patterns (or patterns that included later patterns in their output) would have been unpredictable when passing a built-in dict.
These days, dicts are insertion ordered, so the method can safely be defined as equivalent to:
modified = original
for k, v in replacements.items():
    modified = modified.replace(k, v)
I personally would’ve slightly preferred .format to take a mapping via a second signature, but a separate function/method is just fine at the end of the day.
Thanks for the historical insight. The order does matter when one key in the input is a substring of another. For example, with {'a': 'b', 'ab': 'c'}, 'abc' would become 'bbc', but with {'ab': 'c', 'a': 'b'} it would become 'cc'.
This code isn’t quite equivalent to what is proposed because it wouldn’t support swapping as @Rosuav mentioned.
With the iteration based definition, swapping is a three step operation:
1. target pattern → placeholder
2. source pattern → target pattern
3. placeholder → source pattern
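The three-step workaround might look like this (the placeholder choice is an assumption, and it must not occur in the input):

```python
s = "abcdefg"
placeholder = "\x00"  # assumed not to appear in the input

s = s.replace("de", placeholder)  # 1. target pattern → placeholder
s = s.replace("ab", "de")         # 2. source pattern → target pattern
s = s.replace(placeholder, "ab")  # 3. placeholder → source pattern

print(s)  # decabfg
```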
The alternative would be to define .replace_map in terms of .format:
pattern = original.replace("{", "{{").replace("}", "}}")
for idx, key in enumerate(replacements):
    escaped = key.replace("{", "{{").replace("}", "}}")
    pattern = pattern.replace(escaped, f"{{{idx}}}")
result = pattern.format(*replacements.values())
Either option would be a useful addition, but I agree the version that inherently supports swapping is more interesting.
There are different options to specify the behavior here, I don’t know which one is obviously best but it would have to be spelled out.
One is the “ordering of the mapping”, as you describe; another option is to order the keys in some way to pick the best match: maybe you want to sort the keys, or pick the longest match. Depending on the chosen behavior, "aab".replace({"a": "A", "ab": "B", "aa": "C"}) could yield "AAb", "AB", or "Cb".
The other alternative here is to allow choosing a match priority of leftmost or leftmost-longest in the target, rather than basing the order on the replacement map. Either of these options opens up more efficient implementations for the underlying search and replace, while still allowing ordered replacement (by calling the method multiple times when that is actually needed), making the general case predictable and faster.
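A leftmost-longest strategy could be sketched with re by trying longer keys first; the function name and details here are assumptions, not a proposed API:

```python
import re

def replace_longest(s: str, mapping: dict[str, str]) -> str:
    # Try longer keys first, so the longest match wins at each position
    # (regex alternation tries alternatives left to right).
    keys = sorted(mapping, key=len, reverse=True)
    pattern = "|".join(map(re.escape, keys))
    return re.sub(pattern, lambda m: mapping[m.group(0)], s)

print(replace_longest("aab", {"a": "A", "ab": "B", "aa": "C"}))  # Cb
```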
I like the idea of having a new method called replace_map(...) to implement this feature, as @ncoghlan proposed. In my humble opinion, using typing.overload would be nice if the replace(...) method were as simple as max(...); however, it is not. I don’t think people would think of passing a map as one of the arguments of replace.
replace_map would solve this clarity issue because we are explicitly declaring the type in the method signature, just like format_map does.
The advantage of using the iteration order is that it lets the caller explicitly control the priority order without having to come up with names for the different possibilities.
The downside is that the implementation might end up being slower, either because it always uses a fully general pattern or because it checks whether the replacement patterns happen to be ordered by length (ascending or descending).
A regex-based implementation could:

1. combine the escaped patterns into a regex “or” pattern
2. assemble a string list consisting of the string segments between matches, and the target strings for the matched patterns
3. join the results
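Those steps could be sketched as follows (sub_map is a hypothetical name for this illustration):

```python
import re

def sub_map(s: str, mapping: dict[str, str]) -> str:
    # 1. combine the escaped patterns into a regex "or" pattern
    pattern = re.compile("|".join(map(re.escape, mapping)))
    # 2. assemble the segments between matches plus the target strings
    parts = []
    pos = 0
    for m in pattern.finditer(s):
        parts.append(s[pos:m.start()])
        parts.append(mapping[m.group(0)])
        pos = m.end()
    parts.append(s[pos:])
    # 3. join the results
    return "".join(parts)

print(sub_map("abcdefg", {"ab": "de", "de": "ab"}))  # decabfg
```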
From re — Regular expression operations — Python 3.13.0 documentation, that would prioritise the patterns in iteration order due to the way | is defined. (Tangent: if we wanted to initially implement it that way, re.sub_map could be a decent spelling, and then the str.replace_map idea could be proposed later as a way of doing the same thing without the generalised re engine overhead)
Ordering a dict by key length is a bit messy, though:
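For illustration, one way the reordering might look (this is my assumption about the intended approach, using the earlier example mapping):

```python
replacements = {"a": "A", "ab": "B", "aa": "C"}

# Rebuild the dict sorted by descending key length; ties keep their
# original insertion order because sorted() is stable.
by_length = dict(
    sorted(replacements.items(), key=lambda kv: len(kv[0]), reverse=True)
)

print(list(by_length))  # ['ab', 'aa', 'a']
```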