Str.replace of a set of characters

I would like str.replace, when given a set of characters, to replace occurrences of any of the characters in the set. I.e. 'ASDFGH'.replace(set('SFH'), '') == 'ADG' should then hold.
It would mean I would not have to go to the trouble of using the re module for a common string operation.

By extension, given a set of multi-character strings, it should replace any occurrences of those strings (in an indeterminate order, so watch out for replacement sets where one string is a substring of another).
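To make the proposed semantics concrete, here is a minimal sketch of what such a replacement could do, written as a hypothetical helper `replace_any()` on top of the re module (there is no such str method today). It assumes longest-match-first is the desired tie-break, which removes the ordering ambiguity for targets that overlap at the same position.

```python
import re
from collections.abc import Iterable


def replace_any(text: str, olds: Iterable[str], new: str) -> str:
    """Hypothetical helper: replace each occurrence of any string in `olds` with `new`.

    Not a real str method; a sketch of the proposed semantics built on re.
    """
    # Longest targets first: regex alternation tries alternatives left to right,
    # so where two targets start at the same position the longer one wins.
    targets = sorted(set(olds), key=len, reverse=True)
    if not targets:
        return text
    pattern = "|".join(re.escape(t) for t in targets)
    # A function replacement stops re.sub from treating backslashes in `new`
    # as group references. The scan is a single left-to-right pass, so
    # already-replaced text is never re-examined.
    return re.sub(pattern, lambda _match: new, text)


print(replace_any("ASDFGH", set("SFH"), ""))   # ADG
print(replace_any("abcab", {"a", "ab"}, "X"))  # XcX
```

Doing everything in one regex pass is what makes the result independent of set iteration order; applying str.replace once per target would not be.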

Whodjafink?

That’s what str.translate() can already do:

value = 'ASDFGH'
dropset = 'SFH'
dropmap = dict.fromkeys(map(ord, dropset))
print(value.translate(dropmap))

The dict.fromkeys() / ord() dance is needed because the str.translate() method takes a map from integer code point to replacement value (where None means “remove”). str.translate() is fast and efficient, and it is more powerful than a simple character-removal tool, hence the specific input requirement. Its existence does mean there is no need to complicate the str.replace() API, however.
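A small sketch of what “more powerful than character removal” means in practice: the values in the table can be None, a string of any length, or an integer code point. The particular replacements below are arbitrary illustrations.

```python
# str.translate() maps code points (ints) to a replacement:
# None deletes the character, a str of any length replaces it,
# and an int is taken as the code point of the replacement character.
table = {
    ord("S"): None,      # delete "S"
    ord("F"): "f",       # one character for one character
    ord("H"): "[H]",     # one character expanded to three
    ord("D"): ord("d"),  # int value: code point of the replacement
}

# Characters missing from the table ("A", "G") pass through unchanged.
result = "ASDFGH".translate(table)
print(result)  # AdfG[H]
```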

To make the tool easier to use there is a helper function, str.maketrans(), which can turn either a dictionary with single characters as keys, or two or three strings, into a suitable map for str.translate(). So this works too:

dropmap = str.maketrans(dict.fromkeys(dropset))

or

dropmap = str.maketrans('', '', dropset)

This last version requires dropset to be a string, while dict.fromkeys() will take any iterable.
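A quick check that the three spellings build the same table, assuming the characters to drop arrive as a set rather than a string (so the three-argument form needs a join first):

```python
value = "ASDFGH"
dropset = {"S", "F", "H"}  # a set of characters rather than a string

# 1. Build the code-point map by hand.
map_a = dict.fromkeys(map(ord, dropset))
# 2. Let str.maketrans() convert the single-character keys to code points.
map_b = str.maketrans(dict.fromkeys(dropset))
# 3. Three-argument form: the third argument must be a str, so join the set.
map_c = str.maketrans("", "", "".join(dropset))

assert map_a == map_b == map_c  # each is {ord(c): None for c in dropset}
print(value.translate(map_c))   # ADG
```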

Thanks Martijn for your reply, it’s appreciated :slight_smile:

Unfortunately I am “cursed” by knowing regular expressions and am much more likely to use re.sub(r'[SFH]', '', 'ASDFGH'). I do prefer str methods over re, but your use of translate seems like overkill for my usual use cases, where re is fast enough; I am just trying to limit its use (regular expressions are the way to go in scripting languages like Perl and Awk).
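One practical difference between the two approaches shows up when the set of characters is not a literal: a character class built from arbitrary input must be escaped, since "]", "^", "-" and "\" are special inside [...], whereas str.translate() needs no escaping. A sketch using two hypothetical helper names:

```python
import re


def strip_chars_re(text: str, chars) -> str:
    """Hypothetical helper: remove every character in `chars` using re.sub."""
    chars = "".join(chars)
    if not chars:
        return text  # "[]" would be an invalid pattern
    # re.escape() backslash-escapes "]", "^", "-" and "\\", which would
    # otherwise change the meaning of the [...] character class.
    return re.sub("[" + re.escape(chars) + "]", "", text)


def strip_chars_translate(text: str, chars) -> str:
    """Hypothetical helper: the same removal via str.translate()."""
    return text.translate(str.maketrans("", "", "".join(chars)))


sample = "a-b]c^d"
print(strip_chars_re(sample, set("-]^")))         # abcd
print(strip_chars_translate(sample, set("-]^")))  # abcd
```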

On my suggestion complicating str.replace: yes, it would. Sets of single characters would, I think, be straightforward to learn; the complications of multiple substrings, less so.

You don’t need the dict.fromkeys/ord dance; that’s what the maketrans method is for:

>>> "python".translate(str.maketrans('tp', 'τπ'))
'πyτhon'
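The two-string form combines with the optional third argument, so mapping and deletion can happen in one call. A brief sketch (the choice of "n" to delete is arbitrary):

```python
# Two strings of equal length map character-for-character;
# an optional third string lists characters to delete.
table = str.maketrans("tp", "τπ", "n")
result = "python".translate(table)
print(result)  # πyτho

# The first two arguments must have equal length; otherwise ValueError.
mismatch_error = None
try:
    str.maketrans("tp", "τ")
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```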

I did introduce str.maketrans() too, in the next paragraph :slight_smile:

Thanks Steven,

So in summary:

  • I asked for: 'ASDFGH'.replace(set('SFH'), '')
  • After using: re.sub(r'[SFH]', '', 'ASDFGH')
  • There exists: 'ASDFGH'.translate(str.maketrans('', '', 'SFH'))
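Side by side, the three lines in the summary behave as follows: the requested call is rejected today because str.replace() only accepts strings, while the other two agree.

```python
import re

text, chars = "ASDFGH", "SFH"

# The proposal: str.replace() currently requires str arguments.
replace_accepts_set = True
try:
    text.replace(set(chars), "")
except TypeError:
    replace_accepts_set = False

via_re = re.sub(r"[SFH]", "", text)
via_translate = text.translate(str.maketrans("", "", chars))

print(replace_accepts_set)       # False
print(via_re, via_translate)     # ADG ADG
```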

I know which way this is going :grin: but thought I'd bring it up, as I often parse text and am used to using the other str methods, then feeling the dissonance of needing re. I’ll have to give this new (to me) use of str.translate a go and see if I can get it to “flow”.

Ahah, I just read the docs on str.translate: I would have read of its seemingly intended use cases of character replacement and use in codecs, run a few examples, then quietly forgotten about it, as thoughts of “replacement” drowned out any connection between my task and “translation”.

Thanks again Martijn, Steve.