Surprising re.Match mutability

I assumed that a re.Match would copy its data from the source, so that if the source changed the match would not:

import re
data = bytearray(b'1234567890')
pattern = re.compile(b'3')
match = pattern.search(data)
del data[0]
print(match.group(0))
$ python example.py # Should print b'3'
b'4'

I’m not saying I’m right or wrong for using re like this, just that it’s surprising, and mutable sources do seem to be supported (and mypy is happy).

I presume that the match keeps a reference to its input and stores only the indices of the group matches.

import re
data = bytearray(b'1234567890')
pattern = re.compile(b'0')
match = pattern.search(data)
print(f"{match.span() = }; {match.string = }; {id(data) = };  {id(match.string) = }")
del data[0]
del data[0]
print(f"{match.span() = }; {match.string = }; {id(data) = };  {id(match.string) = }")
print(match.group(0))

Output

match.span() = (9, 10); match.string = bytearray(b'1234567890'); id(data) = 4571352240;  id(match.string) = 4571352240
match.span() = (9, 10); match.string = bytearray(b'34567890'); id(data) = 4571352240;  id(match.string) = 4571352240
b''

Great spot. .groups (and .groupdict with named capturing groups) behave similarly - they must also work from stored indices into the source. This isn’t surprising - re.Match provides methods returning indices: .start, .end, .span.
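A quick sketch of the same effect with named groups, in case it helps. The pattern and the sample data are made up for illustration. The trailing `;` is only there so the stored spans stay inside the buffer after the deletion.

```python
import re

# Illustrative data and pattern, not taken from the posts above.
data = bytearray(b"key=value;")
match = re.search(rb"(?P<k>\w+)=(?P<v>\w+)", data)

before = (match.groups(), match.groupdict())
del data[0]  # every remaining byte shifts one position to the left
after = (match.groups(), match.groupdict())

print(match.span("k"), match.span("v"))  # spans are unchanged by the deletion
print(before)  # ((b'key', b'value'), {'k': b'key', 'v': b'value'})
print(after)   # ((b'ey=', b'alue;'), {'k': b'ey=', 'v': b'alue;'})
```

The spans stay the same while the extracted bytes change, which fits the idea that the groups are sliced out of the current source each time they are requested.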

My guess is that this is a performance feature that avoids the overhead of copies.

I’m thinking it’s more that the various regular expression modules in Python over the years were written when all their candidate inputs were immutable. I don’t know when the mutable bytearray object was added, but I’m pretty sure it was well after regular expressions.

So yes, an optimization, but maybe made when all searchable types were immutable.

I’m thinking one of:

  1. Current situation is fine and I just need to get good.

  2. Add something to the docs that says you can’t mutate the data until you’re done with the match objects.

  3. When the source is mutable, copy the match data.

I’m not qualified to know which is best :slight_smile:
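For anyone who needs a stable match today, here is a minimal user-level sketch of the third option. It assumes a full copy of the buffer is affordable. `search_snapshot` is a hypothetical helper name, not something re provides.

```python
import re


def search_snapshot(pattern, data):
    """Search an immutable bytes copy of data.

    Hypothetical helper, not part of the re module. The returned match
    refers to the copy, so later changes to data cannot affect it.
    """
    return pattern.search(bytes(data))


data = bytearray(b"1234567890")
match = search_snapshot(re.compile(rb"3"), data)
del data[0]

print(match.group(0))  # b'3'
print(match.string)    # b'1234567890', the snapshot, not the live bytearray
```

The trade-off is one copy of the whole buffer per search. If only a few groups matter, reading match.group(...) straight after the search and keeping those bytes would avoid copying everything.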

Assuming this is intended behaviour, it’s worth adding to the re docs, just so people are aware of the potential footgun, and so those who bother to read the docs or do a search have something definitive to refer to. The docs are far from concise currently, so a little footnote won’t hurt.