Rationale for re.search, re.match, and re.fullmatch

smammy · September 15, 2023, 5:25pm

Just out of curiosity, does anyone know why re.match and re.fullmatch exist, when you can always use re.search instead? For example:

re.match('ad', 'bad fad') ⇒ re.search('^ad', 'bad fad')

re.fullmatch('ad', 'ada sad') ⇒ re.search('^ad$', 'ada sad')

I could see them being helpful if your code is getting a pattern passed into it and you don’t want to have to deal with adding ^ and $ assertions to it. Even so, the names are ambiguous enough that there’s explanatory text about it in the docs. And there’s no function to match characters at the end of the string!

(Not that it matters, but personally I would’ve gone with match (to match anywhere), startmatch, endmatch, and fullmatch.)

Thanks for any insight y’all can provide.

pochmann · September 15, 2023, 5:40pm

The fullmatch doc says new in Python 3.4. Google “new in Python 3.4”, look for fullmatch. Has brief reasoning and link to discussion.

(just in case google doesn’t show it as top result)

MRAB · September 15, 2023, 7:28pm

Without match, you’d use \A to anchor at the start of the string, because with the MULTILINE flag ^ also matches at the start of each line. Additionally, you might have to wrap the pattern in (?:...) so that the anchor applies to every alternative.

Before fullmatch was introduced, you’d have to put \Z at the end too.
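To make the anchoring points above concrete, here is a small sketch using only the standard re module. The sample strings and variable names are made up for illustration. It also shows why $ is not quite the same as \Z: $ also matches just before a trailing newline.

```python
import re

text = "bad\nad"

# re.match stays anchored at position 0, even with MULTILINE.
m_match = re.match("ad", text, re.MULTILINE)        # None
# With MULTILINE, ^ also matches right after the newline.
m_caret = re.search("^ad", text, re.MULTILINE)      # matches on the second line
# \A only ever matches at the very start of the string.
m_start = re.search(r"\Aad", text, re.MULTILINE)    # None

# Alternation binds loosely: "^a|b" parses as "(^a)|(b)".
alt_match = re.match("a|b", "cb")                   # None
alt_naive = re.search("^a|b", "cb")                 # matches the unanchored "b"
alt_wrapped = re.search("^(?:a|b)", "cb")           # None, same as re.match

# $ also matches just before a trailing newline; \Z and fullmatch do not.
full = re.fullmatch("ad", "ad\n")                   # None
dollar = re.search("^ad$", "ad\n")                  # matches "ad"
strict = re.search(r"\A(?:ad)\Z", "ad\n")           # None, same as fullmatch

print(m_match, m_caret, m_start)
print(alt_match, alt_naive, alt_wrapped)
print(full, dollar, strict)
```

So a hand-written replacement for fullmatch needs \A(?:...)\Z rather than ^...$ to behave the same way in every case.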

And as for wanting to match from the end, not only is that far less common, it would also require matching backwards. It would let you have a variable-width lookbehind, but it’s not a trivial addition.

hansgeunsmeyer · September 15, 2023, 8:10pm

Not quite sure what value that would have (or exactly what you mean by this).
You can certainly write patterns to match substrings at the end of a string. The regex library (which is quite a bit faster than re and has far better Unicode support) also has a reverse search flag that searches backwards, from the end of the string:

>>> regex.findall(r"..", "abcde")
['ab', 'cd']
>>> regex.findall(r"(?r)..", "abcde")
['de', 'bc']

Matching “from the end” is actually not hard to implement, I think: simply read the target string in reverse, and try to match the reverse of the (sub)pattern(s)? (Perhaps not so trivial if there are look-aheads/look-behinds in the pattern? Are those the only troublemakers?)
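For comparison, a pattern that must end at the end of the string can be written with plain re by appending \Z. The endmatch helper below is hypothetical (it is not part of re or regex), and it is only a sketch: it still scans forward from the left instead of matching backwards, and it assumes the pattern has no leading global inline flags such as (?i), which recent Python versions reject inside a group.

```python
import re

def endmatch(pattern, string, flags=0):
    """Hypothetical helper: find a match that ends at the end of the string."""
    # Wrap in (?:...) so a top-level alternation stays inside the \Z anchor.
    return re.search(rf"(?:{pattern})\Z", string, flags)

hit = endmatch("ad|id", "bad fad")
miss = endmatch("ad", "ad lib")

print(hit.group(), hit.span())
print(miss)
```

Because re.search tries start positions from left to right, this returns the leftmost match that reaches the end of the string, which is not necessarily what a true backwards matcher such as regex's (?r) would pick.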

hansgeunsmeyer · September 15, 2023, 9:52pm

@MRAB Could you elaborate on that? Why is this not trivial? I was (am) also assuming that you are doing this in the regex lib with the REVERSE flag - is that assumption wrong?

MRAB · September 15, 2023, 10:17pm

Well, in a sense it is trivial, but it’s a lot more code to add to handle matching backwards as well as forwards.

kknechtel · September 15, 2023, 10:35pm

At least until 3.8, re.search has to iterate over the string and try the match at every position, even if it immediately fails each time (due to the anchor). This makes it take O(N) time instead of O(1) for a trivial match:

$ python -m timeit --s "import re; haystack='bad fad'*1000" -- "re.match('ad', haystack)"
500000 loops, best of 5: 540 nsec per loop
$ python -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('^ad', haystack)"
5000 loops, best of 5: 61.1 usec per loop
$ python -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('\Aad', haystack)"
5000 loops, best of 5: 53.6 usec per loop

This seems to be fixed in or before 3.11:

$ python3.11 -m timeit --s "import re; haystack='bad fad'*1000" -- "re.match('ad', haystack)"
500000 loops, best of 5: 436 nsec per loop
$ python3.11 -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('^ad', haystack)"
500000 loops, best of 5: 458 nsec per loop
$ python3.11 -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('\Aad', haystack)"
500000 loops, best of 5: 462 nsec per loop

More to the point, .match allows for expressing intent more directly and clearly.
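If you want to repeat the comparison from inside Python instead of the shell, something like this should work. It is a rough sketch using the standard timeit module; the candidates and timings names are mine, and the numbers will depend on your interpreter version and machine.

```python
import re
import timeit

haystack = "bad fad" * 1000
calls = 2_000

# All three calls return None here, because the haystack starts with "b".
candidates = {
    "match 'ad'": lambda: re.match("ad", haystack),
    "search '^ad'": lambda: re.search("^ad", haystack),
    r"search '\Aad'": lambda: re.search(r"\Aad", haystack),
}

timings = {}
for label, func in candidates.items():
    assert func() is None
    # Total seconds for `calls` invocations, converted to nanoseconds per call.
    timings[label] = timeit.timeit(func, number=calls) / calls * 1e9
    print(f"{label:16} {timings[label]:10.0f} nsec per call")
```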
