Rationale for re.search, re.match, and re.fullmatch

Just out of curiosity, does anyone know why re.match and re.fullmatch exist, when you can always use re.search instead? For example:

re.match('ad', 'bad fad') can be written as re.search('^ad', 'bad fad')

re.fullmatch('ad', 'ada sad') can be written as re.search('^ad$', 'ada sad')

I could see them being helpful if your code is getting a pattern passed in to it and you don’t want to have to deal with adding ^ and $ assertions to it. Even so, the names are ambiguous enough that there’s explanatory text about it in the docs. And there’s no function to match characters at the end of the string!

(Not that it matters, but personally I would’ve gone with match (to match anywhere), startmatch, endmatch, and fullmatch.)

Thanks for any insight y’all can provide.

The fullmatch docs say it is new in Python 3.4. Google “new in Python 3.4” and look for fullmatch. The entry has brief reasoning and a link to the discussion.

(just in case google doesn’t show it as top result)


Without match, you’d have to use \A to anchor at the start of the string, because when using the MULTILINE flag, ^ will match at the start of each line. Additionally, you might have to wrap the pattern in (?:...).

Before fullmatch was introduced, you’d have to put \Z at the end too.
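To make those two points concrete, here is a minimal sketch. The sample strings and the helper name `fullmatch_via_search` are made up for illustration; it is not how re implements fullmatch, just an emulation with re.search.

```python
import re

text = "bad\nad fad"

# With MULTILINE, ^ also matches right after a newline, so "ad" on line 2 is found.
assert re.search(r"^ad", text, re.MULTILINE) is not None
# \A only matches at the very start of the string, whatever the flags are.
assert re.search(r"\Aad", text, re.MULTILINE) is None
# re.match stays anchored at the start of the string, even with MULTILINE.
assert re.match(r"ad", text, re.MULTILINE) is None

# Naively gluing anchors onto an alternation changes its meaning:
# r"\Aa|b\Z" parses as (\Aa) | (b\Z), so a trailing "b" is enough to match.
assert re.search(r"\Aa|b\Z", "xxb") is not None


def fullmatch_via_search(pattern, string):
    """Hypothetical helper: emulate re.fullmatch using re.search."""
    # The non-capturing group keeps the anchors outside the alternation.
    return re.search(r"\A(?:" + pattern + r")\Z", string)


assert fullmatch_via_search(r"a|b", "xxb") is None
assert re.fullmatch(r"a|b", "xxb") is None
assert fullmatch_via_search(r"a|b", "b") is not None
```

One limitation of this sketch: a passed-in pattern that begins with a global inline flag such as (?i) would no longer have the flag at the start of the expression once wrapped, which recent Python versions reject.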

And as for wanting to match from the end, not only is that far less common, it would also require matching backwards. It would let you have a variable-width lookbehind, but it’s not a trivial addition.


Not quite sure what value that would have (or exactly what you mean by this).
You can certainly write patterns to match substrings at the end of a string. The regex library (which is quite a bit faster than re and has far better Unicode support) also has a reverse search flag that searches backwards, from the end of the string:

>>> regex.findall(r"..", "abcde")
['ab', 'cd']
>>> regex.findall(r"(?r)..", "abcde")
['de', 'bc']
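For the simpler case of checking whether a pattern matches at the end of a string, the standard re module is enough if you anchor with \Z. A minimal sketch, where `match_at_end` is a hypothetical helper name and not a function that re provides:

```python
import re


def match_at_end(pattern, string, flags=0):
    """Hypothetical helper: find a match for `pattern` that ends at the end of `string`."""
    # Wrap the pattern so that \Z applies to the whole thing, not just the last alternative.
    return re.search(r"(?:" + pattern + r")\Z", string, flags)


m = match_at_end(r"ad", "bad fad")
print(m.span())                          # (5, 7)
print(match_at_end(r"ad", "ada sad!"))   # None
```

Note that this still scans forwards from the start of the string and tries the pattern at each position, so it is not the same thing as matching backwards from the end.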

Matching “from the end” is actually not hard to implement, I think: simply read the target string in reverse, and try to match the reverse of the (sub)pattern(s)? (Perhaps not so trivial if there are look-aheads/look-behinds in the pattern? Are those the only troublemakers?)

@MRAB Could you elaborate on that? Why is this not trivial? I was (am) also assuming that you are doing this in the regex lib with the REVERSE flag - is that assumption wrong?

Well, in a sense it is trivial, but it’s a lot more code to add to handle matching backwards as well as forwards.

At least until 3.8, re.search has to iterate over the string and try the match at every position, even if it immediately fails each time (due to the anchor). This makes it take O(N) time instead of O(1) for a trivial match:

$ python -m timeit --s "import re; haystack='bad fad'*1000" -- "re.match('ad', haystack)"
500000 loops, best of 5: 540 nsec per loop
$ python -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('^ad', haystack)"
5000 loops, best of 5: 61.1 usec per loop
$ python -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('\Aad', haystack)"
5000 loops, best of 5: 53.6 usec per loop

This seems to be fixed in or before 3.11:

$ python3.11 -m timeit --s "import re; haystack='bad fad'*1000" -- "re.match('ad', haystack)"
500000 loops, best of 5: 436 nsec per loop
$ python3.11 -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('^ad', haystack)"
500000 loops, best of 5: 458 nsec per loop
$ python3.11 -m timeit --s "import re; haystack='bad fad'*1000" -- "re.search('\Aad', haystack)"
500000 loops, best of 5: 462 nsec per loop
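If you want to repeat the comparison on your own interpreter without the shell quoting, something like the following should work. It is only a rough sketch using the timeit module; the loop count is an arbitrary choice, and the numbers will differ by machine and Python version.

```python
import re
import timeit

haystack = "bad fad" * 1000

# The same three calls as in the shell commands above.
candidates = {
    "re.match('ad')": lambda: re.match(r"ad", haystack),
    "re.search('^ad')": lambda: re.search(r"^ad", haystack),
    "re.search('\\Aad')": lambda: re.search(r"\Aad", haystack),
}

number = 2000  # arbitrary loop count, chosen to keep the run short
results = {}
for label, func in candidates.items():
    # The haystack starts with "bad", so none of these should match.
    assert func() is None
    seconds = timeit.timeit(func, number=number)
    results[label] = seconds / number
    print(f"{label}: {results[label] * 1e6:.2f} usec per call")
```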

More to the point, .match allows for expressing intent more directly and clearly.