Regular Expressions (RE) Module - Search and Match Comparison

onePythonUser · October 26, 2023, 2:40pm

Hello,

I have a question about compiled regular expressions. I wrote a code snippet that compares the results of search and match for several strings and patterns.

Here is the test code snippet:

import re

s1 = 'bob has a birthday on Feb 25th'
s2 = 'sara has a birthday on March 3rd'
s3 = '12eup 586iu'
s4 = '0turt'
bday1_re = re.compile(r'\w+ \d+\w+')  # Also tried: r'\w\w\w \d\d\w\w'
bday2_re = re.compile(r'\w+ \w+')
digit_re = re.compile(r'\d\d\w+ \d\d')
numbers_re = re.compile(r'\w{0,10}')

print(bday1_re.search(s1))
print(bday2_re.search(s2))
print(digit_re.search(s3))
print(numbers_re.search(s4))
print('\n')
print(bday1_re.match(s1))
print(bday2_re.match(s2))
print(digit_re.match(s3))
print(numbers_re.match(s4))

After I run the code, I get the following result:

<re.Match object; span=(22, 30), match='Feb 25th'>
<re.Match object; span=(0, 8), match='sara has'>
<re.Match object; span=(0, 8), match='12eup 58'>
<re.Match object; span=(0, 5), match='0turt'>

None
<re.Match object; span=(0, 8), match='sara has'>
<re.Match object; span=(0, 8), match='12eup 58'>
<re.Match object; span=(0, 5), match='0turt'>

Does anyone know why match fails for the string s1 even though search succeeds? For s1, match returns None.

jamestwebber · October 26, 2023, 2:43pm

We can find the answer in the docs for these functions:

Python offers different primitive operations based on regular expressions:

re.match() checks for a match only at the beginning of the string

re.search() checks for a match anywhere in the string (this is what Perl does by default)

re.fullmatch() checks for entire string to be a match

As you can see in the output (the span attribute of the match object), the first regex doesn’t match at the beginning of the string, so match fails.
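A minimal sketch of that difference, reusing s1 and the first pattern from the original post (the variable names are only illustrative):

```python
import re

s1 = 'bob has a birthday on Feb 25th'
bday_re = re.compile(r'\w+ \d+\w+')

# search() scans forward and returns the first place where the pattern fits.
found = bday_re.search(s1)
print(found.span(), found.group())

# match() only tries index 0; 'bob has' does not fit, so the result is None.
anchored = bday_re.match(s1)
print(anchored)

# fullmatch() needs the pattern to cover the whole string, so it fails too.
whole = bday_re.fullmatch(s1)
print(whole)
```

The span reported by search() starts at index 22, not 0, which is why match() has nothing to return for this string.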

onePythonUser · October 26, 2023, 3:02pm

Ahhh, got it. I reworded my s1 string to:

s1 = "Feb 25th is Bob's birthday"

This now works.

Yes, as the definition clearly states, match only looks for a match at the
beginning of the string, not anywhere in the string as search does.

Much obliged sir.
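A short sketch of the reworded string next to one alternative, assuming the pattern has been compiled: a compiled pattern's match() also accepts a start position, so the original string can be left as it was.

```python
import re

bday_re = re.compile(r'\w+ \d+\w+')

# Reworded string: the date now sits at index 0, so match() succeeds.
reworded = "Feb 25th is Bob's birthday"
print(bday_re.match(reworded))

# Original string: pass a start position so the attempt begins at
# index 22 (where 'Feb' starts) instead of index 0.
original = 'bob has a birthday on Feb 25th'
print(bday_re.match(original, 22))
```

In practice search() is usually the simpler choice when the position of the text is not known in advance.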

hansgeunsmeyer · October 26, 2023, 3:46pm

@onePythonUser If you are exploring regular expressions, it may be worthwhile to have a look at the regex package (GitHub - mrabarnett/mrab-regex), maintained by someone who is also a regular contributor to this forum (@MRAB). It’s a drop-in replacement for the re module, but is generally faster and has better Unicode support (even the official Python docs of re mention this) - that is, support for Unicode properties such as different script types. For instance, it enables defining a regex for compound emoji (what we perceive as one emoji, even though it’s really composed of multiple Unicode code points), which is practically impossible to do (or very ugly and inefficient) if you simply use the re module.
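To illustrate the compound emoji point, here is a small sketch. The family emoji is just an example chosen for illustration, and the regex part is guarded because that package is third-party and may not be installed.

```python
import re

# Family emoji: man + zero-width joiner + woman + zero-width joiner + girl.
family = '\U0001F468\u200D\U0001F469\u200D\U0001F467'

# To Python (and to re) this is five code points, not one character.
print(len(family))
pieces = re.findall(r'.', family)
print(len(pieces))

# regex is a third-party package, so the import is guarded here.
try:
    import regex
except ImportError:
    regex = None

if regex is not None:
    # \X is regex's syntax for one grapheme cluster; re has no equivalent.
    print(regex.findall(r'\X', family))
```

How \X groups the code points can depend on the installed regex version, so the last line is printed without an expected result.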

barry-scott · October 26, 2023, 3:56pm

I wish the two methods were called startswith and contains.
I still find myself forgetting which of search and match is which.

jamestwebber · October 26, 2023, 3:57pm

I think those names would be confusing too, because they imply a boolean output but these functions return a match or matches.
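A quick sketch of what these functions return, using a pattern and strings from the original post (the variable names are illustrative):

```python
import re

word_pair = re.compile(r'\w+ \w+')

# A successful call returns a Match object carrying the position and text.
hit = word_pair.search('sara has a birthday on March 3rd')
print(hit.span(), hit.group())

# A failed call returns None rather than False.
miss = word_pair.search('0turt')
print(miss)

# Match objects are truthy and None is falsy, so the result still works
# directly in a condition when only a yes/no answer is needed.
print(bool(hit), bool(miss))
```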

onePythonUser · October 26, 2023, 4:02pm

@Barry Scott: Yes, I agree. This would be more in line with the ‘fullmatch’ function naming format.

onePythonUser · October 26, 2023, 4:06pm

Will this package be incorporated as part of python?

Is there as much support for it as for the ‘re’ module? Right now
when I search for the ‘re’ module, I find a lot of material on the subject,
which makes it a lot easier to learn. If only one person is maintaining the regex module
on GitHub, it might be a little harder to switch over.

Btw, I am still a relative rookie to python, so I really can’t comment on the nuances
of the language and can’t offer a legitimate comparison.

hansgeunsmeyer · October 26, 2023, 4:14pm

I’ve wondered about that too… I guess not. I believe this is mostly up to what @MRAB wants.
If the package were incorporated, development could not be as agile as it was before, so that could be a major disincentive to the developer.
(I’ve also wondered about this for the requests package - which is way more useful than urllib. I believe the main reason there is that the main developer does not want to give up control. Which is his right, of course.)

As to comparison with the re module - everything you can do in re you can also do in regex.
But there are a lot of general things you can do in various regular expression libs that you cannot do in re and that are supported in regex. So, you really cannot go wrong in installing regex as your default regular expression lib.
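One way to try this without committing to it is sketched below. The alias re_impl is a made-up name for this example, and the fallback is there because regex has to be installed separately.

```python
# Use regex when it is installed, otherwise fall back to the standard re.
try:
    import regex as re_impl  # third-party: pip install regex
except ImportError:
    import re as re_impl

s1 = 'bob has a birthday on Feb 25th'
bday_re = re_impl.compile(r'\w+ \d+\w+')

# The calls from the original post work unchanged with either module.
print(re_impl.__name__)
print(bday_re.search(s1).group())
print(bday_re.match(s1))
```

The rest of the code only refers to the alias, so switching between the two modules is a one-line change.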

When a package is only (or mostly) maintained by just one person, that is a risk for future maintenance. But in this case, I’ve no doubt that someone will take this over when @MRAB retires. It’s just too useful and good.

onePythonUser · October 26, 2023, 4:24pm

I might have to wait until I learn the fundamentals first (‘re’). Once I feel more
comfortable with the ‘re’ module and Python in general, I will be in a better position
to branch out and explore other modules supporting regular expressions (and
third-party modules that are not part of the standard library).