A Regular Expression Problem

>>> import re
>>> pattern = re.compile(r'\w+(?!\.com)')
>>> result = pattern.search('www_example123.com')
>>> print(result.group())
www_example12

I don’t understand why this is the result.

I hope someone can explain, step by step, how the regex engine arrives at it.


You get that result because (?!\.com) is a negative lookahead: it asserts that whatever \w+ matched is not immediately followed by “.com”. The engine checks the lookahead after each candidate \w+ match. Since \w+ is greedy, it first grabs www_example123, but the lookahead fails there because .com comes immediately after. The engine then backtracks one character to www_example12. Now the next characters are 3.com, which does not start with .com, so the lookahead succeeds and the match stops at www_example12.
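To make the backtracking concrete, here is a rough sketch that mimics those steps by hand for a match attempt starting at index 0. It is only an illustration of the idea, not how the re module is actually implemented, and the helper name trace_negative_lookahead is made up for this example.

```python
import re

def trace_negative_lookahead(text, suffix='.com'):
    """Mimic the pattern from the question by hand (illustrative helper)."""
    # Greedy \w+: take the longest run of word characters at the start
    first = re.match(r'\w+', text)
    if first is None:
        return None, []
    steps = []
    # Backtrack: try shorter and shorter candidates until the lookahead passes
    for end in range(first.end(), 0, -1):
        blocked = text.startswith(suffix, end)  # is ".com" right after the candidate?
        steps.append((text[:end], not blocked))
        if not blocked:
            return text[:end], steps
    return None, steps

match, steps = trace_negative_lookahead('www_example123.com')
for candidate, ok in steps:
    print(candidate, 'lookahead passes' if ok else 'lookahead fails')
print(match)  # www_example12
```

The first candidate (www_example123) is rejected and the second (www_example12) is accepted, which is the same answer re gives.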

If you want to capture the full string www_example123, you could switch to a positive lookahead, so that whatever follows \w+ must be “.com”, something like r'\w+(?=\.com)' (notice the equals sign).
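A quick way to check the positive lookahead version, reusing the string from the question (the variable names are just for this sketch):

```python
import re

text = 'www_example123.com'

# (?=\.com) requires ".com" right after the match but does not consume it
positive = re.search(r'\w+(?=\.com)', text)
print(positive.group())  # www_example123
```

Because the lookahead is zero-width, .com is checked but left out of the match.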

If you haven’t already, you could check out sites like https://regex101.com/ and play around with different regular expressions and target strings to build intuition for a particular search.

The regular expression means “one or more word characters, that are not followed by .com”.

So we cannot match www_example123, because it is followed by .com. The engine has to backtrack one character from there.

You may try to use a possessive quantifier or an atomic group.
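The snippets for these two suggestions seem to be missing from the post. As a possible sketch, assuming Python 3.11 or newer (where re gained possessive quantifiers and atomic groups), the patterns might look like the ones below. Both stop \w+ from giving characters back, so the attempt at index 0 fails outright. Note that search() then moves on to later start positions, so it may still find a match further along the string.

```python
import re
import sys

text = 'www_example123.com'

# Possessive quantifiers (\w++) and atomic groups ((?>...)) need Python 3.11+
SUPPORTED = sys.version_info >= (3, 11)

results = {}
if SUPPORTED:
    for name, pat in [('possessive', r'\w++(?!\.com)'),
                      ('atomic', r'(?>\w+)(?!\.com)')]:
        anchored = re.match(pat, text)   # only tries a match at index 0
        anywhere = re.search(pat, text)  # tries every start position
        results[name] = (anchored, anywhere.group() if anywhere else None)
        print(name, results[name])
```

On 3.11+ both patterns give None for match() and 'com' for search(): the trailing com is a run of word characters that is not followed by .com.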


Another solution might be to use a capture group: (\w+)\.com
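For the capture group version, a minimal sketch using the same string (variable names are mine):

```python
import re

text = 'www_example123.com'

# Match the literal ".com" too, but keep only the part before it in group 1
m = re.search(r'(\w+)\.com', text)
print(m.group(1))  # www_example123
print(m.group())   # www_example123.com
```

Unlike the lookahead, this consumes .com as part of the overall match, so use group(1) to get just the name.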