The main goal is to add the substring if it is not present, but the exact content of the line is not known. The sample file:
The first line should not match because it isn’t in the right format. The second line should match, with the text “some” stored somewhere. The third line already contains the substring “dont_match”, so it should not match.
Thank you for these examples.
But it matches the third line.
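This is because you’re not thinking correctly about the way patterns match. Patterns backtrack: a greedy repetition such as `.*` tries for the longest sequence which will match your pattern, but if the rest of the pattern doesn’t match at that longest extent, it tries a shorter extent which is still valid. (A nongreedy `.*?` works the other way round: it tries the shortest extent first and grows only as needed.)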
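To see backtracking in action, here is a small Python illustration on a made-up string `aabab` (not from your sample file), comparing a greedy and a nongreedy pad in front of a literal `b`:

```python
import re

text = "aabab"

# Greedy: (.*) first grabs all of "aabab", then gives back one character
# at a time until the literal "b" after it can match.
greedy = re.match(r"(.*)b", text)
print(greedy.group(1))  # aaba

# Nongreedy: (.*?) starts out empty and grows only as far as it must
# for the literal "b" to match.
lazy = re.match(r"(.*?)b", text)
print(lazy.group(1))  # aa
```

Both patterns succeed; they just settle on different extents for the pad.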
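To take your third, misbehaving example, `sample="some dont_match"`: your pattern cannot match with the `.*?` stopping just before `match"`, because at that point the lookahead `(?!match")` fails. But it can match by letting the `.*?` stop somewhere else, for example after `some dont_m`. The text that follows there is `atch"`, which is not `match"`, so the negative lookahead succeeds.

(Actually, you’ve got a nongreedy pad `.*?`, so in fact it will match an empty string right after your double quote, because the text `some dont_match"` does not start with `match"`. The lookahead succeeds immediately and the whole match is just `sample="`.)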
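A quick check in Python, assuming your third line reads exactly `sample="some dont_match"` (the sample file itself did not survive in this thread, so that text is an assumption). It also shows that swapping in a greedy `.*` would not help:

```python
import re

line = 'sample="some dont_match"'

# Your pattern: the nongreedy .*? first tries the empty string; the text
# after the quote starts with "some", not match", so the negative
# lookahead succeeds straight away.
lazy = re.search(r'^(sample=".*?(?!match"))', line)
print(repr(lazy.group(1)))  # 'sample="'

# A greedy .* fares no better: it runs to the end of the line, where
# nothing follows, so (?!match") succeeds there instead.
greedy = re.search(r'^(sample=".*(?!match"))', line)
print(repr(greedy.group(1)))  # 'sample="some dont_match"'
```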
What I think you want to say is: do not match if the string `match"` occurs anywhere later in the text. So the pattern you want to “not match” looks like:

    (?!.*?match")

Note the leading `.*?`, which allows that “not match” pattern to span to any point in the text.
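The difference the leading pad makes can be seen by testing both lookaheads at one fixed position. This sketch uses `re.match`, which only tries the start of the string, on the text that follows the opening quote in the (assumed) third line:

```python
import re

text = 'some dont_match"'

# Without a pad, (?!match") only inspects the text at the current
# position, which starts with "some", so the negative lookahead succeeds.
print(re.match(r'(?!match")', text) is not None)  # True

# With a leading .*?, the lookahead scans forward, finds match" later on,
# and so the negative lookahead fails.
print(re.match(r'(?!.*?match")', text) is not None)  # False
```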
So I moved your `.*?` to inside the negative lookahead `(?!...)`:
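    import re

    # assumed second and third lines of your sample file
    for line in ['sample="some"', 'sample="some dont_match"']:
        print(line)
        print(" ", re.search(r'^(sample=".*?(?!match"))', line))  # your pattern
        print(" ", re.search(r'^(sample="(?!.*?match"))', line))  # .*? moved inside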
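Since you also want the value (“some”) stored, here is one possible extension of the corrected pattern that captures it. It is a sketch under stated assumptions: `sample_value` and `VALUE_RE` are hypothetical names, the value is assumed to contain no double quote, and `sample=some` is only a guess at what a badly formatted first line might look like:

```python
import re

# Hypothetical pattern: the corrected lookahead, followed by a group that
# captures everything up to the closing double quote.
VALUE_RE = re.compile(r'^sample="(?!.*?match")([^"]*)"')

def sample_value(line):
    """Return the quoted value, or None if the line should not match."""
    m = VALUE_RE.search(line)
    return m.group(1) if m else None

print(sample_value('sample="some"'))             # some
print(sample_value('sample="some dont_match"'))  # None
print(sample_value('sample=some'))               # None (no quotes)
```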
Personally I don’t use the nongreedy forms very often, and tend to write `.*` instead of `.*?`; I find the greedy forms easier to think about.
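In this particular case the choice makes no difference: a lookahead only asks whether a match exists at all, not how long it is, so `.*` and `.*?` inside `(?!...)` accept and reject the same lines. A small check on the two assumed sample lines:

```python
import re

lines = ['sample="some"', 'sample="some dont_match"']
results = []
for line in lines:
    nongreedy = re.search(r'^(sample="(?!.*?match"))', line)
    greedy = re.search(r'^(sample="(?!.*match"))', line)
    # Record whether each form matched; inside a lookahead only
    # "is there a match at all?" matters, so the two columns agree.
    results.append((line, nongreedy is not None, greedy is not None))
    print(line, nongreedy is not None, greedy is not None)
```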
BTW, I recommend always using “raw strings” to express your regexps, e.g. `r'^(sample="(?!.*?match"))'`. As soon as you’ve got a slosh (backslash) escape in your regexp, they pay off.
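A classic illustration of why, using the word-boundary escape `\b` on a made-up string (not from this thread). In an ordinary string literal, Python turns `\b` into a backspace character before the regex engine ever sees it:

```python
import re

text = "a cat sat"

# Ordinary string: "\b" is the backspace character (\x08), so this
# pattern looks for backspace-c-a-t-backspace and finds nothing.
plain = re.search("\bcat\b", text)

# Raw string: the backslash survives, and \b means "word boundary".
raw = re.search(r"\bcat\b", text)

print(plain)        # None
print(raw.group())  # cat
```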
Cameron Simpson