> The main goal is to add the substring if it is not present, but the exact content of the line is not known. The sample file:
>
> ```
> some_unneeded_format
> sample="some"
> sample="some dont_match"
> ```
>
> The first line should not match because it isn’t in the right format. The second line should match, with the text “some” stored somewhere. The third line already contains the substring “dont_match”, so it should not match.

Thank you for these examples.

> Last regexp:
>
> ```python
> print(re.search('^(sample=".*?(?!match"))', line))
> ```
>
> But it matches the third line.

This is because you’re not thinking about how patterns actually match. Patterns backtrack: a greedy `.*` first tries the longest extent it can, and if the rest of the pattern doesn’t match from there, it tries a shorter extent which is still valid.
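A tiny illustration of backtracking, using a toy string `axbyb` that is not from the thread: the greedy pad gives characters back until the trailing `b` can match, while the non-greedy pad grows from empty instead.

```python
import re

# Greedy .* first grabs everything to the end of the string, then gives
# characters back one at a time until the trailing "b" can match.
greedy = re.match(r"(a.*)b", "axbyb")
print(greedy.group(1))  # axby

# Non-greedy .*? starts with the shortest extent and grows it instead.
lazy = re.match(r"(a.*?)b", "axbyb")
print(lazy.group(1))  # ax
```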
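To take your third misbehaving example, `sample="some dont_match"`: the pattern can’t stop the `.*?` right before `match"`, but it can stop it anywhere else. For instance, it can let the `.*?` match `some dont`, so that the pattern matches `sample="some dont`. At that point the negative lookahead compares `match"` against the remaining text `_match"`, which doesn’t start with `match"`, so the “not match” succeeds.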
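To see how little the lookahead actually rules out, the plain-Python sketch below (not how the regex engine works internally, just a mimic of what `(?!match")` checks) tries every possible end point for the `.*?` pad on the third sample line and keeps those where the lookahead would succeed.

```python
# Try every possible extent for the .*? pad on the third sample line and
# keep the ones where the negative lookahead (?!match") would succeed.
line = 'sample="some dont_match"'
start = len('sample="')  # the pad begins right after the opening quote
pads = [
    line[start:end]
    for end in range(start, len(line) + 1)
    if not line.startswith('match"', end)  # what (?!match") tests at `end`
]
print(len(pads))              # 16 of the 17 possible stopping points work
print("some dont" in pads)    # True
print("some dont_" in pads)   # False: the only pad followed by match"
```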
(Actually, you’ve got a non-greedy pad `.*?`, so in fact it will match an empty string right after your double quote, because the text `some dont_match"` does not start with `match"`.)
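Checking what `re.search` really returns on the third line confirms this. The greedy variant `.*` is added here only for comparison (it is not from the original question): it runs to the end of the line, where the lookahead trivially succeeds.

```python
import re

line = 'sample="some dont_match"'

# Non-greedy pad: the empty pad is tried first, and the lookahead succeeds
# because 'some dont_match"' does not start with 'match"'.
lazy = re.search(r'^(sample=".*?(?!match"))', line)
print(lazy.group(1), lazy.span())  # sample=" (0, 8)

# Greedy pad: .* consumes the rest of the line; at the very end there is
# nothing left for match" to match, so the lookahead succeeds immediately.
greedy = re.search(r'^(sample=".*(?!match"))', line)
print(greedy.group(1))  # sample="some dont_match"
```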
What I think you want to say is: do not match if the string `match"` occurs anywhere later in the text. So the pattern you want to “not match” looks like `.*match"`. Note the leading `.*`, which allows that “not match” pattern to span to any point in the text.
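Why the leading `.*` matters: a lookahead, like `re.match`, only tries its pattern starting at the current position. A small sketch on the text that follows `sample="` in the third line:

```python
import re

rest = 'some dont_match"'  # the text after sample=" on the third line

# Anchored at the current position, match" alone fails...
print(re.match(r'match"', rest))  # None
# ...but .*match" can skip ahead to wherever match" occurs.
print(re.match(r'.*match"', rest) is not None)  # True
```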
So I moved your `.*?` to inside the `(?!...)` section:
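```python
import re

# The sample lines (the first one is empty because the string starts with a newline).
for line in r'''
some_unneeded_format
sample="some"
sample="some dont_match"'''.split("\n"):
    print(line)
    # Original pattern: the lookahead is only checked right after the pad.
    print(" ", re.search('^(sample=".*?(?!match"))', line))
    # Revised pattern: the pad is inside the lookahead, so the line is
    # rejected if match" occurs anywhere after the opening quote.
    print(" ", re.search('^(sample="(?!.*?match"))', line))
```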
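The question’s actual goal is to add the substring when it is missing, and to keep the value (“some”) around. A minimal sketch of that, assuming the substring is ` dont_match` appended inside the quotes; the helper `add_missing_suffix` and the pattern `SAMPLE_RE` are hypothetical names, not anything from the thread. It uses a capturing group for the value plus the same “`.*` inside the lookahead” idea:

```python
import re

# Hypothetical pattern: a sample="..." line whose value does not contain
# match" anywhere; group 1 captures the value (e.g. "some").
SAMPLE_RE = re.compile(r'^sample="(?!.*match")(.*)"$')


def add_missing_suffix(line, suffix=" dont_match"):
    """Append `suffix` inside the quotes unless the line already has it.

    Hypothetical helper; the default suffix is an assumption based on the
    sample file.
    """
    m = SAMPLE_RE.match(line)
    if m is None:
        # Wrong format, or match" is already present: leave the line alone.
        return line
    return f'sample="{m.group(1)}{suffix}"'


for line in ['some_unneeded_format', 'sample="some"', 'sample="some dont_match"']:
    print(add_missing_suffix(line))
```

Note that the lookahead rejects any value containing `match"`, not only `dont_match"`; tighten it to `dont_match"` if that distinction matters for your data.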
Personally I don’t use the non-greedy forms very often, and tend to write `.*` instead of `.*?`; I find the greedy forms easier to think about.
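Inside a negative lookahead the choice doesn’t affect the result: the lookahead only asks whether `match"` occurs somewhere later, so the greedy and non-greedy pads accept and reject the same lines. A quick check on the three sample lines:

```python
import re

lines = ['some_unneeded_format', 'sample="some"', 'sample="some dont_match"']

# Same test with a non-greedy and a greedy pad inside the lookahead.
lazy = [bool(re.search(r'^(sample="(?!.*?match"))', s)) for s in lines]
greedy = [bool(re.search(r'^(sample="(?!.*match"))', s)) for s in lines]
print(lazy)    # [False, True, False]
print(greedy)  # [False, True, False]
```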
BTW, I recommend always using “raw strings” to express your regexps, e.g.: `r'^(sample="(?!.*?match"))'`. As soon as you’ve got a backslash (“slosh”) escape in your regexp, they pay off.
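A small example of the payoff, using the word-boundary escape `\b` (not used in the thread, just a common case): in an ordinary string Python turns `"\b"` into a backspace character before the `re` module ever sees it.

```python
import re

text = "some dont match"

print(len("\b"), len(r"\b"))  # 1 2
# Plain string: the pattern is a literal backspace followed by "match".
print(re.search("\bmatch", text))  # None
# Raw string: re receives backslash-b, a word boundary.
print(re.search(r"\bmatch", text).span())  # (10, 15)
```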
Cheers,
Cameron Simpson cs@cskk.id.au