Question about the (?!...) regexp

I can’t understand how it works. For example:

Given the file:

sample
sample="some dont_match"

the code:

#!/usr/bin/python3

import re

file1=open('file','r')
lines=file1.readlines()

for line in lines:
    print(re.match('^(sample="(.*?)(?!match"))', line))  #***

gives a match. But if the line marked (***) is changed to

    print(re.match('^(sample="some dont_(?!match"))', line))  #***

it works as planned.
Is there any way to use a group and have the match fail if some text is included?

.*? is a lazy match. It’ll initially ‘consume’ 0 characters, so you’ll be comparing (?!match") against some dont_match", which will succeed.
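A quick way to see this, as a minimal sketch using the sample line from the question (the variable names are only for illustration): group 2 holds whatever .*? consumed, and it comes back empty.

```python
import re

line = 'sample="some dont_match"'

# Original pattern from the question: lazy .*? followed by a negative lookahead.
m = re.match(r'^(sample="(.*?)(?!match"))', line)

# .*? starts by consuming nothing, so the lookahead is tested right after
# the opening quote, where the remaining text is: some dont_match"
consumed = m.group(2)
remaining = line[m.end():]

print(repr(consumed))   # what .*? matched
print(repr(remaining))  # what the lookahead was compared against
```

Since the remaining text does not start with match", the negative lookahead is satisfied immediately and the engine never needs to extend .*? any further.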

I think what you want is ^(sample="(?!.*?match").*?"). This looks ahead for match" and then fails if it was successful.
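Here is a small sketch of that pattern run against the three lines from the original file (the list and dictionary names are just for illustration):

```python
import re

# Suggested pattern: the lookahead scans the rest of the line for match"
# before anything after the opening quote is consumed.
pattern = re.compile(r'^(sample="(?!.*?match").*?")')

lines = ['sample', 'sample="some"', 'sample="some dont_match"']
results = {line: bool(pattern.match(line)) for line in lines}

for line, matched in results.items():
    print(f'{line!r}: {matched}')
```

Only the line whose quoted value does not end in match" should report a match.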

No, I need a group before the pattern that disables the match. The group will be used later.

  1. Named groups must be written this way:

(?P<name>...)

Regular Expression HOWTO — Python 3.12.1 documentation

  2. (?!...)

Negative lookahead assertion. This is the opposite of the positive assertion; it succeeds if the contained expression doesn’t match at the current position in the string.
Regular Expression HOWTO — Python 3.12.1 documentation

so I suppose it’s not what you need.

Maybe you’re looking for:

>>> re.match(
...     r'^(?P<sample>some dont_(?:match))', 
...     "some dont_match"
... )
<re.Match object; span=(0, 15), match='some dont_match'>
  1. I don’t necessarily need the named group.
  2. Actually, I need the content of the group if it is not followed by the text. The question is: why doesn’t it work with quantifiers?

My mistake, the test file should be:

garbage
sample="some do"
sample="some dont_match"

On the first and last lines the script should not match; on the second it should.

re.match(
    r'^(?P<sample>some do(?!match))', 
    "some do"
)

I hope it’s not homework :smiley:

No )) pet project

Sorry for my persistence, but is there any way to use a wildcard with a negative lookahead?

Can you provide an updated example of what you’re trying to match, an
example which should not match, and what you’re trying as a regexp?

Cheers,
Cameron Simpson cs@cskk.id.au

The main goal is to add the substring if it is not present. But the exact content of the line is not known. The sample file:

some_unneeded_format
sample="some"
sample="some dont_match"

The first line should not match because it isn’t in the right format. The second line should match, with the text “some” stored somewhere. The third line already contains the substring “dont_match”, so it should not match.

Last regexp:

print(re.search('^(sample=".*?(?!match"))', line))

But it matches the third line.

Thanks,
Alexander

Instead of .*?, try [^"]*+, available since Python 3.11 or with the PyPI extension regex.

Thank you for these examples.

Last regexp:

print(re.search('^(sample=".*?(?!match"))', line))

But it matches the third line.

This is because you’re not thinking correctly about the way patterns
match. Patterns backtrack: they try for the longest sequence which
will match your pattern, but if they don’t match at the longest extent
they will try a shorter extent which is still valid.

To take your third misbehaving example, your pattern doesn’t match:

 sample="some dont_match"

but it does match:

 sample="some dont_match

by letting the .*? match this text:

 some dont

which lets it “not match” the match" against the text _match.

(Actually, you’ve got a nongreedy pad .*?, so in fact it will match an
empty string right after your double quote, because the text some dont_match is not a match for match".)
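For anyone who wants to check this by hand, here is a small sketch that prints what the wildcard actually consumed in each case. The first two patterns are adapted from the thread; the third, with a required closing quote, is my own variant added to force a backtracking step. In every case the lookahead is tested only at the one position where the wildcard stopped, so it never lands on match".

```python
import re

line = 'sample="some dont_match"'

# Lazy wildcard: consumes nothing, lookahead is tested right after the quote.
lazy = re.search(r'^(sample="(.*?)(?!match"))', line)

# Greedy wildcard: consumes the rest of the line, lookahead is tested at the end.
greedy = re.search(r'^(sample="(.*)(?!match"))', line)

# Greedy wildcard that must be followed by a closing quote (illustrative variant):
# the engine backtracks one character so the final " can match.
backtracked = re.search(r'^(sample="(.*)(?!match")")', line)

for name, m in [('lazy', lazy), ('greedy', greedy), ('backtracked', backtracked)]:
    print(f'{name:12} consumed {m.group(2)!r}')
```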

What I think you want to say is: do not match if the string match"
occurs anywhere later in the text. So the pattern you want to “not
match” looks like:

 .*match"

Note the leading .* to allow that “not match” pattern to span to any
point in the text.

So I moved your .*? to inside the (?!....) section:

import re

# Compare the original pattern with the one that has .*? inside the lookahead.
for line in r'''some_unneeded_format
sample="some"
sample="some dont_match"'''.split("\n"):
    print(line)
    print(" ", re.search(r'^(sample=".*?(?!match"))', line))
    print(" ", re.search(r'^(sample="(?!.*?match"))', line))

Personally I don’t use the nongreedy forms very often, and tend to write
.* instead of .*?; I find the greedy forms easier to think about.

BTW, I recommend always using “raw strings” to express your regexps, eg:

 r'^(sample="(?!.*?match"))'

As soon as you’ve got a slosh-escape in your regexp they pay off.
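A tiny sketch of why that matters, using \b as the example escape (the sample text and variable names are made up for illustration):

```python
import re

text = 'a cat'

# Without the r prefix, Python turns \b into a backspace character
# before the regex engine ever sees the pattern.
plain = '\bcat'
raw = r'\bcat'

print(len(plain), len(raw))    # the plain string is one character shorter
print(re.search(plain, text))  # looks for a literal backspace followed by "cat"
print(re.search(raw, text))    # \b is a word boundary here
```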

Cheers,
Cameron Simpson cs@cskk.id.au

Thanks for the solution, and sorry for the late answer.

I’m trying to implement this in my project; more questions may arise.

There are some issues with this solution. I need the string before the pattern. If I use:

print(" ", re.search('^sample="(?!.*?match")', line))

and try to reference it by \1, I get the error ‘invalid group reference’. If I use:

print(" ", re.search('^(sample="(?!.*?match"))', line))

the group reference doesn’t contain the text before the negative match (“some dont_” in this case).
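Both symptoms can be reproduced in a few lines. This is a sketch, assuming the \1 reference was used in a replacement string passed to re.sub (the thread does not say where it was used). The widened group in the last pattern is my own illustration, not something taken from the posts above.

```python
import re

line = 'sample="some"'

# No capturing group at all: (?!...) is an assertion, not a group,
# so \1 in the replacement has nothing to refer to.
try:
    re.sub(r'^sample="(?!.*?match")', r'\1', line)
    error = None
except re.error as exc:
    error = str(exc)
print(error)

# Group 1 closes right after the lookahead, so it only holds the prefix.
narrow = re.search(r'^(sample="(?!.*?match"))', line)
print(repr(narrow.group(1)))

# Widening the group to include [^"]* also captures the quoted text.
wide = re.search(r'^(sample="(?!.*?match")[^"]*)', line)
print(repr(wide.group(1)))
```

A lookahead consumes no characters, so a group that ends right after it can never contain the text that follows. That text has to be matched by something inside the group.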

Solution:

https://stackoverflow.com/questions/53032368/adding-a-parameter-to-a-line-with-ansible-with-lineinfile-function-and-regexp
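For reference, here is how the whole task might look in plain Python. This is only a sketch under my own assumptions, not the code from the linked answer: the helper name add_if_missing and the choice to append ' dont_match' before the closing quote are made up for illustration.

```python
import re

# Group 1 captures everything up to (not including) the closing quote, but only
# when match" does not occur anywhere later on the line.
PATTERN = re.compile(r'^(sample="(?!.*match")[^"]*)"')


def add_if_missing(line: str, suffix: str = ' dont_match') -> str:
    """Append suffix inside the quotes unless match" already follows on the line."""
    # A function replacement avoids having to escape backslashes in suffix.
    return PATTERN.sub(lambda m: m.group(1) + suffix + '"', line)


sample_file = [
    'some_unneeded_format',
    'sample="some"',
    'sample="some dont_match"',
]

updated = [add_if_missing(line) for line in sample_file]
for before, after in zip(sample_file, updated):
    print(f'{before!r} -> {after!r}')
```

Running the helper a second time on its own output leaves every line unchanged, because the updated line now fails the lookahead.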