Regex, re.sub, How to make same replacement in all instances on same line

I have Python 3.11.9 on Windows 10.

I have many lines in a text file that look like this:

<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price> <price grade="h">715</price> <price grade="i">720</price> <price grade="j">725</price> <price grade="k">726</price> <price grade="l">730</price> <price grade="m">740</price> <price grade="o">745</price> <price grade="p">750</price> <price grade="q">755</price></prices>

I want to prefix every <price tag with a CRLF or \n, so that the line ends up like this:

<prices> 
<price grade="c">700</price> 
<price grade="d">705</price> 
<price grade="f">710</price> 
<price grade="h">715</price> 
<price grade="i">720</price> 
<price grade="j">725</price> 
<price grade="k">726</price> 
<price grade="l">730</price> 
<price grade="m">740</price> 
<price grade="o">745</price> 
<price grade="p">750</price> 
<price grade="q">755</price></prices>

ChatGPT says re.sub replaces all instances by default, and there is no re.REPLACEALL flag.
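A small check of that claim, as a sketch: count defaults to 0, which means every match is replaced, and a positive count caps the number of replacements. The short sample string is a hypothetical stand-in for one line of the file.

```python
import re

# Hypothetical short sample standing in for one line of the file.
sample = '<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price></prices>'

# count defaults to 0, which means "replace every match".
all_replaced = re.sub(r"(<price )", r"\n\1", sample)

# A positive count limits how many matches are replaced.
first_only = re.sub(r"(<price )", r"\n\1", sample, count=1)

print(all_replaced.count("\n"))  # 3 matches, so 3 newlines
print(first_only.count("\n"))    # only 1 newline
```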

My regex pattern has a capture group in parentheses. My code is this:

import re
xpptext = '<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price> <price grade="h">715</price> <price grade="i">720</price> <price grade="j">725</price> <price grade="k">726</price> <price grade="l">730</price> <price grade="m">740</price> <price grade="o">745</price> <price grade="p">750</price> <price grade="q">755</price></prices>'
xpptext = re.sub(r'(<price )', '\n\1', xpptext, re.IGNORECASE)
print(xpptext)

Both in my local Python and on Attempt This Online (ATO), I get this output, which is not right:

<prices> 
grade="c">700</price> 
grade="d">705</price> <price grade="f">710</price> <price grade="h">715</price> <price grade="i">720</price> <price grade="j">725</price> <price grade="k">726</price> <price grade="l">730</price> <price grade="m">740</price> <price grade="o">745</price> <price grade="p">750</price> <price grade="q">755</price></prices>

Note that the first two <price tags are not prefixed with \n but replaced by it, even though my replacement string refers to the first capture group with \1 (the digit one).

What am I doing wrong here? ATO link, I hope it will work for you.

Thank you.

Would a plain string replace give false positives here?

.replace('</price> <price ','</price> \n<price ')
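A runnable sketch of that suggestion on a hypothetical short sample. As written, it only adds a newline before tags preceded by "</price> ", so the first <price (which follows "<prices> ") stays on the first line. Replacing "<price " alone catches every tag and leaves "<prices>" untouched, since that tag has no space after "price".

```python
# Hypothetical short sample standing in for one line of the file.
sample = '<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price></prices>'

# The suggestion as written: only tags preceded by "</price> " get a newline.
suggested = sample.replace('</price> <price ', '</price> \n<price ')

# Matching on "<price " alone also catches the first tag.
simple = sample.replace('<price ', '\n<price ')

print(suggested)
print(simple)
```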
  • You need to make the second argument of re.sub a raw string like the first one; otherwise \1 is interpreted by the Python parser as an octal escape sequence, i.e. the single character chr(1).
  • The fourth positional argument to re.sub is not the flags but the maximum number of replacements (count). Use a keyword argument for flags instead. Because re.IGNORECASE has the integer value 2, only the first two matches were replaced.
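Both points can be checked directly in a short sketch on a hypothetical sample string. It reproduces the original call (with count spelled out as a keyword, since that is what the positional re.IGNORECASE turned into), prints its repr to expose the hidden chr(1) characters, and then shows the corrected call with a raw replacement string and flags passed by keyword.

```python
import re

# Without the r prefix, Python turns "\1" into the single character chr(1)
# before re.sub ever sees it, so no group reference is left.
assert "\n\1" == "\n\x01"
assert r"\n\1" == "\\n\\1"  # the raw string keeps the backslashes for re.sub

# re.IGNORECASE is an int flag with value 2; as the 4th positional
# argument it became count=2.
print(int(re.IGNORECASE))  # 2

sample = '<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price></prices>'

# Reproduce the original call, with count written out as a keyword.
broken = re.sub(r"(<price )", "\n\1", sample, count=int(re.IGNORECASE))
print(repr(broken))  # the repr shows the hidden \x01 characters

# Fixed: raw replacement string, flags passed by keyword.
fixed = re.sub(r"(<price )", r"\n\1", sample, flags=re.IGNORECASE)
print(fixed)
```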

flags isn’t needed at all AFAICT.

>>> print(re.sub(r"(<price )", r"\n\1", xpptext))
<prices> 
<price grade="c">700</price> 
<price grade="d">705</price> 
<price grade="f">710</price> 
<price grade="h">715</price> 
<price grade="i">720</price> 
<price grade="j">725</price> 
<price grade="k">726</price> 
<price grade="l">730</price> 
<price grade="m">740</price> 
<price grade="o">745</price> 
<price grade="p">750</price> 
<price grade="q">755</price></prices>

Thank you all. I got it working with this:

    xpptext = re.sub(r"(<price )", r"\n\1", xpptext)
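For comparison, a sketch of two equivalent patterns that avoid the explicit capture group (these are alternatives, not what the thread used): \g<0> in the replacement refers to the whole match, and a zero-width lookahead matches the position just before "<price ", so the replacement only needs to insert the newline. The sample string is hypothetical.

```python
import re

sample = '<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price></prices>'

# Version from the thread: capture the tag and put it back after a newline.
with_group = re.sub(r"(<price )", r"\n\1", sample)

# \g<0> refers to the whole match, so no capture group is needed.
whole_match = re.sub(r"<price ", r"\n\g<0>", sample)

# A lookahead matches the empty position right before "<price ",
# so the replacement only inserts the newline.
lookahead = re.sub(r"(?=<price )", "\n", sample)

assert with_group == whole_match == lookahead
print(lookahead)
```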

I didn’t realize the replacement string needed to be a raw string as well. I also removed the flag, as it wasn’t needed.
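Since the question mentions many lines in a text file, here is a sketch of applying the same substitution to a whole file at once (re.sub replaces every match across all lines). The function name split_prices_in_file and the file names are hypothetical, and a temporary directory is used only for the demo. On Windows, writing in text mode (the default for Path.write_text) turns each \n into CRLF, so the inserted line breaks come out as CRLF there.

```python
import re
import tempfile
from pathlib import Path

PRICE_TAG = re.compile(r"(<price )")

def split_prices_in_file(src: Path, dst: Path) -> None:
    """Hypothetical helper: put every <price tag of src on its own line, write to dst."""
    text = src.read_text(encoding="utf-8")
    # One sub call handles every match in the whole file, across all lines.
    dst.write_text(PRICE_TAG.sub(r"\n\1", text), encoding="utf-8")

# Demo in a temporary directory; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "prices.txt"
    dst = Path(tmp) / "prices_split.txt"
    src.write_text('<prices> <price grade="c">700</price> <price grade="d">705</price></prices>\n', encoding="utf-8")
    split_prices_in_file(src, dst)
    result = dst.read_text(encoding="utf-8")  # text mode reads CRLF back as \n

print(result)
```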