Regex, re.sub, How to make same replacement in all instances on same line

c-rob · February 3, 2025, 11:57am

I have Python 3.11.9 on Windows 10.

I have many lines in a text file that look like this:

<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price> <price grade="h">715</price> <price grade="i">720</price> <price grade="j">725</price> <price grade="k">726</price> <price grade="l">730</price> <price grade="m">740</price> <price grade="o">745</price> <price grade="p">750</price> <price grade="q">755</price></prices>

I want to prefix all <price with a CRLF or \n. They should end up like this:

<prices> 
<price grade="c">700</price> 
<price grade="d">705</price> 
<price grade="f">710</price> 
<price grade="h">715</price> 
<price grade="i">720</price> 
<price grade="j">725</price> 
<price grade="k">726</price> 
<price grade="l">730</price> 
<price grade="m">740</price> 
<price grade="o">745</price> 
<price grade="p">750</price> 
<price grade="q">755</price></prices>

ChatGPT says re.sub replaces all instances by default, and there is no re.REPLACEALL flag.

My search pattern has capturing parentheses. My code is this:

import re
xpptext = '<prices> <price grade="c">700</price> <price grade="d">705</price> <price grade="f">710</price> <price grade="h">715</price> <price grade="i">720</price> <price grade="j">725</price> <price grade="k">726</price> <price grade="l">730</price> <price grade="m">740</price> <price grade="o">745</price> <price grade="p">750</price> <price grade="q">755</price></prices>'
xpptext = re.sub(r'(<price )', '\n\1', xpptext, re.IGNORECASE)
print(xpptext)

In both my local Python and Attempt This Online I get this, which is not right:

<prices> 
grade="c">700</price> 
grade="d">705</price> <price grade="f">710</price> <price grade="h">715</price> <price grade="i">720</price> <price grade="j">725</price> <price grade="k">726</price> <price grade="l">730</price> <price grade="m">740</price> <price grade="o">745</price> <price grade="p">750</price> <price grade="q">755</price></prices>

Note that the first two <price are not prefixed with \n but replaced by it, even though my replacement includes the first capture group, \1 (backslash followed by the digit one). The remaining <price are not changed at all.

What am I doing wrong here? ATO link, I hope it will work for you.

Thank you.

JamesParrott · February 3, 2025, 12:02pm

Does a normal string replace encounter false positives?

.replace('</price> <price ','</price> \n<price ')
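As a rough sketch of the plain-string route: the search text '<price ' contains no regex metacharacters, so a literal str.replace on it should give the same result as the intended re.sub. The shortened sample string and the variable names below are illustrative assumptions, not taken from the original file, and unlike re.IGNORECASE this match is case-sensitive.

```python
# Shortened, illustrative sample of one line from the file.
text = '<prices> <price grade="c">700</price> <price grade="d">705</price></prices>'

# str.replace substitutes every occurrence by default, so no count or flag is needed.
# '<prices>' and '</price>' are left alone because neither contains '<price '.
result = text.replace('<price ', '\n<price ')

print(result)
```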

MegaIng · February 3, 2025, 12:04pm

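You need to make the second argument of re.sub a raw string like the first one; otherwise \1 is interpreted by the Python parser as an octal escape sequence, i.e. the character chr(1).
The fourth positional argument to re.sub is not flags but count, the maximum number of replacements. re.IGNORECASE has the integer value 2, which is why only the first two matches were replaced. Pass flags as a keyword argument instead: flags=re.IGNORECASE.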
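Both points can be checked with a minimal sketch. The shortened sample string and the names broken and fixed are illustrative assumptions; count=2 is written out explicitly here to mimic what passing re.IGNORECASE positionally did in the original call.

```python
import re

# Shortened, illustrative sample of one line from the file.
text = ('<prices> <price grade="c">700</price> <price grade="d">705</price> '
        '<price grade="f">710</price></prices>')

# In a normal string literal, '\1' is the octal escape for chr(1), not a backreference.
assert '\n\1' == '\n\x01'

# re.IGNORECASE has the integer value 2, so as the fourth positional argument
# it was taken as count=2.
assert int(re.IGNORECASE) == 2

# Reproduces the faulty behaviour: only two matches are replaced, each by newline + chr(1).
broken = re.sub(r'(<price )', '\n\1', text, count=2)

# Raw replacement string, and flags passed by keyword.
fixed = re.sub(r'(<price )', r'\n\1', text, flags=re.IGNORECASE)

print(repr(broken))
print(fixed)
```

In the raw replacement string, re.sub itself turns \n into a newline and \1 into the text of the first group, so nothing is lost by using r'...' there.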

abessman · February 3, 2025, 12:09pm

flags isn’t needed at all AFAICT.

>>> print(re.sub(r"(<price )", r"\n\1", xpptext))
<prices> 
<price grade="c">700</price> 
<price grade="d">705</price> 
<price grade="f">710</price> 
<price grade="h">715</price> 
<price grade="i">720</price> 
<price grade="j">725</price> 
<price grade="k">726</price> 
<price grade="l">730</price> 
<price grade="m">740</price> 
<price grade="o">745</price> 
<price grade="p">750</price> 
<price grade="q">755</price></prices>

c-rob · February 3, 2025, 12:19pm

Thank you all. I got it working with this:

    xpptext = re.sub(r"(<price )", r"\n\1", xpptext)

I didn’t realize the replacement string needed to be a raw string as well. I also removed the flag, as it wasn’t needed.