I’m trying to find duplicates in a string and replace them. I am largely successful, but I found a bug I cannot resolve.
string = 'XX111XX = 1, XX111XX = 3 , XX121XXX = 2 , XX123XXX = "HAPPYYYY" XX124XXX = "HAPPYYYY"'
print(re.sub(r'\b[^\.|^\d](\w+)\b(?=.*\b([\S+]\1)\b)', r'', string))
This matches and replaces both XX111XX and HAPPYYYY, leaving empty double quotes where HAPPYYYY once was.
I only want the XX111XX-style matches replaced; the values inside double quotes should be left alone.
How can this be achieved with my existing regex?
I’ve tried these variations:
'\b[^\.|^\d|^A-Z](\w+)\b(?=.*\b([\S+]\1)\b)'
'\b[^\.|^\d|^\"](\w+)\b(?=.*\b([\S+]\1)\b)'
How can I avoid matching anything between quotes, or write the pattern so that it only matches substrings that contain both numbers and alphabetical characters, and not substrings that contain only alphabetical characters?
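To make the second idea concrete, here is a rough sketch of what I mean by "only match substrings that mix letters and digits". It is not a tested solution: the function name remove_earlier_duplicates is made up for illustration, and it assumes that identifiers use only ASCII letters and digits (no underscores) and that only the duplicated token itself should be removed, not the "= value" part after it.

```python
import re

# Hypothetical helper, for illustration only.
# Assumption: identifiers consist of ASCII letters and digits only.
MIXED_DUPLICATE = re.compile(
    r'\b'
    r'(?=[A-Za-z]*\d)'      # the token must contain at least one digit
    r'(?=\d*[A-Za-z])'      # ...and at least one letter
    r'([A-Za-z\d]+)\b'      # capture the whole token
    r'(?=.*\b\1\b)'         # the same whole token must appear again later
)

def remove_earlier_duplicates(text):
    """Remove a mixed letter/digit token when the same token occurs later."""
    return MIXED_DUPLICATE.sub('', text)

string = 'XX111XX = 1, XX111XX = 3 , XX121XXX = 2 , XX123XXX = "HAPPYYYY" XX124XXX = "HAPPYYYY"'
print(remove_earlier_duplicates(string))
```

With the sample string, this should remove only the first XX111XX and leave both "HAPPYYYY" values untouched, because HAPPYYYY contains no digit and so fails the first lookahead.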