Python Regex and re.findall problems

In a string I want to find what we call “macros”, but only some of them: every macro that looks like <rx;ABC123>, <grf;ABC144>, or <grfa;BDB199>. Here’s the code I’m using and the results. I haven’t used regex in Python before and it’s not giving me what I want.

'''Test program to test finding multiple macros.'''

import re

lin = '<ps;2><rx;spec><px;;1>Table of contents<pa><spd;1><grf;1212><qa>'
rxarr = re.findall(r'<(grf|grfa|rx);.+?>', lin)
print(rxarr)
# I'm getting ['rx', 'grf'] which is incorrect. 
# I want to get in rxarr: ['<rx;spec>', '<grf;1212>']
rxarr = re.findall(r'(<(grf|grfa|rx);.+?>)', lin)
print(rxarr)
# For this pattern I get rxarr of: [('<rx;spec>', 'rx'), ('<grf;1212>', 'grf')] which I don't want.

I wasn’t sure what search terms to use. So how can I get the list I want from a single re.findall() call?

Thank you!

EDIT: I’m not searching HTML, but the strings still look like HTML tags.

The result depends on the number of capturing groups in the pattern. … If there is exactly one group, return a list of strings matching that group.
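A minimal sketch of that rule, applied to the string from the question. The non-capturing group (?:...) in the last pattern is an addition made here for illustration, not something the thread itself uses, and the variable names are hypothetical.

```python
import re

lin = '<ps;2><rx;spec><px;;1>Table of contents<pa><spd;1><grf;1212><qa>'

# One capturing group: findall returns only the text that group matched.
one_group = re.findall(r'<(grf|grfa|rx);.+?>', lin)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'(<(grf|grfa|rx);.+?>)', lin)

# No capturing groups ((?:...) groups without capturing):
# findall returns the whole match.
no_groups = re.findall(r'<(?:grf|grfa|rx);.+?>', lin)

print(one_group)   # ['rx', 'grf']
print(two_groups)  # [('<rx;spec>', 'rx'), ('<grf;1212>', 'grf')]
print(no_groups)   # ['<rx;spec>', '<grf;1212>']
```

If the macro name is not needed separately, the non-capturing form is probably the smallest change to the original pattern.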

Try re.finditer() instead and use each Match object's whole match (m.group(0)) to get exactly what you want.
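A short sketch of the finditer approach, assuming the same test string as in the question. The names pattern, macros, and names are hypothetical.

```python
import re

lin = '<ps;2><rx;spec><px;;1>Table of contents<pa><spd;1><grf;1212><qa>'

# The capturing group can stay; finditer yields Match objects,
# so the whole match and the group are both available.
pattern = re.compile(r'<(grf|grfa|rx);.+?>')

macros = [m.group(0) for m in pattern.finditer(lin)]  # whole macro text
names = [m.group(1) for m in pattern.finditer(lin)]   # macro name only

print(macros)  # ['<rx;spec>', '<grf;1212>']
print(names)   # ['rx', 'grf']
```

This keeps the original pattern unchanged, at the cost of a list comprehension instead of a single findall call.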

Also what’s the intent behind .+? ? If the stuff after the semicolon really is optional, I’d prefer .* instead.

The pattern r'<(grf|grfa|rx);.+?>' is what limits me to finding only the selected macros, each as a whole macro, rather than all macros. .+?> stops at the first > sign, because the question mark is a non-greedy modifier to .+.

Let’s say I want to find all <p> and <a> opening tags in this HTML string:
<p style="color:red;"><b>My bold text</b> <i>My italic text</i> <a href="https://google.com">Google</a>
We would want to return a list that has only:
['<p style="color:red;">', '<a href="https://google.com">']

Try this code to see what I mean:

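import re
lin = '<ps;2><rx;spec><px;;1>Table of contents<pa><spd;1><grf;1212><qa>'
# Greedy .* runs to the end of the string, so this returns one long match.
rxarr = re.findall(r'(<(grf|grfa|rx);.*)', lin)
print(rxarr)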
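For the <p>/<a> HTML analogy, a minimal sketch of the same non-greedy idea. The href value is written here with a closing quote, which is an assumption about the intended string. The \b word boundary and the non-capturing group are additions for illustration, and html and tags are hypothetical names.

```python
import re

html = ('<p style="color:red;"><b>My bold text</b> '
        '<i>My italic text</i> <a href="https://google.com">Google</a>')

# (?:p|a) selects the tag names without capturing them.
# \b keeps <p from also matching tags such as <pre>.
# .*?> is non-greedy, so each match stops at the first > sign.
tags = re.findall(r'<(?:p|a)\b.*?>', html)

print(tags)  # ['<p style="color:red;">', '<a href="https://google.com">']
```

This is only meant to illustrate the regex behaviour; for real HTML a proper parser is usually the safer choice.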

You can do some testing on https://regex101.com. I'm new to Python regex, so I did not remember re.finditer(). It was probably in a tutorial a year ago, but I have never used it before.

Thanks!

Ah, of course, I’d forgotten about non-greedy modifiers - thanks