Hi,
I’ve got hundreds of .html files from a forum, and I want to rename and save them in the following way. Each file contains, among many other lines, the following line:
It sounds like what you’re trying to do is parse HTML files to look for key pieces of information? If that’s the case, I recommend BeautifulSoup - it’s the easiest way to navigate a puddle of tags and find something useful in them.
I missed the HTML part of the question, as Rosuav pointed out.
You can even use regex to achieve what you want without parsing the HTML file.
Here is an example:
import re
html = '''
<td align="left" valign="middle" class="nav" width="100%">
<span class="nav">
<a href="index.php?sid=a6ddafec8f3ed8a0a7cf4f8bf8273cff" class="nav">ONE</a>
<a href="./index.php" class="nav">TWO</a> »
<a href="./viewforum.php" class="nav">THREE</a>
<a href="./viewtopic.php" class="nav">FOUR</a>
</span>
</td>
'''
# Use regex to find the specified <a> tag and capture its text up to the next "<".
# The dots in the href are escaped so they match literal "." characters only.
match = re.search(r'<a\s+href="\./viewtopic\.php"\s+class="nav">([^<]*)</', html)
# Print the extracted text
if match:
    print(match.group(1))
else:
    print("Pattern not found.")
But I agree, regexps are not for HTML (though I myself have gone down that path). They can be a quick’n’dirty way to scan a known page for expected content. Still, BS4 (beautifulsoup4 on PyPI) is easy to use and a FAR FAR better tool.
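If you would rather not install anything, here is a rough sketch of the parser-based route using the standard library's html.parser instead of BeautifulSoup. It pulls out the text of the link whose href is "./viewtopic.php", as in the sample above. The names NavLinkText, extract_link_text and rename_by_link_text are made up for this example, and the naming scheme (link text plus ".html") is only a guess at what the original post wanted, so adjust it to your files.

```python
from html.parser import HTMLParser
from pathlib import Path


class NavLinkText(HTMLParser):
    """Collect the text of the first <a> whose href equals target_href."""

    def __init__(self, target_href):
        super().__init__()
        self.target_href = target_href
        self.text = None       # stays None if no matching link is found
        self._inside = False   # True while inside the matching <a>

    def handle_starttag(self, tag, attrs):
        # Only the first matching link is recorded.
        if tag == "a" and self.text is None and dict(attrs).get("href") == self.target_href:
            self._inside = True
            self.text = ""

    def handle_data(self, data):
        if self._inside:
            self.text += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False


def extract_link_text(html, target_href="./viewtopic.php"):
    """Return the stripped link text, or None if the link is missing."""
    parser = NavLinkText(target_href)
    parser.feed(html)
    parser.close()
    return parser.text.strip() if parser.text is not None else None


def rename_by_link_text(folder):
    """Hypothetical helper: rename each .html file to '<link text>.html'."""
    # sorted() builds the full list first, so renaming does not disturb the loop.
    for path in sorted(Path(folder).glob("*.html")):
        title = extract_link_text(path.read_text(encoding="utf-8", errors="replace"))
        if not title:
            continue  # leave files without the expected link untouched
        # Assumption: replace anything that is not alphanumeric, space, "-" or "_".
        safe = "".join(c if c.isalnum() or c in " -_" else "_" for c in title)
        target = path.with_name(safe + ".html")
        if not target.exists():  # never overwrite an existing file
            path.rename(target)
```

Calling extract_link_text(html) on the sample string from the regex example gives the same link text that the regex finds. Try rename_by_link_text on a copy of your folder first, since it renames files in place.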