I’m parsing SVG images, which are XML documents.
They all declare a namespace (xmlns) on the root node, looking like <svg xmlns="http://www.w3.org/2000/svg" version="1.1"....
When I parse one with the xml.etree.ElementTree module, it expands that namespace and prepends it to every element tag, so the root node ends up with the tag {http://www.w3.org/2000/svg}svg and an attrib dict where version is present but xmlns is not.
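To make the behaviour concrete, here is a minimal sketch that reproduces it. The SVG string is a made-up example, not one of the real files:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal SVG document, used only for illustration.
SVG_SOURCE = (
    '<svg xmlns="http://www.w3.org/2000/svg" version="1.1">'
    '<rect width="10" height="10"/>'
    '</svg>'
)

root = ET.fromstring(SVG_SOURCE)

# The namespace URI is folded into every tag name, in {uri}local form.
print(root.tag)      # {http://www.w3.org/2000/svg}svg
print(root[0].tag)   # {http://www.w3.org/2000/svg}rect

# The xmlns declaration is consumed by the parser and is not kept as an attribute.
print(root.attrib)   # {'version': '1.1'}
```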
So, to keep a manageable tree, I used this code:
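import re
import xml.etree.ElementTree as ET

template_str = template_file.read()
# Strip the xmlns="..." declaration so tags are parsed without a namespace.
namespace_match = re.search(r'xmlns="([^"]+)"\s*', template_str)
namespace = None
if namespace_match is not None:
    namespace = namespace_match.group(1)
    namespace_to_replace = namespace_match.group(0)
    template_str = template_str.replace(namespace_to_replace, "")
template_ET = ET.fromstring(template_str)
# Put the declaration back as a plain attribute, if there was one.
if namespace is not None:
    template_ET.set("xmlns", namespace)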
The final set() call is there so that, when I export the tree back to an SVG file, it keeps its namespace declaration as before.
Is there a way to tell etree to handle xmlns as if it were an ordinary attribute, and skip that messy hack?
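For the export half of the problem, one possibility is ET.register_namespace with an empty prefix. This is only a sketch on a made-up SVG string: it does not make xmlns an ordinary attribute and the tags in memory keep their {uri} form, but it should make the serializer write xmlns="..." instead of an invented prefix:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Hypothetical minimal SVG document, used only for illustration.
SVG_SOURCE = '<svg xmlns="http://www.w3.org/2000/svg" version="1.1"><rect width="10"/></svg>'

root = ET.fromstring(SVG_SOURCE)

# Without registration, the serializer invents a prefix (ns0, ns1, ...) on output.
default_output = ET.tostring(root, encoding="unicode")
print(default_output)

# Registering the empty prefix makes this URI the default namespace on output.
# Note that the registration is global to the process.
ET.register_namespace("", SVG_NS)
clean_output = ET.tostring(root, encoding="unicode")
print(clean_output)
```

The string-editing step before parsing is then no longer needed for round-tripping, although lookups still have to deal with the namespaced tags.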
I’m not quite sure I know what you’re asking, but I do have a problem with the decision to expand namespaces into their URIs inside curly braces, e.g. {…}.
To fix that problem for myself, I wrote the short script below. The idea is to replace each “{…}” namespace with its “…:” prefix wherever it occurs. It has served my purpose. I hope it can help you.
import xml.etree.ElementTree as etree

def fixNs(stack, tag):
    """Turn '{uri}local' into 'prefix:local', or 'local' for the default namespace."""
    if tag[0] == '{':
        n, l = tag[1:].split('}')
        # Search the innermost declarations first.
        for x in reversed(stack):
            if x[1] == n:
                if len(x[0]):
                    return x[0] + ":" + l
                else:
                    return l
    # No namespace, or no matching declaration in scope: leave the name unchanged.
    return tag

def parse(f):
    root = None
    stack = []    # (prefix, uri) pairs currently in scope
    nsinel = []   # declarations seen since the last "start" event
    for ev, x in etree.iterparse(f, events=("start", "end", "start-ns", "end-ns")):
        if ev == "start":
            if root is None:
                root = x
            # Re-attach the pending declarations as plain xmlns attributes.
            for n, u in reversed(nsinel):
                if len(n):
                    x.attrib["xmlns:" + n] = u
                else:
                    x.attrib["xmlns"] = u
            nsinel = []
        elif ev == "end":
            x.tag = fixNs(stack, x.tag)
            # Copy the keys first, because the dict is modified inside the loop.
            for k in list(x.attrib):
                kf = fixNs(stack, k)
                if kf != k:
                    x.attrib[kf] = x.attrib[k]
                    del x.attrib[k]
        elif ev == "start-ns":
            stack.append(x)
            nsinel.append(x)
        elif ev == "end-ns":
            stack.pop()
    return etree.ElementTree(root)
XPath is also useful if you’d like to ignore namespaces. It doesn’t remove them from the document, but (on Python 3.8 and later) it lets you query for tags with a namespace wildcard, e.g.:
tree.findall("{*}sometag")
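As a small sketch of how that wildcard behaves (it needs Python 3.8 or later; the SVG string and the id values are invented for the example):

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal SVG document, used only for illustration.
SVG_SOURCE = (
    '<svg xmlns="http://www.w3.org/2000/svg" version="1.1">'
    '<g><rect id="a"/><rect id="b"/></g>'
    '</svg>'
)
root = ET.fromstring(SVG_SOURCE)

# "{*}rect" matches rect in any namespace; ".//" searches all descendants.
rects = root.findall(".//{*}rect")
print([r.get("id") for r in rects])   # ['a', 'b']

# A bare "rect" finds nothing, because the stored tag includes the namespace URI.
plain = root.findall(".//rect")
print(plain)                          # []
```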
If lxml is an option, as @jamestwebber suggested, it generally makes working with namespaces easier. When it hasn’t been available, I’ve often written little functions that can take a dict of namespaces to help build XPath expressions.
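A rough sketch of that kind of helper with the standard library alone: find() and findall() accept a dict mapping prefixes to namespace URIs. The find_all name, the "svg" prefix, and the sample document are assumptions made up for this example:

```python
import xml.etree.ElementTree as ET

# The prefix used in queries is arbitrary; only the URI has to match the document.
NAMESPACES = {"svg": "http://www.w3.org/2000/svg"}

# Hypothetical minimal SVG document, used only for illustration.
SVG_SOURCE = (
    '<svg xmlns="http://www.w3.org/2000/svg" version="1.1">'
    '<g><rect id="a"/><circle id="c"/></g>'
    '</svg>'
)
root = ET.fromstring(SVG_SOURCE)

def find_all(element, path, namespaces=NAMESPACES):
    """Hypothetical helper: run findall with a fixed prefix-to-URI map."""
    return element.findall(path, namespaces)

rect_ids = [e.get("id") for e in find_all(root, ".//svg:rect")]
circle_ids = [e.get("id") for e in find_all(root, "svg:g/svg:circle")]
print(rect_ids)     # ['a']
print(circle_ids)   # ['c']
```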