Are There XML Processing Tools Without Safety Warnings

Hello Pythonistas,

I am getting into the subject of object serialization and parsing tools. In particular, XML parsing tools. When I went over to the Python website for the XML Processing Module, located here:

XML Processing Modules — Python 3.12.3 documentation

There is an immediate warning:

The XML modules are not secure against erroneous or maliciously constructed data.

Are there any out there that are secure? Is there such a thing?

Can someone please provide some insight since I am new to this topic.

Thank you to anyone that can provide some enlightenment on the subject.

Did you consider reading… The immediately following sentence in the documentation? :wink:

I read the following:

XML vulnerabilities

The XML processing modules are not secure against maliciously constructed data. An attacker can abuse XML features to carry out denial of service attacks, access local files, generate network connections to other machines, or circumvent firewalls.

Like I stated, this is a new topic for me, so just wanted someone’s point of view on the matter but not necessarily a tutorial. Btw, this is the first time that I have seen such a disclaimer for any module so it definitely peaked my interest.

The XML modules are not secure against erroneous or maliciously constructed data. If you need to parse untrusted or unauthenticated data see the XML vulnerabilities and The defusedxml Package sections.

And then:

The defusedxml Package

defusedxml is a pure Python package with modified subclasses of all stdlib XML parsers that prevent any potentially malicious operation. Use of this package is recommended for any server code that parses untrusted XML data. The package also ships with example exploits and extended documentation on more XML exploits such as XPath injection.

This is a secure package as far as the devs are aware, meaning it can parse untrusted xml data reasonably well. Whether this is good enough depends heavily on your usecase and threat model.

2 Likes

Thank you for your response. Are there others that you are aware of, that are not listed here, that are preferred XML processing modules? Since I am new to the topic, beginning with a tabla rasa so to speak, if I can start practicing with a trusted one, that would be appreciated. Otherwise, I will start with this one.

Honestly, I would recommend against using XML for object serialization in the first place, unless someone else is forcing your hand. It’s simply not designed for the purpose: it doesn’t natively represent any concept of data type, and it’s fundamentally a markup language designed to apply metadata to already existing text - not to represent a data structure.

2 Likes

Thank you @kknechtel.

To be honest, I am wrapping up Learning Python, 5th Ed by Mark Luntz, Advanced Topics section, (Ch. 37 - four to go). This is one of the last topics being covered - reason why I am covering XML. What alternative do you recommend?

Again, new to this topic. Any insight/wisdom would be appreciated.

For general object serialization - allowing for hierarchical structures - json and yaml are often used. For tabular data there are better formats (.csv, .tsv, .parquet or other binary formats) that store the data more efficiently.

2 Likes