Managing XML with Python

When I was sorting out how to deal with the XML responses returned by SDN controllers, whether Cisco APIC or NSX Manager, I did some research into how to handle them easily with Python libraries. I found that lxml was the quickest way to work with XML. To install lxml, run the following command from your terminal, assuming Python is already installed:

$ pip install lxml

The lxml tutorial is quite long and covers many of the use cases you may face. I found another, more concise how-to guide, but it is written in French, so I decided to translate it here so that you can get started with lxml quickly and focus on the SDN APIs rather than on parsing syntax. Remember, though, that the key rule with XML is to stay away from regular expressions when processing it: that approach quickly becomes messy and fragile.

The following is a translation from http://apprendre-python.com/page-xml-python-xpath

What is XML

XML, which stands for Extensible Markup Language, is a markup language that allows you to exchange data between two heterogeneous environments. An XML document is a tree composed of nodes that can be elements or attributes.
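
To make the tree idea concrete, here is a minimal sketch that walks a small document and lists each element with its attributes and text. The inline document and the describe helper are illustrative assumptions, not part of the original guide.

```python
from lxml import etree

# Hypothetical two-level document, used only for illustration.
doc = b'<users><user data-id="101"><nom>Zorro</nom></user></users>'
root = etree.fromstring(doc)


def describe(element, depth=0):
    """Return one line per element: indentation, tag, attributes, text."""
    text = (element.text or "").strip()
    lines = ["{}{} {} {!r}".format("  " * depth, element.tag, dict(element.attrib), text)]
    # Iterating over an element yields its direct children.
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


outline = describe(root)
print("\n".join(outline))
```

Elements form the branches of the tree, while attributes hang off an element as a dictionary-like mapping.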

An XML Document

Here is an example of an XML document, the filename of which is data.xml.

<?xml version="1.0" encoding="UTF-8"?>
<users>
    <user data-id="101">
        <nom>Zorro</nom>
        <metier>Danseur</metier>
    </user>
    <user data-id="102">
        <nom>Hulk</nom>
        <metier>Footballeur</metier>
    </user>
    <user data-id="103">
        <nom>Zidane</nom>
        <metier>Star</metier>
    </user>
    <user data-id="104">
        <nom>Beans</nom>
        <metier>Epicier</metier>
    </user>
    <user data-id="105">
        <nom>Batman</nom>
        <metier>Veterinaire</metier>
    </user>
    <user data-id="106">
        <nom>Spiderman</nom>
        <metier>Veterinaire</metier>
    </user>
</users>

The first line is the XML declaration, which states the version and the encoding of the document (UTF-8 here). Notice that the “users” tag contains several “user” tags, each of which has its own child tags. The data is organised in a hierarchical tree structure, where each node carries some information.
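
The declared encoding is not necessarily UTF-8, and a parser reads it from the declaration when given raw bytes. The following sketch, which assumes a made-up one-element document encoded as ISO-8859-1, shows lxml decoding the text according to that declaration.

```python
from io import BytesIO

from lxml import etree

# Hypothetical document, encoded as ISO-8859-1 to match its own declaration.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><nom>Hélène</nom>'.encode("iso-8859-1")

tree = etree.parse(BytesIO(latin1_doc))
declared = tree.docinfo.encoding   # encoding name taken from the declaration
name = tree.getroot().text         # already decoded to a Python string

print(declared)
print(name)
```

Passing bytes rather than an already decoded string lets the parser apply the declaration itself.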

Reading XML

The following script displays all the user names.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from lxml import etree

tree = etree.parse("data.xml")
for user in tree.xpath("/users/user/nom"):
    print(user.text)

The result should be the following list.

Zorro
Hulk
Zidane
Beans
Batman
Spiderman
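
When the XML comes from an API response rather than a file on disk, the same query can be run on a document parsed from bytes. This is a rough sketch: response_body is a hypothetical payload, and a real controller would return its own schema.

```python
from lxml import etree

# Hypothetical response body; stands in for the bytes returned by an HTTP call.
response_body = b"""<?xml version="1.0" encoding="UTF-8"?>
<users>
    <user data-id="101"><nom>Zorro</nom></user>
    <user data-id="102"><nom>Hulk</nom></user>
</users>"""

# fromstring() parses bytes and returns the root element directly.
root = etree.fromstring(response_body)

# Ending the path with text() returns the strings instead of the elements.
names = root.xpath("/users/user/nom/text()")
print(names)
```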

You can also display tag attributes:

tree = etree.parse("data.xml")
for user in tree.xpath("/users/user"):
    print(user.get("data-id"))

Result:

101
102
103
104
105
106
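
Attributes can also be selected directly in the XPath expression with the @ prefix. The sketch below uses a shortened inline copy of the document so that it runs on its own; the "role" attribute is a hypothetical name used to show the default value of get().

```python
from lxml import etree

root = etree.fromstring(
    b'<users>'
    b'<user data-id="101"><nom>Zorro</nom></user>'
    b'<user data-id="102"><nom>Hulk</nom></user>'
    b'</users>'
)

# @data-id selects the attribute values themselves.
ids = root.xpath("/users/user/@data-id")

# Map each identifier to the matching name.
names_by_id = {user.get("data-id"): user.findtext("nom") for user in root.xpath("/users/user")}

# get() accepts a default for attributes that are absent ("role" is hypothetical).
role = root[0].get("role", "unknown")

print(ids, names_by_id, role)
```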

You can also narrow the list down to users whose job is veterinarian (Veterinaire):

tree = etree.parse("data.xml")
for user in tree.xpath("/users/user[metier='Veterinaire']/nom"):
    print(user.text)

The result should be:

Batman
Spiderman
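
If the value to filter on comes from a variable, lxml lets you pass it as an XPath variable instead of building the expression by string concatenation. A minimal sketch, assuming a shortened inline document and a hypothetical helper named names_for_job:

```python
from lxml import etree

root = etree.fromstring(
    b'<users>'
    b'<user data-id="104"><nom>Beans</nom><metier>Epicier</metier></user>'
    b'<user data-id="105"><nom>Batman</nom><metier>Veterinaire</metier></user>'
    b'<user data-id="106"><nom>Spiderman</nom><metier>Veterinaire</metier></user>'
    b'</users>'
)


def names_for_job(root, job):
    """Return the names of the users whose metier equals job."""
    # $job is bound from the keyword argument, so quotes in the value are harmless.
    return root.xpath("/users/user[metier=$job]/nom/text()", job=job)


vets = names_for_job(root, "Veterinaire")
nobody = names_for_job(root, "O'Neil")
print(vets, nobody)
```

Binding the value this way avoids having to escape quotes inside the expression.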

Building XML

You can build XML in the following way:


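from lxml import etree

# Build the root element, then attach one user with an attribute and two children.
users = etree.Element("users")
user = etree.SubElement(users, "user")
user.set("data-id", "101")
nom = etree.SubElement(user, "nom")
nom.text = "Zorro"
metier = etree.SubElement(user, "metier")
metier.text = "Danseur"
# encoding="unicode" returns a str rather than bytes, so print() shows clean XML.
print(etree.tostring(users, pretty_print=True, encoding="unicode"))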

The result is:

<users>
    <user data-id="101">
        <nom>Zorro</nom>
        <metier>Danseur</metier>
    </user>
</users>
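
Depending on where the XML is going, you may want bytes with an XML declaration (for instance as the body of an API request) or a plain string for logging. The following sketch shows both forms of tostring(); the variable names payload and compact are only illustrative.

```python
from lxml import etree

users = etree.Element("users")
user = etree.SubElement(users, "user")
user.set("data-id", "101")
etree.SubElement(user, "nom").text = "Zorro"

# Bytes with an XML declaration, e.g. for the body of an HTTP request.
payload = etree.tostring(users, xml_declaration=True, encoding="UTF-8")

# A str without declaration, convenient for logging or printing.
compact = etree.tostring(users, encoding="unicode")

print(payload)
print(compact)
```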

The following code builds the initial XML document:


from lxml import etree

users = etree.Element("users")

# (data-id, nom, metier) for each user in the original document.
users_data = [
    ("101", "Zorro", "Danseur"),
    ("102", "Hulk", "Footballeur"),
    ("103", "Zidane", "Star"),
    ("104", "Beans", "Epicier"),
    ("105", "Batman", "Veterinaire"),
    ("106", "Spiderman", "Veterinaire"),
]

for data_id, name, job in users_data:
    user = etree.SubElement(users, "user")
    user.set("data-id", data_id)
    nom = etree.SubElement(user, "nom")
    nom.text = name
    metier = etree.SubElement(user, "metier")
    metier.text = job

# encoding="unicode" returns a str rather than bytes, so print() shows clean XML.
print(etree.tostring(users, pretty_print=True, encoding="unicode"))
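
To save a tree to disk instead of printing it, you can wrap the root element in an ElementTree and call write(). This is a small sketch that writes to a temporary directory (an assumption made so the example does not overwrite an existing data.xml) and reads the file back to check the result.

```python
import os
import tempfile

from lxml import etree

users = etree.Element("users")
# Attributes can be passed as a dictionary when the element is created.
user = etree.SubElement(users, "user", {"data-id": "101"})
etree.SubElement(user, "nom").text = "Zorro"

# Temporary location, used here only to keep the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.xml")

# write() serialises the whole document, including the XML declaration.
etree.ElementTree(users).write(path, pretty_print=True, xml_declaration=True, encoding="UTF-8")

# Parse the file again and query it to confirm the round trip.
reloaded = etree.parse(path)
names = reloaded.xpath("/users/user/nom/text()")
print(names)
```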

Node methods

If you want to know more about the lxml library, many more node methods are available. You can list them by calling help() on any node, for example help(user).
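
As a quick tour, the sketch below lists the public method names of an element with dir() and tries a few that are commonly useful. The inline document is a shortened copy used so that the example runs on its own.

```python
from lxml import etree

root = etree.fromstring(
    b'<users><user data-id="101"><nom>Zorro</nom><metier>Danseur</metier></user></users>'
)
user = root[0]

# Names of the public methods and properties available on an element.
public_methods = [name for name in dir(user) if not name.startswith("_")]
print(public_methods)

parent_tag = user.getparent().tag            # navigate up the tree
nom = user.findtext("nom")                   # text of the first matching child
child_tags = [child.tag for child in user]   # direct children, in document order
all_tags = [el.tag for el in root.iter()]    # every element, depth first

print(parent_tag, nom, child_tags, all_tags)
```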
