XML.com: XML From the Inside Out

EaseXML: A Python Data-Binding Tool
by Uche Ogbuji | Pages: 1, 2, 3

Listing 2 (labelsease.py). EaseXML class definitions for address labels
from EaseXML import *

class labels(XMLObject):
    labels = ListNode(u'label')

class label(XMLObject):
    id = StringAttribute()
    added = StringAttribute(u'added')
    _nodesOrder = [u'name', u'address', u'quote']
    name = TextNode()
    address = ItemNode(u'address')
    quote = ItemNode(u'quote', optional=True)

class address(XMLObject):
    _nodesOrder = [u'street', u'city', u'state']
    street = TextNode()
    city = TextNode()
    state = TextNode()

class quote(XMLObject):
    _name = u'quote'
    content = ChoiceNode(['#PCDATA', 'emph'], optional=True,
                         main=True, noLimit=True)
    emph = TextNode(optional=True)

  

The most important class, XMLObject, still bears the name of the original package. You subclass it to create your own specialized classes representing elements. The top-level element labels is defined using a class of the same name. It declares that its contents are a list of child elements (EaseXML.ListNode) named label. Each of these has an id and an added attribute.

Data binding tools have to deal with the situation where XML's naming conventions don't match those of the host language. In EaseXML, the names of XML identifiers are usually assumed from the names of the matching Python object references, but the definition of the added attribute shows how you can override that by specifying the actual XML identifier as the first argument. This argument is sometimes optional, as in EaseXML.StringAttribute, and sometimes mandatory, as in EaseXML.ListNode and EaseXML.ItemNode. You specify the order of child nodes using the _nodesOrder list, which holds XML identifier names. EaseXML.TextNode defines a simple node with text content only. Such nodes do not require a separate Python class.

The definition for the quote element illustrates a few things. It uses the _name property to override the XML element identifier, which is derived from the class name by default (in this case, the override happens to be the same as the default). quote is simple text in one of its occurrences in the XML example, and mixed content in another. You define mixed content by using an EaseXML.ChoiceNode, with #PCDATA as one of the entries. As in XML DTDs, this is a special identifier for text content. optional=True is specified for the mixed content construct as a whole, indicating that the element can be empty, and for the emph element, indicating that text alone can occur without any elements mixed in.
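To make the naming rule concrete, here is a minimal sketch of how a binding could default an XML identifier to the Python attribute name and accept an override as the first argument. It is written for Python 3 on top of the standard library's xml.etree.ElementTree, and it is not EaseXML's implementation. The classes Attr, Text, Bound, and Label, the added_on attribute, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET


class _Named:
    """Shared naming logic for the toy descriptors below (hypothetical)."""

    def __init__(self, xml_name=None):
        self.xml_name = xml_name  # explicit XML identifier, if one was given

    def __set_name__(self, owner, py_name):
        # No explicit identifier: fall back to the Python attribute name.
        if self.xml_name is None:
            self.xml_name = py_name


class Attr(_Named):
    """Toy XML attribute binding; not EaseXML's StringAttribute."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.element.get(self.xml_name)


class Text(_Named):
    """Toy text-only child element binding; not EaseXML's TextNode."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.element.findtext(self.xml_name)


class Bound:
    """Wraps a parsed element so the descriptors can read from it."""

    def __init__(self, element):
        self.element = element


class Label(Bound):
    id = Attr()               # XML name defaults to 'id'
    added_on = Attr('added')  # Python name differs from the XML name
    name = Text()             # XML name defaults to 'name'


# Made-up sample data, for illustration only.
SAMPLE = '<label id="x1" added="2005-01-01"><name>Example Name</name></label>'
label = Label(ET.fromstring(SAMPLE))
print(label.id, label.added_on, label.name)
```

The design point is that the class body doubles as the schema: each class attribute both names the Python accessor and, unless overridden, the XML identifier it reads.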

Putting the Binding to Work

After you define the binding classes, you can use them to parse XML. You can also use them to generate XML, but I don't cover that in this article. The following interactive session demonstrates reading XML with an EaseXML data binding.


$ python -i labelsease.py
>>> XML = open('labels.xml', 'r').read()
>>> doc = labels.fromXml(XML)

  

As you can see, I load Listing 2 upon starting the Python interpreter. doc is a data structure based on instances of those classes with the data from the XML document.


>>> #Print the ids of all the labels
>>> for label in doc.labels:
...     print label.id
...
tse
ep
lh
>>> #Print the first quote element's contents
>>> doc.labels[0].quote.emph
u'Midwinter Spring'
>>> doc.labels[0].quote.content
[u'is its own season\u2026']
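The session above exposes the emph child and the text run of the quote as separate properties. For comparison, here is a rough sketch of how the same mixed content looks through the standard library's xml.etree.ElementTree in Python 3, where text runs live in the text and tail attributes. The QUOTE string is a hypothetical stand-in for the first quote element (the real labels.xml is not reproduced here), and mixed_content is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the first quote element in labels.xml.
QUOTE = '<quote><emph>Midwinter Spring</emph> is its own season\u2026</quote>'
quote = ET.fromstring(QUOTE)


def mixed_content(element):
    """Return the element's text runs and child elements in document order."""
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        parts.append(child)
        if child.tail:
            # Text that follows a child element is stored on that child.
            parts.append(child.tail)
    return parts


parts = mixed_content(quote)
emph = quote.findtext('emph')
text_runs = [p for p in parts if isinstance(p, str)]
print(emph)
print(text_runs)
```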

  

I ran into all sorts of quirks when poking introspectively at the resulting data binding. For example, I found a phantom processing instruction among the child nodes of the quote element you see in the last snippet. The Unicode support seems to be patchy, and I was unable to reserialize the quote element containing the ellipsis character. (I checked the toxml method for encoding arguments but didn't find any.) The API itself is a bit strange and hard to get your head around. I noticed that the forEach method is the recommended way of walking EaseXML objects. Keep in mind that it requires specialized callbacks to work.
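For readers wondering what an encoding argument on serialization usually looks like, here is a small sketch using the standard library's xml.etree.ElementTree in Python 3. It is not EaseXML code and says nothing about how EaseXML serializes. The quote string is a hypothetical stand-in containing the ellipsis character (U+2026).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the quote element, including the ellipsis (U+2026).
quote = ET.fromstring(
    '<quote><emph>Midwinter Spring</emph> is its own season\u2026</quote>'
)

# With the default US-ASCII encoding, the ellipsis is written as a
# numeric character reference.
ascii_bytes = ET.tostring(quote)

# With an explicit encoding argument, the character itself is written
# as UTF-8 bytes.
utf8_bytes = ET.tostring(quote, encoding='utf-8')

# Both serializations parse back to the same text.
from_ascii = ET.fromstring(ascii_bytes)
from_utf8 = ET.fromstring(utf8_bytes)
print(from_ascii[0].tail == quote[0].tail)
print(from_utf8[0].tail == quote[0].tail)
```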

I decided to write about EaseXML before I realized just how young a project it is. It needs quite a bit of work. Besides the quirks I mentioned above, EaseXML lacks proper namespace support, and I think the binding schema API could do with some close analysis. Fortunately, the version control logs seem to show a reasonable rate of development. I think it's worth keeping an eye on EaseXML because it does bring some innovative touches to XML processing in Python, but I would suggest waiting for another couple of releases before using it in production.
