EaseXML: A Python Data-Binding Tool
by Uche Ogbuji
from EaseXML import *

class labels(XMLObject):
    labels = ListNode(u'label')

class label(XMLObject):
    id = StringAttribute()
    added = StringAttribute(u'added')
    _nodesOrder = [u'name', u'address', u'quote']
    name = TextNode()
    address = ItemNode(u'address')
    quote = ItemNode(u'quote', optional=True)

class address(XMLObject):
    _nodesOrder = [u'street', u'city', u'state']
    street = TextNode()
    city = TextNode()
    state = TextNode()

class quote(XMLObject):
    _name = u'quote'
    content = ChoiceNode(['#PCDATA', 'emph'], optional=True,
                         main=True, noLimit=True)
    emph = TextNode(optional=True)
The most important class, XMLObject, still bears the name of the original package. You subclass it to create your own specialized classes representing elements. The top-level element labels is defined using a class of the same name, which declares that its content is a list of child elements (EaseXML.ListNode) named label. Each label has an id and an added attribute. Data binding tools have to deal with situations where XML's naming conventions don't match those of the host language. In EaseXML, XML identifiers are usually taken from the names of the matching Python attributes, but the definition of the added attribute shows how you can override that by passing the actual XML identifier as the first argument. This argument is sometimes optional, as in EaseXML.StringAttribute, but sometimes it's mandatory, as in EaseXML.ListNode and EaseXML.ItemNode.
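The naming rule just described (XML name defaults to the Python name, with an optional explicit override as the first argument) can be illustrated without EaseXML. The following is a minimal sketch in Python 3 using only the standard library; Attribute, Label, date_added and from_element are hypothetical names invented for illustration, and nothing here claims to reflect how EaseXML implements StringAttribute internally. The descriptor learns its Python name through __set_name__ and falls back to it when no XML name was given.

```python
import xml.etree.ElementTree as ET


class Attribute:
    """Hypothetical descriptor for an XML attribute (not EaseXML's API).
    The XML name defaults to the Python attribute name unless one is
    passed explicitly as the first argument."""

    def __init__(self, xml_name=None):
        self.xml_name = xml_name

    def __set_name__(self, owner, name):
        # Runs when the owning class is created (Python 3.6+).
        self.python_name = name
        if self.xml_name is None:
            self.xml_name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.python_name)

    def __set__(self, instance, value):
        instance.__dict__[self.python_name] = value


class Label:
    id = Attribute()                 # XML name "id", taken from the Python name
    date_added = Attribute("added")  # XML name overridden to "added"

    @classmethod
    def from_element(cls, elem):
        """Build a Label from an ElementTree element (hypothetical helper)."""
        obj = cls()
        for attr in vars(cls).values():
            if isinstance(attr, Attribute):
                setattr(obj, attr.python_name, elem.get(attr.xml_name))
        return obj


# Made-up one-element sample, not taken from the article's labels.xml.
label = Label.from_element(ET.fromstring('<label id="tse" added="2003-01-01"/>'))
print(label.id, label.date_added)
```

Keeping the mapping on the descriptor means the Python name and the XML name are declared together, in one line, which is the same convenience the added attribute in the listing provides.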
You specify the order of child nodes using the _nodesOrder list of XML identifier names. EaseXML.TextNode defines a simple node with text content only; such nodes do not require a separate Python class.
The definition for the quote element illustrates a few things. It uses the _name attribute to override the XML element identifier, which is derived from the class name by default (in this case, the override happens to be the same as the default). quote is simple text in one of its occurrences in the XML example, and mixed content in another. You define mixed content by using an EaseXML.ChoiceNode with #PCDATA as one of the entries; as in XML DTDs, this is a special identifier for text content. optional=True is specified both for the mixed content construct as a whole, indicating that the element can be empty, and for the emph element, indicating that text alone can occur without any elements mixed in.
Putting the Binding to Work
After you define the binding classes, you can use them to parse XML. You can also use them to generate XML, but I don't cover that in this article. The following interactive session demonstrates reading XML with an EaseXML data binding.
$ python -i labelsease.py
>>> XML = open('labels.xml', 'r').read()
>>> doc = labels.fromXml(XML)
As you can see, I load Listing 2 when starting the Python interpreter. doc is a data structure made up of instances of those classes, populated with the data from the XML document.
>>> #Print the ids of all the labels
>>> for label in doc.labels:
...     print label.id
...
tse
ep
lh
>>> #Print the first quote element's contents
>>> doc.labels[0].quote.emph
u'Midwinter Spring'
>>> doc.labels[0].quote.content
[u'is its own season\u2026']
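For comparison, here are roughly the same queries done in Python 3 with the standard library's xml.etree.ElementTree and no data binding at all. labels.xml is not reproduced in this section, so SAMPLE is a hypothetical, heavily trimmed stand-in: only the three ids and the first quote's text mirror the session output, and everything else the real file contains is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for labels.xml (not the article's listing).
SAMPLE = (
    "<labels>"
    '<label id="tse"><quote><emph>Midwinter Spring</emph>'
    "is its own season\u2026</quote></label>"
    '<label id="ep"/>'
    '<label id="lh"/>'
    "</labels>"
)

root = ET.fromstring(SAMPLE)

# Print the ids of all the labels
for label in root.findall("label"):
    print(label.get("id"))

# The first quote's emph text, and the text that follows the emph element
emph = root.find("label/quote/emph")
print(emph.text)
print(emph.tail)
```

The contrast is the point of a data binding: with EaseXML the structure is reached through ordinary attribute access (doc.labels[0].quote.emph), whereas here element and attribute names appear as strings in path expressions and get() calls.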
I ran into all sorts of quirks when poking introspectively at the resulting
data binding. For example, I found a phantom processing instruction among
the child nodes of the quote element you see in the last snippet.
The Unicode support seems to be patchy, and I was unable to reserialize the
quote element containing the ellipsis character … (I checked
the toxml method for encoding arguments but didn't find any.)
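By way of contrast with toxml, the standard library's xml.etree.ElementTree.tostring does accept an encoding argument. This Python 3 sketch (not EaseXML) serializes a made-up quote element containing the ellipsis three ways: as UTF-8 bytes, as ASCII bytes in which the ellipsis becomes a character reference, and as a Python string.

```python
import xml.etree.ElementTree as ET

quote = ET.fromstring("<quote>is its own season\u2026</quote>")

# UTF-8 bytes: U+2026 is written as the three bytes E2 80 A6.
as_utf8 = ET.tostring(quote, encoding="utf-8")
# ASCII bytes: characters outside ASCII become numeric character references.
as_ascii = ET.tostring(quote, encoding="us-ascii")
# encoding="unicode" returns a str instead of bytes.
as_text = ET.tostring(quote, encoding="unicode")

print(as_utf8)
print(as_ascii)
print(as_text)
```

All three forms parse back to the same text, so the choice of encoding only affects the bytes on the wire, not the data.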
The API itself is a bit strange and hard to get your head around. I noticed that the forEach method is the recommended way to walk EaseXML objects; keep in mind that it requires specialized callbacks to work.
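EaseXML's forEach signature is not shown here, so the following is only a generic illustration of what callback-driven traversal means, written in Python 3 over xml.etree.ElementTree. The function for_each and the callback record are hypothetical, the sample document is made up, and none of it should be read as EaseXML's actual API.

```python
import xml.etree.ElementTree as ET


def for_each(elem, callback, depth=0):
    """Visit elem and all its descendants depth-first, in document order,
    calling callback(element, depth) for each one. A generic visitor
    sketch; it does not reproduce EaseXML's forEach."""
    callback(elem, depth)
    for child in elem:
        for_each(child, callback, depth + 1)


# Made-up sample document.
root = ET.fromstring(
    '<labels><label id="tse"><quote><emph>x</emph></quote></label>'
    '<label id="ep"/></labels>')

visited = []


def record(elem, depth):
    # The callback decides what to do with each node; here it only
    # records an indented outline of the element tags.
    visited.append("  " * depth + elem.tag)


for_each(root, record)
print("\n".join(visited))
```

The traversal order is fixed by the walker and the behavior lives entirely in the callback, which is why such an API needs callbacks written specifically for it.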
I decided to write about EaseXML before I realized how young a project it is. It needs quite a bit of work. Besides the quirks I mentioned above, EaseXML lacks proper namespace support, and I think the binding schema API could do with some close analysis. Fortunately, the version control logs seem to show a reasonable rate of development. I think it's worth keeping an eye on EaseXML because it does bring some innovative touches to XML processing in Python, but I would suggest waiting for another couple of releases before using it in production.