XML.com: XML From the Inside Out

XML Data Bindings in Python

June 11, 2003

In a recent interview, "What's Wrong with XML APIs", Elliotte Rusty Harold offers a familiar classification of XML APIs:

  1. Push APIs (e.g. SAX)
  2. Pull APIs (e.g. Python's pulldom)
  3. Tree-based APIs (e.g. DOM)
  4. Data binding APIs (e.g. PyXML marshalling tools)
  5. Query APIs (e.g. using 4XPath directly from Python)

In the XML community of late there has been a lot of talk that there are no really easy and efficient ways to do general XML programming. Push processing has the usual rap of being too difficult. It is easy to dismiss this as a problem for amateur programmers who have not properly learned how to code state machines; but let's face it, state machines are hard to code by hand, and the community has been slow to develop more declarative and friendly tools for generating SAX processing stubs, along the lines of what LEX and YACC do for generating parser state machines. As frequent Python-XML contributor Tom Passin puts it in a recent XML-DEV posting, with push processing, the more context one has to keep track of between callbacks, the harder the code is to write and maintain.
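
To make the point about context concrete, here is a minimal sketch of push processing, assuming a modern Python 3 and the standard library's xml.sax rather than the PyXML tools of this article's era. The handler class CityByName and the trimmed sample document are made up for illustration. Note how the element stack, the text buffer, and the "current name" must all be carried by hand between callbacks.

```python
import xml.sax

SAMPLE = b"""<labels>
  <label>
    <name>Thomas Eliot</name>
    <address><city>Stamford</city><state>CT</state></address>
  </label>
  <label>
    <name>Ezra Pound</name>
    <address><city>Hailey</city><state>ID</state></address>
  </label>
</labels>"""


class CityByName(xml.sax.ContentHandler):
    """Collect a {name: city} mapping; all context is tracked by hand."""

    def __init__(self):
        super().__init__()
        self.path = []            # stack of currently open element names
        self.text = []            # character data seen since the last tag
        self.current_name = None  # state carried from <name> to <city>
        self.cities = {}

    def startElement(self, tag, attrs):
        self.path.append(tag)
        self.text = []

    def characters(self, content):
        # The parser may deliver one text run in several chunks.
        self.text.append(content)

    def endElement(self, tag):
        value = "".join(self.text).strip()
        if self.path[-2:] == ["label", "name"]:
            self.current_name = value
        elif self.path[-2:] == ["address", "city"]:
            self.cities[self.current_name] = value
        self.path.pop()
        self.text = []


handler = CityByName()
xml.sax.parseString(SAMPLE, handler)
print(handler.cities)
```

Even for this small task the handler needs three pieces of bookkeeping, and each further relationship between elements tends to add another.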

Pull processing has strong adherents, but there are also many, including me, who don't see that it really buys all that much simplicity. Tree APIs are easier to code, but less efficient as documents become larger because they generally require the entire document to be in memory. Query APIs take a step toward bridging XML and programming languages, which is a step toward making life easier for developers. Data bindings are a further step toward this goal and the focus of this article and others to come.

The State of Python Data Bindings

A data binding is any system for viewing XML documents as databases or programming language data structures, and vice versa. It has several aspects, including:

  1. marshalling -- serializing program data constructs to XML
  2. unmarshalling -- creating program data constructs from XML
  3. schema-directed binding -- using XML schema languages (DTD, WXS, RELAX NG, etc.) to provide hints and intended data constructs to marshalling and unmarshalling systems
  4. query-directed binding -- using XML-specific query languages such as XPath to provide hints to marshalling and unmarshalling systems
  5. process bindings -- mapping program or DBMS actions designed to process particular data structure patterns covered by marshalling and unmarshalling
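
The first two aspects can be illustrated with a small hand-written sketch. It assumes a modern Python 3 with the standard library's xml.etree.ElementTree and dataclasses; the Address class and the marshal and unmarshal functions are hypothetical names chosen for this example and do not come from any of the tools discussed here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

FIELDS = ("street", "city", "state")


@dataclass
class Address:
    street: str
    city: str
    state: str


def marshal(addr):
    """Aspect 1: serialize a program data construct to XML."""
    root = ET.Element("address")
    for field in FIELDS:
        ET.SubElement(root, field).text = getattr(addr, field)
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text):
    """Aspect 2: create a program data construct from XML."""
    root = ET.fromstring(xml_text)
    return Address(*(root.findtext(field) for field in FIELDS))


original = Address("3 Prufrock Lane", "Stamford", "CT")
xml_text = marshal(original)
print(xml_text)
assert unmarshal(xml_text) == original  # the round trip preserves the data
```

A real binding tool generalizes this idea so that the mapping does not have to be written out by hand for every class.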

All of these aspects are available to some extent in Python, but unfortunately, the coverage is spotty. In the following list, the numbers refer to which aspects of data binding from the preceding list are offered by each tool.

Generic and WDDX marshalling in PyXML (1)(2)
I covered these marshalling/unmarshalling tools in the earlier article Introducing PyXML.
generateDS.py (1)(2)(3)
A tool for generating Python data structures from XML Schema.
xml_pickle and xml_objectify.py from the Gnosis XML Utilities (1)(2)
Tools for generic and specialized marshalling and unmarshalling.
XBind (1)(2)
An XML vocabulary for specifying language-independent data bindings; includes a prototype Python implementation.
Skyron (1)(2)(5)
Uses recipes encoded in XML to bind XML data to handler code in Python. Typical usage is to create a specialized Python data structure from particular XML data patterns.

generateDS.py

In this and future articles I'll survey all these packages, starting with generateDS.py, which I downloaded (generateDS-1.2a.tar.gz), unpacked, and installed using python setup.py install. The sample file for exercising the bindings is in listing 1.

Listing 1: Example file for Python data binding comparison
<?xml version="1.0" encoding="iso-8859-1"?>
<labels>
  <label>
    <quote>
      <!-- Mixed content -->
      <emph>Midwinter Spring</emph> is its own season&#133;
    </quote>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
  <label>
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street>
      <city>Hailey</city>
      <state>ID</state>
    </address>
  </label>
</labels>  

This example demonstrates a few things: an XML character reference outside the ASCII range (to test proper character support), a bit of the data flavor of XML with repeated, structured records, and a bit of the document flavor with mixed content in the quote element. The document flavor is reinforced a bit if one treats the order of labels as important; likewise, the data flavor is reinforced if the order is considered unimportant. See this excellent discussion by Python-XML stalwart Paul Prescod for a nice contrast between data and document nuances of XML usage. Namespaces are another area of consideration, but to save space I do not cover them in this discussion of data bindings. generateDS.py operates on a W3C XML Schema (WXS) definition of the XML format. See listing 2 for the WXS description of the format used in listing 1.

Listing 2: WXS schema for XML format in listing 1
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
>
  <xs:element name="labels">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="label"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="label">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" ref="quote"/>
        <xs:element ref="name"/>
        <xs:element ref="address"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="quote">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="emph"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="emph" type="xs:string"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="street"/>
        <xs:element ref="city"/>
        <xs:element ref="state"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="street" type="xs:string"/>
  <xs:element name="city" type="xs:string"/>
  <xs:element name="state" type="xs:string"/>
</xs:schema>  
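
The content model in listing 2 boils down to a handful of structural rules. The following is a minimal sketch of those rules coded by hand, assuming a modern Python 3 and the standard library's xml.etree.ElementTree. It is not a WXS validator and is not how generateDS.py works; the check_labels function is a hypothetical helper, and the sample is a trimmed copy of listing 1.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<labels>
  <label>
    <quote><emph>Midwinter Spring</emph> is its own season</quote>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street><city>Stamford</city><state>CT</state>
    </address>
  </label>
  <label>
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street><city>Hailey</city><state>ID</state>
    </address>
  </label>
</labels>"""

ADDRESS_CHILDREN = ["street", "city", "state"]


def child_tags(elem):
    """Return the element names of the direct children, in document order."""
    return [child.tag for child in elem]


def check_labels(root):
    """Return a list of problems; an empty list means the rules hold."""
    problems = []
    if root.tag != "labels" or any(t != "label" for t in child_tags(root)):
        problems.append("labels must contain only label elements")
    for i, label in enumerate(root.findall("label")):
        tags = child_tags(label)
        # quote is optional (minOccurs="0"); name and address are required.
        if tags not in (["quote", "name", "address"], ["name", "address"]):
            problems.append(f"label {i}: unexpected children {tags}")
        address = label.find("address")
        if address is not None and child_tags(address) != ADDRESS_CHILDREN:
            problems.append(f"label {i}: unexpected address children")
    return problems


print(check_labels(ET.fromstring(SAMPLE)))
```

An empty list means the sample satisfies these hand-coded rules. A schema-directed binding tool derives this kind of knowledge from the schema itself instead of having it written out by the programmer.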

generateDS.py requires PyXML, and I used the most recent CVS version. It seems to require Python 2.2 or later, as it uses static methods. I used Python 2.2.2 and ran it against the WXS as follows:

python generateDS.py -o labels.py listing2.xsd

generateDS.py generates Python files with the data binding derived from the schema. The -o option gives the location of the file containing data structures derived from the schema. This is the heart of the data binding. The output file labels.py is too large to paste in its entirety, but listing 3 is a snippet to give you a feel for the output:

Listing 3: A snippet from the data binding generated by generateDS.py.
class label:
    subclass = None
    def __init__(self, quote=None, name=None, address=None):
        self.quote = quote
        self.name = name
        self.address = address
    def factory(*args):
        if label.subclass:
            return apply(label.subclass, args)
        else:
            return apply(label, args)
    factory = staticmethod(factory)
    def getQuote(self): return self.quote
    def setQuote(self, quote): self.quote = quote
    def getName(self): return self.name
    def setName(self, name): self.name = name
    def getAddress(self): return self.address
    def setAddress(self, address): self.address = address
    def export(self, outfile, level):
        showIndent(outfile, level)
        outfile.write('<label>\n')
        level += 1
        if self.quote:
            self.quote.export(outfile, level)
        if self.name:
            self.name.export(outfile, level)
        if self.address:
            self.address.export(outfile, level)
        level -= 1
        showIndent(outfile, level)
        outfile.write('</label>\n')
    def build(self, node_):
        attrs = node_.attributes
        for child in node_.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'quote':
                obj = quote.factory()
                obj.build(child)
                self.setQuote(obj)
            elif child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'name':
                obj = name.factory()
                obj.build(child)
                self.setName(obj)
            elif child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'address':
                obj = address.factory()
                obj.build(child)
                self.setAddress(obj)
# end class label

# SNIP

class name:
    subclass = None
    def __init__(self):
        pass
    def factory(*args):
        if name.subclass:
            return apply(name.subclass, args)
        else:
            return apply(name, args)
    factory = staticmethod(factory)
    def export(self, outfile, level):
        showIndent(outfile, level)
        outfile.write('<name>\n')
        level += 1
        level -= 1
        showIndent(outfile, level)
        outfile.write('</name>\n')
    def build(self, node_):
        attrs = node_.attributes
        for child in node_.childNodes:
            pass
# end class name  

The label class has, among other things, facilities for marshalling and unmarshalling. The build method allows instances of the class to be built from a DOM, and this appears to be the only supplied method of binding from XML instance documents. This is what one might expect, since it's the easiest and most convenient way to write a data binding. It does mean that memory footprint could become a problem, as the DOM contents are duplicated in the resulting data structures. Given that the DOM might become unnecessary once the data structures are complete, there seems to be some room for optimization. The export method marshals the object back to XML.
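
The build/export pattern can be tried in miniature. The following is a sketch only, assuming a modern Python 3 and the standard library's xml.dom.minidom; the Address class here is a hypothetical, hand-written stand-in and is not output from generateDS.py. It also shows the DOM being released once the data structure is complete.

```python
import io
from xml.dom import Node, minidom
from xml.sax.saxutils import escape


class Address:
    """Hypothetical stand-in for a generated binding class."""

    FIELDS = ("street", "city", "state")

    def __init__(self, street=None, city=None, state=None):
        self.street, self.city, self.state = street, city, state

    def build(self, node):
        # Unmarshal: copy the text of each known child element out of the DOM.
        for child in node.childNodes:
            if (child.nodeType == Node.ELEMENT_NODE
                    and child.nodeName in self.FIELDS):
                text = "".join(n.data for n in child.childNodes
                               if n.nodeType == Node.TEXT_NODE)
                setattr(self, child.nodeName, text)
        return self

    def export(self, outfile, level=0):
        # Marshal: write the object back out as indented XML.
        indent = "    " * level
        outfile.write(f"{indent}<address>\n")
        for field in self.FIELDS:
            value = getattr(self, field)
            if value is not None:
                outfile.write(
                    f"{indent}    <{field}>{escape(value)}</{field}>\n")
        outfile.write(f"{indent}</address>\n")


doc = minidom.parseString(
    "<address><street>3 Prufrock Lane</street>"
    "<city>Stamford</city><state>CT</state></address>"
)
addr = Address().build(doc.documentElement)
doc.unlink()  # the DOM is no longer needed once the object is built

out = io.StringIO()
addr.export(out)
print(out.getvalue())
```

Because build copies plain strings out of the DOM, the tree can be discarded immediately afterwards, which is one simple way to limit the duplication described above.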
