XML.com: XML From the Inside Out

XML Data Bindings in Python
by Uche Ogbuji

Special Schema Needs

There is a class like label for each element defined in the schema. As you can see, this even extends to the name element, and therein lies a problem. name is a simple element with only string content, but the generated binding gives it its own class rather than making it a simple data member of label. Worse, if you follow the build method carefully, you'll see that it throws away the text content of the element upon unmarshalling. It turns out generateDS.py is rather picky in its interpretation of WXS. The relevant snippet from listing 2 is:

  <xs:element name="label">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" ref="quote"/>
        <xs:element ref="name"/>
        <xs:element ref="address"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>  

This is a common practice in WXS: using a separate xs:element declaration for each element, even if it is of simple type. But this usage throws off generateDS.py, and in order to have name treated as a simple data member of the binding class you have to rewrite the schema:

  <xs:element name="label">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" ref="quote"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element ref="address"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>  

For the content of label, this is equivalent to the original form according to WXS rules; the only difference is that name is declared locally rather than globally. Listing 4 is a new version of the WXS to satisfy this preference of generateDS.py.

Listing 4: Adjusted WXS for data binding generation by generateDS.py
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
>
  <xs:element name="labels">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="label"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="label">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" ref="quote"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element ref="address"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="quote">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="emph" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="state" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>  
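
For larger schemas, a rewrite of this kind could be automated. The following is only a rough sketch, not part of generateDS.py: the function name inline_simple_elements is made up for illustration, and the code is written for current Python 3 rather than the Python 2.2 of the listings. It assumes a schema with no target namespace (so ref values are unprefixed names), treats any global xs:element that carries a type attribute as "simple", turns each reference to it into a local declaration carrying name and type, and then drops the global declaration.

```python
from xml.dom import minidom

XS = "http://www.w3.org/2001/XMLSchema"

# Small sample in the style of the snippet from listing 2.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="label">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" ref="quote"/>
        <xs:element ref="name"/>
        <xs:element ref="address"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
</xs:schema>"""


def inline_simple_elements(schema_text):
    """Inline references to global elements that have a type attribute.

    Hypothetical helper: assumes unprefixed ref values (no target namespace).
    """
    doc = minidom.parseString(schema_text)
    root = doc.documentElement

    # Global declarations are the direct children of xs:schema.
    simple = {}
    for child in list(root.childNodes):
        if (child.nodeType == child.ELEMENT_NODE
                and child.namespaceURI == XS
                and child.localName == "element"
                and child.hasAttribute("type")):
            simple[child.getAttribute("name")] = child

    # Turn each ref to such a global into a local name/type declaration.
    for elem in doc.getElementsByTagNameNS(XS, "element"):
        ref = elem.getAttribute("ref")
        if ref in simple:
            elem.removeAttribute("ref")
            elem.setAttribute("name", ref)
            elem.setAttribute("type", simple[ref].getAttribute("type"))

    # The global declarations are no longer referenced, so remove them.
    for decl in simple.values():
        root.removeChild(decl)
    return doc.toxml()


if __name__ == "__main__":
    print(inline_simple_elements(SAMPLE))
```

Note that removing a global declaration also means the element can no longer appear as a document root, which is harmless for name but worth checking before applying this to other schemas.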

Listing 5 is a snippet from the new data binding. Notice the update to the handling of the name element.

Listing 5: A snippet from the updated data binding
class label:
    subclass = None
    def __init__(self, quote=None, name='', address=None):
        self.quote = quote
        self.name = name
        self.address = address
    def factory(*args):
        if label.subclass:
            return apply(label.subclass, args)
        else:
            return apply(label, args)
    factory = staticmethod(factory)
    def getQuote(self): return self.quote
    def setQuote(self, quote): self.quote = quote
    def getName(self): return self.name
    def setName(self, name): self.name = name
    def getAddress(self): return self.address
    def setAddress(self, address): self.address = address
    def export(self, outfile, level):
        showIndent(outfile, level)
        outfile.write('<label>\n')
        level += 1
        if self.quote:
            self.quote.export(outfile, level)
        showIndent(outfile, level)
        outfile.write('<name>%s</name>\n' % quote_xml(self.getName()))
        if self.address:
            self.address.export(outfile, level)
        level -= 1
        showIndent(outfile, level)
        outfile.write('</label>\n')
    def build(self, node_):
        attrs = node_.attributes
        for child in node_.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'quote':
                obj = quote.factory()
                obj.build(child)
                self.setQuote(obj)
            elif child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'name':
                name = ''
                for text_ in child.childNodes:
                    name += text_.nodeValue
                self.name = name
            elif child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'address':
                obj = address.factory()
                obj.build(child)
                self.setAddress(obj)
# end class label

class quote:
    subclass = None
    def __init__(self, emph=''):
        self.emph = emph
    def factory(*args):
        if quote.subclass:
            return apply(quote.subclass, args)
        else:
            return apply(quote, args)
    factory = staticmethod(factory)
    def getEmph(self): return self.emph
    def setEmph(self, emph): self.emph = emph
    def export(self, outfile, level):
        showIndent(outfile, level)
        outfile.write('<quote>\n')
        level += 1
        showIndent(outfile, level)
        outfile.write('<emph>%s</emph>\n' % quote_xml(self.getEmph()))
        level -= 1
        showIndent(outfile, level)
        outfile.write('</quote>\n')
    def build(self, node_):
        attrs = node_.attributes
        for child in node_.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'emph':
                emph = ''
                for text_ in child.childNodes:
                    emph += text_.nodeValue
                self.emph = emph
# end class quote  

Now this is a pretty straightforward data binding result that, for example, wouldn't surprise a Java developer. Each complex type in the schema becomes a class, and simple types become simple properties with get/set methods (like JavaBeans). This might feel a bit unpythonic until you reflect that these binding classes are designed to be subclassed (note the factory convenience functions), and the use of accessor functions allows classic method polymorphism. Of course, one could still argue that since the binding already requires Python 2.2 (it uses staticmethod), it could have taken advantage of the more Pythonic approaches to such polymorphism that new-style classes make available. (For more on new-style classes, see Unifying types and classes in Python 2.2 by Guido van Rossum and What's New in Python 2.2 by A.M. Kuchling.)
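
To make the argument concrete, here is a hand-written sketch of what a property-based binding class might look like. It is not generateDS.py output: the class names Label and UpperLabel are invented for illustration, and it uses present-day Python 3 decorator syntax (Python 2.2 would spell the same thing with the property() built-in). It keeps the subclass hook and factory idea from Listing 5 but exposes name as a plain attribute whose behavior a subclass can still override.

```python
class Label:
    """Illustrative stand-in for the generated label class."""

    subclass = None  # same hook idea as in Listing 5

    def __init__(self, quote=None, name="", address=None):
        self.quote = quote
        self.name = name        # goes through the property setter
        self.address = address

    @staticmethod
    def factory(*args, **kwargs):
        # Build the registered subclass if there is one, else Label itself.
        target = Label.subclass or Label
        return target(*args, **kwargs)

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value


class UpperLabel(Label):
    """Hypothetical subclass that changes how name is stored."""

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value.upper()


if __name__ == "__main__":
    Label.subclass = UpperLabel
    lbl = Label.factory(name="Thomas Eliot")
    print(type(lbl).__name__, lbl.name)
```

Client code reads and writes lbl.name directly, yet the subclass still controls what happens on assignment, which is the polymorphism the get/set methods were providing.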

Now look at the quote.build method. Careful examination shows that generateDS.py does not seem to handle mixed content: it discards the text that is not within the emph element ("is its own season...").
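
For comparison, here is one way a build step could preserve mixed content. This is a sketch under assumptions, not a patch to generateDS.py: build_mixed is a hypothetical helper, the sample quote is a stand-in for the instance document, and the code targets current Python 3. It walks the child nodes in document order and records text nodes alongside child elements instead of skipping them.

```python
from xml.dom import minidom, Node

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)

# Stand-in sample; the real instance document is not reproduced here.
SAMPLE = "<quote><emph>Midwinter Spring</emph> is its own season...</quote>"


def build_mixed(node):
    """Return a mixed-content element's children, in document order, as
    (kind, text) pairs. kind is 'text' for character data, otherwise the
    name of the child element."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in TEXT_TYPES:
            parts.append(("text", child.nodeValue))
        elif child.nodeType == Node.ELEMENT_NODE:
            # Collect only the character data directly inside the child.
            text = "".join(t.nodeValue for t in child.childNodes
                           if t.nodeType in TEXT_TYPES)
            parts.append((child.nodeName, text))
    return parts


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE)
    for kind, text in build_mixed(doc.documentElement):
        print(kind, repr(text))
```

Keeping an ordered list of parts, rather than one attribute per child element, is what lets the text outside emph survive a round trip.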

Listing 6 demonstrates usage of the data binding, a pretty straightforward matter.

Listing 6: Using the data binding
import sys
import labels

rootObject = labels.parse('listing1.xml')
print dir(rootObject)

eliot = rootObject.label[0]
name = eliot.name
street = eliot.address.street
print street

emphasized = eliot.quote.emph
print emphasized

pound = rootObject.label[1]

# Modify the XML through the data binding
pound.name = 'Ezra Loomis Pound'

# Marshal a portion of the XML back out, as modified
pound.export(sys.stdout, 0)

I also wanted to check the handling of non-ASCII characters, but the ellipsis character I'd placed in the quote element was discarded by the binding generation. I moved it into the emph element and this time when I tried parsing the instance I ended up with the infamous "UnicodeError: ASCII encoding error: ordinal not in range(128)". Examining the binding code, I think this might be more a problem with the marshalling and unmarshalling than with the binding implementation, so perhaps it would be easy to fix.
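
I have not traced the error to its source, but a common cause of this kind of failure is Unicode text being pushed through an implicit ASCII encoding on its way to a byte stream. The sketch below shows the general remedy in current Python 3 (not the Python 2.2 code of the binding): encode explicitly at the output boundary. The function export_emph is hypothetical, and escape from xml.sax.saxutils stands in for the binding's own escaping helper.

```python
import io
from xml.sax.saxutils import escape

# U+2026 is the horizontal ellipsis, a non-ASCII character.
TEXT = "Midwinter Spring\u2026"


def export_emph(outfile, text, encoding="utf-8"):
    """Write an emph element to a binary stream using an explicit encoding."""
    markup = "<emph>%s</emph>\n" % escape(text)
    outfile.write(markup.encode(encoding))


def ascii_safe(text):
    """Alternative for ASCII-only output: non-ASCII characters become
    numeric character references, which any XML parser can read back."""
    return escape(text).encode("ascii", "xmlcharrefreplace")


if __name__ == "__main__":
    try:
        TEXT.encode("ascii")  # the implicit-ASCII failure, made explicit
    except UnicodeEncodeError as exc:
        print("ASCII cannot represent the text:", exc.reason)

    buf = io.BytesIO()
    export_emph(buf, TEXT)
    print(buf.getvalue())
    print(ascii_safe(TEXT))
```

Either approach keeps the character intact; which one fits depends on whether the consumer of the exported XML expects UTF-8 or plain ASCII.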

Just the Beginning

generateDS.py is a very nifty program and offers many of the hallmarks of a data binding. I pointed out a few shortcomings, not to knock the project, but because I think rich bindings may be an area where Python's dynamic qualities let it leapfrog the field in XML processing. In this column I shall continue to explore the issue, looking at the remaining data binding projects and discussing future directions.

Meanwhile, here's the usual brief on activity in the Python-XML landscape.

Dave Kuhlman, the developer behind generateDS.py, announced Python support code for the REST (XML-over-HTTP) mode of Amazon Web Services. The package provides Python code for parsing and processing Amazon Web Services XML documents. It also includes code for generating WXS from an XML instance document (not unlike the concept in Eric van der Vlist's Examplotron). Kuhlman has been very busy working with XML, REST, and SWIG (a tool for binding Python and other languages to C code). Another nice resource is Kuhlman's unofficial SWIG-based Python binding of the libxml tree API (see my last article for a discussion of the official Python binding).

Fredrik Lundh has been busy working on ElementTree, which I covered recently. He announced versions 1.1 and 1.2 alpha 1. Changes include a new XML literal factory, a self-contained ElementTree module, use of ASCII as the default encoding, optimizations, and limited XPath support.

    

John Merrells pointed me to the Python API for Berkeley DB XML, part of Berkeley DB. In Merrells' words: "The Python API is basically the same as the C++ and Java APIs, in that they expose the functionality of the product."

See this post and thread for discussion of Tim Bray's comment: "The Python people also piped to say 'everything's just fine here' but then they always do, I really must learn that language". I suspect that Tim Bray might have been referring to comments by me, Paul Prescod and others on the XML-DEV mailing list. I think our point is that Python's dynamic nature makes the horrors of DOM and SAX easier to bear, and not that Python has anything radical to leapfrog them. I'm rather hoping this series on data bindings helps produce such a leap, though.


