XML Data Bindings in Python
by Uche Ogbuji
Special Schema Needs
There is a class like label for each element defined in
the schema. As you can see, this even extends to the name
element, and therein lies a problem. name is a simple
element with only string content, but the generated binding
gives it its own class rather than making it a simple data member of
label. Worse, if you follow the
build method carefully, you'll see that it throws away the
text content of the element upon unmarshalling. It turns out that
generateDS.py is rather picky in its interpretation of WXS. The relevant
snippet from listing 2 is:
<xs:element name="label">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" ref="quote"/>
      <xs:element ref="name"/>
      <xs:element ref="address"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="name" type="xs:string"/>
This is a common practice in WXS: using a separate
xs:element declaration for each element, even if it is of
simple type. But this usage throws off generateDS.py, and in order to
have name treated as a simple data member of the binding class you have
to rewrite the schema:
<xs:element name="label">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="0" ref="quote"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element ref="address"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
According to WXS rules, this is strictly equivalent to the original form. Listing 4 is a new version of the WXS that satisfies this preference of generateDS.py.
Listing 4: Adjusted WXS for data binding generation by generateDS.py
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
>
  <xs:element name="labels">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="label"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="label">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" ref="quote"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element ref="address"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="quote">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="emph" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="state" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
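For a larger schema, the rewrite described above could be automated instead of done by hand. The following is only a sketch, not part of generateDS.py: the function name inline_simple_refs is made up for illustration, and it assumes that a global element counts as "simple" when its type attribute uses the same prefix as the schema element itself (as in xs:string). It replaces each reference to such an element with a local declaration that carries its own name and type, then drops the global declaration. Whether generateDS.py accepts the result in every case is untested here.

```python
from xml.dom import minidom

XS_NS = "http://www.w3.org/2001/XMLSchema"


def inline_simple_refs(schema_text):
    """Turn refs to simple-typed global elements into local declarations.

    Assumption: a global element is "simple" when its type attribute uses
    the schema's own prefix (for example xs:string).
    """
    doc = minidom.parseString(schema_text)
    root = doc.documentElement
    builtin_prefix = (root.prefix or "") + ":"

    # Collect global declarations of the form name="..." type="xs:...".
    simple = {}
    for child in root.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.namespaceURI == XS_NS
                and child.localName == "element"
                and child.getAttribute("type").startswith(builtin_prefix)):
            simple[child.getAttribute("name")] = child

    # Rewrite each reference to one of them as a local declaration.
    for elem in doc.getElementsByTagNameNS(XS_NS, "element"):
        ref = elem.getAttribute("ref")
        if ref in simple:
            elem.removeAttribute("ref")
            elem.setAttribute("name", ref)
            elem.setAttribute("type", simple[ref].getAttribute("type"))

    # The global declarations are no longer referenced, so remove them.
    for decl in simple.values():
        root.removeChild(decl)
    return doc.toxml()
```

References to complex-typed elements such as quote and address are left alone, which matches the shape of Listing 4.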
Listing 5 is a snippet from the new data binding. Notice the update
to the handling of the name element.
class label:
    subclass = None
    def __init__(self, quote=None, name='', address=None):
        self.quote = quote
        self.name = name
        self.address = address
    def factory(*args):
        if label.subclass:
            return apply(label.subclass, args)
        else:
            return apply(label, args)
    factory = staticmethod(factory)
    def getQuote(self): return self.quote
    def setQuote(self, quote): self.quote = quote
    def getName(self): return self.name
    def setName(self, name): self.name = name
    def getAddress(self): return self.address
    def setAddress(self, address): self.address = address
    def export(self, outfile, level):
        showIndent(outfile, level)
        outfile.write('<label>\n')
        level += 1
        if self.quote:
            self.quote.export(outfile, level)
        showIndent(outfile, level)
        outfile.write('<name>%s</name>\n' % quote_xml(self.getName()))
        if self.address:
            self.address.export(outfile, level)
        level -= 1
        showIndent(outfile, level)
        outfile.write('</label>\n')
    def build(self, node_):
        attrs = node_.attributes
        for child in node_.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'quote':
                obj = quote.factory()
                obj.build(child)
                self.setQuote(obj)
            elif child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'name':
                name = ''
                for text_ in child.childNodes:
                    name += text_.nodeValue
                self.name = name
            elif child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'address':
                obj = address.factory()
                obj.build(child)
                self.setAddress(obj)
# end class label
class quote:
    subclass = None
    def __init__(self, emph=''):
        self.emph = emph
    def factory(*args):
        if quote.subclass:
            return apply(quote.subclass, args)
        else:
            return apply(quote, args)
    factory = staticmethod(factory)
    def getEmph(self): return self.emph
    def setEmph(self, emph): self.emph = emph
    def export(self, outfile, level):
        showIndent(outfile, level)
        outfile.write('<quote>\n')
        level += 1
        showIndent(outfile, level)
        outfile.write('<emph>%s</emph>\n' % quote_xml(self.getEmph()))
        level -= 1
        showIndent(outfile, level)
        outfile.write('</quote>\n')
    def build(self, node_):
        attrs = node_.attributes
        for child in node_.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and \
                child.nodeName == 'emph':
                emph = ''
                for text_ in child.childNodes:
                    emph += text_.nodeValue
                self.emph = emph
# end class quote
Now this is a pretty straightforward data binding result that, for
example, wouldn't surprise a Java developer. Each complex type in the
schema becomes a class, and simple types become simple properties with
get/set methods (like JavaBeans). This might feel a bit unpythonic until
you reflect that these binding classes are designed to be subclassed (note
the factory convenience functions), and the use of accessor
functions allows classic method polymorphism. Of course, one could still
argue that since the binding already uses Python 2.2, it could have taken
advantage of the more Pythonic approaches to such polymorphism available
with new style classes in Python 2.2. (For more on new style classes, see
Unifying types and
classes in Python 2.2 by Guido van Rossum and What's New
in Python 2.2 by A.M. Kuchling.)
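To make the alternative concrete, here is a minimal sketch of how a binding class might expose name through the property built-in that arrived with new-style classes. This is a hand-written illustration, not output from generateDS.py. The class names Label and StrippedLabel and the hook methods _get_name and _set_name are hypothetical.

```python
class Label(object):
    """Hypothetical hand-written variant of a generated binding class."""

    def __init__(self, name=''):
        # Assign through the property so subclass hooks apply here too.
        self.name = name

    def _get_name(self):
        return self._name

    def _set_name(self, value):
        self._name = value

    # Dispatch through lambdas so that a subclass overriding a hook
    # method changes the behavior of plain attribute access.
    name = property(lambda self: self._get_name(),
                    lambda self, value: self._set_name(value))


class StrippedLabel(Label):
    """Subclass that normalizes whitespace whenever name is assigned."""

    def _set_name(self, value):
        self._name = ' '.join(value.split())
```

Callers keep writing label.name and label.name = value, while subclasses still get the method polymorphism that the get/set style was meant to provide.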
Look at the quote.build method. Again, careful
examination will show that generateDS.py does not seem to handle mixed
content. In particular it discards text that is not within the
emph element: "is its own season...".
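For comparison, a build routine that keeps mixed content has to record text nodes alongside child elements, in document order. The sketch below shows one way to do that with the same DOM interface the generated code uses. The function build_mixed and its list-of-pairs result are assumptions made for illustration, not anything generateDS.py produces.

```python
from xml.dom import minidom, Node


def build_mixed(node):
    """Return the children of node as an ordered list of (kind, value) pairs.

    Text directly inside node is kept as ('text', ...) entries instead of
    being dropped; each child element contributes (tag name, its text).
    """
    content = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            content.append(('text', child.nodeValue))
        elif child.nodeType == Node.ELEMENT_NODE:
            text = ''.join([t.nodeValue for t in child.childNodes
                            if t.nodeType == Node.TEXT_NODE])
            content.append((child.nodeName, text))
    return content


# Hypothetical sample in the shape of the quote element.
sample = minidom.parseString(
    '<quote><emph>Midwinter Spring</emph> is its own season</quote>')
pieces = build_mixed(sample.documentElement)
```

A binding class could store such a list and walk it again on export, which would round-trip the text that the generated quote.build discards.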
Listing 6 demonstrates usage of the data binding, a pretty straightforward matter.
Listing 6:
import sys
import labels
rootObject = labels.parse('listing1.xml')
print dir(rootObject)
eliot = rootObject.label[0]
name = eliot.name
street = eliot.address.street
print street
emphasized = eliot.quote.emph
print emphasized
pound = rootObject.label[1]
#Modify the XML through the data binding
pound.name = 'Ezra Loomis Pound'
#Marshall back a portion of the XML, as modified
pound.export(sys.stdout, 0)
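Listing 6 uses the generated classes as they come. The subclass attribute and factory function seen in Listing 5 suggest how application behavior could be layered on without editing generated code. The sketch below reproduces only that dispatch in a stand-in class so that it runs on its own. The MailingLabel subclass and its surname method are hypothetical, and the stand-in calls the class directly because apply() no longer exists in current Python.

```python
class label(object):
    """Stand-in for the generated class; only the factory hook is copied."""
    subclass = None

    def __init__(self, name=''):
        self.name = name

    def factory(*args):
        # Same dispatch as in Listing 5, written without apply().
        if label.subclass:
            return label.subclass(*args)
        return label(*args)
    factory = staticmethod(factory)


class MailingLabel(label):
    """Hypothetical application subclass that adds a convenience method."""

    def surname(self):
        return self.name.split()[-1]


# Register the subclass; later factory() calls instantiate it instead.
label.subclass = MailingLabel
obj = label.factory('Ezra Loomis Pound')
```

Any code that creates label objects through label.factory then receives MailingLabel instances, which is presumably the point of the hook.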
I also wanted to check the handling of non-ASCII characters, but the
ellipsis character I'd placed in the quote element was
discarded by the binding generation. I moved it into the
emph element and this time when I tried parsing the
instance I ended up with the infamous "UnicodeError: ASCII encoding
error: ordinal not in range(128)". Examining the binding code, I think
this might be more a problem with the marshalling and unmarshalling than
with the binding implementation, so perhaps it would be easy to fix.
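One plausible cause, offered as an assumption and not as a diagnosis of generateDS.py, is a Unicode string being written to a byte stream through the default ASCII codec. The usual remedy is to pick the encoding explicitly at the point of output. A minimal sketch using the standard codecs module follows. The export_name helper is hypothetical and omits the escaping that quote_xml performs in the generated code.

```python
import codecs
import io


def export_name(outfile, name):
    """Write a name element; outfile must accept text (Unicode) strings."""
    outfile.write(u'<name>%s</name>\n' % name)


# Wrap a byte stream so that text is encoded as UTF-8 on the way out,
# instead of relying on an implicit ASCII default.
raw = io.BytesIO()
writer = codecs.getwriter('utf-8')(raw)
export_name(writer, u'is its own season\u2026')
encoded = raw.getvalue()
```

The same wrapper could be placed around sys.stdout or a file opened in binary mode before passing it to an export method.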
Just the beginning
generateDS.py is a very nifty program and offers many of the hallmarks of a data binding. I did point out a few shortcomings, not to knock the project, but because I think that rich bindings may be an area where Python can leapfrog the field in XML processing because of its dynamic qualities. In this column I shall continue to explore the issue, covering the remaining data binding projects and discussing future directions.
Meanwhile, here's the usual brief on activity in the Python-XML landscape.
Dave Kuhlman, the developer behind generateDS.py, announced code for Python support for the REST (XML-over-HTTP) mode of Amazon Web Services. The package provides Python code for parsing and processing the Amazon Web Services XML documents. It also includes code for generating WXS from an XML instance document (not unlike the concept in Eric Van der Vlist's Examplotron). Kuhlman has been very busy working with XML, REST, and SWIG (a tool for binding Python and other languages to C code). Another nice resource is Kuhlman's unofficial SWIG-based Python binding of the libxml tree API (see my last article for a discussion of the official Python binding).
Fredrik Lundh has been busy working on ElementTree, which I covered recently. He announced 1.1 and 1.2 alpha 1. Changes include a new XML literal factory, a self-contained ElementTree module, use of ASCII as the default encoding, optimizations, and limited XPath support.
John Merrells pointed me to the Python API for Berkeley DB XML, part of Berkeley DB. In Merrells' words: "The Python API is basically the same as the C++ and Java APIs, in that they expose the functionality of the product."
See this post and thread for discussion of Tim Bray's comment: "The Python people also piped to say 'everything's just fine here' but then they always do, I really must learn that language". I suspect that Tim Bray might have been referring to comments by me, Paul Prescod and others on the XML-DEV mailing list. I think our point is that Python's dynamic nature makes the horrors of DOM and SAX easier to bear, and not that Python has anything radical to leapfrog them. I'm rather hoping this series on data bindings helps produce such a leap, though.