XML.com: XML From the Inside Out

XML Namespaces Support in Python Tools, Part Three
by Uche Ogbuji

Listing 5: Output from listing 4

<products>
  <ns0:product xmlns:ns0="http://example.com/product-info">
    <ns0:name xml:lang="en">Python Perfect IDE</ns0:name>
    <ns0:description>
      Uses mind-reading technology to anticipate and accommodate
      all user needs in Python development.  Implements all
      <ns0:code>1166</ns0:code>.
    </ns0:description>
  <ns0:launch-date /><island /></ns0:product>
  <ns1:product id="1166" xmlns:ns1="http://example.com/product-info">
    <ns1:name>XSLT Perfect IDE</ns1:name>
    <ns1:description>
      <ns1:code>red</ns1:code>
      <html:div class="eggs" html:global="spam" xml:lang="en"
                xmlns:html="http://www.w3.org/1999/xhtml">
        <ref ns3:type="simple"
                xmlns:ns3="http://www.w3.org/1999/xlink">A link</ref>
      </html:div>
    </ns1:description>
  </ns1:product>
</products>  

Reading namespaces with libxml/Python

As I discussed in my article about libxml there are several available mechanisms for processing XML, including SAX and DOM variations. I focused on the more unusual API, XmlTextReader, and in this discussion of namespace processing I shall continue to focus on this API, which means that I'll only worry about how to read namespace information. You should be able to perform mutation using similar DOM idioms to those I presented in the first namespace article. Listing 6 is the XmlTextReader equivalent of listing 2.

Listing 6: libxml code to display namespace information for elements and attributes

import sys
import cStringIO
import libxml2

XMLNS_NS = 'http://www.w3.org/2000/xmlns/'
XMLREADER_START_ELEMENT_NODE_TYPE = 1

input = open(sys.argv[1])
input_source = libxml2.inputBuffer(input)
reader = input_source.newTextReader("urn:bogus")

while reader.Read():
    if reader.NodeType() == XMLREADER_START_ELEMENT_NODE_TYPE:
        print "Element namespace:", repr(reader.NamespaceUri())
        print "Element local name:", repr(reader.LocalName())
        print "Prefix used for element:", repr(reader.Prefix())
        while reader.MoveToNextAttribute():
            #Ignore namespace declarations
            if reader.NamespaceUri() != XMLNS_NS:
                print "Attribute namespace:", repr(reader.NamespaceUri())
                print "Attribute local name:", repr(reader.LocalName())
                print "Prefix used for attribute:", repr(reader.Prefix())

Apart from the fact that, once again, the element names and prefixes are not returned as Unicode objects, the results are as expected.
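The filter on XMLNS_NS in listing 6 works because namespace declarations are themselves reported as attributes in the reserved namespace http://www.w3.org/2000/xmlns/. As a rough illustration of the same idiom without libxml, here is a minimal sketch that swaps in the standard library's xml.dom.minidom (a different toolkit from the one tested above) and is written for current Python 3 rather than the Python 2 of the listings. The sample document is made up for the example.

```python
from xml.dom import minidom, XMLNS_NAMESPACE

# Hypothetical sample document, not one of the article's listings.
SAMPLE = '<p:product xmlns:p="http://example.com/product-info" id="1166"/>'

doc = minidom.parseString(SAMPLE)
elem = doc.documentElement

# Namespace information for the element itself.
element_info = (elem.namespaceURI, elem.localName, elem.prefix)
print("Element:", element_info)

# Walk the attributes, skipping namespace declarations, which minidom
# reports in the reserved xmlns namespace just as listing 6 assumes.
plain_attributes = []
attrs = elem.attributes
for i in range(attrs.length):
    attr = attrs.item(i)
    if attr.namespaceURI != XMLNS_NAMESPACE:
        plain_attributes.append((attr.namespaceURI, attr.localName))
print("Attributes:", plain_attributes)
```

Only the id attribute survives the filter; the xmlns:p declaration is skipped in the same way that listing 6 skips it.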

PyRXPU and namespaces

PyRXPU is part of the PyRXP package, and the only part I recommend using, as I discussed at length in my recent article on PyRXP. PyRXP is "non-Unicode" by default, but this default configuration is not an XML parser at all. You do have to get a CVS version of the package in order to use PyRXPU. The latest release, 0.9, does not include it. I provided details for installing from CVS in my earlier article. I did try to update to more recent CVS code this time, but my attempts to use PyRXPU in the latest PyRXP code resulted in core dumps on my Dell Inspiron 8600 running Fedora Core 2, so I reverted to the CVS code I used back in February. I didn't see anything in the CVS logs since February indicating any significant changes in namespace handling, so I assumed this would still be a current test.

By default PyRXP doesn't do any special namespace processing and returns namespace declarations as regular attributes. There are several parser parameters regarding namespace processing. One, ReturnNamespaceAttributes, is described strangely in the documentation as not returning XML namespace declarations by default. This seems to be incorrect. The second, XMLNamespaces, is described in the documentation thusly:

If this is on, the parser processes namespace declarations (see below). Namespace declarations are not returned as part of the list of attributes on an element.

I wasn't able to find whatever passage might have been referenced in the "see below" phrase: this sentence was pretty much the last one concerning namespaces in the document. I came to wish I could find more on namespaces once I tried out namespace processing in listing 7.

Listing 7: Code to parse a document in namespace processing mode

import sys
import pyRXPU

parser = pyRXPU.Parser()
parser.XMLNamespaces = 1
doc_source = open(sys.argv[1]).read()
doc = parser.parse(doc_source)

import pprint
pprint.pprint(doc)  

The result of running this against listing 1 is very odd:

Listing 8: results of namespace-aware reading of listing 1 in PyRXPU

(u'products',
 None,
 [u'\n  ',
  (u'product',
   {u'id': u'1144'},
   [u'\n    ',
    (u'name', {u'xml:lang': u'en'}, [u'Python Perfect IDE'], None),
    u'\n    ',
    (u'description',
     None,
     [u'\n      Uses mind-reading technology to anticipate and '
       'accommodate\n   all user needs in Python development.  '
       'Implements all\n      ',
      (u'html:code', None, [u'from __future__ import'], None),
      u' features though\n      the year 3000.  Works well with ',
      (u'code', None, [u'1166'], None),
      u'.\n    '],
     None),
    u'\n  '],
   None),
  u'\n  ',
  (u'p:product',
   {u'id': u'1166'},
   [u'\n    ',
    (u'p:name', None, [u'XSLT Perfect IDE'], None),
    u'\n    ',
    (u'p:description',
     None,
     [u'\n      ',
      (u'p:code', None, [u'red'], None),
      u'\n      ',
      (u'html:code', None, [u'blue'], None),
      u'\n      ',
      (u'html:div',
       None,
       [u'\n        ',
        (u'ref',
         {u'xl:type': u'simple', u'xl:href': u'index.xml'},
         [u'A link'],
         None),
        u'\n      '],
       None),
      u'\n    '],
     None),
    u'\n  '],
   None),
  u'\n'],
 None)  

The important information -- the namespaces -- is omitted while the unimportant details -- the prefixes -- are included as part of element names. This makes namespace processing very difficult. I tried a lot of tweaking and other options to try to get all the information needed for ready namespace processing without having to knit it all back together by hand after turning off the namespace option (the only difference upon omitting the parser.XMLNamespaces = 1 line is that namespace declarations are returned as attributes). In the end I was not really able to tackle any of the namespace reading or mutation tasks without processing namespaces entirely by hand (which you can do with any toolkit, namespace aware or no), and I conclude that PyRXPU does not really support namespace processing.
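To give an idea of what processing namespaces by hand involves, here is a minimal sketch that walks a tuple tree shaped like the one in listing 8 and resolves each element name against the declarations in scope. It assumes the non-namespace mode described above, in which declarations arrive as ordinary xmlns and xmlns:prefix attributes. The resolve function and the sample tree are hypothetical illustrations, not part of PyRXP, and the code is written for current Python 3 rather than the Python 2 of the listings.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def resolve(node, scope=None):
    """Yield (namespace, local name) for each element in a tuple tree.

    Each node is assumed to be (name, attributes, children, spare),
    the shape shown in listing 8.
    """
    # Copy the inherited prefix map so sibling elements are unaffected.
    scope = dict(scope or {"xml": XML_NS})
    name, attrs, children, _spare = node
    # Pick up any declarations made on this element.
    for key, value in (attrs or {}).items():
        if key == "xmlns":
            scope[""] = value
        elif key.startswith("xmlns:"):
            scope[key[len("xmlns:"):]] = value
    # Split the qualified name; an unprefixed name uses the default namespace.
    prefix, _, local = name.rpartition(":")
    # An undeclared prefix is reported as no namespace here; a stricter
    # version would raise an error instead.
    yield (scope.get(prefix) or None, local)
    for child in children or []:
        if isinstance(child, tuple):  # skip text nodes
            yield from resolve(child, scope)

# Hypothetical tree, loosely modeled on the structure in listing 8.
PRODUCT_NS = "http://example.com/product-info"
tree = ("products", None, [
    "\n  ",
    ("product", {"xmlns": PRODUCT_NS, "id": "1144"},
     [("name", {"xml:lang": "en"}, ["Python Perfect IDE"], None)], None),
    ("p:product", {"xmlns:p": PRODUCT_NS, "id": "1166"},
     [("p:name", None, ["XSLT Perfect IDE"], None)], None),
], None)

names = list(resolve(tree))
for namespace, local in names:
    print(repr(namespace), local)
```

Even this small sketch has to track declaration scope and the default namespace itself, and it does not yet handle attribute names, which is the sort of knitting back together that a namespace-aware toolkit would normally spare you.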

Wrap up

In this batch of namespace tests the results have been a mixed bag. ElementTree supports namespaces properly, but makes it very hard to work with prefixes, which is acceptable given that prefixes are a mere syntactic convenience. I would hesitate to use ElementTree where I needed the convenience of preserved prefixes. PyRXPU seems either to report namespace declarations literally, without any API benefits, or to discard the namespace information altogether, which is as much as to say it doesn't support namespace processing. libxml, as one expects from such a comprehensive library, handles namespaces effortlessly. I barely scratched the surface in this article of how to process namespaces in libxml, but I do show the SAX and DOM approaches in my earlier article. I expect to wrap up this series on namespace processing next by looking at how some data binding tools handle namespaces.
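The ElementTree trade-off is easy to see in a small sketch. This one uses the xml.etree.ElementTree module from the current Python 3 standard library, which is an assumption on my part: the tests in this article used the separate ElementTree 1.2 beta package, whose behavior may differ in detail. The sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using the prefix "p".
DOC = ('<p:product xmlns:p="http://example.com/product-info" id="1166">'
       '<p:name>XSLT Perfect IDE</p:name></p:product>')

root = ET.fromstring(DOC)

# The namespace is kept, folded into the tag in "{uri}local" form.
print(root.tag)

# The original prefix is not kept: serialization generates its own.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The namespace name survives the round trip, but the author's p prefix is replaced by a generated one such as ns0, just as in listing 5.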

It has been a busy month for my colleagues in the Python-XML community, including work by Brett Hartshorn on yet another small Python DOM implementation, xmlapi 0.2.1. It's billed as an "even smaller XML DOM implementation than Python's standard xml.dom.minidom" and claims performance and feature improvements over minidom.

Philippe Normand debuted XMLObject 0.0.2, a data binding tool which allows you to map from customized Python classes to XML and vice versa. See the announcement.

Fredrik Lundh announced ElementTree 1.2. I'm somewhat confused at this point as to whether the package is supposed to be called "ElementTree" or "elementtree", but I think current clues suggest the former. This release just makes official the various experimental features such as XPath support which I have already discussed. 1.2 final appeared after I wrapped up this article and the namespace discussion is based on the most recent beta rather than the final 1.2 release, but I expect not much has changed in between the two. See the announcement.

Brian Quinlan, who has also been busy helping organize the Vancouver Python Workshop, announced Pyana 0.9.0, the latest release of his Python interface to the Xalan-C XSLT processor. Changes include updates for Xalan 1.8/Xerces 2.5, basic support for tracing, and removal of the transform-to-DOM support, with promises of a better replacement in future. See the announcement.

Magnus Lie Hetland has updated Atox to version 0.5. Atox allows you to write custom scripts for converting plain text into XML. You define the text to XML binding using a simple XML language. It's meant to be used from the command line. Changes since 0.1 include language improvements and added support for config files and XSLT fragments in Atox format files. See the full announcement.

    

Michael Twomey announced pygenx 0.5.2, a wrapper for Tim Bray's XML generation library Genx. Genx is a C library and its output is canonical XML. PyGenx wraps the full API. See the announcement.

Christof Hoeke announced pyXSLdoc 0.51, a Python tool for generating documentation of XSLT code in a similar approach to Javadoc. Version 0.60b is actually available, but the most recent announcement is for 0.52.

I recently discovered the Simple Objects from XML (SOX) module buried deep within the Python Enterprise Application Kit (PEAK). PEAK is a components toolkit for large-scale applications (the developers claim it is as powerful as J2EE but not as complex). SOX is another XML data binding toolkit which uses SAX events to build an object the user can define based on classes set up for namespace aware or namespace oblivious usage.


