XML.com: XML From the Inside Out

Writing and Reading XML with XIST
by Uche Ogbuji

Completing the Document

If you look carefully at Listing 1, you'll notice that what I've created is really just the top-level XSA element, and not the entire XML document. There is no XML declaration and no XSA document type declaration (which is required for it to be a valid XSA document). XIST does allow for all this added detail. To create a full XML document you use an ll.xist.xsc.Frag object, which can gather together all the needed nodes, including declarations. Listing 3 illustrates this. To run it, first paste in part one from the top of Listing 1, which I have not reproduced here in order to save space.

Listing 3: Using XIST to Generate a Proper XSA Document

# Parentheses let each identifier sit on its own line without a syntax error
XSA_PUBLIC = (
    "-//LM Garshol//DTD XML Software Autoupdate 1.0//EN//XML"
)
XSA_SYSTEM = (
    "http://www.garshol.priv.no/download/xsa/xsa.dtd"
)

class xsa_doctype(xsc.DocType):
    """
    Document type for XSA
    """
    def __init__(self):
        xsc.DocType.__init__(
            self, 'xsa PUBLIC "%s" "%s"' % (XSA_PUBLIC, XSA_SYSTEM)
        )

doc = xsc.Frag(
    xml.XML10(),
    xsa_doctype(),
    xsa(
        vendor(
            name(u"Centigrade systems"),
            email(u"info@centigrade.bogus"),
        ),
        product(
            name(u"100\u00B0 Server"),
            version(u"1.0"),
            last_release(u"20030401"),
            changes(),
            id = u"100\u00B0"
        )
    )
)

print doc.asBytes(encoding="utf-8")

This time I create an explicit document type declaration class and bundle this into a document fragment along with an instance of ll.xist.ns.xml.XML10, which represents the XML declaration. Listing 4 shows the resulting output. Again the actual output is all on one line, but I have inserted line feeds for formatting reasons.

Listing 4: Output from the Variation in Listing 3

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE xsa
PUBLIC 
 "-//LM Garshol//DTD XML Software Autoupdate 1.0//EN//XML"
"http://www.garshol.priv.no/download/xsa/xsa.dtd">
<xsa><vendor><name>Centigrade systems</name>
<email>info@centigrade.bogus</email></vendor>
<product id="100°">
  <name>100° Server</name>
  <version>1.0</version>
  <last-release>20030401</last-release><changes></changes>
</product></xsa>
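
One quick way to confirm that output like Listing 4 really carries the declaration and document type described above is to read it back with a parser. The following is only a sketch that uses the standard library's xml.dom.minidom rather than XIST; the OUTPUT literal is a hand-typed stand-in for the serialized bytes, with the degree sign from Listing 3 written as \u00b0.

```python
from xml.dom import minidom

# Stand-in for the bytes produced by Listing 3 (all on one line in practice).
OUTPUT = (
    u"<?xml version='1.0' encoding='utf-8'?>"
    u"<!DOCTYPE xsa PUBLIC "
    u'"-//LM Garshol//DTD XML Software Autoupdate 1.0//EN//XML" '
    u'"http://www.garshol.priv.no/download/xsa/xsa.dtd">'
    u"<xsa><vendor><name>Centigrade systems</name>"
    u"<email>info@centigrade.bogus</email></vendor>"
    u'<product id="100\u00b0"><name>100\u00b0 Server</name>'
    u"<version>1.0</version>"
    u"<last-release>20030401</last-release><changes></changes>"
    u"</product></xsa>"
).encode("utf-8")

# The parser records the document type declaration but does not fetch the DTD.
parsed = minidom.parseString(OUTPUT)
doctype = parsed.doctype
product = parsed.getElementsByTagName("product")[0]

print(doctype.name)
print(doctype.publicId)
print(doctype.systemId)
```

If the doctype name, public identifier, and system identifier come back as expected, the document at least declares itself correctly; this check does not validate the content against the XSA DTD.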

Reading XML

XIST provides parsers that you can use to read XML into the sorts of XIST data structures I described above. It's really quite simple, so I'll get right to it. Listing 5 is a simple example using XIST to parse a DocBook instance.

Listing 5: Using XIST to Parse an XML Document

from ll.xist import xsc
from ll.xist import parsers
# You must import this XIST namespace module; otherwise you
# get a validation error because the parser does not know the
# vocabulary
from ll.xist.ns import docbook

DOC = """\
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<article>
  <articleinfo>
  <title>DocBook article example</title>
  <author>
    <firstname>Uche</firstname>
    <surname>Ogbuji</surname>
  </author>
  </articleinfo>
  <section label="main">
    <title>Quote from "I Try"</title>
    <blockquote>
     <attribution>Talib Kweli</attribution>
     <para>
     Life is a beautiful struggle
     People search through the rubble for a suitable hustle
     Some people using the noodle
     Some people using the muscle
     Some people put it all together,
     make it fit like a puzzle
      </para>
    </blockquote>
  </section>
</article>
"""

doc = parsers.parseString(DOC)

I'll work interactively from this listing to show some of the tree navigation facilities for XIST trees. First I'll show how to use XIST iterators to search for the blockquote element.

$ python -i listing5.py
>>> blockquotes = doc.walk(xsc.FindTypeAll(docbook.blockquote))
>>> bq = blockquotes.next()
>>> print bq

      Talib Kweli

        Life is a beautiful struggle
        People search through the rubble
        for a suitable hustle
        Some people using the noodle
        Some people using the muscle
        Some people put it all together,
        make it fit like a puzzle


>>> print bq.asBytes()
<blockquote>
      <attribution>Talib Kweli</attribution>
      <para>
        Life is a beautiful struggle
        People search through the rubble
        for a suitable hustle
        Some people using the noodle
        Some people using the muscle
        Some people put it all together,
        make it fit like a puzzle
      </para>
    </blockquote>
>>>

The walk method creates an iterator over the nodes in document order. xsc.FindTypeAll creates a filter that restricts the iterator to every instance of the given element types anywhere within the subtree. There is also xsc.FindType, which searches only the immediate children of the node. So, to find the attribution of the quote:

>>> attribs = bq.content.walk(
...     xsc.FindTypeAll(docbook.attribution))
>>> attrib = attribs.next()
>>> print attrib
Talib Kweli
>>>
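
The difference between searching a whole subtree and searching only immediate children is easy to see in miniature. The sketch below is an analogy using the standard library's xml.etree.ElementTree, not XIST: iter() plays the role of a FindTypeAll-style search and findall() with a bare tag name plays the role of a FindType-style search. The trimmed DOC string is my own abbreviation of Listing 5.

```python
import xml.etree.ElementTree as ET

DOC = """<article>
  <section label="main">
    <title>Quote from "I Try"</title>
    <blockquote>
      <attribution>Talib Kweli</attribution>
      <para>Life is a beautiful struggle</para>
    </blockquote>
  </section>
</article>"""

root = ET.fromstring(DOC)

# Whole-subtree search, in document order: finds the nested attribution.
deep = [el.text for el in root.iter("attribution")]

# Immediate children only: article has no attribution child, so this is empty.
shallow = root.findall("attribution")

# Starting from the blockquote, attribution is an immediate child.
bq = next(root.iter("blockquote"))
direct = [el.text for el in bq.findall("attribution")]

print(deep)
print(shallow)
print(direct)
```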

Once you find an element of interest, it's trivial to access one of its attributes. They are available as if they were items in a dictionary.

>>> sections = doc.walk(
...     xsc.FindTypeAll(docbook.section))
>>> sect = sections.next()
>>> print sect[u"label"]
main
>>>
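
Dictionary-style access of this kind is something any Python class can offer by defining __getitem__. The following is a minimal sketch of that mechanism only; the Element class here is hypothetical and is not how XIST itself is implemented.

```python
class Element(object):
    """Hypothetical stand-in for a parsed element; not XIST's class."""

    def __init__(self, name, **attrs):
        self.name = name
        self._attrs = dict(attrs)

    def __getitem__(self, key):
        # Makes element[u"label"] behave like a dictionary lookup.
        return self._attrs[key]

    def __contains__(self, key):
        # Supports the "in" operator for attribute names.
        return key in self._attrs


sect = Element("section", label=u"main")
label = sect[u"label"]
print(label)
```

As with a real dictionary, asking this sketch for an attribute that is not present raises KeyError.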

XIST also takes advantage of Python's operator overloading to support a language in some ways like XPath, but given as Python expressions rather than strings (Unicode objects, to be precise). This language is called XFind. The examples in the documentation look very interesting, but I had some trouble getting the expected results from XFind expressions. I couldn't be sure whether it was something I was doing wrong or quirks in the library, so I'll leave exploring XFind more deeply for another time. I encourage you to experiment with XFind, though. Many people have called for such a pure Python take on XPath, and it looks as if XIST is well on its way down this road.
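
To give a feel for the general idea of writing path expressions as Python expressions, here is a toy sketch. It is emphatically not XFind: the Path class and its find method are hypothetical names of my own, and the tree comes from the standard library's xml.etree.ElementTree rather than from XIST. It overloads the / operator so that chained names act as successive child steps.

```python
import xml.etree.ElementTree as ET


class Path(object):
    """Hypothetical toy: a chain of element names joined with '/'."""

    def __init__(self, *names):
        self.names = names

    def __truediv__(self, other):
        # a / b yields a new Path with b's steps appended to a's.
        return Path(*(self.names + other.names))

    __div__ = __truediv__  # Python 2 spelling of the same operator

    def find(self, root):
        """Follow each name as a child step, starting below root."""
        current = [root]
        for name in self.names:
            current = [child for node in current
                       for child in node if child.tag == name]
        return current


section = Path("section")
blockquote = Path("blockquote")
attribution = Path("attribution")

root = ET.fromstring(
    "<article><section><blockquote>"
    "<attribution>Talib Kweli</attribution>"
    "</blockquote></section></article>"
)

matches = (section / blockquote / attribution).find(root)
print([m.text for m in matches])
```

Because the expression is ordinary Python, a misspelled step name shows up as a NameError when the expression is built rather than as a silently empty result from a string query.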

Wrap Up

    

It's surprising that XIST is such a dark horse. It has been around for a long time. It has a lot of very original and interesting capabilities. It's pretty well documented, and has a mature feel about it. Yet I had never tried it before working on this article, and I don't think I know of anyone else who has. Based on my experimentation, it is definitely worth serious consideration when you're looking for a Python-esque XML processing toolkit. The extremely object-oriented framework can feel a bit heavy, but I can appreciate some of the resulting benefits, and it would certainly suit some users' tastes very well. I should also mention that there is a lot more to XIST than I was able to cover in this article. I didn't touch on its support for different HTML and XHTML vocabularies, XML namespaces, XML entities, validation and content models, tree modification, pretty printing, image manipulation, and more.

I could find only one new development to report regarding XML in the Python space. It's the interesting news that Fred Drake, Pythonista extraordinaire, appears to have started chipping in on the ZSI project for Python Web services. He made the announcement of ZSI 1.7. For those who are still interested in mainstream Web services, this is surely great news.


