XML.com: XML From the Inside Out

Gems From the Archives


April 09, 2003

The Python XML SIG, particularly its mailing list, is the richest resource there is for those looking to use Python for XML processing. In fact, efforts such as XML Bookmark Exchange Language (XBEL), created by the XML-SIG in September of 1998 and now used in more browsers and bookmark projects than any other comparable format, demonstrate this group's value to the entire XML world. We're all developers here, though, and for developers there is nothing as valuable as running code. There has been plenty of running code emanating from the XML-SIG over the years, including PyXML and a host of other projects I have mentioned in this column. But a lot of the good stuff is buried in examples and postings of useful snippets on the mailing list, and is not readily available elsewhere.

In this and in subsequent articles I will mine the richness of the XML-SIG mailing list for some of its choicest bits of code. I start in this article with a couple of very handy snippets from 1998 and 1999. Where necessary, I have updated code to use current APIs, style, and conventions in order to make it immediately useful to readers. All code in this article was tested using Python 2.2.1 and PyXML 0.8.2.

A note on style. Throughout this column I try to stick to the dictates of PEP 8: Style Guide for Python Code, which is based on an older essay by Guido van Rossum, and formalized to instruct contributors to core Python code. I have also updated code by others in this article to meet these guidelines, which I call "PEP 8 style".

Lars Marius Garshol's XML Output Generator

I have discussed XML output frequently in this column. I think it's essential for developers to be strictly correct in the XML they produce. Lars Marius Garshol posted an early XMLWriter class in the thread "XBEL DTD as a meta-dtd". This class is different from, and much simpler than, the SAX tools xml.sax.saxutils.XMLGenerator, which I covered in my last article, and xml.sax.writer.XmlWriter, which I may cover in the future. Lars says that "it is written for data-oriented documents, not document-like ones". This is probably because there isn't a way to produce mixed content using this class as-is, though it would be fairly easy to extend it to overcome this limitation.

Listing 1: Lars Marius Garshol's XMLWriter class

# A simple XML-generator
# Originally Lars Marius Garshol, September 1998
# http://mail.python.org/pipermail/xml-sig/1998-September/000347.html
# Changes by Uche Ogbuji April 2003
# *  unicode support: accept encoding argument and use Python codecs
#    for correct character output
# *  switch from deprecated string module to string methods
# *  use PEP 8 style

import sys
import codecs

class XMLWriter:

    def __init__(self, out=sys.stdout, encoding="utf-8", indent=u"  "):
        """
        out      - a stream for the output
        encoding - an encoding used to wrap the output for unicode
        indent   - white space used for indentation
        """
        wrapper = codecs.lookup(encoding)[3]
        self.out = wrapper(out)
        self.stack = []
        self.indent = indent
        self.out.write(u'<?xml version="1.0" encoding="%s"?>\n' \
                       % encoding)

    def doctype(self, root, pubid, sysid):
        """
        Create a document type declaration (no internal subset)
        """
        if pubid is None:
            self.out.write(
                u"<!DOCTYPE %s SYSTEM '%s'>\n" % (root, sysid))
        else:
            self.out.write(
                u"<!DOCTYPE %s PUBLIC '%s' '%s'>\n" \
                % (root, pubid, sysid))
        
    def push(self, elem, attrs={}):
        """
        Create an element which will have child elements
        """
        self.__indent()
        self.out.write(u"<" + elem)
        for (a, v) in attrs.items():
            self.out.write(u" %s='%s'" % (a, self.__escape_attr(v)))
        self.out.write(u">\n")
        self.stack.append(elem)

    def elem(self, elem, content, attrs={}):
        """
        Create an element with text content only
        """
        self.__indent()
        self.out.write(u"<" + elem)
        for (a, v) in attrs.items():
            self.out.write(u" %s='%s'" % (a, self.__escape_attr(v)))
        self.out.write(u">%s</%s>\n" \
                       % (self.__escape_cont(content), elem))

    def empty(self, elem, attrs={}):
        """
        Create an empty element
        """
        self.__indent()
        self.out.write(u"<"+elem)
        for (a, v) in attrs.items():
            self.out.write(u" %s='%s'" % (a, self.__escape_attr(v)))
        self.out.write(u"/>\n")
        
    def pop(self):
        """
        Close an element started with the push() method
        """
        elem=self.stack[-1]
        del self.stack[-1]
        self.__indent()
        self.out.write(u"</%s>\n" % elem)
    
    def __indent(self):
        self.out.write(self.indent * (len(self.stack) * 2))
    
    def __escape_cont(self, text):
        return text.replace(u"&", u"&amp;")\
               .replace(u"<", u"&lt;")

    def __escape_attr(self, text):
        return text.replace(u"&", u"&amp;") \
               .replace(u"'", u"&apos;").replace(u"<", u"&lt;")

The code is pretty straightforward, in part because it does not make any special provision for XML namespaces, though you can generate namespace declarations and qualified names by hand. You could start adding support for mixed content by adding a method like this:

    def content(self, content):
        """
        Create simple text content as part of a mixed content element
        """
        self.out.write(self.__escape_cont(content))
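
For comparison, the SAX-based xml.sax.saxutils.XMLGenerator mentioned earlier can already emit mixed content, because it writes text and tags exactly as the events arrive and adds no indentation of its own. The following is a minimal sketch, assuming a modern Python 3 standard library rather than the Python 2.2 and PyXML setup used for the listings; the element names and text are made up for illustration.

```python
import io
from xml.sax.saxutils import XMLGenerator

# Collect the output in memory; any text stream would do.
out = io.StringIO()
gen = XMLGenerator(out, encoding="utf-8")

# Mixed content: text and a child element interleaved inside <p>.
gen.startElement("p", {})
gen.characters("R&D is ")      # the ampersand is escaped on output
gen.startElement("em", {})
gen.characters("fun")
gen.endElement("em")
gen.characters(".")
gen.endElement("p")

result = out.getvalue()
print(result)
```

Nothing here inserts line breaks or indentation. An XMLWriter extended for mixed content would need the same care, since push() as written adds a newline after each start tag, and white space inside mixed content is part of the text.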

The following snippet is an example of using XMLWriter. It generates a simple XML Software Autoupdate (XSA) output. XSA is an XML data format, incidentally also designed by the prolific Lars Marius Garshol, for listing and describing software packages.

from listing1 import XMLWriter

writer = XMLWriter()
writer.doctype(
    u"xsa", u"-//LM Garshol//DTD XML Software Autoupdate 1.0//EN//XML",
    u"http://www.garshol.priv.no/download/xsa/xsa.dtd")
#Notice: there is no error checking to ensure that the root element
#specified in the doctype matches the top-level element generated
writer.push(u"xsa")
#Another element with child elements
writer.push(u"vendor")
#Element with simple text (#PCDATA) content
writer.elem(u"name", u"Centigrade systems")
writer.elem(u"email", u"info@centigrade.bogus")
writer.elem(u"vendor", u"Centigrade systems")
#Close currently open element ("vendor")
writer.pop()
#Element with an attribute
writer.push(u"product", {u"id": u"100\u00B0"})
writer.elem(u"name", u"100\u00B0 Server")
writer.elem(u"version", u"1.0")
writer.elem(u"last-release", u"20030401")
#Empty element
writer.empty(u"changes")
writer.pop()
writer.pop()

The following is the output generated from the above code:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsa PUBLIC '-//LM Garshol//DTD XML Software Autoupdate 1.0//EN//XML' 
'http://www.garshol.priv.no/download/xsa/xsa.dtd'>
<xsa>
    <vendor>
        <name>Centigrade systems</name>
        <email>info@centigrade.bogus</email>
        <vendor>Centigrade systems</vendor>
    </vendor>
    <product id='100°'>
        <name>100° Server</name>
        <version>1.0</version>
        <last-release>20030401</last-release>
        <changes/>
    </product>
</xsa>

Notice the correct handling of non-ASCII output.
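
The encoding step in the constructor of Listing 1 is what makes this work: the output stream is wrapped in a codec stream writer, so every unicode string written is encoded on its way out. A minimal sketch of the same idea follows, assuming Python 3 (where codecs.getwriter() is the named equivalent of codecs.lookup(encoding)[3]) and an in-memory byte stream standing in for a file. The helper name encode_through_writer is hypothetical.

```python
import codecs
import io

def encode_through_writer(text, encoding):
    """Write text to a byte stream through a codec stream writer."""
    raw = io.BytesIO()
    # Same role as codecs.lookup(encoding)[3] in Listing 1.
    writer = codecs.getwriter(encoding)(raw)
    writer.write(text)
    return raw.getvalue()

sample = "<name>100\u00b0 Server</name>"
utf8_bytes = encode_through_writer(sample, "utf-8")
latin1_bytes = encode_through_writer(sample, "iso-8859-1")

print(utf8_bytes)
print(latin1_bytes)
```

The degree sign takes two bytes in UTF-8 and one in ISO-8859-1, which is why the encoding named in the XML declaration has to match the codec actually used for the stream.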

If you are interested in a bit of mind-bending experimentation in XML output using Python idioms, see this message by Greg Stein. As an example of the very interesting perspective it provides, the following snippet should generate a bit of XHTML:

f = Factory()
body = f.body(bgcolor='#ffffff').p.a(href='l.html').img(src='l.gif')
html = f.html[f.head.title('title'), body]  

I think you'll agree this is delightfully twisted.
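
To give a feel for how such an interface can work, here is a small sketch of the idea. It is not Greg Stein's code: the Factory and Node classes below are hypothetical, written for Python 3, and lean on __getattr__, __call__, and __getitem__ to turn attribute access into child elements, call syntax into attributes and text, and subscripting into a list of children.

```python
from xml.sax.saxutils import escape, quoteattr

class Node:
    """Hypothetical element node; not taken from the original message."""

    def __init__(self, name, parent=None):
        self._name = name
        self._parent = parent
        self._attrs = {}
        self._children = []

    def __getattr__(self, name):
        # Only reached for unknown names: node.p creates a child <p>.
        if name.startswith("_"):
            raise AttributeError(name)
        child = Node(name, self)
        self._children.append(child)
        return child

    def __call__(self, *text, **attrs):
        # node('text', key='value') adds text content and attributes.
        self._children.extend(text)
        self._attrs.update(attrs)
        return self

    def __getitem__(self, items):
        # node[a, b] adopts the outermost element of each chain.
        if not isinstance(items, tuple):
            items = (items,)
        for item in items:
            if isinstance(item, Node):
                item = item._root()
                item._parent = self
            self._children.append(item)
        return self

    def _root(self):
        node = self
        while node._parent is not None:
            node = node._parent
        return node

    def _serialize(self):
        attrs = "".join(" %s=%s" % (key, quoteattr(value))
                        for key, value in self._attrs.items())
        if not self._children:
            return "<%s%s/>" % (self._name, attrs)
        body = "".join(child._serialize() if isinstance(child, Node)
                       else escape(child) for child in self._children)
        return "<%s%s>%s</%s>" % (self._name, attrs, body, self._name)

    def __str__(self):
        # Always serialize from the outermost element of the chain.
        return self._root()._serialize()

class Factory:
    def __getattr__(self, name):
        return Node(name)

f = Factory()
body = f.body(bgcolor='#ffffff').p.a(href='l.html').img(src='l.gif')
html = f.html[f.head.title('title'), body]
markup = str(html)
print(markup)
```

The one subtle design choice is that a chain such as f.body.p.a.img returns the innermost node, so each node keeps a link to its parent and serialization always starts from the outermost element.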

Sean McGrath's Very Simple Tree Widget XML Viewer

XML has a natural tree structure and so graphical tree widgets for viewing and editing XML are an old and obvious solution. Some find them very intuitive and some find them clumsy. I'm mostly in the latter camp, but I certainly acknowledge that it's important to be able to make tree viewers available where they may suit the user. The excellent wxPython GUI toolkit makes such things rather easy, and Sean McGrath posted a simple tree viewer for wxWindows. It handles elements and content, but not attributes. It would be relatively simple to extend it to cover most cases. wxPython is cross-platform, supporting UNIX, Windows and Mac OS.

Listing 2: Sean McGrath's wxPython tree widget for viewing XML files
"""
Build a GUI Tree (wxWindows) from an XML file using pyexpat
"""
# Originally Sean McGrath, September 1999
# http://mail.python.org/pipermail/xml-sig/1999-July/001350.html
# Changes by Uche Ogbuji April 2003
# *  update to handle unicode objects coming from pyexpat
# *  switch from deprecated string module to string methods
# *  use PEP 8 style

import sys
import codecs
from xml.parsers import pyexpat

ENCODING = "utf-8"
#raw_encode = codecs.lookup("utf-8")[0]
#encode = lambda s, r=raw_encode: r(s)[0]

from wxPython.wx import *

class MyFrame(wxFrame):
    def __init__(self, parent, id, title):
        wxFrame.__init__(self, parent, id, title, wxPoint(100, 100),
                         wxSize(160, 100))
        menu = wxMenu()
        menu.Append(1001, "Open")
        menu.Append(1002, "Close")
        menu.Append(1003, "Exit")
        menubar = wxMenuBar()
        menubar.Append(menu, "File")
        self.SetMenuBar(menubar)

class MyApp(wxApp):
    def OnInit(self):
        self.frame = MyFrame(NULL, -1, "Tree View of XML")
        self.tree = wxTreeCtrl(self.frame, -1)
        EVT_MENU(self, 1001, self.OnOpen)
        EVT_MENU(self, 1002, self.OnClose)
        EVT_MENU(self, 1003, self.OnExit)
        self.frame.Show(true)
        self.SetTopWindow(self.frame)
        return true

    def OnOpen(self, event):
        f = wxFileDialog(self.frame, "Select a file", ".", "",
                         "*.xml", wxOPEN)
        if f.ShowModal() == wxID_OK:
            LoadTree(f.GetPath())

    def OnClose(self, event):
        self.tree = wxTreeCtrl(self.frame, -1)

    def OnExit(self, event):
        self.OnCloseWindow(event)

    def OnCloseWindow(self, event):
        self.frame.Destroy()

NodeStack = []

# Define a handler for start element events
def StartElement(name, attrs):
    global NodeStack
    NodeStack.append(app.tree.AppendItem(NodeStack[-1],
                                         name.encode(ENCODING)))


def EndElement(name):
    global NodeStack
    NodeStack = NodeStack[:-1]

def CharacterData(data):
    global NodeStack
    if data.strip():
        app.tree.AppendItem(NodeStack[-1], data.encode(ENCODING))

def LoadTree(f):
    print "Loading:", f
    # Create a parser
    Parser = pyexpat.ParserCreate()
    
    # Tell the parser what the start element handler is
    Parser.StartElementHandler = StartElement
    Parser.EndElementHandler = EndElement
    Parser.CharacterDataHandler = CharacterData

    # Parse the XML File
    ParserStatus = Parser.Parse(open(f, 'r').read(), 1)
    if ParserStatus == 0:
        print "Parser error."
        raise SystemExit

if __name__ == "__main__":
    app = MyApp(0)
    NodeStack = [app.tree.AddRoot("Root")]

    app.MainLoop()
    raise SystemExit

I tested this with wxPython 2.4.0.6. There were a few incidental errors and warnings, such as an admonition to use the True and False boolean constants rather than 0 and 1. These constants come with the new bool type in Python 2.3, so I ignored the warnings. Figure 1 shows what the resulting window looks like.

Figure 1: A screen shot of the tree widget generated by listing 2
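
The heart of Listing 2 is the node stack: each start-element event appends a tree item under the item on top of the stack, and each end-element event pops it. The sketch below applies the same technique without a GUI, and also shows one way attributes could be handled, which Listing 2 leaves out. It assumes Python 3 and its xml.parsers.expat module; the outline() function, the "@" and "#text" markers, and the sample document are illustrative choices, not part of Sean McGrath's code.

```python
from xml.parsers import expat

def outline(xml_text):
    """Return an indented outline of elements, attributes, and text."""
    lines = []
    stack = []  # names of the currently open elements

    def start_element(name, attrs):
        lines.append("  " * len(stack) + name)
        # Attributes become children of the element, marked with "@".
        for key, value in attrs.items():
            lines.append("  " * (len(stack) + 1) + "@%s = %s" % (key, value))
        stack.append(name)

    def end_element(name):
        stack.pop()

    def character_data(data):
        # Skip white-space-only text, as Listing 2 does.
        if data.strip():
            lines.append("  " * len(stack) + "#text: " + data.strip())

    parser = expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent text as one chunk
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.Parse(xml_text, True)
    return lines

doc = '<product id="100"><name>Server</name><changes/></product>'
tree_lines = outline(doc)
print("\n".join(tree_lines))
```

In the GUI version, the same attribute loop could call AppendItem on the newly created element item instead of appending to a list.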

If you are looking for a more sophisticated widget that allows editing as well as viewing XML, or you prefer the GTK user interface toolkit to wxWindows, you may find that XMLTree or XMLEditor (distributed as a bundle called XMLTools) is the ticket.

To Be Continued...

There is more useful code available in the XML-SIG archives, and I will return to this topic in the future, presenting updates of other useful code from the archives. Don't hesitate to delve in yourself and get a head start; if you do, feel free to attach comments to this article or post them to the XML-SIG mailing list.

Moving from old to new Python-XML development, the following events of the past month are of interest.

Jean-Francois Touchette introduced XMLTP Light (XMLTP/L), a lightweight XML-based RPC protocol. XMLTP/L is primarily designed for fast RPC calls to a database server over an intranet, allowing only a subset of XML for simplicity. It is implemented in Python and C, although bindings can be written in Java. Jean-Francois discusses the package in detail in a Linux Journal article, "The Rookery: XMLTP/L, XMLTP Light".

SOAPpy 0.9.8 emerged as an unfortunately stealthy release. SOAPpy is part of the Web Services for Python project; this release is a fairly significant update. The installation now uses distutils, and there are other convenience and interoperability tweaks. It looks as if SOAPpy 0.9.9 is in pre-release mode and should be upon us soon. Announcements and discussion appear to be limited to the pywebsvcs-talk forum on SourceForge.net.

    

I announced the release of 4Suite 1.0a1. With this release 4Suite has finally started down the path to 1.0. I have also installed a logging IRC bot for the #4suite IRC channel on irc.freenode.net. You can navigate the logs (which are presented weblog-style) online. We often post example code, Q&A sessions and relevant links on the log, so please do keep an eye on it. Better yet, join us on the IRC channel: all Python-XML discussion is on-topic, though we focus mostly on 4Suite. For more information on 4Suite, see my earlier article on it.

Important note on the last article. I must have made an error in testing the code in my last article to be sure it works without PyXML installed. It turns out that because of at least one rather glaring bug in the Python 2.2 SAX library, you cannot run the example without installing PyXML 0.8.2 or more recent. I shall try to see that the XML modules in Python 2.3 are updated from the PyXML project so that the new version is not similarly limited. I apologize for any inconvenience. Once you have PyXML installed all the code in the article will work just fine.