The State of the Python-XML Art

September 18, 2002

Uche Ogbuji

Welcome to the first Python-XML column. Every month I'll offer tips and techniques for XML processing in Python and close coverage of particular packages. Python is an excellent language for XML processing, and there is a wealth of tools and resources to help the intrepid developer be productive. In what follows I'll survey these tools and resources, giving a sense of how broadly Python supports XML technologies and giving you a head start on the more in-depth topics to follow.

The world of Python-XML

One of the best things about Python-XML is the active community of practitioners and contributors. From introductory texts to references to mailing lists, these resources will provide answers to most questions worth asking about Python and XML. If you are new to Python and coming from the XML perspective, Sean McGrath's article XML Processing with Python is an older but still very pertinent introduction to the area.

The Python XML SIG is the primary focus of Python work for XML, and its mailing list is a good place for discussion. The XML SIG has also produced some important general XML work such as the XML Bookmark Exchange Language (XBEL), which is now used in several Web browsers. There is also a lot of general Python-XML discussion on the 4Suite and Zope-xml mailing lists.

There's a lot of material in print on XML processing with Python. Python & XML, by Christopher A. Jones and Fred L. Drake, Jr. (O'Reilly) is a valuable book on the topic. Definitive XML Application Development, by Lars Marius Garshol (Prentice Hall), introduces XML processing, using Python throughout as the implementation language. It also uses Java and thus provides a useful comparison for those familiar with Java-XML programming. Python Cookbook, edited by Alex Martelli and David Ascher (O'Reilly), has a section with XML recipes. Python How to Program, by Deitel, Deitel, Liperi and Wiedermann (Prentice Hall), has several chapters and a detailed case study covering XML topics. Python Web Programming, by Steve Holden and David Beazley (New Riders), has a section on XML. The first Python-XML book, XML Processing with Python, by Sean McGrath (Prentice Hall), is a bit dated in general but covers topics that none of the other books do. XML Processing with Perl, Python, and PHP, by Martin C. Brown (Sybex), devotes six chapters to Python. Most general Python books that cover version 2.0 and up will introduce the built-in XML processing libraries. Mark Lutz maintains a list of Python books. I'll be reviewing a selection of Python-XML books later on in this column.

Andrew Kuchling's Python/XML HOWTO is good starter documentation. The on-line slides for Alexandre Fayolle's EuroPython 2002 tutorial on Python-XML processing are also a useful starter. The Python Cookbook offers a good number of recipes on XML. I maintain a collection of recipes, tips and pointers on XML processing in Python. You will find many other on-line resources referenced there. The XML SIG maintains a Wiki, but it doesn't have a great deal of content yet.

Python and XML Software

The following table lists the currently available Python-XML software that I judge to be significant. It is not a list of every bit of software in Python that has anything to do with XML: for example, I do not list pyglade, which is software for generating user interfaces in the GNOME desktop system for UNIX. The user interface specifications in question are in XML, but this is not really enough to call it an XML processing tool for Python. However, you can certainly use the tools I mention for convenient manipulation of pyglade specifications. The general rules of thumb for including software are, first, whether it implements a technology or set of technologies strongly associated with XML; and, second, whether it does so in a way that is useful for any arbitrary XML file I may want to process.

I've organized the table according to the areas of XML technology. This will give newcomers to Python a quick look at the coverage of XML technologies in Python and should serve as a quick guide to where to go to address any particular XML processing need. I rate the vitality of each listed project as either "weak", "steady" or "strong" according to the recent visible activity on each project: mailing list traffic, releases, articles, other projects that use it, etc. I will often omit entries I judge to be of weak vitality in areas where there are other projects of steady or strong vitality.

     
Table: XML processing software for Python
Each entry below gives the name, a description, and the vitality rating, grouped by technology area.
XML parsing
PyLTXML

PyLTXML is a Python extension wrapping the LTXML parser. It supports DTD validation.

steady
cDomlette

cDomlette is part of 4Suite. It is a fast C-based DOM implementation with a Python API, and includes a wrapper of the expat parser. It supports DTD validation. It also supports XInclude and XML Base.

strong
libxml/python

This Python extension module is a wrapper for libxml. It supports DTD validation.

strong
pyRXP

pyRXP is a Python extension wrapping the RXP XML parser. It supports DTD validation.

steady
pyexpat

Pyexpat is part of PyXML and is a wrapper of the expat parser. Expat is a non-validating parser, so there is no validation support.

strong
qp_xml

qp_xml is part of PyXML. It is a simple parser written entirely in Python with no validation support.

steady
xmlproc

xmlproc is part of PyXML. It is a parser written entirely in Python. It supports DTD validation and provides API access to parsed DTD constructs.

steady
XPath, XSLT and XPointer
4XSLT

4XSLT is part of 4Suite, as is 4XPath and 4XPointer. 4XSLT supports a large portion of EXSLT.

strong
Pyana

Pyana is a Python extension module wrapping the Xalan XSLT engine.

strong
libxslt/Python

This Python extension module is a wrapper for libxslt. It supports a large portion of EXSLT.

strong
Schema languages (besides DTD)
XSV

XSV is a W3C XML Schema (WXS) implementation. It is actually one of the first WXS implementations, and drives the W3C's on-line validator.

steady
XVIF

XVIF implements RELAX NG, enhanced with the XML Validation Interoperability Framework for XML processing pipelining. It includes an implementation of XML Regular Fragmentations. 4Suite includes experimental RELAX NG and XVIF integration through this software.

steady
Protocols
Python Web Services

This is a collection of Python modules for SOAP, WSDL and related technologies.

steady
WDDX/Python

PyXML comes with a WDDX module for Python.

wsdl4py

wsdl4py is a simple Python library for WSDL processing. See also uddi4py.

steady
xmlrpclib

Python versions from 2.2 up bundle XML-RPC client and server modules.

strong
RDF and Topic Maps
4RDF

4RDF is part of 4Suite. It includes an RDF/XML and NTriples parser, RDF store system, Python triples API and an implementation of the Versa query language.

strong
Redfoot and RDFLib

Redfoot is an RDF server written in Python. RDFLib is the triple store and RDF/XML parser component.

strong
Redland/Python

This is a Python interface for the Redland RDF Application Framework.

strong
tmproc

tmproc is a Python implementation of XML Topic Maps, based on ISO/IEC 13250 Topic Maps.

strong
DOM
4DOM

4DOM is part of PyXML. It is a comprehensive implementation of W3C DOM Level 2.

steady
cDomlette

See the "XML parsing" section

strong
minidom

Python versions from 2.0 up bundle a minidom module. Minidom is a lightweight DOM implementation with a more Pythonic feel than a full W3C DOM. It follows the general lines of DOM Level 2.

strong
pulldom

Python versions from 2.0 up bundle a pulldom module. Pulldom is a special DOM-like implementation that only loads parts of an XML document as requested.

strong
Miscellany
4XLink

4XLink is part of 4Suite. It implements a portion of XLink.

weak
4XUpdate

4XUpdate is part of 4Suite. It is a Python implementation of XUpdate. It can be used to apply difference patches generated by XMLdiff.

strong
Pyxie

Pyxie is a line-oriented XML processor.

weak
XIST

XIST, "object oriented XSLT", uses an easily extensible, DOM-like view of source and target XML documents to do tree transformations.

strong
XMLTools

XMLTools is a small suite of tools that includes a graphical XML tree viewer and editor for the GTK windowing library.

strong
XMLdiff

XMLdiff is a Python tool that figures out the significant differences between two XML files or DOM trees. It can generate XUpdate output.

strong
c14n.py

c14n.py is part of PyXML. It implements XML canonicalization.

strong
xml.sax

Python versions from 2.0 up bundle a SAX module.

strong
xmlarch

xmlarch is an XML architectural forms processor written in Python, using SAX.

weak
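
To make the differences between the bundled modules in the table more concrete, here is a minimal sketch that reads the same small document with minidom, pulldom and xml.sax. It assumes a current Python 3 interpreter rather than the 2.x releases discussed in this article, and the sample document and the function and class names (SAMPLE, names_with_minidom, names_with_pulldom, ToolHandler, names_with_sax) are hypothetical, invented purely for illustration.

```python
# Assumption: Python 3 standard library. The sample data and all helper
# names are hypothetical; only the xml.* modules come from the table above.
import xml.sax
from xml.dom import minidom, pulldom

SAMPLE = """<tools>
  <tool name="minidom" vitality="strong"/>
  <tool name="pulldom" vitality="strong"/>
  <tool name="Pyxie" vitality="weak"/>
</tools>"""


def names_with_minidom(text):
    """Build the whole tree in memory, then query it DOM-style."""
    doc = minidom.parseString(text)
    return [el.getAttribute("name") for el in doc.getElementsByTagName("tool")]


def names_with_pulldom(text, vitality):
    """Pull parse events one at a time and inspect only matching elements."""
    names = []
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "tool":
            if node.getAttribute("vitality") == vitality:
                names.append(node.getAttribute("name"))
    return names


class ToolHandler(xml.sax.ContentHandler):
    """SAX handler: the parser pushes events to us as callbacks."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "tool":
            self.names.append(attrs.get("name"))


def names_with_sax(text):
    """Parse with SAX; no tree is built at all."""
    handler = ToolHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


if __name__ == "__main__":
    print(names_with_minidom(SAMPLE))
    print(names_with_pulldom(SAMPLE, "strong"))
    print(names_with_sax(SAMPLE))
```

The three functions return the same kind of result but trade memory for convenience in different ways: minidom holds the full document, pulldom lets the caller decide which parts to look at as events arrive, and SAX never builds nodes unless the handler does so itself.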

There are many Python projects for storage and network serving of data that have specialized facilities for XML documents. These include Maki, Zope, the 4Suite repository and XDisect. I do not list these in the chart above because whether they can be considered XML tools is especially subjective. I also plan to cover Python server frameworks with XML facilities in future installments of this column.

I have probably missed a few entries here and there. Please post any omissions to the comments section at the end of this article. For those working on new Python and XML goodies, do not forget to post announcements to the Python XML SIG mailing list. This is the best way to be sure that I and a lot of others are aware of your work. I'll mention new software and significant updates regularly in this column.

Into the fray...

I hope these pointers give you a good start into the world of Python and XML. In the next article I'll tour the many facilities added to core Python by the PyXML package.