
The State of the Python-XML Art
Welcome to the first Python-XML column. Every month I'll offer tips and techniques for XML processing in Python and close coverage of particular packages. Python is an excellent language for XML processing, and there is a wealth of tools and resources to help the intrepid developer be productive. In what follows I'll survey these tools and resources, giving a sense of how broadly Python supports XML technologies and giving you a head start on the more in-depth topics to follow.
The world of Python-XML
One of the best things about Python-XML is the active community of practitioners and contributors. From introductory texts to references to mailing lists, these resources will provide answers to most questions worth asking about Python and XML. If you are new to Python and coming from the XML perspective, Sean McGrath's article XML Processing with Python is an older but still very pertinent introduction to the area.
The Python XML SIG is the primary focus of Python work for XML, and its mailing list is a good place for discussion. The XML SIG has also produced some important general XML work such as the XML Bookmark Exchange Language (XBEL), which is now used in several Web browsers. There is also a lot of general Python-XML discussion on the 4Suite and Zope-xml mailing lists.
There's a lot of material in print on XML processing with Python. Python & XML, by Christopher A. Jones and Fred L. Drake, Jr. (O'Reilly) is a valuable book on the topic. Definitive XML Application Development, by Lars Marius Garshol (Prentice Hall), introduces XML processing, using Python throughout as the implementation language. It also uses Java and, thus, provides useful comparison for those familiar with Java-XML programming. Python Cookbook, edited by Alex Martelli and David Ascher (O'Reilly), has a section with XML recipes. Python How to Program, by Deitel, Deitel, Liperi and Wiedermann (Prentice Hall), has several chapters and a detailed case study covering XML topics. Python Web Programming, by Steve Holden and David Beazley (New Riders), has a section on XML. The first Python-XML book, XML Processing with Python, by Sean McGrath (Prentice Hall), is a bit dated in general but covers topics that none of the other books do. XML Processing with Perl, Python, and PHP, by Martin C. Brown (Sybex), devotes six chapters to Python. Most general Python books that cover version 2.0 and up will introduce the built-in XML processing libraries. Mark Lutz maintains a list of Python books. I'll be reviewing a selection of Python-XML books later on in this column.
Andrew Kuchling's Python/XML HOWTO is good starter documentation. The on-line slides for Alexandre Fayolles's EuroPython 2002 tutorial on Python-XML processing are also a useful starter. The Python Cookbook offers a good number of recipes on XML. I maintain a collection of recipes, tips and pointers on XML processing in Python. You will find many other on-line resources referenced there. The XML SIG maintains a Wiki, but it doesn't have a great deal of content yet.
Python and XML Software
The following table lists the currently available Python-XML software that I judge to be significant. It is not a list of every bit of software in Python that has anything to do with XML: for example, I do not list pyglade, which is software for generating user interfaces in the GNOME desktop system for UNIX. The user interface specifications in question are in XML, but this is not really enough to call it an XML processing tool for Python. However, you can certainly use the tools I mention for convenient manipulation of pyglade specifications. The general rules of thumb for including software are, first, whether it implements a technology or set of technologies strongly associated with XML; and, second, whether it does so in a way that is useful for any arbitrary XML file I may want to process.
I've organized the table according to the areas of XML technology. This will give newcomers to Python a quick look at the coverage of XML technologies in Python and should serve as a quick guide to where to go to address any particular XML processing need. I rate the vitality of each listed project as either "weak", "steady" or "strong" according to the recent visible activity on each project: mailing list traffic, releases, articles, other projects that use it, etc. I will often omit entries I judge to be of weak vitality in areas where there are other projects of steady or strong vitality.
| name | description | vitality |
|---|---|---|
| XML parsing | | |
| PyLTXML | PyLTXML is a Python extension wrapping the LTXML parser. It supports DTD validation. | steady |
| cDomlette | cDomlette is part of 4Suite. It is a fast C-based DOM implementation with a Python API, and includes a wrapper of the expat parser. It supports DTD validation. It also supports XInclude and XML Base. | strong |
| libxml/python | This Python extension module is a wrapper for libxml. It supports DTD validation. | strong |
| pyRXP | pyRXP is a Python extension wrapping the RXP XML parser. It supports DTD validation. | steady |
| pyexpat | Pyexpat is part of PyXML and is a wrapper of the expat parser. Expat is a non-validating parser, so pyexpat offers no DTD validation. | strong |
| qp_xml | qp_xml is part of PyXML. It is a simple parser written entirely in Python with no validation support. | steady |
| xmlproc | xmlproc is part of PyXML. It is a parser written entirely in Python. It supports DTD validation and provides API access to parsed DTD constructs. | steady |
| XPath, XSLT and XPointer | | |
| 4XSLT | 4XSLT is part of 4Suite, as are 4XPath and 4XPointer. 4XSLT supports a large portion of EXSLT. | strong |
| Pyana | Pyana is a Python extension module wrapping the Xalan XSLT engine. | strong |
| libxslt/Python | This Python extension module is a wrapper for libxslt. It supports a large portion of EXSLT. | strong |
| Schema languages (besides DTD) | | |
| XSV | XSV is a W3C XML Schema (WXS) implementation. It is actually one of the first WXS implementations, and drives the W3C's on-line validator. | steady |
| XVIF | XVIF implements RELAX NG, enhanced with the XML Validation Interoperability Framework for XML processing pipelining. It includes an implementation of XML Regular Fragmentations. 4Suite includes experimental RELAX NG and XVIF integration through this software. | steady |
| Protocols | | |
| Python Web Services | This is a collection of Python modules for SOAP, WSDL and related technologies. | steady |
| WDDX/Python | PyXML comes with a WDDX module for Python. | |
| wsdl4py | wsdl4py is a simple Python library for WSDL processing. See also uddi4py. | steady |
| xmlrpclib | Python versions from 2.2 up bundle XML-RPC client and server modules. | strong |
| RDF and Topic Maps | | |
| 4RDF | 4RDF is part of 4Suite. It includes an RDF/XML and NTriples parser, RDF store system, Python triples API and an implementation of the Versa query language. | strong |
| Redfoot and RDFLib | Redfoot is an RDF server written in Python. RDFLib is the triple store and RDF/XML parser component. | strong |
| Redland/Python | This is a Python interface for the Redland RDF Application Framework. | strong |
| tmproc | tmproc is a Python implementation of XML Topic Maps, based on ISO/IEC 13250 Topic Maps. | strong |
| DOM | | |
| 4DOM | 4DOM is part of PyXML. It is a comprehensive implementation of W3C DOM Level 2. | steady |
| cDomlette | See the "XML parsing" section. | strong |
| minidom | Python versions from 2.0 up bundle a minidom module. Minidom is a lightweight DOM implementation that is more pythonic. It follows the general lines of DOM Level 2. | strong |
| pulldom | Python versions from 2.0 up bundle a pulldom module. Pulldom is a special DOM-like implementation that only loads parts of an XML document as requested. | strong |
| Miscellany | | |
| 4XLink | 4XLink is part of 4Suite. It implements a portion of XLink. | weak |
| 4XUpdate | 4XUpdate is part of 4Suite. It is a Python implementation of XUpdate. It can be used to apply difference patches generated by XMLdiff. | strong |
| Pyxie | Pyxie is a line-oriented XML processor. | weak |
| XIST | XIST, "object oriented XSLT", uses an easily extensible, DOM-like view of source and target XML documents to do tree transformations. | strong |
| XMLTools | XMLTools is a small suite of tools that includes a graphical XML tree viewer and editor for the GTK windowing library. | strong |
| XMLdiff | XMLdiff is a Python tool that figures out the significant differences between two XML files or DOM trees. It can generate XUpdate output. | strong |
| c14n.py | c14n.py is part of PyXML. It implements XML canonicalization. | strong |
| xml.sax | Python versions from 2.0 up bundle a SAX module. | strong |
| xmlarch | xmlarch is an XML architectural forms processor written in Python, using SAX. | weak |
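To give a feel for how the three bundled APIs in the table differ (minidom, xml.sax and pulldom), here is a minimal sketch that pulls the same attribute out of one small document with each of them. It assumes a current Python 3 interpreter, where all three modules still ship in the standard library. The sample document, the `tool` element and the function names are invented for illustration and do not come from any of the listed packages.

```python
import xml.sax
from xml.dom import minidom, pulldom

# Hypothetical sample document, made up for this illustration.
DOC = """<catalog>
  <tool name="minidom" vitality="strong"/>
  <tool name="pulldom" vitality="strong"/>
  <tool name="Pyxie" vitality="weak"/>
</catalog>"""


def names_with_minidom(text):
    """DOM style: build the whole tree in memory, then query it."""
    doc = minidom.parseString(text)
    return [el.getAttribute("name") for el in doc.getElementsByTagName("tool")]


class ToolNameHandler(xml.sax.ContentHandler):
    """SAX style: react to events as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "tool":
            self.names.append(attrs.getValue("name"))


def names_with_sax(text):
    handler = ToolNameHandler()
    # Bytes input keeps this portable across Python 3 versions.
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


def names_with_pulldom(text):
    """Pull style: iterate over events and only touch the nodes of interest."""
    names = []
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "tool":
            names.append(node.getAttribute("name"))
    return names


if __name__ == "__main__":
    for extract in (names_with_minidom, names_with_sax, names_with_pulldom):
        print(extract.__name__, extract(DOC))
```

All three functions return the same list of names. The trade-off is in memory use and convenience: minidom holds the full tree, SAX keeps nothing unless the handler stores it, and pulldom sits in between.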
There are many Python projects for storage and network serving of data which have specialized facilities for XML documents. These include Maki, Zope, 4Suite repository and XDisect. I do not list these in the chart above because it is especially subjective as to whether these can be considered XML tools. I also plan to cover Python server frameworks with XML facilities in this column in future.
I have probably missed a few entries here and there. Please post any omissions to the comments section at the end of this article. For those working on new Python and XML goodies, do not forget to post announcements to the Python XML SIG mailing list. This is the best way to be sure that I and a lot of others are aware of your work. I'll mention new software and significant updates regularly in this column.
Into the fray...
I hope these pointers give you a good start into the world of Python and XML. In the next article I'll tour the many facilities added to core Python by the PyXML package.
Did Uche miss your favorite Python-XML tool? Share it in our forum.
Comment on this Article
- Another XML searching / editing tool
2003-10-09 08:00:15 Mikhail Grushinskiy
http://xmlstar.sourceforge.net/
Has support for PyX:
XML -> PYX
PYX -> XML
and other features
to:
Check or validate XML files (simple well-formedness check, DTD, XSD, RelaxNG)
Calculate values of XPath expressions on XML files (such as running sums, etc)
Search XML files for matches to given XPath expressions
Apply XSLT stylesheets to XML documents (including EXSLT support, and passing parameters to stylesheets)
Query XML documents (ex. query for value of some elements or attributes, sorting, etc)
Modify or edit XML documents (ex. delete some elements)
Format or "beautify" XML documents (as changing indentation, etc)
Fetch XML documents using http:// or ftp:// URLs
Browse tree structure of XML documents (in similar way to 'ls' command for directories)
Include one XML document into another using XInclude
XML c14n canonicalization
Escape/unescape special XML characters in input text
Print directory as XML document
- Gnosis
2003-05-13 18:11:37 Doug Tillman
Just another word about Gnosis - this library is great. Easy to use and fast. Moreover, the creator, David Mertz, very obligingly and promptly replies to emails if you encounter an issue with the code.
- Gnosis Utils
2002-09-25 08:43:24 David Mertz
As with all of Uche's writing, this was a nice, clear article. Still, I want to put in a little plug for the utility set that I created (with the generous help of a number of contributors). I think that for the tasks Gnosis Utils accomplishes, there are not really any other tools that do the same thing:
BACKGROUND: Gnosis Utilities contains a number of Python libraries,
most (but not all) related to working with XML. These include:
gnosis.xml.pickle (XML pickling of Python objects)
gnosis.xml.objectify (Any XML to "native" Python objects)
gnosis.xml.validity (Enforce validity constraints)
gnosis.xml.indexer (XPATH indexing of XML documents)
gnosis.indexer (Full-text indexing/searching)
[...].convert.txt2html (Convert ASCII source files to HTML)
gnosis.util.dtd2sql (DTD -> SQL 'CREATE TABLE' statements)
gnosis.util.sql2dtd (SQL query -> DTD for query results)
gnosis.util.xml2sql (XML -> SQL 'INSERT INTO' statements)
gnosis.util.combinators (Combinatorial higher-order functions)
gnosis.util.introspect (Introspect Python objects)
...and so much more! :-)
The current release is always available as:
http://gnosis.cx/download/Gnosis_Utils-current.tar.gz
- ParsedXML
2002-09-20 04:45:54 Martijn Faassen
(Whoops, sorry about the empty message above)
ParsedXML is a DOM implementation created by Zope
corporation and maintained by me where the DOM
tree can be persistent in Zope:
http://www.zope.org/Members/faassen/ParsedXML
- ParsedXML
2002-09-25 08:10:29 Uche Ogbuji
I wanted to note that I am aware of ParsedXML. I posted a disclaimer that I avoided listing Python server frameworks that include XML processing capabilities because they are hard to quantify. ParsedXML cannot really be used outside Zope (if I'm right), which is why I didn't list it.
I do plan in future to cover each of these server frameworks in dedicated articles. I will surely cover ParsedXML in my article on Zope.
Meanwhile, I will be looking out for ParsedXML announcements for my monthly "happenings" section of the column.
Thanks.
- ParsedXML
2002-09-27 04:45:34 Martijn Faassen
ParsedXML cannot be easily used outside Zope,
that's true, understood. Parts of it can be though; the DOM core is fairly useful outside Zope though it depends on some Zopeisms (in particular some extensionclass features). The unit tests for the DOM
are very extensive and perfectly useful outside
Zope; in fact I've tried them against the 4DOM in
PyXML (there were many failures however :).
