
The State of Python-XML in 2004
October 13, 2004
The table below lists the currently
available Python-XML software that I judge to be significant. It is not
a list of every bit of software in Python that has anything to do with
XML. For example, I do not list pyglade (part of PyGTK), which is software for
generating user interfaces in the GNOME desktop system for UNIX. The
user interface specifications in question are in XML, but this is not
really enough to call it an XML processing tool for Python. However, you
can certainly use the tools I mention for convenient manipulation of
pyglade specifications.
The general rules of thumb for including software are, firstly,
whether it implements a technology or set of technologies strongly
associated with XML; and secondly, whether it does so in a way that is
useful for any arbitrary XML file I may want to process.
Another example of a project that doesn't fit these parameters is
Mark Pilgrim's excellent Universal Feed Parser, which parses
almost every known form of RSS and Atom newsfeed formats, including some
that are not well-formed XML. This package is not a general-purpose
tool for XML processing, but rather focused on a specific XML
vocabulary. I did make a bit of a compromise on this principle to cover
RDF packages, since even though RDF/XML is a specific XML vocabulary, it
is generally acknowledged as a valid way to express the data in any
XML.
I organize the table according to selected areas of XML technology.
This will give newcomers to Python a quick look at the coverage of XML
technologies in Python and should serve as a quick guide to where to go
to address any particular XML processing need. I have added reference
links to column articles on software I've covered in this column. I
have set a "heartbeat" rating for each project. One
heart means the project is almost inactive and three means the project
is very active. I judge this rating subjectively, according to recent
activity I can find for each project: mailing list traffic, releases,
articles, other projects that use it, etc.
In 2002 I reported 34 Python-XML projects. Last year I added 24 and
this year 16 (marked with an asterisk) for a grand total of 74. This
month alone two new projects have emerged, showing the continuing
interest in Python processing of XML. This year I added a new category,
for XML generators, with 9 entries. There has been a bloom in Python
packages for generating XML. An existing category that keeps on growing
is in Pythonic APIs or data bindings. There are 15 as of this year's
count. There is no doubt that patience for non-Pythonic ways of
processing XML has worn thin, but considering that my list may not even
be complete (rumor has it Guido van Rossum has a data-binding tool of
his own), one wonders whether this area is ripe for consolidation. At
this point I leave you to judge such matters for yourself.
XML Processing Software for Python
XML Parsing Engines
Parsing engines offer unique, low-level parsers. Many packages offer
additional capabilities, but this section mainly documents the various
low-level XML parsers for Python, on which other packages then build.
Packages do not support DTD validation unless such support is explicitly
stated. Note: I know of no Python parser that supports XML 1.1, although,
as many have remarked (1), (2), (3), XML 1.1 is probably in trouble as
far as adoption is concerned. |
|
| Name |
Description |
Vitality |
| PyLTXML |
PyLTXML is a Python extension wrapping the LTXML
parser. It supports DTD validation. |
 |
| cDomlette |
cDomlette is part of 4Suite. It is a fast, C-based
DOM implementation with a Python
API, and includes a wrapper of the expat
parser. It supports RELAX
NG validation. It also supports XInclude
and XML Base and XML
entity catalogs. [1].
[2].
|
|
| libxml2/python
|
This Python extension module is a wrapper for libxml2.
It supports DTD validation, RELAX NG, WXS, XInclude (plus XPointer),
XML Base, and XML Catalogs. [1].
[2].
|
|
| pyRXPU
|
pyRXPU is a Python extension wrapping the RXP
XML parser. It supports DTD validation. Unfortunately, pyRXPU is
only an optional mode of building PyRXP, which in its default build
falsely claims to be an XML parser. [1]. |
|
| pyexpat
|
Pyexpat is part of PyXML and is a wrapper of the
expat parser. |
|
| qp_xml |
qp_xml is part of PyXML. It is a simple parser written
entirely in Python with no validation support. |
|
| xmlproc |
xmlproc is part of PyXML. It is a parser written
entirely in Python. It supports DTD validation and XML catalogs.
It provides API access to parsed DTD constructs. |
|
|
|
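To make the idea of a low-level parsing engine concrete, here is a minimal sketch using the expat wrapper. It assumes a current Python 3 interpreter, where the wrapper is bundled as `xml.parsers.expat` rather than shipped with PyXML. The `collect_events` helper and the sample document are invented for illustration.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text with expat and return the callbacks fired, in order."""
    events = []

    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        events.append(("text", data))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one chunk
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


SAMPLE = '<doc id="1"><item>spam</item></doc>'
events = collect_events(SAMPLE)
```

Everything else in this table builds on an event stream of roughly this shape. A document that is not well-formed makes `Parse` raise `xml.parsers.expat.ExpatError`.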
DOM
The Document Object Model is probably the best-known API for XML, and
is very well represented in the Python world. |
|
| Name |
Description |
Vitality |
| 4DOM
|
4DOM is part of PyXML. It is a comprehensive implementation
of W3C DOM Level 2. |
|
| PIRXX |
A Python extension module for interfacing with Xerces and Xalan. |
|
| cDomlette |
See the "XML Parsing Engines" section. |
|
| domhelper.py * |
domhelper.py is a DOM helper module with functions
to provide some common operations on DOM, including looking up namespace
URIs and prefixes, non-recursively getting text or child elements
of a given node. |
|
| minidom |
Python versions from 2.0 up bundle a minidom
module. Minidom is a lightweight DOM implementation that is
more Pythonic. It follows the general lines of DOM Level 2. [1]. [2]. [3]. |
|
| pulldom |
Python versions from 2.0 up bundle a pulldom
module. Pulldom is a special DOM-like implementation that only
loads parts of an XML document as requested. [1]. |
|
| pxdom |
pxdom is a pure-Python DOM implementation and non-validating parser,
supporting DOM Level 3 Core, XML, Load and Save specifications.
pxdom passes the DOM Level 1 and 2 Core Test Suite. [1]. |
|
| xmlapi * |
xmlapi is a lightweight XML DOM implementation similar to minidom.
|
|
|
|
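As a rough sketch of the difference between the two bundled modules, the following loads a whole document with minidom, then streams the same document with pulldom and expands only the elements of interest. The sample document and both helper names are invented for illustration, and the code assumes a current Python 3 standard library.

```python
from xml.dom import minidom, pulldom

SAMPLE = (
    "<catalog>"
    "<book id='b1'><title>Dive In</title></book>"
    "<book id='b2'><title>Cookbook</title></book>"
    "</catalog>"
)


def titles_with_minidom(xml_text):
    """Build the full DOM tree, then query it."""
    doc = minidom.parseString(xml_text)
    try:
        return [t.firstChild.data for t in doc.getElementsByTagName("title")]
    finally:
        doc.unlink()  # break reference cycles in the tree


def books_with_pulldom(xml_text):
    """Stream events and build a DOM subtree only for each <book>."""
    found = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            events.expandNode(node)  # pull in this element's children
            title = node.getElementsByTagName("title")[0].firstChild.data
            found.append((node.getAttribute("id"), title))
    return found


titles = titles_with_minidom(SAMPLE)
books = books_with_pulldom(SAMPLE)
```

The pulldom version never holds more than one `book` subtree at a time, which is the point of loading parts of a document only as requested.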
Data Bindings and Specialized APIs
SAX and DOM are perhaps the best-known XML processing APIs, but there
are many projects that strive for an API that focuses on the strengths
of Python. |
|
| Name |
Description |
Vitality |
| Anobind |
Anobind is a data-binding tool that provides for customized bindings
using XPath and Python patterns. It supports a subset of XPath on
the data structures, and re-serialization of XML. [1]. |
|
| ElementTree |
A library for managing any sort of hierarchical Python objects in
specialized data structures based on XML elements. It supports a
subset of XPath on the data structures. [1]. [2]. [3]. |
|
| PAX |
Part of OpenTAL, a Python-based templating system for manipulation
of XMLish data, the Pythonic API for XML (PAX) parses an XML file
into a Pythonic data structure, using iterators for some APIs. It
also provides a transformation engine. |
|
| POM |
The Python Object Model for XML is a DOM-like library, but more
closely follows Python conventions. POM objects can also enforce
DTD constraints dynamically during API manipulations. POM is a component
of PyNMS, a collection
of Python (and some C) modules for use in network management applications.
|
|
| Python XML Marshaller * |
Python XML Marshaller is a Python data-binding for
XML with some WXS support, including the ability to generate WXS
from Python data structures. It also offers some features for customizing
the binding. |
|
| SOX * |
Simple Objects from XML (SOX) is a part of the Python
Enterprise Application Kit (PEAK). SOX uses SAX events to build
a Python object the user can define based on specialized classes.
|
|
| Satine |
Satine converts XML documents to Python lists of
objects that have Python attributes mirroring the XML element attributes,
called the "xlist" data structure. It also has a web services module that supports
plain XML and SOAP over HTTP. |
|
| Skyron |
Skyron is a Python module that transforms XML documents according
to simple "recipe" files expressed in XML. These recipes
bind XML data to handler code in Python. |
|
| XBind |
XBind is an XML vocabulary for specifying language-independent data
bindings. It comes with a prototype Python implementation (see
section 7 of the XBind tutorial for a link). |
|
| XElf |
XElf is a set of modules dedicated to XML processing for Python.
It currently features a Python XOM implementation, including support
for Namespaces and XML Base. XOM is Elliotte Rusty
Harold's XML Object Model for Java, intended to improve upon DOM
and JDOM. |
|
| XMLObject * |
XMLObject allows you to map from customized Python classes to XML,
and vice versa. |
|
| generateDS.py |
generateDS.py is a tool for generating Python data
structures from W3C XML Schema definitions. [1]. |
|
| gnosis.xml.objectify |
This module in Gnosis Utilities turns arbitrary XML documents into
Python objects, allowing for user customization of the conversion.
[1]. |
|
| xmlite |
xmlite is a lightweight XML parser and printer that emits simple
nested lists. |
|
| xmltramp |
xmltramp turns an XML document into a Python data structure with
heavy use of dictionaries. [1]. |
|
| |
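To give a feel for what a Pythonic API looks like in practice, here is a small sketch in the ElementTree style. It assumes a current Python 3, where ElementTree is bundled as `xml.etree.ElementTree`; at the time of this survey it was a separate download. The sample inventory document is invented for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<inventory>
  <section name="health"><item price="4.95">Soap</item></section>
  <section name="food">
    <item price="1.50">Bread</item>
    <item price="2.25">Milk</item>
  </section>
</inventory>"""

root = ET.fromstring(SAMPLE)

# The supported XPath subset includes simple attribute predicates.
food_items = [item.text for item in root.findall("./section[@name='food']/item")]

# Elements behave like ordinary Python objects: iterate, read attributes.
prices = {item.text: float(item.get("price")) for item in root.iter("item")}

# Add a new element and re-serialize the tree.
health = root.find("section[@name='health']")
ET.SubElement(health, "item", price="3.10").text = "Shampoo"
serialized = ET.tostring(root, encoding="unicode")
```

Compared with DOM there are no node-type checks or `firstChild.data` chains: text and attributes are plain strings reached through ordinary method calls.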
XPath and XSLT
XPath and XSLT are perhaps the most universal XML processing tools.
XSLT is not just a styling tool but a full-blown (if verbose) scripting
language for XML. XPath is embedded in almost every other XML technology
you can think of. |
|
| Name |
Description |
Vitality |
| 4XSLT |
4XSLT is part of 4Suite, as is 4XPath and 4XPointer.
4XSLT supports a large portion of EXSLT.
[1]. [2]. [3]. |
|
| Pyana |
Pyana is a Python extension module wrapping the
Xalan XSLT engine. |
|
| libxslt/Python |
This Python extension module is a wrapper for libxslt. It supports a large portion of EXSLT.
|
|
| |
Schema Languages (Not Built into Parsers)
Schema languages allow one to communicate XML formats, validate that
instances match the constraints, and even assess convenience features
for the XML formats. DTD is the original schema language, and is usually
implemented in XML parsers (and so most implementations are covered in
the section on parsers). |
|
| Name |
Description |
Vitality |
| Scimitar * |
Scimitar is an ISO Schematron
implementation that works by compiling a Schematron schema into
a Python validator script. |
|
| XSV |
XSV is a W3C XML Schema (WXS) implementation. It is actually one
of the first WXS implementations, and drives the W3C's on-line validator. |
|
| XVIF |
XVIF implements RELAX NG, enhanced with the XML Validation Interoperability Framework for XML processing pipelining. It includes an implementation
of XML Regular Fragmentations. 4Suite
includes experimental RELAX NG and XVIF integration through this
software. |
|
| gnosis.xml.validity |
This module in Gnosis Utilities represents XML DTD validity constraints
as Python objects. |
|
| minixsv * |
minixsv is a lightweight W3C XML Schema validator
written in pure Python. It implements a small but core subset of
the language. |
|
| |
XML Generators
These are Python tools that can be used to generate XML. |
|
| Name |
Description |
Vitality |
| Atox * |
Atox allows you to write custom scripts for converting
plain text into XML. You define the text to XML binding using a
simple XML language. It's meant to be used from the command line.
Changes since 0.1 include language improvements, added support for
config files, and XSLT fragments in Atox format files. |
|
| GraphPath * |
GraphPath is a little XPath-like language for analysing graph-structured
data, especially RDF. The implementation is in Python and works with
rdflib or the Python binding of Redland. It includes a query evaluator
and a goal-driven inference engine. |
|
| JAXML
|
JAXML is a Python module that provides a Python
function invocation syntax for generating XML or HTML. [1]. |
|
| Martel * |
Martel is a tool for converting flat-file, text-based formats into XML,
inspired by data formats popular in bioinformatics. It essentially
generates SAX events from the results of applying regular expressions
to text. |
|
| PXTL |
PXTL ("Python XML Templating Language") is a tool for producing
XML, HTML and other text-based document types using XML templates.
|
|
| PyGenx * |
PyGenx is a Python wrapper for Genx,
a canonical XML generation library written in C. |
|
| XMLBuilder * |
XMLBuilder is an XML generator that works by interpreting
data in Python dictionaries. |
|
| handyxml * |
handyxml is a Python module that wraps XML parsers
and parsed DOM implementations into objects with added Pythonic
features. |
|
| xmlprinter |
A lightweight Python module to help write out well-formed XML, inspired
by Perl's XML::Writer module. [1]. |
|
| |
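As an illustration of the dictionary-driven style of XML generation, here is a minimal sketch. It is not the API of XMLBuilder or of any other package in the table: the `dict_to_element` helper and its conventions (dictionary keys become element names, list members become `item` elements) are hypothetical. It relies on `xml.etree.ElementTree` from a current Python 3 standard library.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Recursively turn nested dicts, lists and scalars into an Element.

    Hypothetical convention: dict keys name child elements, and each
    member of a list or tuple becomes an <item> child.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_element(key, child))
    elif isinstance(value, (list, tuple)):
        for child in value:
            elem.append(dict_to_element("item", child))
    else:
        elem.text = str(value)
    return elem


record = {"name": "Café Olé", "tags": ["a<b", "x&y"]}
xml_text = ET.tostring(dict_to_element("record", record), encoding="unicode")
```

Leaving serialization to a real XML library means markup characters in the data are escaped, and non-ASCII text survives a round trip through a parser.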
Protocols
One of the earliest and most discussed uses of XML is to transmit data
from one application or machine to another. These tools provide such XML
protocol facilities for use in Python. |
|
| Name |
Description |
Vitality |
| Python
Web Services |
This is a collection of Python modules for SOAP,
WSDL and related technologies. |
|
| WDDX/Python |
PyXML comes with a WDDX module for Python. |
|
| XMLTP Light |
XMLTP/L is a lightweight XML-like RPC protocol
(it actually only allows a subset of XML). XMLTP/L is primarily
designed for fast RPC calls to a database server over an intranet.
It is implemented in Python and C, although bindings can also be
written in Java. |
|
| wsdl4py
|
wsdl4py is a simple Python library for WSDL processing.
See also uddi4py. |
|
| xmlrpclib |
Python versions from 2.1 up bundle XML-RPC client
and server modules. |
|
| |
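To show what the bundled XML-RPC support does on the wire, here is a brief sketch that marshals a call and its response without opening a network connection. It assumes a current Python 3, where the `xmlrpclib` module listed above lives on as `xmlrpc.client`. The method name `math.add` is made up for illustration.

```python
import xmlrpc.client

# Client side: encode a call to a (hypothetical) remote method.
request_xml = xmlrpc.client.dumps((2, 3), methodname="math.add")

# Server side: decode the request and compute a result.
params, method = xmlrpc.client.loads(request_xml)
response_xml = xmlrpc.client.dumps((sum(params),), methodresponse=True)

# Client side again: decode the response payload.
(result,), _ = xmlrpc.client.loads(response_xml)
```

Both `request_xml` and `response_xml` are ordinary XML documents, so any of the parsers listed earlier can inspect them.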
RDF and Topic Maps
The Resource Description Framework is a system for managing metadata.
Its primary serialization syntax is an XML vocabulary. These are Python
tools for processing this RDF/XML syntax. |
|
| Name |
Description |
Vitality |
| 4RDF |
4RDF is part of 4Suite. It includes an RDF/XML and
NTriples parser, RDF store system, Python triples API and an implementation
of the Versa query language. |
|
| Pyrple * |
Pyrple is a small RDF API in Python, with support
for parsing RDF/XML, N3, and N-Triples formats. |
|
| RDFLib |
RDFLib, which used to be part of Redfoot, is an
RDF/XML parser and RDF triple store. |
|
| Redfoot |
Redfoot is an RDF server written in Python. |
|
| Redland/Python
|
This is a Python interface for the Redland RDF Application Framework.
|
|
| Rx4RDF * |
Rx4RDF is a specification and reference implementation for querying,
transforming, and updating W3C's RDF by specifying a deterministic
mapping of the RDF model to the XML data model defined by XPath.
Rx4RDF shields developers from the complexity of RDF by enabling
you to use familiar XML technologies such as XPath, XSLT, and XUpdate.
Rx4RDF also forms the basis of Racoon, similar to the popular
Cocoon framework, but using RDF and Python rather than XML/XSLT
and Java. |
|
| TRAMP |
TRAMP is a data-binding-like map between RDF/XML documents and Python
objects. |
|
| rdfxml.py |
A lightweight SAX-based RDF/XML parser. |
|
| tmproc |
tmproc is a Python implementation of XML Topic Maps, based on ISO/IEC 13250 Topic
Maps. |
|
| |
Miscellany
In this category is software that does not fall into any other area.
|
|
| Name |
Description |
Vitality |
| 4XLink |
4XLink is part of 4Suite. It implements a portion of XLink. |
|
| 4XUpdate |
4XUpdate is part of 4Suite. It is a Python implementation
of XUpdate. It can be
used to apply difference patches generated by XMLDiff. [1]. |
|
| Berkeley DB XML Python Module |
Berkeley DB XML is an XML DBMS that includes a
Python API that mirrors the C++ and Java APIs. |
|
| Pyxie |
Pyxie is a line-oriented XML processor. |
|
| XIST |
XIST is a Python web-page generator that operates using a DOM-like view of source XML documents. |
|
| XMLFilter * |
XMLFilter provides a fallback SAX parser/driver
to avoid SAXReaderNotAvailable errors that users encounter on some
platforms. It also offers a safety net against the XMLGenerator
bug that bit me earlier in this series. Its main feature, however,
is a framework for SAX filters. [1]. |
|
| XMLTools |
XMLTools is a small suite of tools that includes a graphical XML
tree viewer and editor for the GTK windowing library. |
|
| XMLdiff |
XMLdiff is a Python tool that figures out the significant differences
between two XML files or DOM trees. It can generate XUpdate output. |
|
| c14n.py |
c14n.py is part of PyXML. It implements XML canonicalization. [1]. |
|
| gnosis.xml.indexer
|
This module in Gnosis Utilities creates full-text indexes of XML
or plain-text files. |
|
| xml.sax |
Python versions from 2.0 up bundle a SAX module. [1]. [2]. [3]. [4]. [5]. |
|
| xmlSiteMakerPy |
xmlSiteMakerPy is a Python-based XML and XSLT framework
for offline (i.e. static) site generation. |
|
| xmlarch |
xmlarch is an XML architectural forms
processor written in Python, using SAX. |
|
| |
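Since SAX filters come up in this table, here is a minimal sketch of one built only from the bundled `xml.sax` package. It is not the XMLFilter package's own API: it uses the standard `XMLFilterBase` class from a current Python 3, and the `DropElementFilter` class and `strip_elements` helper are hypothetical names.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElementFilter(XMLFilterBase):
    """Pass SAX events through, minus one named element and its content."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self._name = name
        self._depth = 0  # greater than zero while inside a dropped element

    def startElement(self, name, attrs):
        if self._depth or name == self._name:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_elements(xml_text, name):
    """Re-serialize xml_text without any element called name."""
    out = io.StringIO()
    reader = DropElementFilter(xml.sax.make_parser(), name)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.StringIO(xml_text))
    return out.getvalue()


cleaned = strip_elements("<doc><keep>a</keep><secret>b<x/></secret></doc>", "secret")
```

The filter sits between the parser and the final handler, so several such filters can be chained by passing one as the parent of the next.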
The Community Marches On
I'm sure I've missed some resources in this update article. If you
know of any I've neglected, please mention them in a comment to this
article and I'll be sure to take note of them for future updates. I
mention new or newly discovered resources at the end of each column
article, and I compile the updates yearly. Certainly anyone working
where Python meets XML should participate on the Python XML SIG mailing
list, and post announcements there. Doing so is the best way to be
sure that I and a lot of others are aware of your work.
This month's regular update starts with something mind-bogglingly
brilliant (if odd). My xmlhack colleague Oleg Paraschenko
created Pysch, a Scheme runtime environment
in Python that he wrote expressly to run the Scheme
tools SXPath and SXSLT
under Python. Pysch already runs these target packages, and according
to Paraschenko, "I think that Pysch can be used to run any Scheme code, after first using third-party tools to process the Scheme code and save
it in XML format for parsing by Pysch." But there is the expected
limitation: "Pysch is very slow. I'm not going to fix it yet. I use
Pysch for research goals and not in production."
Mike Hostetler announced
XMLBuilder 1.3. "You create an XMLBuilder object, send it some
dictionary data, and it will generate the XML for you." I just
mentioned the 1.1 release last month and I only post consecutive updates
upon major changes. That certainly is the case here. It appears this
is the first actually usable version of XMLBuilder. The announcement
says "Support for non-ascii character." I hadn't realized such a
limitation in earlier releases. I applaud the author and contributors
for putting in the work to establish the "XML" in "XMLBuilder."
As I always vehemently argue, it ain't XML if it doesn't support
Unicode. I probably have to weaken this rule a bit for XML
generation code, saving the full strictness for XML parsing code, but
I'm not comfortable with and can't recommend XML generation code that
doesn't support the full character model. See the XMLBuilder announcement.
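For a sense of what supporting the full character model means for a generator, here is a small sketch using the `XMLGenerator` class from the bundled `xml.sax.saxutils` module in a current Python 3. The `write_name` helper is a hypothetical name, and the choice of encodings is only an example.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator


def write_name(text, encoding):
    """Serialize <name>text</name> as bytes in the requested encoding."""
    out = io.BytesIO()
    gen = XMLGenerator(out, encoding=encoding)
    gen.startDocument()  # writes the XML declaration with the encoding
    gen.startElement("name", {})
    gen.characters(text)  # escapes markup characters such as "&"
    gen.endElement("name")
    gen.endDocument()
    return out.getvalue()


utf8_bytes = write_name("Café & Thé", "utf-8")
ascii_bytes = write_name("Café & Thé", "us-ascii")

# A parser recovers the original text from the UTF-8 output.
round_tripped = ET.fromstring(utf8_bytes).text
```

In the UTF-8 output the accented characters are written directly. In the US-ASCII output they fall back to numeric character references, so the result is still a faithful XML document.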
Roland Leuthe released minixsv 0.2, "a lightweight XML schema validator written in
pure Python. It implements only a subset of the W3C XML schema [WXS] 1.0
recommendation." The WXS subset is very limited, but Leuthe admits the
package is "pre-alpha," and I'll keep an eye out for further
developments. minixsv works with the standard minidom or elementtree.
As the page says, "Other DOM implementations can be easily adapted by
implementing a newly derived XML interface class."
The major update rule also applies to my release of Scimitar 0.9.0.
Scimitar is a fast ISO
Schematron implementation that works by compiling a Schematron
schema into a Python validator script. It now supports the full draft
ISO Schematron spec, including variables and abstract patterns. See the
announcement.
Philippe Normand announced XMLObject 0.1.3, a data-binding tool that allows you to map from customized Python classes to
XML, and vice versa. See the announcement.
Fredrik Lundh released ElementTree 1.2.1.
He says: "ElementTree 1.2.1 is 1.2 plus code that takes advantage of new
expat features in newer versions of Python. As a result, the parser is
now 20-30% faster on many kinds of XML documents. Enjoy!"
For users of various .NET Python tools, Srijit Kumar Bhadra posted some useful sample code for generating XML output.
Later he posted some corrections to the code comments.