Should Python and XML Coexist?
by Uche Ogbuji
There Is Such a Thing as Overdoing It
It's one thing to say that XML is often not the best choice for configuration and scripting in Python applications, but one has to be careful not to overstate this fact. Phillip comes close to doing so in his post Python Is Not Java.
This is a different situation than in Java, because compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain. In Python, XML is something you use for interoperability, not your core functionality, because you simply don't need it for that. In Java, XML can be your savior because it lets you implement domain-specific languages and increase the flexibility of your application "without coding." In Java, avoiding coding is an advantage because coding means recompiling. But in Python, more often than not, code is easier to write than XML. And Python can process code much, much faster than your code can process XML. (Not only that, but you have to write the XML processing code, whereas Python itself is already written for you.)
He goes on at length, but it's really just different ways of restating this core paragraph. The second sentence is where overstatement comes in: if XML is used where it should best be used, and ditto for Python code, it shouldn't even make sense to try to compare the two in the same terms. Unfortunately, one sees a lot of misguided overgeneralization starting with valid complaints about where XML is not suitable. The rest of the quoted paragraph is more careful, and it's clear Phillip is not claiming, for example, that people use Python code rather than HTML (or XHTML) to express Web pages. XML is no more a good code format than Python is a good document format.
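To make the configuration case concrete, here is a minimal sketch of the trade-off Phillip describes. The setting names and the XML layout are hypothetical, invented purely for illustration; the XML side uses the standard library's ElementTree, and the point is only that the XML form needs hand-written processing code while the Python form does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings expressed directly as Python: nothing to parse,
# and the values already have the right types.
PY_CONFIG = {"host": "localhost", "port": 8080, "debug": True}

# The same hypothetical settings expressed as XML.
XML_CONFIG = """<config>
  <host>localhost</host>
  <port>8080</port>
  <debug>true</debug>
</config>"""


def load_xml_config(text):
    """Parse the XML form and convert each value to its intended type by hand."""
    root = ET.fromstring(text)
    return {
        "host": root.findtext("host"),
        "port": int(root.findtext("port")),
        "debug": root.findtext("debug") == "true",
    }


xml_config = load_xml_config(XML_CONFIG)
print(xml_config == PY_CONFIG)
```

None of this says anything about documents, where the comparison runs the other way: nobody would want to maintain a long prose document as nested Python literals.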
XML is the result of the meeting of two very distinct worlds: the database/data structure world and the document management world. As a result, XML is reasonably suitable for expressing data structures, and reasonably so for documents as well. I personally argue that XML is much more suited for documents than for data structures, but this is a long-standing debate in the XML community. I do, however, observe that dissatisfaction with XML seems to emerge much more loudly when XML is used to express data structures. The biggest complaints of the document crowd with XML, in my estimation, are its lack of minimization tricks (as in SGML) and its lack of support for overlapping markup. On the other hand, programmer types are much more prone to call into question the entire reason for XML's being, sometimes to the point of overreacting.
I personally consider this to be evidence that the trend toward injecting more and more of the character of programming languages and databases into XML is deeply misguided. W3C XML Schema and XQuery do even more to blur the line between applications and semistructured data. Developers in languages such as Java see this as a good thing because they already rely so heavily on XML that ever-closer union seems natural. Unfortunately, the message is not always clear that users of dynamic languages should consider less complex and rigid alternatives such as RELAX NG and XPath. I have long said that I would rather use Python and XPath to access XML documents and even XML data stores than XQuery, but being familiar with Java/XML APIs, I can understand why XQuery would be attractive in that case.
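As a rough illustration of the Python-plus-XPath style, the following sketch queries a small, made-up document with the standard library's ElementTree. Note the assumptions: the document and its element names are hypothetical, and ElementTree implements only a limited subset of XPath, so this is not a full XPath 1.0 engine; a library such as lxml would be needed for complete XPath support.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
DOC = """<library>
  <book year="2004"><title>First Title</title></book>
  <book year="2005"><title>Second Title</title></book>
  <book year="2005"><title>Third Title</title></book>
</library>"""

root = ET.fromstring(DOC)

# The path expression selects the nodes; ordinary Python does the rest.
# Attribute predicates like [@year='2005'] are within ElementTree's XPath subset.
titles_2005 = [t.text for t in root.findall(".//book[@year='2005']/title")]

# Grouping and counting are plain Python rather than query-language features.
books_per_year = {}
for book in root.findall("book"):
    year = book.get("year")
    books_per_year[year] = books_per_year.get(year, 0) + 1

print(titles_2005)
print(books_per_year)
```

The division of labor is the point: XPath handles node selection, and the host language handles everything else.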
In an interesting twist on this whole matter, even in languages such as Java, there is some backlash emerging against overuse of XML. Some developers rue the need for complex XML in scripting scenarios where it might have been better to use a language such as Jython, which is already tightly integrated into the host language, and is far better suited to writing code than XML.
There is plenty of room for discussion about where XML can be useful to Python programmers, and where it can be a hindrance. There is also plenty of room to discuss which XML-related technologies are well suited to use with Python, and which might be best avoided. I'll cover such matters in coming articles. Meanwhile, it's great to see that the Python community has been doing a lot more than just complaining about XML.
Starting close to home, I pushed Amara XML Toolkit to version 1.0. I've covered Amara here in the past (Introducing the Amara XML Toolkit and Making Old Things New Again). Amara's centerpiece is Bindery, a very Pythonic XML API. The biggest change is a package option that incorporates the prerequisites (from 4Suite), in order to remove one installation step. You no longer need anything except for Python to install Amara from one package in one step. See the announcement.
Walter Dörwald announced XIST 2.11, which I covered in the recent article Writing and Reading XML with XIST. It's a very capable, open source package for XML and HTML processing and generation. The biggest change is a script, xml2xsc.py, which parses a sample XML instance and generates subclasses for XIST. See the announcement for the full catalog of changes, including many more fixes and a few minor API updates.
Christof Hoeke just keeps on pushing out packages. This time it's cssutils 0.8a3, "a Python package to parse and build CSS Cascading Style Sheets." cssutils implements portions of DOM Level 2 Stylesheets CSS interfaces. See the announcement.
Alexander Schremmer announced MoinMoin 1.3.5. MoinMoin is a widely used Wiki server written in Python. It offers several XML features, including Docbook content and XSLT rendering, which saw some work in this release as the 4Suite compatibility was improved. Many more details in the announcement.
John Holland announced pyx12 1.2.0. "Pyx12 is a HIPAA X12 document validator and converter. It parses an ANSI X12N data file and validates it against the Implementation Guidelines for a HIPAA transaction. By default, it creates a 997 response. It can create an html representation of the X12 document or can translate to an XML representation of the data file." This release focuses on XML-output format adjustments and additions, with some bug fixes and performance tweaks. See the announcement.
J. David Ibáñez released itools 0.10.0, a collection of utilities. It includes several XML-related modules, among them itools.xml (a parser with some similarities to pulldom) and support for XLIFF (XML Localization Interchange File Format) and TMX (Translation Memory eXchange). It also includes Simple Template Language (STL), a language for embedding template-processing instructions in XHTML. See the announcement.
Julien Oster announced xmlrpcserver 0.99.1. "xmlrpcserver is a simple to use but fairly complete XML-RPC server module for Python, implemented on top of the standard module xmlrpclib. This module may, for example, be used in CGIs, inside application servers or within an application, or even standalone as an HTTP server waiting for XML-RPC requests." See the announcement, which includes a complete code example.
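For readers who have not seen XML-RPC dispatch in Python, here is a minimal sketch of the general idea using only today's standard library. It is not Julien Oster's xmlrpcserver module and does not reflect its API; it assumes Python 3, where xmlrpclib has become xmlrpc.client, and it relies on the dispatcher's underscore-prefixed `_marshaled_dispatch` hook, which turns a request document into a response document without any HTTP server. The `add` function is a hypothetical example method.

```python
from xmlrpc.client import dumps, loads
from xmlrpc.server import SimpleXMLRPCDispatcher


def add(a, b):
    """Hypothetical method exposed over XML-RPC."""
    return a + b


# The dispatcher maps XML-RPC method names to Python callables.
dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_function(add, "add")


def handle(request_xml):
    """Take an XML-RPC request document and return the response document.

    A CGI script or application server would call something like this with
    the request body; no socket is needed for the dispatch itself.
    """
    return dispatcher._marshaled_dispatch(request_xml)


# Build a request the way a client would, dispatch it, and decode the reply.
request_xml = dumps((2, 3), methodname="add")
response_xml = handle(request_xml)
(result,), _ = loads(response_xml)
print(result)
```

Keeping the dispatch step separate from the transport is what lets the same handler sit behind a CGI, an application server, or a standalone HTTP server.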
Following up on my article Wrestling HTML, I wrote a couple of articles detailing further experiences turning an HTML mess into clean XHTML. In Use Amara to Parse/Process (Almost) any HTML, I showed how to use the HTML Tidy command line to feed HTML to Amara. In Beyond HTML Tidy, I gave a workout to John Cowan's TagSoup command line tool as well as BeautifulSoup.
Leslie Michael Orchard announced a module, xslfilter.py, for WSGI, the Python Web Server Gateway Interface. It uses lxml to optionally run XSLT transforms against XML produced by server code, and sends the result to the client.
Radovan Garabík's announcement of unicode 0.4.7 is a nice follow-up to my last article, which included discussion of the Python Unicode database module. Radovan's tool is a "simple python command line utility that displays properties for a given unicode character, or searches unicode database for a given name," building on that database.
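The two operations described in that quote, showing a character's properties and searching the database by name, can be sketched with the standard unicodedata module. This is an illustration of the underlying database calls, not Radovan's tool; the `describe` and `search` helpers and the scanned code point range are assumptions made for the example.

```python
import unicodedata


def describe(ch):
    """Return a few properties of a single character from the Unicode database."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
    }


def search(term, limit=5, stop=0x3000):
    """Find characters whose name contains `term`, scanning code points below `stop`."""
    term = term.upper()
    hits = []
    for codepoint in range(stop):
        name = unicodedata.name(chr(codepoint), "")
        if term in name:
            hits.append((codepoint, name))
            if len(hits) >= limit:
                break
    return hits


print(describe("\u00e9"))
print(search("snowman"))
# An exact name can be resolved directly, without scanning.
print(unicodedata.lookup("SNOWMAN") == "\u2603")
```

A linear scan is adequate for a command-line lookup; a tool meant for repeated searches might build a name index once instead.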