Should Python and XML Coexist?
by Uche Ogbuji
There Is Such a Thing as Overdoing It
It's one thing to say that XML is often not the best choice for configuration and scripting in Python applications, but one has to be careful not to overstate this fact. Phillip comes close to doing so in his post Python Is Not Java.
This is a different situation than in Java, because compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain. In Python, XML is something you use for interoperability, not your core functionality, because you simply don't need it for that. In Java, XML can be your savior because it lets you implement domain-specific languages and increase the flexibility of your application "without coding." In Java, avoiding coding is an advantage because coding means recompiling. But in Python, more often than not, code is easier to write than XML. And Python can process code much, much faster than your code can process XML. (Not only that, but you have to write the XML processing code, whereas Python itself is already written for you.)
He goes on at length, but it's really just different ways of restating this core paragraph. The second sentence is where overstatement comes in: if XML is used where it should best be used, and ditto for Python code, it shouldn't even make sense to try to compare the two in the same terms. Unfortunately, one sees a lot of misguided overgeneralization starting with valid complaints about where XML is not suitable. The rest of the quoted paragraph is more careful, and it's clear Phillip is not claiming, for example, that people use Python code rather than HTML (or XHTML) to express Web pages. XML is no more a good code format than Python is a good document format.
XML is the result of the meeting of two very distinct worlds: the database/data structure world and the document management world. As a result, XML is reasonably suitable for expressing data structures, and reasonably so for documents as well. I personally argue that XML is much more suited for documents than for data structures, but this is a long-standing debate in the XML community. I do, however, observe that dissatisfaction with XML seems to emerge much more loudly when XML is used to express data structures. The biggest complaints of the document crowd with XML, in my estimation, are its lack of minimization tricks (as in SGML) and its lack of support for overlapping markup. On the other hand, programmer types are much more prone to call into question the entire reason for XML's being, sometimes to the point of overreacting.
I personally consider this to be evidence that the trend toward injecting more and more of the character of programming languages and databases into XML is deeply misguided. W3C XML Schema and XQuery do even more to blur the line between applications and semistructured data. Developers in languages such as Java see this as a good thing because they already rely so heavily on XML that ever-closer union seems natural. Unfortunately, the message is not always clear that users of dynamic languages should consider less complex and rigid alternatives such as RELAX NG and XPath. I have long said that I would rather use Python and XPath to access XML documents and even XML data stores than XQuery, but being familiar with Java/XML APIs, I can understand why XQuery would be attractive in that case.
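As a rough illustration of the Python-plus-XPath style mentioned above, here is a minimal sketch. It assumes Python 3 and the standard library's `xml.etree.ElementTree`, which implements only a limited subset of XPath; the sample document and its element names are invented for this example and do not come from any real data store.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """\
<catalog>
  <book lang="en"><title>Dive Into Python</title><year>2004</year></book>
  <book lang="fr"><title>Apprendre Python</title><year>2003</year></book>
  <book lang="en"><title>Python Cookbook</title><year>2005</year></book>
</catalog>
"""

root = ET.fromstring(DOC)

# The XPath expression selects nodes; the predicate filters on an attribute.
english_books = root.findall("./book[@lang='en']")

# Ordinary Python handles the filtering and sorting that a query
# language would otherwise express in its own syntax.
titles = sorted(
    book.findtext("title")
    for book in english_books
    if int(book.findtext("year")) >= 2004
)
print(titles)
```

The division of labor is the point: XPath addresses the parts of the document, and the host language does the rest. A fuller XPath implementation would allow richer expressions, but the shape of the code would stay about the same.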
In an interesting twist on this whole matter, even in languages such as Java, there is some backlash emerging against overuse of XML. Some developers rue the need for complex XML in scripting scenarios where it might have been better to use a language such as Jython, which is already tightly integrated into the host language, and is far better suited to writing code than XML.
Conclusion
There is plenty of room for discussion about where XML can be useful to Python programmers, and where it can be a hindrance. There is also plenty of room to discuss which XML-related technologies are well suited to use with Python, and which might be best avoided. I'll cover such matters in coming articles. Meanwhile, it's great to see that the Python community has been doing a lot more than just complaining about XML.
Starting close to home, I pushed Amara XML Toolkit to version 1.0. I've covered Amara here in the past (Introducing the Amara XML Toolkit and Making Old Things New Again). Amara's centerpiece is Bindery, a very Pythonic XML API. The biggest change is a package option that incorporates the prerequisites (from 4Suite), in order to remove one installation step. You no longer need anything except for Python to install Amara from one package in one step. See the announcement.
Walter Dörwald announced XIST 2.11, which I covered in the recent article Writing and Reading XML with XIST. It's a very capable, open source package for XML and HTML processing and generation. The biggest change is a script, xml2xsc.py, which parses a sample XML instance and generates sub classes for XIST. See the announcement for the full catalog of changes, including many more fixes and a few minor API updates.
Christof Hoeke just keeps on pushing out packages. This time it's cssutils 0.8a3, "a Python package to parse and build CSS Cascading Style Sheets." cssutils implements portions of DOM Level 2 Stylesheets CSS interfaces. See the announcement.
Alexander Schremmer announced MoinMoin 1.3.5. MoinMoin is a widely used Wiki server written in Python. It offers several XML features, including DocBook content and XSLT rendering, which saw some work in this release as the 4Suite compatibility was improved. Many more details are in the announcement.
John Holland announced pyx12 1.2.0. "Pyx12 is a HIPAA X12 document validator and converter. It parses an ANSI X12N data file and validates it against the Implementation Guidelines for a HIPAA transaction. By default, it creates a 997 response. It can create an html representation of the X12 document or can translate to an XML representation of the data file." This release focuses on XML-output format adjustments and additions, with some bugs and performance tweaks. See the announcement.
J. David Ibáñez released itools 0.10.0, a collection of utilities. It includes some XML-related modules, including: itools.xml (a parser with some similarities to pulldom), itools.schemas, itools.rss (RSS 2.0), itools.xliff (XLIFF--XML Localization Interchange File Format), itools.xhtml, and itools.tmx (TMX--Translation Memory eXchange). It also includes Simple Template Language (STL), a language for embedding template-processing instructions in XHTML. See the announcement.
Julien Oster announced xmlrpcserver 0.99.1. "xmlrpcserver is a simple to use but fairly complete XML-RPC server module for Python, implemented on top of the standard module xmlrpclib. This module may, for example, be used in CGIs, inside application servers or within an application, or even standalone as an HTTP server waiting for XML-RPC requests." See the announcement, which includes a complete code example.
Following up on my article Wrestling HTML, I wrote a couple of articles detailing further experiences turning an HTML mess into clean XHTML. In Use Amara to Parse/Process (Almost) any HTML I showed how to use the HTML tidy command line to feed HTML to Amara. In Beyond HTML Tidy I gave a workout to John Cowan's TagSoup command line tool as well as BeautifulSoup.
Leslie Michael Orchard announced a module xslfilter.py for WSGI--Python Web Server Gateway Interface. It uses lxml to optionally run XSLT transforms against XML produced by server code, and send the result to the client.
Radovan Garabík's announcement of unicode 0.4.7 is a nice follow-up to my last article, which included discussion of the Python Unicode database module. Radovan's tool is a "simple python command line utility that displays properties for a given unicode character, or searches unicode database for a given name," building on that database.
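For a sense of what such a lookup involves, here is a minimal sketch built directly on the standard `unicodedata` module, assuming Python 3. It is not Radovan's tool and does not reproduce its interface; the helper names `describe` and `search` are hypothetical, and the search is limited to the Basic Multilingual Plane to keep the example short.

```python
import unicodedata


def describe(char):
    """Return a few Unicode database properties for one character."""
    return {
        "codepoint": "U+%04X" % ord(char),
        "name": unicodedata.name(char, "<unnamed>"),
        "category": unicodedata.category(char),
    }


def search(fragment, limit=5):
    """Return up to `limit` BMP characters whose name contains `fragment`."""
    fragment = fragment.upper()
    hits = []
    for codepoint in range(0x10000):
        # Unnamed code points fall back to the empty string and never match.
        name = unicodedata.name(chr(codepoint), "")
        if fragment in name:
            hits.append((chr(codepoint), name))
            if len(hits) == limit:
                break
    return hits


print(describe("\u00ed"))
print(search("snowman"))
```

The first call looks up properties for a given character and the second searches by name, which are the two modes described in the announcement. The exact search results depend on the Unicode version bundled with the Python interpreter.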
- Odd synopsis of article
2005-09-01 22:18:52 Uche Ogbuji [Reply]
There is a strange entry on this article on Pythonthreads:
http://www.pythonthreads.com/news/latest/xml-is-useful-with-java-but-with-python......html
I think it's meant to be a synopsis, except that it mixes points from this article with snippets from PJE's Weblog, including pieces I did not quote in this article. Not sure what to make of it, but I worry that it might give the wrong impression of this article.
- XML/RDF/OWL vs. MKR
2005-08-30 04:37:44 Richard H. McCullough [Reply]
I don't like XML/RDF/OWL from the aspect of human
engineering. They are very verbose, which makes
them hard to read and write. Furthermore, they
are not well suited to expressing the meaning of
things in the real world.
I like MKR (because I invented it and) because
it is more user-friendly. It's like a
compatible combination of English, UNIX shell,
Unicon (or Python), and CycL. MKR has built-in
constructs for handling context, definitions,
methods and ontologies.
There is an MKR interface to the Stanford TAP
knowledge base (RDF) and the OpenCyc knowledge
base (CycL), but the interface is not a fully
general translation between MKR/RDF and MKR/CycL.
MKE/MKR is open source, with binaries for Windows
and Linux. See http://rhm.cdepot.net/
- XML
2005-08-27 08:52:13 gedave [Reply]
I've been programming in different languages for over 20 years. I remember the days when one had to pay $5000 for Fortran on VMS. The days when everyone had custom data and configuration file formats that required endless tedious documentation in order for anyone to use the format.
Things are different now. CPUs have the power to allow ASCII format of any kind to be feasible for config and data storage. We have cheap programming languages and numerous options when applying a language to a task.
Now I'm writing a real-time database. We have so much data that storage in ASCII is not an alternative. Fortunately, there is the freely available HDF5 format.
Our users want their data loaded into MS Office tools. I wish they supported HDF5, but a suitable alternative is that they support both HTML and XML.
When I consider how much work technologies such as HDF5 and XML save me in terms of compatibility with other people's software, I am extremely thankful that these "standards" exist. I've never used Python. My expertise is in C++ and Perl. I love scripting languages. I'm sure I would love Python. But if I had to give up one of the technologies I have available for my tasks today, I would give up Perl. I have waited too long for standard file formats to be so popular and with so much free support, even to the point that Bill Gates must conform, that I never want to go back to the way things were.
By the way, I developed a set of C++ classes that slurp up XML files into data structures that are very similar to Perl data structures. C++ is so powerful that with the right classes, many of the advantages of a scripting language are attainable.
:-David
- XML
2005-08-29 15:48:56 Uche Ogbuji [Reply]
I agree about the importance of open data formats. See:
http://www-128.ibm.com/developerworks/xml/library/x-think15/
As for C++, I certainly do not agree. I spent 6 years programming in C++, and though Java seemed to me to be only an incremental improvement, REXX and then Python came as such a breath of fresh air that I cannot imagine how I programmed in such a straitjacket before. I tried just about every class library under the sun, and nothing even comes within miles of approaching the flexibility and productivity of dynamic languages.
- weblog entry on Python and XML
2005-08-26 08:12:00 Martijn Faassen [Reply]
Months ago I posted a weblog entry sparked
by PJE's first posting. So, for those who
are interested in further reading:
http://faassen.n--tree.net/blog/view/weblog/2005/01/30/0
- weblog entry on Python and XML
2005-08-29 15:40:00 Uche Ogbuji [Reply]
That's a very well put response.
- python wrappers to xml
2005-08-25 14:40:57 antont [Reply]
in the interactive python notebooks project, http://www.scipy.org/wikis/featurerequests/NoteBook
where originally the idea was to have the notebooks as Python, it was decided to use XML instead as the document format.
i've been working on the document processing part now, trying to develop nice wrappers to the XML so that a nice Python API would still be provided (now mainly for the gui app that's developed for writing the notebooks, i.e. nbshell). feel free to look at the development version at http://projects.scipy.org/ipython/ipython/file/nbdoc/trunk/notabene/notebook.py
the solution is also problematic, 'cause some of the (data) structure information is duplicated -- e.g. the Log object holds a list of cells, but also the corresponding log xml element (accessible as Log.element) has the Cell.elements as xmltree subelements. but i think this still provides good performance, compared to e.g. xpath queries, and nicer interface (e.g. slicing to get a range of cells).
so this is one example where Python and XML coexist, and are pretty integrated. it's still early in development and feasibility is uncertain, but well, it's a working application too :)
here there's no little languages though, so i'm a bit off topic, but coexistence there sure is.
~Toni
- python wrappers to xml
2005-08-29 15:38:42 Uche Ogbuji [Reply]
Good to hear. I do want to note that the title was a rhetorical question. Python and XML can certainly coexist. I'm glad to hear the partnership is useful for you, as it has been for me.
