Should Python and XML Coexist?
Recently there have been some discussions in the Python community about whether and where XML is useful. As I've mentioned before, the Python community tends to be rather hostile to XML. The recent round of discussions has mixed some of that raw scorn with a bit of nuance, and it seems a good time to examine some of the considerations that shape the intersection between these two popular and powerful technologies.
Avoiding XML Sit-Ups
To start the discussion, let's look at the particular XML usage scenario that sparked the recent debate. Phillip J. Eby is one of the core developers at the Open Source Applications Foundation, whose primary project is Chandler, an enterprise-grade groupware application written in Python. The project includes a component architecture based on parcels, which were originally expressed in XML. Recently, the decision was made to move from XML to Python code itself for expressing parcels. Eby discussed this decision in a weblog post with the provocative title "Chandler Begins Recovery from XML." The contents are somewhat less inflammatory:
Some of you may be thinking back to my Python Is Not Java rant, in which I said that using XML for core application functionality like this was, well, unwise. :) At the PyCon Chandler sprint, it was discovered that the Chandler's homegrown XML schema definition language was a terrible hardship on developers, and so I proposed to replace it with a descriptor-based Python API. That migration was completed recently. With that done, only initialization of data items (such as Chandler's UI components) was done using XML. So, a few weeks ago, I implemented an experimental API for initializing data items, which quickly became quite popular, with some even pointing out the advantages of being able to factor out repetition.
More about that "Python Is Not Java" rant in a bit, but first, here is more from the recent post:
For a while, there was also a proposal to create a new XML format just for UI definition. But my counterproposal for using a simple template class and a classmethod instead was met with great rejoicing.
Many people misunderstood and/or misrepresented my previous position on XML; the case of Chandler should help to clarify it. Chandler still uses XML for WebDAV, for .xrc files, for sharing, and numerous other use cases where it makes at least some sense to do so. The parcel.xml format, however, was pure excise: a verbose additional language to do things that are more cleanly (and efficiently) done in Python code. It was developed to serve a vision of Chandler as a "data-driven" system, and it was supposed to ultimately support things like GUI editors.
Of course, the real sin here was not so much XML per se, as overengineering in advance of requirements. If you're not developing the feature now, it's best not to make a bunch of other design decisions based on what you think the feature will need. A little thing like choosing to put data in XML form can result in a wide variety of additional costs...
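Eby's posts don't show the code, so what follows is only a rough sketch, under my own assumptions, of what a "descriptor-based Python API" plus "a simple template class and a classmethod" might look like in general. The names Attribute, Button, and template are hypothetical and are not Chandler's actual API. The point is that the schema is ordinary Python, so shared settings can be factored out with a function call instead of being repeated in markup.

```python
# Hypothetical sketch: none of these names come from Chandler's real API.

class Attribute:
    """Descriptor that declares a typed attribute with a default value."""

    def __init__(self, type_, default=None):
        self.type_ = type_
        self.default = default

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created (Python 3.6+).
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: return the descriptor itself
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        if not isinstance(value, self.type_):
            raise TypeError(f"{self.name} must be {self.type_.__name__}")
        instance.__dict__[self.name] = value


class Button:
    # The "schema" is just class attributes, declared in Python.
    label = Attribute(str, default="")
    width = Attribute(int, default=80)

    def __init__(self, **values):
        for key, value in values.items():
            # Reject names that the schema does not declare.
            if not isinstance(getattr(type(self), key, None), Attribute):
                raise AttributeError(f"unknown attribute: {key}")
            setattr(self, key, value)

    @classmethod
    def template(cls, **defaults):
        """Return a factory that pre-fills shared settings."""
        def make(**overrides):
            return cls(**{**defaults, **overrides})
        return make


wide = Button.template(width=200)  # shared settings written once
ok = wide(label="OK")
cancel = wide(label="Cancel")
print(ok.label, ok.width)          # OK 200
print(cancel.label, cancel.width)  # Cancel 200
```

Because the type checks live in the descriptor, a mistake such as Button(width="wide") raises a TypeError at the point of use, with no separate schema language to parse.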
Java (and C++, etc.) certainly do have a lot to do with this matter. Such languages might do the job for general application programming, but the basic restrictions of most static languages make them a poor choice for little languages within host applications. If you need a configuration or script file of some sort for an application written in Java, you probably would not want to write that file in Java. There are all sorts of little languages used for such cases, but XML has been dominating the scene lately. It has the advantage of being relatively straightforward to process, internationalized, portable, flexible, and extensible. Certainly Java developers these days see XML everywhere, from Apache Ant to J2EE server configuration.
When migrating from languages such as Java to dynamic languages such as Python, it's easy to forget to reassess the value of XML as a little language. There is much less need for separate little languages in a dynamic language. If an application is written in a dynamic language and its configuration and script files are written in that same language, the instructions in those files can be executed directly in the context of the host process, which provides a tremendous amount of flexibility. (It might also open up security issues if you allow script files from external sources, but I'll skirt that issue in this discussion.) This does not mean that there is never a need for a separate little language. After all, nobody is loudly demanding that regular expressions be replaced with Python syntax. Still, separate little languages are not often needed in Python programs. Using Python itself for scripting gives you the full power of Python, and the script author is not restricted to simple key-value style parameters.
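As a minimal sketch of executing a script "in the context of the host process," the host below compiles a Python configuration source and runs it with exec in a namespace that exposes one host function. The names load_config, CONFIG_SOURCE, and register are hypothetical, chosen only for illustration. Note that the configuration uses a loop rather than three repeated entries, something a key-value format cannot express.

```python
# Hypothetical example: the config text and names are illustrative only.

CONFIG_SOURCE = """
for port in range(8000, 8003):
    register({"host": "localhost", "port": port})
debug = True
"""

def load_config(source, context=None, filename="<config>"):
    """Run Python configuration source and return the names it defines.

    Only use this with trusted files: exec runs arbitrary code.
    """
    namespace = dict(context or {})  # host objects visible to the script
    exec(compile(source, filename, "exec"), namespace)
    namespace.pop("__builtins__", None)  # exec adds this entry; drop it
    return namespace

registered = []
config = load_config(CONFIG_SOURCE, context={"register": registered.append})
print(config["debug"])                  # True
print([s["port"] for s in registered])  # [8000, 8001, 8002]
```

The security caveat above applies directly here: anything passed to load_config can do whatever the host process can do.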
If you pay close attention, you'll notice that Phillip's complaints are all about using little languages in general within Python programs; they really have nothing special to say about XML. The point is that if you don't need to invent a new syntax that Python developers must learn, you shouldn't, because doing so erects an unnecessary hurdle. This should be simple common sense rather than an exhibit in the case against XML.
This basic reasoning applies to most dynamic languages, not just Python, and in many of them XML, because of its popularity, is the easiest target among possible little-language formats. The cutest turn of phrase in this campaign comes from the Ruby world, where the popular Ruby on Rails framework uses the following blurb:
Rails is a full-stack, open source web framework in Ruby for writing real-world applications with joy and less code than most frameworks spend doing XML sit-ups.
The emphasis is mine. I guess this phrase is intended to speak to J2EE developers, who are used to working through layers upon layers of XML in order to set up configuration. I think you already tend to find much less of that in dynamic language projects, and in fact I'd suggest that you are just as likely to find ".ini"-style file formats as XML in dynamic language projects. Of course, ".ini" can be a poor choice for the same reasons as XML.
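To illustrate why ".ini" is itself a little language, here is a small sketch using Python's standard configparser module on a hypothetical [server] section. Every raw value comes back as a string, and types such as integers and booleans depend on the parser's own conventions (configparser accepts "yes", "on", "true", and "1" as true), which is one more syntax for developers to learn.

```python
import configparser

# Hypothetical configuration, for illustration only.
INI_SOURCE = """
[server]
host = localhost
port = 8000
debug = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_SOURCE)

section = parser["server"]
print(repr(section["port"]))        # '8000' (raw values are strings)
print(section.getint("port"))       # 8000
print(section.getboolean("debug"))  # True
```

Unlike a Python configuration file, this format is limited to flat key-value pairs, so there is no way to factor out repetition with a loop or a function.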