Keep it Simple...

March 29, 2000

Edd Dumbill

I've been intrigued by the uptake that Sean McGrath's PYX notation has received over the last few weeks. We featured PYX in Sean's introduction to Pyxie, his open-source XML processing project.

Taking ESIS (a concept from SGML) as his starting point, Sean created a subset of the notation generated by James Clark's nsgmls, one suitable for a lot of everyday XML processing. The real win is the line-oriented nature of the notation: each parsing event sits on its own line, so traditional text-processing tools such as sed and awk can be applied to XML.
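To make the notation concrete, here is a minimal Python sketch that turns a small XML document into PYX-style lines using the standard xml.sax module. It assumes the usual PYX line prefixes ("(" for a start tag, "A" for an attribute, "-" for character data, ")" for an end tag, "?" for a processing instruction). The names PyxHandler and xml_to_pyx are made up for illustration, the escaping rule is an assumption, and none of this is Pyxie's own code.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def _escape(text):
    # Assumed escaping convention: keep every event on one line by
    # escaping backslashes and newlines.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


class PyxHandler(ContentHandler):
    """Hypothetical handler that collects one PYX-style line per event."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # buffer, since a parser may deliver text in pieces

    def _flush_text(self):
        if self._text:
            self.lines.append("-" + _escape("".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.lines.append("(" + name)
        # Attribute lines follow the start-tag line they belong to.
        for attr_name in attrs.getNames():
            self.lines.append("A" + attr_name + " " + _escape(attrs.getValue(attr_name)))

    def endElement(self, name):
        self._flush_text()
        self.lines.append(")" + name)

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.lines.append("?" + target + " " + data)


def xml_to_pyx(xml_text):
    """Return the PYX-style lines for an XML string."""
    handler = PyxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines


if __name__ == "__main__":
    for line in xml_to_pyx('<memo id="1"><to>Sean</to></memo>'):
        print(line)
```

For the sample memo above, the sketch emits the six lines (memo, Aid 1, (to, -Sean, )to and )memo, one per line, which is exactly the shape that grep, sed, and awk are comfortable with.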

Since publication of that first article, we've received news of implementations of PYX processors in Java and Perl, as well as in Omnimark (see sidebar). Just as PYX opens XML up to sed on the command line, it opens XML up to regular expression processing in other programming languages, making for comfortable XML handling.
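As a rough illustration of that regular-expression style, the following Python sketch works directly on PYX-style lines. It assumes the common line prefixes ("(" start tag, "A" attribute, "-" character data, ")" end tag); the sample data and the function names count_elements and text_of are hypothetical.

```python
import re

# Hypothetical PYX-style input, one event per line.
SAMPLE = [
    "(memo",
    "Aid 1",
    "(to",
    "-Sean",
    ")to",
    "(body",
    "-Keep it simple",
    ")body",
    ")memo",
]

START_TAG = re.compile(r"^\((\S+)$")


def count_elements(pyx_lines):
    """Count how often each element name appears as a start tag."""
    counts = {}
    for line in pyx_lines:
        match = START_TAG.match(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


def text_of(pyx_lines, element):
    """Collect character data found inside the named element."""
    depth = 0
    found = []
    for line in pyx_lines:
        if line == "(" + element:
            depth += 1
        elif line == ")" + element:
            depth -= 1
        elif depth > 0 and line.startswith("-"):
            found.append(line[1:])
    return found


if __name__ == "__main__":
    print(count_elements(SAMPLE))
    print(text_of(SAMPLE, "to"))
```

Neither function needs an XML parser or a tree in memory; each reads a line, looks at its first character, and moves on, which is why the same job is just as easy in sed, awk, or Perl.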

PYX and Omnimark

Just to prove that the Perl and Python script junkies don't have the monopoly on hacking up good ideas, Peter Kenny has constructed some Omnimark scripts that perform XML-to-PYX translation. Kenny protested, "Why don't we see any Omnimark examples on", but happily decided to remedy his complaint. (For more on Omnimark, see their home page.)

So what's the big deal with PYX? Well, for a start, it doesn't ignore the rest of computing history. For all our obsession with design patterns and reusable software, software developers have a remarkable capacity for ignoring anything that went before. Refreshingly, PYX brings XML to existing tools, rather than having us throw away all we know in order to work with XML. In addition to building on SGML work, PYX takes advantage of the pipeline model for processing popularized by the UNIX command line, and brings XML within reach of line-oriented tools.

The other thing of note about PYX is that it's a syntax. All its benefits are achieved through a notation. As the current XML trend is to build layer upon layer, it's something of an eye-opener to see what can be done by going the other way.

Thinking about PYX has led me to draw out a couple of points to remember in XML application development. Hopefully these are obvious, but I think they merit repeating.

Stay Cool

There's no doubting that XML is an incredibly exciting technology: SAX, XSLT, the DOM, and other XML technologies all have great power. However, don't re-orient your world around them if they're not appropriate.

Perhaps one of the most tempting sidelines is to start basing your application processing around the DOM. That way madness lies. Why throw away your data model in order to manipulate a large, general-purpose, greedy object model? Apart from the size issue with the DOM, you may well find yourself jumping through hoops if you need to express any data structures other than strictly hierarchical ones.
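One way to keep your own data model in charge is to let an event-based parser fill it in directly, so that no general-purpose tree is ever built. The Python sketch below does this with the standard xml.sax module; the Order class, the element and attribute names, and load_order are all invented for illustration.

```python
import xml.sax
from dataclasses import dataclass, field
from xml.sax.handler import ContentHandler


@dataclass
class Order:
    """Hypothetical application data model, independent of XML."""
    customer: str = ""
    items: list = field(default_factory=list)


class OrderHandler(ContentHandler):
    """Populates an Order from SAX events instead of building a DOM."""

    def __init__(self):
        super().__init__()
        self.order = Order()
        self._text = []

    def startElement(self, name, attrs):
        self._text = []
        if name == "order":
            self.order.customer = attrs.get("customer", "")

    def characters(self, content):
        # Text may arrive in several pieces, so collect it.
        self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.order.items.append("".join(self._text).strip())
        self._text = []


def load_order(xml_text):
    """Parse an XML string straight into the application's own model."""
    handler = OrderHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.order


if __name__ == "__main__":
    doc = '<order customer="ACME"><item>sed</item><item>awk</item></order>'
    print(load_order(doc))
```

The rest of the application then works with Order objects and never needs to know that the data arrived as XML.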

The DOM is just one illustration. The point is, make XML work for you -- not the other way around! (For examples of application strategies that make XML work for you, see Design Patterns in XML Applications by Fabio Arciniegas.)

Power in the Syntax

Don't despise XML's syntax. Programmers without a document-focused heritage may get the urge to put the XML syntax out of sight as soon as possible. Once the data is inside your program, why bother passing it around as XML? So we're seeing (Java-)serialized DOMs being passed between applications, among other such practices. However, that's just not XML. Serialized objects have been around for ages, very useful thank you, but they don't have the advantages of XML.

Although intuitively you might think that a "binary object" would be faster to move around than a text file, think twice. In a very interesting paper, Bruce Martin demonstrated that, in his experience, serializing and de-serializing a DOM object took significantly longer than writing and re-parsing XML!

It may well be that your only application need for now is to be able to import and export data in XML format. If that's the case, don't feel pressured into going any further -- you've already bought into a huge amount of XML's power. I get a little depressed at the thought of every existing computing technology being reinvented using "XML inside" when it doesn't need to be (often this reinvention will be slower and more awkward).

PYX reminds us that a large amount of useful work can be done without any semantic interpretation of an XML document at all. The revolution that XML is bringing about on the Web isn't about object technology, but about an interoperable syntax. It's not about shoveling XML into your application to gain buzzword karma, but about exploring the new avenues of distributed information processing offered by XML.


Some (highly recommended) further reading on these topics can be found in the archives of XML-DEV: What is XML for? and Storing Lots of Fiddly Bits. (Thanks to Leigh Dodds for these links.)