XML.com: XML From the Inside Out

DOM and SAX Are Dead, Long Live DOM and SAX
by Kendall Grant Clark

The Social Dominance of DOM

Len Bullard, in a follow up to his own question, gave a bit more insight into why he'd asked it.

I'm curious about this subject because I keep seeing people who learn just the barebones, get to DOM, then use it for everything...Is XML just that hard to learn, too obscure, too different, or is this just ossification brought on by years of copying code and not looking beneath?

Which offers another way of looking at this issue. In addition to the (for some) psychological oddness of SAX, there's also a kind of social dominance of the DOM. It is, after all, the W3C's blessed XML processing API. It has vastly more corporate marketing and sales and training resources behind it.

Some people who answered Bullard's question said they knew XML programmers who were competent with DOM but simply did not know about SAX at all.

Several others mentioned the issue of programming platform dominance, which, in the XML world, comes down to Microsoft and Java, both of which are rather DOM-friendly, even if for different reasons.

Michael Brennan pointed to a crucial reason DOM is so widely used:

The case of MSXML offers another good example of why many programmers use the DOM. Microsoft's DOM includes integrated XPath support. Developers can load XML into a DOM, then easily query the structure with XPath to extract the data they are after. Switching to SAX adds substantial complexity to the code, which now has to deal with state management strategies to keep track of where it is in the document at any moment.

I've found many developers in the Microsoft world take the integrated XPath support for granted and don't realize that it is not a standard DOM feature (yet).
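To make Brennan's contrast concrete, here is a minimal Python sketch. It is not MSXML code: the sample document and the names `items_via_tree`, `items_via_sax`, and `ItemHandler` are invented for illustration, and the tree half uses `xml.etree.ElementTree`, which is not a W3C DOM and supports only a subset of XPath. The point is only the difference in shape between "load, then query" and "track state while events arrive."

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
XML = """<orders>
  <order id="1"><item>pen</item><item>ink</item></order>
  <order id="2"><item>paper</item></order>
</orders>"""


def items_via_tree(text):
    """Tree approach: load the whole document, then query it with a path."""
    root = ET.fromstring(text)
    return [e.text for e in root.findall("./order/item")]


class ItemHandler(xml.sax.ContentHandler):
    """Event approach: the handler itself must remember where it is."""

    TARGET = ["orders", "order", "item"]

    def __init__(self):
        super().__init__()
        self.path = []      # stack of currently open element names
        self.buffer = None  # collects character data while inside an <item>
        self.items = []

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == self.TARGET:
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self.path == self.TARGET:
            self.items.append("".join(self.buffer))
            self.buffer = None
        self.path.pop()


def items_via_sax(text):
    handler = ItemHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.items
```

Both functions return the same list of item names for the sample document; the SAX version needs the path stack and the text buffer that the tree version gets for free.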

Which actually suggests that sometimes, perhaps often, XPath is just the right tool for the job, and its association -- whether formal or informal -- with DOM matters. But in looking for reasons for DOM's overuse at the expense of wider SAX usage, it's hard to overstate how much it matters that the dominant client-side computing platform, the dominant web browser, and one of the dominant server platforms are all Microsoft products, that MSXML is widely used in all three, and that MSXML is DOM- and XPath-friendly.

The utility of XPath suggests that its role in the repertoire of XML processing may well expand in the future. Again, Michael Brennan made this point as clearly as anyone.

I've been eyeing the dom4j, SAXPath, and Jaxen stuff with great interest, lately...[T]his notion of registering a handler to match subtrees based on XPath is very interesting. Using XPath as the glue between object models that can support an infoset abstraction is also very interesting. We commonly load XML into a DOM just so we can leverage XPath. We use tools for mapping XML elements/attributes to internal data structures and functions using XPath expressions. It would be great to have that same abstraction and ease of implementation without having to load a DOM to do it...

I hope this sort of approach gains wider acceptance and adoption. I think having the sort of abstraction that Jaxen affords offers far greater potential in the long run than looking toward the DOM (or even SAX) as the glue between XML and other object models.

In the open source community, XML programming often means Java programming. Of course open source languages, like Perl and Python and many others, have good or even excellent XML support. But Java is at the top of the heap in terms of number of tools, corporate support for those tools, number of training materials, and so on. While SAX support is very good in Java, there is an embarrassment of riches from which to choose when considering Java DOM programming, as Bob McWhirter suggested.

I think that in the open-source Java world, focus has been more on the infoset than on any given object-model. Since we have JDOM, dom4j, EXML, along with normal DOM, and only certain utilities are supported under certain models (ie, Xalan won't work with dom4j Documents directly), there's been a lot of work on translating one model to another.

Then, you have things like dom4j's ElementHandler interfaces, which allow folks who are used to processing object trees to deal appropriately with very large datasets. You can register a handler to match particular subtrees. Do whatever processing you need (including XPath expressions), and then detach the sub-tree, freeing up memory for the rest of the parse...

In my experience, it's not just DOM vs SAX, but competition between the DOMs (sometimes mixing several in the same application) and SAX. And typically, dom4j's sub-tree mechanisms have kept me from having to venture into hard-to-maintain SAX code.
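The subtree-at-a-time pattern McWhirter describes can be approximated in Python with `xml.etree.ElementTree.iterparse`. This is a rough analogue under stated assumptions, not dom4j's ElementHandler API: `process_subtrees`, `order_totals`, and the sample document are hypothetical names made up for this sketch.

```python
import io
import xml.etree.ElementTree as ET


def process_subtrees(source, tag, handler):
    """Call handler(elem) for each completed <tag> subtree, then empty it.

    iterparse yields an element once its end tag has been read, so the
    handler sees a complete small tree while the rest of the document
    is still being streamed.
    """
    for _event, elem in ET.iterparse(source):  # default: "end" events only
        if elem.tag == tag:
            handler(elem)
            # Drop the subtree's children, text, and attributes to free
            # memory. The emptied element itself stays attached to its parent.
            elem.clear()


def order_totals(source):
    """Example handler usage: sum item prices per <order> subtree."""
    totals = []

    def on_order(order):
        # Inside the handler, ordinary path queries work on the subtree.
        totals.append(sum(float(i.get("price")) for i in order.findall("./item")))

    process_subtrees(source, "order", on_order)
    return totals


# Hypothetical sample data for the sketch.
SAMPLE = (
    b"<orders>"
    b'<order><item price="1.5"/><item price="2.5"/></order>'
    b'<order><item price="10"/></order>'
    b"</orders>"
)
```

Calling `order_totals(io.BytesIO(SAMPLE))` processes one order at a time, so only the current subtree's contents need to be held in memory.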

In short, when processing XML in Java, there are often good technical reasons why the DOM will work and the additional (for some) burdens of doing it the SAX way can be avoided. It's not at all clear that this richness of DOM implementations owes to a technical feature of Java itself; it's rather more likely that it's a function of Java's market dominance in server-side Internet projects, which was one of the first areas of obvious XML utility.

The Way Forward

Given this confluence of technical and social and psychological factors, the decision of whether to use DOM or SAX or both or neither is a lot more complicated than the standard account often suggests. The issues are much more complex than simply memory usage.

And it may be a strategically crucial decision, as Bob Hutchinson reminded us.

What's the consequence of getting it wrong? Serious trouble. You end up with slow, ugly, unmaintainable code. Worse, I've seen developers using the resultant mess to avoid using XML altogether (we really are still in the early days of XML).

So what, as Mike Champion said, is the way forward? Well, as with any other monarchy, there are always anti-monarchists hanging about, waiting to depose the King and Queen, desperate to offer the masses an alternative. As go kingdoms, so goes the XML world. Several alternatives to DOM and SAX were mentioned, including XML data binding, XML pull parsers, and other combinations of tree and event, in-memory and seriatim processing.
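As one illustration of combining tree and event processing, Python's standard library includes `xml.dom.pulldom`, where the application pulls events from the parser and may expand any element of interest into a small DOM subtree. The sketch below is only an example of the idea; the function name `titles_at_or_above`, the sample catalog, and the price threshold are assumptions introduced here.

```python
from xml.dom import pulldom

# Hypothetical sample document for the sketch.
CATALOG = """<catalog>
  <book price="10"><title>DOM Basics</title></book>
  <book price="45"><title>SAX in Depth</title></book>
</catalog>"""


def titles_at_or_above(text, min_price):
    """Pull events one by one; build a DOM subtree only for matching books."""
    titles = []
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            if float(node.getAttribute("price")) >= min_price:
                # Read ahead to the matching end tag and attach the
                # children to this node, giving a DOM subtree for it alone.
                events.expandNode(node)
                title = node.getElementsByTagName("title")[0]
                # Join text nodes in case the text arrived in chunks.
                titles.append("".join(
                    child.data for child in title.childNodes
                    if child.nodeType == child.TEXT_NODE
                ))
    return titles
```

The caller drives the loop, which is the difference from SAX: nothing is pushed at a handler, and the code decides per element whether a tree is worth building.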

The XML development community doesn't need to be told that technical alternatives to dominant paradigms are important, but sometimes it may need to be reminded. That, I think, is one of the virtues of the xml-dev list, which tends to hash and rehash and re-rehash the same technical issues. While that can often look like, and be, just wasted motion, it can also be the spark that ignites alternative approaches and ideas, and that's a very good thing.

Finally, it should be remembered, as Bob Hutchinson wisely pointed out, that despite the occasional bout of technical ennui, these really are the early days of XML. Despite their dominance, DOM and SAX each have their own warts and vices. It is fully to be expected that XML developers will eventually depose DOM and SAX from their high perch, but probably only by making them the foundation of every other high-level XML processing API of the future.

Two participants in the recent discussion suggested precisely that, and I will give them the last word. Paul Tchistopolskii predicted that new, high-level APIs are coming.

My prediction is that the era of low-level lexer (called SAX) and low-level model (called DOM) is over and there will be soon more high-level bindings on top of these low-level APIs (or not on top of them).

I think that asking developers to write all the code in terms of SAX or DOM APIs is like asking them to write programs in assembly language.

Michael Brennan concurred in principle.

I agree entirely. DOM and SAX will be the domain for applications doing generic XML processing. Developers trying to solve business problems will be using tools that abstract away all XML-specific APIs -- either using transformation technologies with high-level modeling tools or declarative mapping/transformation languages, or using data-binding technologies that hide XML beneath simpler and more familiar object models.
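A minimal sketch of what "hiding XML beneath simpler and more familiar object models" might look like follows. It is hand-written binding code rather than any particular data-binding tool, and every name in it (`Order`, `Item`, `load_order`, the element and attribute names in the sample) is hypothetical.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Item:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: str
    items: List[Item]


def load_order(text: str) -> Order:
    """The only place that touches an XML API; callers see plain objects."""
    root = ET.fromstring(text)
    return Order(
        order_id=root.get("id"),
        customer=root.findtext("customer"),
        items=[
            Item(sku=i.get("sku"), quantity=int(i.get("qty")))
            for i in root.findall("item")
        ],
    )


# Hypothetical sample document for the sketch.
ORDER_XML = """<order id="A-17">
  <customer>Example Corp</customer>
  <item sku="pen" qty="3"/>
  <item sku="ink" qty="1"/>
</order>"""
```

Business code can then work with `load_order(ORDER_XML).items` and never mention elements, attributes, or parser events.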

DOM and SAX have reigned for about as long as there have been XML documents to process. Let's hope they endure long enough to spawn more powerful and more graceful heirs.


