DOM and SAX Are Dead, Long Live DOM and SAX
by Kendall Grant Clark

The Social Dominance of DOM

Len Bullard, in a follow-up to his own question, gave a bit more insight into why he'd asked it.

I'm curious about this subject because I keep seeing people who learn just the barebones, get to DOM, then use it for everything...Is XML just that hard to learn, too obscure, too different, or is this just ossification brought on by years of copying code and not looking beneath?

Which offers another way of looking at this issue. In addition to the (for some) psychological oddness of SAX, there's also a kind of social dominance of the DOM. It is, after all, the W3C's blessed XML processing API. It has vastly more corporate marketing and sales and training resources behind it.

Some people who answered Bullard's question said they knew of XML programmers who were competent with DOM but simply did not know about SAX at all.

Several others mentioned the issue of programming platform dominance, which, in the XML world, comes down to Microsoft and Java, both of which are rather DOM-friendly, even if for different reasons.

Michael Brennan pointed to a crucial reason why DOM is so widely used:

The case of MSXML offers another good example of why many programmers use the DOM. Microsoft's DOM includes integrated XPath support. Developers can load XML into a DOM, then easily query the structure with XPath to extract the data they are after. Switching to SAX adds substantial complexity to the code, which now has to deal with state management strategies to keep track of where it is in the document at any moment.

I've found many developers in the Microsoft world take the integrated XPath support for granted and don't realize that it is not a standard DOM feature (yet).

Which actually suggests that sometimes, perhaps often, XPath is just the right tool for the job, and its association -- whether formal or informal -- with DOM matters. But in looking for reasons why DOM is overused at the expense of SAX, it's hard to overstate the significance of three facts: the dominant client-side computing platform, the dominant web browser, and one of the dominant server platforms are all Microsoft products; MSXML is widely used in all three; and MSXML is both DOM-friendly and XPath-friendly.
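To make Brennan's contrast concrete, here is a minimal Python sketch; it is not taken from the discussion. The catalog document, the function names, and the handler class are all invented for illustration. Python's ElementTree stands in for "a DOM with integrated XPath", although it is not a W3C DOM and supports only a subset of XPath. The tree version is a single query; the SAX version has to keep its own stack to know where it is in the document.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
CATALOG = """<catalog>
  <book id="b1"><title>XML Basics</title><price>20</price></book>
  <book id="b2"><title>SAX in Depth</title><price>35</price></book>
  <magazine><title>Markup Monthly</title></magazine>
</catalog>"""

TITLE_PATH = ["catalog", "book", "title"]


def titles_with_tree(xml_text):
    """Load the whole document, then query it with a path expression."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.findall("./book/title")]


class BookTitleHandler(xml.sax.ContentHandler):
    """Collect the same titles from an event stream, tracking position by hand."""

    def __init__(self):
        super().__init__()
        self.path = []     # stack of currently open element names
        self.buffer = []   # text chunks of the title being read
        self.titles = []

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == TITLE_PATH:
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.path == TITLE_PATH:
            self.buffer.append(content)

    def endElement(self, name):
        if self.path == TITLE_PATH:
            self.titles.append("".join(self.buffer))
        self.path.pop()


def titles_with_sax(xml_text):
    handler = BookTitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the two book titles and skip the magazine. The difference in the amount of bookkeeping is the point, and it grows quickly once the query involves attributes, siblings, or several paths at once.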

The utility of XPath suggests that its role in the repertoire of XML processing may well expand in the future. Again, Michael Brennan made this point as clearly as anyone.

I've been eyeing the dom4j, SAXPath, and Jaxen stuff with great interest, lately...[T]his notion of registering a handler to match subtrees based on XPath is very interesting. Using XPath as the glue between object models that can support an infoset abstraction is also very interesting. We commonly load XML into a DOM just so we can leverage XPath. We use tools for mapping XML elements/attributes to internal data structures and functions using XPath expressions. It would be great to have that same abstraction and ease of implementation without having to load a DOM to do it...

I hope this sort of approach gains wider acceptance and adoption. I think having the sort of abstraction that Jaxen affords offers far greater potential in the long run than looking toward the DOM (or even SAX) as the glue between XML and other object models.
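The idea of registering handlers against paths, without first loading a tree, can be sketched on top of plain SAX. What follows is a minimal illustration in Python, not the API of dom4j, SAXPath, or Jaxen. The `PathDispatcher` class, its `on` method, and the sample document are all hypothetical, and it matches only literal absolute paths such as `/orders/order/total`, not real XPath.

```python
import xml.sax


class PathDispatcher(xml.sax.ContentHandler):
    """Call a registered callback when an element with a matching path closes."""

    def __init__(self):
        super().__init__()
        self._callbacks = {}   # "/a/b" -> callable(attrs_dict, text)
        self._stack = []       # (name, attrs, text_chunks) per open element

    def on(self, path, callback):
        """Register callback(attrs, text) for a literal absolute path."""
        self._callbacks[path] = callback

    def startElement(self, name, attrs):
        self._stack.append((name, dict(attrs.items()), []))

    def characters(self, content):
        # Attribute text to the innermost open element only.
        if self._stack:
            self._stack[-1][2].append(content)

    def endElement(self, name):
        path = "/" + "/".join(entry[0] for entry in self._stack)
        _, attrs, chunks = self._stack.pop()
        callback = self._callbacks.get(path)
        if callback is not None:
            callback(attrs, "".join(chunks))


# Hypothetical sample document, invented for this sketch.
ORDERS = b"""<orders>
  <order id="1"><total>10.50</total></order>
  <order id="2"><total>4.25</total></order>
</orders>"""

totals = []
order_ids = []
dispatcher = PathDispatcher()
dispatcher.on("/orders/order/total", lambda attrs, text: totals.append(float(text)))
dispatcher.on("/orders/order", lambda attrs, text: order_ids.append(attrs["id"]))
xml.sax.parseString(ORDERS, dispatcher)
```

The application code never touches the parser's state; it only names the places in the document it cares about. That is the abstraction Brennan is asking for, here in a deliberately crude form.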

In the open source community, XML programming often means Java programming. Of course open source languages, like Perl and Python and many others, have good or even excellent XML support. But Java is at the top of the heap in terms of number of tools, corporate support for those tools, number of training materials, and so on. While SAX support is very good in Java, there is an embarrassment of riches from which to choose when considering Java DOM programming, as Bob McWhirter suggested.

I think that in the open-source Java world, focus has been more on the infoset than on any given object-model. Since we have JDOM, dom4j, EXML, along with normal DOM, and only certain utilities are supported under certain models (ie, Xalan won't work with dom4j Documents directly), there's been a lot of work on translating one model to another.

Then, you have things like dom4j's ElementHandler interfaces, which allow folks who are used to processing object trees to deal appropriately with very large datasets. You can register a handler to match particular subtrees. Do whatever processing you need (including XPath expressions), and then detach the sub-tree, freeing up memory for the rest of the parse...

In my experience, it's not just DOM vs SAX, but competition between the DOMs (sometimes mixing several in the same application) and SAX. And typically, dom4j's sub-tree mechanisms have kept me from having to venture into hard-to-maintain SAX code.
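The subtree technique McWhirter describes has a rough analog in Python's standard library, which may help readers who have not seen it. The sketch below uses `ElementTree.iterparse`; it illustrates the general idea only and is not dom4j's ElementHandler API. The order document and the function name are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def sum_order_totals(stream, tag="order"):
    """Process one <order> subtree at a time, discarding each after use."""
    grand_total = 0.0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            # The subtree is complete at this point, so ordinary
            # tree-style queries work on it.
            grand_total += sum(
                float(item.get("price")) for item in elem.findall("item")
            )
            # Drop the subtree's children, text, and attributes so that
            # memory use does not grow with the number of orders seen.
            elem.clear()
    return grand_total


# Hypothetical sample document, invented for this sketch.
SAMPLE = b"""<orders>
  <order><item price="2.50"/><item price="1.50"/></order>
  <order><item price="10"/></order>
</orders>"""

grand_total = sum_order_totals(io.BytesIO(SAMPLE))
```

Note that `clear()` empties each processed element but leaves the emptied shell attached to its parent, so a very long document still accumulates small stubs; a production version would likely prune those as well.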

In short, when processing XML in Java, there are often good technical reasons why the DOM will work and the additional (for some) burdens of doing it the SAX way can be avoided. It's not at all clear that this richness of DOM implementations is due to a technical feature of Java itself; it's rather more likely that it's a function of Java's market dominance in server-side Internet projects, which was one of the first areas of obvious XML utility.

The Way Forward

Given this confluence of technical and social and psychological factors, the decision of whether to use DOM or SAX or both or neither is a lot more complicated than the standard account often suggests. The issues are much more complex than simply memory usage.

And it may be a strategically crucial decision, as Bob Hutchinson reminded us.

What's the consequence of getting it wrong? Serious trouble. You end up with slow, ugly, unmaintainable code. Worse, I've seen developers using the resultant mess to avoid using XML altogether (we really are still in the early days of XML).

So what, as Mike Champion said, is the way forward? Well, as with any other monarchy, there are always anti-monarchists hanging about, waiting to depose the King and Queen, desperate to offer the masses an alternative. As go kingdoms, so goes the XML world. Several alternatives to DOM and SAX were mentioned, including XML data binding, XML pull parsers, and other combinations of tree and event, in-memory and seriatim processing.
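For readers who have not met a pull parser, the sketch below shows the basic shape using Python's standard `xml.dom.pulldom` module, which also happens to combine event and tree processing. It is one illustration of the style, not any of the specific tools mentioned on the list. The feed document and the function name are invented for the example.

```python
from xml.dom import pulldom

# Hypothetical sample document, invented for this sketch.
FEED = """<feed>
  <entry lang="en"><title>First</title></entry>
  <entry lang="fr"><title>Deuxieme</title></entry>
  <entry lang="en"><title>Third</title></entry>
</feed>"""


def english_titles(xml_text):
    """Pull events from the parser; build a small tree only where needed."""
    titles = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if (event == pulldom.START_ELEMENT
                and node.tagName == "entry"
                and node.getAttribute("lang") == "en"):
            # Ask for a DOM subtree for this one entry; the rest of the
            # document is never materialized as a tree.
            events.expandNode(node)
            title = node.getElementsByTagName("title")[0]
            titles.append("".join(
                child.data for child in title.childNodes
                if child.nodeType == child.TEXT_NODE
            ))
    return titles
```

The application drives the loop and decides when to ask for the next event, instead of the parser calling back into handler methods. That inversion is what distinguishes pull parsing from SAX.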

The XML development community doesn't need to be told that technical alternatives to dominant paradigms are important, but sometimes it may need to be reminded. Providing that reminder, I think, is one of the virtues of the xml-dev list, which tends to hash and rehash and re-rehash the same technical issues. While that can often look like, and be, wasted motion, it can also be the spark that ignites alternative approaches and ideas, and that's a very good thing.

Finally, it should be remembered, as Bob Hutchinson wisely pointed out, that despite the occasional bout of technical ennui, these really are the early days of XML. Despite their dominance, DOM and SAX each have their own warts and vices. It is fully to be expected that XML developers will eventually depose DOM and SAX from their high perch, but probably only by making them the foundation of every other high-level XML processing API of the future.

Two participants in the recent discussion suggested precisely that, and I will give them the last word. Paul Tchistopolskii predicted that new, high-level APIs are coming.

My prediction is that the era of low-level lexer (called SAX) and low-level model (called DOM) is over and there will be soon more high-level bindings on top of these low-level APIs (or not on top of them).

I think that asking developers to write all the code in terms of SAX or DOM APIs is like asking them to write programs in assembly language.

Michael Brennan concurred in principle.

I agree entirely. DOM and SAX will be the domain for applications doing generic XML processing. Developers trying to solve business problems will be using tools that abstract away all XML-specific APIs -- either using transformation technologies with high-level modeling tools or declarative mapping/transformation languages, or using data-binding technologies that hide XML beneath simpler and more familiar object models.
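As a small illustration of what "hiding XML beneath simpler and more familiar object models" can look like, here is a hand-rolled data-binding sketch in Python. It is not drawn from any tool named in the discussion; real data-binding tools typically generate this kind of mapping from a schema. The `Order` and `LineItem` classes and the sample document are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: str
    items: List[LineItem]

    @classmethod
    def from_xml(cls, xml_text):
        """Bind an <order> document to plain objects; callers never see XML."""
        root = ET.fromstring(xml_text)
        return cls(
            order_id=root.get("id"),
            customer=root.findtext("customer"),
            items=[
                LineItem(sku=item.get("sku"), quantity=int(item.get("qty")))
                for item in root.findall("items/item")
            ],
        )


# Hypothetical sample document, invented for this sketch.
ORDER_XML = """<order id="A-17">
  <customer>Ada</customer>
  <items>
    <item sku="widget" qty="2"/>
    <item sku="gadget" qty="1"/>
  </items>
</order>"""

order = Order.from_xml(ORDER_XML)
total_quantity = sum(item.quantity for item in order.items)
```

Code that works with `order` deals in strings, integers, and lists; the tree API is confined to one method, which is the separation Brennan predicts business developers will come to expect.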

DOM and SAX have reigned for about as long as there have been XML documents to process. Let's hope they endure long enough to spawn more powerful and more graceful heirs.


Comment on this article: Is SAX too hard for mortal programmers, and is the domination of DOM a bad thing?

  • Advanced SAX Tutorial
    2001-11-26 15:38:02 Remus Pereni

    In my company I'm worried about the misuse of SAX! Most of my colleagues try to stay informed, and SAX does present a couple of advantages in most cases, one of them the ability to work on large documents. The result is that most of them use SAX for everything that doesn't have to change the XML document. This would be fine if SAX were a superior model in all cases, but there are particular situations in which DOM would be more appropriate. Out of inertia they have almost forgotten about DOM.

    Another issue: SAX is not difficult, and all of them managed to get the basics pretty fast. The problem is that when things get a bit complicated, it is very easy to produce a messy piece of code that does the job. In the end it works, but it's terrible to look at or to work with.

    Last time I tried to find a good tutorial on some advanced SAX techniques, I found none. Maybe that would be a good start: a tutorial on SAX that goes a bit beyond the usual "I have an XML with 3 fields." Let's see how the BIG ONES work and think with SAX in some real situations!


  • DOS 'v' SAX
    2001-11-25 19:32:42 Ron Savage

    I happened to write a short article on this yesterday (25-Nov):
    http://savage.net.au/Ron/XML-ramble.html

  • SAX Rules and DOM drools
    2001-11-21 06:28:24 Mike Jasnowski

    Okay, it's not that bad. But in my experience with the two, SAX provides more control than DOM. Even though DOM Level 2 now provides an event system, SAX still provides the granularity that I need without the overhead and clunky interfaces. Even though my experience with event-based programming may bias me towards SAX's ease of use, I still use DOM regularly. SAX may be harder for some to get their brains around because you have control over how you want the data model to look, whereas DOM makes that decision for you. DOM could be considered the AOL of XML APIs (Kidding!). For most developers' purposes, they don't care what the data model looks like; they just want access to it, and thus the benefits of SAX may seem foreign.

    • SAX Rules and DOM drools
      2001-11-21 06:30:32 Mike Jasnowski

      Just to clarify the last sentence of my post, "Most developers don't care about the data model": that was meant in the general sense, not to mean that developers don't care at all.

  • How about Digester?
    2001-11-18 14:10:18 dion gillard

    It's an Apache Jakarta Commons project, with a couple of stable releases behind it. It mixes the best of both worlds.


    You specify your rules by 'coding' them, and it uses a SAX parser to fire them off.

    • How about Digester?
      2001-11-24 23:04:37 Jeff Turner

      Just what I was thinking. It's a layer on top of SAX, where you register certain actions to be taken when XPath-specified* patterns are met.

      The URL is:

      http://jakarta.apache.org/commons/digester.html

      But most of the docs (incl. examples) are at:

      http://jakarta.apache.org/commons/digester/commons-digester-1.1.1/docs/api/index.html

      * (Well not XPath, just '/foo/bar'-type patterns)

  • Always the same thing!, come on!
    2001-11-16 07:50:20 Eduardo Yanez

    As developers we always have to balance performance against resource requirements. SAX requires few memory resources, while DOM can demand large amounts of memory. I think that we (the developers) have to decide when to use one or the other, because each of them has its own purpose, and which one is better depends on the project objectives. There will always be low-level APIs and high-level APIs, and XML won't be the exception (example: DOM-JDOM). The low-level APIs don't die because of the high-level ones (example: OpenGL-OpenInventor).