XML.com: XML From the Inside Out

DOM and SAX Are Dead, Long Live DOM and SAX

November 14, 2001

Most serious books, tutorials, or discussions of XML processing for programmers mention, even if only in passing, DOM and SAX, the two dominant APIs for handling XML. A general discussion of XML programming which failed to mention DOM and SAX would be as neglectful of its duty as would a monarchical subject who, upon entering the royal chambers, failed to acknowledge the presence of the King and Queen. Failing to acknowledge the potentate is simply one of the things one must never do.

Just as the permissible forms of obligatory acknowledgment of the royal personage are both highly ritualized and customary, so, too, are the forms with which DOM and SAX are customarily introduced. We will be told, one may rest assured, that DOM is a tree-based API, one which builds an in-memory representation of an XML instance. Likewise, we will be told that SAX is an event-driven API, one which, rather than building an in-memory representation of an XML instance, calls event handlers as it encounters, serially, particular features of that instance. The moral of this highly ritualized story is invariably: one may often wish to use DOM parsing, but SAX is helpful when the size of XML instances exceeds available system memory.
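The ritual contrast can be made concrete with a small sketch. The example below uses Python's standard `xml.dom.minidom` and `xml.sax` modules to pull the same titles out of the same document both ways. The sample document and the names `DOC`, `titles_with_dom`, `TitleHandler`, and `titles_with_sax` are invented for illustration and do not come from the discussion.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for this illustration.
DOC = (
    "<books>"
    "<book><title>XML Basics</title></book>"
    "<book><title>SAX in Depth</title></book>"
    "</books>"
)


def titles_with_dom(xml_text):
    """Tree-based: build the whole document in memory, then navigate it."""
    tree = minidom.parseString(xml_text)
    return [node.firstChild.data for node in tree.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """Event-driven: the parser calls these methods as it reads the input."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered until endElement.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(xml_text):
    """Stream the document through the handler; no tree is ever built."""
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the same list for `DOC`. The DOM version is two lines because the program asks the tree questions. The SAX version is longer because the handler has to remember, between callbacks, whether it is currently inside a `title` element.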

One might suppose that every competent programmer with real-world XML experience has a firm grasp of, if not complete mastery of, both DOM and SAX, and that in consequence of such widespread competence there is very little of interest left to be said about them by and to an expert audience. And yet, as a recent XML-DEV discussion seems to have proved, things are not always as one might reasonably suppose, nor are they always as purely technical as one might wish.

DOM and SAX In the Real World

The discussion commenced with a question posed by Len Bullard.

How often do you as experienced XML developers find people in your shop using DOM for work more appropriate to SAX? Have you asked them why and what do they say? What are the costs of picking the wrong API?

Innocent and simple enough. But Bullard is asking a very particular, rather interesting question: do the programmers you work with rely too much on DOM, using it at times when SAX is more appropriate?

The aggregate response to Bullard's question suggests that he was spot-on to ask it. Most respondents have seen less experienced programmers overusing DOM, and no one seemed to represent the contrary position, that of less experienced programmers overusing SAX.

The interesting follow-ons to Bullard's question are to think about why DOM is overused, why SAX isn't, and what the future of XML processing APIs might look like.

The Psychology of SAX

David Hunter's answer to Bullard was suggestive of an important reality -- namely, SAX is hard for some programmers.

A lot of programmers are not really used to event-based programming, as used by SAX. They're more comfortable working with an in-memory object model than in keeping track of context as events are passed in, etc.

Hunter voices here a theme that was repeated, with some variation, through the discussion. It seems that the event-driven nature of SAX processing discomforts some programmers enough that they use DOM processing when possible.

Many other diagnoses of this discomfort were offered. Mike Champion said,

The event processing paradigm is just plain foreign to most people who haven't dealt with low-level grammars/parsers since college, which describes the overwhelming majority of professional programmers, I suspect. (Hmmm, maybe I'm wrong ... the low-level GUI APIs are event driven ... but I'll bet lots of people can handle "OK/Cancel" button event handlers but would be overwhelmed by the detailed thought required to write a SAX application).

Michael Brennan extended this point by pointing out that the event-driven nature of SAX leads to programming by state-machine, with which some programmers also have difficulty.

...[T]he difficulty, here, is not just with the notion of event-based programming, but with conceptualizing the design of a component as a state-machine. I've found many developers have difficulty thinking in those terms for whatever reason.
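To illustrate what designing a SAX component as a state machine can look like, here is a minimal sketch in Python's standard `xml.sax` module. It sums only the prices that appear inside `item` elements, so the handler must track where it is in the document. The order document, the three states, and the names `State`, `ItemPriceHandler`, `ORDER`, and `item_total` are all assumptions made up for this example.

```python
import xml.sax
from enum import Enum, auto


class State(Enum):
    OUTSIDE = auto()   # not inside any <item>
    IN_ITEM = auto()   # inside <item>, but not inside its <price>
    IN_PRICE = auto()  # inside <item>/<price>, collecting text


class ItemPriceHandler(xml.sax.ContentHandler):
    """Sums <price> values, but only those nested directly in an <item>."""

    def __init__(self):
        super().__init__()
        self.state = State.OUTSIDE
        self.total = 0.0
        self._text = []

    def startElement(self, name, attrs):
        if self.state is State.OUTSIDE and name == "item":
            self.state = State.IN_ITEM
        elif self.state is State.IN_ITEM and name == "price":
            self.state = State.IN_PRICE
            self._text = []

    def characters(self, content):
        if self.state is State.IN_PRICE:
            self._text.append(content)

    def endElement(self, name):
        if self.state is State.IN_PRICE and name == "price":
            self.total += float("".join(self._text))
            self.state = State.IN_ITEM
        elif self.state is State.IN_ITEM and name == "item":
            self.state = State.OUTSIDE


# Hypothetical order; the shipping price must not be counted.
ORDER = (
    "<order>"
    "<item><name>pen</name><price>1.50</price></item>"
    "<shipping><price>4.00</price></shipping>"
    "<item><name>ink</name><price>2.25</price></item>"
    "</order>"
)


def item_total(xml_text):
    handler = ItemPriceHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.total
```

Every callback begins by asking which state the handler is in, and that is the conceptual move Brennan describes. A DOM program would instead walk from each `item` node to its `price` child and never need to name a state at all.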

It seems, then, that for some programmers, SAX's event-driven nature, and the kind of conceptual moves that it requires, are off-putting. And, of course, the flip side, which was echoed nicely by Mike Kay, is that DOM processing is more clearly seen to fit with ways of conceptualizing program flow that are far less off-putting. As Kay says,

I think for many inexperienced programmers, the imperative navigational style, where their own program is in control and issues requests to other subsystems, is the only model they really feel comfortable with. It's a control thing, a perception that the job of the programmer is to tell the computer what to do next.

Perhaps it's not just that some programmers are uncomfortable with events and state machines but that SAX is, despite its many virtues and charms, a fairly spartan interface. Bob Hutchinson suggests as much.

I've found that SAX is anything but obvious to the programmers I've worked with, even programmers with extensive GUI experience...And even after being pointed to SAX they don't always have much of an idea of how to proceed. This isn't entirely their fault. We have nice frameworks for dealing with events generated by GUIs. With SAX there is no such thing, that I'm aware of. The developer is faced with a stream of events and no framework for dealing with them. Yes, I know that you can quickly put something together but I've been doing that for years, not every one has.
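As a rough idea of the kind of thing one can "quickly put together", the sketch below layers a tiny callback registry over Python's standard `xml.sax.ContentHandler`, loosely in the spirit of GUI event binding. `PathDispatcher`, its `on` method, and `collect_names` are hypothetical names invented here, not part of SAX or of any library mentioned in the discussion, and the helper handles only the text of leaf elements.

```python
import xml.sax


class PathDispatcher(xml.sax.ContentHandler):
    """Hypothetical helper: routes leaf-element text to callbacks by path."""

    def __init__(self):
        super().__init__()
        self._callbacks = {}
        self._path = []   # stack of currently open element names
        self._text = []   # text seen since the most recent start or end tag

    def on(self, path, callback):
        """Register a callback for a slash-separated path such as 'a/b/c'."""
        self._callbacks[path] = callback

    def startElement(self, name, attrs):
        self._path.append(name)
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        callback = self._callbacks.get("/".join(self._path))
        if callback is not None:
            callback("".join(self._text))
        self._path.pop()
        self._text = []


def collect_names(xml_text):
    """Usage example: gather the text of every order/item/name element."""
    names = []
    dispatcher = PathDispatcher()
    dispatcher.on("order/item/name", names.append)
    xml.sax.parseString(xml_text.encode("utf-8"), dispatcher)
    return names
```

With a helper of this sort the application code registers interest in a path and receives a value, much as a GUI program binds a handler to a button, while the bookkeeping of context stays in one place.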

Two explanations for SAX discomfort ran through the discussion: inexperience and, well, psychology. Many people find SAX hard because it's new and odd. Others find it hard, even after it's no longer new, because it requires a kind of conceptualization which they simply do not favor. There's no right or wrong here, merely the well-known fact that people's brains are wired differently. People understand this pretty well in other fields, and it should not be surprising that it's relevant to computer programming too. No one who's struggled with, or helped others struggle through, for example, Scheme and recursive functions can deny that different programming paradigms often imply the need for different kinds of conceptual ability or propensity.

Michel Rodriguez summed up this point nicely by saying,

I guess I am in the vast majority of programmers that find DOM-type (tree-oriented) processing much easier to grasp than SAX processing. It feels much easier to "be in control" of the document and to act on it than to let it drive my code.
