Uses and Tradeoffs

November 17, 1999



Part 1: XML Programming with C++
Part 2: The SAX-based approach
Part 3: The Object Model-based approach
Part 4: Uses and Tradeoffs

In this section, uses of and tradeoffs between the event-based and object model approaches will be discussed. In particular, three topics will be examined: having XML-aware objects that can be used as handlers, using factories to create/modify objects, and having a document-centered application. The section concludes with a list of C++ parsers and their suitability for the purposes outlined.

Note that the designs outlined here, while useful and used in real applications, are not necessarily the best ones for your project. They are provided only as a guide.

XML-aware objects used as handlers

The first kind of XML module your application might use is a simple XML-aware class designed to be registered with an event-driven parser. Such classes are characterized not only by their role as handlers but also by the self-contained nature of their activities. Using them doesn't usually affect other classes, and parsing an XML document typically results only in changes to their own attributes or in output operations.

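These classes take a simple form. Each one derives from the parser's handler base class and overrides only the callbacks it needs, inheriting the default behavior for the rest.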

Note that the handler base your handler inherits from is not necessarily HandlerBase from SAX; we use the term because every C++ event-driven parser will provide such base classes with a default behavior (HandlerBase in the case of xml4c2 or SAXParser, and expatpp itself in the case of expatpp).
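The sketch below illustrates that form under some stated assumptions. `HandlerBase` here is a hypothetical stand-in with empty default callbacks (`startElement`, `endElement`, `characters`) that take simplified `std::string` arguments. It is not the actual class from xml4c2, SAX, or expatpp, whose signatures differ. The handler, `ElementCounter`, overrides a single callback and only ever changes its own state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical handler base: every callback has an empty default body,
// mirroring the "default behavior" real parsers provide. Real parser
// APIs use different names and signatures.
class HandlerBase {
public:
    typedef std::map<std::string, std::string> Attributes;
    virtual ~HandlerBase() {}
    virtual void startElement(const std::string&, const Attributes&) {}
    virtual void endElement(const std::string&) {}
    virtual void characters(const std::string&) {}
};

// A self-contained XML-aware handler: parsing a document only updates
// its own per-name element counts.
class ElementCounter : public HandlerBase {
public:
    void startElement(const std::string& name, const Attributes&) override {
        ++counts_[name];
        ++total_;
    }
    long total() const { return total_; }
    long count(const std::string& name) const {
        std::map<std::string, long>::const_iterator it = counts_.find(name);
        return it == counts_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, long> counts_;
    long total_ = 0;
};
```

The parser would call these methods as it reads the document. Everything the handler learns stays inside it.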

Numerous examples of this approach come with every parser, and they constitute the simplest, though very useful, form of XML use. Typical examples include counting the elements in a document and printing each element as it is found. Our SAX pretty-printing example illustrates this approach well.
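The printing case can be sketched in the same way. Again, `HandlerBase` and its callback signatures are hypothetical simplifications rather than any real parser's API. `IndentingPrinter` writes each start and end tag as the event arrives. It keeps only the current nesting depth, so nothing of the document itself is retained.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal handler base with empty default callbacks.
struct HandlerBase {
    typedef std::map<std::string, std::string> Attributes;
    virtual ~HandlerBase() {}
    virtual void startElement(const std::string&, const Attributes&) {}
    virtual void endElement(const std::string&) {}
    virtual void characters(const std::string&) {}
};

// Prints each element as it is found, indented two spaces per level.
// Character data falls through to the empty default, and attribute
// values are not escaped in this sketch.
class IndentingPrinter : public HandlerBase {
public:
    explicit IndentingPrinter(std::ostream& out) : out_(out) {}
    void startElement(const std::string& name, const Attributes& attrs) override {
        out_ << std::string(2 * depth_, ' ') << '<' << name;
        for (const auto& a : attrs)
            out_ << ' ' << a.first << "=\"" << a.second << '"';
        out_ << ">\n";
        ++depth_;
    }
    void endElement(const std::string& name) override {
        --depth_;
        out_ << std::string(2 * depth_, ' ') << "</" << name << ">\n";
    }
private:
    std::ostream& out_;
    int depth_ = 0;  // the only document-related state kept
};
```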

These activities could also be performed by traversing the object representation of the document. They are typically done with the event-driven method, however, because it avoids the memory cost of building that representation.

There are lazy ways of building the DOM that reduce memory consumption. Even so, for traversals like these the event-driven approach is generally preferable, both for simplicity and for efficient use of memory.

Factories for the creation or modification of other objects

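The next typical scenario is the definition of factory classes that create or modify other objects according to the information in the XML document. Like the handlers above, these classes are usually registered with an event-driven parser. Their callbacks, however, act on the objects they manage rather than on their own state.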

The first such manipulation that comes to mind is serialization. Making an object persistent using XML is a very important issue that can be handled this way. The responsibility for saving the object as XML could also lie with the object itself. Abstracting that responsibility into a separate class, however, is often preferred.
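Here is a minimal sketch of that separation. It assumes a hypothetical `Contact` class and a made-up `<contact>` element format. The domain object has no knowledge of XML. A separate `ContactSerializer` owns the mapping to XML, including the escaping of special characters.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical domain object: it knows nothing about XML.
struct Contact {
    std::string name;
    long long phone;
};

// Escapes the five characters that are special in XML text and attributes.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Serialization lives outside Contact, so the XML format can change
// without touching the domain class.
class ContactSerializer {
public:
    void write(const Contact& c, std::ostream& out) const {
        out << "<contact name=\"" << xmlEscape(c.name) << "\">"
            << c.phone << "</contact>";
    }
};
```

With this design, a matching event-driven factory could rebuild `Contact` objects from the same format. Neither class needs to be embedded in `Contact` itself.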

Many other uses for this approach can be found. A simple example would be the manipulation of a collection of objects in accordance with XML-encoded instructions sent by a remote object, for example, a collection like:


        #include <functional>
        #include <map>
        #include <string>

        std::map<std::string, long long, std::less<std::string> > directory;

        // A simple STL directory using the associative container map.
        // It holds names (the keys) and phone numbers (the values).
        // long long is used because a number such as 5716124469
        // does not fit in a 32-bit long.
        // The third parameter, less<string>, is a function object
        // used to compare two keys.


could be manipulated by a remote object by sending messages like

<changeNumber name="Borges">5716124469</changeNumber>


to our factory, which would translate them into:

            directory["Borges"] = 5716124469;


Again, as in the simpler handlers described above, the preferred approach for these factories is the event-driven one, since it doesn't involve the cost of keeping the document in memory.
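The following sketch shows how such a factory might react to parser events for the `<changeNumber>` message above. The callback names and the `Attributes` type are hypothetical simplifications of what an event-driven parser would deliver. Character data is accumulated because parsers may split it across several calls. The directory is updated only when the end tag arrives.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Attributes;
typedef std::map<std::string, long long, std::less<std::string> > Directory;

// Hypothetical event-driven factory: turns
// <changeNumber name="...">digits</changeNumber> into directory[name] = digits.
class DirectoryFactory {
public:
    explicit DirectoryFactory(Directory& dir) : dir_(dir) {}

    void startElement(const std::string& element, const Attributes& attrs) {
        if (element == "changeNumber") {
            Attributes::const_iterator it = attrs.find("name");
            inChange_ = (it != attrs.end());
            if (inChange_) { name_ = it->second; text_.clear(); }
        }
    }
    // Character data may arrive in several chunks, so accumulate it.
    void characters(const std::string& chunk) {
        if (inChange_) text_ += chunk;
    }
    void endElement(const std::string& element) {
        if (element == "changeNumber" && inChange_) {
            dir_[name_] = std::stoll(text_);  // throws if the text is not a number
            inChange_ = false;
        }
    }
private:
    Directory& dir_;
    std::string name_, text_;
    bool inChange_ = false;
};
```

The factory holds no copy of the message. It keeps only the pending name and digits until the change can be applied.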

Of course, there are many special cases that must be considered. Take the case of an object that must answer questions from other objects about a particular document (such as "is there any element named H1?"). Here there is a tradeoff between memory consumption and speed, since either a DOM or an event-driven approach could be used. The first consumes more memory but greatly speeds up searches for specific elements in the document. The second uses no memory to represent the document, but every search requires an O(n) pass over the XML document on disk. The final decision must take into account the number of requests, the size of the documents, and the available resources.
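For the specific question in the example, one possible middle ground is to record during a single event-driven pass only what the queries need. This is offered as an assumption rather than a recommendation. The hypothetical `ElementNameIndex` below keeps the set of element names seen. Its memory grows with the number of distinct names rather than with the document, and each query becomes a set lookup instead of a new pass over the file.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical index filled by the parser's start-element callback
// during one pass; answers "is there any element named X?" afterwards.
class ElementNameIndex {
public:
    void startElement(const std::string& name) { names_.insert(name); }

    // O(log k) lookup, where k is the number of distinct names seen.
    bool contains(const std::string& name) const {
        return names_.count(name) != 0;
    }
private:
    std::set<std::string> names_;
};
```

This only helps when the kinds of questions are known in advance. Arbitrary queries about the document still call for either a DOM or repeated passes.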

So far, we have reviewed two mainly event-driven uses. In the next section we will explore cases where DOM is a more natural option.

Document-centered applications

The DOM has many uses, such as exposing an object model of the document in a browser so that an extension language can manipulate the document. We will concentrate on a general case: the use of the DOM as the model in a Model-View-Controller scheme.

Using DOM in a Model View Controller (MVC)

Basically, the MVC defines a structure where there is an underlying model, several views that reflect the state of the model, and a controller that receives requests to change the model. It also ensures that these changes are reported to the views.

This is the case in many document-centered applications. For example, take a simple application that reads a document containing a memo like this:

<?xml version="1.0"?>



       <from>Le roi sanche</from>  



...and shows it to the user, allowing it to be edited and written back to a file.

This, like any other application that needs to manipulate an object representation of the document itself as a base model, is perfect for DOM use.
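The sketch below outlines the MVC roles for the memo application. To stay self-contained, it replaces the DOM with a hypothetical `MemoModel` that maps element names to their text. In a real application, the model would wrap the DOM tree. Views are plain callbacks. The hypothetical `MemoController` changes the model only through `setText`, which then notifies every attached view.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the DOM: each memo
// child element is reduced to a name -> text pair.
class MemoModel {
public:
    typedef std::function<void(const std::string&, const std::string&)> View;

    void attach(View v) { views_.push_back(v); }

    std::string text(const std::string& element) const {
        auto it = fields_.find(element);
        return it == fields_.end() ? std::string() : it->second;
    }
    // Every change goes through the model, which notifies all views.
    void setText(const std::string& element, const std::string& value) {
        fields_[element] = value;
        for (const View& v : views_) v(element, value);
    }
private:
    std::map<std::string, std::string> fields_;
    std::vector<View> views_;
};

// The controller turns user requests into model changes; it never
// updates the views directly.
class MemoController {
public:
    explicit MemoController(MemoModel& m) : model_(m) {}
    void onEdit(const std::string& element, const std::string& value) {
        model_.setText(element, value);
    }
private:
    MemoModel& model_;
};
```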

The basic model of building the tree and modifying the DOM through direct API calls can be refined. Real-life view objects can issue command objects that carry the document fragment targeted by the change, instead of changing the DOM directly. Several strategies for avoiding loading the whole document at once could also be applied.
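A command object that carries its target fragment could look roughly like this. `Element` is a hypothetical miniature tree node standing in for a DOM node, and `SetTextCommand` is an illustrative name. Because the command remembers the previous text, it can also be undone, which is a common reason for routing changes through commands.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tiny element tree standing in for DOM nodes.
struct Element {
    std::string name, text;
    std::vector<std::shared_ptr<Element> > children;
};

// A command holds the fragment it targets plus the requested change,
// so views never modify the tree themselves.
class SetTextCommand {
public:
    SetTextCommand(std::shared_ptr<Element> target, std::string value)
        : target_(std::move(target)), value_(std::move(value)) {}
    void execute() { old_ = target_->text; target_->text = value_; }
    void undo() { target_->text = old_; }
private:
    std::shared_ptr<Element> target_;
    std::string value_, old_;
};
```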

Nevertheless, the basic idea remains what our simple example tries to show: whenever you need to manipulate the document directly as your model (instead of constructed domain-specific objects), DOM constitutes the natural choice.

Widely used parsers

We finish this section by presenting a list of commonly used C++ parsers and their basic information. (For performance comparisons, please refer to Benchmarking XML Parsers.)

Parser                                 | DOM support | SAX support | Validation support | Platforms supported
libxml                                 | YES         | YES         | YES                | Linux, Win32 (possibly many others)
expat (C)                              | NO          | NO          | NO                 | Win32, Unix, and practically everywhere a modern C compiler exists
expatpp (only the wrapper, no extensions) | NO       | NO          | NO                 | Win32, Linux, Mac (and many untested others)
xml4c2                                 | YES         | YES         | YES                | IBM platforms: AIX 4.1.4 and higher; Win32 (MSVC 5.0, 6.0 compilers); Solaris 2.6; HP-UX B10.2 (aCC and CC); HP-UX B11 (aCC and CC); Linux

As you can see, C++ is well suited for XML processing in terms of availability, code size and complexity, conformance, and, to a great extent, portability, not to mention performance. Other languages, such as Java, offer no inherent advantage for XML programming.

All languages have strengths and weaknesses that should be taken into consideration when choosing the language to use for your project. It may be true that C++ has a steeper learning curve, and perhaps C++ programmers can be a little harder to find, but those considerations have nothing to do with C++'s ability to process XML. Great care should be taken when facing marketing hype about supposedly "perfect marriages" between a language and a specific technology. XML is no exception.
