A Survey of APIs and Techniques for Processing XML

July 9, 2003

Introduction

In recent times the landscape of APIs and techniques for processing XML has been reinvented as developers and designers have learned from their experiences and from some past mistakes. APIs such as DOM and SAX, which used to be the bread and butter of XML processing, are giving way to new models for examining and processing XML. Although some of these techniques have become widespread among developers who work primarily with XML, they are still unknown to most developers. Nothing highlights this better than a recent article by Tim Bray, one of the co-inventors of XML, entitled "XML is too Hard for Programmers" and the subsequent responses on Slashdot.

This article provides an overview of the current landscape of techniques for processing XML and runs the gamut from discussing old mainstays, such as push model APIs and tree model APIs as exemplified by SAX and DOM, to newer participants in the XML world such as cursor APIs and pull model parsers as exemplified by the .NET Framework's XPathNavigator and the XmlPull API respectively.

Sample Input Document

To give a clearer idea of what processing looks like under each API style, I created the following sample document, which describes a number of books I own and whether they are currently on loan to friends.


<books>
  <book publisher="IDG books" on-loan="Sanjay">
    <title>XML Bible</title>
    <author>Elliotte Rusty Harold</author>
  </book>
  <book publisher="Addison-Wesley">
    <title>The Mythical Man Month</title>
    <author>Frederick Brooks</author>
  </book>
  <book publisher="WROX">
    <title>Professional XSLT 2nd Edition</title>
    <author>Michael Kay</author>
  </book>
  <book publisher="Prentice Hall" on-loan="Sander">
    <title>Definitive XML Schema</title>
    <author>Priscilla Walmsley</author>
  </book>
  <book publisher="APress">
    <title>A Programmer's Introduction to C#</title>
    <author>Eric Gunnerson</author>
  </book>
</books>

For illustrative purposes, each description of an XML processing technique includes a code sample that displays the names of the people I have loaned books to and which books were loaned to them. All of the code samples should produce the following output.

Sanjay was loaned XML Bible by Elliotte Rusty Harold
Sander was loaned Definitive XML Schema by Priscilla Walmsley

Push Model APIs

In a push model the XML producer (typically an XML parser) controls the pace of the application and informs the XML consumer when certain events occur. The classic example of this is the SAX API, where the XML consumer registers callbacks with the SAX parser, which invokes the callbacks as various parts of the XML document are seen.

The primary advantage of push model APIs is that the entire XML document does not need to be stored in memory; only the information about the node currently being processed is needed. This makes it possible to process large XML documents, which can range from several megabytes to a few gigabytes in size, without incurring massive memory costs in the application. However, it also means that context and state information, such as the parents of the current node or its depth in the XML tree, must be tracked by the programmer.
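As a rough illustration of that bookkeeping, the following sketch keeps its own stack of open element names so that each callback knows the current path and depth. The class name PathTracker and its paths helper are hypothetical, and the sketch assumes the JAXP SAXParserFactory bundled with the JDK rather than any particular parser.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the depth and path of every element it sees.
class PathTracker extends DefaultHandler {

    // Stack of currently open elements; SAX does not keep this for us.
    private final Deque<String> open = new ArrayDeque<>();
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        open.addLast(qName);
        seen.add(open.size() + ":/" + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    // Parses an XML string and returns "depth:/path" for each start tag.
    static List<String> paths(String xml) {
        PathTracker tracker = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return tracker.seen;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>XML Bible</title></book></books>";
        for (String p : paths(xml)) {
            System.out.println(p);
        }
    }
}
```

The stack is the only state the handler needs here, but every additional piece of context (the parent's attributes, sibling counts and so on) would need similar hand-written tracking.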

Another issue with push model parsers is that many developers find callbacks to be an unintuitive way to control program flow. Tim Bray described callbacks as being non-idiomatic and awkward when used from his programming language of choice.

The following code sample uses the SAX API in the Apache Xerces Parser to display the content of the title and author elements of books that have the on-loan attribute set.


    

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

class LoanedBooksFinder extends DefaultHandler {

    // true while we are inside a book that is on loan
    private boolean borrowedBookSeen = false;
    // true while we are inside the title or author of such a book
    private boolean printText = false;

    public void startElement(String uri, String name,
                             String qName, Attributes atts) {

        if (qName.equals("book")) {
            String borrower = atts.getValue("", "on-loan");
            if (borrower != null) {
                borrowedBookSeen = true;
                System.out.print(borrower + " was loaned ");
            }
        } else if (borrowedBookSeen && qName.equals("title")) {
            printText = true;
        } else if (borrowedBookSeen && qName.equals("author")) {
            System.out.print(" by ");
            printText = true;
        }
    }

    public void endElement(String uri, String name,
                           String qName) {

        // text between elements is only whitespace, so stop printing
        printText = false;

        // reset flag and finish the output line
        if (qName.equals("book") && borrowedBookSeen) {
            System.out.println();
            borrowedBookSeen = false;
        }
    }

    public void characters(char[] ch, int start, int length) {

        if (printText) {
            System.out.print(new String(ch, start, length));
        }
    }
}

public class Test {

    public static void main(String[] args) {

        try {
            String parser = "org.apache.xerces.parsers.SAXParser";

            XMLReader xr = XMLReaderFactory.createXMLReader(parser);
            LoanedBooksFinder handler = new LoanedBooksFinder();
            xr.setContentHandler(handler);
            xr.setErrorHandler(handler);

            FileReader r = new FileReader("books2.xml");
            xr.parse(new InputSource(r));

        } catch (SAXException se) {
            System.out.println("XML Parsing Error: " + se);
        } catch (IOException ioe) {
            System.out.println("File I/O Error: " + ioe);
        }
    }
}

It should be noted that to register callbacks one needs to create a class devoted to handling events from the SAX parser, either by implementing the ContentHandler interface or extending the DefaultHandler class.

Pull Model APIs

During pull model processing, the consumer of XML controls the program flow by requesting events from the XML producer as needed instead of waiting on events to be sent to it. This is very similar to the pseudocode described in Tim Bray's post as the typical text processing idiom. Like push model parsers, pull model XML parsers operate in a forward-only, streaming fashion while only showing information about a single node at any given time. This makes pull-based processing of XML as memory efficient as push-based processing but with a programming model that is more familiar to the average programmer.

Two notable pull model XML parsers are the .NET Framework's XmlReader class and the Common API for XML Pull Parsing. Programming against both APIs is fairly similar; one creates a loop that continually reads from the XML document until the end of the document is reached, but acts solely on items of interest as they are seen.

The following code sample uses the .NET Framework's XmlTextReader class to display the contents of the title and author elements of books that have the on-loan attribute set.

using System;
using System.IO;
using System.Xml;

public class Test{

    static void Main(string[] args) {

        try{
            XmlTextReader reader = new XmlTextReader("books2.xml");
            ProcessBooks(reader);

        }catch(XmlException xe){
            Console.WriteLine("XML Parsing Error: " + xe);
        }catch(IOException ioe){
            Console.WriteLine("File I/O Error: " + ioe);
        }
    }

    static void ProcessBooks(XmlTextReader reader) {

        while(reader.Read()){

            //keep reading until we see a book element
            if(reader.Name.Equals("book") &&
               (reader.NodeType == XmlNodeType.Element)){

                if(reader.GetAttribute("on-loan") != null){
                    ProcessBorrowedBook(reader);
                }else {
                    reader.Skip();
                }
            }
        }
    }

    static void ProcessBorrowedBook(XmlTextReader reader){

        Console.Write("{0} was loaned ",
                      reader.GetAttribute("on-loan"));

        while(reader.NodeType != XmlNodeType.EndElement &&
              reader.Read()){

            if (reader.NodeType == XmlNodeType.Element) {

                switch (reader.Name) {
                    case "title":
                        Console.Write(reader.ReadString());
                        reader.Read(); // consume end tag
                        break;
                    case "author":
                        Console.Write(" by ");
                        Console.Write(reader.ReadString());
                        reader.Read(); // consume end tag
                        break;
                }
            }
        }
        Console.WriteLine();
    }
}

Pull model parsers typically do not require a specialized class for handling XML processing since there is no requirement to implement specific interfaces or subclass certain classes for the purpose of registering callbacks. Also the need to explicitly track application states using boolean flags and similar variables is significantly reduced when using a pull model parser.
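For comparison with the SAX sample, the same loop can be sketched in Java. This is only an illustration: it assumes the StAX API (javax.xml.stream) that ships with current JDKs, which is not one of the APIs discussed in this article, and the class name PullLoans is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; assumes the JDK's bundled StAX implementation.
class PullLoans {

    // Returns one line per book that carries an on-loan attribute.
    static String report(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            String borrower = null; // non-null while inside a loaned book

            // The application, not the parser, asks for the next event.
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = r.getLocalName();
                if (name.equals("book")) {
                    borrower = r.getAttributeValue(null, "on-loan");
                } else if (borrower != null && name.equals("title")) {
                    out.append(borrower).append(" was loaned ")
                       .append(r.getElementText());
                } else if (borrower != null && name.equals("author")) {
                    out.append(" by ").append(r.getElementText())
                       .append('\n');
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("XML parsing error", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<books><book publisher=\"IDG books\" on-loan=\"Sanjay\">"
            + "<title>XML Bible</title>"
            + "<author>Elliotte Rusty Harold</author></book></books>";
        System.out.print(report(xml));
    }
}
```

No handler class is registered anywhere; the whole task is an ordinary loop, and the only state kept is the borrower of the book currently being read.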

Tree Model APIs

A tree-based API is an object model that represents an XML document as a tree of nodes. The object model consists of objects that map to various concepts from the XML 1.0 recommendation such as elements, attributes, processing instructions and comments. Such APIs provide mechanisms for loading, saving, accessing, querying, modifying, and deleting nodes from an XML document. The canonical example of a tree model API for processing XML is the W3C XML Document Object Model (DOM), which has inspired various programming language specific variations including JDOM and PyXML, for Java and Python respectively.

Typically, tree model APIs load the entire XML document into memory and thus do not limit users to forward-only access of the XML data. The cost of holding the whole document in memory prevents traditional tree model APIs from being used in situations where large XML documents have to be processed. Although it is possible to build optimized tree model APIs that only load portions of an XML document as needed, such APIs are not in widespread usage.

The following code sample uses the Apache Xerces DOM API to display the contents of the title and author elements of books that have the on-loan attribute set.


import org.apache.xerces.parsers.*;
import org.apache.xerces.dom.*;
import org.w3c.dom.*;

public class Test{

    public static void main(String[] args) {

        try {

            DOMParser parser = new DOMParser();
            parser.parse("books2.xml");

            org.w3c.dom.Document doc = parser.getDocument();

            NodeList list = doc.getElementsByTagName("book");

            for(int i = 0, length = list.getLength(); i < length; i++){

                Element book  = (Element)list.item(i);
                Attr borrower = book.getAttributeNode("on-loan");

                if(borrower != null){
                    System.out.print(borrower.getValue() + " was loaned ");

                    //cast elements to Xerces specific classes
                    //to get access to getTextContent() method
                    ParentNode title = (ParentNode)
                        book.getElementsByTagName("title").item(0);
                    ParentNode author = (ParentNode)
                        book.getElementsByTagName("author").item(0);

                    System.out.println(title.getTextContent() + " by "
                                       + author.getTextContent());
                }
            }

        }catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

The W3C DOM offers fairly limited functionality, primarily because it was designed to be a generic API that could be implemented in a variety of programming languages. This usually means that most people who utilize the DOM API use helper methods, that is, extensions to the DOM API which are specific to particular implementations (such as the call to getTextContent() in the previous sample).
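For readers on a newer toolchain, here is a sketch of the same task without the Xerces-specific cast. It assumes a DOM Level 3 implementation, where getTextContent() is part of the standard Node interface, and uses the JAXP DocumentBuilderFactory bundled with the JDK; the class name DomLoans is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; assumes a DOM Level 3 implementation.
class DomLoans {

    // Returns one line per book that carries an on-loan attribute.
    static String report(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            // The whole document is loaded into memory as a tree.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                if (!book.hasAttribute("on-loan")) {
                    continue;
                }
                String title = book.getElementsByTagName("title")
                    .item(0).getTextContent();
                String author = book.getElementsByTagName("author")
                    .item(0).getTextContent();
                out.append(book.getAttribute("on-loan"))
                   .append(" was loaned ").append(title)
                   .append(" by ").append(author).append('\n');
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<books><book publisher=\"Prentice Hall\" on-loan=\"Sander\">"
            + "<title>Definitive XML Schema</title>"
            + "<author>Priscilla Walmsley</author></book></books>";
        System.out.print(report(xml));
    }
}
```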

Cursor APIs

XML cursors are the newest class of APIs for processing XML. An XML cursor acts like a lens that focuses on one XML node at a time, but, unlike pull-based or push-based APIs, the cursor can be positioned anywhere along the XML document at any given time. In a way, pull model APIs are forward-only versions of a cursor model. Examples of XML cursor APIs are the .NET Framework's XPathNavigator class and the XmlCursor class from BEA's XMLBeans toolkit.

Just like tree model APIs, an XML cursor allows one to navigate, query, and manipulate an XML document loaded in memory. However, an XML cursor does not require the heavyweight interface of a traditional tree model API, where every significant token in the underlying XML must map to an object. This means that XML cursor APIs are potentially more memory efficient than tree model APIs.

More importantly, because node objects do not need to be created for each information item in the underlying XML, it is easier to create "XML views" of non-XML data. Whereas with traditional tree model APIs one had to duplicate an entire data source into a DOM or something similar to present the data source as XML, with XML cursors one can just implement a cursor over that data source that presents the underlying data as XML nodes when viewed through the cursor. This is the concept behind the ObjectXPathNavigator, which uses the .NET Framework's XPathNavigator, enabling one to treat an arbitrary set of objects as a virtual XML document that can be queried with XPath or transformed with XSLT.

The following code sample uses the .NET Framework's XPathNavigator class to display the contents of the title and author elements of books that have the on-loan attribute set.

   

using System;
using System.Xml.XPath;

public class Test{

    public static void Main(string[] args){

        XPathDocument doc  = new XPathDocument("books2.xml");
        XPathNavigator nav = doc.CreateNavigator();

        //select every book with an on-loan attribute
        XPathNodeIterator iterator =
            nav.Select("/books/book[@on-loan]");

        while (iterator.MoveNext()){

            //create a navigator pointing at same position
            XPathNavigator nav2 = iterator.Current.Clone();
            string borrower     = nav2.GetAttribute("on-loan", "");

            nav2.MoveToFirstChild();
            string title        = nav2.Value;
            nav2.MoveToNext();
            string author       = nav2.Value;

            Console.WriteLine("{0} was loaned {1} by {2}",
                              borrower, title, author);
        }
    }
}

Object to XML Mapping APIs

It is often more convenient to map the contents of an XML document to objects that better represent its data than to interact with the data via an XML-based object model. Developers working in object oriented languages typically prefer working with information as objects and classes as opposed to attempting to extract information from untyped XML nodes.

There are numerous advantages to this approach. First, the memory footprint of XML data can be reduced because information isn't being stored as nodes and textual data but as classes and programming language primitives. A DOM node that represents a <foo> element that contains a numeric value as text is more memory intensive than its counterpart foo class with an integer field. In particular, the memory footprint is better if you are able to turn lots of leaf values into primitive valued fields, or if you can do away with parent and sibling pointers. Second, it is more convenient to perform calculations on certain types of data, such as numbers or dates, as native programming language constructs than it is to interact with them as string values stored in nodes. Third, and most compelling, is the improved ease of use. It is no longer necessary to navigate the XML tree to access the information; instead one can simply access data as fields and properties of an object.
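To make the ease-of-use argument concrete, the following hand-written sketch copies each book element into a plain object. It is not the output of JAXB, Castor or any other mapping tool; the MappedBook and BookMapper classes are hypothetical, and the sketch assumes the JAXP DOM parser bundled with the JDK to do the one-time copy.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical plain class standing in for what a mapping tool might generate.
class MappedBook {
    String publisher;
    String onLoan;   // null when the book is not on loan
    String title;
    String author;
}

// Hypothetical hand-written mapper, for illustration only.
class BookMapper {

    // Copies each <book> element into a MappedBook, in document order.
    static List<MappedBook> load(String xml) {
        List<MappedBook> books = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("book");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                MappedBook b = new MappedBook();
                b.publisher = e.getAttribute("publisher");
                b.onLoan = e.hasAttribute("on-loan")
                    ? e.getAttribute("on-loan") : null;
                b.title = text(e, "title");
                b.author = text(e, "author");
                books.add(b);
            }
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not map XML", ex);
        }
        return books;
    }

    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<books><book publisher=\"IDG books\" on-loan=\"Sanjay\">"
            + "<title>XML Bible</title>"
            + "<author>Elliotte Rusty Harold</author></book></books>";
        // After loading, no XML navigation is needed: just fields.
        for (MappedBook b : load(xml)) {
            if (b.onLoan != null) {
                System.out.println(b.onLoan + " was loaned "
                    + b.title + " by " + b.author);
            }
        }
    }
}
```

Once the objects exist, the calling code touches only fields, which is the convenience described above. Anything the mapper does not copy, such as comments or processing instructions, is simply absent from the objects.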

Object to XML mapping technologies have certain limitations that prevent them from replacing traditional methods for accessing XML data. Most of these technologies cannot represent all the information in an XML document with full fidelity. Many do not preserve processing instructions and comments. Similarly mixed content is problematic to map to objects since the tendency is to map element and attribute nodes to objects and text nodes to the values of fields or properties in said objects. Although the order of elements is significant in an XML document, this typically cannot be enforced on objects. Most object oriented languages do not have a way of expressing that in a book class the title field precedes the author field, although one could use ordered collections to get around this problem.

In most cases an XML document's schema (which could be a DTD, W3C XML Schema document, or proprietary schema language) is used as a basis for mapping the XML to native objects in the target programming language. Examples of such Object<->XML mapping technologies include JAXB, the .NET Framework's XmlSerializer and Castor. Another limitation of Object<->XML mapping technologies is that there are often impedance mismatches between XML schema languages such as W3C XML Schema and object oriented concepts.

The following sample shows how to utilize the .NET Framework's XmlSerializer class to display the contents of the title and author elements of books that have the on-loan attribute set.

Obtain the schema for your XML document. Below is a schema for my XML file generated using the Microsoft XSD Inference tool.


    

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    attributeFormDefault="unqualified"
    elementFormDefault="qualified">

  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string" />
              <xs:element name="author" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="publisher" type="xs:string"
                          use="required" />
            <xs:attribute name="on-loan" type="xs:string"
                          use="optional" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

Generate a C# class from the XML schema using xsd.exe.

using System.Xml.Serialization;

/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute("books",
    Namespace="", IsNullable=false)]
public class books {

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("book")]
    public booksBook[] book;
}

/// <remarks/>
public class booksBook {

    /// <remarks/>
    public string title;

    /// <remarks/>
    public string author;

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute
        (Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
    public string publisher;

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute("on-loan",
        Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
    public string onloan;
}

Write code that deserializes the XML document into these classes.

 

using System;
using System.Xml.Serialization;
using System.IO;

public class Test{

    public static void Main(string[] args){

        TextReader reader = new StreamReader("books2.xml");
        XmlSerializer serializer =
            new XmlSerializer(typeof(books));
        books myBooks =
            (books)serializer.Deserialize(reader);
        reader.Close();

        foreach(booksBook book in myBooks.book){

            if(book.onloan != null){
                Console.WriteLine("{0} was loaned {1} by {2}",
                                  book.onloan, book.title, book.author);
            }
        }
    }
}

XML Specific Languages

It seems natural that one would process XML using a language that is designed for processing XML as opposed to going through traditional programming languages. For performing complex operations on XML data, all of the aforementioned techniques suffer from being too cumbersome, requiring too many lines of code, or not handling all of XML. In such cases, the wise decision is to let a language that natively understands XML do the heavy lifting, and to invoke it from the target programming language. Examples of languages specifically designed for processing and manipulating XML include XPath, XQuery, XSLT, and Xtatic.

The following sample is an XQuery expression that displays the contents of the title and author elements of books that have the on-loan attribute set.


    

for $b in document("books2.xml")/books/book[@on-loan]
return (string($b/@on-loan), " was loaned ",
        $b/title/text(), " by ", $b/author/text())

There are various sites where one can try out sample XQuery expressions, including the QEXO XQuery Sandbox and Microsoft's XQuery demo site.
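Invoking an XML-specific language from a host language can be sketched with XPath, which needs no separate processor in Java. The example below assumes the javax.xml.xpath API bundled with current JDKs; the class name XPathLoans is hypothetical. The selection logic lives entirely in the XPath expressions, and the Java code only formats the results.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; assumes the JDK's bundled XPath 1.0 engine.
class XPathLoans {

    // Returns one line per book that carries an on-loan attribute.
    static String report(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();

            // The XPath expression selects the loaned books.
            NodeList books = (NodeList) xpath.evaluate(
                "/books/book[@on-loan]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

            for (int i = 0; i < books.getLength(); i++) {
                Node book = books.item(i);
                // Relative expressions are evaluated against each book.
                out.append(xpath.evaluate("@on-loan", book))
                   .append(" was loaned ")
                   .append(xpath.evaluate("title", book))
                   .append(" by ")
                   .append(xpath.evaluate("author", book))
                   .append('\n');
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<books><book publisher=\"IDG books\" on-loan=\"Sanjay\">"
            + "<title>XML Bible</title>"
            + "<author>Elliotte Rusty Harold</author></book></books>";
        System.out.print(report(xml));
    }
}
```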

Conclusion

This article shows that processing XML isn't simply a choice of in-memory (DOM) versus streaming (SAX). Rather, it's a tradeoff between a number of choices with lots of small nuances. The following table is a supplement to this article: it provides a quick overview of the distinguishing characteristics of the various approaches and techniques.

                                 Push   Pull   Tree   Cursor  Object to XML  XML-Specific
                                 Model  Model  Model  Model   Mapping        Languages
Forward-Only Access (streaming)  X      X
Random Access (in memory)                      X      X       X              X
Schema Required                                               X
Read-only access                 X      X                                    depends
Event based                      X
Emphasizes XML data model        X      X      X      X                      X