An Introduction to StAX
by Elliotte Rusty Harold | Pages: 1, 2

This simple example perhaps doesn't demonstrate the full power of StAX. Since the client application controls the process, it's easy to write separate methods for different elements. These methods can have detailed knowledge of the internal structure of the type of element they handle. For example, you could write one method that handles headers, one that handles img elements, one that handles tables, one that handles meta tags, and so forth. You might process an html element that contains head and body child elements like this:

public void processHtml(XMLStreamReader parser)
    throws XMLStreamException {
  while (true) {
    int event = parser.next();
    if (event == XMLStreamConstants.START_ELEMENT) {
      // each handler is expected to consume its element's end-tag
      if (parser.getLocalName().equals("head")) processHead(parser);
      else if (parser.getLocalName().equals("body")) processBody(parser);
    }
    else if (event == XMLStreamConstants.END_ELEMENT) { // </html>
      return;
    }
  }
}
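To make the dispatch pattern concrete, here is a minimal, self-contained sketch. The class name HtmlDispatchSketch, the skipElement helper, and the list that records which handlers ran are my own illustrative assumptions, not part of StAX. The handlers do nothing useful except consume their element, which is the contract the loop relies on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class HtmlDispatchSketch {

  // Hypothetical handler: records that it ran, then consumes <head>...</head>.
  static void processHead(XMLStreamReader parser, List<String> seen)
      throws XMLStreamException {
    seen.add("head");
    skipElement(parser);
  }

  // Hypothetical handler: records that it ran, then consumes <body>...</body>.
  static void processBody(XMLStreamReader parser, List<String> seen)
      throws XMLStreamException {
    seen.add("body");
    skipElement(parser);
  }

  // Reads up to and including the end-tag matching the current start-tag.
  static void skipElement(XMLStreamReader parser) throws XMLStreamException {
    int depth = 1;
    while (depth > 0) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) depth++;
      else if (event == XMLStreamConstants.END_ELEMENT) depth--;
    }
  }

  // Expects the cursor to be on the <html> start-tag.
  static void processHtml(XMLStreamReader parser, List<String> seen)
      throws XMLStreamException {
    while (parser.hasNext()) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        String name = parser.getLocalName();
        if (name.equals("head")) processHead(parser, seen);
        else if (name.equals("body")) processBody(parser, seen);
        else skipElement(parser); // unknown child: step over it
      }
      else if (event == XMLStreamConstants.END_ELEMENT) { // </html>
        return;
      }
    }
  }

  // Parses the string and returns the handlers that fired, in order.
  static List<String> outline(String xml) throws XMLStreamException {
    XMLStreamReader parser = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(xml));
    List<String> seen = new ArrayList<>();
    try {
      parser.nextTag(); // move from the start of the document to <html>
      processHtml(parser, seen);
    } finally {
      parser.close();
    }
    return seen;
  }

  public static void main(String[] args) throws XMLStreamException {
    String doc = "<html><head><title>t</title></head>"
        + "<body><p>hi</p></body></html>";
    System.out.println(outline(doc)); // prints [head, body]
  }
}
```

Because every handler leaves the cursor on its own end-tag, the only END_ELEMENT the loop in processHtml ever sees is the one for html itself.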

Here I'm making a lot of assumptions about exactly which tags appear where and when. This isn't unusual in XML processing. Most programs are written with particular vocabularies in mind. You wouldn't expect an XHTML outliner to know what to do with a DocBook document, much less an SVG picture, for example. However, it is best to test and verify your expectations about data formats. Normally, this would be done through validation. You can turn on validation by setting the factory's javax.xml.stream.isValidating property to true before instantiating the parser like this:

factory.setProperty("javax.xml.stream.isValidating", Boolean.TRUE);

You would then register an XMLReporter with the XMLInputFactory to receive notices of the validity errors. For example, using an anonymous inner class,

factory.setXMLReporter(new XMLReporter() {
  public void report(String message, String errorType,
    Object relatedInformation, Location location) {
      System.err.println("Problem in " + location.getSystemId());
      System.err.println("at line " + location.getLineNumber()
        + ", column " + location.getColumnNumber());
      System.err.println(message);
  }
});

If you want validity errors to be fatal, throw an XMLStreamException from the report method rather than just printing the error message. However, StAX parsers are not required to be able to validate, and the reference implementation can't, so none of this works yet with that implementation.
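Since validation support varies by parser, it may be worth probing the factory before relying on it. The following is a sketch under that assumption; the class and method names are mine. It uses only isPropertySupported, setProperty, getProperty, and setXMLReporter from XMLInputFactory, and treats an IllegalArgumentException from setProperty as "this parser will not validate".

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

public class ValidationSupportSketch {

  // Returns true only if this factory accepted the request to validate.
  static boolean tryEnableValidation(XMLInputFactory factory) {
    if (!factory.isPropertySupported(XMLInputFactory.IS_VALIDATING)) {
      return false;
    }
    try {
      factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.TRUE);
    } catch (IllegalArgumentException ex) {
      // The property is known, but the value true was refused.
      return false;
    }
    return Boolean.TRUE.equals(
        factory.getProperty(XMLInputFactory.IS_VALIDATING));
  }

  // Registers a reporter that turns every reported problem into an exception.
  static void makeReportedErrorsFatal(XMLInputFactory factory) {
    factory.setXMLReporter((message, errorType, relatedInformation, location) -> {
      if (location == null) throw new XMLStreamException(message);
      throw new XMLStreamException(message, location);
    });
  }

  public static void main(String[] args) {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    makeReportedErrorsFatal(factory);
    if (tryEnableValidation(factory)) {
      System.out.println("This parser says it will validate.");
    } else {
      System.out.println("No validation here; check the document another way.");
    }
  }
}
```

Which branch main takes depends entirely on the StAX implementation found on the classpath, so I make no claim about the output.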

StAX does offer an alternative for simple cases. If you expect a particular item to be present in the document, you can require it using a type and an optional name and namespace. For example, if I think that the cursor is positioned at an XHTML <head> start-tag, I'd require it thusly:

parser.require(XMLStreamConstants.START_ELEMENT,
               "http://www.w3.org/1999/xhtml",
               "head");

If my expectation proves wrong, then the require method throws an XMLStreamException, a checked exception. You can pass null for either the namespace or the element name to indicate that all namespaces and names are acceptable. Putting this all together, the general pattern might be something like:

try {
  parser.next();
  parser.require(XMLStreamConstants.START_ELEMENT,
                 "http://www.w3.org/1999/xhtml",
                 "head");
  processHead(parser);
}
catch (XMLStreamException ex) {
  // Oops! The head was missing!
}
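Here is the same pattern as a complete, runnable sketch. The class name RequireSketch and the method firstChildIsXhtmlHead are illustrative assumptions. It uses nextTag rather than next to step over any whitespace between tags, and it treats any XMLStreamException, including one caused by a malformed document, as "the head was not where I expected it".

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RequireSketch {

  static final String XHTML = "http://www.w3.org/1999/xhtml";

  // True if the first child of the root element is an XHTML <head>.
  static boolean firstChildIsXhtmlHead(String xml) {
    try {
      XMLStreamReader parser = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      try {
        parser.nextTag(); // the root start-tag
        parser.nextTag(); // the first child's start-tag, or the root's end-tag
        parser.require(XMLStreamConstants.START_ELEMENT, XHTML, "head");
        return true;
      } finally {
        parser.close();
      }
    } catch (XMLStreamException ex) {
      // Wrong event type, wrong name, wrong namespace, or malformed input.
      return false;
    }
  }

  public static void main(String[] args) {
    String good = "<html xmlns='" + XHTML + "'><head/><body/></html>";
    String bad = "<html xmlns='" + XHTML + "'><body/></html>";
    System.out.println(firstChildIsXhtmlHead(good)); // prints true
    System.out.println(firstChildIsXhtmlHead(bad));  // prints false
  }
}
```

Note that the namespace argument matters: a head element in no namespace would not satisfy this require call. Pass null for the namespace if you don't care.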

Output

StAX is not limited to reading XML documents. It can also create them. For output, instead of an XMLStreamReader you use, naturally enough, an XMLStreamWriter. This interface provides methods to write elements, attributes, comments, text, and all the other parts of an XML document. An XMLStreamWriter is created by an XMLOutputFactory like this:

OutputStream out = new FileOutputStream("data.xml");
XMLOutputFactory factory = XMLOutputFactory.newInstance();
// pass the same encoding that the XML declaration will name
XMLStreamWriter writer = factory.createXMLStreamWriter(out, "ISO-8859-1");

You write data onto the stream by using various writeFOO methods: writeStartDocument, writeStartElement, writeEndElement, writeCharacters, writeComment, writeCDATA, etc. For example, these lines of code write a simple hello world document:

writer.writeStartDocument("ISO-8859-1", "1.0");
writer.writeStartElement("greeting");
writer.writeAttribute("id", "g1");
writer.writeCharacters("Hello StAX");
writer.writeEndDocument();

When you've finished creating the document, you want to flush and close the writer. This does not close the underlying output stream, so you'll need to close that too:

writer.flush();
writer.close();
out.close();
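Assembled into one piece, the whole write cycle might look like the following sketch. To keep it self-contained I've assumed an in-memory ByteArrayOutputStream and UTF-8 instead of a file and ISO-8859-1; the class name HelloWriterSketch is likewise just for illustration. I've also written the end-tag explicitly rather than leaving it to writeEndDocument.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class HelloWriterSketch {

  // Writes the hello world document into memory and returns it as text.
  static String writeGreeting() throws XMLStreamException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    XMLStreamWriter writer = XMLOutputFactory.newInstance()
        .createXMLStreamWriter(out, "UTF-8");
    try {
      // The declared encoding matches the encoding the writer was created with.
      writer.writeStartDocument("UTF-8", "1.0");
      writer.writeStartElement("greeting");
      writer.writeAttribute("id", "g1");
      writer.writeCharacters("Hello StAX");
      writer.writeEndElement();
      writer.writeEndDocument();
      writer.flush();
    } finally {
      writer.close(); // does not close the underlying stream
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(writeGreeting());
  }
}
```

With a real file you would also close the FileOutputStream yourself, ideally in a finally block or a try-with-resources statement, since closing the writer leaves it open.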

XMLStreamWriter helps maintain some well-formedness constraints. For instance, writeEndDocument closes all unclosed start-tags, and writeCharacters performs any necessary escaping of reserved characters like & and <. However, the checking is minimal. XMLStreamWriter allows documents with multiple roots, documents with more than one XML declaration, element names that contain whitespace, characters that don't exist in the output character set, and a lot more. Implementations are allowed but not required to check these things. The reference implementation does not check them. Separate verification and testing of the output is necessary. Creating XML documents with XMLStreamWriter is faster and more efficient than serializing a DOM or XOM tree, but it's not nearly as robust.
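The two conveniences just mentioned are easy to see in a small experiment. This sketch, with the assumed names EscapingSketch and elementWithText, writes a start-tag and some text containing reserved characters, never writes the end-tag itself, and leaves the closing to writeEndDocument.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class EscapingSketch {

  // Wraps the text in a <note> element and returns the serialized result.
  static String elementWithText(String text) throws XMLStreamException {
    StringWriter buffer = new StringWriter();
    XMLStreamWriter writer =
        XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
    try {
      writer.writeStartElement("note");
      writer.writeCharacters(text); // & and < are escaped for us
      writer.writeEndDocument();    // closes the still-open <note>
      writer.flush();
    } finally {
      writer.close();
    }
    return buffer.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(elementWithText("AT&T < IBM"));
  }
}
```

The raw ampersand and less-than sign come out as &amp;amp; and &amp;lt;. That is about as far as the help goes; passing "bad name" to writeStartElement may well be accepted without complaint.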

Summing Up

This article has just skimmed along the surface of StAX; the API has more to offer than there is space here to describe. Like SAX, StAX enables pipelines that chain the output of one process to the input of the next. It can filter the documents it parses to modify or log the documents. It can support XML views of non-XML data. It can marshal data structures and objects into XML documents and it can unmarshal the documents back into objects.
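As one small illustration of a pipeline, a reader can feed a writer directly, modifying the document in flight. The sketch below copies a document while dropping comments and processing instructions. The class name CopyFilterSketch is an assumption, and the copy is deliberately naive: it ignores namespaces, DTDs, and entity references, so treat it as a starting point rather than a general-purpose filter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class CopyFilterSketch {

  // Copies elements, attributes, and text; everything else is dropped.
  static String stripComments(String xml) throws XMLStreamException {
    XMLStreamReader in = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(xml));
    StringWriter buffer = new StringWriter();
    XMLStreamWriter out =
        XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
    try {
      while (in.hasNext()) {
        switch (in.next()) {
          case XMLStreamConstants.START_ELEMENT:
            out.writeStartElement(in.getLocalName());
            for (int i = 0; i < in.getAttributeCount(); i++) {
              out.writeAttribute(in.getAttributeLocalName(i),
                                 in.getAttributeValue(i));
            }
            break;
          case XMLStreamConstants.END_ELEMENT:
            out.writeEndElement();
            break;
          case XMLStreamConstants.CHARACTERS:
          case XMLStreamConstants.CDATA:
            out.writeCharacters(in.getText());
            break;
          default:
            break; // comments, processing instructions, etc. are not copied
        }
      }
      out.flush();
    } finally {
      out.close();
      in.close();
    }
    return buffer.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(
        stripComments("<a x=\"1\"><!-- hidden -->text<b/></a>"));
  }
}
```

At no point does the whole document exist in memory as a tree; each event is written out as soon as it is read, which is what makes this style attractive for large inputs.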

When is StAX not appropriate? Basically whenever a streaming API doesn't work. Like SAX, StAX still requires you to build data structures as the document is parsed in order to hold onto information for any length of time. In the worst case, these data structures can become as large and complex as the original document. In these cases, a tree-based API such as DOM or XOM may be more appropriate. Such an API definitely provides more convenient random access to the tree than does StAX (or any other streaming API). StAX works well when you need to process a large document a small piece at a time moving from beginning to end, that is, when you can essentially slide a peephole over the complete document. It works less well when you need to access widely separated parts of the document at the same time in unpredictable orders. However, many of the toughest XML processing problems come from exactly the domain where StAX does work well.

StAX is a fast, potentially extremely fast, straightforward, memory-thrifty way to load data from an XML document whose structure is well known in advance. State management is much simpler in StAX than in SAX, so if you find that the SAX logic is just getting way too complex to follow or debug, then StAX is well worth exploring. A few features such as validation, schema support, and entity resolution are either not available or are not functional in the current reference implementation, but these should soon be available in independent implementations. StAX will be a very useful addition to any Java developer's XML toolkit.




  • Why is the word "streaming" used in StAX?
    2007-03-22 02:55:51 vikas_khengare [Reply]

    Hi Friends,


    I just read Michael Trachtman's comment on StAX, but I still don't understand why people use the word "streaming" for StAX.


    Can someone give me an explanation?


    Thanks
    Vikas K
    vikas_khengare@yahoo.com

  • StAX, Streaming APIs and DOM
    2006-06-06 09:49:58 MichaelTrachtman [Reply]

    What I would really like is a streaming API that works sort of like StAX, and sort of like DOM/JDom.


    It would be streaming in the sense that it would be very lazy and not read things in until needed. It would also be streaming in the sense that it would read everything forwards (but not backwards).


    Here's what code that used such an API would look like.


    URL url = ...
    XMLStream xml = XXXFactory(url.inputStream()) ;


    // process each <book> element in this document.
    // the <book> element may have subnodes.
    // You get a DOM/JDOM like tree rooted at the next <book>.


    while (xml.hasContent()) {
      XMLElement book = xml.getNextElement("book");
      processBook(book);
    }


    Another variation would be:
    Note that the implementation of the container would be lazy. I.e. it would only read things
    as they are pulled by the container.


    Collection<XMLElement> books = xml.getAllElement("book");
    for (XMLElement book : books) {
      processBook(book);
    }


    There would also be XPath aware versions of the above.


    Collection<XMLElement> books = xml.getAllElement("/*/libraries[city='chicago']/book");
    for (XMLElement book : books) {
      processBook(book);
    }


    And methods for controlling the depth of the produced tree. Something like (not sure of the best syntax).
    This example would create a collection of books,
    but only retain the name, ISBN and author of each
    and ignore everything else.


    Collection<XMLElement> books = xml.getAllElement("/*/libraries[city='chicago']/book",
        restrict("name|ISBN|author"));
    for (XMLElement book : books) {
      processBook(book);
    }


    Such a system could be very memory efficient and easy to program with. It is what I thought StAX would be, before I saw the StAX examples.


    Does anything like this exist?


    .. Michael
    michael@ideality.org




  • I have a question about this article
    2004-01-29 09:20:33 Jeff Wong [Reply]

    It seems from the way StAX is used that you are already assuming where and when the tags come in for parsing. Is the XML file checked against the schema before using StAX to parse, or does StAX do it for you? I am just wondering where XML Schema would come in. I'm kinda new at this. Thanks.

  • bidirectional SAX
    2003-10-03 17:11:52 Taylor Cowan [Reply]

    "Unlike SAX, StAX is a bidirectional API."


    SAX is being used by programs to produce AND receive events. Perhaps you should have said StAX supports bidirectional processing better than SAX. Here is an article ON THE SAME SITE that demonstrates bidirectional use of SAX...


    http://www.xml.com/pub/a/2001/09/19/sax-non-xml-data.html

  • FYI: Patent pending
    2003-10-01 03:35:10 Marty Feldmann [Reply]

    "System and method for XML parsing" (US2003159112)


    http://l2.espacenet.com/espacenet/viewer?PN=US2003159112&CY=ep&LG=en&DB=EPD


    inventor: Chris Fry
    published: 2003-08-21


    and more trivial XML patents queued:


    "System and method for fast XSL transformation" (US2003159111)


    http://l2.espacenet.com/espacenet/viewer?PN=US2003159111&CY=ep&LG=en&DB=EPD


    "System and method for XML data binding" (US2003163603)


    http://l2.espacenet.com/espacenet/viewer?PN=US2003163603&CY=ep&LG=en&DB=EPD


  • xmliter: another alternative
    2003-09-21 18:29:55 Mark Hayes [Reply]

    This looks great. I posted a similar solution on sourceforge some time ago called xmliter. It is built on the SAX API and therefore does not require any new parser technology. It also allows easily skipping over the content of an element. It does not currently allow writing, only reading, however.


    I'm very interested in any feedback on it. It is available under the MIT license.


    http://xmliter.sourceforge.net/


    Mark Hayes