XML.com: XML From the Inside Out

An XML Fragment Reader

July 16, 2003

While many potential uses of XML result in fragments of XML text, not complete documents, XML parsers require complete documents to do their jobs properly. I have been running an XML-based servlet to conduct online surveys. It records user responses by adding XML formatted data to a continuously growing cumulative file. I needed a way to analyze survey responses on the fly without going to the trouble of copying the file and adding the markup required to create a complete document.

The solution turns out to be simple and quite flexible: it lets you combine any number of XML-formatted character streams into a single stream that feeds an XML parser. With the release of the Java SDK 1.4, XML parser classes joined the standard Java release, creating a standard API for parser access. In the org.xml.sax package you'll find the InputSource class. An InputSource object can feed a character stream to either a SAX or a DOM parser, and you can create an InputSource from a Reader, the basic Java class for streams of characters.
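To make the premise concrete, here is a minimal sketch that wraps a StringReader in an InputSource and hands it to a SAX parser. The class name FragmentCheck and the method parses are hypothetical, chosen only for this illustration. A bare fragment with two top-level elements is rejected, while the same text wrapped in a root element is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the article's class.
class FragmentCheck {

    // Returns true if the text parses as a complete XML document.
    static boolean parses(String text) {
        try {
            // An InputSource can be built directly from any Reader.
            InputSource input = new InputSource(new StringReader(text));
            SAXParserFactory.newInstance().newSAXParser()
                .parse(input, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the text
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<response>a</response><response>b</response>";
        System.out.println(parses(fragment));                       // false
        System.out.println(parses("<root>" + fragment + "</root>")); // true
    }
}
```

The wrapped case is what the fragment reader automates: the start and end markup come from separate sources instead of from string concatenation.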

A workable plan of attack is to create a class extending Reader that can supply characters to an InputSource from a sequence of character stream sources. In this example, I use String and File objects to supply character streams, but any object that can create a Reader, such as a URLConnection or a java.sql.Clob from a database, will work. The class I created is called XMLfragmentReader.

The XMLfragmentReader Class

The following shows the import statements, instance variables, and constructor for the XMLfragmentReader class. The constructor takes an Object array and a PrintStream that can be used to log events and errors. When the constructor is finished, the first Reader has been opened and is ready:

package com.lanw.rutil ;

import java.io.* ;
import org.xml.sax.* ;
import org.xml.sax.helpers.* ;
import javax.xml.parsers.* ;
import org.w3c.dom.* ;

public class XMLfragmentReader extends java.io.Reader {

 boolean rdyflag = false ;
 Reader rdr ; // current Reader
 Object[] sources ;
 int[] lineCounts ; // per source
 char eol = '\n' ;
 String readerID ;
 int sourceN ; // index of current Reader

 long charsRead ; // in current Reader
 PrintStream log ;
 String lastErr ;

 public XMLfragmentReader( Object[] src, PrintStream ps )
         throws IOException {
   sources = src ; 
   lineCounts = new int[ sources.length ];
   if( ps != null ) log = ps ;
   else log = System.out ;
   log.println("Created XMLfragmentReader with " 
    + sources.length + " sources." );
   createReader( 0 );
 }

The createReader method that follows determines the type of the nth source object and creates the appropriate reader.

private void createReader( int n ) 
        throws IOException {
   Object src = sources[n] ;
   charsRead = 0 ;
   log.println("Creating reader for: " + src );
   if( src instanceof String ){
     rdr = new StringReader((String) src);
     rdyflag = true ;
     readerID = "InputString " + n ;
   }
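   else if( src instanceof File ){
     rdr = new BufferedReader(new FileReader( (File)src ));
     rdyflag = true ;
     readerID = ((File)src).getAbsolutePath() ;
   }
   else {
     // expand here with more source types; until then,
     // fail loudly rather than keep using the previous reader
     rdyflag = false ;
     throw new IOException("Unsupported source type: " + src );
   }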
 }
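As one possible way to expand the set of source types, the sketch below maps several kinds of object to a Reader. The names SourceReaders and open are hypothetical, and UTF-8 is an assumed encoding for byte-oriented sources. Because the parser receives characters rather than bytes, it will not apply the document's own encoding declaration, so the caller has to pick the right charset.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how more source types could be supported.
class SourceReaders {

    static Reader open(Object src) throws IOException {
        if (src instanceof String) {
            return new StringReader((String) src);
        }
        if (src instanceof File) {
            return new BufferedReader(new FileReader((File) src));
        }
        if (src instanceof Reader) {
            return (Reader) src; // already a character stream
        }
        if (src instanceof InputStream) {
            // Assumption: the bytes are UTF-8 encoded.
            return new BufferedReader(new InputStreamReader(
                (InputStream) src, StandardCharsets.UTF_8));
        }
        if (src instanceof URL) {
            // Assumption: the resource is UTF-8 encoded.
            return new BufferedReader(new InputStreamReader(
                ((URL) src).openStream(), StandardCharsets.UTF_8));
        }
        String type = (src == null) ? "null" : src.getClass().getName();
        throw new IOException("Unsupported source type: " + type);
    }
}
```

Throwing on an unknown type is a design choice made for this sketch: it surfaces a bad source immediately instead of silently ending the combined stream.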

Because XMLfragmentReader extends the java.io.Reader abstract class, we must supply the following methods. There are two important points to notice here.

  • Whenever a reading method gets an indication that the current source is exhausted, it calls nextReader to open the next one.
  • When one or more characters have been read, we always check to see if an end of line character has been encountered. A line count is maintained for each source to help interpret any parsing errors.

 public boolean ready() throws IOException { 
    return rdr.ready() ;
 }

 public void close() throws IOException { 
    rdr.close() ; rdyflag = false ;
 }

  // Return a single character or -1 if all reader sources
  // are exhausted. 
 public int read() throws IOException { 
   int ch = rdr.read();
   // move on past exhausted (or empty) sources
   while( ch == -1 && nextReader() ){
     ch = rdr.read();
   }
   if( ch == -1 ) return -1 ; // all sources exhausted
   charsRead++ ; // count only real characters
   if( ch == eol ) { lineCounts[sourceN]++ ; }
   return ch ;
 }

 public int read(char[] cbuf) throws IOException {
   return read( cbuf, 0, cbuf.length ) ;
 }

 public int read(char[] cbuf, int off, int len)
      throws IOException {
   int ct = rdr.read( cbuf, off, len );
   // move on past exhausted (or empty) sources
   while( ct == -1 && nextReader() ){
     ct = rdr.read( cbuf, off, len );
   }
   if( ct > 0 ){
      countLines( cbuf, off, ct );
      charsRead += ct ; // never add the -1 end marker
   }
   return ct ; // -1 only when no source is left
 }

 public long skip(long n) throws IOException {
   return rdr.skip( n ) ;
 } 

Every source after the first one is created by a call to the nextReader method which follows. It returns true if there is another source that can be opened. The countLines method is used by the read method that reads into a buffer to keep track of the number of lines in the nth source:

// return true if next reader created ok
 private boolean nextReader() throws IOException {
   close(); // sets rdyflag = false ;
   if( ++sourceN >= sources.length ) return false ;
   createReader( sourceN );
   return rdyflag ;
 }

 // note that len is the number actually read
 private void countLines( char[] cbuf, int off, int len ){
   for( int i = 0 ; i < len ; i++ ){
     if( cbuf[ off++ ] == eol ){
       lineCounts[ sourceN ]++ ;
     }
   }
 }
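The rollover from one source to the next can be seen in isolation in the following stripped-down sketch. ChainedReader and readAll are hypothetical names, and this is not the article's class: it has no logging or line counting, and it takes ready-made Reader objects. It uses a loop so that an empty source in the middle of the sequence is skipped rather than mistaken for the end of input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical minimal reader that concatenates several Readers.
class ChainedReader extends Reader {
    private final Reader[] sources;
    private int current = 0; // index of the source being read

    ChainedReader(Reader... sources) {
        this.sources = sources;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        while (current < sources.length) {
            int n = sources[current].read(cbuf, off, len);
            if (n != -1) {
                return n; // got data from the current source
            }
            sources[current].close(); // exhausted, move to the next one
            current++;
        }
        return -1; // every source is exhausted
    }

    @Override
    public void close() throws IOException {
        // Close whatever has not been consumed yet.
        for (; current < sources.length; current++) {
            sources[current].close();
        }
    }

    // Drain a Reader into a String, for demonstration only.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[16];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader r = new ChainedReader(new StringReader("<root>"),
            new StringReader(""), new StringReader("</root>"));
        System.out.println(readAll(r)); // <root></root>
    }
}
```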

To facilitate debugging XML documents, a SAX parser keeps track of the number of lines it has read so it can report the line and column number where an error is detected. However, because we read from multiple sources, this absolute line number is useless unless we can translate it into a line number within a specific source. The following method returns a String that reports the source number and relative line number corresponding to an absolute line number:

// Method to convert an absolute line number as reported in
// a SAXException to a source and relative line number
 public String reportRelativeLine( int absN ){
   int runningLines = 0 ;
   for( int i = 0 ; i < sources.length ; i++ ){
     runningLines += lineCounts[i] ;
     if( absN <= runningLines ){
       int startN = runningLines - lineCounts[i] ;
       return "Source number: " + i +
            " line: " + (absN - startN) ;
     }
   }
   return "Unable to locate line# " + absN ;
 }
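A worked example may help here. The sketch below repeats the same arithmetic as a standalone static method so it can be run with made-up counts. LineMapper and locate are hypothetical names, and the counts in main are invented for illustration.

```java
// Hypothetical standalone copy of the line-mapping arithmetic.
class LineMapper {

    // lineCounts[i] holds the number of end-of-line characters
    // seen in source i; absN is the parser's absolute line number.
    static String locate(int[] lineCounts, int absN) {
        int runningLines = 0;
        for (int i = 0; i < lineCounts.length; i++) {
            runningLines += lineCounts[i];
            if (absN <= runningLines) {
                // lines contributed by all earlier sources
                int startN = runningLines - lineCounts[i];
                return "Source number: " + i
                    + " line: " + (absN - startN);
            }
        }
        return "Unable to locate line# " + absN;
    }

    public static void main(String[] args) {
        // Invented counts: a 2-line header, two data files, a 1-line footer.
        int[] counts = { 2, 10, 7, 1 };
        System.out.println(locate(counts, 5));  // Source number: 1 line: 3
        System.out.println(locate(counts, 13)); // Source number: 2 line: 1
        System.out.println(locate(counts, 21)); // Unable to locate line# 21
    }
}
```

With these counts, absolute line 5 falls in source 1 because the first source accounts for lines 1 and 2, leaving line 5 as the third line of the next source.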

Example Uses

One use of XMLfragmentReader is to create an org.xml.sax.InputSource object that is in turn passed to a SAXParser. The other thing the parser needs is an event handler, typically created as an extension to the DefaultHandler class in the org.xml.sax.helpers package. The following method in the XMLfragmentReader class takes a DefaultHandler and parses the combined sources provided by the reader, calling the event handler methods in the handler. Note how the exception reporting uses the reportRelativeLine method:

 // returns null if no error, else a String with details
 public String parse( DefaultHandler handler ){
  lastErr = null ; // clear any error left by an earlier call
  SAXParser parser ;
  try {
    InputSource input = new InputSource( this );
    SAXParserFactory fac = SAXParserFactory.newInstance();
    parser = fac.newSAXParser() ; // default
    log.println("Start parse");
    parser.parse( input, handler );
    log.println("End parse");
  }catch(SAXParseException spe){
    StringBuffer sb = new StringBuffer( spe.toString() );
    sb.append("\nAbsolute Line number: " + 
        spe.getLineNumber());
    sb.append("\nColumn number: " + 
        spe.getColumnNumber() );
    sb.append("\n");
    sb.append( reportRelativeLine( spe.getLineNumber() ));
    lastErr = sb.toString(); 
  }catch(Exception e){
       StringWriter sw = new StringWriter();
       e.printStackTrace( new PrintWriter( sw ) ); 
       lastErr = sw.toString();
  }
    return lastErr ;
 } 
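For readers who have not written an event handler before, here is a minimal sketch of the kind of DefaultHandler extension that could be passed to a SAX parse. ElementCounter and count are hypothetical names. To stay self-contained, the sketch parses a String directly rather than going through XMLfragmentReader, and it assumes the default, non-namespace-aware parser configuration, in which qName carries the element name.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that tallies how often each element occurs.
class ElementCounter extends DefaultHandler {
    final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Called once for every start tag the parser encounters.
        counts.merge(qName, 1, Integer::sum);
    }

    // Parse the given text and return the element tallies.
    static Map<String, Integer> count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            count("<root><response/><response/><note/></root>"));
        // {note=1, response=2, root=1}
    }
}
```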

The other way to work with an XML document is through creation of an org.w3c.dom.Document object. I have also included a simple method to do this in the XMLfragmentReader class:

 public Document build( ){
   DocumentBuilder builder = null ;
   Document doc = null ;
   try {
     InputSource input = new InputSource( this );
     DocumentBuilderFactory fac = 
         DocumentBuilderFactory.newInstance();
     builder = fac.newDocumentBuilder(); // default
     log.println("Start build");
     doc = builder.parse( input );
     log.println("End build");
     return doc ;
   }catch(SAXParseException spe){
     StringBuffer sb = new StringBuffer( spe.toString() );
     sb.append("\nAbsolute Line number: " +
         spe.getLineNumber());
     sb.append("\nColumn number: " + 
         spe.getColumnNumber() );
     sb.append("\n");
     sb.append( reportRelativeLine(spe.getLineNumber()));
     lastErr = sb.toString(); 
   }catch(Exception e){
     StringWriter sw = new StringWriter();
     e.printStackTrace( new PrintWriter( sw ) ); 
     lastErr = sw.toString();
   }
   return null ;
 } 
}

Here is a simplified example of creating a document from fragments of XML text. Two strings are used to form the start and end of the document and two files representing survey results from two periods are used to create the contents:

public Document example() throws IOException {
  Object[] src = new Object[4] ;
  src[0] = "<?xml version=\"1.0\"?>\r\n<root>\r\n" ;
  src[1] = new File(
    "c:\\XMLonTheFly\\Data\\test0117A.xml");
  src[2] = new File(
    "c:\\XMLonTheFly\\Data\\test0117B.xml");
  src[3] = "</root>\r\n" ;
  XMLfragmentReader fr = 
     new XMLfragmentReader( src, System.out );
  Document dom = fr.build();
  if( dom == null ){
     System.out.println("Error: " + fr.lastErr );
  }
  return dom ;
}

