org.brownell.xml
Class EchoHandler

java.lang.Object
  |
  +--org.brownell.xml.EchoHandler
Direct Known Subclasses:
XhtmlEchoHandler

public class EchoHandler
extends java.lang.Object
implements org.xml.sax.DocumentHandler, org.xml.sax.misc.LexicalHandler, org.xml.sax.DTDHandler, org.xml.sax.misc.DeclHandler

This class is a SAX handler which echoes all its input as a well-formed XML or XHTML document. If driven using SAX2 events, this output will include a recreated document type declaration, and will optionally include recreated entity references.
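As a rough illustration of what "echoing" SAX events means, the following minimal sketch writes element and character events back out as XML text. It is not the EchoHandler implementation: the class name MiniEcho and its echo helper are hypothetical, and it assumes the SAX2 DefaultHandler and JAXP parser shipped with current JDKs rather than the SAX1 DocumentHandler interface documented here. It ignores DTDs, comments, processing instructions, and entities.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the EchoHandler implementation.
public class MiniEcho extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i), true)).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] buf, int off, int len) {
        out.append(escape(new String(buf, off, len), false));
    }

    // Escapes markup characters so the echoed text stays well formed.
    private static String escape(String s, boolean inAttribute) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') b.append("&amp;");
            else if (c == '<') b.append("&lt;");
            else if (c == '>') b.append("&gt;");
            else if (c == '"' && inAttribute) b.append("&quot;");
            else b.append(c);
        }
        return b.toString();
    }

    // Parses the given XML text and returns the echoed document.
    static String echo(String xml) {
        try {
            MiniEcho handler = new MiniEcho();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echo("<a x=\"1\">hi &amp; bye<b/></a>"));
    }
}
```

Note that this sketch writes an empty element as a start tag followed by an end tag; a fuller handler would have more choices to make there.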

Few of the methods on this class are intended for applications to call directly. Those that are set or query the JavaBeans properties used to enable XHTML format output or to disable recreation of entity references; the remaining methods are SAX callbacks invoked by the parser.

Note that any relative URIs in the source document, as found in entity and notation declarations, should already have been fully resolved by the parser providing events to this handler. The output text will therefore contain only fully resolved URIs, which may not be what is wanted in cases where later binding is desired.


Users should note that the current SAX2 draft has problems with how it reports the beginning and end of entities, where those entities are used to construct other elements such as declarations in a DTD, attribute values, or conditional sections. In the face of such usage, entity references reported through this handler are nonsensical.

Also, declarations for attributes of type NOTATION are not fully reported by this SAX2 draft; the particular notations which are legal are not yet reported. This class reports the permitted notations as being the single .unknown. notation.

When used with the Parser2 adapter found in this package, those entity reporting issues are dealt with by not reporting any of the problematic entity expansions. In short, when used with that adapter, the following holds:

Version:
1.1 (3 September 1999)
Author:
David Brownell (db@post.harvard.edu)

Constructor Summary
EchoHandler()
          Constructs a handler which writes all input to System.out in the UTF-8 encoding, and closes System.out when endDocument is called.
EchoHandler(java.io.Writer writer)
          Constructs a handler which writes all input to the writer, and then closes the writer when the document ends.
EchoHandler(java.io.Writer writer, java.lang.String encoding)
          Constructs a handler which writes all input to the writer, and then closes the writer when the document ends.
 
Method Summary
 void attributeDecl(java.lang.String element, java.lang.String name, java.lang.String type, java.lang.String defaultType, java.lang.String defaultValue)
          SAX2: called on attribute declarations
 void characters(char[] buf, int off, int len)
          SAX1: reports content characters
 void comment(char[] buf, int off, int len)
          SAX2: called when comments are parsed
 void elementDecl(java.lang.String name, java.lang.String model)
          SAX2: called on element declarations
 void endCDATA()
          SAX2: called after parsing CDATA characters
 void endDocument()
          SAX1: indicates the completion of a parse
 void endDTD()
          SAX2: called after the doctype is parsed
 void endElement(java.lang.String name)
          SAX1: indicates the end of an element
 void endEntity(java.lang.String name)
          SAX2: called after parsing an entity
 void externalEntityDecl(java.lang.String name, java.lang.String pubid, java.lang.String sysid)
          SAX2: called on external entity declarations
 void ignorableWhitespace(char[] buf, int off, int len)
          SAX1: reports ignorable whitespace
 void internalEntityDecl(java.lang.String name, java.lang.String value)
          SAX2: called on internal entity declarations
 boolean isExpandingEntities()
          Returns true if the output will have no entity references; returns false (the default) otherwise.
 boolean isXhtml()
          Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise.
 void notationDecl(java.lang.String name, java.lang.String pubid, java.lang.String sysid)
          SAX1: called on notation declarations
 void processingInstruction(java.lang.String name, java.lang.String value)
          SAX1: reports a PI
 void setDocumentLocator(org.xml.sax.Locator l)
          SAX1: provides parser status information
 void setExpandingEntities(boolean value)
          Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities.
 void setXhtml(boolean value)
          Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification.
 void startCDATA()
          SAX2: called before parsing CDATA characters
 void startDocument()
          SAX1: indicates the beginning of a parse
 void startDTD(java.lang.String root, java.lang.String pubid, java.lang.String sysid)
          SAX2: called when the doctype is partially parsed
 void startElement(java.lang.String name, org.xml.sax.AttributeList atts)
          SAX1: indicates the start of an element
 void startEntity(java.lang.String name)
          SAX2: called before parsing an entity
 void unparsedEntityDecl(java.lang.String name, java.lang.String pubid, java.lang.String sysid, java.lang.String notation)
          SAX1: called on unparsed entity declarations
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

EchoHandler

public EchoHandler()
            throws java.io.IOException
Constructs a handler which writes all input to System.out in the UTF-8 encoding, and closes System.out when endDocument is called. (Yes, it's annoying that this throws an exception, but there's really no way around it, since it's barely possible that a JDK exists somewhere that doesn't know how to emit UTF-8.)
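The reason a constructor like this can throw is easy to reproduce with the JDK alone. The sketch below assumes (it is not confirmed by this page) that the handler looks the encoding up by name through java.io.OutputStreamWriter, whose constructor throws UnsupportedEncodingException, a subclass of IOException, for names the JDK does not know. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical illustration of why opening a writer by encoding name can fail.
public class Utf8WriterDemo {
    // Looks the encoding up by name, so the lookup can fail at run time.
    static Writer open(OutputStream out, String encodingName) throws IOException {
        return new OutputStreamWriter(out, encodingName);
    }

    // Returns true if this JDK can create a writer for the named encoding.
    static boolean supports(String encodingName) {
        try {
            open(new ByteArrayOutputStream(), encodingName).close();
            return true;
        } catch (IOException e) {
            // UnsupportedEncodingException is an IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF8: " + supports("UTF8"));
        System.out.println("no-such-encoding: " + supports("no-such-encoding"));
    }
}
```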

EchoHandler

public EchoHandler(java.io.Writer writer)
Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. If an XML declaration is written onto the output, and this class can determine the name of the character encoding for this writer, that encoding name will be included in the XML declaration.

See the description of the constructor which takes an encoding name for important information about selection of encodings.

Parameters:
writer - XML text is written to this writer.

EchoHandler

public EchoHandler(java.io.Writer writer,
                   java.lang.String encoding)
Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. If an XML declaration is written onto the output, this class will use the specified encoding name in that declaration. If no encoding name is specified, no encoding name will be declared unless this class can otherwise determine the name of the character encoding for this writer.
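One way the encoding name for the XML declaration could be chosen is sketched below. This is an assumption about the approach, not the class's actual code: an explicit name wins, otherwise the name is taken from an OutputStreamWriter if the writer is one, and otherwise no encoding is declared. The class and method names are hypothetical. OutputStreamWriter.getEncoding() may return a historical name such as "UTF8", so the sketch maps it to the canonical charset name.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from the EchoHandler source.
public class XmlDeclSketch {
    // Builds an XML declaration, declaring an encoding only when one is known.
    static String xmlDeclaration(Writer writer, String encoding) {
        String name = encoding;
        if (name == null && writer instanceof OutputStreamWriter) {
            // May be a historical name ("UTF8"), or null if the stream is closed.
            String historical = ((OutputStreamWriter) writer).getEncoding();
            if (historical != null) {
                name = Charset.forName(historical).name();
            }
        }
        return name == null
            ? "<?xml version=\"1.0\"?>"
            : "<?xml version=\"1.0\" encoding=\"" + name + "\"?>";
    }

    public static void main(String[] args) {
        Writer utf8 = new OutputStreamWriter(new ByteArrayOutputStream(), StandardCharsets.UTF_8);
        System.out.println(xmlDeclaration(utf8, null));
        System.out.println(xmlDeclaration(new StringWriter(), null));
        System.out.println(xmlDeclaration(new StringWriter(), "ISO-8859-1"));
    }
}
```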

At this time, only the UTF-8 ("UTF8") and UTF-16 ("Unicode") output encodings are fully lossless with respect to XML data. If you use any other encoding you risk having your data be silently mangled on output, as the standard Java character encoding subsystem silently maps non-encodable characters to a question mark ("?") and will not report such errors to applications.
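The silent substitution described above can be observed with the JDK alone. The following sketch (class and method names are hypothetical) writes text through an OutputStreamWriter and decodes the resulting bytes again; no exception is raised when a character cannot be encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demonstration of lossy output through OutputStreamWriter.
public class LossyEncodingDemo {
    // Writes text in the named encoding, then decodes the bytes back to a string.
    static String roundTrip(String text, String encodingName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(bytes, encodingName)) {
                w.write(text);
            }
            return new String(bytes.toByteArray(), encodingName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) has no US-ASCII representation.
        System.out.println(roundTrip("caf\u00e9", "US-ASCII")); // prints "caf?"
        System.out.println(roundTrip("caf\u00e9", "UTF-8").equals("caf\u00e9")); // prints "true"
    }
}
```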

For a few cases the risk can be reduced. If the writer is a java.io.OutputStreamWriter, and uses either the ISO-8859-1 ("8859_1", "ISO8859_1", etc.) or US-ASCII ("ASCII") encodings, content which can't be encoded in those encodings will be written safely. Where relevant, the XHTML entity names will be used; otherwise, numeric character references will be emitted.
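A minimal sketch of the numeric character reference fallback follows. It is an assumption about the general technique, not this class's code: it uses java.nio.charset.CharsetEncoder.canEncode to decide per code point, and it does not attempt the XHTML entity names. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration of escaping content that an encoding cannot represent.
public class CharRefEscaper {
    // Returns content text in which unencodable code points become &#N; references.
    static String escape(String text, String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int n = Character.charCount(cp); // 2 for surrogate pairs
            String unit = text.substring(i, i + n);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (enc.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += n;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 and U+20AC (euro sign) are both outside US-ASCII.
        System.out.println(escape("caf\u00e9 \u20ac", "US-ASCII")); // prints "caf&#233; &#8364;"
    }
}
```

With ISO-8859-1 as the target, only the euro sign in that example would be replaced, since U+00E9 is encodable there.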

However, there remain a number of cases where substituting such entity or character references is not an option. Such references are not usable within a DTD, comment, PI, or CDATA section. Neither may they be used when element, attribute, entity, or notation names have the problematic characters.

Parameters:
writer - XML text is written to this writer.
encoding - if non-null, and an XML declaration is written, this is the name that will be used for the character encoding.

Method Detail

setXhtml

public void setXhtml(boolean value)
Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification. A "transitional" Document Type Declaration (DTD) is placed near the beginning of the output document, instead of whatever DTD would otherwise have been placed there, and XHTML empty elements are printed specially. When writing text in US-ASCII or ISO-8859-1 encodings, the predefined XHTML internal entity names are used (in preference to character references) when writing content characters which can't be expressed in those encodings.
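How "printed specially" might look for empty elements is sketched below, following the XHTML 1.0 HTML Compatibility Guidelines: elements declared EMPTY get a space before the trailing "/>", and other elements that merely have no content are written with separate start and end tags. This is an assumption based on those guidelines, not a description of this class's exact output; the class name, method name, and the non-XHTML form shown are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of XHTML-compatible empty element output.
public class XhtmlEmptyTagSketch {
    // Elements declared EMPTY in the XHTML 1.0 DTDs
    // ("frame" appears only in the Frameset DTD).
    private static final Set<String> EMPTY = new HashSet<>(Arrays.asList(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param"));

    // Formats an attribute-free element that has no content.
    static String emptyElement(String name, boolean xhtml) {
        if (!xhtml) {
            return "<" + name + "/>";
        }
        return EMPTY.contains(name)
            ? "<" + name + " />"
            : "<" + name + "></" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(emptyElement("br", true));  // prints "<br />"
        System.out.println(emptyElement("p", true));   // prints "<p></p>"
        System.out.println(emptyElement("br", false)); // prints "<br/>"
    }
}
```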

When this option is enabled, it is the caller's responsibility to ensure that the input is otherwise valid as XHTML. Things to be careful of in all cases, as described in the appendix referenced above, include:

Additionally, some of the oldest browsers have further quirks, which can be addressed by following guidelines such as:

Also, some characteristics of the resulting output may be a function of whether the document is accessed through a path giving it a MIME content type of text/html rather than a content type indicating XML (application/xml or text/xml).


isXhtml

public boolean isXhtml()
Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise.

setExpandingEntities

public void setExpandingEntities(boolean value)
Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities.
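The difference between the two settings can be sketched with the lexical events a SAX2 parser reports. The example below is an illustration under stated assumptions, not this class's code: it uses org.xml.sax.ext.DefaultHandler2 from current JDKs (rather than the draft org.xml.sax.misc interfaces named on this page), collects only character content, and skips escaping. The class name EntityEchoSketch and the helper textOf are hypothetical. When not expanding, it writes a reference at startEntity and suppresses the replacement text until the matching endEntity.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical illustration of recreating entity references from SAX2 events.
public class EntityEchoSketch extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private final boolean expanding;
    private int entityDepth = 0; // > 0 while inside a general entity

    EntityEchoSketch(boolean expanding) {
        this.expanding = expanding;
    }

    // Parameter entities and the external DTD subset are not content references.
    private static boolean ignored(String name) {
        return name.startsWith("%") || name.equals("[dtd]");
    }

    @Override
    public void startEntity(String name) {
        if (expanding || ignored(name)) return;
        if (entityDepth++ == 0) {
            out.append('&').append(name).append(';');
        }
    }

    @Override
    public void endEntity(String name) {
        if (expanding || ignored(name)) return;
        entityDepth--;
    }

    @Override
    public void characters(char[] buf, int off, int len) {
        // Replacement text is dropped when a reference was written instead.
        if (entityDepth == 0) {
            out.append(buf, off, len);
        }
    }

    // Returns the character content of the document under the given setting.
    static String textOf(String xml, boolean expanding) {
        try {
            EntityEchoSketch handler = new EntityEchoSketch(expanding);
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE a [<!ENTITY e \"hello\">]><a>&e; world</a>";
        System.out.println(textOf(doc, false));
        System.out.println(textOf(doc, true));
    }
}
```

Output containing recreated references is only usable if the entity declarations are echoed as well, which is why the recreated document type declaration matters when references are kept.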

isExpandingEntities

public boolean isExpandingEntities()
Returns true if the output will have no entity references; returns false (the default) otherwise.

setDocumentLocator

public void setDocumentLocator(org.xml.sax.Locator l)
SAX1: provides parser status information
Specified by:
setDocumentLocator in interface org.xml.sax.DocumentHandler

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
SAX1: indicates the beginning of a parse
Specified by:
startDocument in interface org.xml.sax.DocumentHandler

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
SAX1: indicates the completion of a parse
Specified by:
endDocument in interface org.xml.sax.DocumentHandler

startElement

public void startElement(java.lang.String name,
                         org.xml.sax.AttributeList atts)
                  throws org.xml.sax.SAXException
SAX1: indicates the start of an element
Specified by:
startElement in interface org.xml.sax.DocumentHandler

endElement

public void endElement(java.lang.String name)
                throws org.xml.sax.SAXException
SAX1: indicates the end of an element
Specified by:
endElement in interface org.xml.sax.DocumentHandler

characters

public void characters(char[] buf,
                       int off,
                       int len)
                throws org.xml.sax.SAXException
SAX1: reports content characters
Specified by:
characters in interface org.xml.sax.DocumentHandler

ignorableWhitespace

public void ignorableWhitespace(char[] buf,
                                int off,
                                int len)
                         throws org.xml.sax.SAXException
SAX1: reports ignorable whitespace
Specified by:
ignorableWhitespace in interface org.xml.sax.DocumentHandler

processingInstruction

public void processingInstruction(java.lang.String name,
                                  java.lang.String value)
                           throws org.xml.sax.SAXException
SAX1: reports a PI
Specified by:
processingInstruction in interface org.xml.sax.DocumentHandler

startCDATA

public void startCDATA()
                throws org.xml.sax.SAXException
SAX2: called before parsing CDATA characters
Specified by:
startCDATA in interface org.xml.sax.misc.LexicalHandler

endCDATA

public void endCDATA()
              throws org.xml.sax.SAXException
SAX2: called after parsing CDATA characters
Specified by:
endCDATA in interface org.xml.sax.misc.LexicalHandler

startDTD

public void startDTD(java.lang.String root,
                     java.lang.String pubid,
                     java.lang.String sysid)
              throws org.xml.sax.SAXException
SAX2: called when the doctype is partially parsed
Specified by:
startDTD in interface org.xml.sax.misc.LexicalHandler

endDTD

public void endDTD()
            throws org.xml.sax.SAXException
SAX2: called after the doctype is parsed
Specified by:
endDTD in interface org.xml.sax.misc.LexicalHandler

startEntity

public void startEntity(java.lang.String name)
                 throws org.xml.sax.SAXException
SAX2: called before parsing an entity
Specified by:
startEntity in interface org.xml.sax.misc.LexicalHandler

endEntity

public void endEntity(java.lang.String name)
               throws org.xml.sax.SAXException
SAX2: called after parsing an entity
Specified by:
endEntity in interface org.xml.sax.misc.LexicalHandler

comment

public void comment(char[] buf,
                    int off,
                    int len)
             throws org.xml.sax.SAXException
SAX2: called when comments are parsed
Specified by:
comment in interface org.xml.sax.misc.LexicalHandler

notationDecl

public void notationDecl(java.lang.String name,
                         java.lang.String pubid,
                         java.lang.String sysid)
                  throws org.xml.sax.SAXException
SAX1: called on notation declarations
Specified by:
notationDecl in interface org.xml.sax.DTDHandler

unparsedEntityDecl

public void unparsedEntityDecl(java.lang.String name,
                               java.lang.String pubid,
                               java.lang.String sysid,
                               java.lang.String notation)
                        throws org.xml.sax.SAXException
SAX1: called on unparsed entity declarations
Specified by:
unparsedEntityDecl in interface org.xml.sax.DTDHandler

attributeDecl

public void attributeDecl(java.lang.String element,
                          java.lang.String name,
                          java.lang.String type,
                          java.lang.String defaultType,
                          java.lang.String defaultValue)
                   throws org.xml.sax.SAXException
SAX2: called on attribute declarations
Specified by:
attributeDecl in interface org.xml.sax.misc.DeclHandler

elementDecl

public void elementDecl(java.lang.String name,
                        java.lang.String model)
                 throws org.xml.sax.SAXException
SAX2: called on element declarations
Specified by:
elementDecl in interface org.xml.sax.misc.DeclHandler

externalEntityDecl

public void externalEntityDecl(java.lang.String name,
                               java.lang.String pubid,
                               java.lang.String sysid)
                        throws org.xml.sax.SAXException
SAX2: called on external entity declarations
Specified by:
externalEntityDecl in interface org.xml.sax.misc.DeclHandler

internalEntityDecl

public void internalEntityDecl(java.lang.String name,
                               java.lang.String value)
                        throws org.xml.sax.SAXException
SAX2: called on internal entity declarations
Specified by:
internalEntityDecl in interface org.xml.sax.misc.DeclHandler