java.lang.Object
  |
  +--org.brownell.xml.HtmlParser
This is a wrapper around the javax.swing.text.html.parser.* HTML parser, implementing the 1-June-1999 draft SAX2 interfaces. On valid HTML, and on much invalid or malformed HTML, it produces the stream of SAX parsing events that would result from parsing the corresponding well-formed XHTML document. Element and attribute names are uniformly presented in lower case. (The Level 1 HTML DOM specification is unusual in not adopting that convention.)
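The event stream described above can be observed with a SAX1 DocumentHandler that records each callback. The following is a minimal sketch, not part of this package. The class name `EventRecorder` is hypothetical, and because HtmlParser may not be on hand, the runnable driver assumes the JDK's own XML parser (reached through the deprecated SAX1 `SAXParser.getParser()`) as a stand-in, fed the well-formed XHTML whose events HtmlParser is said to reproduce. With HtmlParser, the recorder would instead be passed to `setDocumentHandler` before calling `parse`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical helper: records SAX1 DocumentHandler callbacks as short strings.
@SuppressWarnings("deprecation") // SAX1 types are deprecated in modern JDKs
class EventRecorder extends HandlerBase {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String name, AttributeList atts) {
        StringBuilder sb = new StringBuilder("start ").append(name);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getName(i))
              .append("=\"").append(atts.getValue(i)).append('"');
        }
        events.add(sb.toString());
    }

    @Override
    public void endElement(String name) {
        events.add("end " + name);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) { // skip whitespace-only runs for readability
            events.add("text " + text);
        }
    }

    // Stand-in driver (assumption): the JDK XML parser through its SAX1
    // interface, reading XHTML instead of HtmlParser reading HTML.
    static List<String> recordEvents(String xhtml) throws Exception {
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        EventRecorder recorder = new EventRecorder();
        parser.setDocumentHandler(recorder);
        parser.parse(new InputSource(new StringReader(xhtml)));
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        recordEvents("<html><body><p class=\"x\">Hi</p></body></html>")
            .forEach(System.out::println);
    }
}
```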
Only one type of lexical event is reported: comments are visible. With HTML this is generally used to access inline CSS style rules, which are wrapped in comments to hide them from browsers old enough not to understand the <style> tag. Since no other lexical events are reported, the expansion of built-in entities (such as "&nbsp;") and of character references is not visible.
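A sketch of how inline CSS hidden in comments might be recovered, assuming comments arrive through a SAX2 `LexicalHandler` (the JDK's `org.xml.sax.ext.DefaultHandler2` is used for convenience). How a comment receiver is registered with this draft-SAX2 parser is not documented here; the sketch assumes the final SAX2 property ID `http://xml.org/sax/properties/lexical-handler` and drives a JDK `XMLReader` as a stand-in. `StyleCommentCollector` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: collects the text of comments inside <style> elements.
class StyleCommentCollector extends DefaultHandler2 {
    private int styleDepth = 0;
    final StringBuilder css = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("style".equals(qName)) styleDepth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("style".equals(qName)) styleDepth--;
    }

    // LexicalHandler callback; comments outside <style> are ignored.
    @Override
    public void comment(char[] ch, int start, int length) {
        if (styleDepth > 0) css.append(ch, start, length);
    }

    // Stand-in driver (assumption): a JDK SAX2 XMLReader and the final SAX2
    // lexical-handler property, not HtmlParser's draft interface.
    static String collect(String xhtml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        StyleCommentCollector handler = new StyleCommentCollector();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xhtml)));
        return handler.css.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><style><!-- p { color: red } --></style></head>"
                + "<body><!-- not CSS --></body></html>";
        System.out.println(collect(page));
    }
}
```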
This parser does not support dynamic modification of its input stream, which is needed to fully support <script> tags that use the DOM to splice new content into documents as they load.
Current (Swing 1.1) HTML parsing issues of note include:
This driver adds ignorable newlines at various locations where they cannot be confused with HTML content. Applications may of course ignore them. If they are not ignored, they make the parser's output easier to print: otherwise, HTML files of any size would be emitted without a single line break, and viewing that output would cause trouble for most text editors.
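An application can choose either behavior in its handler. This minimal sketch assumes, which the documentation does not state, that the inserted newlines are delivered through `DocumentHandler.ignorableWhitespace` rather than `characters`; `TextCollector` and its `keepNewlines` flag are hypothetical.

```java
import org.xml.sax.HandlerBase;

// Hypothetical helper: accumulates character data, optionally keeping the
// ignorable newlines. ASSUMPTION: they arrive via ignorableWhitespace().
@SuppressWarnings("deprecation") // HandlerBase is a deprecated SAX1 class
class TextCollector extends HandlerBase {
    private final boolean keepNewlines;
    final StringBuilder text = new StringBuilder();

    TextCollector(boolean keepNewlines) {
        this.keepNewlines = keepNewlines;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        if (keepNewlines) {
            text.append(ch, start, length); // keeps output line-broken
        }
        // otherwise dropped; document content is the same either way
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {true, false}) {
            TextCollector c = new TextCollector(keep);
            c.characters("Hello".toCharArray(), 0, 5);
            c.ignorableWhitespace("\n".toCharArray(), 0, 1); // simulated newline
            c.characters("world".toCharArray(), 0, 5);
            System.out.println(keep + ": " + c.text);
        }
    }
}
```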
Constructor Summary
 HtmlParser()
                    Constructs a new HTML parser.
Method Summary
 boolean            getFeature(java.lang.String featureId)
                    SAX2: Tells whether this parser supports the specified feature.
 java.lang.Object   getProperty(java.lang.String propertyId)
                    SAX2: Returns the specified property.
 void               parse(org.xml.sax.InputSource input)
                    SAX1: Parse the HTML text in the given input source.
 void               parse(java.lang.String uri)
                    SAX1: Parse the HTML text at the given input URI.
 void               setDocumentHandler(org.xml.sax.DocumentHandler handler)
                    SAX1: Provides an object which receives callbacks for the most significant document information.
 void               setDTDHandler(org.xml.sax.DTDHandler handler)
                    SAX1: Provides an object which may be used to intercept declarations related to notations and unparsed entities.
 void               setEntityResolver(org.xml.sax.EntityResolver resolver)
                    SAX1: Provides an object which may be used when resolving external entities during parsing (both general and parameter entities).
 void               setErrorHandler(org.xml.sax.ErrorHandler handler)
                    SAX1: Provides an object which receives callbacks for HTML errors of all levels (fatal, nonfatal, warning).
 void               setFeature(java.lang.String featureId, boolean state)
                    SAX2: Sets the state of features supported in this parser.
 void               setLocale(java.util.Locale locale)
                    SAX1: Identifies the locale which the parser should use for the diagnostics it provides.
 void               setProperty(java.lang.String propertyId, java.lang.Object property)
                    SAX2: Assigns the specified property.
Methods inherited from class java.lang.Object
 clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public HtmlParser()
Method Detail
public void setErrorHandler(org.xml.sax.ErrorHandler handler)
Note that this parser does not categorize errors consistently according to the levels defined by the SAX API. Most problems are reported at the "warning" level, and even the few validity-related errors reported at the "nonfatal" level (ErrorHandler.error) may not be treated as problems in every HTML environment. No errors are reported as "fatal".
Throwing an exception from an error handler may not work well.
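Given these caveats, one cautious approach is an error handler that records diagnostics instead of throwing. In this sketch the class `CountingErrorHandler` is hypothetical; the "nonfatal" level corresponds to the standard SAX `ErrorHandler.error` callback. It would be registered with `setErrorHandler(new CountingErrorHandler())`.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper: tallies diagnostics by SAX level and never throws,
// since throwing from an error handler may not work well with this parser.
class CountingErrorHandler implements ErrorHandler {
    int warnings, errors, fatalErrors;

    @Override
    public void warning(SAXParseException e) { // where most problems land
        warnings++;
        report("warning", e);
    }

    @Override
    public void error(SAXParseException e) { // the "nonfatal" level
        errors++;
        report("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) { // documented as never used
        fatalErrors++;
        report("fatal", e);
    }

    private static void report(String level, SAXParseException e) {
        System.err.println(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    public static void main(String[] args) {
        CountingErrorHandler h = new CountingErrorHandler();
        h.warning(new SAXParseException("example warning", null, null, 1, 1));
        System.out.println(h.warnings + " warning(s)");
    }
}
```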
public void setDocumentHandler(org.xml.sax.DocumentHandler handler)
public void setDTDHandler(org.xml.sax.DTDHandler handler)
Not used by this parser.
public void setEntityResolver(org.xml.sax.EntityResolver resolver)
Not used by this parser.
public void setLocale(java.util.Locale locale) throws org.xml.sax.SAXException
Not used by this parser.
public void parse(org.xml.sax.InputSource input) throws org.xml.sax.SAXException, java.io.IOException
public void parse(java.lang.String uri) throws org.xml.sax.SAXException, java.io.IOException
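A sketch of parsing in-memory markup through `parse(InputSource)`, with a system ID set so that relative URIs and diagnostics have a base. It assumes that HtmlParser implements the SAX1 `org.xml.sax.Parser` interface, as its method set suggests but this page does not state; the runnable `main` substitutes the JDK's XML parser. `ParseDriver`, `parseString`, and the example.com URI are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

@SuppressWarnings("deprecation") // SAX1 API
class ParseDriver {
    // Hypothetical helper: parses a string through any SAX1 Parser.
    // With HtmlParser (ASSUMED to implement org.xml.sax.Parser) the call
    // would be parseString(new HtmlParser(), html, baseUri, handler).
    static void parseString(Parser parser, String markup, String baseUri,
                            HandlerBase handler) throws Exception {
        parser.setDocumentHandler(handler);
        parser.setErrorHandler(handler); // HandlerBase is also an ErrorHandler
        InputSource source = new InputSource(new StringReader(markup));
        source.setSystemId(baseUri);     // base for relative URIs and messages
        parser.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in parser for this runnable sketch: the JDK's XML parser.
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        int[] count = {0};
        parseString(parser, "<html><body><p>One</p><p>Two</p></body></html>",
                "http://example.com/page.html", new HandlerBase() {
                    @Override
                    public void startElement(String name, AttributeList atts) {
                        count[0]++; // count every element start
                    }
                });
        System.out.println(count[0] + " elements");
    }
}
```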
public boolean getFeature(java.lang.String featureId) throws org.xml.sax.SAXException
public java.lang.Object getProperty(java.lang.String propertyId) throws org.xml.sax.SAXException
public void setFeature(java.lang.String featureId, boolean state) throws org.xml.sax.SAXException
public void setProperty(java.lang.String propertyId, java.lang.Object property) throws org.xml.sax.SAXException
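Because the supported feature IDs are not listed on this page, a caller may prefer to probe rather than assume. This sketch wraps the `getFeature` signature shown above in a hypothetical `FeatureQuery` interface so that a method reference to either an HtmlParser or a JDK `XMLReader` fits it. The feature ID `http://xml.org/sax/features/namespaces` comes from the final SAX2 specification; whether this draft-era parser recognizes it is an assumption.

```java
import java.util.Optional;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class FeatureProbe {
    // Hypothetical interface mirroring getFeature(String) throws SAXException.
    interface FeatureQuery {
        boolean getFeature(String featureId) throws SAXException;
    }

    // Returns the feature's state, or empty if the parser rejects the ID.
    static Optional<Boolean> probe(FeatureQuery parser, String featureId) {
        try {
            return Optional.of(parser.getFeature(featureId));
        } catch (SAXException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in: a JDK XMLReader; with HtmlParser, use htmlParser::getFeature.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(probe(reader::getFeature, "http://xml.org/sax/features/namespaces"));
    }
}
```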