XML.com: XML From the Inside Out


XMLPULL: A Response

Additional Remarks

In addition to clearing up the general misunderstandings about the XmlPull API, we would like to point out some possible minor improvements to the article.

In the XmlPull API, there are two general ways to iterate over XML events. The preferred way is the method next(), which makes typical parsing as simple as possible by reporting START_TAG, END_TAG, TEXT, and END_DOCUMENT events only. Legacy or special-purpose events such as COMMENT and PROCESSING_INSTRUCTION are silently ignored.

We think that it is a bit misleading to present the lower-level nextToken() method first. It is designed for advanced use of the API and exposes every possible detail about the input document.

The XHTML outliner example, as written, does not take full advantage of the XmlPull API. Instead of using a boolean state variable that can fail for nested headers (as the author notices), it is better to let the parsing code mirror the structure of XHTML. We think that the ability to minimize the use of state flags (sometimes even whole state machines) is one of the main advantages of XML pull parsing and should be properly exposed.

The updated example code is both easier to write and easier to maintain if it is split to reflect two distinct functions. First, finding header elements:

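// Declare the variable only; the loop condition fetches every event,
// so no event is consumed (and lost) before the loop starts.
int event;
while ((event = parser.next()) != XmlPullParser.END_DOCUMENT) {
    if (event == XmlPullParser.START_TAG) {
        if (isHeader(parser.getName())) {
            printHeaderText(parser);
        }
    }
}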

and, second, code that prints the text content of XHTML headers:

private static void printHeaderText(XmlPullParser parser)
    throws XmlPullParserException, IOException
{
    // The parser is positioned on the header's START_TAG, so depth starts at 1.
    int level = 1;
    while (level > 0) {
        int eventType = parser.next();
        if (eventType == XmlPullParser.TEXT) {
            System.out.print(parser.getText());
        } else if (eventType == XmlPullParser.END_TAG) {
            --level;   // reaches 0 at the header's own END_TAG
        } else if (eventType == XmlPullParser.START_TAG) {
            ++level;   // nested inline element such as <em>
        }
    }
}
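
The loop above calls isHeader(), which is not shown in this article. A minimal sketch of such a helper, assuming that "header" means the XHTML elements h1 through h6 and that names are matched in lower case only (XHTML is case-sensitive), might look like this. The class name HeaderNames is hypothetical and used only for illustration:

```java
class HeaderNames {
    // Hypothetical helper: returns true for the XHTML heading
    // element names "h1" through "h6", false for anything else.
    static boolean isHeader(String name) {
        return name != null
            && name.length() == 2
            && name.charAt(0) == 'h'
            && name.charAt(1) >= '1'
            && name.charAt(1) <= '6';
    }
}
```

Keeping the name test in its own method means the event loop stays free of XHTML-specific details.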

The full code sample is available for review.
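
For readers who want to try the level-counting technique without an XmlPull implementation on the classpath, the same two-function structure can be sketched with the pull parser that ships with the JDK, javax.xml.stream. This is a substitute API, not XmlPull: event names differ (START_ELEMENT, END_ELEMENT, CHARACTERS), and the class StaxOutliner and its method names are our own assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxOutliner {
    // Returns the text content of every h1..h6 element, in document order.
    static List<String> outline(String xhtml) {
        List<String> headings = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xhtml));
            // First function: find header elements.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().matches("h[1-6]")) {
                    headings.add(headerText(reader));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrap the checked exception to keep the sketch easy to call.
            throw new IllegalStateException("could not parse input", e);
        }
        return headings;
    }

    // Second function: collect text until the header's own end tag.
    // The reader is on the header's START_ELEMENT, so depth starts at 1.
    static String headerText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int level = 1;
        while (level > 0) {
            int eventType = reader.next();
            if (eventType == XMLStreamConstants.CHARACTERS
                    || eventType == XMLStreamConstants.CDATA) {
                text.append(reader.getText());
            } else if (eventType == XMLStreamConstants.END_ELEMENT) {
                --level;
            } else if (eventType == XMLStreamConstants.START_ELEMENT) {
                ++level;
            }
        }
        return text.toString();
    }
}
```

Calling StaxOutliner.outline() on a document whose body contains `<h1>Intro <em>to</em> pull</h1>` followed by `<h2>Next</h2>` returns the two strings "Intro to pull" and "Next". The nested em element is handled by the counter rather than by a boolean flag.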

Note that namespace support is required here. It is off by default, but it can be switched on by calling:

factory.setNamespaceAware(true);

Support for Validation in the API

We have now provided an implementation of the XmlPull API called XNI2XmlPull, which is based on Xerces 2 and provides full support for XML validation.

In the case of the example given in the article, when XNI2XmlPull is used and validation is requested:

   factory.setValidating(true);

One will get output similar to the following when the XHTML input is invalid:

org.xmlpull.v1.XmlPullParserException: could not parse:
:::::2:94:The content of element type "h1" must match "(a|br|
span|bdo|map|object|img|tt|i|b|big|small|em|strong|dfn|code|q|
samp|kbd|var|cite|abbr|acronym|sub|sup|input|select|textarea|
label|button|ins|del|script)". caused by: :::::2:94:The content
of element type "h1" must match "(a|br|span|bdo|map|object|img|
tt|i|b|big|small|em|strong|dfn|code|q|samp|kbd|var|cite|abbr|
acronym|sub|sup|input|select|textarea|label|button|ins|del|script)".
	at org.xmlpull.v1.xni2xmlpull1.X2Iterator.nextImpl(X2Iterator.java:763)
	at org.xmlpull.v1.xni2xmlpull1.X2Iterator.peekNextState(X2Iterator.java:784)
	at org.xmlpull.v1.xni2xmlpull1.X2Parser.nextImpl(X2Parser.java:1079)
	at org.xmlpull.v1.xni2xmlpull1.X2Parser.next(X2Parser.java:915)
	at XHTMLOutliner.printHeaderText(XHTMLOutliner.java:81)
	at XHTMLOutliner.main(XHTMLOutliner.java:56)

Conclusions

We have addressed the main concern of the author by providing an implementation of the XmlPull API that supports validation. We hope that we have successfully addressed the other concerns as well.

The XmlPull API can be used now (it has been available for 8 months), and it is proven in practice. We are committed to incremental updates but also open to major changes if they are necessary. Future plans for XmlPull are to provide a serializer, more XML tests, and support for other languages (C++, PHP, and so on), but we are also open to user participation and new suggestions.

We are both working in the JSR 173 expert group that will probably generate a more object-oriented API without losing essential features of the XmlPull API. However, it is more important that the API is easy to use when writing code to parse XML than to enforce a particular paradigm. Since the API will be used as a building block for higher-level APIs (for example, SOAP), it must be very efficient and should facilitate implementations with very small memory footprints.


