
XMLPULL: A Response

Additional Remarks

In addition to resolving the general misunderstandings of the XmlPull API, we would like to point out some possible minor improvements to the article.

In the XmlPull API, there are two general options for iterating over XML events. The preferred way is the method next(), which makes typical parsing as simple as possible by reporting only START_TAG, END_TAG, TEXT, and END_DOCUMENT events. Legacy or special-purpose events such as COMMENT and PROCESSING_INSTRUCTION are silently ignored.

We think that it is a bit misleading to present the lower-level nextToken() method first. It is designed for advanced use of the API and exposes every possible detail about the input document.

The XHTML outliner example, as written, does not take full advantage of the XmlPull API. Instead of using a boolean state variable that can fail for nested headers (as the author notes), it is better to let the parsing code mirror the structure of XHTML. We think that the ability to minimize the use of state flags (and sometimes whole state machines) is one of the main advantages of XML pull parsing and should be properly demonstrated.

The updated example code is both easier to write and easier to maintain if it is split to reflect two distinct functions. First, code that finds header elements:

int event;
// Read events until the end of the document; each event is fetched exactly once.
while ( (event = parser.next()) != XmlPullParser.END_DOCUMENT) {
    if (event == XmlPullParser.START_TAG) {
        if (isHeader(parser.getName())) {
            printHeaderText(parser);
        }
    }
}

and, second, code that prints the text content of XHTML headers:

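private static void printHeaderText(XmlPullParser parser)
    throws XmlPullParserException, IOException
{
    // The parser is positioned on the header's START_TAG, so nesting depth starts at 1.
    int level = 1;
    while( level > 0 ) {
        int eventType = parser.next();
        if (eventType == XmlPullParser.TEXT) {
            System.out.print(parser.getText());
        } else if (eventType == XmlPullParser.END_TAG) {
            --level;   // leaving an element; 0 means the header itself has closed
        } else if (eventType == XmlPullParser.START_TAG) {
            ++level;   // entering a nested element inside the header
        }
    }
}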

The full code sample is available for review.
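
To try the depth-counting idea without an XmlPull implementation on the classpath, the following is a minimal sketch that substitutes the JDK's built-in javax.xml.stream pull parser for XmlPull. It is not the XmlPull code above: the class name StaxOutlinerSketch, the methods outline and headerText, and the definition of isHeader (any element named h1 through h6) are assumptions made for illustration, and the text is collected into a string rather than printed so that it is easy to check.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxOutlinerSketch {

    // Assumption: a header is any element whose local name is h1..h6.
    static boolean isHeader(String name) {
        return name != null && name.length() == 2
            && name.charAt(0) == 'h'
            && name.charAt(1) >= '1' && name.charAt(1) <= '6';
    }

    // Called right after a header's start tag has been read. Collects all
    // text up to the matching end tag, tracking nesting depth instead of
    // a boolean "inside header" flag.
    static String headerText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int level = 1;
        while (level > 0) {
            int eventType = reader.next();
            if (eventType == XMLStreamConstants.CHARACTERS) {
                text.append(reader.getText());
            } else if (eventType == XMLStreamConstants.END_ELEMENT) {
                --level;
            } else if (eventType == XMLStreamConstants.START_ELEMENT) {
                ++level;
            }
        }
        return text.toString();
    }

    // Returns one line per header found in the document.
    static String outline(String xhtml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xhtml));
            StringBuilder out = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && isHeader(reader.getLocalName())) {
                    out.append(headerText(reader)).append('\n');
                }
            }
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<html><body><h1>A <em>nested</em> title</h1><p>skip</p><h2>Two</h2></body></html>"));
    }
}
```

Because the inner loop counts start and end tags, markup nested inside a header (the em element in the sample input) contributes its text without ending the header early, which is the case a single boolean flag handles poorly.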

Note that namespace support is required here. Although it is off by default, it can be turned on by calling:

factory.setNamespaceAware(true);
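
With namespace awareness on, a header test can check the namespace URI as well as the local name. The following is a minimal sketch of such a check; the class XhtmlNames and the method isXhtmlHeader are hypothetical names, not part of XmlPull, and the sketch assumes that headers are the elements h1 through h6 in the XHTML namespace. In the parsing loop it would be called with the values returned by parser.getNamespace() and parser.getName().

```java
class XhtmlNames {

    // Namespace URI defined by the XHTML 1.0 specification.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: true only for h1..h6 elements in the XHTML namespace.
    static boolean isXhtmlHeader(String namespaceUri, String localName) {
        if (!XHTML_NS.equals(namespaceUri) || localName == null) {
            return false;
        }
        return localName.length() == 2
            && localName.charAt(0) == 'h'
            && localName.charAt(1) >= '1'
            && localName.charAt(1) <= '6';
    }
}
```

A check like this keeps an element named h1 from some other vocabulary out of the outline, which a comparison on the name alone cannot do.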

Support for Validation in the API

We have now provided an implementation of the XmlPull API called XNI2XmlPull, which is based on Xerces 2 and provides full support for XML validation.

When XNI2XmlPull is used with the example given in the article and validation is requested:

   factory.setValidating(true);

output similar to the following is produced when the XHTML input is invalid:

org.xmlpull.v1.XmlPullParserException: could not parse:
:::::2:94:The content of element type "h1" must match "(a|br|
span|bdo|map|object|img|tt|i|b|big|small|em|strong|dfn|code|q|
samp|kbd|var|cite|abbr|acronym|sub|sup|input|select|textarea|
label|button|ins|del|script)". caused by: :::::2:94:The content
of element type "h1" must match "(a|br|span|bdo|map|object|img|
tt|i|b|big|small|em|strong|dfn|code|q|samp|kbd|var|cite|abbr|
acronym|sub|sup|input|select|textarea|label|button|ins|del|script)".
	at org.xmlpull.v1.xni2xmlpull1.X2Iterator.nextImpl(X2Iterator.java:763)
	at org.xmlpull.v1.xni2xmlpull1.X2Iterator.peekNextState(X2Iterator.java:784)
	at org.xmlpull.v1.xni2xmlpull1.X2Parser.nextImpl(X2Parser.java:1079)
	at org.xmlpull.v1.xni2xmlpull1.X2Parser.next(X2Parser.java:915)
	at XHTMLOutliner.printHeaderText(XHTMLOutliner.java:81)
	at XHTMLOutliner.main(XHTMLOutliner.java:56)

Conclusions

We have addressed the author's main concern by providing an implementation of the XmlPull API that supports validation. We hope that we have successfully addressed the other concerns as well.

The XmlPull API can be used now (it has been available for eight months), and it is proven in practice. We are committed to incremental updates but are also open to major changes if they are necessary. Future plans for XmlPull are to provide a serializer, more XML tests, and support for other languages (C++, PHP, and so on), but we are open to user participation and new suggestions.

We are both working in the JSR 173 expert group, which will probably produce a more object-oriented API without losing the essential features of the XmlPull API. However, it is more important that the API be easy to use when writing code to parse XML than that it enforce a particular paradigm. Since the API will be used as a building block for higher-level APIs (for example, SOAP), it must be very efficient and should facilitate implementations with very small memory footprints.


  • RE: design issues indict pattern
    2002-11-08 12:59:21 Marko Milicevic

    It's completely ridiculous to suggest that pull has no value because it does not fit into your pure model of OO. OO is not an optimal design solution for all problems. Design/implementation of any code is full of all sorts of tradeoffs, including performance and usability. You can almost never have it all. When the rubber hits the road you will have to make a decision as to what is most important to you as a designer. In my opinion, if some degree of OO purity needs to be sacrificed to optimize performance and/or usability for a particular problem, then so be it.


  • .NET Frameworks XmlReader
    2002-09-30 09:02:31 Chris Lovett

    Another example of a validating pull model parser is the XmlValidatingReader which we shipped in the .NET Frameworks back in January '02. I don't buy that pull is any less "object oriented", it's just different. Those who are used to the push model have difficulty switching to pull. Those who've never seen an XML parser before seem to prefer the pull model.

  • design issues indict pattern
    2002-09-26 21:31:57 Eric Schwarzenbach

    I think the difficulties noted here for applying good OO design techniques to a pull parser really serve as an indictment of the pull parsing pattern in general, as opposed to event-driven designs like SAX.

    In other words, you have this design quandary of choosing between some convolutions and costly inefficiencies to make it more OO, or having this very flat, ugly, non-OO design. When you find yourself in design dilemmas that have no good solution, it's usually an indication that some broader design decision, one that sent you down this path, was wrong.




    • design issues indict pattern
      2004-02-27 08:07:50 Liam Coughlin

      There are also some problems that simply are solved better with an aesthetic that is "flat, ugly, non-OO". You're making assertions about the value of the design based on your aesthetic preferences, and intimating that some nebulous "mistake" has been made earlier in the process. Not good criticism, not good anything really.