hi,
first i wanted to compliment the article's author on very good work.
however, as a co-author of the XMLPULL V1 API i would like to comment on some issues raised in the article:
(...) First, we're using XmlPullParser as a rough equivalent of XmlTextReader. One difference is that
while we are able to instantiate an XmlTextReader directly in C# (remember, Microsoft is a
one-stop shop), we have to use the Java XmlPullParserFactory to get a concrete implementation
of the XmlPullParser interface. This should be
a familiar exercise for anyone who's used JAXP or, for that matter, JDBC. (...)
this allows users to select alternate implementations of the XMLPULL API
(even ones that work with WBXML) and does not lock them
into one implementation (as XmlTextReader unfortunately does ...)
(...) Another difference between the .NET XmlReader and the Java XmlPullParser has to
do with the way in which events are pulled out of the XML document. In the former,
the ReadString() method will return all the text for the
current element; while in the latter, next() must explicitly be called
to position the parser at the text node before
calling getText() or readText() to read the text. (...)
this has changed: in the newest XMLPULL API one can call nextText()
to get the text content of an element in one step.
(...) This may be a minor difference, but it tends to make our port a little more difficult. To better
handle this requirement, I've changed several while loops into do...while loops.
This, unfortunately, makes it less than a simple port; the logic has changed, but not considerably. (...)
so in short this is now fixed and the change is no longer needed :-)
there is one incompatibility introduced in the latest 1.0.8 release of the XMLPULL API
that affects your sample code: it now has parser.nextText() instead of parser.readText(),
so there is no need to call "if (parser.next() == XmlPullParser.TEXT)"
(if possible please update the sample as an appendix to the article).
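a minimal sketch of the two reading styles discussed above. XMLPULL is not part of the JDK, so this sketch swaps in the JDK's own pull API (javax.xml.stream, StAX) to stay runnable without extra jars: its nextTag() and getElementText() stand in for XMLPULL's nextTag() and nextText(). the class and method names (TextReadSketch, twoStep, oneStep) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// hypothetical helper class, not part of the article's sample
final class TextReadSketch {

    // assumption: StAX is used here as a stand-in for an XMLPULL parser
    private static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
    }

    // two-step style: move onto the text event explicitly, then read it
    // (XMLPULL analogue: if (parser.next() == XmlPullParser.TEXT) parser.getText())
    static String twoStep(String xml) {
        try {
            XMLStreamReader reader = open(xml);
            reader.nextTag();                 // now on the start tag
            String text = "";
            // only the first text event is read; a parser may split longer
            // text into several events, which this style has to handle itself
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                text = reader.getText();
            }
            return text;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // one-step style: read the whole text content of the current element
    // (XMLPULL analogue: parser.nextText())
    static String oneStep(String xml) {
        try {
            XMLStreamReader reader = open(xml);
            reader.nextTag();                 // now on the start tag
            return reader.getElementText();   // leaves the reader on the end tag
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

the one-step call also copes with an empty element such as `<title/>`, where the two-step style has to check the event type before reading any text.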
i have also modified RSSReader to correctly follow the RSS DTD
(http://my.netscape.com/publish/formats/rss-0.91.dtd)
- the key differences are that the channel content model allows title at any position and that item is inside
channel:
<!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? |
copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? |
skipDays?)*>
<!ELEMENT item (title | link | description)*>
the modified sample tries to validate its input, so it will detect input that is not an RSS feed
or that has an unexpected structure and will report it to the user
- also, for convenience, the parser now skips unknown
top-level elements in channel. additionally, though the modified sample is now more
sophisticated, it is also shorter, and i think it demonstrates that the XMLPULL API
can work pretty well for parsing in Java! the modified sample is appended
at the bottom of the message (and attached to the email)
also, on the XMLPULL-DEV mailing list we are currently working on a class
for writing XML, to make it easy to do what you described with the XMLPULL API,
and we welcome input from anybody interested in XML pull parsing and in XML
serialization.
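the validate-and-skip idea can be sketched on its own, separately from the HTML output. as an assumption for runnability, this sketch again uses the JDK's StAX reader (javax.xml.stream) in place of an XMLPULL parser; require(), nextTag() and the depth-counting skip loop follow the same pattern as in the appended sample. the names ChannelSkipSketch, itemTitles, readItemTitle and skipElement are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// hypothetical helper class, not part of the article's sample
final class ChannelSkipSketch {

    // returns the titles of all items in rss/channel; unknown elements are skipped
    static List<String> itemTitles(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // validate the outer structure: <rss><channel>...
            r.nextTag();
            r.require(XMLStreamConstants.START_ELEMENT, null, "rss");
            r.nextTag();
            r.require(XMLStreamConstants.START_ELEMENT, null, "channel");

            List<String> titles = new ArrayList<>();
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                if (r.getLocalName().equals("item")) {
                    String title = readItemTitle(r);
                    if (title != null) titles.add(title);
                } else {
                    skipElement(r);           // title, image, language, ...
                }
            }
            r.require(XMLStreamConstants.END_ELEMENT, null, "channel");
            return titles;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("not a usable RSS document", e);
        }
    }

    // reads one <item>, returning its title or null if it has none
    private static String readItemTitle(XMLStreamReader r) throws XMLStreamException {
        String title = null;
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            if (r.getLocalName().equals("title")) {
                title = r.getElementText();
            } else {
                skipElement(r);
            }
        }
        return title;
    }

    // skips the current element, including any nested sub-elements
    private static void skipElement(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }
}
```

the depth counter is what lets the skip step pass over nested content such as an image element with its own children, without the caller knowing anything about that element.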
thanks,
alek
ps. there is a bug in ItemToHtml() - it assumes that link is never null
- if an attribute value is null then the Writer dies - fix it with
(...) if(link != null) attributes.put("href", link);(...)
the same fix is required for description, so that it is not printed if null:
(...) if(description != null) writer.write(description); (...)
- this is related to the fact that, as you can see in the DTD,
an item may or may not have a link or description:
<!ELEMENT item (title | link | description)*>
ps2. this is the modified version of the sample that was in the article:
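a small self-contained sketch of the null guards. it does not use the article's XMLWriter (its API is not reproduced here); as a stand-in it builds the markup with a StringBuilder, and the names ItemHtmlSketch, itemToHtml and escape are hypothetical.

```java
// hypothetical stand-in for the article's ItemToHtml(), using plain strings
final class ItemHtmlSketch {

    // every field of an item is optional in the DTD, so each one is guarded
    static String itemToHtml(String title, String link, String description) {
        StringBuilder out = new StringBuilder("<p><a");
        if (link != null) {
            out.append(" href=\"").append(escape(link)).append('"');
        }
        out.append('>');
        if (title != null) {
            out.append(escape(title));
        }
        out.append("</a><br/>");
        if (description != null) {
            out.append(escape(description));
        }
        return out.append("</p>").toString();
    }

    // minimal escaping of markup characters in text and attribute values
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

an item with only a title then still produces valid output, with no href attribute and no description text.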
//package com.xml;
// modified by Aleksander Slominski
// based on http://www.xml.com/pub/a/2002/05/22/parsing.html?page=2
import java.io.*;
import java.net.*;
import java.util.*;
import com.alexandriasc.xml.XMLWriter;
import org.xmlpull.v1.*;

public class RSSReader {

    public static void main(String [] args)
    {
        // create an instance of RSSReader
        RSSReader rssreader = new RSSReader();
        XMLWriter writer = null;
        try {
            writer = new XMLWriter(new OutputStreamWriter(System.out),false);
            XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
            String url = args[0];
            InputStreamReader stream = new InputStreamReader(
                new URL(url).openStream());
            parser.setInput(stream);
            rssreader.convertRSSToHtml(parser, writer);
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }

    public void convertRSSToHtml(XmlPullParser parser, XMLWriter writer)
        throws IOException, XmlPullParserException
    {
        // <!ELEMENT rss (channel)>
        if (parser.nextTag() == XmlPullParser.START_TAG
            && parser.getName().equals("rss"))
        {
            writer.beginElement("html");
            if (parser.nextTag() == XmlPullParser.START_TAG
                && parser.getName().equals("channel"))
            {
                convertChannelToHtml(parser, writer);
                parser.require(XmlPullParser.END_TAG, null, "channel");
            } else {
                // the exception must be thrown, not just created
                throw new RuntimeException("expected channel start tag not "+parser.getPositionDescription());
            }
            parser.nextTag();
            parser.require(XmlPullParser.END_TAG, null, "rss");
            writer.endElement();
            writer.flush();
        } else {
            throw new RuntimeException("expected an RSS document at " + parser.getPositionDescription());
        }
    }

    public void convertChannelToHtml(XmlPullParser parser, XMLWriter writer)
        throws IOException, XmlPullParserException
    {
        // <!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? |
        //   copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? |
        //   skipDays?)*>
        boolean seenBody = false; //assumption that title is before items ...
        // the loop ends on END_TAG; well-formedness of XML guarantees that it is
        // the end tag of "channel", so no check of parser.getName() is needed
        while (parser.nextTag() != XmlPullParser.END_TAG) {
            // nextTag() guarantees that the current event is START_TAG
            // <!ELEMENT title (#PCDATA)>
            if(parser.getName().equals("title") && !seenBody) {
                writer.beginElement("head");
                writer.writeElement("title",null,parser.nextText());
                writer.endElement();
            } else if(parser.getName().equals("item")) {
                if(!seenBody) {
                    writer.beginElement("body");
                    seenBody = true;
                }
                convertItemToHtml(parser, writer);
            } else {
                // skip any element content including sub elements...
                int level = 1;
                while (level > 0) {
                    switch(parser.next()) {
                        case XmlPullParser.START_TAG: ++level; break;
                        case XmlPullParser.END_TAG: --level; break;
                    }
                }
            }
        }
        if(seenBody) writer.endElement();
    }

    public void convertItemToHtml(XmlPullParser parser, XMLWriter writer)
        throws IOException, XmlPullParserException
    {
        writer.beginElement("p");
        //<!ELEMENT item (title | link | description)*>
        String title = null, link = null, description = null;
        while (parser.nextTag() != XmlPullParser.END_TAG) {
            if (parser.getName().equals("title")) {
                title = parser.nextText();
            } else if (parser.getName().equals("link")) {
                link = parser.nextText();
            } else if (parser.getName().equals("description")) {
                description = parser.nextText();
            }
        }
        HashMap attributes = new HashMap(1);
        if(link != null) attributes.put("href", link);
        writer.beginElement("a",attributes);
        if(title != null) writer.write(title);
        writer.endElement();
        writer.writeEmptyElement("br");
        if(description != null) writer.write(description);
        writer.endElement(); // end the "p" element
    }
}