Learning C# XML
by Niel Bornstein
XmlReader
System.Xml contains many very useful classes. When
you're talking about stream-based XML parsing, however,
XmlReader is what you want.
XmlReader is most analogous to SAX, although it does
not require you to implement an interface, as most SAX implementations
(including JAXP's) do. Instead, you simply instantiate a concrete
XmlReader of your choice -- there are several to choose
from, including XmlTextReader, XmlValidatingReader, and
XmlNodeReader -- then call its Read() method and pick
nodes off as they are returned to you.
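As a minimal sketch of that pull model, the following reads a small in-memory document instead of a URL. The class name ReadLoopSketch and the ElementNames method are made up for illustration; only XmlTextReader, Read(), NodeType, and Name come from System.Xml.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReadLoopSketch {
    // Returns the name of every start tag, in document order.
    public static List<string> ElementNames(string xml) {
        List<string> names = new List<string>();
        // XmlTextReader accepts a TextReader, so no network access is needed here
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml))) {
            // Read() advances to the next node and returns false at end of input
            while (reader.Read()) {
                if (reader.NodeType == XmlNodeType.Element) {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```

Nothing is delivered to you here; the loop asks for each node in turn and decides for itself what to do with it.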
In algorithmic terms, we could say that SAX uses callbacks, while
XmlReader uses an event loop. An event loop is an
infinite loop, during which certain events are received and dispatched
to you. Internally, SAX may well have the same sort of event loop, but
the callback methods hide that particular detail from you. A callback
method is simply invoked when SAX's parser comes across an event.
With XmlReader, then, we don't need callback methods to
satisfy an interface, but we can still use them to dig through the
nodes that we've read. In effect, we're just retrofitting a SAX-like
API on top of XmlReader. Here's the new version of the code; be
sure to compare it to the Java version above.
    public static void Main(string[] args) {
      // create an instance of RSSReader
      RSSReader reader = new RSSReader();
      // parse the document from the parameter
      reader.Parse(args[0]);
    }

    // we have to write this method ourselves, since it's
    // not provided by the API
    public void Parse(string url) {
      try {
        XmlTextReader reader = new XmlTextReader(url);
        while (reader.Read()) {
          switch (reader.NodeType) {
          case XmlNodeType.Document:
            StartDocument();
            break;
          case XmlNodeType.Element:
            string namespaceURI = reader.NamespaceURI;
            string name = reader.Name;
            Hashtable attributes = new Hashtable();
            if (reader.HasAttributes) {
              for (int i = 0; i < reader.AttributeCount; i++) {
                reader.MoveToAttribute(i);
                attributes.Add(reader.Name,reader.Value);
              }
            }
            StartElement(namespaceURI,name,name,attributes);
            break;
          case XmlNodeType.EndElement:
            EndElement(reader.NamespaceURI,
                       reader.Name, reader.Name);
            break;
          case XmlNodeType.Text:
            Characters(reader.Value,0,reader.Value.Length);
            break;
          // There are many other types of nodes, but
          // we are not interested in them
          }
        }
      } catch (XmlException e) {
        Console.WriteLine(e.Message);
      }
    }

    ...

    public void StartElement(string namespaceURI,string sName,
                             string qName,Hashtable attrs) {
      string eName = sName; // element name
      if ("".Equals(eName))
        eName = qName; // namespaceAware = false
      stack.Push(eName);
      value = new StringBuilder();
    }
You'll notice that XmlReader has several interesting
members, including NodeType, NamespaceURI,
Name, and Value. NodeType can
have several values depending on the node read from the XML document;
we're only interested in Element,
EndElement, and Text, though there are
several other node types that you can use: CDATA,
ProcessingInstruction, Comment,
XmlDeclaration, Document,
DocumentType, and EntityReference.
As we receive one of the NodeTypes we're interested
in, we read the Name or Value and hand it
off to our callback methods.
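To see which node types a given document actually produces, a small sketch like the following can help. NodeTypeSketch and NodeTypes are hypothetical names chosen for this example, and the input is an in-memory string rather than a feed URL.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeTypeSketch {
    // Lists the NodeType of each node that Read() reports, in order.
    public static List<XmlNodeType> NodeTypes(string xml) {
        List<XmlNodeType> types = new List<XmlNodeType>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml))) {
            while (reader.Read()) {
                types.Add(reader.NodeType);
            }
        }
        return types;
    }
}
```

For the input `<a>hi</a>`, this yields Element, Text, and EndElement, which are exactly the three node types the RSS reader handles.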
One other point to note is that SAX has an Attributes
class, for which the closest equivalent in C# is
XmlAttributeCollection. However, creating one of those
is a more complex task than I really want to deal with right now, so
instead, I've changed StartElement's last parameter to a
Hashtable. I've done some fancy processing to populate
it, but we're not interested in attributes for this example so I won't
go into any more detail about it.
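For readers who do care about attributes, here is one way the collection step might look in isolation. It is a sketch only: AttributeSketch and FirstElementAttributes are invented names, and it uses a generic Dictionary where the listing uses a Hashtable.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeSketch {
    // Collects the attributes of the first element, keyed by qualified name.
    public static Dictionary<string, string> FirstElementAttributes(string xml) {
        Dictionary<string, string> attrs = new Dictionary<string, string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml))) {
            while (reader.Read()) {
                if (reader.NodeType != XmlNodeType.Element) continue;
                // On an element, MoveToNextAttribute() moves to its first
                // attribute, then to each following one until none remain
                while (reader.MoveToNextAttribute()) {
                    attrs[reader.Name] = reader.Value;
                }
                // Step back from the last attribute to the owning element
                reader.MoveToElement();
                break;
            }
        }
        return attrs;
    }
}
```

Calling MoveToElement() afterward leaves the reader positioned on the element again, which matters if the code goes on to inspect element properties such as IsEmptyElement.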
Now that we've got our event loop and our "callback" methods in place, we can port the RSS-to-HTML conversion logic from Java to C#. This isn't exactly rocket science either, as the syntax of C# is basically the same as Java. Don't forget, though: all method names start with capital letters.
Here is the complete source listing for RSSReader.cs:
    using System;
    using System.Collections;
    using System.IO;
    using System.Text;
    using System.Xml;

    public class com {
      public class xml {
        public class RSSReader {
          static String LineEnd = "\n";
          Stack stack = new Stack();
          StringBuilder value = null;
          string title = null;
          string link = null;
          string desc = null;

          public static void Main(string[] args) {
            // create an instance of RSSReader
            RSSReader reader = new RSSReader();
            // parse the document from the parameter
            reader.Parse(args[0]);
          }

          // we have to write this method ourselves, since it's
          // not provided by the API
          public void Parse(string url) {
            try {
              XmlTextReader reader = new XmlTextReader(url);
              while (reader.Read()) {
                switch (reader.NodeType) {
                case XmlNodeType.Document:
                  StartDocument();
                  break;
                case XmlNodeType.Element:
                  string namespaceURI = reader.NamespaceURI;
                  string name = reader.Name;
                  Hashtable attributes = new Hashtable();
                  if (reader.HasAttributes) {
                    for (int i = 0; i < reader.AttributeCount; i++) {
                      reader.MoveToAttribute(i);
                      attributes.Add(reader.Name,reader.Value);
                    }
                  }
                  StartElement(namespaceURI,name,name,attributes);
                  break;
                case XmlNodeType.EndElement:
                  EndElement(reader.NamespaceURI,
                             reader.Name, reader.Name);
                  break;
                case XmlNodeType.Text:
                  Characters(reader.Value,0,reader.Value.Length);
                  break;
                // There are many other types of nodes, but
                // we are not interested in them
                }
              }
            } catch (XmlException e) {
              Console.WriteLine(e.Message);
            }
          }

          public void StartDocument() {
            Emit("<html>" + LineEnd);
          }

          public void EndDocument() {
            Emit("</html>" + LineEnd);
          }

          public void StartElement(string namespaceURI,string sName,
                                   string qName,Hashtable attrs) {
            string eName = sName; // element name
            if ("".Equals(eName)) eName = qName; // namespaceAware = false
            stack.Push(eName);
            value = new StringBuilder();
          }

          public void EndElement(string namespaceURI,string sName,
                                 string qName) {
            string eName = (string)stack.Pop();
            if (eName.Equals("title") &&
                stack.Peek().Equals("channel")) {
              Emit(" <head>" + LineEnd);
              Emit(" <title>" + value + "</title>" + LineEnd);
              Emit(" </head>" + LineEnd);
              Emit(" <body>" + LineEnd);
            } else if (eName.Equals("title") &&
                       stack.Peek().Equals("item")) {
              title = null == value ? "" : value.ToString();
            } else if (eName.Equals("link") &&
                       stack.Peek().Equals("item")) {
              link = null == value ? "" : value.ToString();
            } else if (eName.Equals("description") &&
                       stack.Peek().Equals("item")) {
              desc = null == value ? "" : value.ToString();
            } else if (eName.Equals("item")) {
              Emit(" <p><a href=\"" + link + "\">" +
                   title + "</a><br>" + LineEnd);
              Emit(" " + desc + "</p>" + LineEnd);
            } else if (eName.Equals("channel")) {
              Emit(" </body>" + LineEnd);
            }
            value = null;
          }

          public void Characters(string buf, int offset, int len) {
            value.Append(buf);
          }

          private static void Emit(string s) {
            Console.Write(s);
          }
        }
      }
    }
Try it, and you'll see that it compiles without a problem. Run it like so (output wrapped for legibility):
    C:\> RSSReader http://xmlhack.com/rss.php
     <head>
     <title>xmlhack</title>
     </head>
     <body>
     <p><a href="http://www.xmlhack.com/read.php?item=1511">Activity
     around the Dublin Core</a><br>
     The Dublin Core Metadata Initiative (DCMI) has seen a recent
     spate of activity, Recent publications include The Namespace
     Policy for the Dublin Core Metadata Initiative, Expressing
     Simple Dublin Core in RDF/XML, and Expressing Qualified Dublin
     Core in RDF/XML.</p>
     ...
This should look very familiar, as it's exactly the same output that our Java program produced. You might have seen a completely different result, however, like this:
    C:\> RSSReader http://xmlhack.com/rss.php

    Unhandled Exception: System.Security.SecurityException:
      Request for the permission of type
      System.Net.WebPermission, System, Version=1.0.3300.0,
      Culture=neutral, PublicKeyToken=b77a5c561934e089 failed.
       at System.Security.CodeAccessSecurityEngine.CheckHelper(
          PermissionSet grantedSet, PermissionSet deniedSet,
          CodeAccessPermission demand, PermissionToken permToken)
       at System.Security.CodeAccessSecurityEngine.Check(
          PermissionToken permToken,
          CodeAccessPermission demand, StackCrawlMark& stackMark,
          Int32 checkFrames, Int32 unrestrictedOverride)
       at System.Security.CodeAccessSecurityEngine.Check(
          CodeAccessPermission cap, StackCrawlMark& stackMark)
       at System.Security.CodeAccessPermission.Demand()
       at System.Net.HttpRequestCreator.Create(Uri Uri)
       at System.Net.WebRequest.Create(Uri requestUri,
          Boolean useUriBase)
       at System.Net.WebRequest.Create(Uri requestUri)
       at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri,
          ICredentials credentials)
       at System.Xml.XmlDownloadManager.GetStream(Uri uri,
          ICredentials credentials)
       at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri,
          String role, Type ofObjectToReturn)
       at System.Xml.XmlTextReader.CreateScanner()
       at System.Xml.XmlTextReader.Init()
       at System.Xml.XmlTextReader.Read()
       at RSSReader.Parse(String url) in U:\thing\RSSReader.cs:line 31
       at RSSReader.Main(String[] args) in U:\thing\RSSReader.cs:line 23

    The state of the failed permission was:
    <IPermission class="System.Net.WebPermission, System,
        Version=1.0.3300.0, Culture=neutral,
        PublicKeyToken=b77a5c561934e089" version="1">
      <ConnectAccess>
        <URI uri="http://xmlhack\.com/rss\.php"/>
      </ConnectAccess>
    </IPermission>
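This happens because of .NET's code access security: an assembly run
from a network drive is granted fewer permissions than one run from a
local drive, and here it is denied the WebPermission it needs to fetch
the URL. Just move your RSSReader.exe executable onto a
local drive and try again.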
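Note that Parse catches only XmlException, which is why the SecurityException above goes unhandled. If you would rather report it as a short message, a wrapper along these lines is one option. This is only a sketch: ParseGuardSketch and Run are hypothetical names, and the message wording is arbitrary.

```csharp
using System;
using System.Security;
using System.Xml;

public static class ParseGuardSketch {
    // Runs a parse action; returns null on success, or a short message
    // for the two kinds of failure discussed in this article.
    public static string Run(Action parse) {
        try {
            parse();
            return null;
        } catch (XmlException e) {
            return "XML error: " + e.Message;
        } catch (SecurityException e) {
            // Raised when the runtime refuses a permission, such as WebPermission
            return "Permission denied: " + e.Message;
        }
    }
}
```

A caller could wrap the parse step as ParseGuardSketch.Run(() => reader.Parse(args[0])) and print whatever message comes back.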
Conclusions
What can we learn from XmlReader? First of all, unlike
Java's XML libraries, all of System.Xml is provided by
Microsoft. This means that, among other things, there is a consistent
interface and a consistent set of tools for all your XML needs. No
need to shop around for parsers and SAX implementations.
That can also be considered a drawback. Since Java has multiple implementations, you're free to use the one that fits best. And with the advent of JAXP, you can drop in the different implementations without changing your code at all. Doing that in C# is, well, impossible; you're stuck with the one, true Microsoft way.
As for the XmlReader event model, it doesn't seem
that SAX has much to learn from it at all. You'll remember that we
actually had to write some additional code when we ported the program
to XmlReader, because SAX provided the event loop for us;
with SAX, all we have to do is write the callbacks.
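That extra code only has to be written once, though. As a rough sketch, the event loop can be pulled out into a reusable driver that pushes nodes to any handler. Everything named here (ISaxLikeHandler, SaxLikeDriver, EventLog) is hypothetical and much simpler than SAX's real ContentHandler.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical callback interface, loosely modeled on SAX's ContentHandler
public interface ISaxLikeHandler {
    void StartElement(string name);
    void EndElement(string name);
    void Characters(string text);
}

public static class SaxLikeDriver {
    // The event loop: pulls nodes from the reader and pushes them to the handler
    public static void Parse(XmlReader reader, ISaxLikeHandler handler) {
        while (reader.Read()) {
            switch (reader.NodeType) {
                case XmlNodeType.Element:
                    handler.StartElement(reader.Name);
                    // An empty element such as <b/> has no EndElement node,
                    // so report its end here to keep the events balanced
                    if (reader.IsEmptyElement) handler.EndElement(reader.Name);
                    break;
                case XmlNodeType.EndElement:
                    handler.EndElement(reader.Name);
                    break;
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    handler.Characters(reader.Value);
                    break;
            }
        }
    }
}

// Example handler that simply records the events it receives
public class EventLog : ISaxLikeHandler {
    public readonly List<string> Events = new List<string>();
    public void StartElement(string name) { Events.Add("start:" + name); }
    public void EndElement(string name) { Events.Add("end:" + name); }
    public void Characters(string text) { Events.Add("chars:" + text); }
}
```

With a driver like this in place, a new program only has to supply a handler, which is the same division of labor that SAX gives you.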
On the other hand, System.Xml does provide
some nifty types like XmlAttribute (which we didn't
really discuss here) and XmlNodeType, which give you very
convenient, standard ways to look at attributes and nodes, instead of
having to deal with strings and Hashtables and such. And all these
types are used throughout C#'s XML facilities.
If you don't want to write either an event loop or callbacks, the
read-only, forward-only, stream-based model might not be for you; you
might prefer a whole-document model (like, say, DOM). In that case,
XmlReader will not appeal to you any more than SAX
does. C# has another set of tools, starting with
XmlDocument, that gives you all the power of a document
stored in memory, plus the added convenience of building on what
you've already learned. We'll discuss it in the next article.
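As a brief preview of the whole-document approach, the sketch below loads an RSS document into an XmlDocument and selects the item titles with an XPath expression. DocumentSketch and ItemTitles are made-up names, and the XPath assumes the plain, un-namespaced RSS layout used in this article.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class DocumentSketch {
    // Loads the whole document into memory, then queries it.
    public static List<string> ItemTitles(string rss) {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rss);
        List<string> titles = new List<string>();
        // SelectNodes evaluates an XPath expression against the in-memory tree
        foreach (XmlNode node in doc.SelectNodes("/rss/channel/item/title")) {
            titles.Add(node.InnerText);
        }
        return titles;
    }
}
```

There is no loop over nodes and no stack to maintain here; the price is that the entire document has to fit in memory.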