XML.com: XML From the Inside Out

Learning C# XML
by Niel Bornstein

XmlReader

System.Xml contains many very useful classes. When you're talking about stream-based XML parsing, however, XmlReader is what you want.

XmlReader is most analogous to SAX, although it does not require you to implement an interface the way SAX does (JAXP's SAXParser, for instance, expects you to hand it a DefaultHandler subclass). Instead, you simply instantiate a concrete XmlReader of your choice (the framework provides XmlTextReader, XmlValidatingReader, and XmlNodeReader, and XmlTextReader is the one you'll use most for parsing raw XML), then call its Read() method and pick nodes off as they are returned to you.

In algorithmic terms, we could say that SAX uses callbacks, while XmlReader uses an event loop. SAX pushes events at you: its parser simply invokes a callback method whenever it comes across an event, and whatever loop it runs internally is hidden from you. With XmlReader, you write the loop yourself and pull the events: each call to Read() advances to the next node, and the loop ends when Read() returns false at the end of the document.
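To make the pull model concrete, here is a minimal sketch in which the caller drives the parser one node at a time. The class and method names (PullLoopSketch, ElementNames) are hypothetical, and for convenience it parses an in-memory string through a StringReader rather than fetching a URL.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper illustrating the pull model: the caller owns the loop
// and asks the reader for one node at a time.
public static class PullLoopSketch
{
    // Returns the names of all elements in document order.
    public static List<string> ElementNames(string xml)
    {
        List<string> names = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Read() advances to the next node and returns false at the end of
            // the input, so the loop terminates on its own; no callbacks involved.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return names;
    }
}
```

For example, `PullLoopSketch.ElementNames("<rss><channel><title>x</title></channel></rss>")` returns the names rss, channel, and title, in that order.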

While that means that we don't need all those callback methods to satisfy an interface, we can still use them to dig through the nodes that we've read. In fact, we're just retrofitting a SAX-like API on top of XmlReader. Here's the new version of the code; be sure to compare it to the Java version from earlier in the article.

public static void Main(string [] args) {
  if (args.Length < 1) {
    Console.WriteLine("usage: RSSReader <url-or-file>");
    return;
  }
  // create an instance of RSSReader
  RSSReader reader = new RSSReader();

  // parse the document from the parameter
  reader.Parse(args[0]);
}

// we have to write this method ourselves, since it's
// not provided by the API
public void Parse(string url) {
  XmlTextReader reader = null;
  try {
    reader = new XmlTextReader(url);
    // XmlTextReader never reports an XmlNodeType.Document node,
    // so the document-level callbacks are made outside the loop
    StartDocument();
    while (reader.Read()) {
      switch (reader.NodeType) {
      case XmlNodeType.Element:
        string namespaceURI = reader.NamespaceURI;
        string localName = reader.LocalName;
        string qName = reader.Name;
        // read this before moving onto the attributes
        bool isEmpty = reader.IsEmptyElement;
        Hashtable attributes = new Hashtable();
        if (reader.HasAttributes) {
          for (int i = 0; i < reader.AttributeCount; i++) {
            reader.MoveToAttribute(i);
            attributes.Add(reader.Name,reader.Value);
          }
          reader.MoveToElement();
        }
        StartElement(namespaceURI,localName,qName,attributes);
        // an empty element such as <foo/> has no EndElement node
        if (isEmpty)
          EndElement(namespaceURI,localName,qName);
        break;
      case XmlNodeType.EndElement:
        EndElement(reader.NamespaceURI,
               reader.LocalName, reader.Name);
        break;
      case XmlNodeType.Text:
      case XmlNodeType.CDATA:
        Characters(reader.Value,0,reader.Value.Length);
        break;
      // There are many other types of nodes, but
      // we are not interested in them
      }
    }
    EndDocument();
  } catch (XmlException e) {
    Console.WriteLine(e.Message);
  } finally {
    if (reader != null) reader.Close();
  }
}

...

public void StartElement(string namespaceURI,string sName,
             string qName,Hashtable attrs) {
  string eName = sName; // element name
  if ("".Equals(eName))
    eName = qName; // namespaceAware = false
  stack.Push(eName);
  value = new StringBuilder();
}

You'll notice that XmlReader has several interesting members, including NodeType, NamespaceURI, Name, and Value. NodeType can have several values depending on the node read from the XML document; we're chiefly interested in Element, EndElement, and Text, though there are several other node types that you can use, such as CDATA, Whitespace, ProcessingInstruction, Comment, XmlDeclaration, DocumentType, and EntityReference. XmlNodeType also defines Document, but XmlTextReader.Read() never reports a Document node, so a case for it inside the read loop will never fire.
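A quick way to see which node types actually come back is to record every NodeType the reader reports. This is a hypothetical diagnostic (NodeTypeSketch and NodeTypes are illustrative names), run here against an in-memory string.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: records the NodeType of every node XmlTextReader reports.
public static class NodeTypeSketch
{
    public static List<XmlNodeType> NodeTypes(string xml)
    {
        List<XmlNodeType> types = new List<XmlNodeType>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                types.Add(reader.NodeType);
            }
        }
        finally
        {
            reader.Close();
        }
        return types;
    }
}
```

Feeding it `<?xml version="1.0"?><item><title>Hi</title><description><![CDATA[<b>x</b>]]></description></item>` yields a list that starts with XmlDeclaration, reports "Hi" as Text and the description body as CDATA, and contains no Document entry. A loop that only handles Text would therefore silently drop CDATA-wrapped content, which is common in RSS descriptions.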

As we receive one of the NodeTypes we're interested in, we read the Name or Value and hand it off to our callback methods.

One other point to note is that SAX has an Attributes interface, for which the closest equivalent in C# is XmlAttributeCollection. However, creating one of those is a more complex task than I really want to deal with right now, so instead, I've changed StartElement's last parameter to a Hashtable. I populate it by moving the reader to each attribute in turn with MoveToAttribute() and copying its Name and Value, but we're not interested in attributes for this example, so I won't go into any more detail about it.
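If you do need the attributes, an alternative to indexing with MoveToAttribute(i) is to walk them with MoveToFirstAttribute() and MoveToNextAttribute(), then return to the element with MoveToElement(). The sketch below assumes a Hashtable keyed by qualified name, as in the listing; AttributeSketch, ReadAttributes, and AttributesOf are hypothetical names.

```csharp
using System.Collections;
using System.IO;
using System.Xml;

// Hypothetical helpers showing an alternative to the MoveToAttribute(i) loop.
public static class AttributeSketch
{
    // Copies the current element's attributes into a Hashtable keyed by
    // qualified name (namespace declarations such as xmlns:dc show up here too),
    // then moves the reader back to the element node.
    public static Hashtable ReadAttributes(XmlReader reader)
    {
        Hashtable attributes = new Hashtable();
        if (reader.MoveToFirstAttribute())
        {
            do
            {
                attributes[reader.Name] = reader.Value;
            } while (reader.MoveToNextAttribute());
            reader.MoveToElement();
        }
        return attributes;
    }

    // Returns the attributes of the first element named elementName,
    // or null if no such element exists.
    public static Hashtable AttributesOf(string xml, string elementName)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    return ReadAttributes(reader);
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return null;
    }
}
```

For instance, `AttributeSketch.AttributesOf("<rss version=\"0.91\"><channel/></rss>", "rss")` returns a table mapping "version" to "0.91". Moving back to the element matters if you inspect element properties such as IsEmptyElement afterward.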

Now that we've got our event loop and our "callback" methods in place, we can port the RSS-to-HTML conversion logic from Java to C#. This isn't exactly rocket science either, as C#'s syntax is very close to Java's. Don't forget, though: by .NET convention, method names start with capital letters.

Here is the complete source listing for RSSReader.cs:

using System;
using System.Collections;
using System.IO;
using System.Text;
using System.Xml;

namespace com.xml {
  public class RSSReader {
    static readonly string LineEnd = "\n";

    Stack stack = new Stack();
    StringBuilder value = null;
    string title = null;
    string link = null;
    string desc = null;

    public static void Main(string [] args) {
      if (args.Length < 1) {
        Console.WriteLine("usage: RSSReader <url-or-file>");
        return;
      }
      // create an instance of RSSReader
      RSSReader reader = new RSSReader();

      // parse the document from the parameter
      reader.Parse(args[0]);
    }

    // we have to write this method ourselves, since it's
    // not provided by the API
    public void Parse(string url) {
      XmlTextReader reader = null;
      try {
        reader = new XmlTextReader(url);
        // XmlTextReader never reports an XmlNodeType.Document node,
        // so the document-level callbacks are made outside the loop
        StartDocument();
        while (reader.Read()) {
          switch (reader.NodeType) {
          case XmlNodeType.Element:
            string namespaceURI = reader.NamespaceURI;
            string localName = reader.LocalName;
            string qName = reader.Name;
            // read this before moving onto the attributes
            bool isEmpty = reader.IsEmptyElement;
            Hashtable attributes = new Hashtable();
            if (reader.HasAttributes) {
              for (int i = 0; i < reader.AttributeCount; i++) {
                reader.MoveToAttribute(i);
                attributes.Add(reader.Name,reader.Value);
              }
              reader.MoveToElement();
            }
            StartElement(namespaceURI,localName,qName,attributes);
            // an empty element such as <foo/> has no EndElement node
            if (isEmpty)
              EndElement(namespaceURI,localName,qName);
            break;
          case XmlNodeType.EndElement:
            EndElement(reader.NamespaceURI,
                   reader.LocalName, reader.Name);
            break;
          case XmlNodeType.Text:
          case XmlNodeType.CDATA:
            Characters(reader.Value,0,reader.Value.Length);
            break;
          // There are many other types of nodes, but
          // we are not interested in them
          }
        }
        EndDocument();
      } catch (XmlException e) {
        Console.WriteLine(e.Message);
      } finally {
        if (reader != null) reader.Close();
      }
    }

    public void StartDocument() {
      Emit("<html>" + LineEnd);
    }

    public void EndDocument() {
      Emit("</html>" + LineEnd);
    }

    public void StartElement(string namespaceURI,string sName,
                 string qName,Hashtable attrs) {
      string eName = sName; // element name
      if ("".Equals(eName)) eName = qName; // namespaceAware = false
      stack.Push(eName);
      value = new StringBuilder();
    }

    public void EndElement(string namespaceURI,string sName,
                 string qName) {
      string eName = (string)stack.Pop();
      // the enclosing element's name, or null once the root is popped
      string parent = stack.Count > 0 ? (string)stack.Peek() : null;
      if (eName.Equals("title") && "channel".Equals(parent)) {
        Emit("  <head>" + LineEnd);
        Emit("  <title>" + value + "</title>" + LineEnd);
        Emit("  </head>" + LineEnd);
        Emit("  <body>" + LineEnd);
      } else if (eName.Equals("title") && "item".Equals(parent)) {
        title = null == value ? "" : value.ToString();
      } else if (eName.Equals("link") && "item".Equals(parent)) {
        link = null == value ? "" : value.ToString();
      } else if (eName.Equals("description") && "item".Equals(parent)) {
        desc = null == value ? "" : value.ToString();
      } else if (eName.Equals("item")) {
        Emit("  <p><a href=\"" + link + "\">" +
              title + "</a><br>" + LineEnd);
        Emit("   " + desc + "</p>" + LineEnd);
        // forget this item's fields so they can't leak into the next one
        title = link = desc = null;
      } else if (eName.Equals("channel")) {
        Emit("  </body>" + LineEnd);
      }
      value = null;
    }

    public void Characters(string buf, int offset, int len) {
      // text that follows a child's end tag arrives with no open buffer
      if (value == null) value = new StringBuilder();
      value.Append(buf, offset, len);
    }

    private static void Emit(string s) {
      Console.Write(s);
    }
  }
}

Try it, and you'll see that it compiles without a problem. Run it like so (output wrapped for legibility):

C:\> RSSReader http://xmlhack.com/rss.php
  <head>
  <title>xmlhack</title>
  </head>
  <body>
  <p><a href="http://www.xmlhack.com/read.php?item=1511">Activity
around the Dublin Core</a><br>
   The Dublin Core Metadata Initiative (DCMI) has seen a recent
spate of activity,
	Recent publications include The Namespace Policy for the
Dublin Core Metadata
	Initiative, Expressing Simple Dublin Core in RDF/XML, and
Expressing Qualified
	Dublin Core in RDF/XML.</p>
...

This should look very familiar, as it's exactly the same output that our Java program produced. You might have seen a completely different result, however, like this:

C:\> RSSReader http://xmlhack.com/rss.php

Unhandled Exception: System.Security.SecurityException:
    Request for the permission of type
	System.Net.WebPermission, System, Version=1.0.3300.0,
    Culture=neutral, PublicKeyToken=b77a5c561934e089 failed.
   at System.Security.CodeAccessSecurityEngine.CheckHelper(
    PermissionSet grantedSet,
    PermissionSet deniedSet, CodeAccess
    Permission demand, PermissionToken permToken)
   at System.Security.CodeAccessSecurityEngine.Check(
    PermissionToken permToken,
    CodeAccessPermission demand, StackCrawlMark& stackMark,
    Int32 checkFrames, Int32 unrestrictedOverride)
   at System.Security.CodeAccessSecurityEngine.Check(
    CodeAccessPermission cap, StackCrawlMark& stackMark)
   at System.Security.CodeAccessPermission.Demand()
   at System.Net.HttpRequestCreator.Create(Uri Uri)
   at System.Net.WebRequest.Create(Uri requestUri, 
    Boolean useUriBase)
   at System.Net.WebRequest.Create(Uri requestUri)
   at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri,
    ICredentials credentials)
   at System.Xml.XmlDownloadManager.GetStream(Uri uri,
    ICredentials credentials)
   at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri,
    String role, Type ofObjectToReturn)
   at System.Xml.XmlTextReader.CreateScanner()
   at System.Xml.XmlTextReader.Init()
   at System.Xml.XmlTextReader.Read()
   at RSSReader.Parse(String url) in U:\thing\RSSReader.cs:line 31
   at RSSReader.Main(String[] args) in U:\thing\RSSReader.cs:line 23

The state of the failed permission was:
<IPermission class="System.Net.WebPermission, System,
    Version=1.0.3300.0, Culture=neutral, 
    PublicKeyToken=b77a5c561934e089" version="1">
   <ConnectAccess>
    <URI uri="http://xmlhack\.com/rss\.php"/>
   </ConnectAccess>
</IPermission>

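This happens because, by default, the .NET Framework's code access security places code run from a network drive in the Local Intranet zone, whose permission set does not include the WebPermission needed to connect to an arbitrary web site. The assembly loads and runs; it is the network request that is refused. Just move your RSSReader.exe executable onto a local drive, where it runs with full trust, and try again.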

Conclusions

What can we learn from XmlReader? First of all, unlike Java's XML libraries, all of System.Xml is provided by Microsoft. This means that, among other things, there is a consistent interface and a consistent set of tools for all your XML needs. No need to shop around for parsers and SAX implementations.

That can also be considered a drawback. Since Java has multiple implementations, you're free to use the one that fits best. And with the advent of JAXP, you can drop in the different implementations without changing your code at all. C# has no comparable pluggability layer; although XmlReader is an abstract class that a third party could subclass, in practice you're stuck with the one, true Microsoft way.

As far as the XmlReader event model goes, it doesn't seem that SAX has much to learn from it at all. You'll remember that we actually had to write some additional code when we ported the program to XmlReader, because SAX provides the event loop for us; with SAX, all we have to do is write the callbacks.
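If you do take the XmlReader route, that loop only has to be written once. The sketch below factors it into a reusable adapter that pulls nodes from any XmlReader and pushes them to SAX-style callbacks. ISaxLikeHandler, SaxLikeAdapter, and RecordingHandler are hypothetical names, not System.Xml types; the adapter also reports CDATA as character data and synthesizes an end event for empty elements such as <b/>.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Xml;

// Hypothetical SAX-style callback interface; not part of System.Xml.
public interface ISaxLikeHandler
{
    void StartElement(string namespaceURI, string localName, string qName, Hashtable attrs);
    void EndElement(string namespaceURI, string localName, string qName);
    void Characters(string text);
}

public static class SaxLikeAdapter
{
    // Pulls nodes from any XmlReader and pushes them to the handler's callbacks.
    public static void Drive(XmlReader reader, ISaxLikeHandler handler)
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    string ns = reader.NamespaceURI;
                    string localName = reader.LocalName;
                    string qName = reader.Name;
                    bool isEmpty = reader.IsEmptyElement; // capture before visiting attributes
                    Hashtable attrs = new Hashtable();
                    if (reader.MoveToFirstAttribute())
                    {
                        do { attrs[reader.Name] = reader.Value; }
                        while (reader.MoveToNextAttribute());
                        reader.MoveToElement();
                    }
                    handler.StartElement(ns, localName, qName, attrs);
                    if (isEmpty)
                    {
                        // <b/> produces no EndElement node, so synthesize the end event
                        handler.EndElement(ns, localName, qName);
                    }
                    break;
                case XmlNodeType.EndElement:
                    handler.EndElement(reader.NamespaceURI, reader.LocalName, reader.Name);
                    break;
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    handler.Characters(reader.Value);
                    break;
            }
        }
    }
}

// Example handler that just records the events it receives.
public class RecordingHandler : ISaxLikeHandler
{
    public List<string> Events = new List<string>();

    public void StartElement(string namespaceURI, string localName, string qName, Hashtable attrs)
    {
        Events.Add("start:" + qName + "(" + attrs.Count + ")");
    }

    public void EndElement(string namespaceURI, string localName, string qName)
    {
        Events.Add("end:" + qName);
    }

    public void Characters(string text)
    {
        Events.Add("chars:" + text);
    }
}
```

Driving a RecordingHandler over `<a x="1"><b/>hi<![CDATA[!]]></a>` with an XmlTextReader records start:a(1), start:b(0), end:b, chars:hi, chars:!, end:a. Because Drive accepts the abstract XmlReader type, the same handler works with any concrete reader.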

On the other hand, System.Xml does provide some nifty types like the XmlAttribute class (which we didn't really discuss here) and the XmlNodeType enumeration, which give you convenient, standard ways to look at attributes and nodes instead of having to deal with strings and Hashtables and such. And these types are used throughout System.Xml.

If you don't want to write either an event loop or callbacks, the read-only, forward-only, stream-based model might not be for you; you might prefer a whole-document model (like, say, DOM). In that case, XmlReader will not appeal to you any more than SAX does. C# offers another set of tools for that, starting with XmlDocument, which we'll discuss in the next article; they give you all the power of a document stored in memory, plus the added convenience of building on what you've already learned.


