From JDOM to XmlDocument

April 3, 2002

Niel Bornstein

The Microsoft .NET framework is becoming well known for its integration of XML into nearly all data-manipulation tasks. In the first article in this series, I walked through the process of porting a simple Java application using SAX to one using .NET's XmlReader. I concluded that there are advantages and disadvantages to each language's way of doing things, but pointed out that if you are not a fan of forward-only, event-based XML parsing, neither one will really fit your fancy. This article focuses on porting whole-document XML programs from Java to C#.

The Exercise

I begin with one of those standard small programs that everyone has written at some point to learn about XML. I've written a Java program that uses JDOM to read and write an XML file representing a catalog of compact discs. Of course, this program is of little practical use, given how easy it is to find similar free and open source applications; however, it represents a fairly simple problem domain, and it also allows me to show off the diversity of my CD collection.

So, to begin, here is the source listing for CDCatalog.java. I have not bothered to create any sort of DTD or schema at this time because it's a very simple document, and validation is not necessary.


package com.xml;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

public class CDCatalog {

    File file = null;
    Document document = null;

    public static void main(String[] args) {
        // the file name and an action are both required
        if (args.length > 1) {
            String xmlFile = args[0];
            CDCatalog catalog = new CDCatalog(xmlFile);

            String action = args[1];

            try {
                if (args.length == 5) {
                    String title = args[2];
                    String artist = args[3];
                    String label = args[4];
                    if (action.equals("add")) {
                        catalog.addCD(title, artist, 
                            label);
                    } else if (action.equals("delete")) {
                        catalog.deleteCD(title, 
                            artist, label);
                    }
                }

                // save the changed catalog
                catalog.save();
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }

    public CDCatalog(String fileName) {
        try {
            file = new File(fileName);
            if (file.exists()) {
                loadDocument(file);
            } else {
                createDocument();
            }
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }

    // later JDOM releases also declare IOException on build(File)
    private void loadDocument(File file) 
    throws JDOMException, IOException {
        SAXBuilder builder = new SAXBuilder();
        document = builder.build(file);
    }

    private void createDocument() {
        Element root = new Element("CDCatalog");
        document = new Document(root);
    }

    public void addCD(String title, String artist, 
        String label) {
        Element cd = new Element("CD");
        cd.setAttribute("title",title);
        cd.setAttribute("artist",artist);
        cd.setAttribute("label",label);

        document.getRootElement().getChildren().add(cd);
    }

    public void deleteCD(String title, String artist,  
        String label) {
        List cds = document.getRootElement().getChildren();
        // getChildren() returns a live list, so walk it backward;
        // detaching an element then cannot make us skip its neighbor
        for (int i = cds.size() - 1; i >= 0; i--) {
            Element next = (Element)cds.get(i);
            // getAttributeValue() returns null for a missing attribute
            if (title.equals(next.getAttributeValue("title")) &&
                artist.equals(next.getAttributeValue("artist")) &&
                label.equals(next.getAttributeValue("label"))) {
                next.detach();
            }
        }
    }

    public void save() throws IOException {
        XMLOutputter outputter = new XMLOutputter();
        FileWriter writer = new FileWriter(file);
        outputter.output(document,writer);
        writer.close();
    }
}

I very intentionally used JDOM for this program rather than DOM; since the target of our port is a real DOM, I wanted to add the twist of starting off with an API that, while DOM-like, is really not DOM.

Compiling and running it a few times, we get this XML file (reformatted to be more human-readable):


C:\>java com.xml.CDCatalog CDCatalog.xml add "Dummy" 
  "Portishead" "Go!"

<?xml version="1.0" encoding="UTF-8"?>
<CDCatalog>
  <CD title="Dummy" artist="Portishead" label="Go!" />
  <CD title="Caribe Atomico" artist="Aterciopelados" 
    label="BMG" />
  <CD title="New Favorite" artist="Alison Kraus + 
    Union Station" label="Rounder" />
  <CD title="Soon As I'm On Top Of Things" artist=
    "Zoe Mulford" label="MP3.com" />
  <CD title="Japanese Melodies" artist="Yo-Yo Ma" 
    label="CBS" />
  <CD title="In This House, On This Morning" artist=
    "Wynton Marsalis Septet" label="Columbia" />
</CDCatalog>

Everything's a Node

XmlDocument is the .NET XML document tree view object. Much like JDOM's Document class, XmlDocument gives you random access to any node in the XML tree. Unlike Document, however, XmlDocument is itself a subclass of XmlNode. I didn't talk about XmlNode in the previous article, so let's take a look at it and some of its members now. If you are already familiar with DOM, this will explain how .NET implements it; if not, this should serve as a basic introduction, with the caveat that we're talking specifically about the .NET implementation.

The System.Xml assembly is somewhat monolithic; everything you might find in an XML document is a subclass of XmlNode. Besides XmlDocument, this includes the document type (XmlDocumentType), elements (XmlElement), attributes (XmlAttribute), CDATA (XmlCDataSection), even the humble entity reference (XmlEntityReference). And, as you can tell, the object names are very descriptive.

As an experienced object-oriented developer, you know that this makes for some very nicely polymorphic code. All of XmlNode's subclasses inherit, override, or overload several important members. Among the properties that you can access are the following.

  • Name - the name of the node
  • NamespaceURI - the namespace of the node
  • NodeType - the type of the node
  • OwnerDocument - the document in which the node appears
  • Prefix - the namespace prefix of the node
  • Value - the value of the node

I said all of these members are properties. It may at first seem like a violation of object-oriented encapsulation, but you may access any of these properties directly if you want to change, for example, the value of a node. Once you realize that getting or setting a C# property actually involves calling an implicit accessor method, however, any objections to this technique should disappear. These accessor methods are generated automatically through the use of a special syntax, which you'll see if you look up XmlNode.Value, for example, in the .NET framework SDK reference:

public virtual string Value {get; set;}

An explanation of the mechanism at work here is beyond the scope of this article; suffice to say, it works.
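For readers coming from the Java side, it may help to see the same polymorphism in the JDK's bundled W3C DOM (org.w3c.dom), where the counterparts of these properties are ordinary getter methods on the Node interface. The following is only a sketch against Java's DOM, not System.Xml; NodeWalker and its methods are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the article's listings.
class NodeWalker {

    // Describe any node using only members of the shared Node interface.
    static String describe(Node node) {
        return node.getNodeType() + ":" + node.getNodeName()
            + "=" + node.getNodeValue();
    }

    // Visit a node, then its attributes (if any), then its children.
    static void walk(Node node, StringBuilder out) {
        out.append(describe(node)).append('\n');
        NamedNodeMap attrs = node.getAttributes(); // null for non-elements
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append(describe(attrs.item(i))).append('\n');
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, out);
        }
    }

    // Parse a one-CD catalog and return one description line per node.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                    "<CDCatalog><CD title=\"Dummy\"/></CDCatalog>")));
            StringBuilder out = new StringBuilder();
            walk(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The document, both elements, and the attribute all pass through the same describe() method. Only the attribute reports a non-null value, which is the same distinction the Value property draws in .NET.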

There are exceptions to this willy-nilly access to C# properties, of course; you cannot set the NodeType of an XmlNode, because it already is what it is. Also, you can set some properties of some node types and not others; for example, while you can set the Value of an XmlAttribute, attempting to set the Value of an XmlElement will cause an InvalidOperationException to be thrown, because an element cannot have a value (though it can certainly have child elements, attributes, and other types of nodes).
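Java's DOM binding draws the same line between attributes and elements, but reports it differently. As far as I know, the DOM specification says that setting the value of a node whose value is defined to be null simply has no effect, so nothing is thrown. Here is a small sketch using the JDK's org.w3c.dom; the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the article's listings.
class NodeValueDemo {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Setting the value of an attribute node changes the attribute.
    static String attributeAfterSet() {
        Element cd = newDocument().createElement("CD");
        cd.setAttribute("label", "Go!");
        Attr label = cd.getAttributeNode("label");
        label.setNodeValue("Go! Discs");
        return cd.getAttribute("label");
    }

    // An element has no value of its own; the set is silently ignored.
    static String elementAfterSet() {
        Element cd = newDocument().createElement("CD");
        cd.setNodeValue("ignored");
        return cd.getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(attributeAfterSet());
        System.out.println(elementAfterSet());
    }
}
```

Whether a silent no-op or an exception is the friendlier behavior is a matter of taste; the .NET approach at least tells you at once that you have made a mistake.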

Additionally, some nodes may be marked as read-only. You can check the IsReadOnly property of any XmlNode to verify whether it can be changed. Setting some properties of a read-only node will cause an ArgumentException to be thrown. In those cases where the node is read-only, all you can really do is remove it from one location in the tree and insert it elsewhere.

XmlNodes have methods as well as properties. Some of the ones you'll use a lot are AppendChild() and InsertAfter(), both of whose names are fairly descriptive.
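Java's DOM offers appendChild() as well, but it has no direct counterpart to InsertAfter(); the usual idiom is insertBefore() with the reference node's next sibling. The sketch below uses the JDK's org.w3c.dom, and insertAfter() here is a hypothetical helper, not a DOM method.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class, not part of the article's listings.
class InsertDemo {

    // Emulate InsertAfter(): place newChild right after refChild.
    // If refChild is the last child, getNextSibling() is null and
    // insertBefore() then appends newChild at the end.
    static Node insertAfter(Node parent, Node newChild, Node refChild) {
        return parent.insertBefore(newChild, refChild.getNextSibling());
    }

    static Element cd(Document doc, String title) {
        Element cd = doc.createElement("CD");
        cd.setAttribute("title", title);
        return cd;
    }

    // Build a three-CD catalog and return the titles in document order.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("CDCatalog");
            doc.appendChild(root);

            Element first = cd(doc, "Dummy");
            root.appendChild(first);
            root.appendChild(cd(doc, "Japanese Melodies"));
            insertAfter(root, cd(doc, "New Favorite"), first);

            StringBuilder titles = new StringBuilder();
            for (Node n = root.getFirstChild(); n != null;
                    n = n.getNextSibling()) {
                if (titles.length() > 0) titles.append('|');
                titles.append(((Element) n).getAttribute("title"));
            }
            return titles.toString();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```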

That's enough description for now; let's dive into the code.

Ready, Set, Port

Remembering our basic C# language lessons from part one, let's skip right on ahead to the API-specific code.

There are a few changes we have to worry about that were not relevant in our first exercise. For example, while .NET does have a File class, it's not directly comparable to Java's File: its methods are all static, so you never instantiate it to represent a particular file.

As I mentioned earlier, we're converting our code not just from Java to C#, but from JDOM to DOM. While this is not necessarily a formidable task, it does complicate things a bit. Perhaps the easiest part of this port will be changing instances of JDOM's Document and Element to C#'s XmlDocument and XmlElement, respectively.

JDOM's way of doing things is often the reverse of DOM's way. For example, in our Java createDocument() method, we instantiate a root Element and then instantiate the Document, passing in the Element. In C#, we instantiate the XmlDocument, call its CreateElement() method to create an XmlElement, and then insert the resultant XmlElement as the root element of the tree with AppendChild().

A similar pattern is used in the AddCD() method; the XmlDocument's CreateElement() method is called, and the XmlElement is then inserted as a child of the root element with AppendChild(). In short, the XmlDocument serves a dual role as a representation of the document itself and as a factory to create new elements.
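The factory pattern is not peculiar to .NET; it comes from DOM itself. As a point of comparison, here is roughly how the same two steps look in the JDK's org.w3c.dom. This is a sketch only: DomFactoryDemo and its methods are hypothetical names that mirror CreateDocument() and AddCD() for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the article's listings.
class DomFactoryDemo {

    // Create the document first, then ask it for its own root element.
    static Document createCatalog() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("CDCatalog"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The document is the factory; the root element is the parent.
    static Element addCD(Document doc, String title, String artist,
            String label) {
        Element cd = doc.createElement("CD");
        cd.setAttribute("title", title);
        cd.setAttribute("artist", artist);
        cd.setAttribute("label", label);
        doc.getDocumentElement().appendChild(cd);
        return cd;
    }

    // Returns root name, number of children, and an ownership check.
    static String demo() {
        Document doc = createCatalog();
        addCD(doc, "Dummy", "Portishead", "Go!");
        Element cd = addCD(doc, "Japanese Melodies", "Yo-Yo Ma", "CBS");
        Element root = doc.getDocumentElement();
        return root.getTagName() + ":"
            + root.getChildNodes().getLength() + ":"
            + (cd.getOwnerDocument() == doc);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because every element is created by a document, each node knows its owner from birth. That is the practical difference from JDOM, where an Element can exist with no Document at all.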

Another difference between JDOM and System.Xml, and indeed between JDOM and DOM itself, is that while JDOM deals exclusively with standard Java collections (such as List, in deleteCD()), DOM defines its own collections of nodes. In our C# DeleteCD() method, we're dealing with an XmlNodeList. In practical terms, an XmlNodeList is not dealt with any differently than a Java list.
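One caveat applies to DOM node lists and to JDOM's child list alike: they are live views of the tree, so removing a node while walking forward by index shifts the remaining items and can skip the node that follows. A minimal sketch of the safer backward walk, written against the JDK's org.w3c.dom NodeList (the class and method names are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the article's listings.
class LiveListDemo {

    // Remove every CD by the given artist; returns how many were removed.
    static int deleteByArtist(Document doc, String artist) {
        Element root = doc.getDocumentElement();
        NodeList cds = root.getChildNodes(); // live view of the children
        int removed = 0;
        // Walk backward so a removal never shifts an unvisited item.
        for (int i = cds.getLength() - 1; i >= 0; i--) {
            Node node = cds.item(i);
            // Skip text, comments, and other non-element children.
            if (node instanceof Element
                    && artist.equals(((Element) node).getAttribute("artist"))) {
                root.removeChild(node);
                removed++;
            }
        }
        return removed;
    }

    // Two adjacent matches followed by one CD that must survive.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                    "<CDCatalog>"
                    + "<CD title=\"Japanese Melodies\" artist=\"Yo-Yo Ma\"/>"
                    + "<CD title=\"Soul of the Tango\" artist=\"Yo-Yo Ma\"/>"
                    + "<CD title=\"Dummy\" artist=\"Portishead\"/>"
                    + "</CDCatalog>")));
            int removed = deleteByArtist(doc, "Yo-Yo Ma");
            int left = doc.getDocumentElement().getChildNodes().getLength();
            return removed + ":" + left;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The two adjacent matches are the interesting case: a forward loop that increments its index after each removal would step over the second one.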

Down Another Path

Upon further thought, it seems like our original Java program is missing something; there's no way to search for entries. This sounds like a job for XPath. XPathNavigator implements XPath in .NET, and you create an XPathNavigator by calling XmlNode.CreateNavigator(). The following XPath will find the Zoe Mulford CDs in my collection:


//CD[@artist='Zoe Mulford']

In C# code, we'll select the nodes that match this path and then print them.


XPathNavigator nav = document.CreateNavigator();
XPathNodeIterator iterator = 
    nav.Select("//CD[@artist=normalize-space('" + 
    artist + "')]");
Console.WriteLine("CDs by {0}", artist);
while (iterator.MoveNext()){
    XPathNavigator nav2 = iterator.Current.Clone();
    // look the title up by name rather than by attribute position
    Console.WriteLine(" \"{0}\"",
        ((IHasXmlNode)nav2).GetNode().Attributes["title"].Value);
}

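You'll notice that I added a call to normalize-space() in the XPath. If you're not familiar with it, normalize-space() strips leading and trailing whitespace from a string and collapses any run of whitespace into a single space character. As written, it is applied to the search string, so an artist name typed with stray spaces still matches. While it's not strictly necessary, I thought it might be useful in this case because text entered manually by a person might not be normalized. To tolerate stray whitespace in the catalog itself as well, you could wrap the attribute too, as in normalize-space(@artist).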
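Java programmers can try the same query with the JDK's javax.xml.xpath package. The sketch below differs from the C# code in two deliberate ways, both of which are my assumptions rather than anything in the listing: it normalizes both sides of the comparison, and it binds the artist name as an XPath variable instead of concatenating it into the expression, so a name containing an apostrophe cannot break the query. ArtistSearch and titlesBy() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the article's listings.
class ArtistSearch {

    // Return the titles of all CDs by the artist, joined with '|'.
    static String titlesBy(String catalogXml, String artist) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(catalogXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Supply $artist at evaluation time; no string pasting.
            xpath.setXPathVariableResolver(
                name -> "artist".equals(name.getLocalPart()) ? artist : null);

            NodeList hits = (NodeList) xpath.evaluate(
                "//CD[normalize-space(@artist) = normalize-space($artist)]",
                doc, XPathConstants.NODESET);

            StringBuilder titles = new StringBuilder();
            for (int i = 0; i < hits.getLength(); i++) {
                if (i > 0) titles.append('|');
                titles.append(((Element) hits.item(i)).getAttribute("title"));
            }
            return titles.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<CDCatalog>"
            + "<CD title=\"New Favorite\" "
            + "artist=\"Alison Kraus +   Union Station\"/>"
            + "<CD title=\"Dummy\" artist=\"Portishead\"/>"
            + "</CDCatalog>";
        System.out.println(titlesBy(xml, " Alison Kraus + Union Station "));
    }
}
```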

By using XPath, we can navigate directly to any CDs we want to find. This may or may not be any more efficient than searching through an entire CD catalog by hand; but it is easier to use in a consistent manner and has the added benefit that any performance improvements in the .NET runtime will automatically be reflected in your application.

So, here's our final code listing.


using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public class CDCatalog {

    FileStream file = null;
    XmlDocument document = null;

    public static void Main(string [] args) {
        // the file name and an action are both required
        if (args.Length > 1) {
            string xmlFile = args[0];
            CDCatalog catalog = new CDCatalog(xmlFile);

            string action = args[1];

            if (args.Length == 5) {
                string title = args[2];
                string artist = args[3];
                string label = args[4];
                if (action == "add") {
                    catalog.AddCD(title, artist, 
                        label);
                } else if (action == "delete") {
                    catalog.DeleteCD(title, 
                        artist, label);
                }
            } else if (args.Length == 3) {
                string artist = args[2];
                if (action == "find") {
                    catalog.SearchForArtist(artist);
                }
            }

            // save the changed catalog
            catalog.Save();
        }
    }

    public CDCatalog(string fileName) {
        if (File.Exists(fileName)) {
            LoadDocument(fileName);
        } else {
            CreateDocument(fileName);
        }
    }

    private void LoadDocument(string fileName) {
        file = File.Open(fileName,FileMode.Open);
        document = new XmlDocument();
        document.Load(file);
    }

    private void CreateDocument(string fileName) {
        file = File.Create(fileName);
        document = new XmlDocument();
        XmlElement root = document.CreateElement("CDCatalog");
        document.AppendChild(root);
    }

    public void AddCD(string title, string artist, 
        string label) {
        XmlElement cd = document.CreateElement("CD");
        cd.SetAttribute("title",title);
        cd.SetAttribute("artist",artist);
        cd.SetAttribute("label",label);

        document.DocumentElement.AppendChild(cd);
    }

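    public void DeleteCD(string title, string artist,  
        string label) {
        XmlNodeList cds = document.DocumentElement.ChildNodes;
        // ChildNodes is a live list, so walk it backward; removing
        // a node then cannot make us skip the one that follows it
        for (int i = cds.Count - 1; i >= 0; i--) {
            // skip comments and other non-element children
            XmlElement next = cds[i] as XmlElement;
            if (next != null &&
                next.GetAttribute("title") == title &&
                next.GetAttribute("artist") == artist &&
                next.GetAttribute("label") == label) {
                document.DocumentElement.RemoveChild(next);
            }
        }
    }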

    public void SearchForArtist(string artist) {
        XPathNavigator nav = document.CreateNavigator();
        XPathExpression expr = nav.Compile(
            "//CD[@artist=normalize-space('" + artist + "')]");
        XPathNodeIterator iterator = nav.Select(expr);
        Console.WriteLine("CDs by {0}:", artist);
        while (iterator.MoveNext()){
            XPathNavigator nav2 = iterator.Current.Clone();
            // look the title up by name rather than by position
            Console.WriteLine(" \"{0}\"",
                ((IHasXmlNode)nav2).GetNode().Attributes["title"].Value);
        }
    }

    public void Save() {
        file.Position = 0;
        XmlTextWriter writer = new XmlTextWriter(
            new StreamWriter(file));
        document.WriteTo(writer);
        // flush buffered output so file.Position marks the true end
        writer.Flush();
        file.SetLength(file.Position);
        writer.Close();
    }
}

And now we'll compile and run a couple of tests:

C:\>csc /debug /r:System.Xml.dll /t:exe CDCatalog.cs
Microsoft (R) Visual C# .NET Compiler version 7.00.9466
for Microsoft (R) .NET Framework version 1.0.3705
Copyright (C) Microsoft Corporation 2001. All rights reserved.


C:\>CDCatalog CDCatalog.xml add "High Strung Tall Tales" 
    "Adrian Legg" "Relativity"

C:\>CDcatalog CDCatalog.xml find "Alison Kraus +  Union Station"
CDs by Alison Kraus +  Union Station:
 "New Favorite"

Conclusions

I've shown you how to port your SAX Java code to XmlReader and your JDOM Java code to XmlDocument, with a small helping of XPath. These are the basic technologies that most developers are familiar with, and you should now be ready to apply them in your C# programming.

But the original task I set out to accomplish was to see what could be learned from Microsoft's XML APIs. In my first article, I concluded that Microsoft's one-stop-shopping is both positive and negative, depending on your point of view. However, I'm beginning to see a greater benefit to this single source of objects; the XmlNodeType that you deal with in XmlReader is exactly the same type that you deal with in DOM. This could easily have the benefit of shortening your learning cycle, as well as making your code more reusable. The Java community could certainly stand to learn something here.

In the next installment of this series, I'll take another look at the venerable RSSReader, and make it a better C# program by using XmlReader the way it was meant to be used, as a pull-parser. And I'll compare that to some pull-parsers in the Java world.