XML.com: XML From the Inside Out

REXML: Processing XML in Ruby

November 09, 2005

REXML (Ruby Electric XML) is the XML processor of choice for Ruby programmers. It comes bundled with the standard Ruby distribution. It's fast, written in Ruby, and can be used in two ways: tree parsing and stream parsing. In this article, we show some basic constructs for processing XML with REXML. We also introduce the use of irb, Ruby's interactive shell, for exploring XML documents with the help of REXML.

We'll be using a DocBook bibliography file as our example XML document. You will learn how to parse the document with the tree parsing API, how to access elements and attributes, and how to create and insert elements. We'll also look into the peculiarities of text nodes and entity processing. Finally, we will show an example use of the stream parsing API. This is our DocBook file:

Listing 1: The bibliography.xml file

bibliography.xml

Beginning with Tree Parsing

We start with the tree parsing API, which is very DOM-like, but more intuitive. This is our first code example:

Listing 2: Showing an XML File (code1.rb)

require 'rexml/document'
include REXML
file = File.new("bibliography.xml")
doc = Document.new(file)
puts doc

The require statement loads the REXML library. Then we include the REXML namespace, so we don't have to use names like REXML::Document all the time. We open an existing file named bibliography.xml and parse the XML source code, with the result in a Document object. Finally we show the document on the screen. When you execute the command ruby code1.rb, the source code of our bibliography XML document is shown.

It's possible that you may get this error message:

code1.rb:1:in `require': No such file to load 
  -- rexml/document (LoadError)
        from code1.rb:1

In that case, your Ruby installation doesn't include REXML: some package managers, such as Debian's APT, install the libraries as separate packages. Install the rexml package and try again.
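If a script should fail with a friendlier message when the library is missing, the LoadError can be rescued. This is only a minimal sketch: the variable name rexml_available and the wording of the warning are our own assumptions, not part of REXML.

```ruby
# Try to load REXML and remember whether it worked.
# rexml_available is a hypothetical flag used only in this sketch.
begin
  require 'rexml/document'
  rexml_available = true
rescue LoadError => e
  rexml_available = false
  warn "REXML could not be loaded (#{e.message}); install the rexml package first."
end

puts "REXML available: #{rexml_available}"
```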

The Document.new method takes an IO, Document, or String object as its argument. The argument specifies the source from which we want to read an XML document. In our first example, we used an IO object, namely a File object; the File class inherits from the IO class. Another descendant of IO is the Socket class, which can be used with Document.new to read an XML document over a network connection.
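To see that the source type does not change the result, the same XML can be parsed once from an IO object and once from a String. The following is a sketch under a few assumptions: the one-entry XML snippet and the id value demo-1 are invented for illustration, and a temporary file stands in for bibliography.xml.

```ruby
require 'rexml/document'
require 'tempfile'

# Invented sample data; not the article's bibliography.xml.
xml = "<bibliography><biblioentry id='demo-1'/></bibliography>"

# Parse from an IO object: Tempfile.create yields a File, which is an IO.
from_io = Tempfile.create('bibliography') do |file|
  file.write(xml)
  file.rewind                 # go back to the start before reading
  REXML::Document.new(file)   # the document is parsed inside the block
end

# Parse the same content directly from a String.
from_string = REXML::Document.new(xml)

io_root = from_io.root.name
string_root = from_string.root.name
puts io_root
puts string_root
```

Both calls build the same kind of Document object, so the rest of a program does not need to know where the XML came from.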

If the Document constructor takes a Document as its argument, the context and Element attributes of that document are cloned to the new Document object. If the constructor takes a String argument, the string is expected to contain an XML document. An example:

Listing 3: Showing an XML "Here Document" (code2.rb)

require 'rexml/document'
include REXML
string = <<EOF
<?xml version="1.0" encoding="ISO-8859-15"?>
<!DOCTYPE bibliography PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<bibliography>
    <biblioentry id="FHIW13C-1234">
      <author>
        <firstname>Godfrey</firstname>
        <surname>Vesey</surname>
      </author>
      <title>Personal Identity: A Philosophical Analysis</title>
      <publisher>
        <publishername>Cornell University Press</publishername>
      </publisher>
      <pubdate>1977</pubdate>
   </biblioentry>
</bibliography>
EOF
doc = Document.new(string)
puts doc

We use a "here document" string: all characters between <<EOF and EOF, with newlines included, are part of the string.
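A quick way to check what a here document contains is to inspect the string before parsing it. This short sketch uses an invented three-line XML snippet rather than the full listing; the names line_count and root_name are ours.

```ruby
require 'rexml/document'

# Everything between <<EOF and the closing EOF, newlines included,
# becomes the content of the string.
string = <<EOF
<bibliography>
  <biblioentry id="demo-1"/>
</bibliography>
EOF

line_count = string.lines.size        # each line keeps its trailing newline
doc = REXML::Document.new(string)
root_name = doc.root.name

puts line_count
puts root_name
```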
