XML.com: XML From the Inside Out

REXML: Processing XML in Ruby
by Koen Vervloesem

Accessing Elements and Attributes

From now on, we will use irb, Ruby's interactive shell, to show examples of REXML in use. At the irb prompt, we load the file bibliography.xml into a document. After that, we can run commands interactively to access the document's elements and attributes.

koan$ irb
irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> include REXML
=> Object
irb(main):003:0> doc = Document.new(File.new("bibliography.xml"))
=> <UNDEFINED> ... </>
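
The same loading step can also be written as a standalone script. The following is only a minimal sketch: the XML string is a made-up, trimmed stand-in for bibliography.xml, and the temporary file exists only so that the example runs without the real file. It shows that Document.new accepts either a String or an IO object.

```ruby
require 'rexml/document'
require 'tempfile'

# Made-up, trimmed stand-in for bibliography.xml (an assumption for this sketch).
xml = <<~XML
  <bibliography id='personal_identity'>
    <biblioentry id='FHIW13C-1260'>
      <title>Personal Identity</title>
    </biblioentry>
  </bibliography>
XML

# Document.new accepts a String containing the XML ...
from_string = REXML::Document.new(xml)

# ... or an IO object. The block form of File.open closes the file once
# the document has been parsed.
from_file = Tempfile.create(['bibliography', '.xml']) do |tmp|
  tmp.write(xml)
  tmp.flush
  File.open(tmp.path) { |io| REXML::Document.new(io) }
end

puts from_string.root.name            # name of the root element
puts from_file.root.attributes['id']  # value of its id attribute
```

Outside irb, spelling out REXML::Document instead of calling include REXML keeps the library's names out of the top-level namespace; either style works.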

Now you can explore the document very easily. Let's look at a typical irb session with our XML file:

irb(main):004:0> root = doc.root
=> <bibliography id='personal_identity'> ... </>
irb(main):005:0> root.attributes['id']
=> "personal_identity"
irb(main):006:0> puts root.elements[1].elements["author"]
<author>
  <firstname>Godfrey</firstname>
  <surname>Vesey</surname>
</author>
irb(main):007:0> puts root.elements["biblioentry[1]/author"]
<author>
  <firstname>Godfrey</firstname>
  <surname>Vesey</surname>
</author>
irb(main):008:0> puts root.elements["biblioentry[@id='FHIW13C-1260']"]
<biblioentry id='FHIW13C-1260'>
      <author>
        <firstname>Sydney</firstname>
        <surname>Shoemaker</surname>
      </author>
      <author>
        <firstname>Richard</firstname>
        <surname>Swinburne</surname>
      </author>
      <title>Personal Identity</title>
      <publisher>
        <publishername>Basil Blackwell</publishername>
      </publisher>
      <pubdate>1984</pubdate>
    </biblioentry>
=> nil
irb(main):009:0> root.each_element('//author') {|author| puts author}
<author>
  <firstname>Godfrey</firstname>
  <surname>Vesey</surname>
</author>
<author>
  <firstname>René</firstname>
  <surname>Marres</surname>
</author>
<author>
  <firstname>James</firstname>
  <surname>Baillie</surname>
</author>
<author>
  <firstname>Brian</firstname>
  <surname>Garrett</surname>
</author>
<author>
  <firstname>John</firstname>
  <surname>Perry</surname>
</author>
<author>
  <firstname>Geoffrey</firstname>
  <surname>Madell</surname>
</author>
<author>
  <firstname>Sydney</firstname>
  <surname>Shoemaker</surname>
</author>
<author>
  <firstname>Richard</firstname>
  <surname>Swinburne</surname>
</author>
<author>
  <firstname>Jonathan</firstname>
  <surname>Glover</surname>
</author>
<author>
  <firstname>Harold</firstname>
  <othername>W.</othername>
  <surname>Noonan</surname>
</author>
=> [<author> ... </>, <author> ... 
  </>, <author> ... </>, <author> ... 
  </>, <author> ... </>, <author> ... 
  </>, <author> ... </>, <author> ... 
  </>, <author> ... </>, <author> ... </>]

First we assign the document root to the name root. Here the document root is the bibliography element. Each Element object has an Attributes object named attributes, which acts as a hash map with the attribute names as keys and the attribute values as values. So root.attributes['id'] gives us the value of the id attribute of the root element.

In the same manner, each Element object has an Elements object named elements, whose each and [] methods give access to the subelements. The [] method takes an index or an XPath expression as its argument and returns the first element that matches it. The XPath expression acts as a filter that decides which element is returned. Note that root.elements[1] is the first child element, because XPath indexes start at 1, not 0. In fact, root.elements[1] is equivalent to root.elements["*[1]"], where *[1] is the XPath expression for the first child element.

The each method of the Elements class iterates over the child elements, optionally filtering them with a given XPath expression, and executes the code block once for each match. The expression is not limited to direct children: '//author' in the session above matches every author element in the document. Finally, calling each_element on an element is shorthand for calling elements.each on it.
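
The access patterns described above can be collected into one short script. This is a sketch under stated assumptions: the XML is a reduced, made-up sample that reuses a few names from the session (the second biblioentry and its lack of an id are invented for illustration), not the real bibliography.xml.

```ruby
require 'rexml/document'

# Reduced, made-up sample; the second entry is invented for illustration.
xml = <<~XML
  <bibliography id='personal_identity'>
    <biblioentry id='FHIW13C-1260'>
      <author><firstname>Sydney</firstname><surname>Shoemaker</surname></author>
      <author><firstname>Richard</firstname><surname>Swinburne</surname></author>
      <title>Personal Identity</title>
    </biblioentry>
    <biblioentry>
      <author><firstname>Jonathan</firstname><surname>Glover</surname></author>
    </biblioentry>
  </bibliography>
XML

root = REXML::Document.new(xml).root

# attributes behaves like a hash keyed by attribute name.
puts root.attributes['id']

# An integer index is 1-based and counts child elements only.
first_entry = root.elements[1]
same_entry  = root.elements['*[1]']   # the equivalent XPath form
puts first_entry.attributes['id']
puts same_entry.attributes['id']

# With an XPath string, [] returns only the first match.
first_author = root.elements['biblioentry/author']
puts first_author.elements['surname'].text

# each_element visits every match and runs the block once per element.
surnames = []
root.each_element('//author') do |author|
  surnames << author.elements['surname'].text
end
puts surnames.join(', ')
```

Because [] returns a single element, each (or each_element) is the method to reach for when every match is needed.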
