XML.com: XML From the Inside Out

REXML: Processing XML in Ruby
by Koen Vervloesem

Deleting Elements and Attributes

The add_element and add_attribute methods have counterparts for deletion: delete_element and delete_attribute. Here is how it works for attributes:

irb(main):033:0> doc2.root.delete_attribute('id')
=> <bibliography> ... </>
irb(main):034:0> puts doc2
<bibliography>
  <biblioentry id='ISBN0-19-285423-2'>
    <author>
      <firstname>Bertrand</firstname>
      <surname>Russell</surname>
    </author>
    <title>The Problems of Philosophy</title>
    <publisher>
      <publishername>Oxford University Press</publishername>
    </publisher>
    <pubdate>1912</pubdate>
  </biblioentry>
</bibliography>
=> nil

The delete_attribute method is documented as returning the removed attribute, although in the session above irb displays the element that owned it.
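
Outside irb, the same operation might look like the sketch below. The sample XML and the helper name remove_root_id are made up for illustration; only delete_attribute and the attributes accessor come from REXML.

```ruby
require 'rexml/document'

# Hypothetical helper: parse the XML, drop the root element's id attribute,
# and return the resulting document.
def remove_root_id(xml)
  doc = REXML::Document.new(xml)
  doc.root.delete_attribute('id')
  doc
end

# Illustrative input, not the article's doc2.
doc = remove_root_id(
  "<bibliography id='main'><biblioentry id='ISBN0-19-285423-2'/></bibliography>"
)

p doc.root.attributes['id']              # the root no longer has an id
p doc.root.elements[1].attributes['id']  # the child's id is untouched
```

Only the element on which delete_attribute is called is affected; attributes with the same name on child elements stay in place.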

The delete_element method can take an Element object, a string or an index as its argument:

irb(main):034:0> doc2.delete_element("//publisher")
=> <publisher> ... </>
irb(main):035:0> puts doc2
<bibliography>
  <biblioentry id='ISBN0-19-285423-2'>
    <author>
      <firstname>Bertrand</firstname>
      <surname>Russell</surname>
    </author>
    <title>The Problems of Philosophy</title>
    <pubdate>1912</pubdate>
  </biblioentry>
</bibliography>
=> nil
irb(main):036:0> doc2.root.delete_element(1)
=> <biblioentry id='ISBN0-19-285423-2'> ... </>
irb(main):037:0> puts doc2
<bibliography/>
=> nil

The first delete_element invocation in our example uses an XPath expression to locate the element to delete. The second uses the index 1, which removes the first child element of the document root (indexes are 1-based, as in XPath). The delete_element method returns the removed element.
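
The three argument forms can be put side by side in a small script. This is a sketch under assumptions: the shortened bibliography and the helper name sample_bibliography are illustrative and stand in for doc2.

```ruby
require 'rexml/document'

# Hypothetical helper that builds a small document to experiment on.
def sample_bibliography
  REXML::Document.new(<<~XML)
    <bibliography>
      <biblioentry id='ISBN0-19-285423-2'>
        <title>The Problems of Philosophy</title>
        <publisher><publishername>Oxford University Press</publishername></publisher>
      </biblioentry>
    </bibliography>
  XML
end

doc = sample_bibliography

# 1. A string is treated as an XPath expression.
removed = doc.delete_element('//publisher')
puts removed.name

# 2. An Element object is removed directly from its parent.
entry = doc.root.elements['biblioentry']
entry.delete_element(entry.elements['title'])

# 3. An integer is a 1-based index among the child elements.
doc.root.delete_element(1)
puts doc.root.elements.size
```

The index counts child elements only, so the whitespace text nodes between the tags do not shift the numbering.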

Text Nodes and Entity Processing

We have already used text nodes in the previous examples. This section looks at a more advanced topic: how REXML handles entities. REXML is a non-validating parser, and is therefore not required to expand external entities. External entities are not replaced by their values, but internal entities are: when REXML parses an XML document, it processes the DTD and builds a table of the internal entities and their values. Whenever one of these entities occurs in the document, REXML replaces it with its value. An example:

irb(main):038:0> doc3 = Document.new('<!DOCTYPE testentity [
irb(main):039:1' <!ENTITY entity "test">]>
irb(main):040:1' <testentity>&entity; the entity</testentity>')
=> <UNDEFINED> ... </>
irb(main):041:0> puts doc3
<!DOCTYPE testentity [
<!ENTITY entity "test">]>
<testentity>&entity; the entity</testentity>
=> nil
irb(main):042:0> doc3.root.text
=> "test the entity"

As you can see, the printed XML document still contains the entity reference. When you access the text, the entity &entity; is expanded to "test".
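
The difference between the serialized form and the expanded text can also be checked in a script. A minimal sketch, assuming the same DTD as above; the helper name parse_entity_sample is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: parse a document that declares one internal entity.
def parse_entity_sample
  REXML::Document.new(<<~XML)
    <!DOCTYPE testentity [
    <!ENTITY entity "test">]>
    <testentity>&entity; the entity</testentity>
  XML
end

doc = parse_entity_sample

puts doc.root.to_s   # serialized element, entity reference kept as written
puts doc.root.text   # text value, entity replaced by its declared value
```

Here to_s shows what would be written back to a file, while text is the value an application would normally work with.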

However, REXML uses lazy evaluation of the entities. As a result, the following problem occurs:

irb(main):043:0> doc3.root.text = "test the &entity;"
=> "test the &entity;"
irb(main):044:0> puts doc3
<!DOCTYPE testentity [
<!ENTITY entity "test">
]>
<testentity>&entity; the &entity;</testentity>
=> nil
irb(main):045:0> doc3.root.text                      
=> "test the test"

As you can see, the text "test the &entity;" has been changed to "&entity; the &entity;": the literal word "test" has also been written out as an entity reference. If you later change the value of the entity, more of the document changes than you intended. If this is a problem for your application, you can set the :raw flag on any Text or Element node, or even on the Document node. Entities in that node are then not processed, so you have to deal with them yourself. An example:

irb(main):046:0> doc3 = Document.new('<!DOCTYPE testentity [
irb(main):047:1' <!ENTITY entity "test">]>
irb(main):048:1' <testentity>test the &entity;</testentity>', 
                 {:raw => :all})
=> <UNDEFINED> ... </>
irb(main):049:0> puts doc3
<!DOCTYPE testentity [
<!ENTITY entity "test">
]>
<testentity>test the &entity;</testentity>
=> nil
irb(main):050:0> doc3.root.text
=> "test the test"

The entities for &, <, >, ", and ' are processed automatically. Moreover, if you write one of these characters in a Text node or attribute, REXML converts it to its entity equivalent, e.g. &amp; for &.
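
A short sketch of this automatic escaping for text nodes follows. The element name note and the helper name note_with_text are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: wrap arbitrary text in a <note> element.
def note_with_text(text)
  el = REXML::Element.new('note')
  el.text = text
  el
end

note = note_with_text('Fish & chips < steak')

puts note.to_s   # serialized form, with & and < written as entities
puts note.text   # the original characters come back when reading the text
```

The escaping applies only to the serialized form, so the application code never has to write &amp; or &lt; itself.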
