XML.com: XML From the Inside Out

XSH, An XML Editing Shell

July 10, 2002

Introduction

A few months ago we briefly examined some of the command line utilities available to users of Perl and XML. This month we will continue in that vein by looking at the 300-pound gorilla of Perl/XML command line tools, Petr Pajas' intriguing XML::XSH.

XML::XSH and the xsh executable provide a rich shell environment that makes performing common XML-related tasks as terse and straightforward as using a UNIX shell like bash or csh. Yes, that's right -- an XML editing shell. As we will see, it's not as crazy as it seems.

xsh Basics

Before we look at xsh's advanced tricks, let's get familiar with the environment it provides. We'll begin by starting the xsh shell:

[user@host user] xsh -i
-----------------------------------------------------
 xsh - XML Editing Shell version 0.9 (Revision: 1.6)
-----------------------------------------------------
...
xsh scratch:/>

The xsh shell starts in interactive mode, creating a new default scratch pad document called new_document.xml, with the ID scratch. The shell prompt takes the form of the current document's ID (scratch, in this case), followed by a colon, and then the current working context within that document expressed as an XPath location (/, in this case). In other words, we can tell from the prompt that we are at the root (/) level of the current XML document, whose ID is scratch.

We can open an existing XML document from the file system in order to figure out how to navigate within and between documents:

xsh scratch:/> open cams=files/camelids.xml
parsing files/camelids.xml
done.
xsh cams:/>

The open command opens the document camelids.xml from the files directory (relative to the directory in which we started the xsh shell), assigns it the ID cams, and changes the working context to the root (/) of that document.

To list the elements contained in the current context we use the ls command.

xsh cams:/> ls
<?xml version="1.0" encoding="iso-8859-1"?>
<camelids>...</camelids>

Found 1 node(s).
xsh cams:/>

Since the current context is the abstract root of the document, we see the XML declaration and the sole top-level <camelids> element. If our document contained processing instructions or a Document Type Definition between the XML declaration and the top-level element, they would appear here, too.

Right through here is where things get interesting. Just like its UNIX shell cousins, many of xsh's commands accept paths as arguments, specifying the context in which that command is evaluated. The difference is that in xsh those paths are XPath expressions that provide access to the contents of the open XML documents, rather than file system paths that provide an interface to the files and directories of the mounted volumes.

So, for example, if we want to list all of the <habitat> elements in our camelids document, we need only supply the appropriate XPath expression to the ls command:

xsh cams:/> ls //habitat

This yields:

<habitat>
  Bactrian camels' habitat consists mainly of Asia's deserts.
  The temperature ranges from -29 degrees Celsius in
  the winter to 38 degrees Celsius in the summer.
</habitat>
<habitat>
  Dromedary camels prefer desert conditions characterized by
  a long dry season and a short rainy season.
  Introduction of the dromedary into other climates has
  proven unsuccessful as the camel is sensitive to the
  cold and humidity (Nowak 1991).
</habitat>
<habitat>
  Llamas are found in deserts, mountainous areas, and
  grasslands.
</habitat>
<habitat>
  Guanacos inhabit grasslands and shrublands from sea
  level to 4,000m. Occasionally they winter in forests.
</habitat>
<habitat>
  Vicunas are found in semiarid rolling grasslands and
  plains at altitudes of 3,500-5,750 meters. These lands
  are covered with short and tough vegetation.  Due to
  their daily water demands, vicunas live in areas where
  water is readily accessible. Climate in the habitat is
  usually dry and cold. Nowak (1991), Grizmek (1990).
</habitat>

Found 5 node(s).
xsh cams:/>
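For readers who would like to try the same kind of query outside the shell, here is a rough equivalent using Python's standard xml.etree.ElementTree module. It is only a sketch: the SAMPLE document is a hypothetical, heavily trimmed stand-in for files/camelids.xml, and because ElementTree implements only a subset of XPath, //habitat is written as .//habitat relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for files/camelids.xml.
SAMPLE = """<camelids>
  <species name="Camelus bactrianus">
    <natural-history><habitat>Asia's deserts.</habitat></natural-history>
  </species>
  <species name="Lama guanicoe">
    <natural-history><habitat>Grasslands and shrublands.</habitat></natural-history>
  </species>
</camelids>"""

root = ET.fromstring(SAMPLE)

# './/habitat' plays the role of the XPath //habitat: every <habitat>
# descendant of the root element, in document order.
habitats = root.findall(".//habitat")

for node in habitats:
    print(ET.tostring(node, encoding="unicode").strip())
print(f"Found {len(habitats)} node(s).")
```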

Or, if we want our query to be more specific, we can use predicate expressions in our XPath statement. For example,

xsh cams:/> ls //habitat[ancestor::species/@name='Lama guanicoe']

to select just the Guanaco's habitat element.
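The same narrowing can be approximated in Python's standard library, with one caveat: ElementTree does not support the ancestor:: axis, so this sketch swaps in a different but equivalent path that starts from the matching <species> element and descends to its <habitat>. The SAMPLE document is again a hypothetical stand-in for camelids.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for files/camelids.xml.
SAMPLE = """<camelids>
  <species name="Lama glama">
    <natural-history><habitat>Deserts, mountains, grasslands.</habitat></natural-history>
  </species>
  <species name="Lama guanicoe">
    <natural-history><habitat>Grasslands and shrublands.</habitat></natural-history>
  </species>
</camelids>"""

root = ET.fromstring(SAMPLE)

# Instead of //habitat[ancestor::species/@name='Lama guanicoe'],
# select the species by attribute first, then descend to <habitat>.
matches = root.findall(".//species[@name='Lama guanicoe']//habitat")

for node in matches:
    print(node.text)
```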

Similarly, we can change the command evaluation context within the current document by giving an XPath expression to the cd command:

xsh cams:/> cd //species[@name='Camelus dromedarius']/natural-history
xsh cams:/camelids/species[2]/natural-history>

This changes the context location in our shell prompt to reflect the new context to which we have navigated. From now on, commands not explicitly passed an absolute location path will be evaluated in the context of the <natural-history> element contained in the document's second <species> element (the one whose name attribute is equal to "Camelus dromedarius"). So if we give the ls command with no path specified, we'll see the contents of the new context:

xsh cams:/camelids/species[2]/natural-history> ls
<natural-history>
       <food-habits>...</food-habits>
       <reproduction>...</reproduction>
       <behavior>...</behavior>
       <habitat>...</habitat>
</natural-history>

Found 1 node(s).
xsh cams:/camelids/species[2]/natural-history>
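To make the idea of a working context concrete, the sketch below selects a context node with Python's xml.etree.ElementTree, lists its children, and derives a positional path resembling the one in the prompt. The SAMPLE document and the canonical_path helper are hypothetical; how xsh itself computes the prompt path is not something this sketch claims to reproduce.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for files/camelids.xml.
SAMPLE = """<camelids>
  <species name="Camelus bactrianus">
    <natural-history><habitat>...</habitat></natural-history>
  </species>
  <species name="Camelus dromedarius">
    <natural-history>
      <food-habits>...</food-habits>
      <reproduction>...</reproduction>
      <behavior>...</behavior>
      <habitat>...</habitat>
    </natural-history>
  </species>
</camelids>"""


def canonical_path(root, target):
    """Build a positional path such as /a/b[2]/c for target (hypothetical helper)."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        siblings = [c for c in parent if c.tag == node.tag]
        step = node.tag
        if len(siblings) > 1:
            # XPath positions are 1-based.
            step += f"[{siblings.index(node) + 1}]"
        steps.append(step)
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


root = ET.fromstring(SAMPLE)

# Rough analogue of: cd //species[@name='Camelus dromedarius']/natural-history
context = root.find(".//species[@name='Camelus dromedarius']/natural-history")
path = canonical_path(root, context)

# Rough analogue of a bare ls: the children of the context node.
children = [child.tag for child in context]

print(path)
print(children)
```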

In addition, xsh provides a way to execute commands on any currently open document without changing the element context by prepending that document's ID and a colon to the XPath expression:

xsh cams:/camelids/species[2]/natural-history> cd /
xsh cams:/> open xmlnews=http://www.xml.com/xml/news.rss
parsing http://www.xml.com/xml/news.rss
done.
xsh xmlnews:/>
xsh xmlnews:/> ls cams:/camelids/species[3]/common-name
<common-name>Llama</common-name>

Found 1 node(s).
xsh xmlnews:/>

Notice that the context changed to the root of the newly opened RSS document once it was parsed into memory, but we still have easy access to the data contained in the camelids document by adding that document's ID (cams) and a colon to the front of the path.

Also note that the location of the file passed to the open command is not limited to files on the local machine; it can also be an HTTP or FTP URL, so long as a well-formed XML document is returned.
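The "local path or URL" behavior can be sketched in Python along the following lines. The open_document function is a hypothetical helper, not part of xsh or of any library; it assumes that a location with an http, https, or ftp scheme should be fetched over the network and that anything else is a local path. The demonstration at the end uses a temporary local file so that it runs without network access.

```python
import os
import tempfile
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def open_document(location):
    """Parse XML from a local path or an http/https/ftp URL (hypothetical helper).

    ET.parse raises xml.etree.ElementTree.ParseError if the content
    is not well-formed XML.
    """
    if urlparse(location).scheme in ("http", "https", "ftp"):
        with urllib.request.urlopen(location) as response:
            return ET.parse(response)
    return ET.parse(location)


# Local demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "camelids.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<camelids><species name='Lama glama'/></camelids>")
    doc = open_document(path)
    root_tag = doc.getroot().tag

print(root_tag)
```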

To see a list of all the currently open documents and their associated IDs, use the files command:

xsh xmlnews:/> files
cams = files/camelids.xml
xmlnews = http://www.xml.com/xml/news.rss
xsh xmlnews:/>
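The bookkeeping behind document IDs can be pictured as a simple table mapping each ID to a location and a parsed tree. The sketch below is a hypothetical model in Python, not a description of how XML::XSH is implemented: open_doc, ls, and files are made-up names, the two documents are invented stand-ins, and because ElementTree evaluates only relative paths, the paths here are written relative to each document's root element rather than as absolute XPath locations.

```python
import xml.etree.ElementTree as ET

# ID -> (location, root element); a made-up model of the open documents.
documents = {}


def open_doc(doc_id, location, text):
    """Register already-fetched XML text under an ID."""
    documents[doc_id] = (location, ET.fromstring(text))


def ls(expr, current_id):
    """Evaluate expr against the document named by an 'id:' prefix,
    falling back to the current document when no known ID is given."""
    prefix, sep, rest = expr.partition(":")
    if sep and prefix in documents:
        doc_id, path = prefix, rest
    else:
        doc_id, path = current_id, expr
    return documents[doc_id][1].findall(path)


def files():
    """List the open documents in the order they were opened."""
    return [f"{doc_id} = {location}"
            for doc_id, (location, _root) in documents.items()]


# Hypothetical stand-ins for the two documents used in the article.
open_doc("cams", "files/camelids.xml", """<camelids>
  <species><common-name>Bactrian Camel</common-name></species>
  <species><common-name>Dromedary</common-name></species>
  <species><common-name>Llama</common-name></species>
</camelids>""")
open_doc("xmlnews", "http://www.xml.com/xml/news.rss",
         "<rss><channel><title>Example feed</title></channel></rss>")

# The current document is xmlnews, yet we can still reach into cams.
llama = ls("cams:species[3]/common-name", current_id="xmlnews")
title = ls("channel/title", current_id="xmlnews")

print(llama[0].text)
print(title[0].text)
print("\n".join(files()))
```

Checking the prefix against the table of known IDs, rather than splitting on any colon, keeps expressions that contain colons of their own (namespace prefixes, for instance) from being misread as document references.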

Closing an open document is as easy as passing its ID to the close command.

xsh xmlnews:/> close xmlnews
closing file http://www.xml.com/xml/news.rss
xsh :>

If we wanted to save a local copy of the xmlnews document before closing, we would use the saveas command:

xsh xmlnews:/> saveas xmlnews files/xmldotcom_news.rss
xmlnews=new_document1.xml --> files/xmldotcom_news.rss (utf-8)
saved xmlnews=files/xmldotcom_news.rss as files/xmldotcom_news.rss 
in utf-8 encoding
xsh :>
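As a point of comparison, writing an in-memory document out to a new file in UTF-8 looks roughly like this with Python's xml.etree.ElementTree. The feed content is an invented placeholder, and the file is written to a temporary directory rather than to files/ so the sketch leaves nothing behind.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Invented placeholder for a parsed RSS document.
root = ET.fromstring(
    "<rss><channel><title>Example feed</title></channel></rss>"
)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "xmldotcom_news.rss")

    # Serialize the tree to disk in UTF-8, including an XML declaration.
    ET.ElementTree(root).write(target, encoding="utf-8", xml_declaration=True)

    # Read the file back to confirm what was saved.
    with open(target, encoding="utf-8") as handle:
        saved = handle.read()

print(saved)
```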

We've now reviewed xsh basics: we can start the shell and open, close, and navigate through the contents of XML documents. If this were all there was to xsh, it would still be a winner as an XPath testbed and teaching tool (making it quite useful to users of XSLT and XPathScript, as well as XML::LibXML and the other Perl modules which offer an XPath interface). But xsh bills itself as an XML editing shell, and as we will see, it's that and a fair bit more.
