
XSH, An XML Editing Shell
Introduction
A few months ago we briefly examined some of the command line
utilities available to users of Perl and XML. This month we will continue
in that vein by looking at the 300-pound gorilla of Perl/XML command line
tools, Petr Pajas' intriguing XML::XSH.
XML::XSH and the xsh executable provide a rich
shell environment which makes performing common XML-related tasks as terse
and straightforward as using a UNIX shell like bash or
csh. Yes, that's right -- an XML editing shell. As we will see,
it's not as crazy as it seems.
xsh Basics
Before we look at xsh's advanced tricks, let's get familiar
with the environment it provides. We'll begin by starting the
xsh shell:
[user@host user] xsh -i
-----------------------------------------------------
xsh - XML Editing Shell version 0.9 (Revision: 1.6)
-----------------------------------------------------
...
xsh scratch:/>
The xsh shell starts in interactive mode, creating a new
default scratch pad document called new_document.xml, with the
ID scratch. The shell prompt takes the form of the current
document's ID (scratch, in this case), followed by a colon, and then the
current working context within that document expressed as an XPath location
(/, in this case). In other words, we can tell from the prompt that we are at
the root (/) level of the current XML document, whose ID is
scratch.
To see how to navigate within and between documents, we can open an existing XML document from the file system:
xsh scratch:/> open cams=files/camelids.xml
parsing files/camelids.xml
done.
xsh cams:/>
The open command opens the document camelids.xml
from the directory files in the same directory in which we
started the xsh shell, assigns it the ID of cams,
and changes the working context to the root (/) of that document.
To list the elements contained in the current context we use the
ls command.
xsh cams:/> ls
<?xml version="1.0" encoding="iso-8859-1"?>
<camelids>...</camelids>
Found 1 node(s).
xsh cams:/>
Since the current context is the abstract root of the document, we see the
XML declaration and the sole top-level <camelids> element.
If our document contained processing instructions or a Document Type
Definition between the XML declaration and the top-level element, they would
appear here, too.
This is where things get interesting. Just like its
UNIX shell cousins, many of xsh's commands accept paths as
arguments, specifying the context in which that command is evaluated. The
difference is that in xsh those paths are XPath expressions
which provide access to the contents of the open XML documents, rather than
file system paths that provide an interface to the files and directories of
the mounted volumes.
So, for example, if we want to list all of the <habitat>
elements in our camelids document, we need only supply the appropriate XPath
expression to the ls command:
xsh cams:/> ls //habitat
This yields:
<habitat>
Bactrian camels' habitat consists mainly of Asia's deserts.
The temperature ranges from -29 degrees Celsius in
the winter to 38 degrees Celsius in the summer.
</habitat>
<habitat>
Dromedary camels prefer desert conditions characterized by
a long dry season and a short rainy season.
Introduction of the dromedary into other climates has
proven unsuccessful as the camel is sensitive to the
cold and humidity (Nowak 1991).
</habitat>
<habitat>
Llamas are found in deserts, mountainous areas, and
grasslands.
</habitat>
<habitat>
Guanacos inhabit grasslands and shrublands from sea
level to 4,000m. Occasionally they winter in forests.
</habitat>
<habitat>
Vicunas are found in semiarid rolling grasslands and
plains at altitudes of 3,500-5,750 meters. These lands
are covered with short and tough vegetation. Due to
their daily water demands, vicunas live in areas where
water is readily accessible. Climate in the habitat is
usually dry and cold. Nowak (1991), Grizmek (1990).
</habitat>
Found 5 node(s).
xsh cams:/>
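For readers without xsh at hand, the same descendant query can be sketched in Python with the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The inline document below is a hypothetical, abbreviated stand-in for files/camelids.xml; only the element and attribute names seen in the session above come from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated stand-in for files/camelids.xml.
CAMELIDS = """
<camelids>
  <species name="Camelus bactrianus">
    <natural-history>
      <habitat>Bactrian camels' habitat consists mainly of Asia's deserts.</habitat>
    </natural-history>
  </species>
  <species name="Camelus dromedarius">
    <natural-history>
      <habitat>Dromedary camels prefer desert conditions.</habitat>
    </natural-history>
  </species>
  <species name="Lama glama">
    <natural-history>
      <habitat>Llamas are found in deserts, mountainous areas, and grasslands.</habitat>
    </natural-history>
  </species>
</camelids>
""".strip()

root = ET.fromstring(CAMELIDS)

# ".//habitat" plays the role of xsh's "ls //habitat": every <habitat>
# element at any depth below the document's top-level element.
habitats = root.findall(".//habitat")

for habitat in habitats:
    print(habitat.text.strip())
print(f"Found {len(habitats)} node(s).")
```

The leading dot is needed because ElementTree evaluates paths relative to an element rather than to the abstract document root that xsh uses.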
Or, if we want our query to be more specific, we can use predicate expressions in our XPath statement. For example,
xsh cams:/> ls //habitat[ancestor::species/@name='Lama guanicoe']
to select just the Guanaco's habitat element.
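A rough Python equivalent of this predicate query is sketched below. ElementTree does not implement the ancestor axis, so the sketch walks top-down instead: it selects the species by its name attribute first and then descends to its habitat, which picks out the same element in a document shaped like this one. The sample document is again a hypothetical stand-in with abbreviated text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated stand-in for files/camelids.xml.
CAMELIDS = """
<camelids>
  <species name="Lama glama">
    <natural-history>
      <habitat>Llamas are found in deserts, mountainous areas, and grasslands.</habitat>
    </natural-history>
  </species>
  <species name="Lama guanicoe">
    <natural-history>
      <habitat>Guanacos inhabit grasslands and shrublands.</habitat>
    </natural-history>
  </species>
</camelids>
""".strip()

root = ET.fromstring(CAMELIDS)

# xsh: ls //habitat[ancestor::species/@name='Lama guanicoe']
# ElementTree has no ancestor axis, so filter on the species first
# and then step down to the habitat it contains.
matches = root.findall(
    ".//species[@name='Lama guanicoe']/natural-history/habitat"
)

for habitat in matches:
    print(habitat.text.strip())
print(f"Found {len(matches)} node(s).")
```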
Similarly, we can change the command evaluation context within the current
document by giving an XPath expression to the cd command:
xsh cams:/> cd //species[@name='Camelus dromedarius']/natural-history
xsh cams:/camelids/species[2]/natural-history>
This causes the context location in our shell prompt to change to reflect
the new context to which we have navigated. Commands not explicitly
passed an absolute location path will now be evaluated in the context of the
<natural-history> element contained in the document's
second <species> element (the one whose name
attribute is equal to "Camelus dromedarius"). So, if we give the
ls command with no path specified, we'll see the contents of the
new context:
xsh cams:/camelids/species[2]/natural-history> ls
<natural-history>
<food-habits>...</food-habits>
<reproduction>...</reproduction>
<behavior>...</behavior>
<habitat>...</habitat>
</natural-history>
Found 1 node(s).
xsh cams:/camelids/species[2]/natural-history>
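The idea of a working context can be illustrated in Python by holding on to an element and evaluating relative paths against it. This is only a sketch of the concept, not of how xsh is implemented; the sample document is a hypothetical stand-in, and the variable name context is chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated stand-in for files/camelids.xml.
CAMELIDS = """
<camelids>
  <species name="Camelus bactrianus">
    <natural-history>
      <habitat>...</habitat>
    </natural-history>
  </species>
  <species name="Camelus dromedarius">
    <natural-history>
      <food-habits>...</food-habits>
      <reproduction>...</reproduction>
      <behavior>...</behavior>
      <habitat>...</habitat>
    </natural-history>
  </species>
</camelids>
""".strip()

root = ET.fromstring(CAMELIDS)

# Roughly "cd //species[@name='Camelus dromedarius']/natural-history":
# remember one element as the current context.
context = root.find(".//species[@name='Camelus dromedarius']/natural-history")

# Roughly "ls" with no argument: the children of the context node.
children = [child.tag for child in context]
print(children)

# A relative path is evaluated from the context, so it sees one habitat,
# while the same name searched from the top of the document sees both.
relative_hits = context.findall("habitat")
document_hits = root.findall(".//habitat")
print(len(relative_hits), len(document_hits))
```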
In addition, xsh provides a way to execute commands on any
currently open document without changing the element context by prepending
that document's ID and a colon to the XPath expression:
xsh cams:/camelids/species[2]/natural-history> cd /
xsh cams:/> open xmlnews=http://www.xml.com/xml/news.rss
parsing http://www.xml.com/xml/news.rss
done.
xsh xmlnews:/>
xsh xmlnews:/> ls cams:/camelids/species[3]/common-name
<common-name>Llama</common-name>
Found 1 node(s).
xsh xmlnews:/>
Notice that the context changed to the root of the newly opened RSS
document once it was parsed into memory, but we still have easy access to the
data contained in the camelids document by adding that document's ID
(cams) and a colon to the front of the path.
Also note that the location of the file passed to the open
command is not limited to files on the local machine; it can also be an HTTP
or FTP URL, so long as a well-formed XML document is returned.
To see a list of all the currently open documents and their associated
IDs, use the files command:
xsh xmlnews:/> files
cams = files/camelids.xml
xmlnews = http://www.xml.com/xml/news.rss
xsh xmlnews:/>
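One way to picture ID-prefixed paths and the files listing is as a table of open documents keyed by ID. The Python sketch below models that idea under stated assumptions: the helper names open_doc and ls are hypothetical, both documents are small inline stand-ins (nothing is fetched from disk or the network), and only absolute paths with at least one step are handled.

```python
import xml.etree.ElementTree as ET

documents = {}  # document ID -> (source label, synthetic document node)


def open_doc(doc_id, source, xml_text):
    """Parse xml_text and register it under doc_id (hypothetical helper)."""
    holder = ET.Element("document")  # stands in for the abstract root "/"
    holder.append(ET.fromstring(xml_text))
    documents[doc_id] = (source, holder)


def ls(expr, current_id):
    """Evaluate an absolute path, optionally prefixed with 'id:'."""
    doc_id, sep, path = expr.partition(":")
    # Treat the prefix as a document ID only if such a document is open;
    # otherwise the whole expression belongs to the current document.
    if not (sep and doc_id in documents):
        doc_id, path = current_id, expr
    _source, holder = documents[doc_id]
    return holder.findall("." + path)


# Hypothetical stand-ins for the two documents opened in the session.
open_doc("cams", "files/camelids.xml", """
<camelids>
  <species name="Camelus bactrianus"><common-name>Bactrian Camel</common-name></species>
  <species name="Camelus dromedarius"><common-name>Dromedary</common-name></species>
  <species name="Lama glama"><common-name>Llama</common-name></species>
</camelids>
""".strip())
open_doc("xmlnews", "http://www.xml.com/xml/news.rss",
         "<rss><channel><title>XML.com</title></channel></rss>")

# The current document is xmlnews, but the prefix reaches into cams.
names = [e.text for e in ls("cams:/camelids/species[3]/common-name", "xmlnews")]
print(names)

# Roughly what the files command reports.
for doc_id, (source, _holder) in documents.items():
    print(f"{doc_id} = {source}")
```

Checking the prefix against the table of open IDs, rather than splitting on every colon, keeps expressions such as ancestor::species from being mistaken for a document reference.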
Closing an open document is as easy as passing its ID to the
close command.
xsh xmlnews:/> close xmlnews
closing file http://www.xml.com/xml/news.rss
xsh :>
If we wanted to save a local copy of the xmlnews document
before closing, we would use the saveas command:
xsh xmlnews:/> saveas xmlnews files/xmldotcom_news.rss
xmlnews=new_document1.xml --> files/xmldotcom_news.rss (utf-8)
saved xmlnews=files/xmldotcom_news.rss as files/xmldotcom_news.rss
in utf-8 encoding
xsh :>
We've now reviewed xsh basics: we can start the shell, open,
close, and navigate through contents of XML documents. If this is all there
was to xsh, it would still be a winner as an XPath testbed and
teaching tool (making it quite useful to users of XSLT and XPathScript, as
well as XML::LibXML and the other Perl modules which offer an
XPath interface). But xsh bills itself as an XML
editing shell, and as we will see, it's that and a fair bit more.