
Perl and XML on the Command Line
by Kip Hampton
April 17, 2002
Introduction
The truth is that putting Perl's XML processing facilities to work is no harder than using any other part of Perl; and if the applications that feature Perl/XML in a visible way are complex, it is because the problems that those applications are designed to solve are complex. To drive this point home, this month we will get back to our Perlish roots by examining how Perl can be used on the command line to perform a range of common XML tasks.
For our first few examples we will focus on those modules that ship with command line tools as part of their distributions.
XML::XPath and the xpath Utility
Requires: XML::XPath, XML::Parser
Matt Sergeant's fine XML::XPath module provides a way to
access the contents of XML documents using the W3C-recommended XPath language. This module also installs a Perl
utility called xpath, which allows XPath expressions to
be used to examine the contents of XML documents. The XML document
can be specified either by passing in a path to the file as the
first argument or by piping the document via STDIN.
Examples:
Find all section titles in a DocBook XML document:
xpath mybook.xml //section/title
The same command using a pipe:
cat mybook.xml | xpath //section/title
Retrieve just the significant text (not including nodes containing all-whitespace) from a given document:
xpath somefile.xml "//text()[string-length(normalize-space(.)) > 0 ]"
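To try the commands above without a real DocBook file at hand, a small stand-in document can be generated first. The following is only a sketch: the file contents and the workdir variable are invented for illustration, and the xpath call is skipped when the utility is not installed. Later releases of XML::XPath changed the utility to take the expression through an -e option, so the sketch tries the argument order used in this article first and falls back to the -e form.

```shell
# Scratch directory and sample document are hypothetical, for illustration only.
workdir=$(mktemp -d)

cat > "$workdir/mybook.xml" <<'EOF'
<book>
  <section><title>First Section</title><para>Some text.</para></section>
  <section><title>Second Section</title><para>More text.</para></section>
</book>
EOF

# Run the query only if the xpath utility is actually installed.
if command -v xpath >/dev/null 2>&1; then
    # Argument order as used in this article; fall back to the -e form
    # accepted by later releases of the utility.
    xpath "$workdir/mybook.xml" '//section/title' 2>/dev/null ||
        xpath -e '//section/title' "$workdir/mybook.xml" || true
fi
```

Quoting the expression, as done here, is a safe habit: expressions containing brackets, spaces, or parentheses would otherwise be interpreted by the shell.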
DBIx::XML_RDB and the sql2xml Utility
Requires: DBIx::XML_RDB, DBI
Fans of Matt's popular DBIx::XML_RDB module will be
pleased to know that it too ships with a command line tool,
sql2xml, that returns an entire database table as a
single XML document.
Examples:
Save the data stored in the 'users' table as the file users.xml:
sql2xml.pl -sn myserver -driver Oracle -uid user -pwd seekrit -table users -output users.xml
Or, to send the data to STDOUT:
sql2xml.pl -sn myserver -driver Oracle -uid user -pwd seekrit -table users -output -
XML::Handler::YAWriter and the xmlpretty Utility
Requires: XML::Handler::YAWriter, XML::Parser::PerlSAX
No matter how carefully XML documents are edited, they often need
reformatting to be reasonably called "human-readable". Michael
Koehne's XML::Handler::YAWriter SAX Handler installs an
XML pretty-printer called xmlpretty which reduces this
task to a quick one-liner.
Examples:
Passing a file name:
xmlpretty overwrought.xml > new.xml
Reading from STDIN:
cat overwrought.xml | xmlpretty > new.xml
XML::SemanticDiff and the xmlsemdiff Utility
Requires: XML::SemanticDiff, XML::Parser
Unfortunately, standard command line text-processing tools like
diff often fall short when dealing with XML documents. My
XML::SemanticDiff was designed to make comparing the
relevant parts of two XML documents (while ignoring things like extra
whitespace, or having the same namespace URI bound to different
prefixes) easy and straightforward. Newer versions of this module
install the xmlsemdiff tool, which allows simple access
from the shell.
Example:
Print the semantic differences between two XML documents to STDOUT:
xmlsemdiff file1.xml file2.xml
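To make the contrast with plain diff concrete, the sketch below builds two small documents that differ only in whitespace and in the prefix bound to the same namespace URI. The file contents, the example.com namespace, and the workdir variable are assumptions made for illustration. Plain diff reports the files as different; xmlsemdiff, going by the description above, is expected to treat them as equivalent, but it is only run here if it is installed and no particular output is claimed.

```shell
# Scratch directory and sample documents are hypothetical, for illustration only.
workdir=$(mktemp -d)

# Same namespace URI, prefix "a", everything on one line.
cat > "$workdir/file1.xml" <<'EOF'
<a:root xmlns:a="http://example.com/ns"><a:item>one</a:item></a:root>
EOF

# Same namespace URI, prefix "b", with extra whitespace and indentation.
cat > "$workdir/file2.xml" <<'EOF'
<b:root xmlns:b="http://example.com/ns">
  <b:item>one</b:item>
</b:root>
EOF

# A line-based diff only sees that the text differs.
if diff "$workdir/file1.xml" "$workdir/file2.xml" >/dev/null; then
    plain_diff="same"
else
    plain_diff="different"
fi
echo "plain diff says: $plain_diff"

# Run the semantic comparison only if the tool is available.
if command -v xmlsemdiff >/dev/null 2>&1; then
    xmlsemdiff "$workdir/file1.xml" "$workdir/file2.xml" || true
fi
```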
XML::Xerces
The Apache Software Foundation's Xerces-Perl project offers a Perl
interface to the Xerces C++ XML parser. Xerces-Perl ships with several
sample scripts that can be copied into your favorite bin
directory. The most notable difference between Xerces and the other
XML parsers available to Perl is that it provides a way to validate
XML documents against W3C XML Schemas.
Example:
Calculate the time needed to process an XML document while validating it against an XML Schema:
DOMCount.pl -v=auto -s mydoc.xml
A Visitor From Planet C -- xmllint
Developers using XML::LibXML often aren't aware of
the feature-rich command line XML processing tool,
xmllint, which is installed with the C libraries that
XML::LibXML depends upon. No, xmllint is not
a Perl tool, but its many features, and the fact that it can be easily
piped together with other tools, make it more than worthy of mention
here.
Examples:
Use the built-in HTML parser to convert ill-formed HTML to XML before further processing:
xmllint --html khampton_perl_xml_17.html | xpath "//a[@href]"
Or the same thing, but using the DocBook SGML parser:
xmllint --sgml ye-olde.sgml | xpath "//chapter[@id='chapt4']"
Using xmllint as a pretty-printer:
cat some.xml | xmllint --format -
Using xmllint to validate a document against an external DTD:
cat some.xml | xmllint --postvalid --dtdvalid my.dtd -
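For readers who want to see external DTD validation end to end, here is a minimal sketch. The DTD, the sample document, and the workdir and validation variables are invented for illustration; the xmllint call is skipped if the tool is not installed. The --noout option suppresses the serialized document so that only validation messages are shown, and the trailing - tells xmllint to read the document from STDIN.

```shell
# Scratch directory, DTD, and document are hypothetical, for illustration only.
workdir=$(mktemp -d)

cat > "$workdir/my.dtd" <<'EOF'
<!ELEMENT herd (camelid+)>
<!ELEMENT camelid (#PCDATA)>
<!ATTLIST camelid species CDATA #REQUIRED>
EOF

printf '%s\n' '<herd><camelid species="llama">Dolly</camelid></herd>' \
    > "$workdir/some.xml"

# Record the outcome; "skipped" means xmllint was not found on this machine.
validation="skipped"
if command -v xmllint >/dev/null 2>&1; then
    if xmllint --noout --dtdvalid "$workdir/my.dtd" - < "$workdir/some.xml"; then
        validation="valid"
    else
        validation="invalid"
    fi
fi
echo "validation: $validation"
```

Because xmllint signals validation failure through its exit status, the same pattern can gate later steps in a shell pipeline or a build script.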
Devel::TraceSAX and XML::SAX::Machines
Requires: Devel::TraceSAX, XML::SAX, XML::SAX::Machines
While the syntax may be a bit verbose, it is entirely possible to
use XML::SAX::Machines to bring the power of Perl SAX2 to
the command line.
Examples:
Using XML::SAX::Machines to produce an XML document to
STDOUT after applying a SAX filter:
perl -MXML::SAX::Machines=Pipeline -e \
'Pipeline("XML::MyFilter", \*STDOUT)->parse_uri("files/camelids.xml");'
Or, reading from STDIN,
cat files/camelids.xml | perl -MXML::SAX::Machines=Pipeline -e \
'Pipeline("XML::MyFilter", \*STDOUT)->parse_string(join "", <STDIN>);'
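The XML::MyFilter named in these one-liners is a placeholder for whatever SAX filter you have written. As a rough sketch of what such a filter might look like, the commands below write a tiny, hypothetical XML::MyFilter that upper-cases element names (ignoring namespace subtleties) and then run it through a Pipeline. The module body, the sample document, and the workdir variable are assumptions for illustration; the filter is built on XML::SAX::Base, and the Pipeline step is skipped when XML::SAX::Machines is not installed.

```shell
# Scratch directory, filter module, and sample document are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/XML"

cat > "$workdir/XML/MyFilter.pm" <<'EOF'
package XML::MyFilter;
use strict;
use warnings;
use base 'XML::SAX::Base';

# Upper-case the element name on a copy of the event, then forward it.
sub start_element {
    my ($self, $el) = @_;
    my %copy = %$el;
    $copy{Name}      = uc $copy{Name};
    $copy{LocalName} = uc $copy{LocalName} if defined $copy{LocalName};
    $self->SUPER::start_element(\%copy);
}

# Do the same for the matching end tag so the output stays well-formed.
sub end_element {
    my ($self, $el) = @_;
    my %copy = %$el;
    $copy{Name}      = uc $copy{Name};
    $copy{LocalName} = uc $copy{LocalName} if defined $copy{LocalName};
    $self->SUPER::end_element(\%copy);
}

1;
EOF

printf '%s\n' '<herd><camel>Clyde</camel></herd>' > "$workdir/camelids.xml"

# Run the pipeline only if XML::SAX::Machines can be loaded.
if perl -MXML::SAX::Machines -e 1 2>/dev/null; then
    perl -I"$workdir" -MXML::SAX::Machines=Pipeline -e \
        'Pipeline("XML::MyFilter", \*STDOUT)->parse_uri($ARGV[0]);' \
        "$workdir/camelids.xml" || true
fi
```

The -I option is needed only because the sample module lives in a scratch directory rather than somewhere on Perl's usual module search path.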
It is often very helpful when writing custom SAX Filters to be able
to examine what events are being generated and forwarded to which
classes. Barrie Slaymaker's Devel::TraceSAX makes this
painless.
Debugging SAX events by tracing them through multiple filters:
perl -d:TraceSAX -MXML::SAX::Machines=Pipeline -e \
'Pipeline("XML::Filter1", "XML::Filter2")->parse_uri("file.xml");'
Conclusions
Processing XML with Perl does not have to mean buying into a huge XML-centric application with a steep learning curve or departing from Perl's long history as a command line tool. You may not use all of the tools or techniques described here, but it is nice to know that they are available when and if you need them.
Do you use Perl XML tools on the command line? Share your experience in our forums.
Comment on this Article
- xml to txt or xml to mdb
2002-12-26 10:24:09 Zoran Trifunovic
Hi
Does anybody have source code for PC (Visual Basic, Delphi..) to convert XML files to TXT, or XML to Access?
Zoran Trifunovic
SYSTEM PROGRAMMER
PTT
BELGRADE
zokit@ptt.yu
- xml to txt or xml to mdb
2004-11-11 05:33:01 jeephero
i want xml to mdb
- And don't forget PYX
2002-04-26 03:37:55 Grant McLean
Thanks Kip, I certainly picked up a few handy tips
from this article.
Matt's XML::PYX is another module that provides
command line functionality. For example this one-liner prints statistics on how many times each element type occurs in a document:
pyx doc.xml | sed -n 's/^(//p' | sort | uniq -c
- Screen Scraping!
2002-04-23 06:04:52 Matt Sergeant
The perl XML tools are very cool for screen scraping. Here's something that gets you the top-ten viruses from our virus-eye service:
lwp-request http://www.messagelabs.com/VirusEye/ | perl -pe 's/--->/-->/' | xmllint --html --format - | xpath '/html/body/table[2]/tr[1]/td[1]/table[1]/tr[1]/td[2]/table[1]/tr[3]/td[2]/table[3]/tr[1]/td[1]/table[1]/tr[2]/td[1]/table[1]/tr[1]/td[2]/table[1]/tr/td[3]/a/text()'
(note that normally you wouldn't need the perl -pe stuff to fix up the broken HTML assuming it parses cleanly - I think this may be a bug in libxml2)
- Thanks
2002-04-18 07:24:40 ron reidy
Thanks for this article. I have been looking for this type of functionality for a while and thought I would have to roll my own. Now I can return to being lazy!
- Thanks
2003-09-14 20:43:42 Mikhail Grushinskiy
You might also take a look at this one (XmlStarlet Command Line XML Toolkit)
http://xmlstar.sourceforge.net/
The toolkit's feature set includes options to:
Check or validate XML files (simple well-formedness check, DTD, XSD, RelaxNG)
Calculate values of XPath expressions on XML files (such as running sums, etc)
Search XML files for matches to given XPath expressions
Apply XSLT stylesheets to XML documents (including EXSLT support, and passing parameters to stylesheets)
Query XML documents (ex. query for value of some elements or attributes, sorting, etc)
Modify or edit XML documents (ex. delete some elements)
Format or "beautify" XML documents (as changing indentation, etc)
Fetch XML documents using http:// or ftp:// URLs
Browse tree structure of XML documents (in similar way to 'ls' command for directories)
Include one XML document into another using XInclude
XML c14n canonicalization
Escape/unescape special XML characters in input text
Print directory as XML document
Convert XML into PYX format (based on ESIS - ISO 8879)
- xml starlet
2006-09-25 11:56:16 daczkowski
I'm new at xmlstarlet. I'm running under windows XP.
I used the following line to edit the value of @filename attribute.
xml ed -u "/KLAMATH/ENCODESESSION/MEDIAOUTPUT/FILEMEDIA/@filepath" -v c:/wd_tel2 eos3dr1_device0_drc1500.prj
The output indicates that the attribute value has changed. However, the original file did not change. How do I save the changes that I made via the xml ed command?
Can you help or direct me to some info?
Thanks

