Perl XML Quickstart: The Perl XML Interfaces

by Kip Hampton
April 18, 2001

Introduction

The XML modules available from CPAN can be divided into three main categories: modules that provide unique interfaces to XML data (usually concerned with translating data between an XML instance and Perl data structures), modules that implement one of the standard XML APIs, and special-purpose modules that seek to simplify the execution of some specific XML-related task. This month we will be looking at the first of these, the Perl-specific XML interfaces.

use Disclaimer qw(:standard);

This is not an exercise in comparative performance benchmarking, nor is it my intention to suggest that any one module is inherently more useful than another. Choosing the right XML module for your project depends largely upon the nature of the project and your past experience. Different interfaces lend themselves to different kinds of tasks and to different kinds of people. My only goal is to offer working examples of the various interfaces by defining two simple tasks, and then showing how to achieve the same net result using each of the selected modules.

The Tasks

While the uses for XML are rich and varied, most XML-related tasks can be divided into two groups: those related to extracting data from existing XML documents, and those related to creating new XML documents using data from other sources. With this in mind, the examples that we will use for our module introductions will consist of extracting a specific set of data from an XML file, and marking up a Perl data structure in a specific XML format.

Task One: Extracting Information

First, consider the following XML fragment:

    <?xml version="1.0"?>
    <camelids>
      <species name="Camelus dromedarius">
        <common-name>Dromedary, or Arabian Camel</common-name>
        <physical-characteristics>
          <mass>300 to 690 kg.</mass>
          <appearance>
            The dromedary camel is characterized by a long-curved neck,
            deep-narrow chest, and a single hump. ...
          </appearance>
        </physical-characteristics>
        <natural-history>
          <food-habits>
            The dromedary camel is an herbivore. ...
          </food-habits>
          <reproduction>
            The dromedary camel has a lifespan of about 40-50 years ...
          </reproduction>
          <behavior>
            With the exception of rutting males, dromedaries show very
            little aggressive behavior. ...
          </behavior>
          <habitat>
            The camels prefer desert conditions characterized by a long dry
            season and a short rainy season. ...
          </habitat>
        </natural-history>
        <conservation status="no special status">
          <detail>
            Since the dromedary camel is domesticated, the camel has no
            special status in conservation.
          </detail>
        </conservation>
      </species>
      ...
    </camelids>

Now let's say that the complete document (available with this
month's sample code) contains the same information for all the members
of the Camelidae family, not just our friend the single-humped
Dromedary Camel. To illustrate how each module might be used to
extract a subset of the data stored in this document, we will write a
tiny script that parses the camelids.xml document and, for
each species found, prints a line to standard output giving the common name, the species name in parentheses, and the conservation status:

    Bactrian Camel (Camelus bactrianus) endangered
    Dromedary, or Arabian Camel (Camelus dromedarius) no special status
    Llama (Lama glama) no special status
    Guanaco (Lama guanicoe) special concern
    Vicuna (Vicugna vicugna) endangered

Task Two: Creating An XML Document

To demonstrate how each of the selected modules may be used to create XML documents from other data sources, we will write a small script that marks up a simple Perl hash containing URLs to a few cool camelid-related pages on the Web as a simple XHTML document. Here's the hash:

    my %camelid_links = (
        one   => { url => 'http://www.online.discovery.com/news/picture/may99/photo20.html',
                   description => 'Bactrian Camel in front of Great ' .
                                  'Pyramids in Giza, Egypt.'},
        two   => { url => 'http://www.fotos-online.de/english/m/09/9532.htm',
                   description => 'Dromedary Camel illustrates the ' .
                                  'importance of accessorizing.'},
        three => { url => 'http://www.eskimo.com/~wallama/funny.htm',
                   description => 'Charlie - biography of a narcissistic llama.'},
        four  => { url => 'http://arrow.colorado.edu/travels/other/turkey.html',
                   description => 'A visual metaphor for the perl5-porters ' .
                                  'list?'},
        five  => { url => 'http://www.galaonline.org/pics.htm',
                   description => 'Many cool alpacas.'},
        six   => { url => 'http://www.thpf.de/suedamerikareise/galerie/vicunas.htm',
                   description => 'Wild Vicunas in a scenic landscape.'}
    );

And here is an example of the document that we hope to create from that hash:

    <?xml version="1.0"?>
    <html>
      <body>
        <a href="http://www.eskimo.com/~wallama/funny.htm">Charlie - biography of a narcissistic llama.</a>
        <a href="http://www.online.discovery.com/news/picture/may99/photo20.html">Bactrian Camel in front of Great Pyramids in Giza, Egypt.</a>
        <a href="http://www.fotos-online.de/english/m/09/9532.htm">Dromedary Camel illustrates the importance of accessorizing.</a>
        <a href="http://www.galaonline.org/pics.htm">Many cool alpacas.</a>
        <a href="http://arrow.colorado.edu/travels/other/turkey.html">A visual metaphor for the perl5-porters list?</a>
        <a href="http://www.thpf.de/suedamerikareise/galerie/vicunas.htm">Wild Vicunas in a scenic landscape.</a>
      </body>
    </html>

It's important to note that while the resulting XML is indented for readability (as shown above), this sort of fine-grained whitespace handling is not part of our sample requirement. All we care about is that the resulting document is well-formed XML, and that it accurately reflects the data stored in our hash.

With our tasks defined, let's get straight to the code samples.

Samples of the Perl-specific XML Interfaces

XML::Simple

Originally created to simplify the task of reading and writing config files
in an XML format, [...]

Reading

    use XML::Simple;

    my $file = 'files/camelids.xml';
    my $xs1  = XML::Simple->new();
    my $doc  = $xs1->XMLin($file);

    # By default XMLin folds the species elements into a hash keyed on
    # their name attribute, so each key is the species name.
    foreach my $key (keys (%{$doc->{species}})){
        print $doc->{species}->{$key}->{'common-name'} . ' (' . $key . ') ';
        print $doc->{species}->{$key}->{conservation}->{status} . "\n";
    }

Writing

    use XML::Simple;
    require "files/camelid_links.pl";

    my %camelid_links = get_camelid_data();

    my $xsimple = XML::Simple->new();
    print $xsimple->XMLout(\%camelid_links,
                           noattr  => 1,
                           xmldecl => '<?xml version="1.0"?>');

Note that the requirements of the data-to-document task reveal one of XML::Simple's limitations: with noattr => 1, XMLout writes every hash key as a nested element, so it cannot produce the <a href="..."> markup of our target document.

The following illustrates how to use XML::Writer:

    use XML::Writer;
    require "files/camelid_links.pl";

    my %camelid_links = get_camelid_data();

    my $writer = XML::Writer->new();   # writes to standard output by default

    $writer->xmlDecl();
    $writer->startTag('html');
    $writer->startTag('body');

    foreach my $item ( keys (%camelid_links) ) {
        $writer->startTag('a', 'href' => $camelid_links{$item}->{url});
        $writer->characters($camelid_links{$item}->{description});
        $writer->endTag('a');
    }

    $writer->endTag('body');
    $writer->endTag('html');
    $writer->end();

XML::SimpleObject
Reading

    use XML::Parser;
    use XML::SimpleObject;

    my $file = 'files/camelids.xml';

    my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
    my $xso    = XML::SimpleObject->new( $parser->parsefile($file) );

    foreach my $species ($xso->child('camelids')->children('species')) {
        print $species->child('common-name')->{VALUE};
        print ' (' . $species->attribute('name') . ') ';
        print $species->child('conservation')->attribute('status');
        print "\n";
    }

Writing
XML::TreeBuilder

The [...]

Reading

    use XML::TreeBuilder;

    my $file = 'files/camelids.xml';
    my $tree = XML::TreeBuilder->new();
    $tree->parse_file($file);

    foreach my $species ($tree->find_by_tag_name('species')){
        print $species->find_by_tag_name('common-name')->as_text;
        print ' (' . $species->attr_get_i('name') . ') ';
        print $species->find_by_tag_name('conservation')->attr_get_i('status');
        print "\n";
    }

Writing

    use XML::Element;
    require "files/camelid_links.pl";

    my %camelid_links = get_camelid_data();

    my $root   = XML::Element->new('html');
    my $body   = XML::Element->new('body');
    my $xml_pi = XML::Element->new('~pi', text => 'xml version="1.0"');
    $root->push_content($body);

    foreach my $item ( keys (%camelid_links) ) {
        my $link = XML::Element->new('a', 'href' => $camelid_links{$item}->{url});
        $link->push_content($camelid_links{$item}->{description});
        $body->push_content($link);
    }

    print $xml_pi->as_XML;
    print $root->as_XML();

XML::Twig
Reading

    use XML::Twig;

    my $file = 'files/camelids.xml';
    my $twig = XML::Twig->new();
    $twig->parsefile($file);

    my $root = $twig->root;

    foreach my $species ($root->children('species')){
        print $species->first_child_text('common-name');
        print ' (' . $species->att('name') . ') ';
        print $species->first_child('conservation')->att('status');
        print "\n";
    }

Writing

    use XML::Twig;
    require "files/camelid_links.pl";

    my %camelid_links = get_camelid_data();

    my $root = XML::Twig::Elt->new('html');
    my $body = XML::Twig::Elt->new('body');
    $body->paste($root);

    foreach my $item ( keys (%camelid_links) ) {
        my $link = XML::Twig::Elt->new('a');
        $link->set_att('href', $camelid_links{$item}->{url});
        $link->set_text($camelid_links{$item}->{description});
        $link->paste('last_child', $body);
    }

    print qq|<?xml version="1.0"?>|;
    $root->print;

These examples have illustrated the basic usage for the more
generic Perl XML modules. My goal has been to give just enough example
code to give you a feel for what it is like to work with each of these
modules. Next month we will look at those Perl modules that implement
one of the standard XML interfaces; specifically,
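
A note on the writing samples: each of them loads its data with require "files/camelid_links.pl" and then calls get_camelid_data(), but that helper file is not reproduced in this article. The following is only a sketch of what such a file might contain, assuming it does nothing more than return the hash from Task Two. The file contents are a guess, not the sample code that ships with the column, and only two of the six entries are repeated here.

```perl
# files/camelid_links.pl -- hypothetical contents; the real file is not shown
use strict;
use warnings;

# Return the link data as a flat key/value list, so that a caller can write
#   my %camelid_links = get_camelid_data();
sub get_camelid_data {
    my %camelid_links = (
        three => {
            url         => 'http://www.eskimo.com/~wallama/funny.htm',
            description => 'Charlie - biography of a narcissistic llama.',
        },
        five => {
            url         => 'http://www.galaonline.org/pics.htm',
            description => 'Many cool alpacas.',
        },
        # the remaining entries from the Task Two hash would follow here
    );
    return %camelid_links;
}

1;    # a file loaded with require must finish with a true value
```

Two practical points if you try this. Perl hashes are unordered, so the links will come out in whatever order keys() returns them; wrap the call in sort if a stable order matters. Also, from Perl 5.26 onward the current directory is no longer in @INC, so the samples may need require "./files/camelid_links.pl" instead.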
Resources
Copyright © 2001 O'Reilly & Associates, Inc. |