XML.com: XML From the Inside Out


Article:
 Perl and XML on the Command Line
Subject: Screen Scraping!
Date: 2002-04-23 06:04:52
From: Matt Sergeant

The Perl XML tools are very cool for screen scraping. Here's a one-liner that gets you the top ten viruses from our virus-eye service:


lwp-request http://www.messagelabs.com/VirusEye/ | perl -pe 's/--->/-->/' | xmllint --html --format - | xpath '/html/body/table[2]/tr[1]/td[1]/table[1]/tr[1]/td[2]/table[1]/tr[3]/td[2]/table[3]/tr[1]/td[1]/table[1]/tr[2]/td[1]/table[1]/tr[1]/td[2]/table[1]/tr/td[3]/a/text()'


(Note that you wouldn't normally need the perl -pe step, which rewrites the malformed "--->" comment terminators as "-->" so that the broken HTML parses cleanly. I think this may be a bug in libxml2.)


Copyright © 2008 O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938