|
The Perl XML tools are very cool for screen scraping. Here's a one-liner that gets you the top-ten viruses from our virus-eye service:
lwp-request http://www.messagelabs.com/VirusEye/ | perl -pe 's/--->/-->/' | xmllint --html --format - | xpath '/html/body/table[2]/tr[1]/td[1]/table[1]/tr[1]/td[2]/table[1]/tr[3]/td[2]/table[3]/tr[1]/td[1]/table[1]/tr[2]/td[1]/table[1]/tr[1]/td[2]/table[1]/tr/td[3]/a/text()'
(Note that you normally wouldn't need the perl -pe step: it is only there to rewrite the page's malformed '--->' comment closers as '-->' so that the broken HTML parses cleanly. I think this may be a bug in libxml2.)
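To see what that fix-up step does on its own, here is a minimal sketch. It uses sed rather than perl -pe for the same substitution, and adds a g flag so every occurrence on a line is rewritten; the function name and the sample HTML are made up for illustration and are not taken from the real page.

```shell
# Hypothetical helper: rewrite malformed comment closers ("--->") as "-->".
# Swaps in sed for the perl -pe step; the g flag catches repeats on one line.
fix_comment_closers() {
  sed 's/--->/-->/g'
}

# Made-up sample input, not copied from the VirusEye page.
printf '%s\n' '<td><!-- ad banner ---><a href="#">Example</a></td>' | fix_comment_closers
# prints: <td><!-- ad banner --><a href="#">Example</a></td>
```

In the pipeline above, such a filter would sit where the perl -pe command is, between the fetch and xmllint.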
|