XML.com: XML From the Inside Out


Article:
 Screenscraping the Senate
Subject: Texas
Date: 2004-09-09 10:03:13
From: banksean

A while back I started mapping out the Texas state legislature in a similar way. The data sources were scattered all over; strangely enough, it was hard to even find a mapping of rep -> party.


http://www.cricketschirping.com/weblog/index.php?p=186


This applet shows a map of senators, committees, and registered contributors. Mousing over a link between a contributor and a senator shows you how much was contributed.


I don't remember off the top of my head how many sources had to be scraped, but it was about five or six different web sites. The database of contributors was an MS Word file, for crying out loud. I had to save it as .rtf or .txt just to parse it, IIRC.


A big issue for me is maintenance. It's not so hard to collect all this info ONE TIME. Unfortunately, these sites tend to change formats and update their data with no coordination between them. Keeping such a distillation complete and up to date is going to be a real pain.
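
One way to take some of the sting out of that is to make each scraper fail loudly when its source changes shape, instead of quietly producing garbage. Here is a rough sketch in Python of what I mean. The line format, the ROW_PATTERN regex, the FormatChanged exception, and the 20% threshold are all made up for illustration; they are not the real Texas data or the code behind the applet.

```python
import re

# Hypothetical line format for one scraped source, e.g.
# "Jane Doe (D) - District 12". Each real source would need its own pattern.
ROW_PATTERN = re.compile(
    r"^(?P<name>[^()]+?)\s+\((?P<party>[A-Z])\)\s+-\s+District\s+(?P<district>\d+)$"
)


class FormatChanged(Exception):
    """Raised when too many lines stop matching the expected layout."""


def parse_rows(lines, max_failure_ratio=0.2):
    """Parse lines into records; raise if the source looks like it changed format."""
    records = []
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = ROW_PATTERN.match(line)
        if match:
            records.append({
                "name": match.group("name"),
                "party": match.group("party"),
                "district": int(match.group("district")),
            })
        else:
            failures.append(line)

    total = len(records) + len(failures)
    # No usable lines at all, or too many misses: assume the layout changed.
    if total == 0 or len(failures) / total > max_failure_ratio:
        raise FormatChanged(
            "%d of %d lines did not match the expected layout" % (len(failures), total)
        )
    return records, failures
```

Run something like this on every refresh and a redesign on one of the five or six sites shows up as an exception for that source rather than as a silently wrong map. It doesn't fix the lack of coordination between the sites, it just tells you which scraper needs attention.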




