XML.com: XML From the Inside Out

Article:
 Screenscraping the Senate
Subject: it's not a 'pain' if you get enough grunts to do the work
Date: 2004-09-09 15:08:44
From: aeoe@freeshell.org

say there are 100 senators and 500 congress people. say there are 10 web pages you need data from. say the most 'pain in the ass' part is getting the data from the web pages into some automated format, because the web pages are the most scattered part: PDF, DOC, screwy html, etc.


fine. here is what you do.


let's say that temp worker billy can scrape and re-enter, by hand, all the data for 1 congress person in 5 minutes. 600 congress persons = 3000 minutes = 50 hours. at 8 bucks an hour that's 400 dollars. about a week of work if it's one person. about a day of work (10 hours each) if it's 5 people and you set things up well.


now consider some fancy ass programmer you hire to do xslt/xmllib/etc until they are blue in the face. they charge 40 dollars an hour. the same 400 dollars buys only 10 hours of their work, and i guarantee you it would take them more than 10 hours to do the thing.


add on about 10 hours of 'data cleanup': nicknames, misspellings, etc. you can do this 'scrape by hand' once every 2 years and it would cost a few hundred dollars.
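
the arithmetic above is easy to check in a few lines of python. this is only a sketch using the post's own round numbers; the variable names are made up for illustration, and billing the cleanup hours at the temp rate is an assumption the post does not actually state.

```python
# round numbers taken from the post above, not real headcounts
SENATORS = 100
HOUSE_MEMBERS = 500
MINUTES_PER_MEMBER = 5      # hand scrape + re-entry time per person
TEMP_RATE = 8               # dollars per hour
PROGRAMMER_RATE = 40        # dollars per hour
CLEANUP_HOURS = 10          # nicknames, misspellings, etc.

members = SENATORS + HOUSE_MEMBERS

# hours and cost of scraping everything by hand
scrape_hours = members * MINUTES_PER_MEMBER / 60
scrape_cost = scrape_hours * TEMP_RATE

# how many programmer hours the same money would buy
breakeven_hours = scrape_cost / PROGRAMMER_RATE

# ASSUMPTION: cleanup is billed at the temp rate too
total_cost = (scrape_hours + CLEANUP_HOURS) * TEMP_RATE

print(f"scrape by hand: {scrape_hours:.0f} hours, {scrape_cost:.0f} dollars")
print(f"same money buys {breakeven_hours:.0f} programmer hours")
print(f"with cleanup: {total_cost:.0f} dollars")
```

change MINUTES_PER_MEMBER or the rates to see how quickly the break-even point moves.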


i bet you could even auction the job off on ebay. you would get dozens of applicants.


hell i will do it my damn self if you would pay me 8 dollars an hour.


Copyright © 2008 O'Reilly Media, Inc.