GovTrack.us, Public Data, and the Semantic Web
No matter where you fall in debates over free software or DRM, there's one type of information that is unarguably meant to be free, and that's information about our government. The more knowledge citizens have about government the better. So how can we use XML and the Semantic Web to make it easier to get that knowledge, and to foster civic participation?
This is a question I've spent a lot of time on over the past few years while putting together www.GovTrack.us, a site that gathers existing information on the web about the U.S. Congress and presents it in new ways, using RSS feeds and Google Maps, for instance. The site is possible because the government has been posting the relevant information online for a while, but in scattered locations. For instance, legislation is posted in one place and the votes on that same legislation in another. Gathering the information in one place and in a common format makes it possible to combine it in new ways.
Each day GovTrack screen-scrapes these sites to gather the new information. The information gets normalized and goes into XML files so that when GovTrack wants to display the status of a bill to a user, it can just run an XSLT stylesheet on the XML bill file.
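To make the display step concrete, here is a minimal Python sketch of the same idea: read one bill file and render a status line. It is a stand-in that uses the standard library's ElementTree rather than XSLT, and it is not GovTrack's actual code. The element and attribute names follow the bill excerpt later in this article; the bill_status function and the wording of its output are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A cut-down bill file; element names follow the excerpt in this article.
BILL_XML = """
<bill session="107" type="sj" number="23">
  <titles>
    <title type="popular">Military Force Authorization resolution</title>
  </titles>
  <actions>
    <vote date="1000440000" how="roll" roll="281" where="s" />
    <enacted law="107-40" date="1000785600"/>
  </actions>
</bill>
"""

def bill_status(xml_text):
    """Return a one-line status for a bill (hypothetical helper)."""
    bill = ET.fromstring(xml_text)
    title = bill.findtext("titles/title[@type='popular']")
    enacted = bill.find("actions/enacted")
    if enacted is not None:
        state = "enacted as Public Law " + enacted.get("law")
    else:
        # Fall back to counting recorded votes when the bill is not yet law.
        state = "%d vote(s) recorded" % len(bill.findall("actions/vote"))
    return "%s: %s" % (title, state)

print(bill_status(BILL_XML))
```

A real stylesheet would produce HTML rather than a plain string, but the shape is the same: the page is a pure function of the bill file, so nothing about the bill needs to be stored anywhere else.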
There have been around 40,500 bills introduced in Congress since 1999 (the vast majority aren't ever seriously considered, which says a lot about the process). Here's part of the file for the bill passed Sept. 14, 2001, authorizing the President to use military force against terrorists:
<bill session="107" type="sj" number="23">
  <titles>
    <title type="popular">Military Force Authorization resolution</title>
    <title type="official">A joint resolution to authorize the use of United States Armed Forces against those responsible for the recent attacks launched against the United States.</title>
  </titles>
  <sponsor id="300031" />
  <actions>
    <vote date="1000440000" how="roll" roll="281" where="s" />
    <vote date="1000520280" how="without objection" where="h" />
    <enacted law="107-40" date="1000785600"/>
  </actions>
  <subjects>
    <term name="Defense policy"/>
    <term name="Air piracy"/>
    <term name="Armed forces"/>
    ...
  </subjects>
  <summary>
    Authorizes the President to use all necessary and appropriate force...
  </summary>
</bill>
All of that information comes from the official source, but the source doesn't provide it in a structured form. GovTrack is responsible for parsing dates, turning names into IDs, picking out the list of actions, and so on. GovTrack also fetches voting records and other documents and puts them into XML. (By the way, if you want to play with the data, all of the XML files that power GovTrack are freely available for reuse.)
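The normalization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GovTrack's scraper: the MEMBER_IDS table, the name and ID in it, the input record, and the helper functions are all hypothetical. It also assumes the date attributes are Unix timestamps taken at U.S. Eastern time, which is consistent with the values in the excerpt; a fixed UTC-4 offset stands in for real time zone handling.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

# Assumption: Eastern daylight time as a fixed UTC-4 offset.
# Real code would need a proper time zone database.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))

# Hypothetical name-to-ID table; the name and the ID are made up.
MEMBER_IDS = {"Sen. Jane Example": "999001"}

# Map the chamber names found on a page to the short codes used in the XML.
CHAMBERS = {"Senate": "s", "House": "h"}

def to_timestamp(date_text):
    """Parse a date such as 'September 14, 2001' into epoch seconds."""
    local = datetime.strptime(date_text, "%B %d, %Y")
    return int(local.replace(tzinfo=EASTERN_DAYLIGHT).timestamp())

def normalize_vote(record):
    """Turn one scraped record (a dict of strings) into a <vote> element."""
    vote = ET.Element("vote")
    vote.set("date", str(to_timestamp(record["date"])))
    vote.set("how", record["how"])
    vote.set("where", CHAMBERS[record["chamber"]])
    return vote

def normalize_sponsor(name):
    """Replace a sponsor's printed name with a stable ID."""
    return ET.Element("sponsor", id=MEMBER_IDS[name])

scraped = {"date": "September 14, 2001", "how": "roll", "chamber": "Senate"}
print(ET.tostring(normalize_vote(scraped), encoding="unicode"))
print(ET.tostring(normalize_sponsor("Sen. Jane Example"), encoding="unicode"))
```

Storing IDs and numeric dates instead of the printed strings is what makes the files easy to combine later: a vote record and a bill record can refer to the same person without agreeing on how the name is spelled.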