
Article:
 GovTrack.us, Public Data, and the Semantic Web
Subject: Swedish law xmlized by student, too
Date: 2006-03-01 20:28:55
From: somegeek
Response to: Swedish law xmlized by student, too

Hi,
I was wondering whether this kind of data (metadata) could be used to build a search engine that takes queries about a particular Senator or bill and returns that information to the user.
I was thinking of customizing the popular search engine Nutch to handle this kind of search.


Do you think this is possible?


thanks
ilango




  • Making a search engine
    2006-03-12 05:00:00 JoshuaTauberer

    Yes, and some people have already tried to make search engines over semantic web data. Search on TAP (http://sp02.stanford.edu/) and Swoogle (http://swoogle.umbc.edu/) come to mind.


    The limiting factor here is that there's simply very little RDF data out there to use. Google is so great because it has 20 billion web pages indexed, each containing hundreds or thousands of "bits" of information (in this case, words). That's at least in the trillions of words. If I had to guess, there are probably fewer than 200 million RDF triples out there on the web (of real-life data, and not counting info on proteins from UniProt), the majority of which come from just a handful of sources.


    But also consider that the semantic web makes new types of question-answering applications possible. Search engines can't answer 'questions' with any accuracy unless the question is "who is using this word on their page?". The SW provides the structure for accurately answering a new, larger class of questions. We're going to have to think up new and better user interfaces for getting answers out of the semantic web.
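To make the contrast concrete, here is a minimal sketch in Python of answering a structured question over triples versus doing a keyword lookup. Everything in it is an assumption made up for illustration: the `ex:` identifiers, the predicate names, the sample data, and the helper functions are hypothetical and are not GovTrack's actual vocabulary or any real library's API. A real system would more likely use an RDF store and a query language such as SPARQL.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample data; none of these identifiers are real.
TRIPLES: List[Triple] = [
    ("ex:SenatorA", "rdf:type", "ex:Senator"),
    ("ex:SenatorB", "rdf:type", "ex:Senator"),
    ("ex:Bill1", "ex:sponsor", "ex:SenatorA"),
    ("ex:Bill2", "ex:sponsor", "ex:SenatorA"),
    ("ex:Bill3", "ex:sponsor", "ex:SenatorB"),
    ("ex:Bill3", "ex:title", "A response to SenatorA's earlier bill"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def bills_sponsored_by(triples: Iterable[Triple], senator: str) -> List[str]:
    """Structured question: which subjects have this senator as sponsor?"""
    return sorted(subj for subj, _, _ in match(triples, p="ex:sponsor", o=senator))


def keyword_search(triples: Iterable[Triple], word: str) -> List[str]:
    """Keyword-style lookup: subjects of any triple that mentions the word anywhere."""
    needle = word.lower()
    return sorted({s for s, p, o in triples if needle in f"{s} {p} {o}".lower()})


if __name__ == "__main__":
    print(bills_sponsored_by(TRIPLES, "ex:SenatorA"))
    print(keyword_search(TRIPLES, "SenatorA"))
```

The structured query returns only the bills whose sponsor is the given senator, while the keyword lookup also picks up the senator's own record and a bill that merely mentions the name in its title. That gap is the "new class of questions" point in miniature.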


    People are working on this in various places in various ways. The idea of question-answering (in general) is a big topic in computational linguistics.

