Query Census Data with RDF
by Joshua Tauberer
Question-Answering
Question-answering, like asking Google for the population of Philadelphia, is where I see the Semantic Web making its most important contribution to the world. Remember that Google's problem is that it can't understand the information on web pages. If we want to build a system that can understand knowledge spread throughout the Internet, we all need to use a common framework for representing knowledge, such as RDF.
So let's go ahead and write a little question-answering system over the census data we've been using. It should recognize questions like this:
what is the ____ of _____ ?
For example: what is the population of California?
It's actually quite easy to get something crude working. A regular expression can extract the two blanks from the question:
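import re
import sys

# The question arrives as command-line words, e.g.
#   python qa.py what is the population of California?
question = " ".join(sys.argv[1:])

# Non-greedy groups keep the trailing "?" out of the entity name.
m = re.search(r'what is the (.*?) of (.*?)\s*\??$', question)
if m:
    predicatename = m.group(1)
    entityname = m.group(2)
    # do more processing
else:
    print("I don't understand the question.")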
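One subtlety: with greedy groups, as in (.*) of (.*)\??, the second group also captures the trailing question mark, so the entity name would come out as "California?" and never match a dc:title. Here is a minimal standalone sketch of the extraction step with non-greedy groups and an anchored end; the function name parse_question is hypothetical, and the case-insensitive flag is my own choice.

```python
import re

# Non-greedy groups plus an optional "?" anchored at the end of the string
# keep the question mark out of the entity name.
QUESTION_PATTERN = re.compile(r"what is the (.*?) of (.*?)\s*\??$", re.IGNORECASE)


def parse_question(question):
    """Return (predicatename, entityname), or None if the question doesn't fit."""
    m = QUESTION_PATTERN.search(question.strip())
    if m is None:
        return None
    return m.group(1), m.group(2)


print(parse_question("what is the population of California?"))
print(parse_question("What is the land area of New York"))
print(parse_question("how big is Texas?"))
```

Because the first group is non-greedy, an entity name that itself contains "of" (say, "Isle of Wight") stays intact; the trade-off is that a predicate name containing " of " would be split at the wrong place.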
Then we have to find the RDF entities that match the predicate and entity names given in the question. For the entities, we can use the dc:title predicate:
# store is the rdflib Graph holding the census data; dc is the Dublin Core namespace.
entity = store.value(None, dc["title"], Literal(entityname))
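The call works by fixing two positions of a triple (predicate and object) and asking for the third (subject). To show just that idea without rdflib, here is a toy sketch over a plain list of (subject, predicate, object) tuples. The census namespace and subject URIs are hypothetical placeholders; only the Dublin Core title URI is real, and the value function is a simplified stand-in, not rdflib's implementation.

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
CENSUS = "http://example.org/census#"  # hypothetical namespace

TRIPLES = [
    (CENSUS + "states/california", DC_TITLE, "California"),
    (CENSUS + "states/california", CENSUS + "population", "33871648"),
    (CENSUS + "states/mississippi", DC_TITLE, "Mississippi"),
]


def value(triples, subject=None, predicate=None, obj=None):
    """Return the unspecified term of the first matching triple, or None.

    Exactly one of subject, predicate, obj is expected to be None.
    """
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        if subject is None:
            return s
        if obj is None:
            return o
        return p
    return None


# The subject whose dc:title is "California".
entity = value(TRIPLES, None, DC_TITLE, "California")
print(entity)
```

The same function answers the final lookup too: value(TRIPLES, entity, CENSUS + "population", None) returns the population literal.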
To find the predicate, we have no RDF statements that relate a predicate to a human-readable name for it. That is, we lack a statement like this:
census:population rdfs:label "population" .
That's the kind of statement you would find in an RDF schema. If we had that available, we would use the same technique that we used with dc:title, except with rdfs:label. Since we don't have that, we can fall back to looking at the URIs of the predicates as a hint:
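If such a schema were available, the lookup would mirror the dc:title one. A minimal sketch, assuming hypothetical rdfs:label statements stored in the same toy list-of-tuples form rather than an rdflib Graph; the census namespace, the labels, and the function name predicate_by_label are all illustrative, and the case-insensitive comparison is my own choice.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CENSUS = "http://example.org/census#"  # hypothetical namespace

# Hypothetical schema statements of the form
#   census:population rdfs:label "population" .
SCHEMA = [
    (CENSUS + "population", RDFS_LABEL, "population"),
    (CENSUS + "landArea", RDFS_LABEL, "land area"),
]


def predicate_by_label(schema, name):
    """Return the predicate whose rdfs:label equals name (ignoring case), or None."""
    wanted = name.strip().lower()
    for s, p, o in schema:
        if p == RDFS_LABEL and o.lower() == wanted:
            return s
    return None


print(predicate_by_label(SCHEMA, "Land Area"))
```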
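# Pick a predicate whose URI ends with the name from the question,
# ignoring case and spaces (e.g. "land area" matches census:landArea).
predicate = None
for p in store.predicates():
    if p.lower().endswith(predicatename.lower().replace(' ', '')):
        predicate = p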
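The loop keeps the last matching URI, so a short name that matches several predicates is resolved silently. A sketch of the same suffix heuristic that returns every candidate instead, so a program could ask the user to disambiguate. The predicate list is a hypothetical stand-in for store.predicates(); in particular census:waterArea is an invented example of a competing match.

```python
CENSUS = "http://example.org/census#"  # hypothetical namespace

# Hypothetical predicate URIs, standing in for store.predicates().
PREDICATES = [
    CENSUS + "population",
    CENSUS + "landArea",
    CENSUS + "waterArea",
    CENSUS + "uspsStateCode",
    "http://purl.org/dc/elements/1.1/title",
]


def matching_predicates(predicates, name):
    """Return every predicate URI ending with name, ignoring case and spaces."""
    suffix = name.lower().replace(" ", "")
    return [p for p in predicates if p.lower().endswith(suffix)]


print(matching_predicates(PREDICATES, "USPS state code"))  # one match
print(matching_predicates(PREDICATES, "area"))  # ambiguous: two matches
```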
Once we have the predicate and entity, there's just one more step to finding the corresponding value:
value = store.value(entity, predicate, None)
print(entityname + "'s " + predicatename + " is " + value)
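To see the three steps working together without rdflib or the census files, here is a self-contained sketch that runs the same logic over an in-memory list of triples. The literal values are the ones printed in the sample runs in this article; the subject URIs, the census namespace, and the function name answer are hypothetical, and the "I don't know." fallback is my own addition.

```python
import re

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
CENSUS = "http://example.org/census#"  # hypothetical namespace

# Toy data: values from the sample runs; subject URIs are made up.
TRIPLES = [
    (CENSUS + "CA", DC_TITLE, "California"),
    (CENSUS + "CA", CENSUS + "population", "33871648"),
    (CENSUS + "MS", DC_TITLE, "Mississippi"),
    (CENSUS + "MS", CENSUS + "uspsStateCode", "MS"),
    (CENSUS + "NY", DC_TITLE, "New York"),
    (CENSUS + "NY", CENSUS + "landArea", "122283145776"),
]

QUESTION = re.compile(r"what is the (.*?) of (.*?)\s*\??$", re.IGNORECASE)


def answer(question, triples):
    m = QUESTION.search(question.strip())
    if m is None:
        return "I don't understand the question."
    predicatename, entityname = m.group(1), m.group(2)

    # Entity: the subject whose dc:title is the name from the question.
    entity = next((s for s, p, o in triples
                   if p == DC_TITLE and o == entityname), None)
    # Predicate: the first URI ending in the predicate name, minus spaces.
    suffix = predicatename.lower().replace(" ", "")
    predicate = next((p for _, p, _ in triples
                      if p.lower().endswith(suffix)), None)
    if entity is None or predicate is None:
        return "I don't know."

    # Value: the object of the (entity, predicate, ?) statement.
    value = next((o for s, p, o in triples
                  if s == entity and p == predicate), None)
    if value is None:
        return "I don't know."
    return "%s's %s is %s" % (entityname, predicatename, value)


print(answer("what is the population of California?", TRIPLES))
print(answer("what is the USPS state code of Mississippi?", TRIPLES))
```

Nothing in answer mentions population or states: swapping in a different list of triples changes what it can answer, not how it works.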
The complete Python source for this program is posted.
Running the program yields:
# python qa.py what is the population of California?
California's population is 33871648
If this were the only question we wanted to ask, the program wouldn't have been worth writing. Of course we can ask it about any state, county, or town that the census reported statistics for (provided we know the exact name the census used for it). But we can also use other predicates.
# python qa.py what is the USPS state code of Mississippi?
Mississippi's USPS state code is MS
# python qa.py what is the land area of New York?
New York's land area is 122283145776 m^2
Surprised? Haven't you ever forgotten a state's postal abbreviation? I hadn't mentioned it, but the RDFized census files I posted include predicates named census:landArea and census:uspsStateCode alongside the population predicate. Maybe we got a little lucky that I chose good URIs for the predicates.
But it did work, after all.
That's the thing about RDF. We were able to write a totally generic question-answering program. It can only answer one form of question, but it isn't tied to any particular subject. Without any changes, it could answer questions about practically anything, provided the answers are available in RDF.
Comment on this Article
- Query Census Data with RDF
2006-12-28 04:41:03 CharlesKinniburgh
Nice article Joshua. Pitched at a very practical level, it was just what I was looking for.
I would love to get up to speed on this stuff sometime.
I wonder if there is a list of tools in your toolbox - and simple instructions or links on how to get them up and working?
