
Stuck in the Senate
Last month we created an RDF representation of the United States Senate, and this month I was going to do the same for the House of Representatives. But after looking closely at my Senate RDF, and thinking about the sort of queries I wanted to make of it, I realized that it's a mess. So in this column, we're going to (hopefully) fix it.
Let's take a look at a sample Senator again:
<USSenator rdf:about="http://kerry.senate.gov/">
<FullName>Kerry, John F.</FullName>
<URI>http://kerry.senate.gov/</URI>
<Party>Democrat</Party>
<State>MA</State>
<Address>304 RUSSELL SENATE OFFICE BUILDING
WASHINGTON DC 20510</Address>
<Phone>(202) 224-2742</Phone>
<SenateClass>II</SenateClass>
<ContactURI>
http://kerry.senate.gov/bandwidth/contact/email.html
</ContactURI>
</USSenator>
Figure 1. A sample senator, in RDF.
Here's the problem: the RDF above describes John Kerry, a human being. But "USSenator" is not a "Human Being" -- make of that what you will. If someone were born a senator, and remained one for life (Strom Thurmond came close), then "USSenator" might be a fine subject in our RDF triple. But people are not their roles. If the human being John Kerry is elected president in a few weeks, he'll go from Senator to President, and my current ad-hoc RDF schema will burst into flames. People enact many roles over their lifetimes, not just one; or looked at the other way around, many roles are fulfilled by more than one person. So we need to split up roles and humans.
OK, that's not too hard; we can describe humans like this:
<Human rdf:ID="JohnKerry">
<HasRole rdf:resource="#USSenator"/>
<!-- Description goes here -->
</Human>
Figure 2. The RDF for a human being.
And roles like this:
<Role rdf:ID="USSenator">
<!-- Description goes here -->
</Role>
Figure 3. The RDF for a role, in this case the role of "USSenator."
And we're home free, right? Now, in our hypothetical government-browsing application, we can generate a list of Roles, and sort people by their roles, and so forth, yes? Not really.
People vs. Roles
Let's say Kerry is elected in November, and again in 2008. Despite the fact that, when I say "John Kerry" everyone knows who I'm talking about, JohnKerry is not a unique enough identifier if we're creating data that, hopefully, will be used far in the future. Sure, we could call him JohnKerry01 and the next John Kerry could be JohnKerry02, and so forth; but what if we dig into the history of the House of Representatives and find another "John Kerry" from 1850? Do we start using negative numbers? Our numbering scheme will go seriously out of whack.
There's another problem. That rdf:ID up there? When all the namespaces get resolved, that ID is actually an HTTP URI: http://www.hackingcongress.org/ns/Politics#JohnKerry. And that opens up a can of web architecture worms, because HTTP URIs look exactly like URLs. When we see them, we expect them to point to something, and we expect to be able to dereference them. In RDF, HTTP URIs don't necessarily point to anything. They may just serve as unique identifiers, sort of like logical constants. Whether HTTP URIs should point to something or not, and variations on that theme, is a constant source of debate. It all gets to be a little much, sometimes.
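To make that resolution step concrete, here is a minimal sketch of how an rdf:ID turns into a full HTTP URI. The xml:base value and the default namespace declaration are assumptions chosen to reproduce the URI quoted above; the declarations actually used in the Senate file are not shown here.

```xml
<?xml version="1.0"?>
<!-- Assumption: xml:base and the default namespace are illustrative values
     picked so that the IDs resolve to the URI quoted in the text. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.hackingcongress.org/ns/Politics#"
    xml:base="http://www.hackingcongress.org/ns/Politics">

  <!-- rdf:ID="JohnKerry" means: base URI, then "#", then "JohnKerry",
       giving http://www.hackingcongress.org/ns/Politics#JohnKerry -->
  <Human rdf:ID="JohnKerry">
    <HasRole rdf:resource="#USSenator"/>
  </Human>

  <!-- The same subject again, written out in full with rdf:about -->
  <Human rdf:about="http://www.hackingcongress.org/ns/Politics#JohnKerry"/>
</rdf:RDF>
```

Both Human elements name the same resource, which is why the short, friendly-looking ID ends up looking like something you could type into a browser.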
URNs Aren't Just for Funerals
Enter the URN. URN stands for Uniform Resource Name. URNs are legitimate URIs, but they don't point to anything. Not only do URNs not point to anything, but they obviously don't point to anything; no one will waste time putting a URN into Firefox expecting something useful to happen. A URN looks like this: urn:myscheme:some-unique-id. If we wanted to use a religious metaphor, we could say that HTTP URIs are like Christianity -- they show you the way to another place. URNs, on the other hand, are Zen. They don't need to point anywhere. They simply bask in the light of their own uniqueness.
Of course, URNs can point to things. For instance, the LSID URN scheme describes resources specific to the life sciences, and the LSID Resolution Project is working on ways to make applications aware of LSID URNs.
URNs have one major limitation for our purposes, however: each scheme is supposed to be registered with the IETF in order to be considered a standard. Which would be a major pain, except that someone has come up with a solution: the Tag URI.
Tag URIs combine the best of both worlds: they look and act like URNs, offering a unique name for a resource that no one will try to dereference. But unlike URNs, and like HTTP URIs, they don't require you to send off to the IETF gurus before you can coin them legally. You can coin new Tag URIs as easily as you can coin HTTP URIs.
XML.com's editor Kendall Clark turned me on to the Tag URI. Tag URI is a very simple algorithm for creating unique identifiers. "It is simple enough," say its creators Tim Kindberg and Sandro Hawke, "to do in your head." Here's a sample Tag URI for John Kerry: tag:hackingcongress.info,2004-10-05:Kerry,John+F. Like all Tag URIs, it has six parts:
| Order | Part of URI | What is it? |
| --- | --- | --- |
| 1 | tag: | The URI scheme |
| 2 | hackingcongress.info | The tagging entity |
| 3 | , | A comma |
| 4 | 2004-10-05 | A date in ISO format |
| 5 | : | A colon |
| 6 | Kerry,John+F | A specific identifier |
Now if John Kerry has a great-grandson named John F. Kerry who is elected president in 2104, we can create a new Tag URI for him like this: tag:hackingcongress.info,2104-10-05:Kerry,John+F, and we're home free. The sixth part of the Tag URI, the specific identifier, only has to be unique for the tagging entity and date that precede it. This allows us to avoid all manner of brain-bending numbering schemes.
Taking it a bit further, here are Tag URIs for the other two candidates:
George W. Bush
tag:hackingcongress.info,2004-10-05:Bush,George+W
Ralph Nader
tag:hackingcongress.info,2004-10-05:Nader,Ralph
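As a rough sketch of where this is heading, here is how a Tag URI could replace the rdf:ID in the Human description from Figure 2. The Tag URI for the USSenator role is hypothetical, coined here purely for illustration, and the default namespace is assumed from the HTTP URI quoted earlier.

```xml
<?xml version="1.0"?>
<!-- Assumption: the default namespace matches the Politics URI quoted earlier. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.hackingcongress.org/ns/Politics#">

  <!-- The person is named by a Tag URI via rdf:about, not by rdf:ID -->
  <Human rdf:about="tag:hackingcongress.info,2004-10-05:Kerry,John+F">
    <FullName>Kerry, John F.</FullName>
    <HasRole rdf:resource="tag:hackingcongress.info,2004-10-05:USSenator"/>
  </Human>

  <!-- Hypothetical Tag URI for the role, used only for this example -->
  <Role rdf:about="tag:hackingcongress.info,2004-10-05:USSenator"/>
</rdf:RDF>
```

Because rdf:about takes the identifier as written, nothing gets resolved against a base URI, and no one is tempted to paste the result into a browser.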