
Getting in Touch with XML Contacts

March 31, 2004

John E. Simpson

Q: How do I record contact information in XML?

I am trying to develop an address book kind of application. The contact information will be maintained in XML format. Is there any standard DTD for contacts?

A: Great question, especially since it's firmly grounded in common sense. An address book would seem to be one of the simplest XML applications to develop from scratch. But if it's so simple, surely someone must have already tackled it. Why reinvent the wheel? And as it happens, you've got several options. Which you select is a matter of preference, compatibility with other standards, and perhaps compatibility with the other parts of your application.

vCard in XML

First, as far back as 1998 (the year XML 1.0 became a W3C Recommendation), Frank Dawson submitted a proposal to the Internet Engineering Task Force (IETF) for a "vCard in XML" standard. As you may know, a vCard is an "electronic business card," suitable for exchanging information between, for example, two e-mail correspondents. (Indeed, many e-mail application programs allow you to set up and attach vCards to your messages.) The vCard standard consists of two documents promoted by the IETF and the Internet Mail Consortium: "A MIME Content-Type for Directory Information" (RFC 2425) and "vCard MIME Directory Profile" (RFC 2426).

MIME is the Multipurpose Internet Mail Extensions standard, also an IETF specification, whose current version (RFCs 2045 through 2049) dates back to 1996. The simplest way to think of MIME in this context is that it allows you to attach other content to an e-mail message, such as a text file, an image, or even a vCard.

There's nothing inherently XML-based about the vCard specification itself. (Most applications, for that matter, don't represent vCards in XML format.) But Dawson, who also contributed to the aforementioned two documents, independently devised a DTD for representing vCard data. You can find a copy of it, together with an abstract and other supporting materials, at Robin Cover's invaluable "XML Cover Pages" site.

All of these documents date back to 1998, which is ancient history in terms of XML. Why might you be interested in such a cobwebbed standard?

The answer is that the vCard in XML standard has been adopted for use in the Jabber Software Foundation's flagship Jabber project, an open-source instant-messaging protocol. Dozens of IM clients now support Jabber's various protocols, including its version of vCard in XML. (Note that this is a de facto standard: although the Jabber Software Foundation hasn't officially blessed it, it's in widespread use among Jabber clients.)

A Jabber vCard travels inside a Jabber XML wrapper element called iq, which also carries the instructions for sending or retrieving the vCard itself. Here's the vCard-only portion of such an exchange, taken from the specification (actual addresses altered for obvious reasons):

<vCard xmlns='vcard-temp'>
  <FN>Joseph User</FN>
  <N>
    <GIVEN>Joseph</GIVEN>
    <FAMILY>User</FAMILY>
    <MIDDLE/>
  </N>
  <NICKNAME>joe</NICKNAME>
  <EMAIL>
    <INTERNET/>
    <PREF/>
    <USERID>joseph@notareal.org</USERID>
  </EMAIL>
  <JABBERID>joe@notareal.org</JABBERID>
</vCard>
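To see what an application has to do with such a document, here is a minimal Python sketch, using only the standard library's xml.etree.ElementTree, that pulls the contact fields out of the sample. The surrounding iq element and its type, id and to values, the function name read_jabber_vcard, and the shape of the dictionary it returns are all assumptions made for illustration, not part of the Jabber specification. The points worth noticing are that every lookup must go through the vcard-temp namespace, and that flag elements such as <PREF/> carry meaning simply by being present.

```python
import xml.etree.ElementTree as ET

# Prefix map for ElementTree lookups: "v" stands for the vcard-temp namespace.
NS = {"v": "vcard-temp"}

# Hypothetical iq wrapper (the type, id and to values are invented)
# around the column's sample vCard.
SAMPLE = """<iq type='result' id='v1' to='someone@notareal.org'>
  <vCard xmlns='vcard-temp'>
    <FN>Joseph User</FN>
    <N><GIVEN>Joseph</GIVEN><FAMILY>User</FAMILY><MIDDLE/></N>
    <NICKNAME>joe</NICKNAME>
    <EMAIL><INTERNET/><PREF/><USERID>joseph@notareal.org</USERID></EMAIL>
    <JABBERID>joe@notareal.org</JABBERID>
  </vCard>
</iq>"""


def read_jabber_vcard(xml_text):
    """Return the contact fields of an iq-wrapped Jabber vCard as a dict."""
    iq = ET.fromstring(xml_text)
    card = iq.find("v:vCard", NS)
    if card is None:
        raise ValueError("no vcard-temp vCard inside this iq element")

    def text(path):
        # Missing elements give None; empty ones such as <MIDDLE/> give "",
        # which is normalized to None here.
        return card.findtext(path, namespaces=NS) or None

    emails = []
    for email in card.findall("v:EMAIL", NS):
        emails.append({
            "address": email.findtext("v:USERID", namespaces=NS),
            # Empty flag elements such as <PREF/> mean something by being present.
            "preferred": email.find("v:PREF", NS) is not None,
        })

    return {
        "full_name": text("v:FN"),
        "given": text("v:N/v:GIVEN"),
        "family": text("v:N/v:FAMILY"),
        "middle": text("v:N/v:MIDDLE"),
        "nickname": text("v:NICKNAME"),
        "emails": emails,
        "jabber_id": text("v:JABBERID"),
    }


contact = read_jabber_vcard(SAMPLE)
print(contact["full_name"], contact["emails"][0]["address"])
```

Normalizing empty elements to None is a design choice: it lets later code treat "present but empty" and "absent" the same way, which is usually what a contact manager wants.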

The W3C vCard in XML/RDF Note

In 2001, IPR Systems Pty Ltd submitted a Note to the W3C, formally outlining the use of RDF/XML to represent vCard data. Like other Notes, this one (its full title is "Representing vCard Objects in RDF/XML") has no official status; you might consider such Notes "strawman"-style proposals or extended comments on other proposals. Still, depending on how much detail you want to provide in your contact-management application, and how concerned you are with meshing your approach with the larger world of standards, it may be worth a look.

Like Jabber's vCard in XML approach, the vCard in XML/RDF proposal (which I'll henceforth refer to simply as vCard/RDF) embeds vCard-type information in a larger XML document. The wrapper here, though, isn't an application-specific one (like Jabber's IM protocol). Instead, it's a general-purpose Resource Description Framework (RDF) document. RDF is a full-blown W3C Recommendation; its purpose is to encode metadata about Internet resources. In vCard/RDF's case, the resource in question is the vCard itself.

What might you want to know about a vCard, other than the contact information it includes? At the very least, you might want to know what (or rather, who) a given vCard is about. My vCard might tell you how to get in touch with me by various means: postal and e-mail addresses, phone numbers, and so on. But that contact information doesn't tell you everything you might need to know about me; in short, it doesn't describe me.

vCard/RDF attacks this problem by combining, in a single document, information in the RDF namespace with information in the vCard namespace. Here's an example, taken from the vCard/RDF Note; the elements and attributes from the RDF namespace are the ones carrying the rdf: prefix:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="http://qqqfoo.com/staff/corky">
    <vCard:FN>Corky Crystal</vCard:FN>
    <vCard:N rdf:parseType="Resource">
      <vCard:Family>Crystal</vCard:Family>
      <vCard:Given>Corky</vCard:Given>
    </vCard:N>
    <vCard:EMAIL rdf:parseType="Resource">
      <rdf:value>corky@qqqfoo.com</rdf:value>
      <rdf:type rdf:resource="http://www.w3.org/2001/vcard-rdf/3.0#internet"/>
    </vCard:EMAIL>
    <vCard:ORG rdf:parseType="Resource">
      <vCard:Orgname>qqqfoo.com Pty Ltd</vCard:Orgname>
      <vCard:Orgunit>
        <rdf:Seq>
          <rdf:li>Commercialisation Division</rdf:li>
          <rdf:li>Engineering Office</rdf:li>
          <rdf:li>Java Unit</rdf:li>
        </rdf:Seq>
      </vCard:Orgunit>
    </vCard:ORG>
  </rdf:Description>
</rdf:RDF>

In general, most of the RDF markup describes constraints on how the contact information is structured or what sort of resource a particular datum is. (For instance, the three rdf:li elements are to be used in the order shown when referring to Corky Crystal's work unit; this constraint is imposed by making those elements children of an rdf:Seq element, RDF's ordered container.) Aside from that markup, however, note in particular the rdf:Description element:

  • Everything about this contact is contained within rdf:Description's scope. While the simple rdf:RDF element does perfunctory duty as the document's true root, rdf:Description might be considered its heart and soul.
  • The rdf:about attribute points to a resource outside this document which really tells you about Corky -- not how to get in touch with Corky, but who Corky is. (Of course, an application which cares only about contacting Corky would be free to ignore this information. But it's great to have it available, and mixing the vCard markup with RDF is what makes that availability possible.)
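Treating the Note's sample as plain XML, a short Python sketch can already honor both points above: one contact per rdf:Description, identified by its rdf:about URI, with the organizational units kept in document order. This is an illustration under stated assumptions, not an RDF processor. A real RDF toolkit would parse the document into a graph and would also accept the many other ways RDF/XML can serialize the same statements. The sample string is a trimmed copy of the Note's example, and the function name and dictionary keys are hypothetical. The code looks for rdf:li anywhere under vCard:Orgunit, so it does not depend on the container element's exact name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as declared in the Note's sample.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "vCard": "http://www.w3.org/2001/vcard-rdf/3.0#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]  # rdf:about in ElementTree's {uri}name form

SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="http://qqqfoo.com/staff/corky">
    <vCard:FN>Corky Crystal</vCard:FN>
    <vCard:EMAIL rdf:parseType="Resource">
      <rdf:value>corky@qqqfoo.com</rdf:value>
      <rdf:type rdf:resource="http://www.w3.org/2001/vcard-rdf/3.0#internet"/>
    </vCard:EMAIL>
    <vCard:ORG rdf:parseType="Resource">
      <vCard:Orgname>qqqfoo.com Pty Ltd</vCard:Orgname>
      <vCard:Orgunit>
        <rdf:Seq>
          <rdf:li>Commercialisation Division</rdf:li>
          <rdf:li>Engineering Office</rdf:li>
          <rdf:li>Java Unit</rdf:li>
        </rdf:Seq>
      </vCard:Orgunit>
    </vCard:ORG>
  </rdf:Description>
</rdf:RDF>"""


def read_vcard_rdf(xml_text):
    """Map each rdf:Description's rdf:about URI to its contact fields."""
    root = ET.fromstring(xml_text)
    contacts = {}
    for desc in root.findall("rdf:Description", NS):
        # Document order of the rdf:li items is the order that matters.
        units = [li.text for li in
                 desc.iterfind("vCard:ORG/vCard:Orgunit//rdf:li", NS)]
        contacts[desc.get(RDF_ABOUT)] = {
            "full_name": desc.findtext("vCard:FN", namespaces=NS),
            "email": desc.findtext("vCard:EMAIL/rdf:value", namespaces=NS),
            "org": desc.findtext("vCard:ORG/vCard:Orgname", namespaces=NS),
            "org_units": units,
        }
    return contacts


for uri, contact in read_vcard_rdf(SAMPLE).items():
    print(uri, contact["full_name"], contact["org_units"])
```

Keying the result by the rdf:about URI reflects the Note's model: the URI, not the name, is what identifies whom the card is about.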

It's also interesting to compare this vCard/RDF sample with the Jabber vCard above. Even setting aside the namespace prefixes, the vCard/RDF Note doesn't consistently reuse the element names from the earlier proposal: EMAIL is EMAIL in both, but Jabber's FAMILY and GIVEN become vCard/RDF's Family and Given.
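Because the names don't line up, any translation between the two has to carry an explicit mapping. The Python sketch below (standard library only) re-expresses a bare Jabber vCard as a vCard/RDF rdf:Description for the name and e-mail parts shown in the two samples. The NAME_PARTS table, the function name jabber_to_rdf, and the rdf:about URI passed in are hypothetical. Anything the function doesn't handle (MIDDLE, NICKNAME, JABBERID, and so on) is silently dropped, which is exactly the kind of decision a real converter would have to make deliberately.

```python
import xml.etree.ElementTree as ET

JABBER = "vcard-temp"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Keep the familiar rdf: and vCard: prefixes when the result is serialized.
ET.register_namespace("rdf", RDF)
ET.register_namespace("vCard", VCARD)

# Same concept, different spelling: Jabber name -> vCard/RDF name.
NAME_PARTS = {"FAMILY": "Family", "GIVEN": "Given"}


def q(ns, local):
    """Build a {namespace}local name, as ElementTree expects."""
    return "{%s}%s" % (ns, local)


def jabber_to_rdf(jabber_xml, about_uri):
    """Re-express a bare Jabber <vCard> as a vCard/RDF rdf:Description."""
    card = ET.fromstring(jabber_xml)
    root = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): about_uri})

    fn = card.findtext(q(JABBER, "FN"))
    if fn:
        ET.SubElement(desc, q(VCARD, "FN")).text = fn

    n = card.find(q(JABBER, "N"))
    if n is not None:
        out_n = ET.SubElement(desc, q(VCARD, "N"), {q(RDF, "parseType"): "Resource"})
        for jabber_name, rdf_name in NAME_PARTS.items():
            value = n.findtext(q(JABBER, jabber_name))
            if value:  # drop missing or empty parts
                ET.SubElement(out_n, q(VCARD, rdf_name)).text = value

    # EMAIL keeps its name; Jabber's USERID child becomes rdf:value.
    for email in card.findall(q(JABBER, "EMAIL")):
        address = email.findtext(q(JABBER, "USERID"))
        if not address:
            continue
        out_e = ET.SubElement(desc, q(VCARD, "EMAIL"), {q(RDF, "parseType"): "Resource"})
        ET.SubElement(out_e, q(RDF, "value")).text = address
        if email.find(q(JABBER, "INTERNET")) is not None:
            ET.SubElement(out_e, q(RDF, "type"), {q(RDF, "resource"): VCARD + "internet"})
    return ET.tostring(root, encoding="unicode")


JOSEPH = """<vCard xmlns='vcard-temp'>
  <FN>Joseph User</FN>
  <N><GIVEN>Joseph</GIVEN><FAMILY>User</FAMILY><MIDDLE/></N>
  <EMAIL><INTERNET/><PREF/><USERID>joseph@notareal.org</USERID></EMAIL>
</vCard>"""

# The rdf:about URI is invented: the Jabber vCard has no equivalent field.
print(jabber_to_rdf(JOSEPH, "http://notareal.org/people/joseph"))
```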

A commercial alternative

In researching this column, I came across an existing commercial contact-management package which touts XML-readiness as a feature. The application is called GoldMine, from FrontRange Solutions. (I don't claim, of course, that this is the only such package. If you know of others, feel free to use the "Comment on this Article" link below.)

While GoldMine isn't just a contact manager, managing contacts seems to be at the heart of everything else the product does. The last several versions have offered an XML import/export feature, specifically for transferring contact data between GoldMine and other applications or data sources. Importing into GoldMine requires only that the data conform to the expected structure. (Exported data presumably conforms to that structure without further user involvement.)

The structure in question is codified in an XML Schema document. You will probably search the FrontRange Web site in vain for this Schema -- I certainly did -- but I was able to obtain a copy of it through the generosity of FrontRange's marketing organization. While you asked specifically for a DTD, it's worth taking a look at the GoldMine Schema for insights into how a commercially successful product solves the problem (including, not insignificantly, how to handle multiple contacts in the scope of a large-scale application).

Tying it together

So the instinct implied in your question was right: you're nowhere near the first to consider using XML as a structured-data format for contact information. But you might broaden the question's scope by imagining something more elaborate than a "closed-shop" contact-management system: how might you build a tool that translates contact information from any one of these standards (or any others you can find) to any of the others?

The obvious platform for such a tool is XSLT. I'm about out of space in this month's column to detail every issue you'd want to (ahem) address, should you decide to tackle this bigger project. Still, here are a few points to consider:

  • Do you want a single stylesheet for handling all the combinations of input (source) and output (result) data structures?
    • A single stylesheet might be parameter-driven (specify the input and output data types, e.g. "vCardRDF" and "GoldMine," at runtime).
    • Multiple stylesheets, one for each input/output combination, might be simpler to tackle at first. But they might be harder to maintain and keep consistent over time. (Plus, they wouldn't be able to take advantage of structural duplication; for instance, regardless of whether you're transforming to Jabber or vCard/RDF, an EMAIL element is an EMAIL element.)
  • Each standard includes not only required elements and attributes, but optional ones as well. How will you handle this optionality? For example, vCard/RDF allows for the inclusion within the vCard of text-encoded binary data, such as an image. Is there some other way of including (or at least referencing) this possibly useful data in vCards conforming to the other standards?
  • Are the data types of element contents and attribute values consistent across standards? How will you handle differences?
  • Is there some way to leverage your newfound awareness of the various data structures to produce output for targets other than XML-based contact managers? As an example, think of feeding the contact information through a stylesheet to generate an XSL-FO document; this might be suitable for printing Rolodex-style hard copies, or even for passing to a text-to-speech reader for audible output.
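The column names XSLT as the obvious platform for such a tool; purely to make the parameter-driven option concrete, here is a sketch of the same design using Python's standard library instead. Each supported format gets a reader that produces one neutral dictionary and a writer that consumes it, so adding a format means adding a reader and/or a writer rather than one stylesheet per input/output pair. The format names ("jabber", "vCardRDF", "summary"), the four-field neutral dictionary, and every function name are assumptions for illustration. The sketch handles only the formatted name, family and given names, and a single e-mail address, and it needs Python 3.8 or later for tostring's default_namespace argument.

```python
import xml.etree.ElementTree as ET

# Namespace names taken from the Jabber and vCard/RDF samples.
JABBER = "vcard-temp"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"


def q(ns, *names):
    """Build an ElementTree path such as {ns}N/{ns}GIVEN."""
    return "/".join("{%s}%s" % (ns, name) for name in names)


# Readers: each returns the same neutral dict; absent fields are None.
def read_jabber(text):
    card = ET.fromstring(text)
    return {
        "full_name": card.findtext(q(JABBER, "FN")),
        "family": card.findtext(q(JABBER, "N", "FAMILY")),
        "given": card.findtext(q(JABBER, "N", "GIVEN")),
        "email": card.findtext(q(JABBER, "EMAIL", "USERID")),
    }


def read_vcard_rdf(text):
    # rdf:about has no slot in the neutral dict, so it is not carried over.
    desc = ET.fromstring(text).find(q(RDF, "Description"))
    return {
        "full_name": desc.findtext(q(VCARD, "FN")),
        "family": desc.findtext(q(VCARD, "N", "Family")),
        "given": desc.findtext(q(VCARD, "N", "Given")),
        # rdf:value lives in the RDF namespace, so this path mixes the two.
        "email": desc.findtext(q(VCARD, "EMAIL") + "/" + q(RDF, "value")),
    }


# Writers: each decides for itself what to do with missing fields.
def write_jabber(contact):
    card = ET.Element(q(JABBER, "vCard"))

    def add(parent, name, value):
        if value:  # optional field: emit nothing rather than an empty element
            ET.SubElement(parent, q(JABBER, name)).text = value

    add(card, "FN", contact.get("full_name"))
    if contact.get("family") or contact.get("given"):
        n = ET.SubElement(card, q(JABBER, "N"))
        add(n, "FAMILY", contact.get("family"))
        add(n, "GIVEN", contact.get("given"))
    if contact.get("email"):
        email = ET.SubElement(card, q(JABBER, "EMAIL"))
        ET.SubElement(email, q(JABBER, "INTERNET"))
        add(email, "USERID", contact["email"])
    return ET.tostring(card, encoding="unicode", default_namespace=JABBER)


def write_summary(contact):
    # A plain one-line card; "(unknown)" stands in for missing values.
    name = contact.get("full_name") or "(unknown)"
    return "%s <%s>" % (name, contact.get("email") or "(unknown)")


READERS = {"jabber": read_jabber, "vCardRDF": read_vcard_rdf}
WRITERS = {"jabber": write_jabber, "summary": write_summary}


def convert(text, source, target):
    """Parameter-driven conversion: pick a reader and a writer by name."""
    if source not in READERS or target not in WRITERS:
        raise ValueError("unsupported conversion: %s -> %s" % (source, target))
    return WRITERS[target](READERS[source](text))


CORKY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="http://qqqfoo.com/staff/corky">
    <vCard:FN>Corky Crystal</vCard:FN>
    <vCard:N rdf:parseType="Resource">
      <vCard:Family>Crystal</vCard:Family><vCard:Given>Corky</vCard:Given>
    </vCard:N>
    <vCard:EMAIL rdf:parseType="Resource">
      <rdf:value>corky@qqqfoo.com</rdf:value>
    </vCard:EMAIL>
  </rdf:Description>
</rdf:RDF>"""

print(convert(CORKY, "vCardRDF", "summary"))  # Corky Crystal <corky@qqqfoo.com>
print(convert(CORKY, "vCardRDF", "jabber"))
```

Optionality shows up in the writers: write_jabber omits any element whose value is missing, while write_summary substitutes a placeholder. The rdf:about URI, having no place in the neutral dictionary, is lost in conversion; deciding whether to widen that dictionary is one of the questions raised above.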

The important thing, I think, is not to confine your imagination to the relatively static context of "an XML document" -- even a bunch of XML documents. As always with XML, the most important questions are not those dealing with the data as such, but those dealing with what to do with the data once it's in XML form -- not only what the data might be, but what it might just come to mean.