XML.com: XML From the Inside Out

XML Linking Technologies
by Eric van der Vlist

Simple XLinks

In contrast to the changes required to document structure for RDF, the latest XLink CR has chosen to locate most of the linking information in attributes. These attributes, placed in a separate namespace, are usable with minimal impact as an addition to XML vocabularies.

However, XLink is still a Candidate Recommendation, its list of public implementations is very limited, and many areas of possible usage are unexplored.

The next revision of our book catalog vocabulary will use so-called XLink "simple links," which are very similar in principle to XHTML's "<a href=..." link. XLink uses XPointer syntax, which, according to the current CR, lets us choose between three different addressing schemes.

For the purposes of this article, we will ignore the child-sequence scheme, in which nodes are accessed through their sequence order in the document, because it is very sensitive to document changes. We'll also ignore the full XPointer scheme: it relies on XPath expressions, so it cannot be processed by XSLT processors without XPointer support (none exist yet) or proprietary extensions to evaluate variable XPath expressions. Instead we'll use the last addressing scheme, which is called the bare-names scheme.

The bare-names scheme is the most similar in its syntax to the XHTML anchors, and it's based on the IDs that we have already used above as physical user links. Since the DTD is currently the only mechanism supported by XPointer for declaring IDs, we will add a minimal DTD to declare the IDs used in our document.

<!DOCTYPE library [
<!ATTLIST author id ID #IMPLIED>
<!ATTLIST book id ID #IMPLIED>
<!ATTLIST character id ID #IMPLIED>
]>

(library6.xml)

This DTD is sufficient for non-validating parsers. We have declared the IDs as optional (#IMPLIED) because we are using the author and character elements both as resources and links, and DTDs do not offer different attribute lists based on the context.

The declaration of the target nodes is done by using the id attribute defined above.

   <author id="author_Charles-M.-Schulz">
      .../...
   <character id="character_Snoopy">

(library6.xml)

The references are defined using XLink simple link syntax.

   <book id="book_0836217462">
        <isbn>0836217462</isbn>
        <title>Being a Dog Is a Full-Time Job</title>
        <author xlink:href="#author_Charles-M.-Schulz" 
           xlink:type="simple">Charles-M. Schulz</author>
        <character xlink:href="#character_Peppermint-Patty" 
           xlink:type="simple">Peppermint Patty</character>
        <character xlink:href="#character_Snoopy" 
           xlink:type="simple">Snoopy</character>
        <character xlink:href="#character_Schroeder" 
           xlink:type="simple">Schroeder</character>
        <character xlink:href="#character_Lucy" 
           xlink:type="simple">Lucy</character>
    </book>

(library6.xml)

Since we are using bare-names XPointers, each reference is simply the value of the target ID, placed after the # that separates the document URI from the fragment identifier. Additionally, we could have altered the appearance of the link in a user agent using the XLink behavior attributes xlink:show (new, replace, embed, other or none) and xlink:actuate (onLoad, onRequest, other or none).

The expansion of the links can now be done by a single template that applies to every element in our documents carrying an xlink:href attribute.

<xsl:template match="*[@xlink:href]">
    <xsl:copy-of select="id(substring-after(@xlink:href, '#'))"/>
</xsl:template>

(expand6.xsl)
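For readers without an XSLT processor at hand, the same expansion can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the sample document is a trimmed stand-in for library6.xml, the function name expand_simple_links is hypothetical, and because ElementTree does not read ID declarations from the DTD, a plain lookup on the id attribute stands in for the XSLT id() function.

```python
import copy
import xml.etree.ElementTree as ET

HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed stand-in for library6.xml (assumption: not the full catalog).
SAMPLE = """\
<library xmlns:xlink="http://www.w3.org/1999/xlink">
  <character id="character_Snoopy"><name>Snoopy</name></character>
  <book id="book_0836217462">
    <title>Being a Dog Is a Full-Time Job</title>
    <character xlink:href="#character_Snoopy" xlink:type="simple">Snoopy</character>
  </book>
</library>
"""

def expand_simple_links(root):
    """Replace each element carrying a bare-name xlink:href with a copy of its target."""
    # Index elements by their id attribute (stands in for XSLT's id()).
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    # Collect (parent, link) pairs first so the tree is not mutated while iterating.
    pairs = [(parent, child)
             for parent in root.iter()
             for child in list(parent)
             if child.get(HREF, "").startswith("#")]
    for parent, link in pairs:
        # Dropping the leading "#" mirrors substring-after(@xlink:href, '#').
        target = by_id.get(link.get(HREF)[1:])
        if target is None:
            continue  # dangling link: left untouched in this sketch
        position = list(parent).index(link)
        clone = copy.deepcopy(target)
        clone.tail = link.tail  # keep the surrounding whitespace
        parent.remove(link)
        parent.insert(position, clone)
    return root

root = expand_simple_links(ET.fromstring(SAMPLE))
book = root.find("book")
print(ET.tostring(book, encoding="unicode"))
```

As with xsl:copy-of, the copied element keeps its id attribute, so the expanded document is no longer valid with respect to ID uniqueness.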

Link Validation

The situation is less appealing when we want to validate these links. Since the id value (character_Snoopy) doesn't match the reference (#character_Snoopy) because of the "#" separator, neither DTD nor XML Schema validation can be used.

XML Schema's keyref is highly flexible, but unfortunately its field element must specify an XPath expression pointing to a node of the document. In this case, we would need to give the result of a function -- substring-after(@xlink:href, '#').

Checking the validity of XLink links is a tricky issue, mentioned in the XPointer requirements, which the XML Linking WG has avoided. In our case it should be possible, as indicated by Eve Maler, to use rule-based validators such as Schematron to perform this kind of validity check, or to use a stylesheet that takes an approach similar to our expander (expand6.xsl).
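The check itself is small once the "#" is stripped. The following Python sketch is neither Schematron nor XSLT, only an illustration of the rule such a validator would enforce. The sample document and the function name dangling_links are hypothetical, and the id attribute is read directly rather than through a DTD.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample: the link to Lucy has no matching target.
SAMPLE = """\
<library xmlns:xlink="http://www.w3.org/1999/xlink">
  <character id="character_Snoopy"/>
  <book id="book_0836217462">
    <character xlink:href="#character_Snoopy" xlink:type="simple">Snoopy</character>
    <character xlink:href="#character_Lucy" xlink:type="simple">Lucy</character>
  </book>
</library>
"""

def dangling_links(root):
    """Return the bare-name hrefs that match no id in the same document."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = []
    for el in root.iter():
        href = el.get(XLINK_HREF, "")
        # Same step as substring-after(@xlink:href, '#') in the stylesheet.
        if href.startswith("#") and href[1:] not in ids:
            missing.append(href)
    return missing

missing = dangling_links(ET.fromstring(SAMPLE))
print(missing)  # ['#character_Lucy']
```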

The main benefit of using XLink simple links is the future ability of rendering tools to present these links to the users when using the XLink behavior attributes. An XLink-enabled processor might also automatically expand the document for us when xlink:show is set to embed and xlink:actuate is set to onLoad.

What we've lost by using simple XLink links is a level of abstraction, since we are again relying on physical IDs. When we write xlink:href="#character_Peppermint-Patty" we specify the node from the current document whose id is "character_Peppermint-Patty," regardless of whatever content this node might have -- we're not specifying either the character whose name is "Peppermint Patty" (user-defined logical links), or even the resource whose identifier is http://my.library/character/Peppermint-Patty (RDF).

Extended XLinks

The loss of abstraction suffered with simple XLinks is inherited from the HTML a/@href links that simple XLinks are meant to replace, and it's one of the reasons why XLink extended links have been introduced. The other reason is to allow links to live independently of the structures they link.

Please note that I don't know of any validation tools or services for extended XLinks, and that the following example has only been validated through a thorough reading of the XLink CR.

Since we will remove the links from the structure, the book elements can be simplified.

   <book id="book_0836217462">
        <isbn>0836217462</isbn>
        <title>Being a Dog Is a Full-Time Job</title>
   </book>

(library7.xml)

The character and author elements are the same as in our previous example.

For each extended link, XLink requires that we declare the participating resources. These can be either local or external to the extended link. We also need to define the relations (arcs) between these resources.

To keep things simple in our example, we'll define a single extended link with all the relations between the books, author, and characters in our book catalog.

The link container is defined as

   <links xlink:type="extended">

(library7.xml)

A common characteristic of all the XLink components is that the names of the elements are not significant for XLink. The specification shows how one can take advantage of this by defining default values for the attributes, allowing a terser syntax. In this case, we could have declared in the DTD that the default value of the xlink:type attribute in the links element is extended and, thus, avoided the pain of writing it here.

While I agree that it can aid in the authoring of such documents, I have found that it makes the examples more difficult to read and understand, and I have chosen to keep a complete syntax in the example presented here.

The participants in the link are all external resources (where "external" means external to the extended link, even if in our case they are located in the same document) and the xlink:type to declare them has to be "locator." The declaration of the resources requires assigning a local label to each of them, and we'll reuse the identifier to keep things simple.

The declaration of a book is

   <book xlink:type="locator"
         xlink:href="#book_0836217462"
         xlink:role="http://my.library/roles/book"
         xlink:label="book_0836217462"/>

(library7.xml)

Similar declarations need to be made for authors.

   <author xlink:type="locator"
           xlink:href="#author_Charles-M.-Schulz"
           xlink:role="http://my.library/roles/author"
           xlink:label="author_Charles-M.-Schulz"/>

(library7.xml)

And for characters:

   <character xlink:type="locator"
              xlink:href="#character_Snoopy"
              xlink:role="http://my.library/roles/character"
              xlink:label="character_Snoopy"/>

(library7.xml)

After all the resources have been declared, we can describe all the relations between them, including relations between the books and their author.

   <arc xlink:type="arc"
        xlink:arcrole="http://my.library/roles/writen-by"
        xlink:from="book_0836217462"
        xlink:to="author_Charles-M.-Schulz"/>

(library7.xml)

And those between the books and the characters:

   <arc xlink:type="arc"
        xlink:arcrole="http://my.library/roles/featuring"
        xlink:from="book_0836217462"
        xlink:to="character_Lucy"/>

(library7.xml)

We see that we can mix physical pointers (xlink:href="#book_0836217462") with metadata that provides more information about the pointers (xlink:role="http://my.library/roles/character" or xlink:arcrole="http://my.library/roles/featuring").

We could also have added more metadata through xlink:title attributes and used the behavior attributes (xlink:show and xlink:actuate) to provide additional information to rendering agents.

Of course, we are still using bare-names XPointers, and we need to keep the same minimal DTD as we had in our previous document.

<!DOCTYPE library [ 
<!ATTLIST author id ID #IMPLIED>
<!ATTLIST book id ID #IMPLIED>
<!ATTLIST character id ID #IMPLIED>]>

(library7.xml)

Expanding the Extended Link

Expansion is still possible using XSLT, but it requires us to go through the different steps of indirection. I have chosen to use a short XSLT template for each of these steps. Since the same element and attribute names are used at several locations in the structure with different meanings, we are using a mode (links1, links2 and links3) for each of these steps.

First, we need to go from the book node to the XLink locator that defines its label, using the book ID as a link.

<xsl:template match="lib:book">
    <xsl:copy>
        <xsl:apply-templates/>
        <xsl:variable name="id" select="concat('#', @id)"/>
        <xsl:apply-templates 
           select="/lib:library/lib:links/lib:book[@xlink:href=$id]" 
           mode="links1"/>
    </xsl:copy>
</xsl:template>

(expand7.xsl)

Then we need to go from the book locator to the different arcs in which it is involved.

<xsl:template match="lib:book" mode="links1">
    <xsl:variable name="label" select="@xlink:label"/>
    <xsl:apply-templates 
       select="../lib:arc[@xlink:from=$label]"
       mode="links2"/>
</xsl:template>

(expand7.xsl)

Then to the locators defining the resource that is linked.

<xsl:template match="lib:arc" mode="links2">
    <xsl:variable name="label" select="@xlink:to"/>
    <xsl:apply-templates 
       select="../*[@xlink:label=$label and @xlink:type='locator']"
       mode="links3"/>
</xsl:template>

(expand7.xsl)

And finally, we can copy the resource itself.

<xsl:template match="*[@xlink:type='locator']" mode="links3">
    <xsl:copy-of
       select="id(substring-after(@xlink:href, '#'))"/>
</xsl:template>

(expand7.xsl)

We could have eliminated two of the four steps by using our knowledge that the labels were equal to the IDs, but we would then have depended on this design choice, which would have been more error-prone in the long run.
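The four steps of indirection can also be followed outside XSLT. The Python sketch below is an illustration only. It assumes a trimmed stand-in for library7.xml without the lib namespace used by the stylesheet, reads the id attribute directly instead of relying on the DTD, and uses a hypothetical function name, linked_resources.

```python
import xml.etree.ElementTree as ET

X = "{http://www.w3.org/1999/xlink}"

# Trimmed stand-in for library7.xml (assumption: no lib namespace, two arcs only).
SAMPLE = """\
<library xmlns:xlink="http://www.w3.org/1999/xlink">
  <author id="author_Charles-M.-Schulz"/>
  <character id="character_Lucy"/>
  <book id="book_0836217462"/>
  <links xlink:type="extended">
    <book xlink:type="locator" xlink:href="#book_0836217462"
          xlink:label="book_0836217462"/>
    <author xlink:type="locator" xlink:href="#author_Charles-M.-Schulz"
            xlink:label="author_Charles-M.-Schulz"/>
    <character xlink:type="locator" xlink:href="#character_Lucy"
               xlink:label="character_Lucy"/>
    <arc xlink:type="arc" xlink:arcrole="http://my.library/roles/writen-by"
         xlink:from="book_0836217462" xlink:to="author_Charles-M.-Schulz"/>
    <arc xlink:type="arc" xlink:arcrole="http://my.library/roles/featuring"
         xlink:from="book_0836217462" xlink:to="character_Lucy"/>
  </links>
</library>
"""

def linked_resources(root, book_id):
    """Follow book -> locator -> arc -> locator -> resource for one book id."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    links = root.find("links")
    found = []
    # Step 1: from the book to the locator whose href points at it.
    for start in links.findall("book"):
        if start.get(X + "href") != "#" + book_id:
            continue
        # Step 2: from the locator's label to the arcs that start there.
        for arc in links:
            if arc.get(X + "type") != "arc":
                continue
            if arc.get(X + "from") != start.get(X + "label"):
                continue
            # Step 3: from the arc to the locators carrying the "to" label.
            for end in links:
                if end.get(X + "type") != "locator":
                    continue
                if end.get(X + "label") != arc.get(X + "to"):
                    continue
                # Step 4: from the locator to the resource itself.
                target = by_id.get(end.get(X + "href", "")[1:])
                if target is not None:
                    found.append((arc.get(X + "arcrole"), target))
    return found

found = linked_resources(ET.fromstring(SAMPLE), "book_0836217462")
for arcrole, resource in found:
    print(arcrole, resource.get("id"))
```

Labels and IDs are deliberately looked up separately here, so the sketch keeps working if a document stops using the ID as the label.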

We see that, even though more complex, the manipulation of these links is still possible using conventional XML tools such as XSLT.

The validation of the links is impossible using a DTD or XML Schema because of the XPointer syntax itself. The benefit of this way of defining our links is that we have completely dissociated the elements from the links, just as we would have done in the entity-relationship models used in relational databases. This dissociation has been made without losing the semantics carried by RDF: an interesting study available as a W3C Note has shown how RDF statements can be extracted from extended XLinks.

The ability to define links between nodes without modifying these nodes is key for defining links between resources you don't own on the Web. This is one of the reasons why extended XLinks are the technology of choice for topic maps. This ability can be shared by RDF though, and we could have designed our RDF example to achieve this separation between objects and relations -- we could even have used XPointers in RDF.
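To give a feel for what such an extraction might look like, here is a minimal Python sketch. It assumes the simplest possible mapping, which is my reading and not a transcription of the Note: each arc yields one statement whose subject is the starting resource, whose predicate is the xlink:arcrole, and whose object is the ending resource. The sample link and the function name harvest_statements are hypothetical.

```python
import xml.etree.ElementTree as ET

X = "{http://www.w3.org/1999/xlink}"

# Hypothetical extended link with two locators and one arc.
SAMPLE = """\
<links xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <book xlink:type="locator" xlink:href="#book_0836217462"
        xlink:label="book_0836217462"/>
  <character xlink:type="locator" xlink:href="#character_Lucy"
             xlink:label="character_Lucy"/>
  <arc xlink:type="arc" xlink:arcrole="http://my.library/roles/featuring"
       xlink:from="book_0836217462" xlink:to="character_Lucy"/>
</links>
"""

def harvest_statements(links):
    """Return (subject, predicate, object) tuples, one per arc and resource pair."""
    # Several locators may share a label, so each label maps to a list of hrefs.
    hrefs_by_label = {}
    for el in links:
        if el.get(X + "type") == "locator":
            hrefs_by_label.setdefault(el.get(X + "label"), []).append(el.get(X + "href"))
    statements = []
    for arc in links:
        if arc.get(X + "type") != "arc":
            continue
        for subject in hrefs_by_label.get(arc.get(X + "from"), []):
            for obj in hrefs_by_label.get(arc.get(X + "to"), []):
                statements.append((subject, arc.get(X + "arcrole"), obj))
    return statements

statements = harvest_statements(ET.fromstring(SAMPLE))
print(statements)
```

The hrefs are left as relative fragment identifiers here. A real harvester would presumably resolve them against the document URI first.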

The difference between the extended links and RDF appears to be mostly a subtle difference of focus. RDF focuses on assertions about links; XLink on links carrying assertions.

Acknowledgements and References

Many thanks to Henry Thompson and Eve Maler for their patient answers to my emails on the W3C mailing lists, to Didier Martin, whose presentation during XML Europe 2000 was the starting point of this work, and to the contributors to the RSS-DEV mailing list for their enlightening messages about RDF-related issues.
