XML Linking
This document specifies a simple set of constructs that may be inserted into XML documents to describe links between objects and to support addressing into the internal structures of XML documents. It is a goal to use the power of XML to create a structure that can describe the simple unidirectional hyperlinks of today's HTML as well as more sophisticated multi-ended, typed, self-describing links.
Note that some of the work that is still to be done in this particular draft is described in Appendix A. This work is part of the W3C SGML Activity (for current status, see http://www.w3.org/MarkUp/SGML/Activity).
The existence of links is asserted by the presence of elements contained in XML documents. They may or may not reside at the locations of, or in the same documents with, the objects which they serve to connect.
resource
HTML A, HyTime clink, and TEI XREF are all examples of in-line links.
link relationships
XML-LINK. Possible values are SIMPLE, EXTENDED, LOCATOR, GROUP, and DOCUMENT, signalling in each case that the element in whose start-tag the attribute appears is to be treated as an element of the indicated type, as described in this specification. An example of such a link follows:
<A XML-LINK="SIMPLE" HREF="http://www.w3.org/">The W3C</A>
XML-ATTRIBUTES attribute.
This attribute must contain an even number of white-space-separated names, which are treated as pairs. In each pair, the first name must be one of those described in this specification: (ROLE, HREF, TITLE, SHOW, INLINE, CONTENT-ROLE, CONTENT-TITLE, ACTUATE, BEHAVIOR, STEPS). The second name, when recognized in the document, will be treated as though it were playing the role assigned to the first. For example, consider a DTD with the declaration shown in Example 1.
<!ELEMENT TEXT-BOOK ANY>
<!ATTLIST TEXT-BOOK TITLE CDATA #IMPLIED
ROLE (PRIMARY|SUPPORTING) #IMPLIED>
If it were desired to use this as a simple link, it would be necessary to remap a couple of attributes, which could be accomplished in the internal subset shown in Example 2.
<!ATTLIST TEXT-BOOK XML-LINK CDATA #FIXED "SIMPLE"
XML-ATTRIBUTES CDATA #FIXED
"TITLE XL-TITLE ROLE XL-ROLE">
Then in the document, Example 3 would be recognized as a simple link.
<TEXT-BOOK TITLE="Compilers: Principles, Techniques, and Tools"
ROLE="PRIMARY" XL-TITLE="Primary Textbook for the Course"
XL-ROLE="ONLINE-PURCHASE"
HREF="/cgi/auth-search?q=+Aho+Sethi+Ullman"/>
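The remapping walked through in Examples 1 to 3 can be illustrated with a short, non-normative Python sketch. The function names parse_remap and link_attributes are hypothetical, and the sketch assumes that an element's attributes are available as a plain dictionary.

```python
# Names that XML-ATTRIBUTES may remap, as listed in this specification.
LINK_ROLES = ("ROLE", "HREF", "TITLE", "SHOW", "INLINE", "CONTENT-ROLE",
              "CONTENT-TITLE", "ACTUATE", "BEHAVIOR", "STEPS")


def parse_remap(xml_attributes):
    """Parse an XML-ATTRIBUTES value into {linking name: document name}."""
    names = xml_attributes.split()
    if len(names) % 2 != 0:
        raise ValueError("XML-ATTRIBUTES must hold an even number of names")
    remap = {}
    for role, actual in zip(names[0::2], names[1::2]):
        if role not in LINK_ROLES:
            raise ValueError("not a linking attribute name: " + role)
        remap[role] = actual
    return remap


def link_attributes(attrs, remap):
    """Return the linking attributes of an element after remapping."""
    result = {}
    for role in LINK_ROLES:
        source = remap.get(role, role)  # unmapped names keep their own name
        if source in attrs:
            result[role] = attrs[source]
    return result


remap = parse_remap("TITLE XL-TITLE ROLE XL-ROLE")
text_book = {
    "TITLE": "Compilers: Principles, Techniques, and Tools",
    "ROLE": "PRIMARY",
    "XL-TITLE": "Primary Textbook for the Course",
    "XL-ROLE": "ONLINE-PURCHASE",
    "HREF": "/cgi/auth-search?q=+Aho+Sethi+Ullman",
}
print(link_attributes(text_book, remap))
```

Under this reading, the element's own TITLE and ROLE attributes keep their DTD-defined meaning, while XL-TITLE and XL-ROLE supply the link's title and role.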
XML-LINK and XML-ATTRIBUTES attributes with a linking element. The simplest is to provide it explicitly. However, this practice is verbose, and would be not only cumbersome but wasteful of network bandwidth in the case where there are large numbers of linking elements. Fortunately, XML's facilities for declaring default attribute values can be used to address this problem. For example, the following would accomplish the declaration of the A element as an XML SIMPLE link:
<!ATTLIST A XML-LINK CDATA #FIXED "SIMPLE">
Such a declaration may be placed in either the external or the internal subset of the Document Type Declaration. Placing it in both subsets would be the obvious thing to do for convenient network operation. So doing, at the time of creation of this specification, would cause the document to fail to be valid. Note that the successful completion of the current work on a technical corrigendum to ISO 8879 that is in the process of international ballot would resolve this problem and allow this practice in valid documents. However, for interoperability, the declaration should not be placed in both subsets.
A element. Second, a much more general extended link which may be either in-line or out-of-line and may be used for multi-directional links, links into read-only data, and so on.
Role
ROLE attribute is used to provide both link and resource roles.
HREF attribute, as described in section 5, "Addressing."
TITLE attribute. This specification does not require that applications make any particular use of the title.
SHOW and ACTUATE attributes may be used by an author to communicate general policies concerning the traversal behavior of the link; this specification defines a small set of policies for this purpose. The BEHAVIOR attribute may be used to communicate detailed instructions for traversal behavior; this specification does not constrain the contents, format, or meaning of this attribute.
INLINE attribute may be used to communicate whether the linking element is in-line or not.
ANY.
Any element may be recognized as a linking element based on use of the XML-LINK attribute; in a valid document, each such element must conform to the constraints expressed in its governing DTD.
XREF elements, but with more general reference capabilities. A simple link may contain only one locator; thus there is no necessity for a separate child element, and the locator attributes are attached directly to the linking elements.
Example 4 is a sample declaration for an XML simple link; note that the element type need not be SIMPLE, since the linking element will be recognized based on the value of the XML-LINK attribute.
<!ELEMENT SIMPLE ANY>
<!ATTLIST SIMPLE
XML-LINK CDATA #FIXED "SIMPLE"
ROLE CDATA #IMPLIED
HREF CDATA #REQUIRED
TITLE CDATA #IMPLIED
INLINE (TRUE|FALSE) "TRUE"
CONTENT-ROLE CDATA #IMPLIED
CONTENT-TITLE CDATA #IMPLIED
SHOW (EMBED|REPLACE|NEW) "REPLACE"
ACTUATE (AUTO|USER) "USER"
BEHAVIOR CDATA #IMPLIED
>
An extended link's locators are contained in child elements of the linking element, each with its own set of attributes. Once again, in the sample declaration in Example 5, the element types need not be EXTENDED and LOCATOR; recognition depends on the XML-LINK attribute.
<!ELEMENT EXTENDED ANY>
<!ELEMENT LOCATOR ANY>
<!ATTLIST EXTENDED
XML-LINK CDATA #FIXED "EXTENDED"
ROLE CDATA #IMPLIED
TITLE CDATA #IMPLIED
INLINE (TRUE|FALSE) "TRUE"
CONTENT-ROLE CDATA #IMPLIED
CONTENT-TITLE CDATA #IMPLIED
SHOW (EMBED|REPLACE|NEW) "REPLACE"
ACTUATE (AUTO|USER) "USER"
BEHAVIOR CDATA #IMPLIED
>
<!ATTLIST LOCATOR
XML-LINK CDATA #FIXED "LOCATOR"
ROLE CDATA #IMPLIED
HREF CDATA #REQUIRED
TITLE CDATA #IMPLIED
SHOW (EMBED|REPLACE|NEW) "REPLACE"
ACTUATE (AUTO|USER) "USER"
BEHAVIOR CDATA #IMPLIED
>
The declared content of ANY for the linking element is perhaps misleading; the idea is that locator elements should appear as children of the linking elements, along with any other content that is appropriate.
Note that many of the attributes may be provided for both the parent linking element and the child locator element. If any such attribute is provided in the linking element but not in a locator element, the value provided in the linking element is to be used in processing the locator element. In other words, the attributes provided in the linking element may serve as defaults for the (possibly many) locator elements.
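The defaulting rule can be sketched as follows. This is a non-normative illustration: the function name is hypothetical, the list of shared attributes is taken from the sample declarations in Example 5, and the dictionaries are assumed to hold only the values explicitly provided on each element (DTD-supplied defaults would need separate handling).

```python
# Attributes declared on both EXTENDED and LOCATOR in Example 5.
SHARED = ("ROLE", "TITLE", "SHOW", "ACTUATE", "BEHAVIOR")


def effective_locator_attributes(link_attrs, locator_attrs):
    """Fill in locator attributes from the parent linking element."""
    effective = dict(locator_attrs)
    for name in SHARED:
        if name not in effective and name in link_attrs:
            effective[name] = link_attrs[name]
    return effective


link = {"ROLE": "ANNOTATION", "SHOW": "NEW"}
locator = {"HREF": "notes.xml", "SHOW": "EMBED"}
print(effective_locator_attributes(link, locator))
```

A value given on the locator always wins; the linking element's value is used only where the locator is silent.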
The INLINE attribute can take the values TRUE and FALSE. The value TRUE, which is the default, means that all of the content of the linking element is to be considered a resource of the link, except for any child locator elements (which are considered part of the linking element machinery).
When the link is in-line, the CONTENT-ROLE and CONTENT-TITLE attributes may be used to provide the title and role information for this "content" resource. If INLINE is FALSE, it is not an error to provide the CONTENT-TITLE or CONTENT-ROLE attributes, but they have no effect.
SHOW and ACTUATE. These are used to express policies rather than mechanisms; programs which are processing links in XML documents are free to devise their own mechanisms, best suited to the user environment and processing mode, to implement the requested policies.
In many cases, there will be a requirement for much finer control over the details of traversal behavior; existing hypertext software typically provides such control. Such fine control of link traversal is outside the scope of this specification; however, the BEHAVIOR attribute is provided as a standard place for authors to provide, and in which programs should look for, such detailed behavioral instructions.
The SHOW attribute is used to express a policy as to the context in which a resource that is traversed to should be displayed or processed. It may take one of three values:
EMBED
REPLACE
NEW
The ACTUATE attribute is used to express a policy as to when traversal of a link should occur. It may take one of two values:
AUTO
USER
# and a fragment identifier, with the query interpreted by the host providing the indicated resource, and the interpretation of the fragment identifier dependent on the data type of the indicated resource. Thus, when a locator in an XML linking element identifies a resource that is not an XML document (for example, an HTML or PDF document), this specification does not constrain the syntax or semantics of the query nor of the fragment identifier.
[1] Locator ::= URL
| Connector (XPointer | Name)
| URL Connector (XPointer | Name)
[2] Connector ::= '#' | '|'
[3] URL ::= URLchar*
In this discussion, the term designated resource refers to the resource participating in the link which the locator serves to locate. The following rules apply:
ID(Name)"; i.e., the sub-resource is the element in the containing resource that has an XML ID attribute whose value matches the Name. This shorthand is to encourage use of the robust ID addressing mode.
#", this signals an intent that the containing resource is to be fetched as a whole from the host that provides it, and that the XPointer processing to extract the sub-resource is to be performed on the client, that is to say on the same system where the linking element is recognized and processed.
[4] Query ::= 'XML-XPTR=' (XPointer | Name)
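A non-normative sketch of how a processor might take a locator apart according to productions [1] to [3], including the bare-Name shorthand for ID(Name). The function name split_locator is hypothetical, the Name pattern is a simplified stand-in for the XML Name production, and the sketch assumes the first '#' or '|' in the string is the connector.

```python
import re

# Simplified stand-in for the XML Name production (an assumption).
NAME_PATTERN = r"[A-Za-z_:][A-Za-z0-9._:\-]*"


def split_locator(locator):
    """Split a locator into (url, connector, xpointer).

    connector and xpointer are None when the locator is a bare URL.
    """
    match = re.search(r"[#|]", locator)
    if match is None:
        return locator, None, None
    url = locator[:match.start()]
    pointer = locator[match.end():]
    if re.fullmatch(NAME_PATTERN, pointer):
        pointer = "ID(" + pointer + ")"  # bare Name is shorthand for ID(Name)
    return url, match.group(), pointer


print(split_locator("http://www.w3.org/doc.xml#a27"))
print(split_locator("doc.xml|CHILD(2,CHAP)(4,SEC)(3)"))
```

An empty url component means the locator addresses the document that contains the linking element.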
The basic form of an XPointer is a series of location terms, each of which specifies a location, either absolute or (more frequently) relative to the prior one. Each term has a keyword such as ID, CHILD, ANCESTOR, and so on, and can be qualified by parameters such as an instance number, element type, or attribute. For example, the locator string
CHILD(2,CHAP)(4,SEC)(3)
refers to the third child of the fourth SEC within the second CHAP within the referenced document. The syntax for TEI Extended Pointers has been adjusted in order to allow them to be packaged naturally with URLs without requiring URL-escaping of space characters:
FROM and TO attributes into the locator syntax.
A locator can contain either one or two XPointers; if there are two, they are separated by the string "..". For a locator with one XPointer, the designated resource is the element or location selected by the sequence of location terms it contains. With two XPointers, the designated resource is all of the text from the location, or start of the element, selected by the first, through to the location, or the end of the element, selected by the second.
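Separating the two XPointers of a span needs a little care, because a quoted string inside a location term may itself contain "..". The following non-normative sketch (the function name split_span is hypothetical) skips over quoted text while looking for the separator.

```python
def split_span(xpointer):
    """Split an XPointer at the first '..' that is outside quotes.

    Returns (first, second); second is None when there is no span.
    """
    quote = None
    for i, ch in enumerate(xpointer):
        if quote:
            if ch == quote:      # closing quote ends the literal
                quote = None
        elif ch in "\"'":
            quote = ch           # opening quote starts a literal
        elif xpointer.startswith("..", i):
            return xpointer[:i], xpointer[i + 2:]
    return xpointer, None


print(split_span("ID(a23)..ID(a27)"))
print(split_span('ROOT()STRING(1,"a..b",0)'))
```

A single "." used as an element type, as in CHILD(2,.), is not mistaken for the separator because it is followed by a closing parenthesis rather than a second dot.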
Note that the implementation of traversal to a resource is not constrained by this specification. In particular, handling a resource designated by a span is probably highly application-dependent. In a display-oriented application, such traversal might simply be implemented by highlighting the designated characters. It should also be noted that a span cannot safely be treated as a set of elements; most spans will include partial elements.
A location term is an atomic unit of addressing information; XPointers consist of combinations of location terms. Location terms are grouped into absolute terms, relative terms, and string-match terms. Absolute terms select one or more elements or locations in an XML document; if an XPointer contains only an absolute term, that term identifies its designated resource. If the absolute term is followed by any relative or string-match terms, the elements or locations that it designates are termed a location source and serve as a starting point for the operations of the location terms in Absolute, Relative, and String-match.
If an XPointer omits any leading absolute location terms (i.e., consists only of relative and string-match terms) it is assumed to have a leading ROOT() absolute location term.
The empty parentheses after ROOT, HERE, and DITTO are for consistency with other keywords and to avoid ambiguous interpretation of an extended pointer containing just the string "ROOT" or "HERE".
ROOT(), the location source is the root element of the containing resource. This is the default behavior. ROOT keyword has no effect on the interpretation of the locator; it exists in the interests of design clarity.
HERE(), the location source for the first location term of that series is the linking element containing the locator rather than the default root element. This allows extended pointers to select items such as "the paragraph immediately preceding the one within which this pointer occurs." It is an error to use HERE in a locator where a URL is also provided and identifies a resource different from the document which contains the linking elements.
[5] XPointer ::= First ('..' Second)?
[6] First ::= AbsTerm? RelTerm* StringTerm?
[7] Second ::= AbsTermOrDitto? RelTerm* StringTerm?
[8] AbsTerm ::= 'ROOT()' | 'HERE()' | IdLoc | HTMLAddr
[9] AbsTermOrDitto ::= 'DITTO()' | AbsTerm
[10] IdLoc ::= 'ID(' Name ')'
[11] HTMLAddr ::= 'HTML(' SkipLit ')'
DITTO(), the location source for its first location term is the location source specified by the entire first XPointer in order to facilitate relative specification of a span.
ID(Name), the location source for the first location term is the element in the containing resource which has an attribute of type ID with a value matching the given Name. For example, the location specification
ID(a27)
chooses the necessarily unique element of the containing resource which has an attribute declared to be of type ID whose value is a27.
HTML(NAMEVALUE) selects the first element whose type is A and which has a NAME attribute whose value is the same as the supplied NAMEVALUE; this is exactly the function performed by the "#"-fragment in the context of an HTML document.
The keyword selects zero or more elements relative to the location source, which are referred to as candidate locations. Each keyword summarized here is described in detail in the following sections.
CHILD
DESCENDANT
ANCESTOR
PRECEDING
PSIBLING
FOLLOWING
FSIBLING
[12] RelTerm ::= Keyword Arguments+
[13] Keyword ::= 'CHILD' | 'DESCENDANT' | 'ANCESTOR' | 'PRECEDING' | 'PSIBLING' | 'FOLLOWING' | 'FSIBLING'
[14] Arguments ::= '(' Instance ',' ElType (',' Attr ',' Val)* ')'
Multiple argument lists are a shorthand in which the keyword is considered to have been repeated between each of the steps. That is to say, the following two XPointers are equivalent:
CHILD(2,SECTION)(1,SUBSECTION)
CHILD(2,SECTION)CHILD(1,SUBSECTION)
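The shorthand can be expanded mechanically, as the following non-normative sketch shows. The function name expand_argument_lists is hypothetical, and the sketch assumes that no quoted value inside an argument list contains parentheses.

```python
import re

# A keyword followed by one or more parenthesized argument lists.
TERM = re.compile(r"([A-Z]+)((?:\([^()]*\))+)")
ARG_LIST = re.compile(r"\([^()]*\)")


def expand_argument_lists(xpointer):
    """Repeat the keyword in front of every argument list."""
    def repeat_keyword(match):
        keyword, arg_lists = match.group(1), match.group(2)
        return "".join(keyword + args for args in ARG_LIST.findall(arg_lists))
    return TERM.sub(repeat_keyword, xpointer)


print(expand_argument_lists("CHILD(2,SECTION)(1,SUBSECTION)"))
```

Terms that already carry a single argument list, such as ID(a27) or ROOT(), pass through unchanged.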
When the value of Instance is the number N, it selects the Nth of the candidate locations. If the special value ALL is given, then all the candidate locations are selected. Negative numbers count from the last candidate location to the first; numbers out of range constitute an error.
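The instance rule can be sketched as a small, non-normative helper. The function name select_instance is hypothetical; candidates are assumed to be supplied in the order in which the keyword enumerates them, and an instance of zero is treated as out of range.

```python
def select_instance(candidates, instance):
    """Apply an Instance value (a number, or 'ALL') to candidate locations.

    Always returns a list so that 'ALL' and single selections look alike.
    """
    if instance == "ALL":
        return list(candidates)
    n = int(instance)  # accepts "3", "+3", and "-1" as well as integers
    if n == 0 or abs(n) > len(candidates):
        raise IndexError("instance number out of range: %s" % instance)
    # Positive numbers count from the first candidate, negative from the last.
    return [candidates[n - 1] if n > 0 else candidates[n]]


sections = ["SEC-1", "SEC-2", "SEC-3", "SEC-4"]
print(select_instance(sections, 2))
print(select_instance(sections, -1))
print(select_instance(sections, "ALL"))
```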
The ElType gives an XML element type; only elements of that type will be selected from among the candidate locations. For example, the location term
CHILD(3,DIV1)(4,DIV2)(29,P)
selects the 29th paragraph of the fourth sub-division of the third major division of the location source.
The XPointer
DESCENDANT(-1,EXAMPLE)
selects the last example in the document.
Selection by type is strongly recommended because it makes links more perspicuous and more robust. It is perspicuous because humans typically refer to things by type: as "the second section," "the third paragraph," etc. It is robust because it increases the chance of detecting breakage if (due to document editing) the target originally pointed at no longer exists.
The type may be specified by Name or by using one of the values ".", "*CDATA", or "*". If the type is specified as ".", candidate elements of any type are matched. If the type is specified as "*CDATA", the location term selects only untagged sub-portions of an element with mixed content (these are generally referred to as pseudo-elements). Finally, * selects among child elements and pseudo-elements.
Consider the following example:
<SPEECH ID="a27"><SPEAKER>Polonius </SPEAKER><DIRECTION>crossing downstage </DIRECTION>Fare you well, my lord. <DIRECTION>To Ros. </DIRECTION>You go to seek Lord Hamlet? There he is.</SPEECH>
ID(a27)CHILD(2,DIRECTION)
DIRECTION" element, "To Ros."
ID(a27)CHILD(2,.)
crossing downstage".
ID(a27)CHILD(2,*CDATA)
SPEAKER" and "DIRECTION" elements is the first), "Fare you well, my lord."
ID(a27)CHILD(2,*)
SPEAKER" and "DIRECTION" elements.
SPEAKER" and "DIRECTION" elements.
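The four element-type selectors can be tried out with a non-normative sketch built on Python's xml.etree.ElementTree. The helper names are hypothetical, text runs are represented as plain strings standing in for pseudo-elements, and the sample markup below is an assumption: it places white space between the SPEAKER and the first DIRECTION element, so that this white space forms the first pseudo-element.

```python
import xml.etree.ElementTree as ET


def child_nodes(elem):
    """Children in document order; text runs are returned as str objects."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes


def matches(node, el_type):
    """Test a child node against an ElType value."""
    is_text = isinstance(node, str)
    if el_type == "*CDATA":
        return is_text            # pseudo-elements only
    if el_type == "*":
        return True               # elements and pseudo-elements
    if el_type == ".":
        return not is_text        # elements of any type
    return not is_text and node.tag == el_type


def child(elem, instance, el_type):
    """Evaluate CHILD(instance, el_type) relative to elem."""
    candidates = [n for n in child_nodes(elem) if matches(n, el_type)]
    if instance == 0 or abs(instance) > len(candidates):
        raise IndexError("instance number out of range")
    return candidates[instance - 1] if instance > 0 else candidates[instance]


SPEECH = ET.fromstring(
    '<SPEECH ID="a27"><SPEAKER>Polonius</SPEAKER> '
    '<DIRECTION>crossing downstage</DIRECTION>Fare you well, my lord. '
    '<DIRECTION>To Ros.</DIRECTION>You go to seek Lord Hamlet? '
    'There he is.</SPEECH>')

print(child(SPEECH, 2, "DIRECTION").text)
print(child(SPEECH, 2, ".").text)
print(repr(child(SPEECH, 2, "*CDATA")))
print(repr(child(SPEECH, 2, "*")))
```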
[15] InstanceOrAll ::= 'ALL' | Instance
[16] Instance ::= ('+' | '-')? Digit+
[17] ElType ::= '*CDATA' /* selects text pseudo-elements */
| '*' /* elements and pseudo-elements */
| '.' /* elements only */
| Name /* elements of this type */
The Attr and Val are used to provide attribute names and values to use in selecting among candidates.
If specified within quotation marks, the attribute-value parameter is case-sensitive; otherwise not.
As with generic identifiers, attribute names may be specified as * in location terms in the (unlikely) event that an attribute value constitutes a constraint regardless of what attribute name it is a value for.
For example, the location term
CHILD(1,*,TARGET,*)
selects the first child of the location source for which the attribute TARGET has a value.
The location specification
CHILD(1,*,N,2)(1,*,N,1)
chooses an element using the N attribute. Beginning at the location source, the first child (whatever element type it is) with an N attribute having the value 2 is chosen; then that element's first child element having the value 1 for the same attribute is chosen.
The location specification
CHILD(1,FS,RESP,*IMPLIED)
selects the first child of the location source which is an FS element for which the RESP attribute has been left unspecified.
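The attribute tests used in these examples can be sketched as follows. This is a non-normative illustration with hypothetical function names. It assumes attributes are supplied as a dictionary of explicitly specified values, so a missing key stands for "unspecified", and it reads "case and space normalized" as case folding plus white-space collapsing.

```python
def normalize(value):
    """Collapse white space and fold case for unquoted comparisons."""
    return " ".join(value.split()).casefold()


def value_matches(actual, val):
    """Test one attribute value (None if unspecified) against a Val."""
    if val == "*IMPLIED":
        return actual is None
    if actual is None:
        return False
    if val == "*":
        return True
    if len(val) >= 2 and val[0] == val[-1] and val[0] in "\"'":
        return actual == val[1:-1]        # quoted: exact, case-sensitive
    return normalize(actual) == normalize(val)


def attributes_match(attrs, attr, val):
    """Test an element's attributes against one Attr, Val pair."""
    if attr == "*":                       # any attribute name will do
        return any(value_matches(v, val) for v in attrs.values())
    return value_matches(attrs.get(attr), val)


print(attributes_match({"TARGET": "x"}, "TARGET", "*"))
print(attributes_match({"LANG": "de"}, "LANG", "DE"))
print(attributes_match({"LANG": "de"}, "LANG", '"DE"'))
```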
Note that the HTML keyword is a synonym for a very specific instance of attribute-based addressing, such that the following two XPointers are equivalent:
HTML(Sec3.2)
ROOT()DESCENDANT(1,A,NAME,"Sec3.2")
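The behavior of the HTML keyword (the first A element whose NAME attribute has the supplied value) can be sketched with Python's xml.etree.ElementTree. This is a non-normative illustration; the function name html_term and the sample document are hypothetical, and element and attribute names are assumed to be upper case as in this specification's examples.

```python
import xml.etree.ElementTree as ET


def html_term(root, name_value):
    """Return the first A element, in document order, with a matching NAME."""
    for elem in root.iter("A"):
        if elem.get("NAME") == name_value:
            return elem
    return None


doc = ET.fromstring(
    '<BODY><A NAME="Sec3.1">one</A>'
    '<P><A NAME="Sec3.2">two</A></P>'
    '<A NAME="Sec3.2">three</A></BODY>')
print(html_term(doc, "Sec3.2").text)
```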
[18] Attr ::= '*' /* any attribute name */
| Name
[19] Val ::= '*IMPLIED' /* no value specified, no default */
| '*' /* any value */
| Name /* case and space normalized */
| SkipLit /* exact match */
ID(a23)DESCENDANT(2,TERM,LANG,DE)
selects the second TERM element with a LANG attribute whose value is DE occurring within the element with an ID attribute whose value is a23. The search for matching elements occurs in the same order as the XML data stream (depth-first, left-to-right). If an instance number is negative, the search is depth-first right-to-left, in which the right-most, deepest matching element is numbered -1, etc. The location specification
ROOT()DESCENDANT(-1,NOTE)
thus chooses the last NOTE element in the document, that is, the one with the rightmost start-tag.
The ANCESTOR location term selects an element from among the direct ancestors of the location source. The parameters are as for CHILD. However, the ANCESTOR keyword selects elements from the list of containing elements or "ancestors" of the location source, counting upwards from the parent of the location source (which is ancestor number 1) to the root of the document instance (which is ancestor number -1).
For example, the location term
ANCESTOR(1,*,N,1)(1,DIV)
first chooses the smallest element properly containing the location source and having attribute N with value 1 and then the smallest DIV element properly containing it.
Note that the ANCESTOR keyword's second (element type) argument cannot be * or *CDATA.
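The numbering of ancestors can be illustrated with a non-normative sketch using Python's xml.etree.ElementTree, which has no parent pointers, so a parent map is built first. The function names and the sample document are hypothetical, and the attribute arguments are left out for brevity.

```python
import xml.etree.ElementTree as ET


def ancestors(root, target):
    """Return target's ancestors, parent first and root element last."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = target
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


def ancestor(root, target, instance, el_type="."):
    """Evaluate ANCESTOR(instance, el_type) relative to target."""
    candidates = [a for a in ancestors(root, target)
                  if el_type == "." or a.tag == el_type]
    if instance == 0 or abs(instance) > len(candidates):
        raise IndexError("instance number out of range")
    return candidates[instance - 1] if instance > 0 else candidates[instance]


book = ET.fromstring(
    '<BOOK><DIV N="1"><SEC><P ID="a23">text</P></SEC></DIV></BOOK>')
para = book.find(".//P")
print(ancestor(book, para, 1).tag)
print(ancestor(book, para, -1).tag)
print(ancestor(book, para, 1, "DIV").tag)
```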
The PRECEDING keyword selects an element or pseudo-element from among those which precede the location source. The set of elements and pseudo-elements which may be selected is the set of all those in the entire document which occur or begin before the location source. (For purposes of the keywords PRECEDING and FOLLOWING, elements are interpreted as occurring where they start.) The result of the PRECEDING keyword is not guaranteed to be a subset of its location source.
The instance number in the location value of a preceding term designates the nth element or pseudo-element preceding the location source, counting from most recent to less recent. The XPointer
ID(a23)PRECEDING(5,.)
thus designates the fifth element or pseudo-element before the element with an ID of a23. Negative instance numbers also designate preceding elements or pseudo-elements, counting from the eldest to the youngest. The value ALL may be used to select the entire portion of the document preceding the beginning of the location source.
The PSIBLING keyword selects an element or pseudo-element from among those which precede the location source within the same parent element. We speak of the elements and pseudo-elements contained by the same parent element as siblings; those which precede the location in the document are its elder siblings; those which follow it are its younger siblings.
The instance number in the location value of a PSIBLING term designates the nth elder sibling of the location source, counting from most recent to less recent. The location source must have at least as many elder siblings as the absolute value of the instance number; otherwise, the PSIBLING term fails.
ID(a23)PSIBLING(1,.)
thus designates the element immediately preceding the element with an ID of a23. Negative instance numbers also designate elder siblings, but counting from the eldest left sibling to the youngest. If the location source has at least one elder sibling, then the location term
PSIBLING(-1,.)
designates the very eldest sibling and is synonymous with
ANCESTOR(1,.)CHILD(1,.)
The value ALL may be used to select the entire range of elder siblings of an element:
ID(a23)PSIBLING(ALL,.)
thus designates the set of elements preceding the element with an ID of a23 and contained by the same parent.
FOLLOWING behaves like PRECEDING but selects from the portion of the document following the location source, not preceding it.
FSIBLING behaves like PSIBLING but selects from the younger siblings of the location source, not the elder siblings. The XPointer
ID(a23)FSIBLING(1,.)
thus designates the element immediately following the element which has an ID of a23. Negative instance numbers designate younger siblings counting from the youngest sibling toward the location source. If the location source has at least one younger sibling, then the location term
FSIBLING(-1,.)
designates its youngest sibling.
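The counting direction of PSIBLING and FSIBLING is easy to get wrong, so a non-normative sketch may help. The function names are hypothetical; siblings is assumed to be the parent's children in document order, index is the position of the location source in that list, and the ALL value is left out for brevity.

```python
def pick(candidates, instance):
    """Select by instance number from candidates ordered nearest first."""
    if instance == 0 or abs(instance) > len(candidates):
        raise IndexError("the location term fails: too few siblings")
    return candidates[instance - 1] if instance > 0 else candidates[instance]


def psibling(siblings, index, instance):
    """Elder siblings, counted from the one nearest the location source."""
    elder = siblings[:index][::-1]
    return pick(elder, instance)


def fsibling(siblings, index, instance):
    """Younger siblings, counted from the one nearest the location source."""
    younger = siblings[index + 1:]
    return pick(younger, instance)


siblings = ["a21", "a22", "a23", "a24", "a25"]
here = siblings.index("a23")
print(psibling(siblings, here, 1))    # immediately preceding sibling
print(psibling(siblings, here, -1))   # eldest sibling
print(fsibling(siblings, here, 1))    # immediately following sibling
print(fsibling(siblings, here, -1))   # youngest sibling
```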
In this case the designated resource is a location which is found by searching the textual content of the current location source for occurrences of the SkipLit string given in the second argument. The Index is a number which selects among these occurrences, and the Offset is a number which gives a character offset from the start of the match to the designated location. Thus, the XPointer
ROOT()STRING(3,"Thomas Pynchon",7)
selects the letter P (seven from the start of the string) in the third occurrence of the string "Thomas Pynchon".
ID(a27)STRING(5,'!',1)
selects the character immediately following the fifth exclamation mark.
For purposes of string matching, the "text of the element" means all the character data in the element(s) in the current location source and descendant elements, all markup characters being ignored in the pattern matching. Thus in the example above, the string "Thomas Pynchon" would match and designate a reference in
<authname><first>Thomas</first> <family>Pynchon</family> </authname>
The pattern matching is exact and character-for-character. No case, space, or combining-character normalization of any kind is to be performed. Thus, there would be no match to "Thomas Pynchon" in the following:
<example>thomas pynchon,<auth><first>Thomas</first> <family>Pynchon</family></auth>,ThomasPynchon</example>
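String matching across markup can be sketched with Python's xml.etree.ElementTree, whose itertext method yields an element's character data with the tags ignored. This is a non-normative illustration: the function name string_term is hypothetical, only positive instance numbers are handled, and occurrences are assumed to be allowed to overlap.

```python
import xml.etree.ElementTree as ET


def string_term(elem, instance, literal, offset):
    """Evaluate STRING(instance, literal, offset) against elem.

    Returns (text, position), where position indexes into the element's
    character data with all markup ignored.
    """
    text = "".join(elem.itertext())
    starts = []
    pos = text.find(literal)          # exact, character-for-character match
    while pos != -1:
        starts.append(pos)
        pos = text.find(literal, pos + 1)
    if instance < 1 or instance > len(starts):
        raise IndexError("no such occurrence of the string")
    return text, starts[instance - 1] + offset


author = ET.fromstring(
    "<authname><first>Thomas</first> <family>Pynchon</family> </authname>")
text, position = string_term(author, 1, "Thomas Pynchon", 7)
print(text[position])
```

Because no normalization is applied, a lower-case "thomas pynchon" in the data would not be counted as an occurrence.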
[20] StringTerm ::= 'STRING(' Instance ',' SkipLit ',' Offset ')'
[21] Offset ::= Digit+
In these cases, the Extended Link Group element may be used to store a list of links to other documents that together constitute an interlinked document group. Each such document is identified using the HREF attribute of an Extended Link Document element, which is a child element of the GROUP. The value of the HREF attribute is a locator, with the same interpretation as described above.
These elements, just as with EXTENDED, SIMPLE, or LOCATOR elements, are recognized by the use of the XML-LINK attribute with the value GROUP or DOCUMENT.
Example 6 contains sample declarations for the GROUP and DOCUMENT elements.
<!ELEMENT GROUP (DOCUMENT*)>
<!ATTLIST GROUP
XML-LINK CDATA #FIXED "GROUP"
STEPS CDATA #IMPLIED
>
<!ELEMENT DOCUMENT EMPTY>
<!ATTLIST DOCUMENT
XML-LINK CDATA #FIXED "DOCUMENT"
HREF CDATA #REQUIRED
>
The STEPS attribute may be used by an author to help deal with the situation where an Extended Link Group directs a processor to another document, which proves to contain an Extended Link Group of its own. Clearly, there is a potential here for infinite regress, and yet there are situations where processing several levels of Extended Link Groups is useful. The STEPS attribute should have a numeric value that serves as a hint from the author to any link processor as to how many steps of Extended Link Group processing should be undertaken. It does not have any normative effect.
For example, should a group of documents be organized with a single "hub" document containing all the out-of-line links, it might well make sense for each non-hub document to have an Extended Link Group containing only one reference to the hub document. In this case, the best value for STEPS would be 2.
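One possible reading of the STEPS hint is as a depth limit on a breadth-first walk through Extended Link Groups. The following non-normative sketch takes that reading. The function name, the document names, and the representation of groups as a dictionary from a document to the HREF values of its DOCUMENT elements are all assumptions, and the hub is assumed to carry a group listing the other documents.

```python
def documents_to_load(start, groups, steps):
    """Collect documents reachable through at most `steps` levels of groups."""
    seen = {start}
    frontier = [start]
    for _ in range(steps):
        next_frontier = []
        for doc in frontier:
            for href in groups.get(doc, []):
                if href not in seen:      # guards against infinite regress
                    seen.add(href)
                    next_frontier.append(href)
        frontier = next_frontier
    return seen


groups = {
    "ch1.xml": ["hub.xml"],
    "ch2.xml": ["hub.xml"],
    "hub.xml": ["ch1.xml", "ch2.xml", "ch3.xml"],
}
print(sorted(documents_to_load("ch1.xml", groups, 1)))
print(sorted(documents_to_load("ch1.xml", groups, 2)))
```

With one step a non-hub document reaches only the hub; with two steps it also reaches every document the hub lists, which is consistent with the value 2 suggested for this arrangement.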
Steve J. DeRose
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.