XInclude Processing in XSLT

March 28, 2007

Assembling a document from various parts before processing it is a recurring theme in document processing. XML Inclusions (XInclude) is the W3C standard created to support this scenario, but because it is a standalone specification, some piece of software has to implement it before it can be used. The XInclude Processor (XIPr), written in XSLT 2.0, implements XInclude and thus can reduce the number of software packages required when XInclude is used in an environment where XSLT 2.0 is used anyway. XIPr is implemented as a single XSLT 2.0 stylesheet. It can be used standalone in a publishing pipeline or imported as a module into other XSLT code for integrated XInclude processing.

Compound Documents in XML

XML DTDs introduced the concept of entities, which can be used to assemble the distributed physical structures of an XML document into one logical XML document; assembling the various entities is the task of the XML processor. Entities, however, were never very popular in the XML community (except among SGML traditionalists), and XML Schema dropped them entirely. As a replacement, the W3C created XML Inclusions (XInclude) [1], which is defined as a process of merging XML Infosets [2]. An increasing number of XML processors support XInclude, but it is important to realize that XInclude is a separate step in an XML processing pipeline, not an integral part of XML parsing or transformation.

Version 2.0 of XSLT [3] finally makes it possible to implement XInclude in XSLT itself. This could not be done in XSLT 1.0, because that version of the language has no way to access plain-text files [5]. (Some XSLT processors, such as libxslt [4], support XInclude natively, but this is not a mandatory feature of an XSLT processor.) The XInclude Processor (XIPr) [6] is an implementation of XInclude in XSLT 2.0. It was created for XSLidy [7], an XSLT-only presentation package that generates a set of Slidy presentations from an XML document and needed an inclusion facility. Instead of defining and implementing a proprietary solution, XSLidy is now based on XIPr, which is available as a standalone XSLT stylesheet.

XIPr provides XInclude processing in XSLT-only environments. This is useful when XSLT is already a required component of a processing environment and the goal is to minimize the number of technologies needed to support an application.

XInclude Processing Model

XInclude's processing model is straightforward: it takes as input an Infoset that contains XInclude elements and produces as its result an Infoset in which all XInclude elements have been processed (i.e., expanded). This maps well to XSLT's model of transforming trees: the XSLT implementation starts with the tree of the input document and transforms it into a tree in which all XInclude elements have been processed. XSLT's templates support this kind of processing very well, so the XSLT implementation is essentially an identity transform plus templates for processing XInclude elements.
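To make the identity-transform idea concrete outside XSLT, here is a minimal Python sketch using xml.etree.ElementTree. It is not XIPr's code: it copies every element unchanged, except that each xi:include child is replaced by the (recursively expanded) document it references. The resources dictionary is a hypothetical stand-in for document retrieval; only parse="xml" is handled, relative URIs and xpointer attributes are ignored, and inclusion loops (a fatal error in XInclude) are detected by href alone.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"

def expand(elem, resources, active=()):
    """Return an expanded copy of elem: an identity copy in which every
    xi:include child is replaced by the document it references.

    resources -- hypothetical dict mapping href -> XML source string
    active    -- hrefs currently being expanded (for loop detection)
    """
    # Identity part: copy the element's name, attributes, and text.
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        if child.tag == XI + "include":
            href = child.get("href")
            if href in active:
                raise ValueError(f"inclusion loop on {href!r}")
            included = ET.fromstring(resources[href])
            # Included documents may themselves contain xi:include elements.
            replacement = expand(included, resources, active + (href,))
            replacement.tail = child.tail  # keep text that followed the include
            out.append(replacement)
        else:
            out.append(expand(child, resources, active))
    return out
```

In XSLT, the same structure comes from an identity template combined with a more specific template that matches xi:include.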

Since XInclude is defined on the Infoset, it needs to address all information items defined by the Infoset, including unparsed entities and notations. XIPr, however, is implemented in XSLT and is therefore based on XSLT's stripped-down data model, which (in terms of node kinds) is a subset of the Infoset. As a result, the implementation does not have to deal with unparsed entities and notations, because they are not part of the tree that XSLT processes. (Strictly speaking, unparsed entities of the input document are accessible through the unparsed-entity-uri() function, but XSLT provides no way to produce unparsed entities in the result tree.)

Resource Types

XInclude handles two types of resources: XML documents and plain-text documents. Either can be included in the document being processed. XML documents are included as a new fragment of the result tree, whereas plain-text documents are included as a text node. The type of the resource to be included is indicated by the parse attribute of the XInclude element; permitted values are xml (the default) and text. This is where XSLT 1.0 falls short, because it can only access XML documents. XSLT 2.0 adds the unparsed-text() function, which gives a stylesheet access to plain-text files.
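The dispatch on the parse attribute can be sketched in Python; the function plays roughly the role that doc() and unparsed-text() play in XSLT 2.0. The function name, the base parameter, and the UTF-8 default for text resources are assumptions of this sketch. Any parse value other than xml or text is rejected, since XInclude treats it as a fatal error.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def load_resource(href, parse="xml", encoding=None, base="."):
    """Load an include target as requested by XInclude's parse attribute.

    parse="xml"  -> the document element of the parsed document
    parse="text" -> a str, i.e. the content of a single text node
    base and the UTF-8 default are assumptions of this sketch.
    """
    path = Path(base) / href
    if parse == "xml":
        return ET.parse(path).getroot()
    if parse == "text":
        # The encoding attribute, if present, overrides the default.
        return path.read_text(encoding=encoding or "utf-8")
    # Any other value of parse is a fatal error in XInclude.
    raise ValueError(f"unsupported parse value: {parse!r}")
```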

XInclude supports not only the inclusion of complete XML and plain-text documents but also the inclusion of fragments of XML documents. This is very useful when assembling documents from parts of other documents.

Fragment Identifiers

While identifying fragments of XML documents is a useful facility (and one that every XInclude implementation must support), the history and current status of the language for doing this are less than perfect. The XML Pointer Language (XPointer) [8] was created as part of an effort to build a hypertext-friendly family of XML technologies, with the XML Linking Language (XLink) providing an XML-based hyperlink notation and XPointer serving as its counterpart for addressing fragments within XML documents. XPointer's goal was to identify arbitrary ranges within XML documents (basically, anything users might select with the mouse). This turned out to be a hard problem. Eventually, XPointer was split into multiple parts and the basic functionality was finalized, but the more advanced range locations were never finished.

The XPointer framework itself specifies shorthand pointers, which are equivalent to HTML's fragment identifiers. A shorthand pointer consists of a single name after the # that separates the resource name from the fragment identifier, and it resolves to the element whose ID is that name. The IDness of an attribute can be inferred from a DTD, from an XML Schema, or from some other source of information (for example, an xml:id attribute). The following URI uses a shorthand pointer:

http://www.w3.org/TR/2006/REC-xinclude-20061115/REC-xinclude-20061115.xml#xml-included-items
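Resolving a shorthand pointer amounts to an ID lookup. The Python sketch below (the function name is hypothetical) uses ElementTree, which does not read DTDs or schemas and therefore cannot infer IDness on its own. As an assumption of the sketch, it treats xml:id as an ID attribute and lets the caller name additional attributes (such as id) that are known to be IDs.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def resolve_shorthand(root, name, id_attrs=(XML_ID,)):
    """Return the element whose ID equals name, or None if there is none.

    id_attrs lists the attributes treated as IDs; by default only xml:id,
    since ElementTree cannot infer IDness from a DTD or schema.
    """
    for elem in root.iter():  # document order, starting with root itself
        if any(elem.get(attr) == name for attr in id_attrs):
            return elem
    return None

doc = ET.fromstring(
    '<spec><div2 xml:id="xml-included-items"><head>Included Items</head></div2></spec>')
print(resolve_shorthand(doc, "xml-included-items").tag)  # div2
```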

A more advanced form of fragment identification is specified in the XPointer element() scheme [9] and must be supported by an XInclude implementation. First of all, the shorthand notation of the XPointer framework is also supported, but with a different syntax:

http://www.w3.org/TR/2006/REC-xinclude-20061115/REC-xinclude-20061115.xml#element(xml-included-items)

The interesting concept of the element scheme, though, is that of child sequences. They allow the identification of elements that have not been assigned an explicit ID, by navigating to them as a path expression (similar to XPath, but more limited and with a different syntax) of child steps. For example, the named fragment used in the previous examples could also be identified by the following XPointer:

http://www.w3.org/TR/2006/REC-xinclude-20061115/REC-xinclude-20061115.xml#element(/1/2/4/6)

This XPointer child sequence is equivalent to the XPath /*[1]/*[2]/*[4]/*[6]: it identifies the sixth child (the xml-included-items div2) of the fourth child (the div1) of the second child (the body) of the first child (the spec document element) of the root node. The advantage of child sequences is that they work for elements that do not have an ID. The obvious disadvantage is that they are brittle and break easily when the document is modified. For somewhat better stability, IDs and child sequences can be combined, resulting in XPointers like this one:

http://www.w3.org/TR/2006/REC-xinclude-20061115/REC-xinclude-20061115.xml#element(xml-included-items/1)

In this case, the XPointer selects an element by first finding the element with the specified ID and then navigating the child sequence relative to that element. In this example, the result is the first child of the identified div2, which is its head.
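Because element() pointers map directly onto XPath, an XSLT 2.0 implementation could translate them into XPath expressions. The Python sketch below shows one possible translation, not necessarily the one XIPr uses: an optional leading name becomes a call to the XPath id() function, and each child-sequence step n becomes /*[n]. The regular expression only approximates the NCName syntax, and the generated id() call only finds elements if the processor knows which attributes are IDs.

```python
import re

# Optional leading name (approximating an NCName) followed by a child
# sequence of positive integers, e.g. "xml-included-items/1" or "/1/2/4/6".
_ELEMENT_POINTER = re.compile(
    r"element\((?P<name>[^/\s()']+)?(?P<steps>(?:/[1-9][0-9]*)*)\)")

def element_pointer_to_xpath(pointer):
    """Translate an XPointer element() pointer into an equivalent XPath."""
    m = _ELEMENT_POINTER.fullmatch(pointer)
    if m is None or not (m.group("name") or m.group("steps")):
        raise ValueError(f"not an element() pointer: {pointer!r}")
    # Each child-sequence step n selects the n-th element child: /*[n].
    path = "".join(f"/*[{n}]" for n in m.group("steps").split("/")[1:])
    if m.group("name"):
        # A leading name is resolved as an ID, like a shorthand pointer.
        return f"id('{m.group('name')}'){path}"
    return path

print(element_pointer_to_xpath("element(/1/2/4/6)"))              # /*[1]/*[2]/*[4]/*[6]
print(element_pointer_to_xpath("element(xml-included-items/1)"))  # id('xml-included-items')/*[1]
```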

Using XInclude

A longer article about XInclude [2] has already covered how to use it in detail, so the following examples only briefly illustrate its main usages:

<xi:include href="example.xml"/>	Include the XML document `example.xml` from the same location as the source document.
<xi:include href="example.xml" xpointer="element(id754)"/>	Include the `element(id754)` fragment (i.e., the element with the ID `id754`) of the XML document `example.xml` from the same location as the source document.
<xi:include href="example.txt" parse="text" [ encoding="US-ASCII" ] />	Include the text document `example.txt` from the same location as the source document; if necessary, the character encoding of the text document can be explicitly specified.
<xi:include href="example.xml"> <xi:fallback>could not include "example.xml"</xi:fallback> </xi:include>	Include the XML document `example.xml` from the same location as the source document. If the document cannot be included, use the content of the `fallback` element instead. (Without `fallback`, failure to include a resource results in a fatal error.)
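The fallback rule in the last example can also be sketched in Python. Here load(href) is a hypothetical caller-supplied loader that raises OSError when a resource cannot be retrieved, and FatalIncludeError is a name made up for illustration. Following ElementTree's text model, the fallback content is returned as a list of strings (text) and copied elements. xi:include elements nested inside the fallback are not processed in this sketch, although a complete processor would also need to expand them.

```python
import copy
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"

class FatalIncludeError(Exception):
    """Hypothetical error: resource unavailable and no xi:fallback given."""

def include_with_fallback(include, load):
    """Return the nodes that replace one xi:include element.

    load(href) is a hypothetical loader that raises OSError on failure.
    Text is returned as str items, elements as copied Element items.
    """
    href = include.get("href")
    try:
        return [load(href)]
    except OSError:
        fallback = include.find(XI + "fallback")
        if fallback is None:
            # Without xi:fallback, a resource error becomes a fatal error.
            raise FatalIncludeError(f"could not include {href!r}")
    # Use the content of xi:fallback instead of the resource.
    nodes = [fallback.text] if fallback.text else []
    for child in fallback:
        clone = copy.deepcopy(child)
        tail, clone.tail = clone.tail, None  # emit trailing text separately
        nodes.append(clone)
        if tail:
            nodes.append(tail)
    return nodes

inc = ET.fromstring('<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="example.xml">'
                    '<xi:fallback>could not include "example.xml"</xi:fallback></xi:include>')

def missing(href):
    raise FileNotFoundError(href)

print(include_with_fallback(inc, missing))  # ['could not include "example.xml"']
```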

Using XIPr

XIPr is implemented as a single XSLT 2.0 stylesheet. It can be used on its own as an XSLT implementation of XInclude, or it can be imported into other XSLT code for integrated XInclude processing. XIPr contains a template in the default mode that starts XInclude processing at the document element of the input document:

<xsl:template match="/*">
  <xsl:apply-templates select="." mode="xipr"/>
</xsl:template>

When XIPr is used within another stylesheet, it should be included with xsl:import, so that XIPr's templates have a lower import precedence than those of the importing stylesheet. XInclude processing can then be initiated for any node at any point, following the pattern of XIPr's own template:

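<xsl:template match="/*">
  <!-- do something ... -->
  <!-- $xinclude-candidates is a placeholder for whatever nodes should be
       XInclude-processed, e.g. a variable bound to a node sequence -->
  <xsl:apply-templates select="$xinclude-candidates" mode="xipr"/>
  <!-- do something else ... -->
</xsl:template>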

XIPr reports problems with XSLT's xsl:message instruction, whose output goes to the console or a similar output device rather than into the result document. When XIPr encounters a fatal error (as defined by the XInclude specification), it also uses xsl:message, with terminate="yes", to stop processing.

XIPr depends on the stylesheet processor's ability to access and retrieve documents, so if documents other than files in the filesystem have to be accessed, it is important to check that the XSLT processor of choice supports the required URI schemes.

The important thing to notice is that XIPr processing has to be initiated using the xipr mode so that the templates in the XIPr stylesheet are used for processing the selected nodes.