XML.com: XML From the Inside Out


XInclude Processing in XSLT

March 28, 2007

Assembling various parts of a document before processing the assembled document is a recurring theme in document processing. XML Inclusions (XInclude) is the W3C standard created to support this scenario, but since it is a standalone specification, it needs to be supported by a piece of software implementing this functionality. The XInclude Processor (XIPr), written in XSLT 2.0, implements XInclude and thus may help to reduce the dependency on numerous software packages if XInclude is used in an environment where XSLT 2.0 is used anyway. XIPr is implemented as a single XSLT 2.0 stylesheet. It can be used standalone in a publishing pipeline or as an imported module in some other XSLT code for integrated XInclude processing.

Compound Documents in XML

XML DTDs introduced the concept of entities, which could be used for assembling distributed physical structures of an XML document into one logical XML document. The XML processor has the task of assembling the various entities. Entities, however, were never very popular in the XML community (except among the SGML traditionalists) and thus were completely removed in XML Schema. As a replacement, the W3C came up with XML Inclusions (XInclude) [1], which is defined as a process of merging XML Infosets [2]. An increasing number of XML processors support XInclude, but it is important to realize that XInclude is a separate step of an XML processing pipeline, not an integral part of XML parsing or transformation.

The new 2.0 version of XSLT [3] finally allows you to implement XInclude in XSLT itself. Some XSLT processors, such as libxslt [4], support XInclude, but that is not a mandatory part of an XSLT processor. An implementation in the language itself could not be written in version 1.0, because in that version it was impossible to access plain-text files [5]. The XInclude Processor (XIPr) [6] is an implementation of XInclude in XSLT 2.0. It was created as part of an XSLT-only tool that requires an inclusion facility: the XSLidy [7] presentation package, which uses XSLT to generate a set of Slidy presentations out of an XML document. Instead of defining and implementing a proprietary solution, XSLidy is now based on XIPr, which is available as a standalone XSLT stylesheet.

XIPr provides XInclude processing in XSLT-only environments, which can be useful if XSLT is already a required component of a processing environment and the goal is to minimize the number of technologies needed to support an application.

XInclude Processing Model

XInclude's processing model is straightforward: it takes as input an Infoset that includes XInclude elements and produces as its result an Infoset in which all XInclude elements have been processed (i.e., expanded). This maps well to XSLT's model of transforming trees, so in the XSLT implementation of XInclude, the XInclude process starts with the tree of some input document and transforms it into a tree where all XInclude elements have been processed. XSLT's templates provide excellent support for this kind of processing, so the XSLT implementation is essentially an identity transform that contains templates for processing XInclude elements.
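The shape of such a transform can be sketched outside XSLT as well. The following Python sketch is not XIPr's code; it only mirrors the idea of an identity copy with one special case for XInclude elements. The RESOURCES dictionary and the expand function are hypothetical names introduced for this illustration, and the sketch assumes that every include target is an XML document held in memory.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical in-memory resources standing in for files or URLs.
RESOURCES = {
    "chapter.xml": "<chapter><title>One</title></chapter>",
}


def expand(node):
    """Return a copy of node with every xi:include replaced by its target."""
    if node.tag == XI_INCLUDE:
        # The "XInclude template": load the target and process it in turn,
        # since included content may contain further xi:include elements.
        result = expand(ET.fromstring(RESOURCES[node.get("href")]))
    else:
        # The "identity template": copy the element, then recurse into children.
        result = ET.Element(node.tag, dict(node.attrib))
        result.text = node.text
        for child in node:
            result.append(expand(child))
    # Text that followed the original node stays in place after its replacement.
    result.tail = node.tail
    return result


source = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    "</book>"
)
expanded = expand(source)
print(ET.tostring(expanded, encoding="unicode"))
```

The sketch leaves out everything a conforming processor also has to handle, such as fallbacks, inclusion loops, and base URI fixup. It is meant only to show why a tree-to-tree copy with one overriding rule fits the processing model.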

Since XInclude is defined on the Infoset, it needs to address all information items defined by the Infoset. This includes unparsed entities and notations, which are also handled by XInclude. However, since XIPr is implemented in XSLT and thus based on XSLT's stripped down data model (which, in terms of node kinds, is a subset of the Infoset), the implementation does not have to deal with unparsed entities and notations, which are stripped from the input document before processing begins (actually, unparsed entities are available from the input tree, but XSLT does not provide any facilities to produce unparsed entities in the result tree).

Resource Types

XInclude handles two types of resources—XML and plain-text documents. Each of these resources can be included in a document being processed. XML documents are included as a new fragment of the result tree, whereas plain-text documents are included as a text node. The type of the resource to be included is indicated using a parse attribute on an XInclude element, and permitted values are xml and text. This is the area where XSLT 1.0 is not able to support XInclude, because XSLT 1.0 is only able to access XML documents. XSLT 2.0 adds the unparsed-text() function, which provides access to plain-text files out of an XSLT stylesheet.
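The difference between the two resource types can be illustrated with a small Python sketch. This is an illustration only, not how XIPr does it (XIPr relies on XSLT 2.0's unparsed-text() for the text case). The load_resource function and the RESOURCES dictionary are hypothetical names, and the resource content is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory resource standing in for a file or URL.
RESOURCES = {"note.xml": "<note>Remember the <em>milk</em>.</note>"}


def load_resource(href, parse="xml"):
    """Load a resource as a tree (parse="xml") or as characters (parse="text")."""
    raw = RESOURCES[href]
    if parse == "xml":
        return ET.fromstring(raw)  # becomes a new fragment of the result tree
    if parse == "text":
        return raw  # becomes the content of a single text node
    raise ValueError(f"unsupported parse value: {parse!r}")


as_tree = load_resource("note.xml")                # "xml" is the default
as_text = load_resource("note.xml", parse="text")

# Placing the text in an element shows that its markup is no longer markup:
listing = ET.Element("pre")
listing.text = as_text
serialized = ET.tostring(listing, encoding="unicode")
print(serialized)
```

Loading the same resource both ways shows the distinction: with parse="xml" the em element is a node in the tree, whereas with parse="text" the angle brackets are ordinary characters that are escaped when the result is serialized.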

XInclude not only supports the inclusion of XML and plain-text documents, it also supports the inclusion of fragments of XML documents. This is very useful when assembling documents from parts of other documents.

Fragment Identifiers

While the identification of fragments in XML documents is a useful facility (and must be supported by every XInclude implementation), the history and current status of the language for doing this are less than perfect. The XML Pointer Language (XPointer) [8] was created in an effort to create a hypertext-friendly environment of XML technologies, using the XML Linking Language (XLink) for an XML-based hyperlink notation and XPointer as the counterpart for addressing fragments within XML documents. XPointer's goal was to identify arbitrary ranges within XML documents (basically, everything users might mark with a mouse selection). This turned out to be a hard problem. Finally, the XPointer language was split into multiple parts and the basic functionality was finalized; the more advanced range locations were never finished.

The XPointer framework itself specifies shorthand pointers, which are equivalent to HTML's fragment identifiers. They consist of a single name after the # separating the resource name from the fragment identifier, and they are resolved to the element with the ID with that name (the IDness of an attribute can be inferred from a DTD, an XML Schema, or some other source of information—for example, if it is specified in an xml:id attribute).
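A minimal Python sketch of shorthand pointer resolution follows. It assumes that IDs are declared with xml:id only (no DTD or schema is consulted), and the document, the reference article.xml#details, and the function name resolve_shorthand are all made up for the illustration. In an XInclude element itself, the pointer is given in a separate xpointer attribute rather than after a # in the href.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_shorthand(root, name):
    """Return the first element whose xml:id equals name, or None."""
    for element in root.iter():
        if element.get(XML_ID) == name:
            return element
    return None


# Hypothetical target document.
target = ET.fromstring(
    "<article>"
    '<section xml:id="intro"><para>Hello</para></section>'
    '<section xml:id="details"><para>More</para></section>'
    "</article>"
)

# Split a reference into the resource name and the fragment identifier.
resource, fragment = urldefrag("article.xml#details")
selected = resolve_shorthand(target, fragment)
print(resource, fragment, selected.find("para").text)
```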


A more advanced form of fragment identification is specified in the XPointer element() scheme [9] and must be supported by an XInclude implementation. First of all, the shorthand notation of the XPointer framework is also supported, but with a different syntax: the ID is written as the argument of the scheme, in the form element(name).


The interesting concept of the element scheme, though, is that of child sequences. They allow the identification of elements that have not been assigned an explicit ID, by navigating to them as a path expression (similar to XPath, but more limited and with a different syntax) of child steps. For example, the named fragment used in the previous examples could also be identified by the following XPointer:
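As a hypothetical illustration (the document below is made up for this sketch), element(/1/2) addresses the second child element of the document element, and element(details/2) addresses the second child element of the element whose ID is details. The following Python sketch resolves both forms. It assumes IDs come from xml:id only and uses a simplified pattern for names, so it is an approximation of the scheme rather than a conforming implementation. The function name resolve_element_pointer is an assumption of this example.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified grammar: an optional name followed by zero or more /n steps.
POINTER = re.compile(r"element\(([^/()]*)((?:/[1-9][0-9]*)*)\)")


def resolve_element_pointer(root, pointer):
    """Resolve element(name), element(/1/2) or element(name/2); None if no match."""
    match = POINTER.fullmatch(pointer)
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"not an element() pointer: {pointer!r}")
    name = match.group(1)
    steps = [int(step) for step in match.group(2).split("/")[1:]]
    if name:
        # Start at the element carrying the given ID.
        current = next((e for e in root.iter() if e.get(XML_ID) == name), None)
    else:
        # A bare child sequence starts above the document element,
        # so its first step can only be 1.
        current = root if steps.pop(0) == 1 else None
    for step in steps:
        if current is None:
            break
        children = list(current)  # child elements only, in document order
        current = children[step - 1] if step <= len(children) else None
    return current


doc = ET.fromstring(
    "<article>"
    '<section xml:id="intro"><para>Hello</para></section>'
    '<section xml:id="details"><para>More</para><para>Even more</para></section>'
    "</article>"
)
by_name = resolve_element_pointer(doc, "element(details)")
by_path = resolve_element_pointer(doc, "element(/1/2)")
mixed = resolve_element_pointer(doc, "element(details/2)")
print(by_name is by_path, mixed.text)
```

Child sequences count element children only, which is why the sketch can index directly into the list of child elements. The trade-off is that a child sequence breaks as soon as the target document is restructured, whereas an ID-based pointer keeps working.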

