July 31, 2002
Elliotte Rusty Harold is the coauthor of XML in a Nutshell, 2nd edition.
It's often convenient to divide long documents into multiple files. The classic example is a book, which is customarily divided into chapters. Each chapter may be further subdivided into sections. Traditionally, XML external entity references have been used to support this kind of document division. For example, this book has three chapters, each stored in a separate file:
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd" [
  <!ENTITY chapter1 SYSTEM "malapropisms.xml">
  <!ENTITY chapter2 SYSTEM "mispronunciations.xml">
  <!ENTITY chapter3 SYSTEM "madeupwords.xml">
]>
<book>
  <title>The Wit and Wisdom of George W. Bush</title>
  &chapter1;
  &chapter2;
  &chapter3;
</book>
However, external entity references have a number of limitations. Among them:
The individual component files cannot be used independently of the master document. They are not themselves complete, well-formed XML documents. For instance, they cannot have XML declarations or document type declarations and often do not have a single root element.
If any of the pieces are missing, then the entire document is malformed. There's no option for error recovery.
An entity reference cannot point to a plain text file such as an example Java program or HTML document. Only well-formed XML can be included.
XInclude is an emerging W3C specification for building large XML documents out of multiple well-formed XML documents, independently of validation. Each piece can be a complete XML document, a fragmentary XML document, or a non-XML text document like a Java program or an e-mail message.
XInclude references external documents to be included with include elements in the http://www.w3.org/2001/XInclude namespace. The prefix xi is customary though not required. Each xi:include element has an href attribute that contains a URL pointing to the file to include. For example, the previous book example can be rewritten like this:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>
Of course you can also use absolute URLs where appropriate:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="http://www.whitehouse.gov/malapropisms.xml"/>
  <xi:include href="http://www.whitehouse.gov/mispronunciations.xml"/>
  <xi:include href="http://www.whitehouse.gov/madeupwords.xml"/>
</book>
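As a rough illustration of what an XInclude processor does with such a document, here is a minimal Python sketch using the standard library's xml.etree.ElementInclude module. The chapter contents and the load_chapter function are hypothetical stand-ins that keep everything in memory; a real processor would fetch the files named by each href.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical chapter contents, kept in memory so the sketch needs no files.
CHAPTERS = {
    "malapropisms.xml": "<chapter><title>Malapropisms</title></chapter>",
    "mispronunciations.xml": "<chapter><title>Mispronunciations</title></chapter>",
    "madeupwords.xml": "<chapter><title>Made-up Words</title></chapter>",
}

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>"""


def load_chapter(href, parse, encoding=None):
    """Loader callback: return the parsed root element for the given href."""
    return ET.fromstring(CHAPTERS[href])


book = ET.fromstring(BOOK)
# Replace each xi:include element in place with the element the loader returns.
ElementInclude.include(book, loader=load_chapter)

chapter_titles = [chapter.findtext("title") for chapter in book.findall("chapter")]
```

After the call, the book element holds three chapter children and no xi:include elements remain.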
XInclude processing is recursive. That is, an included document can itself include another document. For example, a book might be divided into front matter, back matter, and several parts:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="frontmatter.xml"/>
  <xi:include href="part1.xml"/>
  <xi:include href="part2.xml"/>
  <xi:include href="part3.xml"/>
  <xi:include href="backmatter.xml"/>
</book>
Each part might be further divided into a part intro and several chapters:
<?xml version="1.0"?>
<part xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro1.xml"/>
  <xi:include href="ch01.xml"/>
  <xi:include href="ch02.xml"/>
  <xi:include href="ch03.xml"/>
  <xi:include href="ch04.xml"/>
</part>
There's no limit to how deep this can go. Only circular inclusion (Document A includes Document B which includes, directly or indirectly, Document A) is forbidden. When an XInclude processor reads an XML document it resolves all references and returns a document that contains no XInclude elements.
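The recursive behavior and the ban on circular inclusion can be sketched in Python with the standard library's xml.etree.ElementInclude. This assumes Python 3.9 or later, where that module resolves nested includes and rejects cycles; it also caps nesting depth by default, which is an implementation choice rather than something the text above requires. The documents and the load function are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical documents: a book includes a part, which includes a chapter.
DOCS = {
    "part1.xml": f'<part {XI_NS}><xi:include href="ch01.xml"/></part>',
    "ch01.xml": "<chapter>Chapter 1</chapter>",
    # a.xml and b.xml include each other, which is forbidden.
    "a.xml": f'<a {XI_NS}><xi:include href="b.xml"/></a>',
    "b.xml": f'<b {XI_NS}><xi:include href="a.xml"/></b>',
}


def load(href, parse, encoding=None):
    """Loader callback serving the in-memory documents above."""
    return ET.fromstring(DOCS[href])


book = ET.fromstring(f'<book {XI_NS}><xi:include href="part1.xml"/></book>')
ElementInclude.include(book, loader=load)
# The chapter is reached through two levels of inclusion.
nested_text = book.findtext("part/chapter")

try:
    ElementInclude.include(load("a.xml", "xml"), loader=load)
    cycle_detected = False
except ElementInclude.FatalIncludeError:
    # Raised when a document ends up including itself, directly or indirectly.
    cycle_detected = True
```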
Technical articles like this one often need to include example code: programs, XML documents, e-mail messages, and so on. Within these examples, characters like < and & should be treated as raw text rather than parsed as markup. To include a document as text, you have to add a parse="text" attribute to the xi:include element. For example, this fragment loads the source code for the Java program SpellChecker.java from the examples directory into a code element:
<code> <xi:include parse="text" href="examples/SpellChecker.java" /> </code>
Processes that are downstream from the XInclusion will see the complete text of the file SpellChecker.java just as they would any other text. For instance, such data would be passed to a SAX ContentHandler's characters() method. This is pretty much exactly the way a parser would treat the content if it were typed in directly inside a CDATA section.
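To make the effect of parse="text" concrete, here is a small Python sketch using the standard library's xml.etree.ElementInclude. The one-line Java snippet is a hypothetical stand-in for the real SpellChecker.java, and load_text is an assumed in-memory loader.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the contents of examples/SpellChecker.java.
JAVA_SOURCE = 'if (a < b && word != null) { System.out.println("<ok>"); }'

DOC = """<code xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include
    parse="text" href="examples/SpellChecker.java"/></code>"""


def load_text(href, parse, encoding=None):
    """Loader callback: for parse="text" it must return a string, not an element."""
    return JAVA_SOURCE


code = ET.fromstring(DOC)
ElementInclude.include(code, loader=load_text)

# The included file is now ordinary character data of the code element.
included_text = code.text
# Serializing escapes the markup characters, as with any other text content.
serialized = ET.tostring(code, encoding="unicode")
```

The < and & characters in the Java source arrive as plain character data. They are never parsed as markup, and they are escaped again only when the tree is serialized.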
Servers crash. Network connections fail. The domain name system gets congested. For these reasons and many others, documents included from remote servers may be temporarily unavailable. The default action for an XInclude processor in such a case is simply to give up and report a fatal error. However, the xi:include element may contain an xi:fallback element which contains alternate content to be used if the requested resource cannot be found. For example, this xi:include element tries to load the file at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it provides some literal content instead:
<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <para>
      This administration is doing everything we can to end the stalemate in an
      efficient way. We're making the right decisions to bring the solution to an end.
    </para>
  </xi:fallback>
</xi:include>
An xi:fallback element can even contain another xi:include element. For example, this xi:include element begins by attempting to include the document at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it will try the URL given in the nested xi:include element instead:
<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <xi:include href="http://politics.slate.msn.com/default.aspx?id=76886l" />
  </xi:fallback>
</xi:include>
The xi:fallback element is not used if the document can be located but is malformed. That is always a fatal error.
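The fallback rules described above can be sketched with a small hand-rolled resolver in Python. This is only an illustration under stated assumptions, not a conforming processor: the resolve and load functions and the in-memory documents are hypothetical, only element children of xi:fallback are copied (text content is ignored), and documents that load successfully are not scanned for further includes.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"
XI_NS = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical resources: one good backup and one malformed file.
# Any other href counts as "cannot be found".
DOCS = {"backup.xml": "<para>backup copy</para>", "broken.xml": "<para>"}


def load(href):
    if href not in DOCS:
        raise FileNotFoundError(href)      # resource cannot be found
    return ET.fromstring(DOCS[href])       # raises ET.ParseError if malformed


def resolve(parent, loader):
    """Replace xi:include children of parent, using xi:fallback on load failure."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != XI + "include":
            resolve(child, loader)
            i += 1
            continue
        try:
            replacement = [loader(child.get("href"))]
        except OSError:
            fallback = child.find(XI + "fallback")
            if fallback is None:
                raise                      # no fallback: still a fatal error
            resolve(fallback, loader)      # a fallback may hold another xi:include
            replacement = list(fallback)   # simplification: element children only
        parent.remove(child)
        for offset, node in enumerate(replacement):
            parent.insert(i + offset, node)
        i += len(replacement)
    return parent


literal = resolve(ET.fromstring(
    f'<book {XI_NS}><xi:include href="missing.xml">'
    '<xi:fallback><para>literal text</para></xi:fallback></xi:include></book>'), load)

chained = resolve(ET.fromstring(
    f'<book {XI_NS}><xi:include href="missing.xml"><xi:fallback>'
    '<xi:include href="backup.xml"/></xi:fallback></xi:include></book>'), load)
```

Only a missing resource (an OSError in this sketch) triggers the fallback. A malformed document raises ET.ParseError, which is deliberately not caught, so it stays fatal even when a fallback is present.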
XInclusion is not part of XML 1.0 or the XML Infoset. XML parsers do not perform inclusions automatically. To resolve XIncludes, a document must be passed through an XInclude processor that replaces the xi:include elements with the documents they point to. This may be done automatically by a server-side process, or it might be done on the client by an XInclude-aware browser. It may be hooked into a custom SAX program using a SAX filter that resolves the XIncludes. However, if you want this to happen, you need to ask for it and install the necessary software to make it possible.
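The point that inclusion is an explicit, separate step can be seen in a short Python sketch using the standard library. Parsing alone leaves the xi:include element in the tree; it is replaced only when the XInclude pass is requested. The load function is a hypothetical loader that serves one in-memory chapter.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

DOC = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="ch01.xml"/>
</book>"""


def load(href, parse, encoding=None):
    # Hypothetical loader: serves the same in-memory chapter for any href.
    return ET.fromstring("<chapter>Chapter 1</chapter>")


book = ET.fromstring(DOC)                   # plain parsing: no inclusion happens
before = len(book.findall(XI_INCLUDE))      # the xi:include element is still there

ElementInclude.include(book, loader=load)   # the explicit XInclude pass
after = len(book.findall(XI_INCLUDE))       # now replaced by the chapter
```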
One of the most common questions about XInclude is how inclusion interacts with validation, XSL transformation, and other processes that may be applied to an XML document. The short answer is that it doesn't. XInclusion is not part of any other XML process. It is a separate step which you may or may not perform when and where it is useful to you.
For example, consider validation against a schema. A document can be validated before inclusion, after inclusion, or both. If you validate the document before the xi:include elements are replaced, then the schema has to declare the xi:include elements just like it would declare any other element. If you validate the document after the xi:include elements are replaced, then the schema has to declare the replacement elements. Inclusion and validation are separate, orthogonal processes that can be performed in whichever order is convenient.
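The ordering question can be illustrated with a deliberately simplified Python sketch. The is_valid function is a toy stand-in for a real schema validator (it only checks that every element name is in an allowed set), and the two "schemas" and the load function are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def is_valid(root, allowed_tags):
    """Toy validity check: every element name must be declared in allowed_tags."""
    return all(element.tag in allowed_tags for element in root.iter())


def load(href, parse, encoding=None):
    # Hypothetical loader: serves one in-memory chapter for any href.
    return ET.fromstring("<chapter>Chapter 1</chapter>")


book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="ch01.xml"/></book>')

PRE_SCHEMA = {"book", XI_INCLUDE}    # declares xi:include, not chapter
POST_SCHEMA = {"book", "chapter"}    # declares the replacement element instead

valid_before = is_valid(book, PRE_SCHEMA)
post_schema_too_early = is_valid(book, POST_SCHEMA)

ElementInclude.include(book, loader=load)

valid_after = is_valid(book, POST_SCHEMA)
pre_schema_too_late = is_valid(book, PRE_SCHEMA)
```

Each schema accepts the document only at its own stage: the one that declares xi:include works before inclusion, and the one that declares chapter works after it.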
Current support for XInclude is limited, though that is slowly changing. In particular,
Libxml, the XML C library for Gnome <http://xmlsoft.org/>, includes fairly complete support for XInclude.
The Apache Cocoon application server <http://xml.apache.org/cocoon/index.html> can resolve XIncludes in a document before sending it to a client. Processing instructions in the document's prolog control the exact operations performed and the order they're applied in.
The 4Suite XML library for Python <http://4suite.org/> has an option to resolve XIncludes when parsing.
GNU JAXP <http://www.gnu.org/software/classpathx/jaxp/> includes a SAX filter that resolves XIncludes, provided no XPointers are used.