Elliotte Rusty Harold is the coauthor of XML in a Nutshell, 2nd edition.
It's often convenient to divide long documents into multiple files. The classic example is a book, which is customarily divided into chapters. Each chapter may be further subdivided into sections. Traditionally, XML external entity references have been used to support this kind of division. For example, this book has three chapters, each stored in a separate file:
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd" [
  <!ENTITY chapter1 SYSTEM "malapropisms.xml">
  <!ENTITY chapter2 SYSTEM "mispronunciations.xml">
  <!ENTITY chapter3 SYSTEM "madeupwords.xml">
]>
<book>
  <title>The Wit and Wisdom of George W. Bush</title>
  &chapter1;
  &chapter2;
  &chapter3;
</book>
However, external entity references have a number of limitations. Among them:
The individual component files cannot be used independently of the master document. They are not themselves complete, well-formed XML documents. For instance, they cannot have XML declarations or document type declarations and often do not have a single root element.
If any of the pieces are missing, then the entire document is malformed. There's no option for error recovery.
An entity reference cannot point to a plain text file such as an example Java program or HTML document. Only well-formed XML can be included.
XInclude is an emerging W3C specification for building large XML documents out of multiple well-formed XML documents, independently of validation. Each piece can be a complete XML document, a fragmentary XML document, or a non-XML text document like a Java program or an e-mail message.
XInclude references external documents to be included with include elements in the http://www.w3.org/2001/XInclude namespace. The prefix xi is customary though not required. Each xi:include element has an href attribute that contains a URL pointing to the file to include. For example, the previous book example can be rewritten like this:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>
Of course you can also use absolute URLs where appropriate:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="http://www.whitehouse.gov/malapropisms.xml"/>
  <xi:include href="http://www.whitehouse.gov/mispronunciations.xml"/>
  <xi:include href="http://www.whitehouse.gov/madeupwords.xml"/>
</book>
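As a rough illustration of what resolving these elements involves, the following Python sketch runs the relative-URL version of the book through the standard library's xml.etree.ElementInclude module, which offers limited XInclude support. The chapter contents and the in-memory loader function are assumptions made up for this example so that it runs without any files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical chapter contents, keyed by the href values used in the book.
CHAPTERS = {
    "malapropisms.xml": "<chapter><title>Malapropisms</title></chapter>",
    "mispronunciations.xml": "<chapter><title>Mispronunciations</title></chapter>",
    "madeupwords.xml": "<chapter><title>Made-up Words</title></chapter>",
}

BOOK = """<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>"""


def loader(href, parse, encoding=None):
    """Stand-in for file or network access: look the href up in CHAPTERS."""
    return ET.fromstring(CHAPTERS[href])


book = ET.fromstring(BOOK)
# Replace each xi:include child of <book> with the element the loader returns.
ElementInclude.include(book, loader=loader)

titles = [chapter.findtext("title") for chapter in book.findall("chapter")]
print(titles)
```

Swapping the loader for one that opens files or fetches URLs would not change the rest of the sketch; the in-memory dictionary only keeps the example self-contained.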
XInclude processing is recursive. That is, an included document can itself include another document. For example, a book might be divided into front matter, back matter, and several parts:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="frontmatter.xml"/>
  <xi:include href="part1.xml"/>
  <xi:include href="part2.xml"/>
  <xi:include href="part3.xml"/>
  <xi:include href="backmatter.xml"/>
</book>
Each part might be further divided into a part intro and several chapters:
<?xml version="1.0"?>
<part xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro1.xml"/>
  <xi:include href="ch01.xml"/>
  <xi:include href="ch02.xml"/>
  <xi:include href="ch03.xml"/>
  <xi:include href="ch04.xml"/>
</part>
There's no limit to how deep this can go. Only circular inclusion (Document A includes Document B which includes, directly or indirectly, Document A) is forbidden. When an XInclude processor reads an XML document it resolves all references and returns a document that contains no XInclude elements.
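The recursive behavior and the ban on circular inclusion can be sketched in Python with xml.etree.ElementInclude. This assumes Python 3.9 or later, where that module follows nested includes and rejects recursive ones; it also caps nesting depth by default, which is a choice of that library rather than something described here. The document names and contents below, including a.xml and b.xml, are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_DECL = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical documents: a book that includes a part that includes a
# chapter, plus two documents (a.xml and b.xml) that include each other.
DOCS = {
    "book.xml": f'<book {XI_DECL}><xi:include href="part1.xml"/></book>',
    "part1.xml": f'<part {XI_DECL}><xi:include href="ch01.xml"/></part>',
    "ch01.xml": "<chapter>Chapter 1</chapter>",
    "a.xml": f'<a {XI_DECL}><xi:include href="b.xml"/></a>',
    "b.xml": f'<b {XI_DECL}><xi:include href="a.xml"/></b>',
}


def loader(href, parse, encoding=None):
    """Stand-in for file access: parse the named in-memory document."""
    return ET.fromstring(DOCS[href])


def resolve(name):
    """Parse one document and resolve its includes (assumes Python 3.9+)."""
    root = ET.fromstring(DOCS[name])
    ElementInclude.include(root, loader=loader)
    return root


# Nested inclusion: book -> part -> chapter ends up in a single tree.
book = resolve("book.xml")
nested_text = book.find("part/chapter").text

# Circular inclusion: a.xml -> b.xml -> a.xml is reported as an error.
try:
    resolve("a.xml")
    circular_error = None
except ElementInclude.FatalIncludeError as err:
    circular_error = err

print(nested_text)
print(type(circular_error).__name__)
```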
Technical articles like this one often need to include example code: programs, XML and HTML documents, e-mail messages, and so on. Within these examples, characters like < and & should be treated as raw text rather than parsed as markup. To include a document as plain text, add a parse="text" attribute to the xi:include element. For example, this fragment loads the source code for the Java program SpellChecker.java from the examples directory into a code element:
<code> <xi:include parse="text" href="examples/SpellChecker.java" /> </code>
Processes that are downstream from the XInclusion will see the complete
text of the file SpellChecker.java just as they would any other text. For
instance, such data would be passed to a SAX ContentHandler's
characters() method. This is pretty much exactly the way
a parser would treat the content if it were typed in a CDATA section.
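A small Python sketch can make the text-inclusion behavior concrete. It uses xml.etree.ElementInclude from the standard library; the one-line Java source and the loader function are assumptions invented for this example, standing in for the real contents of examples/SpellChecker.java.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical contents of examples/SpellChecker.java; note the < and &&.
JAVA_SOURCE = "if (i < words.length && words[i] != null) check(words[i]);\n"

FRAGMENT = (
    '<code xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="examples/SpellChecker.java"/>'
    "</code>"
)


def loader(href, parse, encoding=None):
    """Stand-in for reading the file; for parse="text" it returns a string."""
    return JAVA_SOURCE


code = ET.fromstring(FRAGMENT)
ElementInclude.include(code, loader=loader)

# The included file is now ordinary character data with no child elements.
print(code.text == JAVA_SOURCE, len(code))

# Serializing escapes the markup characters, so the result stays well-formed.
serialized = ET.tostring(code, encoding="unicode")
print(serialized)
```

Downstream code sees the Java source as a plain string, and the escaping appears only when the tree is written back out as XML.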
Servers crash. Network connections fail. The domain name system gets
congested. For these reasons and many others, documents included from remote
servers may be temporarily unavailable. The default action for an XInclude
processor in such a case is simply to give up and report a fatal
error. However, the
xi:include element may contain an
xi:fallback element which contains alternate content to be used
if the requested resource cannot be found. For example, this
xi:include element tries to load the file at
http://www.whitehouse.gov/malapropisms.xml. However, if somebody
deletes that file, then it provides some literal content instead:
<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <para>
      This administration is doing everything we can to end the stalemate in
      an efficient way. We're making the right decisions to bring the solution
      to an end.
    </para>
  </xi:fallback>
</xi:include>
An xi:fallback element can even include another
xi:include element. For example, this xi:include
element begins by attempting to include the document at
http://www.whitehouse.gov/malapropisms.xml. However, if somebody
deletes that file, then it will try the second URL instead:
<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <xi:include href="http://politics.slate.msn.com/default.aspx?id=76886l" />
  </xi:fallback>
</xi:include>
The xi:fallback element is not used if the document can be
located but is malformed. That is always a fatal error.
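The fallback rules can be sketched with a small hand-rolled resolver in Python. This is a simplified illustration, not a conforming XInclude processor: it handles element content only, ignores parse="text", and does not check for circular inclusion. The example.org URLs, the RESOURCES table, and the helper names (fetch, resolve, include_from) are all hypothetical.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"
XI_DECL = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical resources standing in for remote servers. A URL that is not
# listed here plays the role of a deleted file.
RESOURCES = {
    "http://example.org/mirror.xml": "<para>Mirror copy</para>",
    "http://example.org/broken.xml": "<para>never closed",
}


class ResourceError(Exception):
    """The requested resource could not be found."""


def fetch(href):
    if href not in RESOURCES:
        raise ResourceError(href)
    return RESOURCES[href]


def resolve(parent):
    """Replace the xi:include children of parent in place (simplified)."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != XI + "include":
            resolve(child)
            i += 1
            continue
        try:
            text = fetch(child.get("href"))
        except ResourceError:
            fallback = child.find(XI + "fallback")
            if fallback is None:
                raise  # nothing to fall back on: fatal
            resolve(fallback)  # a fallback may itself contain xi:include
            replacement = list(fallback)
        else:
            # Found: a malformed document raises ET.ParseError here, and the
            # fallback is deliberately not consulted.
            included = ET.fromstring(text)
            resolve(included)
            replacement = [included]
        parent.remove(child)
        for offset, new_child in enumerate(replacement):
            parent.insert(i + offset, new_child)
        i += len(replacement)


def include_from(href, fallback_xml):
    """Build a one-include document with the given fallback and resolve it."""
    doc = ET.fromstring(
        f'<book {XI_DECL}><xi:include href="{href}">'
        f"<xi:fallback>{fallback_xml}</xi:fallback></xi:include></book>"
    )
    resolve(doc)
    return doc


# Missing resource: the literal fallback content is used.
missing = include_from("http://example.org/deleted.xml", "<para>Literal text</para>")

# Missing resource whose fallback is another xi:include.
nested = include_from(
    "http://example.org/deleted.xml",
    '<xi:include href="http://example.org/mirror.xml"/>',
)

# Resource found but malformed: fatal, even though a fallback is present.
try:
    include_from("http://example.org/broken.xml", "<para>Literal text</para>")
    malformed_is_fatal = False
except ET.ParseError:
    malformed_is_fatal = True

print(missing.find("para").text, nested.find("para").text, malformed_is_fatal)
```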
XInclusion is not part of XML 1.0 or the XML Infoset. XML parsers do not
perform inclusions automatically. To resolve XIncludes, a document must be
passed through an XInclude processor that replaces the xi:include
elements with the documents they point to. This may be done automatically by a
server-side process or it might be done on the client side by an
XInclude-aware browser. It may be hooked into a custom SAX program using a SAX
filter that resolves the XIncludes. However, if you want this to happen, you
need to ask for it and install the necessary software to make it possible.
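The point that inclusion has to be requested explicitly can be seen in a short Python sketch using the standard library. Parsing alone leaves the xi:include element in the tree; only the separate call to xml.etree.ElementInclude resolves it. The loader and the chapter content are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

DOC = (
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="malapropisms.xml"/>'
    "</book>"
)


def loader(href, parse, encoding=None):
    """Hypothetical loader returning a made-up chapter for any href."""
    return ET.fromstring("<chapter>Malapropisms</chapter>")


# Step 1: parsing alone. The xi:include element is just another element.
book = ET.fromstring(DOC)
tags_after_parsing = [child.tag for child in book]

# Step 2: inclusion, requested explicitly as a separate pass over the tree.
ElementInclude.include(book, loader=loader)
tags_after_inclusion = [child.tag for child in book]

print(tags_after_parsing)
print(tags_after_inclusion)
```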
One of the most common questions about XInclude is how inclusion interacts with validation, XSL transformation, and other processes that may be applied to an XML document. The short answer is that it doesn't. XInclusion is not part of any other XML process. It is a separate step which you may or may not perform when and where it is useful to you.
For example, consider validation against a schema. A document can be
validated before inclusion, after inclusion, or both. If you validate the
document before the
xi:include elements are replaced, then the
schema has to declare the
xi:include elements just like it would
declare any other element. If you validate the document after the
xi:include elements are replaced, then the schema has to declare
the replacement elements. Inclusion and validation are separate, orthogonal
processes that can be performed in any order that is convenient in the local environment.
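The effect of ordering can be sketched in Python with a deliberately tiny stand-in for schema validation. The children_allowed function and the two sets of permitted tags are toy assumptions, not a real schema language; they only show that a "schema" suited to the document before inclusion differs from one suited to the document after inclusion.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def children_allowed(parent, allowed_tags):
    """Toy stand-in for validation: every child tag must be in allowed_tags."""
    return all(child.tag in allowed_tags for child in parent)


# Two hypothetical "schemas" for the children of the book element.
PRE_INCLUSION = {"title", XI_INCLUDE}   # declares xi:include itself
POST_INCLUSION = {"title", "chapter"}   # declares the replacement elements


def loader(href, parse, encoding=None):
    """Hypothetical loader returning a made-up chapter for any href."""
    return ET.fromstring("<chapter>Malapropisms</chapter>")


book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    "<title>The Wit and Wisdom of George W. Bush</title>"
    '<xi:include href="malapropisms.xml"/>'
    "</book>"
)

# Check against both toy schemas before and after the inclusion step.
before = (children_allowed(book, PRE_INCLUSION), children_allowed(book, POST_INCLUSION))
ElementInclude.include(book, loader=loader)
after = (children_allowed(book, PRE_INCLUSION), children_allowed(book, POST_INCLUSION))

print(before, after)
```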
Current support for XInclude is limited, though that is slowly changing. In particular:
Libxml, the XML C library for Gnome <http://xmlsoft.org/>, includes fairly complete support for XInclude.
The Apache Cocoon application server <http://xml.apache.org/cocoon/index.html> can resolve XIncludes in a document before sending it to a client. Processing instructions in the document's prolog control the exact operations performed and the order they're applied in.
The 4Suite XML library for Python <http://4suite.org/> has an option to resolve XIncludes when parsing.
GNU JAXP <http://www.gnu.org/software/classpathx/jaxp/> includes a SAX filter that resolves XIncludes, provided no XPointers are used.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.