Mastering DocBook Indexes
These days DocBook is considered to be a standard documentation format. Good documentation should be accompanied by a good index. This article will show you how to create professional indexes in DocBook and how to deal with indexes in languages other than English.
DocBook succeeded because it is supported by plenty of tools, many of them free. XML editors have matured over the years and now offer comfortable editing environments. There are even free WYSIWYG editors available, such as XMLmind XML Editor. The freely available DocBook XSL stylesheets convert DocBook to a variety of output formats, including HTML, XHTML, print (PDF, PostScript), HTML Help, and JavaHelp. Moreover, the output of the stylesheets can be easily customized through a large number of parameters.
The usability of a document, especially a printed document, can be boosted by a good index. Creating an index is a very laborious task, often performed by specialists. Unfortunately, in the area of open-source projects, samizdat publications, or books targeted at small language markets, you often have to cope with very limited resources, both financial and human. For that reason I am going to show you how to create and process an index in a DocBook document yourself. In the remainder of this article you will see how to generate indexes for non-English languages, how to put several indexes into a single document, and finally how to turn semantic markup into index entries quite easily.
Marking up Index Entries
The most difficult part of creating an index must be done manually and consists of marking up index entries in a document. In DocBook this is done by placing indexterm elements wherever you write about the given topic. The content of an indexterm is not displayed as part of the document flow; it is used later when building the index.
<para>The wealth of modern societies is built upon information
<indexterm><primary>information</primary></indexterm>.</para>
The indexterm element can
also hold multilevel entries:
<indexterm>
<primary>information</primary>
</indexterm>
...
<indexterm>
<primary>information</primary>
<secondary>retrieval</secondary>
</indexterm>
...
<indexterm>
<primary>information</primary>
<secondary>dissemination</secondary>
</indexterm>
...
<indexterm>
<primary>information</primary>
<secondary>dissemination</secondary>
<tertiary>oral</tertiary>
</indexterm>
Such index terms will result in the following index output (the page numbers are, of course, for illustration only):
information, 13
    dissemination, 17
        oral, 25
    retrieval, 15
If a large chunk of the document corresponds to a certain topic, we can use a special mode of indexterm to assign a whole range of the document to that topic. In this scenario, two indexterm elements mark the start and the end of the range. A unique identifier establishes the relation between the two elements.
<indexterm class="startofrange" id="ix.xml.history">
<primary>XML</primary>
<secondary>history</secondary>
</indexterm>
... other DocBook markup and text describing the history of XML ...
<indexterm class="endofrange" startref="ix.xml.history"/>
In the resulting index we will see something like this:
XML
    history, 27–42
If an entry should be sorted differently from the way it is displayed, we can use the sortas attribute. During index grouping and sorting, the text of the entry is ignored and the value of the sortas attribute is used instead. This is useful when an index entry contains special symbols that should be sorted, for example, by their phonetic representation. The following example creates an index entry that is displayed as the Greek letter Ω but is placed in the index where the word "Omega" would appear.
<indexterm>
<primary sortas="Omega">Ω</primary>
</indexterm>
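The same attribute can help with entries that begin with punctuation or markup characters. The following is only a minimal sketch; the terms and the sortas values are chosen purely for illustration and do not come from any particular document.

```xml
<!-- Displayed as "<!DOCTYPE>", but grouped and sorted as "DOCTYPE" -->
<indexterm>
  <primary sortas="DOCTYPE">&lt;!DOCTYPE&gt;</primary>
</indexterm>

<!-- sortas can also be set on the lower levels of a multilevel entry -->
<indexterm>
  <primary>symbols</primary>
  <secondary sortas="pi">π</secondary>
</indexterm>
```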
If some occurrences of a term in the index should be emphasized (e.g., the number of the page containing the definition of the term should be bold), we can specify the significance of each entry with the significance attribute.
<indexterm significance="preferred">
<primary>information</primary>
</indexterm>
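As a sketch of how this might look in practice, assume that a term is defined in one place and merely mentioned in another (both paragraphs are invented for illustration). Only the defining occurrence is marked as preferred; the other one uses the default value, normal, which is written out here only for clarity. How the preferred reference is actually highlighted depends on the stylesheets in use.

```xml
<!-- The defining occurrence: its page reference should stand out -->
<para>Information is data placed into context.
<indexterm significance="preferred">
  <primary>information</primary>
</indexterm></para>

<!-- A passing mention: an ordinary page reference -->
<para>Libraries have always been in the information business.
<indexterm significance="normal">
  <primary>information</primary>
</indexterm></para>
```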
If an index term should not point to a particular page number,
or an anchor in HTML output, but rather to a different term,
we can utilize the see and
seealso elements.
<indexterm>
<primary>DTD</primary>
</indexterm>
<indexterm>
<primary>document type definition</primary>
<see>DTD</see>
</indexterm>
<indexterm>
<primary>XML Schema</primary>
<seealso>DTD</seealso>
</indexterm>
This results in:
- D -
document type definition, see DTD
DTD, 42
- X -
XML Schema, 81, see also DTD
Up to this point we have covered most of DocBook's capabilities for marking up index entries. I left out the zone attribute, which can be used to place index entries outside the document flow. I personally do not consider this method useful for handmade indexes, but if you are interested, you can read more about it in the documentation.
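For completeness, here is a rough sketch of what the zone attribute looks like. The chapter and its id are hypothetical; the point is only that the indexterm refers to an element by its ID instead of sitting inside the text it indexes.

```xml
<chapter id="ch.xml.history">
  <title>History of XML</title>
  <para>...</para>
</chapter>

<!-- Kept elsewhere in the document; zone lists the ID (or several
     space-separated IDs) of the elements the entry applies to -->
<indexterm zone="ch.xml.history">
  <primary>XML</primary>
  <secondary>history</secondary>
</indexterm>
```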