XML.com: XML From the Inside Out


Mastering DocBook Indexes

July 14, 2004

These days DocBook is considered to be a standard documentation format. Good documentation should be accompanied by a good index. This article will show you how to create professional indexes in DocBook and how to deal with indexes in languages other than English.

DocBook succeeded because it is supported by plenty of tools, many of them free. XML editors have matured over the years and now offer comfortable editing environments. There are even free WYSIWYG editors available, such as XMLmind XML Editor. The freely available DocBook XSL stylesheets convert DocBook to a variety of output formats, including HTML, XHTML, print (PDF, PostScript), HTML Help, and JavaHelp. Moreover, the output of the stylesheets can be easily customized through a large set of parameters.

The usability of a document, especially a printed document, can be boosted by a good index. Creating an index is a very laborious task, often performed by specialists. Unfortunately, in the area of open-source projects, samizdat publications, or books targeted at small language markets, you often have to cope with very limited resources, both financial and personnel. For that reason I am going to show you how to create and process an index in a DocBook document yourself. In the remainder of this article you will see how to generate indexes for non-English languages, how to put several indexes into a single document, and finally how to turn semantic markup into index entries quite easily.

Marking up Index Entries

The most difficult part of creating an index must be done manually: marking up index entries in the document. In DocBook this is done by placing indexterm elements wherever you write about a given topic. The content of an indexterm is not displayed as part of the document flow; it is used later when building the index.


<para>The wealth of modern societies is built upon information
<indexterm><primary>information</primary></indexterm>.</para>
                                                            

The indexterm element can also hold multilevel entries:


<indexterm>
<primary>information</primary>
</indexterm>
...
<indexterm>
<primary>information</primary>
<secondary>retrieval</secondary>
</indexterm>
...
<indexterm>
<primary>information</primary>
<secondary>dissemination</secondary>
</indexterm>
...
<indexterm>
<primary>information</primary>
<secondary>dissemination</secondary>
<tertiary>oral</tertiary>
</indexterm>

Such index terms will result in the following index output (the page numbers are, of course, for illustration only):

 information, 13
  dissemination, 17
    oral, 25
  retrieval, 15

If a large chunk of a document corresponds to a certain topic, we can use a special mode of indexterm to assign a range in the document to that topic. In this scenario, two indexterm elements mark the start and the end of the range. A unique identifier sets up the relation between these two elements.


<indexterm class="startofrange" id="ix.xml.history">
<primary>XML</primary>
<secondary>history</secondary>
</indexterm>
  ... other DocBook markup and text describing history of XML ...
<indexterm class="endofrange" startref="ix.xml.history"/>

In the resulting index we will see something like this:


XML
  history, 27–42

If an entry should be sorted differently from the way it is displayed, we can use the sortas attribute. During index grouping and sorting, the text of the entry is ignored and the value of the sortas attribute is used instead. This is useful when an index entry contains special symbols that should sort differently; for example, based on their phonetic representation. The following example creates an index entry that displays the Greek letter Ω in the index but files it where the word "Omega" would appear.


<indexterm>
<primary sortas="Omega">&Omega;</primary>
</indexterm>
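
The &Omega; entity resolves only when the document's DTD declares it, as the DocBook DTD normally does through the ISO entity sets. As a minimal sketch, assuming a document processed without those declarations, a numeric character reference produces the same entry, and the sortas value still controls where it is filed:

```xml
<!-- &#x3A9; is the numeric character reference for the Greek capital letter Omega (U+03A9) -->
<indexterm>
<primary sortas="Omega">&#x3A9;</primary>
</indexterm>
```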

If some occurrences of a term in the index should be emphasized (e.g., the number of the page containing the term's definition should be bold), we can specify the significance of each entry with the significance attribute.


<indexterm significance="preferred">
<primary>information</primary>
</indexterm>
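
As a sketch of how this might look across a document, the defining occurrence of a term can be marked as preferred while ordinary mentions keep the default. The values normal and preferred are the ones the DocBook DTD allows for this attribute; the sample sentences are invented for illustration, and how the preferred page number is actually styled depends on the stylesheets in use.

```xml
<!-- Passing mention: "normal" is the default value, written out here only for contrast -->
<para>Libraries organize information
<indexterm significance="normal"><primary>information</primary></indexterm>
for later use.</para>

<!-- Defining occurrence: the locator for this entry can be emphasized in the index -->
<para>We define information
<indexterm significance="preferred"><primary>information</primary></indexterm>
as data that has been given meaning.</para>
```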

If an index term should not point to a particular page number, or an anchor in HTML output, but rather to a different term, we can utilize the see and seealso elements.


<indexterm>
<primary>DTD</primary>
</indexterm>

<indexterm>
<primary>document type definition</primary>
<see>DTD</see>
</indexterm>

<indexterm>
<primary>XML Schema</primary>
<seealso>DTD</seealso>
</indexterm>

This results in:


- D -
document type definition, see DTD
DTD, 42

- X -
XML Schema, 81, see also DTD

Up to this point we have covered most of DocBook's capabilities for marking up index entries. I left out the zone attribute, which can be used to place index entries outside the document flow. I personally do not consider this method useful for handmade indexes, but if you are interested you can read more about it in the documentation.
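
For readers who want to see the idea anyway, here is a minimal sketch, assuming a chapter with the hypothetical id ch.xml.history. The zone attribute lists the IDs of the elements that the entry covers, so the indexterm itself can sit elsewhere, for example gathered with other entries at the start of the document:

```xml
<!-- Placed away from the text it indexes; zone names the element(s) the entry applies to -->
<indexterm zone="ch.xml.history">
<primary>XML</primary>
<secondary>history</secondary>
</indexterm>
...
<!-- The id value is an assumption made for this illustration -->
<chapter id="ch.xml.history">
<title>History of XML</title>
  ... other DocBook markup and text describing history of XML ...
</chapter>
```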
