The Making of the DocBook DTD
October 20, 1999
The DocBook DTD is a mature SGML DTD for computer documentation, which makes it very useful. The story of its development is also valuable, as it shows what is possible when a group of dedicated people get together to solve a common problem, a thread common to Open Source projects. The development of the DocBook DTD is also a success story that has additional interest to the XML development community; many of those who developed DocBook have gone on to become major contributors to the development of XML.
The volunteers who produced the DocBook DTD first came together as The Davenport Group around 1990. I organized the Davenport Group initially, along with several members of a documentation team from HaL Computer Systems. Davenport meetings were informal sessions intended to explore how software manuals could be exchanged between companies more easily. Up until then, these companies, most of which were UNIX vendors, were shipping printed manuals with computer workstations. CD-ROM was beginning to emerge, and companies such as Sun Microsystems, HP and SGI were investing heavily in online documentation systems.
As an editor at O'Reilly, I saw how several of the UNIX vendors shipped our books in hard copy and were beginning to ask to receive the books electronically. O'Reilly produced most of its books using troff, a batch typesetting program that was a standard part of UNIX as developed at Bell Labs. We used a custom macro package to mark up documentation and the markup was both structural and presentational. We could send our books as a set of files to a vendor but this would not work unless they were going to use the same program we did, and try to produce the exact same result in print. SGML existed, and while there was some resistance to basing a solution on SGML, there really was no other alternative. However, deciding to use SGML was perhaps the easiest part, just as it is today in deciding to use XML. The hard part is creating a common DTD that represents a consensus of interested parties.
Table 1. Sample DocBook Projects
|Hewlett Packard||A recent release of the HP-UX docset has ~90 MBytes of DocBook and ~250 titles (some books very short).|
|ROX Software, Inc.||All documentation is DocBook|
|The XML FAQ and Dublin University's Research Bibliography: Peter Flynn||The XML FAQ and our university's Research Bibliography use DocBook, served out as HTML.|
|The Swarm Project||Four books around 650 pages|
|UUNET||At least 16 DocBook documents: a small but growing collection.|
|Largest German ISP||Nine books and a set of other papers, in excess of 360,000 lines of source|
|Red Hat||All of the Red Hat manuals are in DocBook|
|KDE desktop project||Recently converted to DocBook, but don't know to what extent.|
|The GNOME desktop project||All of the GNOME documentation|
|The Casbah Project||API and Reference docs and about 40 separate docs that total about 800k of HTML.|
|Bioanalytical Systems, Inc.||Engineering designs and using XML to auto document source code. Some docs are still tex, but there are hundreds of pages of DocBook.|
|onShore, Inc.||Technical analysis and most operations documentation. Currently probably around 500 pages|
In 1991, the first version of DocBook was developed. Just to be clear, I had no part in its design and development. The design team consisted of people who studied SGML closely and viewed DTD development as a craft. They met regularly to review their work as part of the Davenport Group, eventually creating a more formal organization in 1994. Sometime last year, the Davenport Group merged into OASIS, an SGML-XML industry consortium that provides support for its activities.
DocBook's original design goal was to enable the interchange of computer documentation. It was not primarily an authoring DTD, although some organizations have used DocBook or a variant for authoring. By focusing on interchange, the DTD tried to describe a variety of content models, such as manpages. Within each model or type of documentation, there were numerous variations. (You'd think that manpages were a fairly conventional format but there were lots of "irreconcilable" differences.) DocBook, for the most part, remains what I like to call a descriptive DTD; it reflects differences found in the way different organizations created their documentation. DocBook would be a simpler DTD, but perhaps less useful, if it were more prescriptive, by proposing a unified content model. But the latter would require organizations to change how they prepared their documentation and reaching unanimous agreement on a unified model did not seem realistic.
DocBook had its moments of political conflict. One of the early drivers of interest in data interchange for documentation was the Open Software Foundation, an industrial consortium whose mission was to develop a standard UNIX operating system that could be supported by members such as IBM, Digital, and HP. Members not only contributed software to the OSF; they also contributed specifications and documentation. Getting those documents into a standard format was an important initiative for the OSF and its members but the OSF lobbied for a rigid, prescriptive model that alienated many of its members, who ended up working quite well together in the Davenport Group to create DocBook independently of the OSF. (The OSF, many of you will remember, eventually ran out of steam and achieved only a modest return on a huge investment by its members. Linus Torvalds ended up accomplishing a lot more on his own.)
Today, DocBook enjoys wide usage, according to Norm Walsh. Among the companies using DocBook are Sun Microsystems for their Solaris documentation and Novell for their Netware documentation. Red Hat Linux uses DocBook, as does FreeBSD. For other projects, see Table 1, "Sample DocBook Projects." O'Reilly does produce some of its books using DocBook and its CD-ROM library is generated from DocBook sources. However, in what always seemed ironic to me, the tools group at O'Reilly ended up processing DocBook source into troff to produce a printed book. SGML tools never quite seemed to do the job we wanted them to do.
In many ways, the Web sidetracked the Davenport Group and efforts like DocBook. I know I jumped off the SGML train when I saw what I could do with HTML and URLs. Others scuttled complex, expensive and proprietary online publishing systems for simpler, Web-based solutions. In the short term, customers were happy with Web-based documentation and companies needed to deliver it. The whole SGML community, after disparaging HTML for a while, finally caught on to what a tremendous business opportunity it was for them. While HTML did provide important immediate benefits, many realized that there were significant issues in how this information was to be managed in the long term. HTML was not an ideal way to manage hundreds of pages of documents. We needed more of what SGML provided -- first of all, an extensible tagset. But some thought we didn't need all of what made SGML hard to work with. Following this thread led to XML, a midpoint between HTML and SGML.
In retrospect, DocBook served as a training ground for many who have gone on to leadership roles in XML development. Jon Bosak represented Novell originally in the Davenport Group before he moved to Sun, and from there he has masterfully organized and continues to lead the XML Working Group at the W3C. Dave Hollander, who represented HP for many years in the Davenport Group, is now at CommerceNet. He was co-editor with Tim Bray of the XML Namespaces specification and he is currently co-chair of the XML Schema Group. The other co-chair of the Schema group is Murray Maloney, who represented SCO in the Davenport Group. Eve Maler, who was at Digital before moving on to Arbortext, was one of the major architects of the DocBook DTD. Among her many contributions to XML is her work as co-editor of the first XLink specification. She has been involved Conleth O'Connell, who was one of the members of the HaL team to found the Davenport Group and a member of the original DocBook design team, went to work at Vignette where he has been one of the lead developers of the ICE syndication specification. All of the above were members of the original XML Working Group that produced XML 1.0.
There are, of course, many other contributors to DocBook whom I haven't mentioned, several of whom worked at O'Reilly. Norm Walsh, who worked in the O'Reilly tools group before moving on to Arbortext, has been maintaining the DocBook distribution and its mailing list while writing a book on DocBook with Lenny Muellner of O'Reilly. Norm is also involved in the XSL Working Group. Terry Allen worked for me at O'Reilly for several years before going to what is now CommerceOne. An Arabic scholar in his spare time, Terry made major contributions to DocBook as a member of the design team. He is also heading up efforts through OASIS to organize an XML DTD repository to be hosted on XML.org.
The DocBook DTD and the DocBook development project deserve some consideration as a hybrid Open Source software project. While a DTD is not a traditional software component, it is at once a specification for software and a means to test that documents conform to a rigorous standard for interchange. It makes available for multiple uses a valuable body of information that is essential if software is to be understood by developers. It is fitting, then, that DocBook is becoming a backbone for Open Source documentation projects such as Linux.
|For more information on DocBook, you can visit the DocBook DTD at OASIS as well as Norm Walsh's DocBook.org site. In addition, there is an XML version of the DocBook DTD, and Norm wrote an article for XML.com on what kind of changes were necessary in "Converting an SGML DTD to XML." His current article on XML.com is "Customizing the DocBook DTD."|