Overcoming Objections to XML-based Authoring Systems

March 21, 2001

Brian Buehling

During a recent development effort, one of our clients was alarmed at the conversion costs of the proposed XML-based content management system compared to the existing MS Word-based process. This was just one instance of a growing trend of balking at XML-based systems in favor of using public web folders, indexed by some full-text search engine, as part of a local intranet. In the short run, these edit-drop-and-index solutions have some appealing features, including low development and conversion costs. But they are short-lived systems that either wither from lack of functionality or rapidly outgrow their design.

Fortunately, the initial objections to the cost of building an XML-based content repository have become fairly predictable. In most cases they are based on misconceptions about XML or on an overly optimistic view of alternative approaches.

Even though implementing an XML-based content management system is not always the best approach for an organization, any architectural decision should be made only after thoroughly overcoming the common misconceptions of the technology involved. The list of questions below is intended to be a guide for IT professionals to discuss intelligently the pros and cons of developing an XML document repository.

Why does my small group of authors need an XML solution?

Although it is true that the value of an XML-based content management system increases with the number of authors and document complexity, even a small authoring group can benefit from an XML solution. The core benefits involve document standardization, profiling, and growth. Authoring in XML provides a natural structure to ensure documents adhere to corporate guidelines through the use of DTDs or schemas.
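To make the idea of structural enforcement concrete, here is a minimal sketch in Python. It is not a DTD validator; it is a hand-rolled check that stands in for what a validating parser would do with a DTD or schema. The `procedure` document type, its required children, and the `check_procedure` function are all hypothetical names invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical corporate rule: every <procedure> must contain exactly these
# child elements, in this order. A real system would state this in a DTD or schema.
REQUIRED_CHILDREN = ["title", "summary", "steps"]

def check_procedure(xml_text):
    """Return a list of problems found; an empty list means the document passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "procedure":
        problems.append(f"root element is <{root.tag}>, expected <procedure>")
    child_tags = [child.tag for child in root]
    if child_tags != REQUIRED_CHILDREN:
        problems.append(f"children are {child_tags}, expected {REQUIRED_CHILDREN}")
    return problems

good = ("<procedure><title>Reset</title><summary>How to reset.</summary>"
        "<steps><step>Press the button.</step></steps></procedure>")
bad = "<procedure><title>Reset</title><steps/></procedure>"

print(check_procedure(good))
print(check_procedure(bad))
```

The point of the sketch is that the rule lives outside any individual document, so every author's work is checked against the same guideline before it enters the repository.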

By separating the style of a document from its content and structure, XML repositories can distribute different views of documents to different audiences, all from a single source document. In MS Word or HTML systems, a separate version of the document is needed for each view. Lastly, XML authoring systems provide a solid foundation for future growth, as they are platform independent and can be upgraded with relative ease.
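As a rough illustration of single-source publishing, the following Python sketch renders one XML document as two views (HTML and plain text) and filters paragraphs by audience. The `article` markup, the `audience` attribute, and the function names are assumptions made up for this example; a production system would more likely use stylesheets than hand-written renderers.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document: one paragraph is restricted to administrators.
SOURCE = """<article>
  <title>Password Policy</title>
  <para>Passwords expire every 90 days.</para>
  <para audience="admin">Admins may override expiry in the console.</para>
</article>"""

def visible_paras(root, audience):
    """Yield the text of paragraphs that are unrestricted or tagged for this audience."""
    for para in root.findall("para"):
        if para.get("audience") in (None, audience):
            yield para.text

def render_html(root, audience):
    """Web view: title as a heading, each visible paragraph as <p>."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(text)}</p>" for text in visible_paras(root, audience)]
    return "\n".join(parts)

def render_text(root, audience):
    """Plain-text view: underlined title followed by the visible paragraphs."""
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines += list(visible_paras(root, audience))
    return "\n".join(lines)

root = ET.fromstring(SOURCE)
print(render_html(root, "admin"))
print(render_text(root, "staff"))
```

Both views come from the same source, so a correction made once in the XML appears in every output.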

Isn't XML used for business-to-business exchanges as a replacement for EDI?

One of the main reasons for the surge of XML-based technology in the marketplace is its application in business-to-business supply chain management. As companies continually look to streamline electronic commerce solutions, XML has emerged as a perfect mechanism to handle the requirements of exchanging product and transaction information. One unexpected side effect of the use of XML for supply chain management is that new users of the technology are led to believe that XML is suited only for this application.

However, XML evolved from SGML, which was designed to manage large volumes of textual information. For more than fifteen years, SGML has been used in publishing, telecommunications, and manufacturing companies to solve the same content management problems that XML addresses today. Even though XML does not support all of SGML's functionality, it enjoys wider acceptance, which positions it well to solve most of today's content management problems.

Why do my authors have to learn a programming language to create documents?

Authoring effectively in XML requires a sound understanding of the content as well as the structure of documents. Since authors no longer have to worry about document styling, they can concentrate on the core content and structure of their work. Getting started with XML is becoming easier since applications that use textual markup are now commonplace (HTML, etc.). Adapting to a controlled set of styles is a much less daunting task than learning a programming language.

Why are XML consultants so expensive?

The two main contributors to the cost associated with using XML consultants are specialized knowledge and risk. Often an XML conversion project requires some initial design work that, if done correctly, will not have to be modified very frequently. Companies should not be creating DTDs, designing XML authoring platforms, or configuring search engines more than a few times a year.

Consequently, it does not make sense for most organizations to include those skills in their full-time staff. Additionally, any architectural mistake committed during the design stage of these projects could result in very expensive rework down the road, well after implementation. For these reasons, despite the additional cost, it is often wise to utilize outside expertise during the critical stages of an XML project.

Why don't we just use MS Word?

For all of the benefits of MS Word and for all it has done in the world of word processing and office automation, it could be the single biggest obstacle to wide commercial acceptance of XML authoring solutions. Microsoft markets its style templates and HTML conversion features as the only packages that customers need for enterprise authoring. Additionally, with MS Word available on virtually every corporate desktop, you can quickly and inexpensively start authoring documentation with it. But the fact of the matter is that Word does not support XML directly, and it cannot be easily integrated into a structured content repository. However, add-ins to Word to support XML export are starting to appear on the market.

Why don't we author directly in HTML?

Authoring directly in HTML may be the quickest way to get new documentation onto a web site. However, in addition to many of the same consistency problems that exist when authoring in MS Word, there are some additional limitations as well. First, creating a repository of HTML files optimized for web viewing will inevitably create problems for printed output. HTML does not support many standard page layout, font, and line formatting features necessary for a production print environment. In short, if an organization chooses to create a repository of HTML documents, it will either have to greatly scale back its print capabilities or utilize only a subset of HTML features that will look appropriate when printed. Otherwise, it will be forced to create two versions of each HTML document, one for online viewing and one for printing.

Even if an organization does not need to support document printing, creating a corporate knowledge base in HTML has other limitations. Authors will not be able to identify document components explicitly for specialized searching or formatting. Additionally, there is no way to enforce business rules for creating documentation by verifying HTML against an approved document type definition.
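The value of explicitly identified components is easiest to see in a search. The sketch below, in Python, assumes a hypothetical `manual` document type with `section` and `warning` elements; none of these names come from a real DTD. It finds every warning by element name rather than by matching the word "warning" in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual: only the first section contains a real <warning> element,
# although the word "warning" also appears in ordinary text in the second.
MANUAL = """<manual>
  <section id="install">
    <title>Installation</title>
    <para>Unpack the archive.</para>
    <warning>Disconnect power before opening the case.</warning>
  </section>
  <section id="maint">
    <title>Maintenance</title>
    <para>A warning light indicates low coolant.</para>
  </section>
</manual>"""

def find_components(xml_text, tag):
    """Return (section id, text) for every element named `tag` inside a section."""
    root = ET.fromstring(xml_text)
    hits = []
    for section in root.iter("section"):
        for element in section.iter(tag):
            hits.append((section.get("id"), element.text))
    return hits

print(find_components(MANUAL, "warning"))
```

A full-text search for "warning" would also return the maintenance paragraph; the element-based search returns only the component the author marked as a warning.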

Why do we need XML if we are already using a relational database to track documents?

Using a relational database to store large volumes of textual content along with its metadata provides a solid infrastructure to build the security, version control, and workflow components needed in a content management system. However, this technique alone does not ensure that the structure of distributed documentation will be preserved.
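One way to picture the gap is the sketch below, written in Python with an in-memory SQLite database. The table layout, the `store` function, and the rule that every document needs a `title` element are assumptions chosen for illustration. The database column itself will accept any text; structure is only protected because the application parses the body as XML before inserting it.

```python
import sqlite3
import xml.etree.ElementTree as ET

def open_repository():
    """Create an in-memory repository with metadata columns and a text body."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, author TEXT, version INTEGER, body TEXT)"
    )
    return conn

def store(conn, author, version, body):
    """Insert a document only if its body is well-formed XML containing a <title>."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # the database would have accepted this text; the check rejects it
    if root.find("title") is None:
        return False
    conn.execute(
        "INSERT INTO documents (author, version, body) VALUES (?, ?, ?)",
        (author, version, body),
    )
    return True

conn = open_repository()
print(store(conn, "jsmith", 1, "<doc><title>Leave Policy</title></doc>"))
print(store(conn, "jsmith", 1, "Leave Policy, pasted as plain text"))
```

The relational layer supplies the metadata, versioning, and query features, while the XML check supplies the structural guarantee that the database alone does not.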

In conclusion, with the content management marketplace becoming more competitive, the costs of XML-based authoring and repository systems are going to be questioned more than ever. This scrutiny will not be based on uncertainty about the benefits of these systems, as in the past, but rather on the growing number of low-cost substitute systems touting comparable features. Only after understanding the clouded atmosphere surrounding XML-based systems, and anticipating the common misconceptions about the technology, can one justify such a system in a business setting.