XML.com: XML From the Inside Out
oreilly.comSafari Bookshelf.Conferences.


OAXAL: Open Architecture for XML Authoring and Localization

February 21, 2007


XML is now acknowledged as the best format for authoring technical documentation. Its wide support, extensible nature, separation of form and content, and ability to publish in a wide variety of output formats such as PDF, HTML, and RTF make it a natural choice. In addition, the costs associated with implementing an XML publishing solution have decreased significantly. Nevertheless, there are some clear do's and don'ts when authoring in XML, some of which are detailed in Coping with Babel, a paper from the XML 2004 conference.

XML, thanks to its extensible nature and rigorous syntax, has also spawned many standards that allow the exchange of information between different systems and organizations, as well as new ways of organizing, transforming, and reusing existing assets. For publishing and translation, this has created a new way of using and exploiting existing documentation assets, known as Open Architecture for XML Authoring and Localization (OAXAL).

OAXAL takes advantage of the arrival of some core XML-related standards:

  • DITA (Darwin Information Typing Architecture from OASIS)

  • xml:tm (XML-based text memory from LISA OSCAR)

DITA is a very well thought-out way of introducing object-oriented concepts into document construction. It introduces the concepts of reuse and granularity into publishing within an XML vocabulary. It is having a big impact on the document publishing industry.

xml:tm is also a pivotal standard that provides a unified environment within which other localization standards can be meaningfully integrated, thus providing a complete environment for OAXAL. OAXAL allows system builders to create an elegant and integrated environment for document creation and localization.

Object 1
Figure 1. Object 1

  • W3C ITS is an XML vocabulary that defines the rules for translatability for a given XML document type. It states which elements have translatable text, which elements are "inline" and thus do not cause segment breaks, which form subflows, and which attributes are translatable.
  • Unicode TR29 is part of the main Unicode standard that defines word and sentence boundaries.
  • SRX (Segmentation Rules eXchange) is an XML vocabulary for defining segmentation rules for a given language.
  • GMX (Global Information Management Metrics Exchange) is a new LISA OSCAR standard for word and character counts, as well as a vocabulary for exchanging metrics information. It is part of a three-part standard that will also tackle complexity and quality.
  • XLIFF (XML Localization Interchange File Format) is an OASIS standard for exchanging localization data in an XML vocabulary.
  • TMX (Translation Memory eXchange) is a LSIA OSCAR standard for exchanging translation memories.


The OASIS DITA (Darwin Information Typing Architecture) Standard has certainly raised the profile of XML in the technical authoring field. It has done so for two reasons:

  • It is a very intelligent and well thought-out approach to the authoring and publication of any technical manual.

  • It substantially reduces the costs associated with introducing a component-based publishing system.

The original architects of DITA looked at the most effective way of writing technical documentation and came to the conclusion that a topic-based authoring approach was the best way of achieving this. Rather than writing a publication as a monolithic set of chapters, they found that deconstructing a publication into a set of topics was a far better way of creating technical documentation. It meant that authors could work independently on their own topics without impeding one another. It also meant that topics could be reused across many different publications. DITA is also an excellent way of authoring web-based and other types of electronic documents, such as Help topics, as it encourages greater granularity. The core messages of DITA are granularity and reuse.

Figure 2. Topics

The topic concept provides DITA with a built-in component model that can be readily understood and implemented: write once, translate once, reuse many times. There are many analogies to this approach in the automotive industry, where the same components are reused across many model ranges. This substantially reduces costs and increases the availability of spare parts. Having a DITA document for discrete operations, such as the removal of a cylinder head for a given engine type, means that this procedure can be reused across all of the publications relating to models that share that engine.

DITA also comes with a built-in concept of extensibility. The original DITA architects realized that topics can have different formats, and that not all topics are equal, so they worked out an extension mechanism that allows topics to be specialized. The standard DITA "topic" is available in the form of the three most common implementations:

  • Reference

  • Task

  • Concept

These can then, in turn, be specialized further, so "Task" can have "Maintenance Task" and "Repair Task," "Disassembly Task" and "Assembly Task" specializations using the simple DITA extension mechanism.

Object 2
Figure 3. Object 2

Pages: 1, 2, 3, 4, 5

Next Pagearrow