XML.com: XML From the Inside Out

OAXAL: Open Architecture for XML Authoring and Localization
by Andrzej Zydron

The OAXAL document lifecycle for a DITA document looks like this:

  1. Check out document for authoring. Check if xml:tm namespace is present; if not, seed first instance.
  2. During authoring, the writer can use Author Memory, drawn from the leveraged Translation Memory database, from within the authoring editor.
  3. Check in document from authoring. Compare the previous version of the xml:tm namespace with the updated document and identify changes. Allocate new text unit identifiers as required.
  4. Check out document for translation. Check out previous source and target versions of the document and complete xml:tm-based matching. Next, do any database translation memory matching. Create output XLIFF file and DITA skeleton file for translated text.
  5. Check in document from translation. Populate skeleton file with translated text from the XLIFF file, thus creating the target version of the document. Load translated text into the leveraged memory database.
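Step 3 of the lifecycle, comparing the previous version with the updated document and allocating new text unit identifiers, can be illustrated with a minimal sketch. It assumes text units are plain sentences keyed by integer identifiers; the function name `assign_text_unit_ids` and the "unchanged"/"new" labels are hypothetical and are not taken from the xml:tm specification.

```python
from itertools import count


def assign_text_unit_ids(previous, updated_segments):
    """Reuse identifiers for unchanged text units and allocate new ones.

    previous:          dict mapping integer unit id -> source text (hypothetical model)
    updated_segments:  list of sentences in the checked-in document
    Returns a list of (unit_id, text, status) tuples in document order.
    """
    # Index the previous version by text so unchanged sentences keep their id.
    available = {}
    for unit_id, text in sorted(previous.items()):
        available.setdefault(text, []).append(unit_id)

    # New identifiers continue after the highest id already used.
    next_id = count(max(previous, default=0) + 1)

    result = []
    for text in updated_segments:
        ids = available.get(text)
        if ids:
            result.append((ids.pop(0), text, "unchanged"))
        else:
            result.append((next(next_id), text, "new"))
    return result


if __name__ == "__main__":
    before = {1: "Open the cover.", 2: "Remove the battery."}
    after = ["Open the cover.", "Remove the old battery.", "Close the cover."]
    for unit in assign_text_unit_ids(before, after):
        print(unit)
```

In this sketch a modified sentence simply receives a fresh identifier; a real implementation would presumably also record which earlier unit it replaced, so that the previous translation can be offered as a match.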

The xml:tm namespace can be made transparent to any editing, printing, or transformation tools by means of a very simple technique.

xml:tm and the Other Open Standards

xml:tm plays a pivotal role within OAXAL, allowing all the other related standards to interoperate within one elegant architecture.

Figure 9. xml:tm

  • xml:tm mandates the use of the W3C ITS document rules to define which elements contain translatable text, which elements are "within text" and so do not constitute a segment break, which "within text" elements are subflows (such as footnotes) that do not form part of the linguistic integrity of the surrounding text, and, finally, which attributes are translatable. W3C ITS document rules are used by xml:tm when seeding the namespace into the DITA document.
  • xml:tm recommends the use of Unicode TR29 to enable effective tokenization of text into words. This is a prerequisite to segmentation and word and character counts.
  • xml:tm mandates the use of the LISA OSCAR SRX segmentation rules standard to define how blocks of text are segmented into individual sentences. Heartsome and XML-INTL have donated their respective rule sets to LISA OSCAR in order to establish an industry-wide set of SRX rules for given languages.
  • xml:tm mandates the use of the LISA OSCAR GMX word and character count standard for maintaining authoring and translation statistics.
  • xml:tm mandates the use of the OASIS XLIFF standard for extracting the text for the actual translation process. Documents seeded with the xml:tm namespace are in effect "pre-digested" for extraction. The use of XLIFF allows translators to choose from competing translator tools that support XLIFF—one of the many benefits of Open Standards.
  • xml:tm enables the easy creation of LISA OSCAR TMX files for the exchange of translation memories between localization service suppliers and customers. TMX prevents the vendor lock-in that was prevalent in the localization-tools community prior to the introduction of Open Standards.
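To make the segmentation step more concrete, here is a minimal sketch of rule-based sentence splitting in the spirit of SRX, where each rule pairs a "before break" pattern with an "after break" pattern and either allows or forbids a break. The two rules shown are illustrative assumptions only; they are not the rule sets donated to LISA OSCAR, and the function name `segment` is hypothetical.

```python
import re

# Each rule: (is_break, pattern before the position, pattern after the position).
# Rules are tried in order and the first one that matches decides the outcome,
# so exceptions (no-break rules) come before the general break rule.
RULES = [
    (False, r"\b(?:Mr|Dr|etc)\.$", r"^\s"),  # no break after these abbreviations
    (True, r"[.?!]$", r"^\s+"),              # break after sentence-final punctuation
]


def segment(text, rules=RULES):
    """Split text into sentences using ordered break / no-break rules."""
    compiled = [(brk, re.compile(b), re.compile(a)) for brk, b, a in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        before, after = text[:pos], text[pos:]
        for is_break, before_re, after_re in compiled:
            if before_re.search(before) and after_re.search(after):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    for sentence in segment("Dr. Smith arrived. He sat down."):
        print(sentence)
```

The sketch rescans the text at every position, which is fine for a paragraph but not for large documents; it is meant only to show how ordered exception rules keep abbreviations from ending a segment.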

The Benefits of OAXAL

If we look at the authoring aspect of publishing, OAXAL offers the following significant benefits:

  • DITA provides an excellent framework for reducing costs by introducing granularity into the authoring process.
    • Multiple authors can work on different topics at the same time, thus improving delivery times.
    • Topics can be reused many times in different publications.
    • xml:tm Author Memory takes the reuse principle down to sentence level.
      • Authors are encouraged to write consistently. Left to our own devices, we will quite often write the same sentence in different ways. This is compounded when there are multiple authors.
      • If a sentence has been reused, then it will already have an entry in translation memory for other languages. This will reduce translation costs.

Figure 10. DITA benefits

From the translation point of view, OAXAL offers the following benefits:

  • Once a topic has been translated it can be reused multiple times, as long as the source document has not been changed.
  • Greater granularity means that translation does not have to wait for the completion of a publication. Each topic can be sent for localization as soon as it is complete.
  • xml:tm adds the ability to significantly increase the scope and effectiveness of translation memory:
    • xml:tm allows for standards-based implementation of ICE (in-context exact) matching. Because such a match is exact and occurs in the same context, it virtually eliminates the need to proofread the matched text.
    • The xml:tm structure of a document allows for a much more "focused" approach to leveraged and fuzzy matching:
      • In-document leveraged matching: xml:tm enables matching of identical sentences within a document. These matches will be of a higher order, as they are from the same document.
      • In-document fuzzy matching: xml:tm allows for the easy identification of previous versions of modified sentences and can present these as "in-document fuzzy matches." These will perforce be of a higher order than database-sourced fuzzy matches, as they are based on the previous version of the same sentence.
      • Database source leveraged matching: the xml:tm view of a document represents a predigested version that is ready for translation memory operations.
      • Nontranslatable text: xml:tm flags nontranslatable text as part of the namespace seeding, thus making it easier for any matching software to automatically process such text accordingly.
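The matching categories above can be sketched roughly as follows. The sketch assumes each text unit has an identifier that stays the same while its text is unchanged, approximates "the previous version of a modified sentence" with a similarity search over the previous version of the same document, and uses Python's standard `difflib.SequenceMatcher` for scoring. The function name `match_units`, the category labels, and the 0.7 threshold are hypothetical and are not defined by xml:tm.

```python
from difflib import SequenceMatcher


def match_units(previous, current, fuzzy_threshold=0.7):
    """Classify each current text unit against the previous document version.

    previous: {unit_id: (source_text, target_text)}  translated earlier version
    current:  {unit_id: source_text}                 updated source document
    Returns {unit_id: (category, suggested_target, score)}.
    """
    # Lookup of earlier translations by source text, for identical sentences.
    by_text = {source: target for source, target in previous.values()}

    report = {}
    for unit_id, source in current.items():
        if unit_id in previous and previous[unit_id][0] == source:
            # Same unit, same text: treated here as an in-context exact match.
            report[unit_id] = ("exact-in-context", previous[unit_id][1], 1.0)
        elif source in by_text:
            # Identical sentence elsewhere in the same document.
            report[unit_id] = ("in-document-leveraged", by_text[source], 1.0)
        else:
            # Look for the closest sentence in the previous version.
            best_score, best_target = 0.0, None
            for old_source, old_target in previous.values():
                score = SequenceMatcher(None, old_source, source).ratio()
                if score > best_score:
                    best_score, best_target = score, old_target
            if best_score >= fuzzy_threshold:
                report[unit_id] = ("in-document-fuzzy", best_target, best_score)
            else:
                # Left for database translation memory matching.
                report[unit_id] = ("unmatched", None, 0.0)
    return report


if __name__ == "__main__":
    previous = {
        "u1": ("Open the cover.", "Öffnen Sie die Abdeckung."),
        "u2": ("Remove the battery.", "Entfernen Sie die Batterie."),
    }
    current = {
        "u1": "Open the cover.",
        "u2": "Remove the old battery.",
        "u3": "Open the cover.",
        "u4": "Close the lid.",
    }
    for unit_id, result in match_units(previous, current).items():
        print(unit_id, result)
```

Units that end up "unmatched" here are the ones that would go on to database translation memory matching, as in step 4 of the lifecycle.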