XML.com: XML From the Inside Out

Translating XML Documents with xml:tm
by Andrzej Zydron

Translation Memory

When an xml:tm namespace document is ready for translation, the namespace itself specifies the text that is to be translated. The tm namespace can be used to create an XLIFF document for translation.

XLIFF

XLIFF (XML Localization Interchange File Format) is an OASIS standard: an XML format optimized for translation. Using XLIFF, you can protect the original document from accidental corruption during the translation process. In addition, you can supply the translator with other relevant information, such as translation memory and preferred terminology.

The following is an example of an XLIFF document based on the previous example:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xliff PUBLIC "-//XML-INTL XLIFF-XML 1.0//EN" "file:xliff.dtd">
<xliff version="1.0">
<file datatype="xml" source-language="en-USA" target-language="es-ESP">
<header>
<count-group name="Totals">
<count count-type="TextUnits" unit="transUnits">40</count>
<count count-type="TotalWordCount" unit="words">416</count>
</count-group>
</header>
<body>
<trans-unit id="t1">
<source> Xml:tm</source>
<target ts="matched"> Xml:tm </target>
</trans-unit>
<trans-unit id="t2">
<source> Xml:tm is a revolutionary technique for dealing with the problems of translation memory for XML documents by using XML techniques and embedding memory directly into the XML documents themselves. </source>
<target> Xml:tm is a revolutionary technique for dealing with the problems of translation memory for XML documents by using XML techniques and embedding memory directly into the XML documents themselves. </target>
</trans-unit>
<trans-unit id="t3">
<source> It makes extensive use of XML namespace. </source>
<target> It makes extensive use of XML namespace. </target>
</trans-unit>
<trans-unit id="t4">
<source> The “tm” stands for “text memory”. </source>
<target> The “tm” stands for “text memory”. </target>
</trans-unit>
<trans-unit id="t5">
<source> There are two aspects to text memory: </source>
<target> There are two aspects to text memory: </target>
</trans-unit>
<trans-unit id="t6">
<source> Author memory </source>
<target> Author memory </target>
</trans-unit>
<trans-unit id="t7">
<source> Translation memory </source>
<target> Translation memory </target>
</trans-unit>
...
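
As a rough illustration of how the tm namespace can drive extraction, the following Python sketch walks a document, collects every tm:tu element, and emits one trans-unit per text unit. It is a minimal sketch under stated assumptions, not the tool used to produce the listing above: the function name build_xliff is hypothetical, the trans-unit ids simply reuse the tm:tu ids (the listing uses t1, t2, ... instead), and the header counts are omitted.

```python
import xml.etree.ElementTree as ET

# tm namespace URI as declared in the article's xml:tm listings
TM_NS = "urn:xmlintl-tm-tags"


def build_xliff(doc_xml, source_lang, target_lang):
    """Build an XLIFF skeleton with one trans-unit per tm:tu element.

    Assumption: each trans-unit reuses the id of its tm:tu element.
    """
    doc = ET.fromstring(doc_xml)
    xliff = ET.Element("xliff", version="1.0")
    file_el = ET.SubElement(xliff, "file", {
        "datatype": "xml",
        "source-language": source_lang,
        "target-language": target_lang,
    })
    body = ET.SubElement(file_el, "body")
    for tu in doc.iter(f"{{{TM_NS}}}tu"):
        unit = ET.SubElement(body, "trans-unit", id=tu.get("id"))
        text = "".join(tu.itertext())
        ET.SubElement(unit, "source").text = text
        # The target starts out as a copy of the source, as in the listing.
        ET.SubElement(unit, "target").text = text
    return xliff


SAMPLE = (
    '<doc xmlns:tm="urn:xmlintl-tm-tags"><tm:tm>'
    '<tm:te id="e3"><tm:tu id="u3.1">Author memory</tm:tu></tm:te>'
    '<tm:te id="e4"><tm:tu id="u4.1">Translation memory</tm:tu></tm:te>'
    '</tm:tm></doc>'
)

if __name__ == "__main__":
    print(ET.tostring(build_xliff(SAMPLE, "en", "es"), encoding="unicode"))
```

Because only the content of tm:tu elements is copied, the rest of the original markup never reaches the translator, which is how the original document is protected from accidental corruption.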

In the original listing, the target elements are highlighted in magenta. They mark where the translated text will replace the source language text, as shown below:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xliff PUBLIC "-//XML-INTL XLIFF-XML 1.0//EN" "xliff.dtd">
<xliff version="1.0">
<file datatype="xml" source-language="en-USA" target-language="es-ESP">
<header>
<count-group name="Totals">
<count count-type="TextUnits" unit="transUnits">40</count>
<count count-type="TotalWordCount" unit="words">416</count>
</count-group>
</header>
<body>
<trans-unit id="t1">
<source> Xml:tm</source>
<target> Xml:tm </target>
</trans-unit>
<trans-unit id="t2">
<source> Xml:tm is a revolutionary technique for dealing with the problems of translation memory for XML documents by using XML techniques and embedding memory directly into the XML documents themselves. </source>
<target> Xml:tm es una técnica revolucionaria que trata los problemas de memoria de traducción en documentos XML usando técnicas XML e incluyendo la memoria en el documento mismo. </target>
</trans-unit>
<trans-unit id="t3">
<source> It makes extensive use of XML namespace. </source>
<target> Esta técnica hace extenso uso de XML namespace. </target>
</trans-unit>
<trans-unit id="t4">
<source> The “tm” stands for “text memory”. </source>
<target> “tm” significa “memoria de texto”. </target>
</trans-unit>
<trans-unit id="t5">
<source> There are two aspects to text memory: </source>
<target> Hay dos aspectos de memoria de texto: </target>
</trans-unit>
<trans-unit id="t6">
<source> Author memory </source>
<target> Memoria de autor </target>
</trans-unit>
<trans-unit id="t7">
<source> Translation memory </source>
<target> Memoria de traducción </target>
</trans-unit>
...

When the translation has been completed, the target language text can be merged with the original document to create a new target language version of that document. The net result is a pair of perfectly aligned source and target language documents.

The following is an example of a translated xml:tm document:

<?xml version="1.0" encoding="UTF-8" ?>
<office:document-content
xmlns:office="http://openoffice.org/2000/office"
xmlns:text="http://openoffice.org/2000/text"
xmlns:tm="urn:xmlintl-tm-tags" xmlns:xlink="http://www.w3.org/1999/xlink">
<tm:tm>
...
<text:p text:style-name="Text body">
<tm:te id="e1" tuval="2">
<tm:tu id="u1.1"> Xml:tm es una técnica revolucionaria que trata los problemas de memoria de traducción en documentos XML usando técnicas XML e incluyendo la memoria en el documento mismo. </tm:tu>
<tm:tu id="u1.2"> Esta técnica hace extenso uso de XML namespace. </tm:tu>
</tm:te>
</text:p>
<text:p text:style-name="Text body">
<tm:te id="e2">
<tm:tu id="u2.1"> “tm” significa “memoria de texto”. </tm:tu>
<tm:tu id="u2.2"> Hay dos aspectos de memoria de texto: </tm:tu>
</tm:te>
</text:p>
<text:ordered-list text:continue-numbering="false" text:style-name="L1">
<text:list-item>
<text:p text:style-name="P3">
<tm:te id="e3">
<tm:tu id="u3.1"> Memoria de autor </tm:tu>
</tm:te>
</text:p>
</text:list-item>
<text:list-item>
<text:p text:style-name="P3">
<tm:te id="e4">
<tm:tu id="u4.1"> Memoria de traducción</tm:tu>
</tm:te>
</text:p>
</text:list-item>
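
The merge step can be sketched in Python as follows. This is an illustrative sketch only: the function name merge_translation is hypothetical, and the dictionary that maps XLIFF trans-unit ids to tm:tu ids is an assumption, since the listings do not show how the two id schemes are tied together. It also assumes that each text unit holds plain text with no inline markup.

```python
import xml.etree.ElementTree as ET

# tm namespace URI as declared in the article's xml:tm listings
TM_NS = "urn:xmlintl-tm-tags"


def merge_translation(doc_xml, xliff_xml, unit_to_tu):
    """Return a copy of the document with tm:tu text replaced by targets.

    unit_to_tu maps a trans-unit id to a tm:tu id (an assumed mapping).
    """
    doc = ET.fromstring(doc_xml)
    xliff = ET.fromstring(xliff_xml)

    # Collect the translated text, keyed by the tm:tu id it belongs to.
    translated = {}
    for unit in xliff.iter("trans-unit"):
        tu_id = unit_to_tu.get(unit.get("id"))
        target = unit.findtext("target")
        if tu_id is not None and target:
            translated[tu_id] = target

    # Replace the text of each matching text unit; ids stay untouched,
    # so source and target documents remain aligned unit by unit.
    for tu in doc.iter(f"{{{TM_NS}}}tu"):
        new_text = translated.get(tu.get("id"))
        if new_text is not None:
            tu.text = new_text
    return doc
```

Text units with no entry in the mapping keep their source text, so a partially translated XLIFF file still yields a usable document.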

This is an example of the composed translated text:

Figure 3: Composed Translated Document

The source and target text are linked at the sentence level by the unique xml:tm identifiers. When the document is revised, new identifiers are allocated to modified or new text units. When text is extracted for translation from the updated source document, the text units that have not changed can be replaced automatically with the target language text. The resulting XLIFF file will look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xliff PUBLIC "-//XML-INTL XLIFF-XML 1.0//EN" "xliff.dtd">
<xliff version="1.0">
<file datatype="xml" source-language="en-USA" target-language="es-ESP">
<header>
<count-group name="Totals">
<count count-type="TextUnits" unit="transUnits">40</count>
<count count-type="TotalWordCount" unit="words">416</count>
</count-group>
</header>
<body>
<trans-unit id="t1">
<source> Xml:tm</source>
<target ts="matched"> Xml:tm </target>
</trans-unit>
<trans-unit id="t2">
<source> Xml:tm is a revolutionary technique for dealing with the problems of translation memory for XML documents by using XML techniques and embedding memory directly into the XML documents themselves. </source>
<target ts="matched"> Xml:tm es una técnica revolucionaria que trata los problemas de memoria de traducción en documentos XML usando técnicas XML e incluyendo la memoria en el documento mismo. </target>
</trans-unit>
<trans-unit id="t3">
<source> It makes extensive use of XML namespace. </source>
<target ts="matched"> Esta técnica hace extenso uso de XML namespace. </target>
</trans-unit>
<trans-unit id="t4">
<source> The “tm” stands for “text memory”. </source>
<target ts="matched"> “tm” significa “memoria de texto”. </target>
</trans-unit>
<trans-unit id="t5">
<source> There are two aspects to text memory: </source>
<target ts="matched"> Hay dos aspectos de memoria de texto: </target>
</trans-unit>
<trans-unit id="t6">
<source> Author memory </source>
<target ts="matched"> Memoria de autor </target>
</trans-unit>
<trans-unit id="t7">
<source> Translation memory </source>
<target ts="matched"> Memoria de traducción </target>
</trans-unit>
...
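
The pre-filling of unchanged units might be sketched like this in Python. The sketch relies on the rule stated above that modified or new text units receive new identifiers, so an identifier that already has a translation is taken to mean unchanged text. The function name prefill_matches and the shape of its inputs (a list of id and text pairs, and a dictionary of earlier translations keyed by tm:tu id) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def prefill_matches(units, previous):
    """Build an XLIFF body, pre-filling targets for unchanged units.

    units:    list of (tu_id, source_text) from the revised document
    previous: dict of tu_id -> target text from the last translation
    """
    body = ET.Element("body")
    for tu_id, source in units:
        unit = ET.SubElement(body, "trans-unit", id=tu_id)
        ET.SubElement(unit, "source").text = source
        target = ET.SubElement(unit, "target")
        if tu_id in previous:
            # Same identifier as before: reuse the earlier translation.
            target.text = previous[tu_id]
            target.set("ts", "matched")
        else:
            # New or modified unit: leave the source text for the translator.
            target.text = source
    return body
```

Only the units without ts="matched" then need to be sent to the translator.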

Perfect Matching

The matching described in the previous section is called “perfect” matching. Because xml:tm memories are embedded within the XML document itself, they carry all the contextual information needed to identify precisely which text units have not changed since the previous revision of the document. Unlike leveraged matches, perfect matches do not require translator intervention, which reduces translation costs.

The following diagram shows how Perfect Matching is achieved:

Figure 4: Perfect Matching Mechanism Diagram
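
One way to get a feel for the saving from perfect matching is to count how many source words still need a translator once matched units are excluded. The Python sketch below is a rough estimate under stated assumptions: the function name words_to_translate is hypothetical, words are counted by splitting on whitespace (not necessarily how the TotalWordCount value in the header is computed), and a unit counts as done only when its target carries ts="matched".

```python
import xml.etree.ElementTree as ET


def words_to_translate(xliff_xml):
    """Return (total_words, pending_words) for an XLIFF document.

    pending_words covers units whose target is not marked ts="matched".
    """
    root = ET.fromstring(xliff_xml)
    total = 0
    pending = 0
    for unit in root.iter("trans-unit"):
        source = unit.findtext("source", default="")
        count = len(source.split())  # naive whitespace word count
        total += count
        target = unit.find("target")
        if target is None or target.get("ts") != "matched":
            pending += count
    return total, pending
```

The difference between the two numbers is the volume of text that perfect matching removes from the translator's workload.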
