Translating XML Documents with xml:tm
by Andrzej Zydron
Translation Memory
When an xml:tm namespace document is ready for translation, the namespace itself specifies the text that is to be translated. The tm namespace can be used to create an XLIFF document for translation.
XLIFF
XLIFF (XML Localization Interchange File Format) is an OASIS standard: an XML format optimized for translation. Using XLIFF, you can protect the original document from accidental corruption during the translation process. In addition, you can supply other relevant information to the translator, such as translation memory and preferred terminology.
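As a rough illustration of the extraction step, the following Python sketch walks an xml:tm document and emits one XLIFF trans-unit per tm:tu element. It is a minimal sketch under stated assumptions, not the author's tooling: the function name `extract_trans_units`, the tiny sample document, and the choice to reuse each tm:tu id as the trans-unit id are hypothetical. Only the `urn:xmlintl-tm-tags` namespace URI and the element names are taken from the listings in this article.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in this article's xml:tm listings.
TM_NS = "urn:xmlintl-tm-tags"

# Hypothetical, heavily reduced xml:tm document used only for this sketch.
SAMPLE = """<doc xmlns:tm="urn:xmlintl-tm-tags">
  <p>
    <tm:te id="e1">
      <tm:tu id="u1.1">It makes extensive use of XML namespace.</tm:tu>
      <tm:tu id="u1.2">The "tm" stands for "text memory".</tm:tu>
    </tm:te>
  </p>
</doc>"""


def extract_trans_units(doc_xml: str) -> ET.Element:
    """Build an XLIFF-style <body> with one <trans-unit> per tm:tu element."""
    root = ET.fromstring(doc_xml)
    body = ET.Element("body")
    for tu in root.iter(f"{{{TM_NS}}}tu"):
        # Assumption: the tm:tu id is reused as the trans-unit id.
        unit = ET.SubElement(body, "trans-unit", id=tu.get("id"))
        # Collapse runs of whitespace so line breaks in the source do not leak.
        text = " ".join("".join(tu.itertext()).split())
        ET.SubElement(unit, "source").text = text
        # Assumption: the target starts out as a copy of the source text.
        ET.SubElement(unit, "target").text = text
    return body


if __name__ == "__main__":
    print(ET.tostring(extract_trans_units(SAMPLE), encoding="unicode"))
```

Keeping the identifier on each trans-unit is what later allows the translated text to be routed back to the right text unit.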
The following is an example of an XLIFF document based on the previous example:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xliff PUBLIC "-//XML-INTL XLIFF-XML 1.0//EN" "file:xliff.dtd">
<xliff version="1.0">
  <file datatype="xml" source-language="en-USA" target-language="es-ESP">
    <header>
      <count-group name="Totals">
        <count count-type="TextUnits" unit="transUnits">40</count>
        <count count-type="TotalWordCount" unit="words">416</count>
      </count-group>
    </header>
    <body>
      <trans-unit id="t1">
        <source>Xml:tm</source>
        <target ts="matched">Xml:tm</target>
      </trans-unit>
      <trans-unit id="t2">
        <source>
          Xml:tm is a revolutionary technique for dealing with the problems of
          translation memory for XML documents by using XML techniques and
          embedding memory directly into the XML documents themselves.
        </source>
        <target>
          Xml:tm is a revolutionary technique for dealing with the problems of
          translation memory for XML documents by using XML techniques and
          embedding memory directly into the XML documents themselves.
        </target>
      </trans-unit>
      <trans-unit id="t3">
        <source>It makes extensive use of XML namespace.</source>
        <target>It makes extensive use of XML namespace.</target>
      </trans-unit>
      <trans-unit id="t4">
        <source>The “tm” stands for “text memory”.</source>
        <target>The “tm” stands for “text memory”.</target>
      </trans-unit>
      <trans-unit id="t5">
        <source>There are two aspects to text memory:</source>
        <target>There are two aspects to text memory:</target>
      </trans-unit>
      <trans-unit id="t6">
        <source>Author memory</source>
        <target>Author memory</target>
      </trans-unit>
      <trans-unit id="t7">
        <source>Translation memory</source>
        <target>Translation memory</target>
      </trans-unit>
...
The content of each <target> element (shown in magenta in the original listing) marks where the translated text will replace the source language text, as shown below:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xliff PUBLIC "-//XML-INTL XLIFF-XML 1.0//EN" "xliff.dtd">
<xliff version="1.0">
  <file datatype="xml" source-language="en-USA" target-language="es-ESP">
    <header>
      <count-group name="Totals">
        <count count-type="TextUnits" unit="transUnits">40</count>
        <count count-type="TotalWordCount" unit="words">416</count>
      </count-group>
    </header>
    <body>
      <trans-unit id="t1">
        <source>Xml:tm</source>
        <target>Xml:tm</target>
      </trans-unit>
      <trans-unit id="t2">
        <source>
          Xml:tm is a revolutionary technique for dealing with the problems of
          translation memory for XML documents by using XML techniques and
          embedding memory directly into the XML documents themselves.
        </source>
        <target>
          Xml:tm es un técnica revolucionaria que trata los problemas de
          memoria de traducción en documentos XML usando técnicas
          XML e incluyendo la memoria en el documento mismo.
        </target>
      </trans-unit>
      <trans-unit id="t3">
        <source>It makes extensive use of XML namespace.</source>
        <target>Esta técnica hace extensor uso de XML namespace.</target>
      </trans-unit>
      <trans-unit id="t4">
        <source>The “tm” stands for “text memory”.</source>
        <target>“tm” significa “memoria de texto”.</target>
      </trans-unit>
      <trans-unit id="t5">
        <source>There are two aspects to text memory:</source>
        <target>Hay dos aspectos de memoria de texto:</target>
      </trans-unit>
      <trans-unit id="t6">
        <source>Author memory</source>
        <target>Memoria de autor</target>
      </trans-unit>
      <trans-unit id="t7">
        <source>Translation memory</source>
        <target>Memoria de traducción</target>
      </trans-unit>
...
When the translation has been completed, the target language text can be merged with the original document to create a new target language version of that document. The net result is a perfectly aligned source and target language document.
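The merge step can be sketched in Python as a lookup by identifier. This is only an illustration under stated assumptions: the function name `merge_targets` and both sample strings are hypothetical, trans-unit ids are assumed to equal the tm:tu ids they were extracted from, and each tm:tu is assumed to hold plain text with no inline markup.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in this article's xml:tm listings.
TM_NS = "urn:xmlintl-tm-tags"

# Hypothetical source document and translated XLIFF, reduced to one text unit.
DOC = (
    '<doc xmlns:tm="urn:xmlintl-tm-tags">'
    '<tm:te id="e3"><tm:tu id="u3.1">Author memory</tm:tu></tm:te>'
    "</doc>"
)
XLIFF = (
    '<xliff version="1.0"><file><body>'
    '<trans-unit id="u3.1">'
    "<source>Author memory</source>"
    "<target>Memoria de autor</target>"
    "</trans-unit>"
    "</body></file></xliff>"
)


def merge_targets(doc_xml: str, xliff_xml: str) -> ET.Element:
    """Return the document tree with each tm:tu text replaced by its target."""
    # Collect translated text keyed by trans-unit id.
    targets = {}
    for unit in ET.fromstring(xliff_xml).iter("trans-unit"):
        target = unit.find("target")
        if target is not None and target.text:
            targets[unit.get("id")] = " ".join(target.text.split())

    root = ET.fromstring(doc_xml)
    for tu in root.iter(f"{{{TM_NS}}}tu"):
        translated = targets.get(tu.get("id"))
        if translated is not None:
            # Assumption: plain-text units only; inline child elements
            # would need separate handling.
            tu.text = translated
    return root


if __name__ == "__main__":
    print(ET.tostring(merge_targets(DOC, XLIFF), encoding="unicode"))
```

Because the identifiers survive the round trip, no text comparison is needed to align the source and target versions.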
The following is an example of a translated xml:tm document:
<?xml version="1.0" encoding="UTF-8" ?>
<office:document-content
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:tm="urn:xmlintl-tm-tags"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <tm:tm>
    ...
    <text:p text:style-name="Text body">
      <tm:te id="e1" tuval="2">
        <tm:tu id="u1.1">
          Xml:tm es un técnica revolucionaria que trata los problemas de
          memoria de traducción en documentos XML usando técnicas XML e
          incluyendo la memoria en el documento mismo.
        </tm:tu>
        <tm:tu id="u1.2">Esta técnica hace extensor uso de XML namespace.</tm:tu>
      </tm:te>
    </text:p>
    <text:p text:style-name="Text body">
      <tm:te id="e2">
        <tm:tu id="u2.1">“tm” significa “memoria de texto”.</tm:tu>
        <tm:tu id="u2.2">Hay dos aspectos de memoria de texto:</tm:tu>
      </tm:te>
    </text:p>
    <text:ordered-list text:continue-numbering="false" text:style-name="L1">
      <text:list-item>
        <text:p text:style-name="P3">
          <tm:te id="e3">
            <tm:tu id="u3.1">Memoria de autor</tm:tu>
          </tm:te>
        </text:p>
      </text:list-item>
      <text:list-item>
        <text:p text:style-name="P3">
          <tm:te id="e4">
            <tm:tu id="u4.1">Memoria de traducción</tm:tu>
          </tm:te>
        </text:p>
      </text:list-item>
This is an example of the composed translated text:
Figure 3: Composed Translated Document
The source and target text are linked at the sentence level by the unique xml:tm identifiers. When the document is revised, new identifiers are allocated to modified or new text units. When text is extracted from the updated source document for translation, the text units that have not changed can be automatically replaced with the existing target language text. The resulting XLIFF file will look like this:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xliff PUBLIC "-//XML-INTL XLIFF-XML 1.0//EN" "xliff.dtd">
<xliff version="1.0">
  <file datatype="xml" source-language="en-USA" target-language="es-ESP">
    <header>
      <count-group name="Totals">
        <count count-type="TextUnits" unit="transUnits">40</count>
        <count count-type="TotalWordCount" unit="words">416</count>
      </count-group>
    </header>
    <body>
      <trans-unit id="t1">
        <source>Xml:tm</source>
        <target ts="matched">Xml:tm</target>
      </trans-unit>
      <trans-unit id="t2">
        <source>
          Xml:tm is a revolutionary technique for dealing with the problems of
          translation memory for XML documents by using XML techniques and
          embedding memory directly into the XML documents themselves.
        </source>
        <target ts="matched">
          Xml:tm es un técnica revolucionaria que trata los problemas de
          memoria de traducción en documentos XML usando técnicas
          XML e incluyendo la memoria en el documento mismo.
        </target>
      </trans-unit>
      <trans-unit id="t3">
        <source>It makes extensive use of XML namespace.</source>
        <target ts="matched">Esta técnica hace extensor uso de XML namespace.</target>
      </trans-unit>
      <trans-unit id="t4">
        <source>The “tm” stands for “text memory”.</source>
        <target ts="matched">“tm” significa “memoria de texto”.</target>
      </trans-unit>
      <trans-unit id="t5">
        <source>There are two aspects to text memory:</source>
        <target ts="matched">Hay dos aspectos de memoria de texto:</target>
      </trans-unit>
      <trans-unit id="t6">
        <source>Author memory</source>
        <target ts="matched">Memoria de autor</target>
      </trans-unit>
      <trans-unit id="t7">
        <source>Translation memory</source>
        <target ts="matched">Memoria de traducción</target>
      </trans-unit>
...
Perfect Matching
The matching described in the previous section is called “perfect” matching. Because xml:tm memories are embedded within the XML document itself, they carry all the contextual information required to identify precisely those text units that have not changed since the previous revision of the document. Unlike leveraged matches, perfect matches do not require translator intervention, which reduces translation costs.
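A minimal Python sketch of this idea follows, assuming (as described earlier) that unchanged text units keep their identifiers across revisions while new or modified units receive fresh ones. The function name `prefill_perfect_matches`, the sample document, and the dictionary of earlier translations are hypothetical; the `ts="matched"` marker mirrors the XLIFF listing in the previous section.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in this article's xml:tm listings.
TM_NS = "urn:xmlintl-tm-tags"

# Hypothetical revised document: u2.2 is unchanged, u2.3 is a new text unit.
REVISED = """<doc xmlns:tm="urn:xmlintl-tm-tags">
  <tm:te id="e2">
    <tm:tu id="u2.2">There are two aspects to text memory:</tm:tu>
    <tm:tu id="u2.3">Both are described below.</tm:tu>
  </tm:te>
</doc>"""

# Hypothetical translations recovered from the previous translated revision.
PREVIOUS = {"u2.2": "Hay dos aspectos de memoria de texto:"}


def prefill_perfect_matches(revised_xml: str, previous: dict) -> ET.Element:
    """Build trans-units, reusing old translations for unchanged unit ids."""
    root = ET.fromstring(revised_xml)
    body = ET.Element("body")
    for tu in root.iter(f"{{{TM_NS}}}tu"):
        uid = tu.get("id")
        text = " ".join("".join(tu.itertext()).split())
        unit = ET.SubElement(body, "trans-unit", id=uid)
        ET.SubElement(unit, "source").text = text
        target = ET.SubElement(unit, "target")
        if uid in previous:
            # Same identifier as before: treat as a perfect match.
            target.set("ts", "matched")
            target.text = previous[uid]
        else:
            # New or modified unit: leave the source text for the translator.
            target.text = text
    return body


if __name__ == "__main__":
    print(ET.tostring(prefill_perfect_matches(REVISED, PREVIOUS), encoding="unicode"))
```

Only the units without a matched target would then need a translator's attention.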
The following diagram shows how Perfect Matching is achieved:
Figure 4: Perfect Matching Mechanism Diagram
