
Article:
 Translating XML Documents with xml:tm
Subject: More on Mapping
Date: 2004-01-13 02:59:21
From: Don Smith

I'm interested in understanding better the relationship between the original XML document and the mapped xml:tm document you illustrate in Figure 1.


Does xml:tm assume that people create content in XML using customized document types or in xml:tm itself? If the former, then I assume that moving from my own document type to xml:tm is a straight XSLT transformation. In that case, I have a question about sentence segmentation.


Most document types do not use markup to distinguish sentences in the original customized document type. Does xml:tm assume that a customized document type will segment sentences? (Does the <text> element in Figure 1 perform the function of segmenting sentences in the source document type?)


Also, the PDF at http://www.xml-intl.com/docs/xml-tm-whitepaper.pdf appears to be broken: when I try to download it, I get a seven-page document with nothing in it and a locked-up web browser.




  • More on Mapping
    2004-01-13 12:47:17 Andrzej Zydron

    In answer to some of the other points that you raised:


    The xml:tm namespace should be allocated automatically by a program designed for that purpose, and it should also be maintained by a program. The xml:tm namespace should not be visible for any authoring or printing operations. It should be stripped out for these purposes as described in my other answer.


    Regarding the "text" in Figure 1: it represents the #PCDATA text of the element, as do the "sentence" components. The difference is that the "text" has no identifiable segmentation into separate sentences, as would be the case, for example, for a section title.
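
    To make the distinction concrete, here is a minimal Python sketch of how a program might wrap an element's #PCDATA in a "text" element holding one "sentence" component per sentence. The namespace URI, the element names tm:te and tm:tu, the id attribute and the regular-expression sentence splitter are all assumptions made for illustration; a real segmenter would need proper rules for abbreviations and other edge cases.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URI and element names; check them against the xml:tm spec.
TM_NS = "urn:xml-intl-tm"
ET.register_namespace("tm", TM_NS)

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(elem):
    """Wrap elem's leading text in a tm:te element with one tm:tu per sentence."""
    text = (elem.text or "").strip()
    if not text:
        return
    elem.text = None
    te = ET.Element(f"{{{TM_NS}}}te")
    elem.insert(0, te)
    sentences = _SENTENCE_END.split(text)
    for i, sentence in enumerate(sentences, start=1):
        # The id scheme (u1, u2, ...) is hypothetical.
        tu = ET.SubElement(te, f"{{{TM_NS}}}tu", {"id": f"u{i}"})
        tu.text = sentence
        if i < len(sentences):
            tu.tail = " "  # keep a separator between sentences
```

    A section title such as `<title>Overview</title>` ends up with a single sentence component, while a paragraph with two sentences gets two.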


  • More on Mapping
    2004-01-13 12:25:10 Andrzej Zydron

    Thank you very much for your feedback.


    xml:tm is best suited for use within a content management system (CMS). The way to use xml:tm with XML documents is to hold the namespace data only within the CMS. When the document is checked out for authoring, the namespace should be stripped out. On check-in, the CMS updates the namespace data by inserting the xml:tm namespace (via the segmentation process) into the new version of the document and comparing the two namespaced versions of the document, a process called DOM differencing. The same applies to printing: the namespace should be stripped out when the document is sent for printing via FOP or XEP, for instance. In this way the xml:tm namespace is transparent to the authoring and printing environments and does not impinge on them in any way. Stripping out the xml:tm namespace is a trivial operation using XSLT.
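
    As an illustration of the stripping step, here is a minimal sketch. It uses Python's standard xml.dom.minidom rather than XSLT, and it assumes the namespace URI shown below, which should be checked against the xml:tm specification. Every element in that namespace is unwrapped (its children are kept in place), and attributes and the namespace declaration that refer to it are removed. It also assumes the document's root element is not itself in the xml:tm namespace.

```python
from xml.dom import minidom

# Assumed namespace URI; check it against the xml:tm spec.
TM_NS = "urn:xml-intl-tm"


def strip_tm(xml_text):
    """Return the document element as XML with all xml:tm markup removed."""
    doc = minidom.parseString(xml_text)
    _strip(doc.documentElement)
    doc.normalize()  # merge text nodes left adjacent by unwrapping
    return doc.documentElement.toxml()


def _strip(node):
    for child in list(node.childNodes):
        if child.nodeType != child.ELEMENT_NODE:
            continue
        _strip(child)  # clean descendants first
        if child.namespaceURI == TM_NS:
            # Unwrap: move the children up in place of the element.
            while child.firstChild is not None:
                node.insertBefore(child.firstChild, child)
            node.removeChild(child)
    # Drop attributes in the namespace and the xmlns declaration for it.
    doomed = [
        attr.name
        for attr in node.attributes.values()
        if attr.namespaceURI == TM_NS
        or (attr.prefix == "xmlns" and attr.value == TM_NS)
    ]
    for name in doomed:
        node.removeAttribute(name)
```

    Given a paragraph whose text is wrapped in namespaced text and sentence elements, `strip_tm` returns the plain paragraph, which is what the authoring or printing tool would see.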


    I have checked the PDF file again with IE 6.0, Netscape 7.1 and Mozilla Firebird 0.6.1, and it works in all three browsers. It is rather big and can cause problems if you try to open it before it has fully downloaded in your browser. Wait until you can see the first page displayed in the Acrobat browser plugin; otherwise it will complain. Alternatively, try downloading it rather than displaying it directly in your browser, using the "shift+click" option available in most browsers.


    If you have any more questions, please do not hesitate to post further feedback.


    Regards,


    A.Zydron


