XML.com: XML From the Inside Out

Article: From Word to XML
Subject: WorX Studio by HyperVision
Date: 2004-01-12 16:30:25
From: Chango Valtchev

We are pleased to introduce our new product, WorX Studio… Its purpose is precisely the automated structuring of Microsoft Word documents (as well as any textual content that can be imported into Word). Conversion can target any custom XML Schema (XSD) that models the logical structure of the document. Hence, meaningful ("semantic") markup can be derived, not just the formatting/typographical kind.

Word 2003's native XML markup is supported directly. Older versions are supported just as well through our other Word add-in, WorX for Word, which augments Word 2000/2002 to become a full-fledged XML authoring tool. [Yeah, we did this three years ahead of Microsoft…]

WorX Studio is expressly designed for, and fully integrated into, the workspace of Microsoft Word. It provides a GUI environment for developing and executing document-type-specific conversion definitions. Nearly all formatting features supported by Word can be used to define XML element recognition patterns. In addition, literal-text, wildcard, and regular-expression patterns are supported, as well as arbitrarily complex logical (boolean) combinations of all primitive pattern types.

Another novel feature of WorX Studio is the conversion model it uses. All the "intelligence" encoded in the supplied XML Schema is extracted and used to guide the document conversion process. (There is no ad hoc style-to-element mapping of the kind seen in some simplistic conversion approaches.) You identify and define recognition patterns only for the so-called baseline elements in the document (usually leaf-level or near-leaf-level elements). The conversion engine then creates the markup for these elements as well as the markup for all higher-level elements, automatically and "for free", by abiding by all nesting and repetition rules from the schema. Thus, deep/granular markup can be easily obtained.

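To make the schema-guided idea concrete, here is a minimal Python sketch. It is not WorX Studio's engine or API. The toy schema, the regular-expression patterns, and all function names are hypothetical, chosen only to show how recognizing leaf ("baseline") elements can be enough to derive the enclosing markup from nesting and repetition rules. It works on plain text lines instead of Word formatting, assumes each element has exactly one possible parent, and does not enforce child order.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy schema: parent -> [(child, may_repeat)].
# A real XSD is far richer; this only captures nesting and repetition.
SCHEMA = {
    "article": [("title", False), ("section", True)],
    "section": [("heading", False), ("para", True)],
}
ROOT = "article"
PARENT = {child: parent for parent, kids in SCHEMA.items() for child, _ in kids}
REPEATABLE = {child: rep for kids in SCHEMA.values() for child, rep in kids}

# Hypothetical baseline patterns, tried in order. Only leaf elements get one.
PATTERNS = [
    ("title", re.compile(r"^TITLE:\s*(.+)$")),
    ("heading", re.compile(r"^\d+\.\s+(.+)$")),
    ("para", re.compile(r"^(.+)$")),
]


def classify(line):
    """Return (baseline element name, captured text) for one input line."""
    for name, pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return name, match.group(1)
    raise ValueError(f"no baseline pattern matches {line!r}")


def path_to(name):
    """Element names from the root down to `name`, read off the schema."""
    path = [name]
    while path[0] != ROOT:
        path.insert(0, PARENT[path[0]])
    return path


def add_child(parent, tag):
    """Append a child, refusing a second copy of a non-repeatable element."""
    if not REPEATABLE[tag] and parent.find(tag) is not None:
        raise ValueError(f"schema allows only one <{tag}> in <{parent.tag}>")
    return ET.SubElement(parent, tag)


def convert(lines):
    """Build the full tree from leaf matches; ancestors come from the schema."""
    root = ET.Element(ROOT)
    open_elems = [root]  # currently open elements, root first
    for line in lines:
        name, text = classify(line)
        ancestors = path_to(name)[:-1]
        # Keep only the open elements this leaf can live under.
        depth = 0
        while (depth < min(len(open_elems), len(ancestors))
               and open_elems[depth].tag == ancestors[depth]):
            depth += 1
        del open_elems[depth:]
        # A second non-repeatable leaf (e.g. a new heading) starts a new parent.
        if (len(open_elems) == len(ancestors) and len(open_elems) > 1
                and not REPEATABLE[name]
                and open_elems[-1].find(name) is not None):
            open_elems.pop()
        # Open whatever higher-level elements are still missing.
        for tag in ancestors[len(open_elems):]:
            open_elems.append(add_child(open_elems[-1], tag))
        add_child(open_elems[-1], name).text = text
    return root


if __name__ == "__main__":
    sample = [
        "TITLE: From Word to XML",
        "1. Patterns",
        "Patterns can be loose.",
        "2. Schema",
        "The schema guides nesting.",
    ]
    print(ET.tostring(convert(sample), encoding="unicode"))
```

Note that no pattern mentions `section` or `article`: those elements appear only because the toy schema says where a heading or paragraph must live, and a new `section` opens only because a second `heading` cannot fit in the current one.
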
Another advantage of the schema-guided approach is that the individual baseline element patterns can be relatively simple and loose. Patterns are tested only in the contexts where valid matches are expected or likely to occur, which avoids many spurious matches and speeds up the whole conversion process.

The result of conversion is the given Word document, with all its original text and formatting intact, but with the XML element tags embedded in it. (This is something that a command-line or "streaming"-approach conversion tool cannot offer.) Pure, standard XML compliant with the custom schema can be exported at any time.

Completed conversion definitions can be run within Microsoft Word even by non-XML users. A batch processor is also provided, as well as an API to the conversion engine, which can be used to add automated conversion/structuring capabilities to custom authoring solutions based on Microsoft Word (especially in the paradigm of "smart documents" introduced with Word 2003).

For more general information and a detailed feature list, please visit www.hvltd.com. (Fact Sheet: http://www.hvltd.com/misc/WorXStudioFactsheet.pdf.)
