...anything you can print to a PostScript or PDF file, that is... I am biased, as an Exegenix employee, but I am also proud of our technology (the Exegenix Conversion System, or ECS) and the way it analyses page geometry and goes farther than most "to XML" conversion utilities. ECS output is not 'just' an XML version of captured formatting information (even WordML is 'just' that); it is richly structured, using a DTD that we call a "superset of DocBook". That is, if an object on the page looks like a "Section Title", we tag it as a <title> inside a <section>.

If you're going to post-process your documents via XSLT, you don't have to rely on exact formatting codes... if it looks like a title, it will be tagged as a <title>, without any pre-configuration to recognise particular formatting as a title, no matter which typeface or point size that particular document uses. Don't worry, though: if you're inclined to write scripts that act on formatting information, all of that formatting information IS part of the ECS XML output, so it's all there for you to use...

I could go on :-) but I suggest you check our website at www.exegenix.com if this type of structured output is of interest to you.
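To illustrate why structural tagging matters for post-processing, here is a minimal sketch in Python (standing in for XSLT) using the standard library's xml.etree.ElementTree. The sample XML is hypothetical and written in a DocBook-like style; it is not real ECS output, and the `font`/`size` attributes are invented here to represent captured formatting information. The function names are likewise just illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-style document; NOT actual ECS output.
# The "font" and "size" attributes are invented stand-ins for
# captured formatting information.
SAMPLE = """
<article>
  <section>
    <title font="Helvetica" size="18pt">Introduction</title>
    <para font="Times" size="11pt">Some body text.</para>
    <section>
      <title font="Helvetica" size="14pt">Background</title>
      <para font="Times" size="11pt">More body text.</para>
    </section>
  </section>
  <section>
    <title font="Arial" size="16pt">Results</title>
    <para font="Times" size="11pt">Even more text.</para>
  </section>
</article>
"""


def titles_by_structure(xml_text):
    """Return every section title, selected purely by its place in the tree."""
    root = ET.fromstring(xml_text)
    # ".//section/title" matches a <title> that is a direct child of any <section>
    return [t.text for t in root.findall(".//section/title")]


def titles_by_formatting(xml_text, font, size):
    """Return the text of every element whose formatting matches exactly."""
    root = ET.fromstring(xml_text)
    # root.iter() walks all elements in document order, including the root
    return [el.text for el in root.iter()
            if el.get("font") == font and el.get("size") == size]


print(titles_by_structure(SAMPLE))                         # ['Introduction', 'Background', 'Results']
print(titles_by_formatting(SAMPLE, "Helvetica", "18pt"))   # ['Introduction']
```

The structural query finds every title regardless of typeface or point size, while the formatting-based query only finds what matches the exact codes it was given. In an XSLT stylesheet the structural version would be a template matching `section/title`.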