I don't know whether it's also related to OpenOffice these days, but the importer used by AbiWord was called wvware, and it does the Word-to-XML job quite well across a wide variety of platforms (it's written in C, but doesn't depend on Win32 APIs). I'm slightly biased in their favour, as I contributed code some years back, but it does seem to have a couple of advantages over some of the tools discussed:
- It's command-line, not point-and-click. That makes it easy to build into a processing pipeline where human intervention isn't possible.
- The XML export format preserves (as I recall) all the useful information from the OLE envelope and the Word document, letting you control post-processing into HTML, MIF, TeX, etc. (e.g. fitting documents into templates, splitting them across pages, or applying a corporate stylesheet instead of an autogenerated one that preserves the original formatting). You can't get this by converting to RTF first, for example.
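Because the tool runs from the command line, unattended batch conversion is only a short script. Below is a minimal Python sketch, assuming the converter is invoked as `<command> <input.doc> <output>`. The helper name `convert_all`, that argument order and the `.xml` output suffix are all assumptions: wv ships several front ends (e.g. `wvHtml`), so check the one you install for its actual arguments.

```python
import subprocess
import sys
from pathlib import Path


def convert_all(src_dir, out_dir, cmd, suffix=".xml"):
    """Run a converter over every *.doc file in src_dir without human input.

    Hypothetical helper: `cmd` is the converter invocation as a list, e.g.
    ["wvHtml"], and is ASSUMED to accept `<input> <output>` as its final
    two arguments. Returns (converted_paths, [(doc_path, stderr), ...]).
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    # Note: glob("*.doc") is case-sensitive on most Unix filesystems.
    for doc in sorted(src.glob("*.doc")):
        target = out / (doc.stem + suffix)
        # No shell and no prompts; a non-zero exit status is recorded so one
        # bad document doesn't stop the rest of the batch.
        result = subprocess.run(
            [*cmd, str(doc), str(target)], capture_output=True, text=True
        )
        if result.returncode == 0:
            converted.append(target)
        else:
            failed.append((doc, result.stderr.strip()))
    return converted, failed
```

Calling e.g. `convert_all("incoming", "converted", ["wvHtml"], suffix=".html")` from a cron job or build step gives you the hands-off pipeline described above. The failures list is there so you can route problem documents to a person afterwards instead of blocking the run.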
It's worth mentioning that the only reason I was involved was that the large corporate I worked for back then surveyed all the available tools internally (around Y2K?) and rated wvware the best. You can find it here:
http://sourceforge.net/projects/wvware