Moving to OpenOffice: Batch Converting Legacy Documents
by Bob DuCharme
Running It
Running the macro from a shell prompt should work whether you leave OpenOffice open or quit out of it first. The following shows the basic command line for converting a Word file to OpenOffice on a Windows computer, split onto two lines to fit here:
"C:\Program Files\OpenOffice.org 2.0\program\soffice"
-invisible macro:///Standard.MyConversions.SaveAsOOO(c:\temp\sample.doc)
I don't have the soffice.exe executable in my path, so I had to include the full path to it enclosed in quotes because of the space in the Program Files directory name. The -invisible switch tells OpenOffice not to bother with the startup screen, a default document, or any of the GUI. (Try starting up soffice.exe from the command line with a single parameter of -? to see a list of interesting options.) The macro is named in a URL-like format, with the path down the macro tree structure to the macro to be run, and the file to be converted is included in parentheses as a parameter to the macro. There's no need to provide an output file name, because the macro infers it from the input filename and the requested action.
Because the macro code adds file:/// as a prefix to turn the input filename into a URL, you must include the complete path to it, as shown above, or you'll get the error message "URL seems to be an unsupported one."
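Because the macro needs a complete path, a wrapper script can resolve a relative filename before building the macro argument. The following is a minimal sketch for a Unix-style shell, not part of the original macros: the function names to_abs_path and macro_url are hypothetical, and the script only prints the macro argument rather than launching OpenOffice.

```shell
#!/bin/sh
# Hypothetical helpers (not from the article): make sure the macro
# always receives a full path.

# Print $1 unchanged if it is already absolute; otherwise prefix it
# with the current working directory.
to_abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$PWD" "$1" ;;
    esac
}

# Build the macro argument in the form used on the command lines above.
macro_url() {
    printf 'macro:///Standard.MyConversions.SaveAsOOO(%s)\n' "$(to_abs_path "$1")"
}

# Example: print the argument to pass (in quotes) after -invisible.
macro_url sample.doc
```

The sketch does not check that the file exists, and it does not handle Windows drive-letter paths.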
The Linux version of the command line (again, split here) needs to use a different binary name. The OpenOffice installation on my Ubuntu distribution put the ooffice2 binary in my path, so I didn't have to say where it was when starting it. I did enclose the call to the macro in quotes, because otherwise the parentheses confused the shell. Apart from that, the exact same macros installed with the procedure described above worked perfectly:
ooffice2 -invisible
"macro:///Standard.MyConversions.SaveAsOOO(/home/bob/temp/sample.doc)"
I tried converting several different files. The sample.doc file is a test file I've kept around for a few years to test the mettle of any program or service that claims to convert Word files to XML. It uses built-in and newly created block and inline styles, nested bulleted lists, a BMP file, a table with spanning cells, an embedded spreadsheet, and a few other things that can throw off a conversion program. SaveAsOOO did fine with it.
Go Forth and Convert MS Office Files
Now that you've got a free, multi-platform tool that can convert new and old (well, at least as old as Office 97) MS Office files to an open XML standard, how can you best put it to good use? Anything that can be run from a command line can be used in an unattended, "lights out" workflow. A Perl script can take a list of filenames and create a batch file or shell script with a series of commands like those shown above to convert those files. If the raw XML is really what you're after, a script can also pull that XML out of the OpenOffice zip file and rename it to correspond with the input file, as in this shell script:
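# Remember to include the full path with
# the filename for $1 and to omit the extension
ooffice2 -invisible "macro:///Standard.MyConversions.SaveAsOOO($1.doc)"
# Extract content.xml from the new .odt (a zip archive) into the current directory
unzip -o "$1.odt" content.xml
# Save it under a name that matches the input file
cp content.xml "$1.xml"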
Windows batch file version:
REM Remember to include the full path with
REM the filename for %1 and to omit the extension
set OOoExe="C:\Program Files\OpenOffice.org 2.0\program\soffice"
%OOoExe% -invisible macro:///Standard.MyConversions.SaveAsOOO(%1.doc)
REM Extract content.xml from the new .odt file, then copy it to match the input name
unzip -o %1.odt content.xml
copy content.xml %1.xml
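The list-driven approach mentioned earlier does not have to be written in Perl. As a rough sketch, the shell function below reads extensionless full paths from standard input and prints the conversion commands for each file, which can be redirected into a script and run later. The function name emit_commands and the OFFICE_BIN variable are hypothetical; ooffice2 is simply the binary name from the Ubuntu example.

```shell
#!/bin/sh
# Hypothetical generator: reads one full path per line (no extension)
# and prints the conversion and extraction commands for each file.
emit_commands() {
    bin=${OFFICE_BIN:-ooffice2}   # assumed binary name; override as needed
    while IFS= read -r base; do
        [ -n "$base" ] || continue            # skip blank lines
        printf '%s -invisible "macro:///Standard.MyConversions.SaveAsOOO(%s.doc)"\n' "$bin" "$base"
        printf 'unzip -o "%s.odt" content.xml\n' "$base"
        printf 'cp content.xml "%s.xml"\n' "$base"
    done
}

# Example: print the commands for two files; redirect the output
# to a file to keep them as a script.
printf '%s\n' /home/bob/temp/sample /home/bob/temp/report | emit_commands
```

The generated lines wrap paths in double quotes, so paths that themselves contain double quotes or dollar signs would need extra escaping.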
If you're going to make high-volume conversion part of an ongoing daily workflow, restarting OpenOffice for every conversion will slow you down. In Windows, starting up soffice.exe in quickstart mode (with the -quickstart switch on the command line) before doing your conversions should make those conversions go faster. To go a few steps further, the -accept switch specifies a Universal Network Objects (UNO) connection string that lets you communicate with the running OpenOffice process via an API from a program written in C++, OpenOffice Basic, Python, Java, or other languages, and pass input documents to your OpenOffice process using API calls.
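For reference, the -accept string for a socket listener usually has the form shown below. This is a sketch under assumptions: the host and port values are arbitrary examples, the function names accept_switch and uno_url are hypothetical, and the script only prints the strings instead of starting OpenOffice.

```shell
#!/bin/sh
# Hypothetical helpers; the host and port defaults are example values only.

# Print the switch that makes soffice listen for UNO connections.
accept_switch() {
    printf '%s\n' "-accept=socket,host=${1:-localhost},port=${2:-2002};urp;"
}

# Print the matching URL that a client program passes to the UNO URL resolver.
uno_url() {
    printf '%s\n' "uno:socket,host=${1:-localhost},port=${2:-2002};urp;StarOffice.ComponentContext"
}

# Example: show the listener command line and the client-side URL.
printf 'soffice -invisible "%s"\n' "$(accept_switch)"
uno_url
```

The switch is quoted on the command line because the semicolons would otherwise be interpreted by the shell.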
To me, the exciting part about this is not the ability to convert new Word or Excel files that people send me to OpenOffice XML, but the ability to convert old files. How many old Microsoft Office files do you have access to? What new applications would be possible if you could unlock the information in them by converting those files to a well-documented XML format and then using XML tools to mine that information? Considering that we can do all this with free software that runs on both Windows and Linux, there should be huge new opportunities to explore.
Comment on this Article
- How to convert to .txt?
2007-12-04 06:54:09 kaplun [Reply]
Hello, this is great! OpenOffice is nowadays very good at converting all kinds of Office documents, so it could be worth integrating this into a search engine. I'm trying to do that, but what I'm missing is how to convert a document into a plain text file. That is, what is the name of the converter?
Or any hint on where I should look?
Best regards
- How to convert to .txt?
2007-12-04 10:11:52 Bob DuCharme [Reply]
I just did a Google search on the two formats mentioned in the article (g writer_pdf_Export "MS WinWord 6.0") and found http://svn.rpmforge.net/svn/trunk/tools/unoconv/docs/formats.txt, which shows after the last | on each line the various values you can use, including the one for txt files.
Bob
- An amavisd-new filter to convert attachments to ODF
2007-11-07 13:14:08 rsandu [Reply]
Hello,
This tutorial is VERY useful, INVALUABLE - so useful that I thought to ask here for some further ideas to expand it.
On Linux, the amavisd-new (http://www.ijs.si/software/amavisd/) is a Perl script used as a link between the mailserver itself (say Postfix) and other various tools: antivirus filters, spam filters, etc.
Is it possible to expand this converter and create an amavisd „filter” from it (supposing we have OOo installed on the same Linux box)?
My idea is the following:
1. An e-mail enters the mailserver, having a Microsoft file attached to it (.doc, .xls, etc.)
2. The MIME attachments are passed through amavisd and are analysed, like regular virus checks;
3. When a Microsoft file is found among the attachments, it is detached from the original message;
4. The detached Microsoft file is converted to its OpenDocument counterpart;
5. The converted OpenDocument file is reattached to the original mail body and sent on its way.
Properly implemented, this would be a GREAT tool to filter out Microsoft proprietary attachments without information loss.
Does anyone know if this would be technically feasible? Any ideas for some lines of code?
Regards,
Răzvan
- An amavisd-new filter to convert attachments to ODF
2007-11-07 16:24:05 Bob DuCharme [Reply]
You're probably better off asking on one of the OpenOffice.org mailing lists.
Bob
- An amavisd-new filter to convert attachments to ODF
2007-06-20 21:28:57 Leontius [Reply]
I've created a script which is based on this great idea (and supposedly easier to use): http://leon.info.tm/en/mso2ooo-batch-convert-microsoft-office-documents-openoffice-documents
- Very Helpful
2007-05-05 15:31:09 mannym [Reply]
The Document Convertor plugin only works for MS files not with Lotus files, so this really helps for converting all of our legacy Lotus files. I had to change the script a little bit to work with the 123 files we had, but the macro is basically the same. I put it through a bash script and away it went.
Neskie
- runs in the background in Linux
2007-02-09 08:13:23 akaihola [Reply]
This is a great solution to a common problem!
However, my Ubuntu 6.10 Edgy Linux box always runs the "ooffice -invisible" command in the background. The script for extracting content.xml doesn't work for this reason, since the new OpenDocument file doesn't yet exist right after executing the conversion.
Is there an OpenOffice flag for forcing it not to go into the background?
- runs in the background in Linux
2007-02-09 08:17:33 akaihola [Reply]
And, as usual, the solution was found right after posting the comment. Just don't use the "ooffice" binary but "soffice" instead to launch the conversion.
- Problem with command line conversion
2007-01-24 00:12:23 jestarovic [Reply]
When I try to convert a doc document to pdf with your macro from the command line on Ubuntu (OpenOffice 2.0),
the macro does nothing and I get these errors:
(process:6040): GLib-GObject-CRITICAL **: gtype.c:2240: initialization assertion failed, use IA__g_type_init() prior to this function
(process:6040): Gdk-CRITICAL **: gdk_screen_get_font_options: assertion `GDK_IS_SCREEN (screen)' failed
And if I run it from within the program, I get this error:
Wrong number of parameters!
Can someone help me?
Many Thanks
- Problems with filename
2006-11-07 03:22:42 paai@uvt.nl [Reply]
This macro promises to be the solution to my problems, but it does not work for me. I cut and pasted the macro into the OO environment, and used the command line just as advertised (no wildcards, just a filename), but OO keeps complaining about the unsupported URL. I use Linux, and tried all possible variations of the path.
/usr/lib/ooo-2.0/program/soffice -invisible "macro:///Standard.MyConversions.SaveAsOOO('/home/paai/Breda/breda.doc')"
I really would appreciate suggestions on what I did wrong.
Paai
- Problems with filename
2006-11-07 03:26:19 paai@uvt.nl [Reply]
To answer my own question below: omitting the single quotes around the filename did the trick! Sorry.
Paai
- great
2006-10-24 08:15:10 meatron [Reply]
Just what I was looking for. I have one problem though: how would I batch convert several documents? The "*" wildcard doesn't seem to work. On Linux:
openoffice.org-2.0 -invisible "macro:///Standard.MyConversions.SaveAsOOO($PWD/*.doc)"
*.doc is the problem, specifying the file name converts ok
I have a small script preparing a directory structure and moving the files into the appropriate folders. It would help a lot if the conversion was included. Any help is very appreciated, thanks in advance.
- great
2006-10-24 08:40:40 Bob DuCharme [Reply]
I haven't had a chance to try this on a Linux box. Have you tried a backslash before the asterisk? If that doesn't work, I'd try to find an OOo mailing list with Linux users and ask there.
- great
2006-10-24 09:30:18 meatron [Reply]
Thanks for the quick answer. The backslash doesn't do. I'll post if I find a solution. Once again, really a very useful tool, a month ago I was asking for something like this on OO forums.
Best regards
- How to specify the macro source file without defining it inside OO ?
2006-09-20 10:52:29 pyPeton [Reply]
Hello,
The batch conversion utilities of OO would be very useful to me if I could write the Basic macro in a separate file that I pass as a command line argument. Correct me if I'm wrong, but this method supposes that you have to run the batch on the machine on which you wrote the code ... ?
- Excellent Tutorial
2006-04-21 03:29:26 kyiyer [Reply]
Hi
Started using it in two minutes fast - and learnt how to programme OpenOffice to boot.
Quick question - can PDFs thus created be password protected programmatically?
Thanks a million
Best wishes
Iyer
- My side
2007-06-27 06:01:06 Zolivier [Reply]
Hello,
Xvfb running
Do I have to have also soffice running in the background?
Launching
openoffice -headless -invisible -display :1 "macro:///Tools.Convert.SaveAsPDF(/tmp/200702.doc)"
just returns nothing, no pdf, no errors.
Running openoffice in background display :1 gives the same results.
I don't have openoffice with gui, I then copied the code source, added the headers found in other macros, restart openoffice.
Questions :
Is there a way to debug on this mode?
Is this the right way to implement the macro?
Kind regards,
Olivier
- Excellent Tutorial
2006-04-21 04:34:17 Bob DuCharme [Reply]
I don't know. Try an OpenOffice mailing list.
Bob
- Error for Comma in Filenames When TXT -> ODT
2006-04-14 10:35:30 ParetoJ [Reply]
Hello everyone,
OpenOffice barfs when converting files that have _commas_ in their names from RTF or TXT to ODT under Windows XP. File names with _NO commas_ work excellently. Does anyone know of any solutions for converting files with commas? Here are the errors and the lines where the problems occur.
This line gives an error when a comma is used in the file name when converting from TXT to ODT:
oDoc = StarDesktop.loadComponentFromURL( cURL, "_blank", 0,
The error is:
BASIC runtime error.
An exception occurred
Type: com.sun.star.lang.IllegalArgumentException
Message: URL seems to be an unsupported one..
The problem seems to be in the 'Calls' section bottom right where a new '= ' wants to appear in the new filename.
Author, Filename.txt --> Author, = Filename.txt
The command I was using was:
"C:\Program Files\OpenOffice.org 2.0\program\soffice" -invisible macro:///Standard.MyConversions.SaveAsOOO("C:\test\Author , Filename.txt")
Thanks for your help!
- Reusage of existing Microsoft Macros in Legacy Documents
2006-01-17 10:04:14 SvanteSchubert [Reply]
For the sake of completeness: in case there are legacy documents containing macros that are still needed after a migration, I would like to point out the MigrationTools of OpenOffice.org's sister application StarOffice [1].
Microsoft documents are automatically saved to their OpenDocument counterparts. Macros that previously used MS Office functionality will now depend on a StarOffice library, which wraps the StarOffice API and offers an MS Office function interface (at least a good part of it). Manual adaptation of macros is reduced, but not eliminated, as a macro might still use seldom-used functionality (not yet implemented) or, in case of a migration to Linux, might use platform-dependent (Windows) functionality, which has to be replaced.
In upcoming versions of StarOffice the majority of Microsoft Macros should be runnable after simply loading a document, without the need of a MigrationWizard.
[1] - http://www.sun.com/software/star/staroffice/enterprise_tools.jsp
- Thank you!
2006-01-12 16:11:18 J David Eisenberg [Reply]
This is exactly what I need. I'll be converting to PDF, but having the code and the specifications for how to run it from the command line has saved me a lot of time and trouble.

