<?xml version="1.0" encoding="UTF-8"?>

<office:document office:class="text" office:version="0.9" xmlns:office="http://openoffice.org/2000/office" xmlns:style="http://openoffice.org/2000/style" xmlns:text="http://openoffice.org/2000/text" xmlns:table="http://openoffice.org/2000/table" xmlns:draw="http://openoffice.org/2000/drawing" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="http://openoffice.org/2000/meta" xmlns:number="http://openoffice.org/2000/datastyle" xmlns:svg="http://www.w3.org/2000/svg" xmlns:chart="http://openoffice.org/2000/chart">
 <office:meta>
  <meta:generator>StarOffice/5.2 (Linux)</meta:generator><!--605b(Build:5605)-->
  <dc:title>Adventures with Open Office and XML</dc:title>
  <meta:initial-creator>Matt Sergeant</meta:initial-creator>
  <meta:creation-date>2000-11-13T20:48:48</meta:creation-date>
  <dc:date>2001-01-23T13:31:07</dc:date>
  <meta:keywords>
   <meta:keyword>Open</meta:keyword>
   <meta:keyword>Office</meta:keyword>
   <meta:keyword>Sun</meta:keyword>
   <meta:keyword>XML</meta:keyword>
   <meta:keyword>XPathScript</meta:keyword>
  </meta:keywords>
  <dc:language>en-US</dc:language>
  <meta:editing-cycles>173</meta:editing-cycles>
  <meta:editing-duration>P3DT7H23M34S</meta:editing-duration>
  <meta:user-defined meta:name="Info 0"></meta:user-defined>
  <meta:user-defined meta:name="Info 1"></meta:user-defined>
  <meta:user-defined meta:name="Info 2"></meta:user-defined>
  <meta:user-defined meta:name="Info 3"></meta:user-defined>
 </office:meta>
 <office:styles>
  <style:style style:name="Standard" style:family="paragraph" style:class="text"/>
  <style:style style:name="Text body" style:family="paragraph" style:parent-style-name="Standard" style:class="text">
   <style:properties fo:margin-top="0inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="List" style:family="paragraph" style:parent-style-name="Text body" style:class="list">
   <style:properties fo:font-family="&apos;Times New Roman&apos;" style:font-style-name=""/>
  </style:style>
  <style:style style:name="Caption" style:family="paragraph" style:parent-style-name="Standard" style:class="extra">
   <style:properties fo:font-family="&apos;Times New Roman&apos;" style:font-style-name="" fo:font-size="10pt" fo:font-style="italic" fo:margin-top="0.0835inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="Index" style:family="paragraph" style:parent-style-name="Standard" style:class="index">
   <style:properties fo:font-family="&apos;Times New Roman&apos;" style:font-style-name=""/>
  </style:style>
  <style:style style:name="Heading" style:family="paragraph" style:parent-style-name="Standard" style:next-style-name="Text body" style:class="text">
   <style:properties fo:font-family="Arial" style:font-style-name="" fo:font-size="14pt" fo:margin-top="0.1665inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="Heading 1" style:family="paragraph" style:parent-style-name="Heading" style:next-style-name="Text body" style:class="text">
   <style:properties fo:font-size="18pt" fo:font-weight="bold"/>
  </style:style>
  <style:style style:name="Title" style:family="paragraph" style:parent-style-name="Heading" style:next-style-name="Subtitle" style:class="chapter">
   <style:properties fo:font-size="18pt" fo:font-weight="bold" fo:text-align="center" style:justify-single-word="false"/>
  </style:style>
  <style:style style:name="Subtitle" style:family="paragraph" style:parent-style-name="Heading" style:next-style-name="Text body" style:class="chapter">
   <style:properties fo:font-size="14pt" fo:font-style="italic" fo:text-align="center" style:justify-single-word="false"/>
  </style:style>
  <style:style style:name="List Heading" style:family="paragraph" style:parent-style-name="Standard" style:next-style-name="List Contents" style:class="html">
   <style:properties fo:margin-left="0inch" fo:margin-right="0inch" fo:text-indent="0inch"/>
  </style:style>
  <style:style style:name="List Contents" style:family="paragraph" style:parent-style-name="Standard" style:class="html">
   <style:properties fo:margin-left="0.3937inch" fo:margin-right="0inch" fo:text-indent="0inch"/>
  </style:style>
  <style:style style:name="Heading 2" style:family="paragraph" style:parent-style-name="Heading" style:next-style-name="Text body" style:class="text">
   <style:properties fo:font-size="14pt" fo:font-style="italic" fo:font-weight="bold"/>
  </style:style>
  <style:style style:name="Footnote" style:family="paragraph" style:parent-style-name="Standard" style:class="extra">
   <style:properties fo:font-size="10pt" fo:margin-left="-0.3929inch" fo:margin-right="0inch" fo:text-indent="-0.1965inch"/>
  </style:style>
  <style:style style:name="Source Code" style:family="paragraph" style:parent-style-name="Text body">
   <style:properties fo:color="#800000" fo:font-family="Courier" style:font-style-name="Medium" style:font-family-generic="modern" style:font-pitch="variable" fo:font-size="11pt" fo:font-weight="normal" fo:margin-left="0.2402inch" fo:margin-right="0inch" fo:text-indent="0inch"/>
  </style:style>
  <style:style style:name="Text body indent" style:family="paragraph" style:parent-style-name="Text body" style:class="text">
   <style:properties fo:margin-left="0.1965inch" fo:margin-right="0inch" fo:text-indent="0inch"/>
  </style:style>
  <style:style style:name="Heading 3" style:family="paragraph" style:parent-style-name="Heading" style:next-style-name="Text body" style:class="text">
   <style:properties fo:font-size="14pt" fo:font-weight="bold"/>
  </style:style>
  <style:style style:name="Hanging indent" style:family="paragraph" style:parent-style-name="Text body" style:class="text">
   <style:properties fo:margin-left="0inch" fo:margin-right="0inch" fo:text-indent="0inch"/>
  </style:style>
  <style:style style:name="Footnote Symbol" style:family="text"/>
  <style:style style:name="Endnote Symbol" style:family="text"/>
  <style:style style:name="Bullet Symbols" style:family="text">
   <style:properties fo:font-family="starbats" style:font-style-name="" fo:font-size="9pt"/>
  </style:style>
  <style:style style:name="Numbering Symbols" style:family="text"/>
  <style:style style:name="Footnote anchor" style:family="text">
   <style:properties style:text-position="sub 100%"/>
  </style:style>
  <text:outline-style>
   <text:outline-level-style text:level="1" style:num-format=""/>
   <text:outline-level-style text:level="2" style:num-format=""/>
   <text:outline-level-style text:level="3" style:num-format=""/>
   <text:outline-level-style text:level="4" style:num-format=""/>
   <text:outline-level-style text:level="5" style:num-format=""/>
   <text:outline-level-style text:level="6" style:num-format=""/>
   <text:outline-level-style text:level="7" style:num-format=""/>
   <text:outline-level-style text:level="8" style:num-format=""/>
   <text:outline-level-style text:level="9" style:num-format=""/>
   <text:outline-level-style text:level="10" style:num-format=""/>
  </text:outline-style>
  <text:footnotes-configuration text:num-prefix="" text:num-suffix="" text:citation-style-name="Footnote Symbol" text:default-style-name="" text:page-master-name="Footnote" style:num-format="1" text:offset="0" text:footnotes-position="page" text:start-numbering-at="document">
   <text:quo-vadis></text:quo-vadis>
   <text:ergo-sum></text:ergo-sum>
  </text:footnotes-configuration>
  <text:endnotes-configuration text:num-prefix="" text:num-suffix="" text:citation-style-name="Endnote Symbol" text:default-style-name="" text:page-master-name="Endnote" style:num-format="i" text:offset="0"/>
 </office:styles>
 <office:automatic-styles>
  <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Heading 1">
   <style:properties fo:font-family="Arial" style:font-style-name="" fo:font-size="18pt" fo:font-weight="bold" fo:margin-top="0.1665inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P2" style:family="paragraph" style:parent-style-name="Subtitle">
   <style:properties fo:font-family="Arial" style:font-style-name="" fo:font-size="14pt" fo:font-style="italic" fo:text-align="center" style:justify-single-word="false" fo:margin-top="0.1665inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P3" style:family="paragraph" style:parent-style-name="Text body">
   <style:properties fo:margin-top="0inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P4" style:family="paragraph" style:parent-style-name="Heading 2">
   <style:properties fo:font-family="Arial" style:font-style-name="" fo:font-size="14pt" fo:font-style="italic" fo:font-weight="bold" fo:margin-top="0.1665inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P5" style:family="paragraph" style:parent-style-name="Text body" style:list-style-name="L1">
   <style:properties fo:margin-top="0inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P6" style:family="paragraph" style:parent-style-name="Source Code">
   <style:properties fo:color="#800000" fo:font-family="Courier" style:font-style-name="Medium" style:font-family-generic="modern" style:font-pitch="variable" fo:font-size="11pt" fo:font-weight="normal" fo:margin-left="0.2402inch" fo:margin-right="0inch" fo:text-indent="0inch" fo:margin-top="0inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P7" style:family="paragraph" style:parent-style-name="Text body" style:list-style-name="L2">
   <style:properties fo:margin-top="0inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P8" style:family="paragraph" style:parent-style-name="Text body indent">
   <style:properties fo:margin-left="0.1965inch" fo:margin-right="0inch" fo:text-indent="0inch" fo:margin-top="0inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P9" style:family="paragraph" style:parent-style-name="Heading 3">
   <style:properties fo:font-family="Arial" style:font-style-name="" fo:font-size="14pt" fo:font-weight="bold" fo:margin-top="0.1665inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="P10" style:family="paragraph" style:parent-style-name="Text body" style:list-style-name="L3">
   <style:properties fo:margin-top="0inch" fo:margin-bottom="0.0835inch"/>
  </style:style>
  <style:style style:name="T1" style:family="text">
   <style:properties fo:font-weight="bold"/>
  </style:style>
  <style:style style:name="T2" style:family="text">
   <style:properties fo:font-weight="normal"/>
  </style:style>
  <style:style style:name="T3" style:family="text">
   <style:properties fo:font-style="italic" fo:font-weight="normal"/>
  </style:style>
  <style:style style:name="T4" style:family="text">
   <style:properties fo:font-style="normal" fo:font-weight="normal"/>
  </style:style>
  <style:style style:name="T5" style:family="text">
   <style:properties fo:font-family="Helmet" style:font-style-name="" style:font-family-generic="modern" style:font-pitch="variable"/>
  </style:style>
  <style:style style:name="T6" style:family="text">
   <style:properties fo:color="#800000" fo:font-family="Courier" style:font-style-name="" style:font-family-generic="modern" style:font-pitch="variable"/>
  </style:style>
  <style:style style:name="T7" style:family="text">
   <style:properties fo:font-family="Courier" style:font-style-name="" style:font-family-generic="modern" style:font-pitch="variable"/>
  </style:style>
  <style:style style:name="T8" style:family="text">
   <style:properties fo:font-style="italic"/>
  </style:style>
  <style:style style:name="T9" style:family="text">
   <style:properties fo:font-family="Times" style:font-style-name="" style:font-family-generic="roman" style:font-pitch="variable"/>
  </style:style>
  <text:list-style style:name="L1">
   <text:list-level-style-number text:level="1" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="2" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="0.1972inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="3" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="0.3937inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="4" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="0.5909inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="5" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="0.7874inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="6" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="0.9846inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="7" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="1.1815inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="8" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="1.3787inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="9" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="1.5752inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
   <text:list-level-style-number text:level="10" text:style-name="Numbering Symbols" style:num-suffix="." style:num-format="1">
    <style:properties text:space-before="1.7724inch" text:min-label-width="0.1965inch"/>
   </text:list-level-style-number>
  </text:list-style>
  <text:list-style style:name="L2">
   <text:list-level-style-bullet text:level="1" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="2" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.1972inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="3" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.3937inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="4" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.5909inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="5" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.7874inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="6" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.9846inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="7" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.1815inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="8" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.3787inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="9" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.5752inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="10" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.7724inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
  </text:list-style>
  <text:list-style style:name="L3">
   <text:list-level-style-bullet text:level="1" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="2" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.1972inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="3" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.3937inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="4" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.5909inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="5" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.7874inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="6" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="0.9846inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="7" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.1815inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="8" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.3787inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="9" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.5752inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
   <text:list-level-style-bullet text:level="10" text:style-name="Bullet Symbols" style:num-suffix="." text:bullet-char="">
    <style:properties text:space-before="1.7724inch" text:min-label-width="0.1965inch" fo:font-family="starbats" style:font-charset="x-symbol"/>
   </text:list-level-style-bullet>
  </text:list-style>
 </office:automatic-styles>
 <office:master-styles/>
 <office:body>
  <text:sequence-decls>
   <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
   <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
   <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
   <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
  </text:sequence-decls>
  <text:h text:style-name="P1" text:level="1">Adventures with Open Office and XML</text:h>
  <text:p text:style-name="P2">Matt Sergeant</text:p>
  <text:p text:style-name="P3">At the Open Source conference in Monterey last year, Sun announced plans to open source the current StarOffice code, renamed <text:span text:style-name="T1">OpenOffice</text:span><text:span text:style-name="T2">. In October they followed through, releasing both the source code and binaries (then at OpenOffice build 605). One of the features added since StarOffice 5.2 is the ability to </text:span><text:span text:style-name="T3">Save As</text:span><text:span text:style-name="T4"> XML.</text:span></text:p>
  <text:p text:style-name="P3">Saving as XML makes OpenOffice truly open. Beyond the source code being open, XML<text:span text:style-name="T5">’</text:span>s self-documenting nature lets us explore the document format without having to dive into C++. More significantly, XML lets us use simple, freely available tools to manipulate the documents themselves.</text:p>
  <text:p text:style-name="P3">In this article we will examine the structure of the format. We will not go into great detail, as Sun have already done so in a lengthy (roughly 400-page) specification. Instead we will focus on using the XML to generate something of potential interest to web developers and content editors.</text:p>
  <text:p text:style-name="P3">As a side note, OpenOffice is not yet ready to be most people's everyday word processor. Many components (such as printing and spell checking!) were removed in the migration to open source because they did not belong to Sun. I expect the open source community will add them back over time (and when Sun releases StarOffice 6 I expect it will again include the proprietary spell checker and print engine). OpenOffice is also relatively unstable at the moment: while working on this article I experienced several crashes, and at times my work simply could not be loaded. Thanks to Daniel Vogelheim of Sun for helping me out when that happened!</text:p>
  <text:h text:style-name="P1" text:level="1">XML Requirements</text:h>
  <text:p text:style-name="P3">Sun had a specific set of requirements when designing the XML format for OpenOffice. Having migrated the project to open source, Sun have done the right thing and opened their development process too, so all of their decisions are open to public discussion on the OpenOffice mailing lists. The short list of requirements is on the OpenOffice XML web site at http://xml.openoffice.org, and we repeat the core requirements here:</text:p>
  <text:h text:style-name="P4" text:level="2">Core Requirements (absolutely necessary):</text:h>
  <text:ordered-list text:style-name="L1">
   <text:list-item>
    <text:p text:style-name="P5">The file format must be capable of being used as an office program's native file format. The format must be &quot;non-lossy&quot; and must support (at least) the full capability of a StarOffice/OpenOffice document. The format is likely to be used for document interchange but that alone is not enough.</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P5">Structured content should make use of XML's structuring capabilities and be represented in terms of XML elements and attributes.</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P5">The file format must be fully documented and have no &quot;secret&quot; features.</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P5">OpenOffice must be the reference implementation for this file format.</text:p>
   </text:list-item>
  </text:ordered-list>
  <text:p text:style-name="P3">I have skipped the list of &quot;Core Goals&quot; here, but you can find them on the above web site.</text:p>
  <text:p text:style-name="P3">Sun plans for XML to become the default save format. That is not yet the case in OpenOffice build 605 (build 613 is available but has many known problems): to export to XML I have to choose &quot;Save As&quot;. Once OpenOffice is &quot;released&quot;, expect XML to be the default.</text:p>
  <text:h text:style-name="P4" text:level="2">Packaging</text:h>
  <text:p text:style-name="P3">OpenOffice documents can be compound; that is, they can contain multiple documents of different formats. Sun have examined the different ways of packaging compound documents using XML and picked ZIP files as the format. This choice surprised me at first, as I had always thought the standard way of storing binary data in XML was to base64-encode it. However, the decision is explained in detail at http://xml.openoffice.org/package.html, which discusses ZIP's indexing ability and the importance of being able to load and save parts of a document on demand; the page is worth reading to understand the choice. Ultimately it means that an OpenOffice file will be a ZIP file containing at least one XML file, along with other files relevant to the document (such as images, and possibly other OpenOffice files).</text:p>
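  <text:p text:style-name="P3">To get a feel for what working with such a package might involve, here is a minimal Perl sketch. It assumes the CPAN module Archive::Zip is installed. Nothing above fixes the names of the files inside a package, so the hypothetical read_package subroutine simply lists every member and returns the first one whose name ends in .xml. That selection rule is our own assumption, not part of the OpenOffice format.</text:p>
  <text:p text:style-name="P6">use strict;<text:line-break/>use warnings;<text:line-break/>use Archive::Zip qw(:ERROR_CODES);<text:line-break/><text:line-break/># Hypothetical helper: open a packaged document, map each member name<text:line-break/># to its uncompressed size, and return the text of the first .xml member.<text:line-break/>sub read_package {<text:line-break/><text:s text:c="2"/>my ($file) = @_;<text:line-break/><text:s text:c="2"/>my $zip = Archive::Zip-&gt;new;<text:line-break/><text:s text:c="2"/>die &quot;$file is not a ZIP archive\n&quot; unless $zip-&gt;read($file) == AZ_OK;<text:line-break/><text:s text:c="2"/>my %sizes = map { ($_-&gt;fileName() =&gt; $_-&gt;uncompressedSize()) } $zip-&gt;members;<text:line-break/><text:s text:c="2"/>my ($xml_name) = grep { /\.xml$/ } sort keys %sizes;<text:line-break/><text:s text:c="2"/>my $xml = defined $xml_name ? scalar $zip-&gt;contents($xml_name) : undef;<text:line-break/><text:s text:c="2"/>return (\%sizes, $xml_name, $xml);<text:line-break/>}<text:line-break/><text:line-break/># List the members of each package named on the command line.<text:line-break/>for my $file (@ARGV) {<text:line-break/><text:s text:c="2"/>my ($sizes, $xml_name) = eval { read_package($file) };<text:line-break/><text:s text:c="2"/>unless ($sizes) { warn $@; next }<text:line-break/><text:s text:c="2"/>print &quot;$file (XML part: &quot;, $xml_name // 'none', &quot;)\n&quot;;<text:line-break/><text:s text:c="2"/>print &quot;\t$_: $sizes-&gt;{$_} bytes\n&quot; for sort keys %$sizes;<text:line-break/>}</text:p>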
  <text:h text:style-name="P1" text:level="1">The format itself</text:h>
  <text:p text:style-name="P3">OK, so let's get down to the &quot;nitty-gritty&quot; of the XML format itself. The PDF document detailing the format is at http://xml.openoffice.org/xml_specification_draft.pdf; it is a large document, so we hope to distill some of its information here.</text:p>
  <text:h text:style-name="P4" text:level="2">Document Root</text:h>
  <text:p text:style-name="P6">&lt;office:document&gt;</text:p>
  <text:p text:style-name="P3">The document root element is <text:span text:style-name="T6">&lt;office:document&gt;</text:span>. (We leave out namespace URIs in this article; see the PDF specification above, or the files themselves, for the URIs. I will note, however, that OpenOffice appears to use namespaces cleanly and to its advantage, unlike some other office suites I could mention.)</text:p>
  <text:p text:style-name="P3">According to the specification, this is a generic document root shared by all OpenOffice documents: a spreadsheet and a word processing file have the same root element, which allows us to do some generic processing.</text:p>
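  <text:p text:style-name="P3">As a small illustration of that kind of generic processing, the following Perl sketch counts files by document type. It makes three assumptions. First, it uses the CPAN module XML::XPath. Second, it assumes the type is recorded in the office:class attribute of the root, as it is in the file this article was written in, where the value is text. Third, it uses the office namespace URI declared in this build 605 file. doc_class is a hypothetical helper of our own.</text:p>
  <text:p text:style-name="P6">use strict;<text:line-break/>use warnings;<text:line-break/>use XML::XPath;<text:line-break/><text:line-break/># Hypothetical helper: return the office:class attribute of the generic<text:line-break/># &lt;office:document&gt; root (&quot;text&quot; for this article).<text:line-break/>sub doc_class {<text:line-break/><text:s text:c="2"/>my ($xp) = @_;<text:line-break/><text:s text:c="2"/>$xp-&gt;set_namespace(office =&gt; 'http://openoffice.org/2000/office');<text:line-break/><text:s text:c="2"/>return $xp-&gt;findvalue('/office:document/@office:class') . '';<text:line-break/>}<text:line-break/><text:line-break/># Count the files named on the command line by class; files that fail<text:line-break/># to parse or lack the attribute are counted as &quot;unknown&quot;.<text:line-break/>my %count;<text:line-break/>for my $file (@ARGV) {<text:line-break/><text:s text:c="2"/>my $class = eval { doc_class(XML::XPath-&gt;new(filename =&gt; $file)) };<text:line-break/><text:s text:c="2"/>my $key = (defined $class &amp;&amp; length $class) ? $class : 'unknown';<text:line-break/><text:s text:c="2"/>$count{$key}++;<text:line-break/>}<text:line-break/>print &quot;$_: $count{$_}\n&quot; for sort keys %count;</text:p>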
  <text:h text:style-name="P4" text:level="2">Metadata</text:h>
  <text:p text:style-name="P6">&lt;office:meta&gt;<text:line-break/> <text:s/>&lt;dc:title&gt;Adventures with Open Office and XML&lt;/dc:title&gt;</text:p>
  <text:p text:style-name="P3">Document metadata is perhaps one of the more interesting features of the OpenOffice XML format. It is enclosed in the <text:span text:style-name="T6">&lt;office:meta&gt;</text:span> element, the first child of the document root. Sun have chosen Dublin Core for the majority of their metadata elements. Where Dublin Core had no suitable element, they have created elements in their own &quot;meta&quot; namespace. These include:</text:p>
  <text:unordered-list text:style-name="L2">
   <text:list-item>
    <text:p text:style-name="P7">generator - the application that created this file</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P7">initial-creator - the original author of the file (dc:creator is used for the person who last edited the file)</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P7">creation-date - the date this file was first created (dc:date is used for the date of the last edit)</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P7">keywords - the document's keywords, which you can edit in the document properties dialog</text:p>
   </text:list-item>
  </text:unordered-list>
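  <text:p text:style-name="P3">Reading these elements takes only a few lines of XPath. The sketch below prints the metadata fields listed above for each file named on the command line. It assumes the CPAN module XML::XPath and the namespace URIs declared at the top of this build 605 file. The read_meta subroutine and its hash keys are our own naming.</text:p>
  <text:p text:style-name="P6">use strict;<text:line-break/>use warnings;<text:line-break/>use XML::XPath;<text:line-break/><text:line-break/># Namespace URIs copied from the root element of this build 605 file;<text:line-break/># other builds may declare different ones.<text:line-break/>my %NS = (<text:line-break/><text:s text:c="2"/>office =&gt; 'http://openoffice.org/2000/office',<text:line-break/><text:s text:c="2"/>meta =&gt; 'http://openoffice.org/2000/meta',<text:line-break/><text:s text:c="2"/>dc =&gt; 'http://purl.org/dc/elements/1.1/',<text:line-break/>);<text:line-break/><text:line-break/># Hypothetical helper: collect the main metadata fields into a hash.<text:line-break/>sub read_meta {<text:line-break/><text:s text:c="2"/>my ($xp) = @_;<text:line-break/><text:s text:c="2"/>$xp-&gt;set_namespace($_, $NS{$_}) for keys %NS;<text:line-break/><text:s text:c="2"/>my $m = '/office:document/office:meta';<text:line-break/><text:s text:c="2"/>return (<text:line-break/><text:s text:c="4"/>generator =&gt; $xp-&gt;findvalue(&quot;$m/meta:generator&quot;) . '',<text:line-break/><text:s text:c="4"/>initial_creator =&gt; $xp-&gt;findvalue(&quot;$m/meta:initial-creator&quot;) . '',<text:line-break/><text:s text:c="4"/>creator =&gt; $xp-&gt;findvalue(&quot;$m/dc:creator&quot;) . '',<text:line-break/><text:s text:c="4"/>creation_date =&gt; $xp-&gt;findvalue(&quot;$m/meta:creation-date&quot;) . '',<text:line-break/><text:s text:c="4"/>date =&gt; $xp-&gt;findvalue(&quot;$m/dc:date&quot;) . '',<text:line-break/><text:s text:c="4"/>keywords =&gt; [ map { $_-&gt;string_value } $xp-&gt;findnodes(&quot;$m/meta:keywords/meta:keyword&quot;) ],<text:line-break/><text:s text:c="2"/>);<text:line-break/>}<text:line-break/><text:line-break/># Print every field for each file named on the command line.<text:line-break/>for my $file (@ARGV) {<text:line-break/><text:s text:c="2"/>my %meta = read_meta(XML::XPath-&gt;new(filename =&gt; $file));<text:line-break/><text:s text:c="2"/>print &quot;$file\n&quot;;<text:line-break/><text:s text:c="2"/>for my $key (sort keys %meta) {<text:line-break/><text:s text:c="4"/>my $value = ref $meta{$key} ? join(', ', @{ $meta{$key} }) : $meta{$key};<text:line-break/><text:s text:c="4"/>print &quot;\t$key: $value\n&quot;;<text:line-break/><text:s text:c="2"/>}<text:line-break/>}</text:p>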
  <text:p text:style-name="P3">A simple use for this is a directory listing that shows each file's author. Here is a short Perl script that does just that:</text:p>
  <text:p text:style-name="P6">use strict;<text:line-break/>use warnings;<text:line-break/>use XML::XPath;<text:line-break/><text:line-break/># Print the dc:creator of each file named on the command line.<text:line-break/>while (my $file = shift @ARGV) {<text:line-break/><text:s text:c="2"/>next unless -f $file;<text:line-break/><text:s text:c="2"/>eval {<text:line-break/><text:s text:c="4"/>my $xp = XML::XPath-&gt;new(filename =&gt; $file);<text:line-break/><text:s text:c="4"/>print $file, &quot;: &quot;, $xp-&gt;findvalue(&quot;//dc:creator&quot;), &quot;\n&quot;;<text:line-break/><text:s text:c="2"/>};<text:line-break/>}</text:p>
  <text:p text:style-name="P3">If we name this script <text:span text:style-name="T6">dcdir</text:span>, the results on a directory full of OpenOffice XML files might be:</text:p>
  <text:p text:style-name="P6">$ dcdir *.sxw<text:line-break/>test.so.xml.sxw: Matt Sergeant</text:p>
  <text:p text:style-name="P3">This will work regardless of the type of OpenOffice file we are examining. With a little more work we can ensure that each file really is an OpenOffice XML file. At the moment the eval block silently skips any file that fails to parse, and any other well-formed XML file is processed as if it were an OpenOffice document. See Kip Hampton<text:span text:style-name="T5">’</text:span>s regular column here for more details on using XML::XPath.</text:p>
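  <text:p text:style-name="P3">A sketch of that extra work follows, again using XML::XPath and the namespace URIs from this build 605 file. The hypothetical oo_author subroutine reports an author only when the file parses and its root element is office:document. As our own choice, it falls back to meta:initial-creator when dc:creator is missing. Run as dcdir *.sxw, it prints one line per file and marks files it cannot use rather than skipping them silently.</text:p>
  <text:p text:style-name="P6">use strict;<text:line-break/>use warnings;<text:line-break/>use XML::XPath;<text:line-break/><text:line-break/># Hypothetical helper: return the author of an OpenOffice XML file, or<text:line-break/># undef if the file does not parse or its root is not office:document.<text:line-break/>sub oo_author {<text:line-break/><text:s text:c="2"/>my ($file) = @_;<text:line-break/><text:s text:c="2"/>return undef unless -f $file;<text:line-break/><text:s text:c="2"/>return eval {<text:line-break/><text:s text:c="4"/>my $xp = XML::XPath-&gt;new(filename =&gt; $file);<text:line-break/><text:s text:c="4"/>$xp-&gt;set_namespace(office =&gt; 'http://openoffice.org/2000/office');<text:line-break/><text:s text:c="4"/>$xp-&gt;set_namespace(meta =&gt; 'http://openoffice.org/2000/meta');<text:line-break/><text:s text:c="4"/>$xp-&gt;set_namespace(dc =&gt; 'http://purl.org/dc/elements/1.1/');<text:line-break/><text:s text:c="4"/># The file is parsed on the first query; a non-XML file dies here.<text:line-break/><text:s text:c="4"/>my @root = $xp-&gt;findnodes('/office:document');<text:line-break/><text:s text:c="4"/>die &quot;not an OpenOffice document\n&quot; unless @root;<text:line-break/><text:s text:c="4"/># Prefer dc:creator (last editor), else meta:initial-creator.<text:line-break/><text:s text:c="4"/>my $who = $xp-&gt;findvalue('/office:document/office:meta/dc:creator') . '';<text:line-break/><text:s text:c="4"/>$who = $xp-&gt;findvalue('/office:document/office:meta/meta:initial-creator') . ''<text:line-break/><text:s text:c="6"/>if $who eq '';<text:line-break/><text:s text:c="4"/>$who;<text:line-break/><text:s text:c="2"/>};<text:line-break/>}<text:line-break/><text:line-break/>for my $file (@ARGV) {<text:line-break/><text:s text:c="2"/>my $author = oo_author($file);<text:line-break/><text:s text:c="2"/>print &quot;$file: &quot;, (defined $author ? $author : '(not an OpenOffice XML file)'), &quot;\n&quot;;<text:line-break/>}</text:p>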
  <text:h text:style-name="P4" text:level="2">Styles</text:h>
  <text:p text:style-name="P6">&lt;office:styles&gt;<text:line-break/> <text:s/>&lt;style:style style:name=&quot;Source Code&quot;<text:line-break/> <text:s text:c="5"/>style:family=&quot;paragraph&quot;<text:line-break/> <text:s text:c="5"/>style:parent-style-name=&quot;Text body&quot;&gt;<text:line-break/> <text:s text:c="3"/>&lt;style:properties fo:font-family=&quot;Courier&quot;<text:line-break/> <text:s text:c="7"/>fo:margin-left=&quot;0.25inch&quot;<text:line-break/> <text:s text:c="7"/>fo:font-size=&quot;11pt&quot;/&gt;</text:p>
  <text:p text:style-name="P3">OpenOffice formats text using styles, so modifying a style changes the appearance of every part of the document that uses it. The XML format saves these styles along with the document, enclosed within the <text:span text:style-name="T7">&lt;office:styles&gt;</text:span> element.</text:p>
  <text:p text:style-name="P3">Each style, marked up with the <text:span text:style-name="T6">&lt;style:style&gt;</text:span> element, defines the following in attributes: a <text:span text:style-name="T8">style name</text:span>; a <text:span text:style-name="T8">style family</text:span>, for example a paragraph style or an inline text style, roughly equivalent to <text:span text:style-name="T6">&lt;div&gt;</text:span> or <text:span text:style-name="T6">&lt;span&gt;</text:span> in HTML; a <text:span text:style-name="T8">parent style</text:span>, because styles inherit their parent style<text:span text:style-name="T5">’</text:span>s attributes; and a <text:span text:style-name="T8">class</text:span>, which the OpenOffice style dialog box uses to categorize styles.</text:p>
  <text:p text:style-name="P3">Within the style element itself are style properties. These are stored in the attributes of the empty <text:span text:style-name="T6">&lt;style:properties&gt;</text:span> element. As mentioned above, a style inherits its properties from its ancestor styles, and only the properties it modifies are stored (which saves space). The second interesting re-use of public XML schemas occurs here. Style properties are defined with XSL FO attributes (Formatting Objects; see last week<text:span text:style-name="T5">’</text:span>s XML.com for an article on XSL FO [EDD: Link]). In theory, this means we should be able to turn an OpenOffice document into an XSL FO document with some processing. But why would we want to do this when we can (or should ultimately be able to) print directly from OpenOffice? Well, I work in content management and application serving (see my previous article on AxKit), and some of my clients would like to be able to use an ordinary word processor to create content. By doing some pre-processing and then passing the output to FOP or another XSL FO processor, we can generate PDF files automatically from content saved into the web hierarchy. (This functionality is not available yet, but please contact me at AxKit.com if this sort of thing interests you!)</text:p>
  <text:p text:style-name="P3">It is again worth noting that where XSL FO has no attribute equivalent to a feature of the internal OpenOffice implementation, Sun have defined their own attributes in one of the OpenOffice namespaces (for example, style:justify-single-word in the next example).</text:p>
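  <text:p text:style-name="P3">Because only modified properties are stored, working out the full formatting of a style means walking up its chain of style:parent-style-name attributes and letting nearer styles override their ancestors. The following Python sketch (using ElementTree; effective_properties is a hypothetical name) shows that resolution. It assumes style names are unique within the file. It also ignores style:family, which a fuller version would have to take into account.</text:p>
  <text:p text:style-name="P6">import xml.etree.ElementTree as ET<text:line-break/><text:line-break/>STYLE = '{http://openoffice.org/2000/style}'<text:line-break/><text:line-break/>def effective_properties(root, name):<text:line-break/><text:s text:c="4"/># Hypothetical helper. Index every style:style (common and automatic)<text:line-break/><text:s text:c="4"/># by style:name; assumes names are unique and ignores style:family.<text:line-break/><text:s text:c="4"/>styles = {s.get(STYLE + 'name'): s for s in root.iter(STYLE + 'style')}<text:line-break/><text:s text:c="4"/># Collect the style and its ancestors, guarding against a parent cycle.<text:line-break/><text:s text:c="4"/>chain, seen = [], set()<text:line-break/><text:s text:c="4"/>while name in styles and name not in seen:<text:line-break/><text:s text:c="8"/>seen.add(name)<text:line-break/><text:s text:c="8"/>chain.append(styles[name])<text:line-break/><text:s text:c="8"/>name = styles[name].get(STYLE + 'parent-style-name')<text:line-break/><text:s text:c="4"/># Apply the most distant ancestor first so that nearer styles override it.<text:line-break/><text:s text:c="4"/>props = {}<text:line-break/><text:s text:c="4"/>for s in reversed(chain):<text:line-break/><text:s text:c="8"/>p = s.find(STYLE + 'properties')<text:line-break/><text:s text:c="8"/>if p is not None:<text:line-break/><text:s text:c="12"/>props.update(p.attrib)<text:line-break/><text:s text:c="4"/># Keys are attribute names in {namespace}local form (fo:..., style:...).<text:line-break/><text:s text:c="4"/>return props</text:p>
  <text:p text:style-name="P3">For the Source Code style shown above, the result would combine its own fo:font-family, fo:margin-left and fo:font-size values with whatever it inherits from Text body and, in turn, from Standard.</text:p>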
  <text:h text:style-name="P4" text:level="2">Automatic Styles</text:h>
  <text:p text:style-name="P6">&lt;office:automatic-styles&gt;<text:line-break/> &lt;style:style style:name=&quot;P1&quot; style:family=&quot;paragraph&quot; <text:line-break/> <text:s text:c="4"/>style:parent-style-name=&quot;Title&quot;&gt;<text:line-break/> <text:s/>&lt;style:properties fo:font-family=&quot;Arial&quot; <text:line-break/> <text:s text:c="5"/>fo:font-style-name=&quot;&quot; fo:font-size=&quot;18pt&quot; <text:line-break/> <text:s text:c="5"/>fo:font-weight=&quot;bold&quot; fo:text-align=&quot;end&quot; <text:line-break/> <text:s text:c="5"/>style:justify-single-word=&quot;false&quot; <text:line-break/> <text:s text:c="5"/>fo:margin-top=&quot;0.16inch&quot; <text:line-break/> <text:s text:c="5"/>fo:margin-bottom=&quot;0.0835inch&quot;/&gt;</text:p>
  <text:p text:style-name="P3">How do you allow people to use styles, yet allow local modifications to the fonts, weight, and so on? OpenOffice does this by injecting an &quot;Automatic style&quot; between the text and the real style, so that rather than:</text:p>
  <text:p text:style-name="P8">&quot;Some text&quot; -&gt; is of style -&gt; &quot;Title&quot;</text:p>
  <text:p text:style-name="P3">We have:</text:p>
  <text:p text:style-name="P8">&quot;Some text&quot; -&gt; is of auto style -&gt; &quot;P1&quot; -&gt; parent style -&gt; &quot;Title&quot;</text:p>
  <text:p text:style-name="P3">So OpenOffice has an <text:span text:style-name="T6">&lt;office:automatic-styles&gt;</text:span> section following the main style definitions. Within the main body of the document, only automatic styles are used.</text:p>
  <text:p text:style-name="P3">There are some changes going on in this area at the moment. While the above is true of OpenOffice build 605, the current CVS builds at Sun only use automatic styles when local modifications have been made to the formatting. For example, if we make some text within a paragraph bold, OpenOffice uses an automatic style to define that hard formatting and applies it with a span (see below for information about spans). The result might look something like this:</text:p>
  <text:p text:style-name="P6">&lt;text:p text:style-name=&quot;Text body&quot;&gt;Some text with <text:line-break/>a &lt;text:span text:style-name=&quot;T1&quot;&gt;bold&lt;/text:span&gt; word <text:line-break/>in it.&lt;/text:p&gt;</text:p>
  <text:h text:style-name="P4" text:level="2">Main Body Text</text:h>
  <text:p text:style-name="P6">&lt;office:body&gt;</text:p>
  <text:p text:style-name="P3">Finally, we get to the main body of the document! Of all the sections, this is probably the simplest to follow. We will address each of the major tags in turn. Unlike other sections of an OpenOffice XML file, the body text is free-form, so the following tags can appear anywhere within the &lt;office:body&gt; section.</text:p>
  <text:h text:style-name="P9" text:level="3">Headings</text:h>
  <text:p text:style-name="P6">&lt;text:h text:style-name=&quot;P4&quot; text:level=&quot;1&quot;&gt;The format itself&lt;/text:h&gt;</text:p>
  <text:p text:style-name="P3">Headings are defined using the &lt;text:h&gt; tag. Here the heading uses an automatic style (as exported by build 605); with later builds it will likely be:</text:p>
  <text:p text:style-name="P6">&lt;text:h text:style-name=&quot;Heading 1&quot;&gt;The format itself&lt;/text:h&gt;</text:p>
  <text:h text:style-name="P9" text:level="3">Paragraphs</text:h>
  <text:p text:style-name="P6">&lt;text:p text:style-name=&quot;P3&quot;&gt;Some text&lt;/text:p&gt;</text:p>
  <text:p text:style-name="P3">Paragraphs of text are defined using the &lt;text:p&gt; tag. We can now start to see how some of the tags in the body are similar to HTML, albeit in a different namespace.</text:p>
  <text:h text:style-name="P9" text:level="3">Spans</text:h>
  <text:p text:style-name="P6">&lt;text:span text:style-name=&quot;T6&quot;&gt;spanned text&lt;/text:span&gt;</text:p>
  <text:p text:style-name="P3">Spans are exactly the same as spans in HTML. They delimit an inline section of a paragraph, applying alternate styling to that small section of text.</text:p>
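  <text:p text:style-name="P3">Because a span sits in the middle of the text of a paragraph, code that processes paragraphs has to cope with mixed content. That means the text before the first child element, the text inside each span, and the text that follows each child. The Python sketch below (runs is a hypothetical name) splits a text:p into (style, text) pairs, giving text outside any span the style of the paragraph. Other inline elements such as text:s and text:line-break are not expanded; only the text after them is kept.</text:p>
  <text:p text:style-name="P6">import xml.etree.ElementTree as ET<text:line-break/><text:line-break/>TEXT = '{http://openoffice.org/2000/text}'<text:line-break/><text:line-break/>def runs(p):<text:line-break/><text:s text:c="4"/># Hypothetical helper: split a text:p into (style-name, text) pairs in order.<text:line-break/><text:s text:c="4"/>pstyle = p.get(TEXT + 'style-name')<text:line-break/><text:s text:c="4"/>out = [(pstyle, p.text)] if p.text else []<text:line-break/><text:s text:c="4"/>for child in p:<text:line-break/><text:s text:c="8"/>if child.tag == TEXT + 'span':<text:line-break/><text:s text:c="12"/>out.append((child.get(TEXT + 'style-name'), ''.join(child.itertext())))<text:line-break/><text:s text:c="8"/># Text after any child element (its tail) belongs to the paragraph again;<text:line-break/><text:s text:c="8"/># other inline elements (text:s, text:line-break) are otherwise skipped.<text:line-break/><text:s text:c="8"/>if child.tail:<text:line-break/><text:s text:c="12"/>out.append((pstyle, child.tail))<text:line-break/><text:s text:c="4"/>return out</text:p>
  <text:p text:style-name="P3">Applied to the bold-word example from the automatic styles section, it would return three pairs. These are the text before the span with style Text body, the word bold with style T1, and the rest of the sentence with style Text body again.</text:p>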
  <text:h text:style-name="P9" text:level="3">Lists</text:h>
  <text:p text:style-name="P3">Lists are defined using tags similar in name to those DocBook uses for lists. Specifically, these are &lt;text:ordered-list&gt;, &lt;text:unordered-list&gt; and &lt;text:list-item&gt;.</text:p>
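  <text:p text:style-name="P3">These elements map closely onto the ol, ul and li elements of HTML. A hedged Python sketch of that mapping follows. list_to_html is a hypothetical name, and the sketch assumes that a list item contains paragraphs and possibly nested lists; anything else is reduced to its escaped text.</text:p>
  <text:p text:style-name="P6">import html<text:line-break/>import xml.etree.ElementTree as ET<text:line-break/><text:line-break/>TEXT = '{http://openoffice.org/2000/text}'<text:line-break/>LISTS = {TEXT + 'ordered-list': 'ol', TEXT + 'unordered-list': 'ul'}<text:line-break/><text:line-break/>def list_to_html(lst):<text:line-break/><text:s text:c="4"/># Hypothetical helper: lst is a text:ordered-list or text:unordered-list.<text:line-break/><text:s text:c="4"/>tag = LISTS[lst.tag]<text:line-break/><text:s text:c="4"/>items = []<text:line-break/><text:s text:c="4"/>for item in lst.findall(TEXT + 'list-item'):<text:line-break/><text:s text:c="8"/>parts = []<text:line-break/><text:s text:c="8"/>for child in item:<text:line-break/><text:s text:c="12"/>if child.tag in LISTS:<text:line-break/><text:s text:c="16"/># A nested list becomes a nested ol/ul inside the li.<text:line-break/><text:s text:c="16"/>parts.append(list_to_html(child))<text:line-break/><text:s text:c="12"/>else:<text:line-break/><text:s text:c="16"/># Paragraphs (and anything else) contribute their escaped text.<text:line-break/><text:s text:c="16"/>parts.append(html.escape(''.join(child.itertext())))<text:line-break/><text:s text:c="8"/>items.append('&lt;li&gt;' + ''.join(parts) + '&lt;/li&gt;')<text:line-break/><text:s text:c="4"/>return '&lt;%s&gt;%s&lt;/%s&gt;' % (tag, ''.join(items), tag)</text:p>
  <text:p text:style-name="P3">Nested lists are handled by recursion, so an ordered list inside a list item ends up as an ol inside the corresponding li.</text:p>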
  <text:h text:style-name="P9" text:level="3">Graphics</text:h>
  <text:p text:style-name="P3">Vector graphics can be embedded directly into an OpenOffice document, which is a nice feature. You will be even more pleased to know that OpenOffice uses SVG as its native vector graphics format, and these vector graphics can appear directly within the flow of the document body! However, Daniel Vogelheim informed me that the format is only &quot;mostly SVG&quot;, because OpenOffice can do some things with graphics that SVG does not define. So again they have extended the format using elements in their own namespace.</text:p>
  <text:h text:style-name="P1" text:level="1">Putting it all together</text:h>
  <text:p text:style-name="P3">You can find the source XML file for the original of this article (which was written in OpenOffice build 605, saved as XML, and then transformed using the techniques below) here [EDD: Please make it so!].</text:p>
  <text:p text:style-name="P3">Now how can we put the XML generated here to good use? Well, what we XML geeks really want to see is a free WYSIWYG XML editor akin to XMetaL or Adept. And here it is. If we restrict ourselves (or our customers) to using defined styles, OpenOffice can truly be a structured XML editor, without you ever knowing you are editing XML!</text:p>
  <text:p text:style-name="P3">By post-processing the XML generated by OpenOffice, we can turn tags like <text:span text:style-name="T6">&lt;text:h text:style-name=&quot;P10&quot;&gt;</text:span> into something that is significantly easier to work with from a styling point of view, like <text:span text:style-name="T6">&lt;Heading_3&gt;</text:span>. And for structured XML, we really don<text:span text:style-name="T5">’</text:span>t need all the font and page settings. However, some of the style information may be of interest to us. For example, <text:span text:style-name="T6">&lt;text:span&gt;</text:span> tags may point to styles defined with XSL FO properties, which closely resemble CSS properties. These might be useful in trying to get a similar look if we translate the page to HTML.</text:p>
  <text:p text:style-name="P3">We could do this transformation with XSLT, but I prefer XPathScript [EDD: Make that a link to the old article], because I find it much more natural for using variables, defining functions and passing parameters.</text:p>
  <text:p text:style-name="P3">Also worth noting is that the code below relies on the body text referring to automatic styles. Because of the changes in automatic styles described earlier, it will only work on current releases of OpenOffice (and probably works best on files saved from build 605), not on the newer CVS builds.</text:p>
  <text:h text:style-name="P4" text:level="2">From Automatic Style to the Real Style</text:h>
  <text:p text:style-name="P3">First we need to find an XPath expression that will take us from the text<text:span text:style-name="T5">’</text:span>s style name (which will be an automatic style name like &quot;P1&quot;) to the real style name. This is actually rather simple:</text:p>
  <text:p text:style-name="P6">/office:document<text:line-break/>/office:automatic-styles<text:line-break/>/style:style<text:line-break/>[@style:name=&quot;P1&quot;]<text:line-break/>/@style:parent-style-name</text:p>
  <text:p text:style-name="P3">This finds the style:parent-style-name attribute of the automatic style. We will call this the &quot;actual style&quot;.</text:p>
  <text:p text:style-name="P3">We can turn this &quot;actual style&quot; into a string we can use as an element name by replacing spaces with underscores, using XPath<text:span text:style-name="T9">&apos;</text:span>s translate() function. This will change, for example, &quot;Heading 1&quot; to &quot;Heading_1&quot;.</text:p>
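  <text:p text:style-name="P3">Both steps can be sketched in Python with ElementTree. Since ElementTree supports only a limited subset of XPath, the lookup is written as an explicit loop over the automatic styles. The function names are hypothetical. actual_style returns None when the name is not found among the automatic styles, which is what would happen with body text that refers to a real style directly.</text:p>
  <text:p text:style-name="P6">import xml.etree.ElementTree as ET<text:line-break/><text:line-break/>OFFICE = '{http://openoffice.org/2000/office}'<text:line-break/>STYLE = '{http://openoffice.org/2000/style}'<text:line-break/><text:line-break/>def actual_style(root, auto_name):<text:line-break/><text:s text:c="4"/># Hypothetical helper mirroring, for an office:document root:<text:line-break/><text:s text:c="4"/># office:automatic-styles/style:style[@style:name=auto_name]/@style:parent-style-name<text:line-break/><text:s text:c="4"/>auto = root.find(OFFICE + 'automatic-styles')<text:line-break/><text:s text:c="4"/>if auto is not None:<text:line-break/><text:s text:c="8"/>for s in auto.findall(STYLE + 'style'):<text:line-break/><text:s text:c="12"/>if s.get(STYLE + 'name') == auto_name:<text:line-break/><text:s text:c="16"/>return s.get(STYLE + 'parent-style-name')<text:line-break/><text:s text:c="4"/># auto_name is not one of the automatic styles.<text:line-break/><text:s text:c="4"/>return None<text:line-break/><text:line-break/>def element_name(style_name):<text:line-break/><text:s text:c="4"/># Hypothetical helper with the effect of translate(name, ' ', '_'):<text:line-break/><text:s text:c="4"/># 'Heading 1' becomes 'Heading_1'.<text:line-break/><text:s text:c="4"/>return style_name.replace(' ', '_')</text:p>
  <text:p text:style-name="P3">Chaining the two, element_name(actual_style(root, 'P1')) would give Heading_1 when P1 is an automatic style whose parent is Heading 1.</text:p>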
  <text:h text:style-name="P4" text:level="2">A name mapping</text:h>
  <text:p text:style-name="P3">Next we need to set up a name mapping, so that we can translate style names into names we would prefer. For example, we translate &quot;Text_body&quot; to &quot;para&quot;.</text:p>
  <text:p text:style-name="P3">Mappings are trivial in Perl (and hence XPathScript); we simply set up a hash:</text:p>
  <text:p text:style-name="P6">my %stylemap = (<text:line-break/> Text_body =&gt; &quot;para&quot;,<text:line-break/>);</text:p>
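  <text:p text:style-name="P3">The Python equivalent is a dict. The sketch below (output_tag is a hypothetical name) also falls back to the unmapped element name when a style has no entry. This is consistent with the sample output later in this article, where Heading_1 passes through unchanged.</text:p>
  <text:p text:style-name="P6"># Python counterpart of %stylemap; further entries follow the same pattern.<text:line-break/>STYLEMAP = {<text:line-break/><text:s text:c="4"/>'Text_body': 'para',<text:line-break/>}<text:line-break/><text:line-break/>def output_tag(element_name):<text:line-break/><text:s text:c="4"/># Hypothetical helper: use the mapped name when there is one,<text:line-break/><text:s text:c="4"/># otherwise keep the element name unchanged.<text:line-break/><text:s text:c="4"/>return STYLEMAP.get(element_name, element_name)</text:p>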
  <text:h text:style-name="P4" text:level="2">Adding the metadata</text:h>
  <text:p text:style-name="P3">Let<text:span text:style-name="T5">’</text:span>s assume for now that we are only interested in Dublin Core metadata. To get this, we use the simple XPath <text:span text:style-name="T6">/office:document/office:meta/dc:*</text:span>.</text:p>
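  <text:p text:style-name="P3">A hedged Python version of this step keeps the children of office:meta whose namespace is the Dublin Core URI, which is what dc:* selects. The function name dublin_core is hypothetical, and it returns (local name, text) pairs rather than copying the elements.</text:p>
  <text:p text:style-name="P6">import xml.etree.ElementTree as ET<text:line-break/><text:line-break/>OFFICE = '{http://openoffice.org/2000/office}'<text:line-break/>DC = '{http://purl.org/dc/elements/1.1/}'<text:line-break/><text:line-break/>def dublin_core(root):<text:line-break/><text:s text:c="4"/># Hypothetical helper; same selection as /office:document/office:meta/dc:*<text:line-break/><text:s text:c="4"/>meta = root.find(OFFICE + 'meta')<text:line-break/><text:s text:c="4"/>if meta is None:<text:line-break/><text:s text:c="8"/>return []<text:line-break/><text:s text:c="4"/># Keep only children in the Dublin Core namespace, as (local name, text) pairs.<text:line-break/><text:s text:c="4"/>return [(el.tag[len(DC):], el.text or '')<text:line-break/><text:s text:c="12"/>for el in meta if el.tag.startswith(DC)]</text:p>
  <text:p text:style-name="P3">Other metadata, such as meta:generator or meta:keywords, is skipped because it lives in the OpenOffice meta namespace.</text:p>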
  <text:h text:style-name="P4" text:level="2">Transformation Results</text:h>
  <text:p text:style-name="P3">You can access the full stylesheet here [EDD: link please]. The stylesheet can be run using the Perl module XML::XPathScript, which you can download from the CPAN, here [link to http://www.cpan.org/modules/by-module/XML]. This comes with a command line utility, xpathscript.</text:p>
  <text:p text:style-name="P3">The result of this transformation on a simple OpenOffice test document [EDD: link] is:</text:p>
  <text:p text:style-name="P6">&lt;article&gt;<text:line-break/> &lt;artheader xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;&gt;<text:line-break/> <text:s/>&lt;dc:title&gt;Test Example&lt;/dc:title&gt;<text:line-break/> <text:s/>&lt;dc:creator&gt;Matt Sergeant&lt;/dc:creator&gt;<text:line-break/> <text:s/>&lt;dc:date&gt;2000-11-13T21:00:01&lt;/dc:date&gt;<text:line-break/> <text:s/>&lt;dc:language&gt;en-US&lt;/dc:language&gt;<text:line-break/> &lt;/artheader&gt;<text:line-break/> &lt;body&gt;<text:line-break/> <text:s/>&lt;Heading_1&gt;Test&lt;/Heading_1&gt;<text:line-break/> <text:s/>&lt;para&gt;Here is some text&lt;/para&gt;<text:line-break/> &lt;/body&gt;<text:line-break/>&lt;/article&gt;</text:p>
  <text:p text:style-name="P3">Much simpler than the original, and we can easily work with this to transform to HTML using more XPathScript, or XSLT.</text:p>
  <text:h text:style-name="P4" text:level="2">Flat Structure</text:h>
  <text:p text:style-name="P3">One thing to note about the above is that the document format follows HTML<text:span text:style-name="T5">’</text:span>s style of headings followed by text. This is not my personal preference. I prefer DocBook, which provides the document in a tree structure. Sections are contained within a <text:span text:style-name="T6">&lt;sect1&gt;</text:span> tag, and sub-sections are contained within the parent section, rather than just occurring in the main flow of tags. A tree structure makes it easier to manipulate the document. For example, generating a table of contents is a simple recursive loop. With the flat format in OpenOffice it becomes more complex, as we have to maintain information about the current heading levels. Ideally, the stylesheet above would generate a tree-shaped document.</text:p>
  <text:p text:style-name="P3">So that is what I did. Doing this requires maintaining some state information about the current heading level. Again, this is something XPathScript makes easy, because it is just Perl. You can find a stylesheet that gets very close to generating DocBook from OpenOffice XML files here [EDD: link]. This is the stylesheet I used to prepare this article for XML.com, followed by another transformation to generate HTML. I can do this in one step using AxKit pipelines: just save the file into the web document root, and AxKit transforms it to HTML for me.</text:p>
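  <text:p text:style-name="P3">The state involved is small: a stack of the sections that are currently open, together with their heading levels. The Python sketch below shows the idea on an already simplified flat list rather than on OpenOffice XML. The tuple input format, the function name nest and the body and para element names are hypothetical, while sect1, sect2 and so on follow DocBook naming. It assumes heading levels start at 1.</text:p>
  <text:p text:style-name="P6">import xml.etree.ElementTree as ET<text:line-break/><text:line-break/>def nest(flat):<text:line-break/><text:s text:c="4"/># Hypothetical input: ('h', level, text) and ('p', None, text) tuples<text:line-break/><text:s text:c="4"/># in document order, with heading levels starting at 1.<text:line-break/><text:s text:c="4"/>body = ET.Element('body')<text:line-break/><text:s text:c="4"/># Stack of (heading level, open element); level 0 is the body itself.<text:line-break/><text:s text:c="4"/>stack = [(0, body)]<text:line-break/><text:s text:c="4"/>for kind, level, text in flat:<text:line-break/><text:s text:c="8"/>if kind == 'h':<text:line-break/><text:s text:c="12"/># A heading closes every open section at its own level or deeper.<text:line-break/><text:s text:c="12"/>while stack[-1][0] &gt;= level:<text:line-break/><text:s text:c="16"/>stack.pop()<text:line-break/><text:s text:c="12"/>sect = ET.SubElement(stack[-1][1], 'sect%d' % level)<text:line-break/><text:s text:c="12"/>ET.SubElement(sect, 'title').text = text<text:line-break/><text:s text:c="12"/>stack.append((level, sect))<text:line-break/><text:s text:c="8"/>else:<text:line-break/><text:s text:c="12"/># Paragraphs go into the innermost open section.<text:line-break/><text:s text:c="12"/>ET.SubElement(stack[-1][1], 'para').text = text<text:line-break/><text:s text:c="4"/>return body</text:p>
  <text:p text:style-name="P3">A level 2 heading that follows a level 1 heading therefore becomes a sect2 inside that sect1. The next level 1 heading closes both and starts a new sect1. Generating a table of contents from the result is then the simple recursive walk described above.</text:p>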
  <text:h text:style-name="P1" text:level="1">OpenOffice for Content Management</text:h>
  <text:p text:style-name="P3">As I mentioned earlier, my aim is to use OpenOffice as the editing component for a content management system (specifically, as an add-on for AxKit). The one thing in the future plans for OpenOffice that has thrown a spanner in the works is the packaging format, which uses Zip files. You cannot pass Zip files to an XML parser (except for gnome<text:span text:style-name="T5">’</text:span>s libxml, which will accept gzipped XML files, but that is not the same thing). However, AxKit (and other XML application servers such as Cocoon) allow the XML provider to be overridden. This means we can reach into those Zip files to extract the XML before further processing with stylesheets.</text:p>
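  <text:p text:style-name="P3">Reaching into the package before parsing takes only a few lines. The Python sketch below stands in for such a custom provider; it is not AxKit or Cocoon code. The function name parse_from_zip is hypothetical, and the default member name content.xml is an assumption about the package layout rather than something this article specifies.</text:p>
  <text:p text:style-name="P6">import io<text:line-break/>import zipfile<text:line-break/>import xml.etree.ElementTree as ET<text:line-break/><text:line-break/>def parse_from_zip(source, member='content.xml'):<text:line-break/><text:s text:c="4"/># Hypothetical helper. source: a file name, a file object or the raw bytes<text:line-break/><text:s text:c="4"/># of the Zip package. The default member name is an assumption.<text:line-break/><text:s text:c="4"/>if isinstance(source, bytes):<text:line-break/><text:s text:c="8"/>source = io.BytesIO(source)<text:line-break/><text:s text:c="4"/>with zipfile.ZipFile(source) as zf:<text:line-break/><text:s text:c="8"/># Read the XML member out of the archive and hand it to the parser.<text:line-break/><text:s text:c="8"/>return ET.fromstring(zf.read(member))</text:p>
  <text:p text:style-name="P3">The returned element can then go through the same stylesheet processing as a plain XML file.</text:p>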
  <text:p text:style-name="P3">So what is the conclusion of this?</text:p>
  <text:p text:style-name="P3">In November at XML Dev Con in San Jose I gave a talk about the current state of XML applications for web developers in the open source world (plug: I am giving this talk again at XML Dev Con in London next month). My conclusion was that the server side of XML processing is on a par with (if not better than) the equivalent proprietary products, but the client/editor side of things was a long way off. But this changes everything. Now you really can edit a richly formatted document in a WYSIWYG word processor and publish it directly to the web, using XML to style it to fit your web site. That is a huge step in the right direction for the open source community.</text:p>
  <text:p text:style-name="P3">However, there is more to this than my own needs. Here are some other ideas that you could implement:</text:p>
  <text:unordered-list text:style-name="L3">
   <text:list-item>
    <text:p text:style-name="P10">A presentation file could be converted to Sun<text:span text:style-name="T5">’</text:span>s XML slide format, and onwards to SVG using their toolkit.</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P10">Stylesheets could take other XML formats like DocBook or XHTML (or the output from the transformation above) and generate OpenOffice XML format, which would allow a form of round-trip editing.</text:p>
   </text:list-item>
   <text:list-item>
    <text:p text:style-name="P10">The stylesheet could generate XHTML directly, rather than producing an interim format.</text:p>
   </text:list-item>
  </text:unordered-list>
  <text:p text:style-name="P3">More suggestions are welcome, and I look forward to feedback from this article.</text:p>
  <text:h text:style-name="P1" text:level="1">Bio</text:h>
  <text:p text:style-name="P3">Matt Sergeant is Director and CTO of AxKit.com Ltd, a Scotland-based startup that produces an open source XML Application Server. AxKit.com specialise in getting content onto the web with a minimum of fuss and maximum efficiency. Matt regularly speaks at XML, Apache and Perl conferences, although he always misses his dogs when he goes away.</text:p>
 </office:body>
</office:document>