Sign In/My Account | View Cart  
advertisement


Listen Print Discuss
From Excel to XML

From Excel to XML

by John E. Simpson
January 09, 2002

Q: How do I convert Microsoft Excel data to XML?

A: The easiest way is to use either a commercial product (such as Infoteria's iMaker for Excel -- $150 for a single developer license) or any of a number of publicly-available Visual Basic/Visual Basic for Applications (VB/VBA) modules (such as Scott Woods's contribution). If you're after not just a straight answer, but a deeper understanding, read on.

Converting a spreadsheet's data to XML is a specific form of the general question, "How do I convert tabular data to XML?" Depending on the spreadsheet in question -- and on the character of the desired XML output -- answering it can be extremely simple or mind-bendingly complex. Let's look at a simple example first. Consider the following table (i.e., "spreadsheet"):

  A B C D
1

1000.23

1206.79

1135.00

1394.61

2 113.19

12.36

159.12

34.78

3

 234.56

678.90

259.00

654.01

This table seems to be rich in content but devoid of anything like meaning. There's no way for a human reader, let alone a dumb machine, to make sense of these numbers. Any XML generated from such a table is likely to be similarly stupid, along these lines:

<table>
   <row>
      <column>1000.23</column>
      <column>1206.79</column>
      <column>1135.00</column>
      <column>1394.61</column>
   </row>
   <row>
      <column>113.19</column>
      <column>12.36</column>
      <column>159.12</column>
      <column>34.78</column>
   </row>
   <row>
      <column>234.56</column>
      <column>678.90</column>
      <column>259.00</column>
      <column>654.01</column>
   </row>
</table>

More likely, in a completed spreadsheet, you're likely to see the rows and columns labeled in some significant way. With labels, the example above might look like this:
  A B C D E
1   1Q2002 2Q2002 3Q2002 4Q2002
2 West

1000.23

1206.79

1135.00

1394.61

3 East 113.19

12.36

159.12

34.78

4 Central

 234.56

678.90

259.00

654.01

We still don't know what the numbers represent, exactly -- sales figures? average birth rates? average test scores? But what each row or column's data represents is becoming clearer. The rows evidently represent geographical subdivisions of a larger whole; the columns, a year's quarterly data. Given this understanding, we might imagine XML output such as

<table>
   <division name="West">
      <quarter name="1Q2002">1000.23</quarter>
      <quarter name="2Q2002">1206.79</quarter>
      <quarter name="3Q2002">1135.00</quarter>
      <quarter name="4Q2002">1394.61</quarter>
   </division>
   [etc.]
</table>

Finally, of course, the table's (or spreadsheet's) author may have helped us further by providing some kind of summary label, which resolves for certain what is represented by the numbers:

  A B C D E
1 Average Daily Ticket Sales by Region
2   1Q2002 2Q2002 3Q2002 4Q2002
3 West

1000.23

1206.79

1135.00

1394.61

4 East 113.19

12.36

159.12

34.78

5 Central

 234.56

678.90

259.00

654.01

Now our XML output can take a more helpful form, shorn of formatting and display notions like tables, rows, and columns:

<avg_ticket_sales>
   <region name="West">
      <quarter name="1Q2002">1000.23</quarter>
      <quarter name="2Q2002">1206.79</quarter>
      <quarter name="3Q2002">1135.00</quarter>
      <quarter name="4Q2002">1394.61</quarter>
   </region>
   [etc.]
</avg_ticket_sales>

I think most readers will intuitively grasp how this conversion might work, given the presence of a human to control the process. Not so obvious, perhaps, is how a machine might make the leap from (say) a root element named table to one named avg_ticket_sales.

Most automated conversion programs provide some mapping mechanism which instructs the software which cells are to be translated to element or attribute names and which to element content or attribute values. For example, you might have to supply a mapping-configuration file whose contents look something like

root == element(avg_ticket_sales)
row(1) == ignore
row(2) == element("quarter")attribute("name")
col(A) == element("region")attribute("name")

(This isn't real; I'm just demonstrating how, in general, the mapping might work.)

Comment on this article Share your experience with converting tabular data to XML in our forums.
Post your comments

If the target XML vocabulary comes with a DTD or Schema, the task becomes even easier (well, as such things go). These formal descriptions of a conforming document's allowable structure are relatively easy to automate in a drag-and-drop user interface, like a hierarchical tree of nodes. Such a tool would enable the user to define the table-to-DTD mapping simply by, as it were, "drawing a line" from some portion of a table or spreadsheet to its XML counterpart.

If you're interested in learning more about mapping table-type structures to XML structures, I encourage you to refer to any of Ron Bourret's relevant work. (XML.com published Bourret's thorough, multi-part treatment in May 2001: "Mapping Databases to DTDs".) Bourret's work on this and other XML-related subjects is worth knowing, as you can see from his web site.

XML naming constraints revisited

In last month's column, I discussed why element and attribute names cannot begin with a digit or other (mostly non-alphabetic) character. I talked some in that column about XML's having inherited the constraint from SGML, but it was largely speculative on my part (since I was barely in the neighborhood when XML was being born, let alone SGML).

Also in XML Q&A

From English to Dutch?

Trickledown Namespaces?

From XML to SMIL

From One String to Many

Getting in Touch with XML Contacts

After that column appeared, Rick Jelliffe sent me a note listing some real reasons for the limitation -- that is, more real than "because SGML says so, too". One of these reasons should be so obvious to an application developer that I'm embarrassed not to have thought of it. Once you open the door to an element or attribute named, say, 30DaySpan, then you must also allow an element or attribute named simply 30. And then performing mathematical or Boolean operations would become an excruciating guessing game for a human, let alone an XML parser: Does the expression 30 * 5 mean the literal number 30 times 5, or the value of the 30 element (whatever that value might be) times 5 (or, for that matter, the value of the 30 element times the value of the 5 element, or the literal value 30 times the value of the 5 element)? Does 200 > 30 evaluate to Boolean true in all cases, or does it depend on the value of the 30 element (or the 200 element, etc.)?

Jelliffe offered another reason: allowing unpronounceable names (like, say, %^*@, whether followed by something more pronounceable or not) would play havoc with speech synthesizers used by blind readers of the XML document in question.

Finally, Jelliffe said, there's a slippery-slope objection to allowing leading digits in particular. Once you allow digits, then someone will want dollar signs, and once you allow dollar signs then someone will come up with a reason for "needing" leading whitespace as part of an element or attribute name.

Besides, as Jelliffe concluded, in order to accomplish most of what people who want non-alphabetic leading characters really want, all you need to do is remove the requirement that ID-type attribute values must be legitimate XML names. Both XML Schema and Jelliffe's Schematron provide this flexibility.

Many thanks to Rick Jelliffe for filling in the gaps.


Comment on this articleShare your experience with converting tabular data to XML in our forums.
(* You must be a
member of XML.com to use this feature.)
Comment on this Article


Titles Only Titles Only Newest First
  • XML to Excel
    2002-09-25 11:28:57 mathew jose [Reply]

    Can somebost please explain a real case where we need to convert XML which we have in some server page and getting that XML to an excel sheet in a clients desktop.
    To clarify the XML is generated by a java class in the server and from a page in the browser we need to be able to export that XML to the client machines excel application.
    Thanks in advance
    mat

    • XML to Excel
      2004-02-19 20:46:23 jeyanthi Murugan [Reply]

      How can i convert Xml to Excel sheet?


      Jeyanthi.

  • Food for thought?
    2002-03-11 00:49:45 wayne ke [Reply]

    Food for thought? I totally agree with the previous commenter. Sorry for being rude to the author who spent the effort to write the article. I suggest that the article be removed - save time and paper for someone who thought it might be useful.

  • And the answer is...
    2002-01-31 12:18:00 Chris Lovett [Reply]

    Excel 2000 and XP has XML support built in [ xml = ActiveSheet.Range("A1:E34").Value(xlRangeValueXMLSpreadsheet) ]. See also http://msdn.microsoft.com/library/en-us/dnexcl2k2/html/odc_xlws.asp and http://msdn.microsoft.com/library/en-us/dnexcl2k2/html/odc_xmlss.asp

  • I see the question but...
    2002-01-31 06:32:14 Tom Boshell [Reply]

    The question is obvious, but what was the answer. I think I may have missed it. Or maybe that is how the Q&As are done, I will have to check further. There was a product plug, suggestions on VB solutions, and some food-for-thought. I know that it is a tough topic to tackle, with no one single obvious solution. However, there has to be some kind of thinking strategy or general guidelines to tackle this type of problem. Ahh, never mind. The more I think about it the more I believe that this article was intended as 'food-for-thought.'