
From Excel to XML
by John E. SimpsonJanuary 09, 2002
Q: How do I convert Microsoft Excel data to XML?
Converting a spreadsheet's data to XML is a specific form of the general question, "How do I convert tabular data to XML?" Depending on the spreadsheet in question -- and on the character of the desired XML output -- answering it can be extremely simple or mind-bendingly complex. Let's look at a simple example first. Consider the following table (i.e., "spreadsheet"):
| A | B | C | D | |
| 1 |
1000.23 |
1206.79 |
1135.00 |
1394.61 |
| 2 | 113.19 |
12.36 |
159.12 |
34.78 |
| 3 |
234.56 |
678.90 |
259.00 |
654.01 |
This table seems to be rich in content but devoid of anything like meaning. There's no way for a human reader, let alone a dumb machine, to make sense of these numbers. Any XML generated from such a table is likely to be similarly stupid, along these lines:
<table>
<row>
<column>1000.23</column>
<column>1206.79</column>
<column>1135.00</column>
<column>1394.61</column>
</row>
<row>
<column>113.19</column>
<column>12.36</column>
<column>159.12</column>
<column>34.78</column>
</row>
<row>
<column>234.56</column>
<column>678.90</column>
<column>259.00</column>
<column>654.01</column>
</row>
</table>
| A | B | C | D | E | |
1 |
1Q2002 | 2Q2002 | 3Q2002 | 4Q2002 | |
2 |
West |
1000.23 |
1206.79 |
1135.00 |
1394.61 |
3 |
East | 113.19 |
12.36 |
159.12 |
34.78 |
| 4 | Central |
234.56 |
678.90 |
259.00 |
654.01 |
We still don't know what the numbers represent, exactly -- sales figures? average birth rates? average test scores? But what each row or column's data represents is becoming clearer. The rows evidently represent geographical subdivisions of a larger whole; the columns, a year's quarterly data. Given this understanding, we might imagine XML output such as
<table>
<division name="West">
<quarter
name="1Q2002">1000.23</quarter>
<quarter
name="2Q2002">1206.79</quarter>
<quarter
name="3Q2002">1135.00</quarter>
<quarter
name="4Q2002">1394.61</quarter>
</division>
[etc.]
</table>
Finally, of course, the table's (or spreadsheet's) author may have helped us further by providing some kind of summary label, which resolves for certain what is represented by the numbers:
| A | B | C | D | E | |
| 1 | Average Daily Ticket Sales by Region | ||||
| 2 | 1Q2002 | 2Q2002 | 3Q2002 | 4Q2002 | |
| 3 | West |
1000.23 |
1206.79 |
1135.00 |
1394.61 |
| 4 | East | 113.19 |
12.36 |
159.12 |
34.78 |
| 5 | Central |
234.56 |
678.90 |
259.00 |
654.01 |
Now our XML output can take a more helpful form, shorn of formatting and display notions like tables, rows, and columns:
<avg_ticket_sales>
<region name="West">
<quarter name="1Q2002">1000.23</quarter>
<quarter name="2Q2002">1206.79</quarter>
<quarter name="3Q2002">1135.00</quarter>
<quarter name="4Q2002">1394.61</quarter>
</region>
[etc.]
</avg_ticket_sales>
I think most readers will intuitively grasp how this conversion
might work, given the presence of a human to control the
process. Not so obvious, perhaps, is how a machine might
make the leap from (say) a root element named table to
one named avg_ticket_sales.
Most automated conversion programs provide some mapping mechanism which instructs the software which cells are to be translated to element or attribute names and which to element content or attribute values. For example, you might have to supply a mapping-configuration file whose contents look something like
root == element(avg_ticket_sales)
row(1) == ignore
row(2) == element("quarter")attribute("name")
col(A) == element("region")attribute("name")
(This isn't real; I'm just demonstrating how, in general, the mapping might work.)
|
|
| Post your comments |
If the target XML vocabulary comes with a DTD or Schema, the task becomes even easier (well, as such things go). These formal descriptions of a conforming document's allowable structure are relatively easy to automate in a drag-and-drop user interface, like a hierarchical tree of nodes. Such a tool would enable the user to define the table-to-DTD mapping simply by, as it were, "drawing a line" from some portion of a table or spreadsheet to its XML counterpart.
If you're interested in learning more about mapping table-type structures to XML structures, I encourage you to refer to any of Ron Bourret's relevant work. (XML.com published Bourret's thorough, multi-part treatment in May 2001: "Mapping Databases to DTDs".) Bourret's work on this and other XML-related subjects is worth knowing, as you can see from his web site.
XML naming constraints revisited
In last month's column, I discussed why element and attribute names cannot begin with a digit or other (mostly non-alphabetic) character. I talked some in that column about XML's having inherited the constraint from SGML, but it was largely speculative on my part (since I was barely in the neighborhood when XML was being born, let alone SGML).
|
Also in XML Q&A | |
After that column appeared, Rick Jelliffe sent me a note listing
some real reasons for the limitation -- that is, more real than
"because SGML says so, too". One of these reasons should be so obvious
to an application developer that I'm embarrassed not to have thought
of it. Once you open the door to an element or attribute named, say,
30DaySpan, then you must also allow an element or
attribute named simply 30. And then performing
mathematical or Boolean operations would become an excruciating
guessing game for a human, let alone an XML parser: Does the
expression 30 * 5 mean the literal number 30 times 5, or
the value of the 30 element (whatever that value
might be) times 5 (or, for that matter, the value of the
30 element times the value of the 5 element,
or the literal value 30 times the value of the 5
element)? Does 200 > 30 evaluate to Boolean
true in all cases, or does it depend on the value of the
30 element (or the 200 element, etc.)?
Jelliffe offered another reason: allowing unpronounceable names
(like, say, %^*@, whether followed by something more
pronounceable or not) would play havoc with speech synthesizers used
by blind readers of the XML document in question.
Finally, Jelliffe said, there's a slippery-slope objection to allowing leading digits in particular. Once you allow digits, then someone will want dollar signs, and once you allow dollar signs then someone will come up with a reason for "needing" leading whitespace as part of an element or attribute name.
Besides, as Jelliffe concluded, in order to accomplish most of what people who want non-alphabetic leading characters really want, all you need to do is remove the requirement that ID-type attribute values must be legitimate XML names. Both XML Schema and Jelliffe's Schematron provide this flexibility.
Many thanks to Rick Jelliffe for filling in the gaps.
Share your experience with converting tabular data to XML in our forums.
(* You must be a member of XML.com to use this feature.)
Comment on this Article
| Titles Only | Titles Only | Newest First |
- XML to Excel
2002-09-25 11:28:57 mathew jose [Reply]
Can somebost please explain a real case where we need to convert XML which we have in some server page and getting that XML to an excel sheet in a clients desktop.
To clarify the XML is generated by a java class in the server and from a page in the browser we need to be able to export that XML to the client machines excel application.
Thanks in advance
mat
- XML to Excel
2004-09-20 05:09:06 mst02op [Reply]
please, tell me convert excel to xml,
Edi
- XML to Excel
2004-02-19 20:46:23 jeyanthi Murugan [Reply]
How can i convert Xml to Excel sheet?
Jeyanthi.
- XML to Excel
- XML Converter - Excel to XML, Access to XML
2002-05-10 14:46:34 Rustem Sabitov [Reply]
Just take a look at this:
http://rustemsoft.com/XMLConverter.htm
- XML Converter - Excel to XML, Access to XML
2005-07-11 22:07:52 div001 [Reply]
Hi,
how to convert excel to xml, without using any API.
Regards
Divya
- XML Converter - Excel to XML, Access to XML
- Food for thought?
2002-03-11 00:49:45 wayne ke [Reply]
Food for thought? I totally agree with the previous commenter. Sorry for being rude to the author who spent the effort to write the article. I suggest that the article be removed - save time and paper for someone who thought it might be useful.
- And the answer is...
2002-01-31 12:18:00 Chris Lovett [Reply]
Excel 2000 and XP has XML support built in [ xml = ActiveSheet.Range("A1:E34").Value(xlRangeValueXMLSpreadsheet) ]. See also http://msdn.microsoft.com/library/en-us/dnexcl2k2/html/odc_xlws.asp and http://msdn.microsoft.com/library/en-us/dnexcl2k2/html/odc_xmlss.asp
- I see the question but...
2002-01-31 06:32:14 Tom Boshell [Reply]
The question is obvious, but what was the answer. I think I may have missed it. Or maybe that is how the Q&As are done, I will have to check further. There was a product plug, suggestions on VB solutions, and some food-for-thought. I know that it is a tough topic to tackle, with no one single obvious solution. However, there has to be some kind of thinking strategy or general guidelines to tackle this type of problem. Ahh, never mind. The more I think about it the more I believe that this article was intended as 'food-for-thought.'
- How do I convert Microsoft Excel data to XML?
2002-01-25 02:28:23 Farooq Hameed [Reply]
But the question isn't answered in the article yet. I really am interested to know How do I convert Microsoft Excel data to XML?
- How do I convert Microsoft Excel data to XML?
2004-03-29 19:03:17 Mike Levin [Reply]
Just came across this question in researching the problem myself. The answer is...
Save As HTML from Excel. Strip out everything but the table structure. If it's well formed, then it's XML. The rest is searching & relpacing the TD's with elements of your choosing.
- How do I convert Microsoft Excel data to XML?
2005-02-08 18:57:02 phpwebdev@gmail.com [Reply]
Thanks
This help too much
DASA
- How do I convert Microsoft Excel data to XML?
2006-12-12 03:39:44 ammuchellam [Reply]
- How do I convert Microsoft Excel data to XML?
2007-02-20 03:33:43 Bharti [Reply]
- How do I convert Microsoft Excel data to XML?
- How do I convert Microsoft Excel data to XML?
- How do I convert Microsoft Excel data to XML?
- How do I convert Microsoft Excel data to XML?
