XML.com: XML From the Inside Out

Converting Between XML and JSON
by Stefan Goessner

Preserving order

JSON is built on two internal structures:

  • A collection of name/value pairs with unique names (associative array)
  • An ordered list of values (array)

An attempt to map a structured XML element...

<e>
  <a>some</a>
  <b>textual</b>
  <a>content</a>
</e>

...to the following JSON object:

"e": {
  "a": "some",
  "b": "textual",
  "a": "content"
}

yields an invalid result, since the name "a" is not unique in the associative array. So we need to collect all elements with identical names in an array. Applying patterns 5 and 6 above yields the following result:

"e": {
  "a": [ "some", "content" ],
  "b": "textual"
}

Now we have a structure that doesn't preserve element order. This may or may not be acceptable, depending on whether the above XML element order matters.
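The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name children_to_json is hypothetical, it uses the standard library's xml.etree.ElementTree, and it handles only child elements that contain plain text.

```python
import json
import xml.etree.ElementTree as ET


def children_to_json(element):
    """Map child elements to a dict, collecting repeated names in a list.

    Hypothetical helper: assumes every child holds plain text only.
    """
    result = {}
    for child in element:
        value = (child.text or "").strip()
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            # Second occurrence of this name: promote the value to a list.
            result[child.tag] = [result[child.tag], value]
    return result


doc = ET.fromstring("<e><a>some</a><b>textual</b><a>content</a></e>")
converted = {doc.tag: children_to_json(doc)}
print(json.dumps(converted))
# prints {"e": {"a": ["some", "content"], "b": "textual"}}
```

The output no longer records that "b" sat between the two "a" elements, which is exactly the loss of order described above.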

So, our general rules of thumb are:

A structured XML element can be converted to a reversible JSON structure, if

  • all subelement names occur exactly once, or …
  • subelements with identical names are in sequence.

and

A structured XML element can be converted to an irreversible but semantically equivalent JSON structure, if

  • multiple homonymous subelements occur nonsequentially, and …
  • element order doesn't matter.

If neither of these two conditions applies, there is no pragmatic way to convert XML to JSON using the patterns above. SVG and SMIL documents, which implicitly rely on element order, come to mind here.
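The first rule of thumb can be checked mechanically. The following Python sketch is one possible test, not part of the conversion patterns themselves; the function name is_reversible is an assumption made for illustration. It treats an element as reversibly convertible when each subelement name forms a single contiguous run.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def is_reversible(element):
    """True if same-named child elements always appear in one contiguous run."""
    tags = [child.tag for child in element]
    # Collapse consecutive duplicates: a, a, b -> a, b
    runs = [tag for tag, _ in groupby(tags)]
    # A name that starts more than one run is interleaved with other names.
    return len(runs) == len(set(runs))


in_sequence = ET.fromstring("<e><a>some</a><a>content</a><b>textual</b></e>")
interleaved = ET.fromstring("<e><a>some</a><b>textual</b><a>content</a></e>")
print(is_reversible(in_sequence))   # prints True
print(is_reversible(interleaved))   # prints False
```

When the check fails, the conversion is still usable if element order carries no meaning; that decision depends on the vocabulary and cannot be derived from the document alone.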

Semi-Structured XML

XML documents can contain semi-structured elements, which are elements with mixed content of text and child elements, usually seen in documentation markup. If the textual content is contiguous, as in:

<e>
  some textual
  <a>content</a>
</e>

we can apply pattern 7 and obtain the following for this special case:

"e": {
  "#text": "some textual",
  "a": "content"
}
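A minimal Python sketch of this special case follows. The helper name text_then_children is hypothetical, and the sketch assumes that child names are unique, that children contain plain text only, and that no text follows the child elements.

```python
import json
import xml.etree.ElementTree as ET


def text_then_children(element):
    """Convert leading text plus text-only children into a dict with a '#text' entry."""
    result = {}
    # Normalize indentation and line breaks in the leading text.
    text = " ".join((element.text or "").split())
    if text:
        result["#text"] = text
    for child in element:
        # Assumption: names are unique, so no list handling is needed here.
        result[child.tag] = (child.text or "").strip()
    return result


doc = ET.fromstring("<e>\n  some textual\n  <a>content</a>\n</e>")
converted = {doc.tag: text_then_children(doc)}
print(json.dumps(converted))
# prints {"e": {"#text": "some textual", "a": "content"}}
```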

But how do we convert textual content mixed up with elements? For example:

<e>
  some
  <a>textual</a>
  content
</e>

In most cases it makes no sense to collect all text nodes in an array, as in

"e": {
  "#text": ["some", "content"],
  "a": "textual"
}

because the result preserves neither order nor semantics.

So the best pragmatic solution is to treat mixed semi-structured content in JSON the same way XML treats CDATA sections: as unknown markup.

"e": "some <a>textual</a> content"

This gives us another rule: XML elements with

  • mixed content of text and element nodes and
  • CDATA sections

are converted to a reversible JSON string containing the complete XML markup according to pattern 2 or 4.
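As a rough Python illustration of this rule, the sketch below detects interleaved text and keeps the inner markup as a single string. The names is_interleaved and inner_xml are assumptions, not part of any library, and the sketch preserves whitespace exactly as it appears in the source.

```python
import json
import xml.etree.ElementTree as ET


def is_interleaved(element):
    """True if non-whitespace text follows at least one child element."""
    return any((child.tail or "").strip() for child in element)


def inner_xml(element):
    """Serialize the element's content (text and child markup) as one string."""
    parts = [element.text or ""]
    for child in element:
        # ET.tostring() also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


doc = ET.fromstring("<e>some <a>textual</a> content</e>")
if is_interleaved(doc):
    converted = {doc.tag: inner_xml(doc)}
    print(json.dumps(converted))
    # prints {"e": "some <a>textual</a> content"}
```

Because the string holds the complete markup, parsing it again restores the original children, which is what makes this mapping reversible.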
