CDATA Sections and Binary Data

A lot of people would like a way to package up any old binary data and include it in an XML file. The conventional XML answer to this would be to store it separately and point at it with an unparsed entity. Which is fine, but that's not what people want; they want to include the data right in the file, which is a reasonable way to go if you're going to transmit it over the network.

When you look at CDATA, you might get the impression that you could jam your binary data into a CDATA section. You'd be partly right: you'd have to guarantee that the data never included a byte sequence that looks like ]]>, and also that every byte decodes to a character XML allows, which rules out NUL and most other control characters. There is a trick you can use to get around the ]]> problem, but it's awkward:

<![CDATA[Use *two* CDATA sections when you need to embed a "]]]]><![CDATA[>" in the data ]]>
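
The same trick can be applied mechanically. Here is a minimal sketch in Python, assuming the payload is already text made of legal XML characters; the function name wrap_cdata and the data element are made up for illustration, and this is one way to do it rather than the canonical one:

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two sections.

    Hypothetical helper: assumes text contains only legal XML characters.
    """
    # End the current section after "]]" and open a new one before ">",
    # so the terminator never appears inside any single section.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


payload = 'embed a "]]>" in the data'
document = "<data>" + wrap_cdata(payload) + "</data>"

# A parser joins the adjacent sections back into the original text.
recovered = ET.fromstring(document).text
print(recovered == payload)
```

Note that this only deals with the ]]> terminator; it does nothing about bytes that aren't legal XML characters in the first place.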

Another way to go would be to encode the binary data in base64, or with some other technique whose output is guaranteed never to contain a < or an &; but if you're going to do that, you don't need a CDATA section; any old element would do. Perhaps this is a good use for XML's notation attributes.
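
To make the base64 route concrete, here is a rough sketch using Python's standard library. The element name blob and the helpers embed_binary and extract_binary are assumptions made up for this example, not anything the spec defines:

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(data: bytes, tag: str = "blob") -> str:
    """Return an XML element whose content is the base64 form of data."""
    elem = ET.Element(tag)
    # The base64 alphabet is A-Z, a-z, 0-9, "+", "/" and "=", so the
    # encoded text needs neither escaping nor a CDATA section.
    elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Recover the original bytes from an element built by embed_binary."""
    return base64.b64decode(ET.fromstring(xml_text).text or "")


# Every possible byte value, plus the sequences that cause trouble raw.
raw = bytes(range(256)) + b"]]><&"
print(extract_binary(embed_binary(raw)) == raw)
```

The price is size: base64 turns every three bytes into four characters, which is worth weighing if the whole point was to ship the data over the network.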


Copyright © 1998, Tim Bray. All rights reserved.