Handling Binary Data in XML Documents

July 24, 1998

Lisa Rein

Jonathan Borden is a practicing neurosurgeon and assistant professor of neurosurgery at Tufts University whose background is in artificial intelligence, neuroscience, and image processing. Borden's company, JABR Technology, just launched the New England Neurosurgery Network, which links New England hospitals and clinics with a network of neurosurgeons via traditional Internet connections and cable modems.

The company has also launched "Synapse", a web-based, transaction-processing medical multimedia system. Synapse invokes rule-sets on patterns of information that feed into the system via the web. It processes patient information, histories, lab data, images, and so on.

"In the medical world information takes many shapes. Text-based information including histories, descriptions, laboratory data and opinions are naturally represented in XML," explains Borden. "The only problem is that not all things can be expressed in a vector format. In most cases, medical images have a natural raster/bitmap format and so there is a real need to deal with the bitmap data type in XML. For interchange, we need a precise graphic but we also want to include additional structural information for use by other applications such as databases."

Images, including x-rays, scans, and photographs, are primarily bitmaps tagged with critical information: the patient's name and ID, the conditions under which the image was taken, the body part imaged, and the orientation.

This has presented Borden with a challenge: how to convey binary image data in XML documents. Borden could point to the image data as files, of course. But he needed to include the pixels in the same stream as the vector data, because the vector data might be an overlay on the pixel image, and such an overlay may have no logical meaning by itself.

As Borden explains, "I'm not so concerned with finding a way to embed a link to an image in a web page as I am in standardizing on an image format which maintains metadata. Bitmaps (GIF and JPEG) don't. TIFF and DICOM (Digital Imaging and Communications in Medicine, the standard medical image format) do."

Borden would like to be able to embed binary data along with the XML tags that describe it. But can you just put "binary" data into an XML document? That is, could we take an arbitrary stream of bits and plunk it into a document? The answer is no, because everything in an XML document must be a legal character, in legal syntax, in the same character encoding as the rest of the document. However, you can encode binary data into characters, and then put the result into an XML document.

Encoding Binary Data

Let's look at an example of encoding data in an XML document. Listing 1 describes an image of roughly 10 MB (2048 x 2500 pixels at 16 bits per pixel).


 <x-ray>
   <patient>
     <id>123</id>
     <lastname>jones</lastname>
   </patient>
   <diagnosis>abdominal pain</diagnosis>
   <width>2048</width>
   <height>2500</height>
   <bit-depth>16</bit-depth>
   <pixels> ... </pixels>
 </x-ray>

The PIXELS element would contain the binary data encoded in some notation. (The notation might be indicated using an attribute.) The most common notation to use is Base64. (Base64 is specified in RFC 2045, part of the MIME standard.)

An arbitrary bitstream encoded in Base64 can be placed in an XML document as the content of an element. The Base64 alphabet consists only of letters, digits, "+", "/", and "=", so the encoded text never contains markup characters such as "<" or "&" and needs no escaping. An application reading the document would need to look for the element that contains the binary data, and decode the Base64 string to recover the original binary stream.
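
As a rough illustration of that round trip, the following Python sketch encodes a few bytes into a PIXELS-style element and decodes them again. The element names follow Listing 1; the sample bytes and the helper names embed_pixels and extract_pixels are invented for this example and are not part of Borden's system.

```python
import base64
import xml.etree.ElementTree as ET


def embed_pixels(raw: bytes, width: int, height: int, bit_depth: int) -> str:
    """Build a small x-ray document with the pixel bytes Base64-encoded."""
    root = ET.Element("x-ray")
    ET.SubElement(root, "width").text = str(width)
    ET.SubElement(root, "height").text = str(height)
    ET.SubElement(root, "bit-depth").text = str(bit_depth)
    # b64encode returns ASCII bytes; decode them so they can be element text.
    ET.SubElement(root, "pixels").text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract_pixels(document: str) -> bytes:
    """Find the pixels element and decode its Base64 content back to bytes."""
    root = ET.fromstring(document)
    return base64.b64decode(root.findtext("pixels"))


# Hypothetical 2x2 image at 16 bits per pixel: 8 bytes, some of which
# (0x00, 0x3C "<", 0x26 "&") could not appear literally in XML content.
sample = bytes([0x00, 0x3C, 0xFF, 0x26, 0x01, 0x02, 0x80, 0x7F])
doc = embed_pixels(sample, width=2, height=2, bit_depth=16)
recovered = extract_pixels(doc)
```

Both sides have to agree in advance that the pixels element holds Base64 text; nothing in the document itself says so.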

That's fine for two cooperating applications, but not all applications will be able to recognize which elements should have Base64 encoded binary data, which have hex-encoded binary data, or which simply have character strings. It would be better if the data were self-describing.

For this reason, the XML-Data paper proposed a "dt" attribute that would allow you to write something like:

<stuff dt:dt="binary.base64">84592gv8Z53815Zb82bA68g</stuff>

This example signals to any application that the contents are a binary stream encoded using the Base64 notation.
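
A reader that honors such an attribute might dispatch on its value, as in the Python sketch below. The namespace URI bound to the "dt" prefix is a placeholder, and the "binary.hex" type name is hypothetical; only "binary.base64" comes from the example above.

```python
import base64
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the "dt" prefix (an assumption for this sketch).
DT_NS = "urn:example:datatypes"

# Map datatype names to decoders. "binary.hex" is a hypothetical name.
DECODERS = {
    "binary.base64": base64.b64decode,
    "binary.hex": bytes.fromhex,
}


def decode_element(elem):
    """Return bytes for elements with a known dt:dt type, otherwise plain text."""
    datatype = elem.get(f"{{{DT_NS}}}dt")
    text = elem.text or ""
    if datatype is None:
        return text  # ordinary character data
    if datatype not in DECODERS:
        raise ValueError(f"unsupported datatype: {datatype}")
    # Drop any whitespace or line breaks inside the encoded text.
    return DECODERS[datatype]("".join(text.split()))


record = ET.fromstring(
    f'<record xmlns:dt="{DT_NS}">'
    '<image dt:dt="binary.base64">AAECAw==</image>'
    '<checksum dt:dt="binary.hex">deadbeef</checksum>'
    '<note>plain text</note>'
    '</record>'
)
decoded = {child.tag: decode_element(child) for child in record}
```

With the type carried in the document, the reader needs no prior agreement about which elements hold encoded data.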

Although encoding binary data in Base64 is one solution for handling binary data within XML documents, it's not a very efficient way to represent large amounts of such data: every three bytes become four characters, so the encoded form is about a third larger than the original.
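
The cost can be estimated directly. The Python sketch below computes the Base64 length for the image dimensions given in Listing 1, assuming 16 bits means exactly 2 bytes per pixel with no headers or row padding, and no line breaks in the encoded text.

```python
def base64_length(n_bytes: int) -> int:
    """Characters needed to Base64-encode n_bytes.

    Each group of 3 bytes becomes 4 characters; a final partial group is
    padded with "=" to 4 characters. Line breaks are not counted.
    """
    return 4 * ((n_bytes + 2) // 3)


raw_size = 2048 * 2500 * 2              # width x height x 2 bytes per pixel
encoded_size = base64_length(raw_size)
overhead = encoded_size / raw_size - 1

print(raw_size, encoded_size, f"{overhead:.1%}")
# prints: 10240000 13653336 33.3%
```

For this one image the encoding adds more than 3 MB, and the receiver must decode all of it before the pixels can be used.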

Binary Data as Objects

Another approach is to handle the images as external objects containing binary data and manipulate them through a common object model. A scripting application would be able to access elements of the XML document as well as these external objects and map them together. Essentially, this collection of objects can be considered a compound document.

Compound documents often contain objects that don't actually exist in their assembled form until they reach their destination. None of the current mechanisms for handling binary data in XML can efficiently represent compound documents. Multimedia systems are just one example of systems that must process a number of large binary files, often simultaneously. Borden's Synapse system is based on compound document technology that integrates medical images with voice dictations, drawings, and textual reports. In XML-enabling this system, Borden had to devise a mechanism to represent compound documents. His solution was to treat the binary data as external references within multipart/related MIME compound documents.

Multipart/related MIME type (RFC 2112)

MIME is a standard originally developed for transmitting email messages that were not just ASCII text but included different "types" of information. HTTP servers also use MIME "types" as a wrapper for different types of information sent to a web browser.

The multipart/related MIME type was developed to represent compound documents in a straightforward and efficient manner. Individual parts represent individual streams in the compound document. Individual parts may themselves have the multipart/related MIME type and hence represent sub-storages of the compound document.

The difference between multipart/related and other multipart MIME types, such as multipart/mixed and multipart/form-data, is that each part of a multipart/related message has a logical relationship to the other parts. This is, in essence, a compound document.

This MIME type can be used to incorporate all the different parts of a document (both the XML and the "external" binary data streams, such as pictures, audio, and video) into a single multipart/related data structure that can be transmitted and archived as a single unit, much as the multipart/mixed type is used to carry e-mail attachments.

Multipart/related defines a name for each part (its Content-ID: XXX header) and references these parts using the special identifier "cid:XXX". Typically the first object in a message is in XML format. Other objects may be GIF, JPEG, or other binary formats. The "cid:XXX" link, where "XXX" is the Content-ID of the target part, provides a quasi-internal link from within the XML to an "external" binary data stream. Because the whole package travels together, the links remain valid once the message gets through a firewall. Listing 2 shows the structure of this message.


 Content-Type: multipart/related; boundary=--xxxxxxxxxx;

 --xxxxxxxxxx
 Content-Type: text/xml
 Content-ID: Contents

 <?xml version="1.0" ?>
 <objectDef uid="?">
 <value><i4>1024</i4></value> </property>
 <value><i4>1024</i4></value> </property>
 <value><i4>16</i4></value> </property>
 <value><i4>16</i4></value> </property>
 <property><name>Pixels</name> <value><stream href="cid:Pixels" /></value> </property>

 --xxxxxxxxxx
 Content-Type: application/binary
 Content-Transfer-Encoding: Little-Endian
 Content-ID: Pixels
 Content-Length: 524288

 ....binary data here...
 --xxxxxxxxxx
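
A package along these lines can be assembled and unpacked with Python's standard email package, as sketched below. This is not the Synapse implementation: the eight sample bytes stand in for the pixel stream, the XML is cut down to the single Pixels property, and the library Base64-encodes the binary part by default (a mail-transport convention), whereas a binary-clean channel such as HTTP could carry the raw bytes as in Listing 2.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical payload: 8 bytes standing in for the pixel stream.
pixels = bytes(range(8))

# Root part: XML that describes the object and links to the pixel stream.
xml_text = (
    '<?xml version="1.0"?>'
    '<objectDef><property><name>Pixels</name>'
    '<value><stream href="cid:Pixels"/></value></property></objectDef>'
)
root_part = MIMEText(xml_text, "xml")            # Content-Type: text/xml
root_part.add_header("Content-ID", "<Contents>")

pixel_part = MIMEApplication(pixels)             # application/octet-stream
pixel_part.add_header("Content-ID", "<Pixels>")

# The "type" parameter names the media type of the root part.
package = MIMEMultipart("related", type="text/xml")
package.attach(root_part)
package.attach(pixel_part)
wire_bytes = package.as_bytes()

# Receiving side: index the parts by Content-ID, then follow the cid: link.
received = message_from_bytes(wire_bytes)
parts_by_cid = {
    part["Content-ID"].strip("<>"): part
    for part in received.walk()
    if part["Content-ID"]
}
recovered = parts_by_cid["Pixels"].get_payload(decode=True)
```

The receiver needs nothing outside the package to follow the link, which is the property the multipart/related approach relies on.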

SMIL and Binary Data

SMIL, the Synchronized Multimedia Integration Language, is a good example of an application that uses XML to reference an external binary data type.

In SMIL, 99.9% of the data is not text (it is video, audio, and so on). XML is used only to arrange the data on the screen, that is, to tell the media where to go.

Listing 3 shows the first example in the SMIL spec:


 <smil>
   <head>
     <switch>
       <layout type="text/css">
         [region="r"] { top: 20px; left: 20px }
         #i2 { top: 30px; left: 30px }
       </layout>
       <layout>
         <region id="r" top="20" left="20" />
       </layout>
     </switch>
   </head>
   <body>
     <seq>
       <img region="r" src="" dur="10s" />
       <img id="i2" src="" dur="5s" />
     </seq>
   </body>
 </smil>

Now we convert this to multipart/related, as shown in Listing 4.


 Content-Type: multipart/related; boundary="part separator"

 --part separator
 Content-Type: text/smil
 Content-ID: smil-message

 <smil> ...
   <body>
     <seq>
       <img region="r" src="cid:test" dur="10s" />
       <img id="i2" src="cid:test2" dur="5s" />
     </seq>
   </body>
 </smil>

 --part separator
 Content-Type: image/jpeg
 Content-ID: test

 ... jpeg data goes here

 --part separator
 Content-Type: image/gif
 Content-ID: test2

 ... gif data goes here

 --part separator--
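
On the receiving end, a player has to map each "cid:" reference in the SMIL part to the matching body part. The Python sketch below shows one way that lookup might work. It is an illustration only: the image payloads are placeholder bytes rather than real JPEG or GIF data, the helper names build_package and resolve_cid_links are invented, and the parts carry the registered image/jpeg and image/gif media types.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SMIL = (
    '<smil><body><seq>'
    '<img region="r" src="cid:test" dur="10s"/>'
    '<img id="i2" src="cid:test2" dur="5s"/>'
    '</seq></body></smil>'
)

# Placeholder payloads, not real image data.
IMAGES = [("test", "jpeg", b"\xff\xd8 fake jpeg"), ("test2", "gif", b"GIF89a fake gif")]


def build_package() -> bytes:
    """Assemble the SMIL document and its images into one multipart/related message."""
    package = MIMEMultipart("related")
    smil_part = MIMEText(SMIL, "smil")
    smil_part.add_header("Content-ID", "<smil-message>")
    package.attach(smil_part)
    for cid, subtype, data in IMAGES:
        image = MIMEImage(data, _subtype=subtype)
        image.add_header("Content-ID", f"<{cid}>")
        package.attach(image)
    return package.as_bytes()


def resolve_cid_links(raw: bytes) -> dict:
    """Map each cid: URL in the SMIL part to (media type, payload bytes)."""
    message = message_from_bytes(raw)
    parts = {p["Content-ID"].strip("<>"): p for p in message.walk() if p["Content-ID"]}
    root = ET.fromstring(parts["smil-message"].get_payload(decode=True))
    resolved = {}
    for elem in root.iter():
        src = elem.get("src", "")
        if src.startswith("cid:"):
            target = parts[src[len("cid:"):]]
            resolved[src] = (target.get_content_type(), target.get_payload(decode=True))
    return resolved


links = resolve_cid_links(build_package())
```

The SMIL document itself is unchanged apart from its src values, so the same markup could point at network URLs in one setting and at packaged parts in another.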


For now, XML developers will have to explore creative workarounds for supporting binary data in XML documents. There isn't an ideal solution for large binary data files today. In the future, optimized APIs and standardized coding practices for MIME and XML-based compound documents may begin to make things easier.