XML.com: XML From the Inside Out


DIDL: Packaging Digital Content

May 30, 2001


In this article we detail the reasons for developing a digital packaging standard and describe in depth a package manifest scheme that could address those needs. In doing so, we show how such a scheme effectively decouples the notion of a content item from individual files. We conclude by describing an XML vocabulary, the Digital Item Declaration Language (DIDL), recently released by ISO/MPEG as a first working draft, which, when completed, will provide a standard means of packaging digital content.

The Need for a Raw Content Description Standard

Today's popular Internet applications generally fall short in their ability to transfer raw resource content. The content of a web page, for example, may be defined as the collection of discrete resources (bitmaps, JPEG images, text blocks, and so on) that are aggregated within some predetermined format. The components of the web page may possess attributes and relationships that, while not explicitly part of the final, viewable form, may be critical in generating the displayed result. Information accompanying a JPEG image, for example, could be used to create a photo caption. Information about the relationships among a group of images could be used to position the images on the page. If the web page is generated from a script, information on the sizes of the various images could be used to decide which images to begin downloading first.


Describing raw content as a structured collection of resources in a standard manner requires: (1) a standard and flexible metadata format; (2) a standard way to aggregate multiple resources of various types; and (3) a standard way to express structural relationships within the resource collection. Associating standard-form metadata with a given file allows semantic descriptions and application-specific behavior to be attached directly to the content in that file. Currently, several Internet applications employ ad hoc metadata schemes. In peer-to-peer networks, for example, long file names often serve as crude substitutes for semantic descriptions of file contents. File headers are also used, but header formats are largely designed to document the technical rather than the semantic contents of a file. And despite the widespread use of headers, digital content in the form of a standalone file currently cannot be delivered to an arbitrary client or rendering platform without a significant amount of user intervention. That intervention typically means directing a browser to some web site, selecting a resource URI for download or streaming, and then, if it's a file, saving the downloaded material to a directory. Rendering or viewing the content often involves being told by the client system that a required plug-in or player is either not installed or out of date, leaving the user to search the Web for the right rendering engine or viewer.

The greatest limitation of multimedia header and file formatting schemes is that they are inherently incapable of describing multicomponent collections. XHTML, for example, while serving well as an output format for multicomponent content, is not adequate for describing the raw digital components and their relationships. Because web pages and other display types are composed of many items, a standard, output-agnostic way of aggregating multiple digital components is required.

Finally, the ability to describe relationships formally (this component goes with that one, this component contains that one, etc.) is required to associate things like images with their corresponding descriptive text. It could also be used to describe component structures that would otherwise be difficult to capture with textual metadata.
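The three requirements can be made concrete with a small in-memory model. The following is a minimal Python sketch under stated assumptions, not DIDL itself: the class names (`Resource`, `Relationship`, `Package`), the relationship vocabulary (`"describes"`), and the example.org URIs are all hypothetical. Metadata hangs off each resource, the package aggregates resources, and relationships are typed links between package members.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """A raw component, held by reference (URI) along with its media type."""
    uri: str
    mime_type: str
    metadata: dict[str, str] = field(default_factory=dict)  # (1) per-resource metadata


@dataclass
class Relationship:
    """A typed, directed link between two package members, e.g. 'describes'."""
    subject: str
    predicate: str
    target: str


@dataclass
class Package:
    resources: dict[str, Resource] = field(default_factory=dict)      # (2) aggregation
    relationships: list[Relationship] = field(default_factory=list)  # (3) structure

    def add(self, key: str, resource: Resource) -> None:
        self.resources[key] = resource

    def relate(self, subject: str, predicate: str, target: str) -> None:
        # Only members of this package may take part in a relationship.
        if subject not in self.resources or target not in self.resources:
            raise KeyError(f"unknown package member in {subject!r} -> {target!r}")
        self.relationships.append(Relationship(subject, predicate, target))

    def related(self, key: str, predicate: str) -> list[Resource]:
        """Resources that `key` points to through `predicate`."""
        return [self.resources[r.target] for r in self.relationships
                if r.subject == key and r.predicate == predicate]


# Hypothetical usage: a caption that describes a photo.
pkg = Package()
pkg.add("photo1", Resource("http://example.org/album/beach.jpg", "image/jpeg",
                           {"subject": "Bob and Emily", "place": "Florida"}))
pkg.add("caption1", Resource("http://example.org/album/beach.txt", "text/plain"))
pkg.relate("caption1", "describes", "photo1")
print([r.uri for r in pkg.related("caption1", "describes")])
```

Keeping relationships as separate records, rather than nesting them inside resources, means the same photo can take part in several relationships without being duplicated.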

Case In Point: The Family Album in Cyberspace

Consider the digital family scrapbook. The scrapbook may be composed of digital photos, video, and text documents. The scrapbook designer needs a straightforward way to represent the individual digital components as a single entity, to annotate the components, and to specify the relationships among the components ("this video and these pictures were taken on Bob and Emily's last trip to Florida"). Having a formal annotation scheme would allow other family members to add new annotations without disturbing the original content ("caption this picture"). It would also permit the setting of intermedia anchor points, which would be especially useful for long videos containing sequences of special interest ("here's the part where Bob fell off the boat"). All of the technical information required by the viewing client, such as the media format of each component and the sizes of the binary elements, would need to be included as transparently as possible. Since the collection is likely to be viewed by friends and family on all kinds of computing platforms, a user-transparent way to package multiple format versions of the same content together is also critical for minimizing the user intervention needed to obtain the album ("I need the QuickTime version of this video").
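A user-transparent choice among format versions amounts to matching the versions listed in the package against the media types the client can play. The sketch below assumes each version is described only by a URI and a MIME type; the `Version` class, the `choose_version` function, and the URIs are hypothetical illustrations, not part of any published scheme.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One encoding of a component; every version carries the same content."""
    uri: str
    mime_type: str


def choose_version(versions: list[Version], supported: list[str]) -> Optional[Version]:
    """Return the first packaged version whose media type the client can play.

    `supported` is ordered by client preference, so the client, not the user,
    decides between, say, a QuickTime and an MPEG encoding of the same video.
    """
    for mime in supported:
        for version in versions:
            if version.mime_type == mime:
                return version
    return None  # nothing playable: the viewer would have to fetch a plug-in


# Hypothetical package entry: one boat video, two encodings.
boat_video = [
    Version("http://example.org/album/boat.mov", "video/quicktime"),
    Version("http://example.org/album/boat.mpg", "video/mpeg"),
]
print(choose_version(boat_video, ["video/mpeg"]).uri)  # http://example.org/album/boat.mpg
```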

Another scrapbook need, one that exposes additional packaging requirements, arises with content that requires encryption, identification, or formal rights declarations associated with a specific source component. In the scrapbook example, one might want to associate a particular picture or other component with a formal copyright statement. If one of the pictures were derived from another photo, identifying it as a copy and also identifying the original source would be valuable. Recording exactly what constituted the original content would be critical for keeping the original material inviolate and reconstructable through long-term use and storage.

Perhaps the strongest motivation for digital packages emerges from the distinction between the scrapbook's package manifest and its resources. While it would occasionally be necessary to encapsulate small resources (like thumbnail images) in the manifest itself, most resources would be included in the package by reference. In the digital scrapbook, each component would ideally be accompanied not only by a detailed description of its media type but also by the URI for obtaining a platform-specific browser plug-in or player capable of rendering that media type. This feature would be especially critical for a scrapbook shared by an extended family whose digital components are held in different, fixed archives in geographically far-flung locations. Because the manifest is compact, it could be transmitted and edited quickly without dragging the whole collection along. The content of the scrapbook would thus be defined by the package manifest rather than by the collection's components themselves.
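The split between a compact manifest and referenced resources might look like the following. This Python sketch uses the standard `xml.etree.ElementTree` module; the element and attribute names (`package`, `item`, `component`, `resource`, `ref`, `mimeType`, `player`, `encoding`) and the URIs are hypothetical and are not taken from the DIDL draft. Large media are pointed to by URI, and only a small thumbnail is embedded inline as base64.

```python
import base64
import xml.etree.ElementTree as ET


def build_manifest(thumbnail_png: bytes) -> str:
    """Build a hypothetical scrapbook manifest; element names are illustrative only."""
    root = ET.Element("package")
    item = ET.SubElement(root, "item", id="florida-trip")
    ET.SubElement(item, "description").text = "Bob and Emily's last trip to Florida"

    video = ET.SubElement(item, "component", id="boat-video")
    # The large video is included by reference, together with a (hypothetical)
    # URI from which a suitable player could be obtained.
    ET.SubElement(video, "resource",
                  ref="http://example.org/archive/boat.mov",
                  mimeType="video/quicktime",
                  player="http://example.org/players/quicktime")
    # A small thumbnail is encapsulated directly in the manifest.
    thumb = ET.SubElement(video, "resource", mimeType="image/png", encoding="base64")
    thumb.text = base64.b64encode(thumbnail_png).decode("ascii")
    return ET.tostring(root, encoding="unicode")


# Placeholder bytes (just the PNG signature) stand in for a real thumbnail.
manifest = build_manifest(b"\x89PNG\r\n\x1a\n")
print(manifest)
```

The manifest stays a few hundred bytes no matter how large the referenced video is, which is what makes it cheap to send around and edit.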

Metadata associated with each component and each component relationship would also let the viewer search the package manifest (perhaps using regular expressions) for specific components and thus download or view only a subset of the materials the package references ("Retrieve only the pictures of Bob and Emily when they lived in Ohio").
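A query like the one above could be answered from the manifest alone, without touching the referenced media. This sketch assumes a hypothetical manifest layout (a `description` per component and a `resource` with `ref` and `mimeType` attributes), and the sample descriptions are invented for illustration. The regular expression uses lookaheads so that "Bob", "Emily", and "Ohio" may appear in any order.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical manifest; only metadata and URIs, no media bytes.
MANIFEST = """<package>
  <component id="p1"><description>Bob and Emily, Ohio, 1994</description>
    <resource ref="http://example.org/a/p1.jpg" mimeType="image/jpeg"/></component>
  <component id="p2"><description>Bob falls off the boat, Florida</description>
    <resource ref="http://example.org/a/boat.mov" mimeType="video/quicktime"/></component>
  <component id="p3"><description>Emily and Bob in Columbus, Ohio</description>
    <resource ref="http://example.org/a/p3.jpg" mimeType="image/jpeg"/></component>
</package>"""


def select_refs(manifest: str, description_pattern: str, mime_prefix: str) -> list:
    """Return URIs of resources whose component description matches the regex
    and whose media type starts with `mime_prefix`. Nothing is downloaded."""
    pattern = re.compile(description_pattern)
    refs = []
    for comp in ET.fromstring(manifest).iter("component"):
        if not pattern.search(comp.findtext("description", default="")):
            continue
        for res in comp.iter("resource"):
            if res.get("mimeType", "").startswith(mime_prefix):
                refs.append(res.get("ref"))
    return refs


# "Pictures of Bob and Emily in Ohio": all three words, in any order.
query = r"(?=.*\bBob\b)(?=.*\bEmily\b).*\bOhio\b"
print(select_refs(MANIFEST, query, "image/"))
# ['http://example.org/a/p1.jpg', 'http://example.org/a/p3.jpg']
```

Only the returned URIs would then be fetched, so the boat video never leaves its archive.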

Finally, since a given package manifest would describe only the structural and semantic relationships among the components of the scrapbook collection, in a completely output-agnostic way, formatting for rendered output would be left to the application software or to a transformation or stylesheet. This would allow many differently formatted scrapbooks to be generated from the same package manifest.
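To illustrate output agnosticism, the two functions below render the same hypothetical manifest as an HTML list and as a plain-text index. The manifest layout, its contents, and the function names are assumptions for this sketch. An XSLT stylesheet would be the declarative alternative the text mentions; Python's standard library has no XSLT processor, so plain functions stand in for one here.

```python
import html
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical manifest: structure and metadata only, no layout.
MANIFEST = """<package>
  <component id="p1"><description>Bob and Emily in Ohio</description>
    <resource ref="http://example.org/a/p1.jpg" mimeType="image/jpeg"/></component>
  <component id="p2"><description>Here's the part where Bob fell off the boat</description>
    <resource ref="http://example.org/a/boat.mov" mimeType="video/quicktime"/></component>
</package>"""


def components(manifest: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (description, uri, mime_type) for each component in document order."""
    for comp in ET.fromstring(manifest).iter("component"):
        res = comp.find("resource")
        yield comp.findtext("description", default=""), res.get("ref"), res.get("mimeType")


def render_html(manifest: str) -> str:
    """One presentation: images shown inline, other media offered as links."""
    parts = ["<ul>"]
    for desc, uri, mime in components(manifest):
        desc, uri = html.escape(desc), html.escape(uri)  # escape for HTML output
        if mime.startswith("image/"):
            parts.append(f'<li><img src="{uri}" alt="{desc}"/> {desc}</li>')
        else:
            parts.append(f'<li><a href="{uri}">{desc}</a> ({mime})</li>')
    parts.append("</ul>")
    return "\n".join(parts)


def render_text(manifest: str) -> str:
    """A second presentation of the same manifest: a numbered plain-text index."""
    return "\n".join(f"{n}. {desc} <{uri}>"
                     for n, (desc, uri, _) in enumerate(components(manifest), start=1))


print(render_html(MANIFEST))
print(render_text(MANIFEST))
```

Adding a third presentation (a slide show, say) would mean writing another renderer; the manifest itself would not change.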
