XMP Lowdown

September 22, 2004

The Extensible Metadata Platform (XMP) is a specification describing RDF-based data and storage models for metadata about documents in any format. The specification includes information about embedding XMP in text files such as HTML and SVG/XML; image formats such as JPEG, TIFF, and GIF; and Adobe formats such as Illustrator, Photoshop, and Acrobat files. While its use of the RDF metadata format comes with some caveats, RDF enthusiasts should be pleased to see that the company making the most popular image manipulation tools has made those tools capable of reading and writing RDF metadata about these images.

Where Does It Come From?

Adobe announced XMP in September of 2001, although it had been available in Acrobat 5 since the spring of that year. Adobe's "Adobe applications that support XMP" web page lists 11 products and asserts that "Eventually all Adobe applications will use XMP."

The title page of the XMP Specification calls the document just that: a specification. While the first sentence of the Preface says that XMP "provides a standard format for the creation, processing, and interchange of metadata," XMP itself is not a standard in the sense of being connected to any standards body, and it is written and owned by a single corporation. (When an Adobe representative speaking at a Seybold conference mentioned the date that this "standard" was "submitted," I raised my hand and said, "Submitted to who?" He didn't have much of an answer.)

XMP's basis in RDF, however, provides plenty for developers interested in storing arbitrary metadata in a non-proprietary standard format.

What Are They Working On?

Adobe won't reveal any future plans about XMP, other than that they update the free SDK in conjunction with new product releases. As the quote from the Adobe web page mentioned above demonstrates, they plan to build XMP support into their full product line.

What Else Is There? (Related Standards Landscape)

Members of the Semantic Web community have experimented with standardizing an RDF-based vocabulary for describing pictures, sometimes from their own cameras for their own personal use (1, 2, 3). The IDEAlliance standards group, with their general focus on the publishing industry, has two groups working on metadata for digital media: the PRISM group has developed a standard for magazine and journal metadata that can be expressed using RDF or non-RDF XML. Additionally, the Digital Image Submission Criteria group defines metadata fields that can be expressed as RDF.

The RDF basis, or at least friendliness, of these efforts means that they'll mix and match easily enough with each other and with XMP. If terms from two of these efforts each nearly, but not perfectly, describe a piece of your metadata, you'll want to pick one over the other to keep your metadata consistent. (This shouldn't be much of a problem, given the common core of most of these metadata efforts.) Still, mixing different RDF vocabularies is much easier than mixing different DTDs or schemas. That's part of the point of RDF.

How Does It Work?

The metadata vocabulary offered by XMP is not huge, and is not its greatest strength. You'll probably want to augment it with work from one of the metadata standards mentioned above and perhaps your own custom vocabulary. XMP's greatest strength is the consistent framework it provides for embedding metadata into binary formats not normally amenable to having arbitrary metadata added, particularly highly compressed binary formats such as JPEG and Adobe's Acrobat PDF format.

The XMP specification (in PDF format, naturally) was last updated in January of this year, but shows no version number. It describes the subset of RDF that it uses to store metadata, the XML serialization of that metadata, and the Dublin Core properties and other specialized Adobe namespaced properties handled natively by Adobe products supporting XMP (what the spec calls "schemas," although no actual RDF schemas are provided). It also describes the mechanics of embedding this metadata into various file formats.

To a non-technical user of Adobe products, XMP metadata is just another dialog box to fill out. For example, when using Acrobat Professional 6.0, picking Document Metadata from the Advanced menu displays the Document Metadata dialog box, which offers Description and Advanced choices of metadata display. Selecting Description displays a set of fields that let you enter basic metadata such as the document's Title, Author, and Description. Selecting Advanced displays a tree widget with four default branches: PDF Properties, XMP Core Properties, XMP Media Management Properties, and Dublin Core Properties. (Changing the View setting from Summary to Source displays the metadata as RDF/XML.) Expanding any of these branches shows properties belonging to these categories, including any values filled out on the Description display of the metadata.

Across the bottom of the dialog box are Replace, Load, and Save buttons. Save saves an RDF/XML file (including the extra processing instructions used to delimit a block of XMP metadata as described in the XMP spec) with all the properties defined for the document in the dialog box. The Replace button replaces the metadata with the metadata in an RDF/XML file that you specify. The Load button adds any metadata in a specified XMP file to the metadata already stored with the document.

The following screenshot shows the result of such a load. I had added a value for a goofinessFactor property from a namespace based on my own snee.com domain name to the RDF/XML file to test the addition of arbitrary metadata to the properties that Acrobat Professional already knew about. Acrobat Professional handled it just fine, creating a new main branch in its tree representation of metadata for the new namespace. (The original PDF file was created using OpenOffice, which currently has no support for XMP metadata — wouldn't it be great if it did? — so the other properties were inferred by Acrobat Professional.)

[Screenshot: Acrobat Professional Document Metadata dialog box]

One slightly annoying aspect of Acrobat Professional's handling of the metadata, as shown in the illustration, is the storing of several properties in containers even when it's unnecessary. For example, after adding the following XML to an exported XMP file and loading that metadata back into Acrobat Professional,

 <rdf:Description rdf:about='uuid:f7726098-03f5-43dd-98ba-53b74bed5d08'
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:creator>Bob DuCharme</dc:creator>
 </rdf:Description>

saving to a new XMP file right away showed that the same data was now stored like this:

 <rdf:Description rdf:about='uuid:f7726098-03f5-43dd-98ba-53b74bed5d08'
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:format>application/pdf</dc:format>
  <dc:creator>
   <rdf:Seq>
    <rdf:li>Bob DuCharme</rdf:li>
   </rdf:Seq>
  </dc:creator>
 </rdf:Description>

The simple structure of the metadata that I added was changed, although I'd done nothing to indicate that I wanted my single creator predicate stored as part of an ordered sequence. I'm guessing that something about Acrobat's internal data structures assumes that there may be more than one author, so even if there's only one, it puts it in an rdf:Seq container whether you like it or not. As you can see from the plus signs hanging off the main branches in the illustration above, it makes the same assumptions for Dublin Core title properties and XMP Title and Authors properties.

This doesn't just happen with metadata that it expects, but with arbitrary metadata as well. While this property/value pair made the XMP-to-Acrobat Professional-to-XMP round trip just fine,

<sn:buzzword>dark</sn:buzzword>

these two properties

<sn:buzzword>dark</sn:buzzword>
<sn:buzzword>edgy</sn:buzzword>

got converted to this in the round trip:

<sn:buzzword>
 <rdf:Seq>
  <rdf:li>dark</rdf:li>
  <rdf:li>edgy</rdf:li>
 </rdf:Seq>
</sn:buzzword>

The moral of the story is that you shouldn't make any assumptions about the structure of your metadata as stored by an XMP-enabled application. Before making this capability part of a production system, import and export some sample data so that unexpected changes to metadata structures don't cause problems elsewhere in your workflow.
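
One way to guard against this kind of restructuring is to have the code that reads the metadata accept either form. The following is a minimal Python sketch of that idea, not anything provided by Adobe's SDK: the property_values function and the http://example.org/ns/sn# namespace URI are hypothetical names made up for illustration, and the sketch assumes that the values are plain literals, either directly inside the property element or inside rdf:li children of an rdf:Seq, rdf:Bag, or rdf:Alt container.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SN = "http://example.org/ns/sn#"  # hypothetical namespace URI, for illustration only

# Fully qualified tag names of the three RDF container elements.
CONTAINERS = {f"{{{RDF}}}{name}" for name in ("Seq", "Bag", "Alt")}


def property_values(description, prop_tag):
    """Return the values of one property as a flat list of strings,
    whether they are stored as repeated simple properties or as
    rdf:li items inside an RDF container."""
    values = []
    for prop in description.findall(prop_tag):
        container = next((c for c in prop if c.tag in CONTAINERS), None)
        if container is None:
            # Simple form: <sn:buzzword>dark</sn:buzzword>
            text = (prop.text or "").strip()
            if text:
                values.append(text)
        else:
            # Container form: <rdf:Seq><rdf:li>dark</rdf:li>...</rdf:Seq>
            for item in container.findall(f"{{{RDF}}}li"):
                values.append((item.text or "").strip())
    return values


# The two shapes of the same data shown above.
AS_WRITTEN = f"""<rdf:Description xmlns:rdf="{RDF}" xmlns:sn="{SN}">
  <sn:buzzword>dark</sn:buzzword>
  <sn:buzzword>edgy</sn:buzzword>
</rdf:Description>"""

ROUND_TRIPPED = f"""<rdf:Description xmlns:rdf="{RDF}" xmlns:sn="{SN}">
  <sn:buzzword>
    <rdf:Seq>
      <rdf:li>dark</rdf:li>
      <rdf:li>edgy</rdf:li>
    </rdf:Seq>
  </sn:buzzword>
</rdf:Description>"""

BUZZWORD = f"{{{SN}}}buzzword"
print(property_values(ET.fromstring(AS_WRITTEN), BUZZWORD))     # ['dark', 'edgy']
print(property_values(ET.fromstring(ROUND_TRIPPED), BUZZWORD))  # ['dark', 'edgy']
```

Both calls return the same list, so downstream code sees the same values no matter which structure the application chose to write. Structured values, such as properties whose values are themselves resources, would need more handling than this sketch provides.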

The XMP spec lists several RDF features that it doesn't support, such as the rdf:ID and rdf:bagID attributes and rdf:parseType settings of "Literal."

Who's Using It?

Obviously, Adobe, and that's a lot of popular commercial products right there. For now, the list of specific non-Adobe products supporting XMP is much shorter than the list of vendors who've announced eventual support without specifying product names. The product list includes IBM's NICA Digital Asset Management System, Extensis Portfolio, IXIASOFT's TEXTML Server, iView's MediaPro, and various plugins from Pound Hill Software. In June, Adobe and the International Press Telecommunications Council announced extended use of XMP in IPTC metadata.

Issues & Challenges

For now, the use of XMP means either depending on commercial vendor tools or being comfortable with C++ so that you can use Adobe's SDK, but this is changing. Activity in the XMP User-to-User forum shows that open source Java tools are on the way, which will make it much easier to incorporate the use of XMP into production workflows — for example, to extract the metadata from a batch of images and then load that data into a database.
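
Until such tools arrive, a rough extraction is possible without any SDK, because the XMP spec delimits each block of metadata with xpacket processing instructions. The Python sketch below scans a file's bytes for those delimiters and returns whatever lies between them. It assumes that the packet is stored uncompressed, contiguous, and in an ASCII-compatible encoding such as UTF-8, which will not hold for every file; the function names are hypothetical, and this is a brute-force scan, not a parser for any particular file format.

```python
import re

# Matches an opening <?xpacket begin=... ?> instruction, the packet body,
# and the closing <?xpacket end=... ?> instruction.
PACKET_RE = re.compile(
    rb"<\?xpacket\s+begin=.*?\?>(.*?)<\?xpacket\s+end=.*?\?>",
    re.DOTALL,
)


def extract_xmp_packets(data):
    """Return the body of every XMP packet found in a bytes object."""
    return [match.group(1).strip() for match in PACKET_RE.finditer(data)]


def extract_xmp_from_file(path):
    """Read a file in binary mode and return any XMP packet bodies in it."""
    with open(path, "rb") as f:
        return extract_xmp_packets(f.read())


# A stand-in for a binary file with one embedded packet.
sample = (
    b"\xff\xd8 binary data "
    b"<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>\n"
    b"<x:xmpmeta xmlns:x='adobe:ns:meta/'></x:xmpmeta>\n"
    b"<?xpacket end='w'?>"
    b" more binary data \xff\xd9"
)
packets = extract_xmp_packets(sample)
print(len(packets))  # 1
```

Each returned packet body is RDF/XML, so it can be handed to an XML or RDF parser and the resulting property values loaded into a database.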

Adobe looks very strongly committed to XMP. Their decision to make it an RDF-based format, the high profile that Adobe products have in the commercial publishing world, and big print media publishers' growing interest in efficiently tracking their metadata are three factors that combine to make XMP a golden opportunity for the business world to appreciate the value of RDF. This business community doesn't care much about the Semantic Web, and its use of XMP (and hence RDF) will be behind firewalls, but an increased use of XMP in PDF, JPEG, and other formats will eventually mean more files with RDF-based metadata sitting on publicly accessible web servers, and hence a greater extension of the Semantic Web.

I like to picture an art director deciding which pictures to put in both People Magazine and Teen People. If the tools she uses to track these pictures store their metadata using RDF — even if she's never heard of the Semantic Web and all of the picture files are behind a Time Inc. firewall — then that's still great news for the RDF community, because it's another example of RDF moving from experiments by the academic and web-geek communities to production systems at major corporations.