XMP Lowdown

by Bob DuCharme
September 22, 2004

The Extensible Metadata Platform (XMP) is a specification describing RDF-based data and storage models for metadata about documents in any format. The specification includes information about embedding XMP in text files such as HTML and SVG/XML; image formats such as JPEG, TIFF, and GIF; and Adobe formats such as Illustrator, Photoshop, and Acrobat files. While its use of the RDF metadata format comes with some caveats, RDF enthusiasts should be pleased to see that the company making the most popular image manipulation tools has made those tools capable of reading and writing RDF metadata about these images.

Where Does It Come From?

Adobe announced XMP in September of 2001, although it had been available in Acrobat 5 since the spring of that year. Adobe's web page listing the applications that support XMP names 11 products and asserts that "Eventually all Adobe applications will use XMP."

The XMP Specification's title page calls the document just that: a specification. While the first sentence of the Preface says that XMP "provides a standard format for the creation, processing, and interchange of metadata," XMP itself is not a standard in the sense of being connected to any standards body; it is written and owned by a single corporation. (When an Adobe representative speaking at a Seybold conference mentioned the date that this "standard" was "submitted," I raised my hand and said, "Submitted to who?" He didn't have much of an answer.)

XMP's basis in RDF, however, offers plenty to developers interested in storing arbitrary metadata in a non-proprietary standard format.

What Are They Working On?

Adobe won't reveal any future plans about XMP, other than that they update the free SDK in conjunction with new product releases. As the quote from the Adobe web page mentioned above demonstrates, they plan to build XMP support into their full product line.

What Else Is There? (Related Standards Landscape)

Members of the Semantic Web community have experimented with standardizing an RDF-based vocabulary for describing pictures, sometimes from their own cameras for their own personal use (1, 2, 3). The IDEAlliance standards group, with their general focus on the publishing industry, has two groups working on metadata for digital media: the PRISM group has developed a standard for magazine and journal metadata that can be expressed using RDF or non-RDF XML. Additionally, the Digital Image Submission Criteria group defines metadata fields that can be expressed as RDF.

The RDF basis, or at least friendliness, of these efforts means that they'll mix and match easily enough with each other and with XMP. If two terms from two of these efforts each nearly, but not perfectly, describe a piece of your metadata, you'll want to pick one over the other to keep your metadata consistent. (This shouldn't be much of a problem, given the common core of most of these metadata efforts.) Still, mixing different RDF vocabularies is much easier than mixing different DTDs or schemas. That's part of the point of RDF.

How Does It Work?

The metadata vocabulary offered by XMP is not huge, and is not its greatest strength. You'll probably want to augment it with work from one of the metadata standards mentioned above and perhaps your own custom vocabulary. XMP's greatest strength is the consistent framework it provides for embedding metadata into binary formats not normally amenable to having arbitrary metadata added, particularly highly compressed binary formats such as JPEG and Adobe's Acrobat PDF format.

The XMP specification (in PDF format, naturally) was last updated in January of this year, but shows no version number. It describes the subset of RDF that it uses to store metadata, the XML serialization of that metadata, and the Dublin Core properties and other specialized Adobe namespaced properties handled natively by Adobe products supporting XMP (what the spec calls "schemas," although no actual RDF schemas are provided). It also describes the mechanics of embedding this metadata into various file formats.

To a non-technical user of Adobe products, XMP metadata is just another dialog box to fill out. For example, when using Acrobat Professional 6.0, picking Document Metadata from the Advanced menu displays the Document Metadata dialog box, which offers Description and Advanced choices of metadata display. Selecting Description displays a set of fields that let you enter basic metadata such as the document's Title, Author, and Description. Selecting Advanced displays a tree widget with four default branches: PDF Properties, XMP Core Properties, XMP Media Management Properties, and Dublin Core Properties. (Changing the View setting from Summary to Source displays the metadata as RDF/XML.) Expanding any of these branches shows properties belonging to these categories, including any values filled out on the Description display of the metadata.

Across the bottom of the dialog box are Replace, Load, and Save buttons. Save saves an RDF/XML file (including the extra processing instructions used to delimit a block of XMP metadata as described in the XMP spec) with all the properties defined for the document in the dialog box. The Replace button replaces the metadata with the metadata in an RDF/XML file that you specify. The Load button adds any metadata in a specified XMP file to the metadata already stored with the document.
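
Because those processing instructions mark where a packet begins and ends, a program can find XMP in a file without understanding the rest of the file's format. The following is a minimal Python sketch of that kind of byte scan, assuming a UTF-8 packet. It is a naive illustration rather than Adobe's SDK, and the function name is hypothetical. A format-aware reader would locate the packet through the file format's own structures instead of searching the raw bytes.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> and <?xpacket end="r"|"w"?>
# processing instructions. This pattern assumes the packet is UTF-8 encoded.
XPACKET_RE = re.compile(
    rb"<\?xpacket\s+begin=.*?\?>(?P<body>.*?)<\?xpacket\s+end=['\"][rw]['\"]\s*\?>",
    re.DOTALL,
)


def extract_xmp_packets(data: bytes) -> list[str]:
    """Return the XML found between each xpacket header and trailer in data."""
    return [m.group("body").decode("utf-8").strip() for m in XPACKET_RE.finditer(data)]


# Example use on a file's raw bytes (the file name is hypothetical):
# with open("photo.jpg", "rb") as f:
#     for packet in extract_xmp_packets(f.read()):
#         print(packet)
```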

The following screenshot shows the result of such a load. I had added a value for a goofinessFactor property from a namespace based on my own snee.com domain name to the RDF/XML file to test the addition of arbitrary metadata to the properties that Acrobat Professional already knew about. Acrobat Professional handled it just fine, creating a new main branch in its tree representation of metadata for the new namespace. (The original PDF file was created using OpenOffice, which currently has no support for XMP metadata — wouldn't it be great if it did? — so the other properties were inferred by Acrobat Professional.)

Acrobat Professional Document Metadata screen shot
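
To repeat this experiment, you need an RDF/XML file that includes a property from your own namespace. The following Python sketch builds one with the standard library's ElementTree and wraps it in xpacket processing instructions. The namespace URI `http://example.com/ns/sn#` is a hypothetical placeholder, because the article doesn't give the snee.com URI it used. The function name is also hypothetical, and the output hasn't been checked against Acrobat Professional.

```python
import xml.etree.ElementTree as ET

X = "adobe:ns:meta/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
SN = "http://example.com/ns/sn#"  # hypothetical stand-in for a custom namespace

# Make ElementTree serialize these namespaces with their usual prefixes.
for prefix, uri in (("x", X), ("rdf", RDF), ("dc", DC), ("sn", SN)):
    ET.register_namespace(prefix, uri)


def build_xmp(about: str, creator: str, goofiness: str) -> str:
    """Build an XMP packet holding one Dublin Core and one custom property."""
    meta = ET.Element(f"{{{X}}}xmpmeta")
    rdf = ET.SubElement(meta, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    ET.SubElement(desc, f"{{{DC}}}creator").text = creator
    ET.SubElement(desc, f"{{{SN}}}goofinessFactor").text = goofiness
    body = ET.tostring(meta, encoding="unicode")
    # Wrap the XML in the processing instructions that delimit an XMP packet.
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
            + body + '\n<?xpacket end="w"?>')
```

If you save the returned string to a file, you have something to try with the Load button.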

One slightly annoying aspect of Acrobat Professional's handling of the metadata, as shown in the illustration, is the storing of several properties in containers even when it's unnecessary. For example, after adding the following XML to an exported XMP file and loading that metadata back into Acrobat Professional,

 <rdf:Description rdf:about='uuid:f7726098-03f5-43dd-98ba-53b74bed5d08'
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:creator>Bob DuCharme</dc:creator>
 </rdf:Description>

saving to a new XMP file right away showed that the same data was now stored like this:

 <rdf:Description rdf:about='uuid:f7726098-03f5-43dd-98ba-53b74bed5d08'
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:format>application/pdf</dc:format>
  <dc:creator>
   <rdf:Seq>
    <rdf:li>Bob DuCharme</rdf:li>
   </rdf:Seq>
  </dc:creator>
 </rdf:Description>

The simple structure of the metadata that I added was changed, although I'd done nothing to indicate that I wanted my single creator predicate stored as part of an ordered sequence. I'm guessing that something about Acrobat's internal data structures assumes that there may be more than one author, so even if there's only one, it puts it in an rdf:Seq container whether you like it or not. As you can see from the plus signs hanging off the main branches in the illustration above, it makes the same assumptions for Dublin Core title properties and XMP Title and Authors properties.
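
A property with a single value may therefore come back either as a plain element or wrapped in a container, so code that reads XMP should accept both forms. The following is a minimal Python sketch using only the standard library. It returns a property's values as a list, whether they are stored as an attribute of rdf:Description, as simple elements, or as items inside rdf:Seq, rdf:Bag, or rdf:Alt. The function name is hypothetical, and the sketch ignores nested structures and the language tags that rdf:Alt items usually carry.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTAINERS = {f"{{{RDF}}}Seq", f"{{{RDF}}}Bag", f"{{{RDF}}}Alt"}


def property_values(description: ET.Element, prop: str) -> list[str]:
    """Return all values of prop (in Clark notation, e.g. '{namespace-uri}creator')
    from an rdf:Description element, however they are stored."""
    values = []
    # Simple values may be written as attributes of rdf:Description.
    if description.get(prop) is not None:
        values.append(description.get(prop))
    for el in description.findall(prop):
        container = next((child for child in el if child.tag in CONTAINERS), None)
        if container is None:
            # A plain element such as <dc:creator>Bob DuCharme</dc:creator>.
            values.append((el.text or "").strip())
        else:
            # The same value(s) wrapped as rdf:li items in a container.
            values.extend((li.text or "").strip() for li in container.findall(f"{{{RDF}}}li"))
    return values
```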

This doesn't just happen with metadata that it expects, but with arbitrary metadata as well. While this property/value pair made the XMP-to-Acrobat Professional-to-XMP round trip just fine,

<sn:buzzword>dark</sn:buzzword>

these two properties

<sn:buzzword>dark</sn:buzzword>
<sn:buzzword>edgy</sn:buzzword>

got converted to this in the round trip:

<sn:buzzword>
 <rdf:Seq>
  <rdf:li>dark</rdf:li>
  <rdf:li>edgy</rdf:li>
 </rdf:Seq>
</sn:buzzword>

The moral of the story is that you shouldn't make any assumptions about the structure of your metadata as stored by an XMP-enabled application. Before making this capability part of a production system, make sure to import and export some sample data to prevent future surprises about changes to metadata structures from causing problems elsewhere in your workflow.
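
One way to automate that check is to compare the shape of each property before and after a round trip. The following Python sketch uses hypothetical function names and only the standard library. It counts each property according to whether its value is a plain literal or sits inside an RDF element such as rdf:Seq, then reports which shapes disappeared and which appeared. Run on the buzzword example above, it reports two literal sn:buzzword values lost and one rdf:Seq gained.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def shape(rdf_xml: str) -> Counter:
    """Count each property under rdf:Description as (tag, form), where form is
    'literal' or the local name of its first rdf: child (e.g. Seq, Bag, Alt)."""
    shapes = Counter()
    for desc in ET.fromstring(rdf_xml).iter(f"{{{RDF}}}Description"):
        for prop in desc:
            inner = next((c.tag for c in prop if c.tag.startswith(f"{{{RDF}}}")), None)
            form = inner.split("}", 1)[1] if inner else "literal"
            shapes[(prop.tag, form)] += 1
    return shapes


def round_trip_changes(exported: str, reimported: str) -> dict:
    """Report property shapes found only before ('lost') or only after ('gained')."""
    before, after = shape(exported), shape(reimported)
    return {"lost": before - after, "gained": after - before}
```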

The XMP spec lists several RDF features that it doesn't support, such as the rdf:ID and rdf:bagID attributes and rdf:parseType settings of "Literal."
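
If you bring RDF/XML from other tools into an XMP workflow, checking for these features beforehand can save you from a confusing import. This Python sketch flags only the three features named above. The spec's full list of restrictions is the authority, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def unsupported_features(rdf_xml: str) -> list[str]:
    """List uses of rdf:ID, rdf:bagID, and rdf:parseType="Literal", which the
    XMP spec says it does not support."""
    problems = []
    for el in ET.fromstring(rdf_xml).iter():
        for attr in ("ID", "bagID"):
            if el.get(f"{{{RDF}}}{attr}") is not None:
                problems.append(f"rdf:{attr} on {el.tag}")
        if el.get(f"{{{RDF}}}parseType") == "Literal":
            problems.append(f'rdf:parseType="Literal" on {el.tag}')
    return problems
```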

Who's Using It?

Obviously, Adobe, and that's a lot of popular commercial products right there. For now, the list of specific non-Adobe products supporting XMP is much shorter than the list of vendors who've announced eventual support without specifying product names. The product list includes IBM's NICA Digital Asset Management System, Extensis Portfolio, IXIASOFT's TEXTML Server, iView's MediaPro, and various plugins from Pound Hill Software. In June, Adobe and the International Press Telecommunications Council announced extended use of XMP in IPTC metadata.

Issues & Challenges

For now, the use of XMP means either depending on commercial vendor tools or being comfortable with C++ so that you can use Adobe's SDK, but this is changing. Activity in the XMP User-to-User forum shows that open source Java tools are on the way, which will make it much easier to incorporate the use of XMP into production workflows — for example, to extract the metadata from a batch of images and then load that data into a database.
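
The batch case isn't hard to approximate in the meantime. The following Python sketch is an illustration built on several assumptions, not a substitute for a real XMP library. It scans each file in a folder for a UTF-8 XMP packet, pulls Dublin Core creator and title values (with or without rdf:Seq or rdf:Alt containers), and stores them in an SQLite table. The table layout and function names are hypothetical. The sketch makes no attempt to handle malformed packets, packets in other encodings, or metadata stored elsewhere in a file.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# A naive UTF-8 packet scan: find XMP by its delimiters instead of parsing the image format.
XPACKET_RE = re.compile(rb"<\?xpacket\s+begin=.*?\?>(.*?)<\?xpacket\s+end=", re.DOTALL)


def dc_values(root: ET.Element, name: str) -> list[str]:
    """Collect values of a Dublin Core property, with or without an rdf container."""
    values = []
    for prop in root.iter(f"{{{DC}}}{name}"):
        items = list(prop.iter(f"{{{RDF}}}li")) or [prop]
        values.extend((item.text or "").strip() for item in items)
    return values


def load_directory(folder: Path, conn: sqlite3.Connection,
                   properties=("creator", "title"), pattern="*.jpg") -> int:
    """Store selected Dublin Core values from each file's XMP packet; return rows inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS metadata (file TEXT, property TEXT, value TEXT)")
    inserted = 0
    for path in sorted(folder.glob(pattern)):
        match = XPACKET_RE.search(path.read_bytes())
        if match is None:
            continue  # no XMP packet in this file
        root = ET.fromstring(match.group(1).decode("utf-8").strip())
        for name in properties:
            for value in dc_values(root, name):
                conn.execute("INSERT INTO metadata VALUES (?, ?, ?)", (path.name, name, value))
                inserted += 1
    conn.commit()
    return inserted
```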

Adobe looks very strongly committed to XMP. Their decision to make it an RDF-based format, the high profile that Adobe products have in the commercial publishing world, and big print media publishers' growing interest in efficiently tracking their metadata are three factors that combine to make XMP a golden opportunity for the business world to appreciate the value of RDF. This business community doesn't care much about the Semantic Web, and its use of XMP (and hence RDF) will be behind firewalls, but an increased use of XMP in PDF, JPEG, and other formats will eventually mean more files with RDF-based metadata sitting on publicly accessible web servers, and hence a greater extension of the Semantic Web.

I like to picture an art director deciding which pictures to put in both People Magazine and Teen People. If the tools she uses to track these pictures store their metadata using RDF — even if she's never heard of the Semantic Web and all of the picture files are behind a Time Inc. firewall — then that's still great news for the RDF community, because it's another example of RDF moving from experiments by the academic and web-geek communities to production systems at major corporations.


Comments

  • PHP Metadata toolkit
    2006-03-25 07:29:06 Paul Freeman

    XMP is a bit of a pain, and I'm afraid that the PHP Metadata toolkit has been broken by more recent releases.


    The PHP Metadata toolkit works well where the XMP was injected by Photoshop CS and previous versions. However in CS2 which uses more recent versions of the XMP toolkit, Adobe has decided on a completely different syntax for XMP. This breaks the PHP Metadata toolkit which makes assumptions (reasonable ones I have to say) about the RDF serialisation which is being used.


    I'm not an RDF expert, but it looks to me as though the new format strays further from the spirit of RDF. It's an odd-looking serialisation, with nearly every datum recorded as an attribute of the top-level rdf tag.


    The PHP Metadata toolkit still has uses (it can still pull out JPEG thumbnails for example) but it is not being actively developed.


    I'm going to try a couple of PHP based RDF parsers on the new format and see if they can handle this serialisation, but I'm not too hopeful.


    Paul AT architek DOT co DOT uk

  • Search tools?
    2006-03-22 11:11:05 creative-zen

    Has anyone seen any search tools for searching XMP?

  • Perl implementation
    2006-02-28 00:51:45 martijn@foodfight.org

    The Perl module 'Image::ExifTool' handles XMP metadata in loads of formats, even videos, MP3s, and other non-images.

    • Perl implementation
      2006-02-28 04:31:58 Bob DuCharme

      Excellent! This is great news, and I look forward to playing with it.

  • PHP JPEG Metadata Toolkit
    2006-01-27 14:40:26 trefrog

    I don't know how well this toolkit works, but I'm definitely going to try it out. It has functions for working with embedded Dublin Core metadata in JPEG files, which is exactly what I was looking for.


    http://www.ozhiker.com/electronics/pjmt/

  • No Tools on the horizon...
    2005-05-21 13:30:44 s.frank

    The article is from sometime last year, and I have searched the web quite desperately for the open source XMP tools it mentions (especially a Java SDK to read, write, and embed XMP data in different file formats), without any success. In the user forum, everybody is looking for tools, but no one seems to have anything on hand; even Adobe only offers a C-based SDK. (This is a shame, as nearly all RDF toolkits are Java-based.)


    I really wonder if XMP is dead and/or limited to Adobe, or if it still has a chance to get off the ground...

    • No Tools on the horizon...
      2005-11-17 13:36:34 pdresslar

      Any new updates on this? Not much going on with XMP as of now, at least according to Google. Pity, as it is attractive to a few large firms I know of.

    • No Tools on the horizon...
      2005-08-17 09:01:54 Alex55

      Brightech's MediaBeacon R3volution is a great digital asset manager that is Java based.


      It has taken full advantage of XMP making it easier to add and work with metadata.

    • No Tools on the horizon...
      2005-05-21 15:31:30 Bob DuCharme

      Yeah, Adobe doesn't seem to understand that they need to provide at least some Java tools or a Python wrapper to their C tools before any kind of grass roots popularity of XMP gets moving. I've heard that some people in Adobe do get this, but that they're losing the internal battles over the best way to promote XMP. It's too bad.

  • exploring XMP on publicly accessible JPEGs
    2004-11-20 07:28:56 THXmil138

    "an increased use of XMP in PDF, JPEG, and other formats will eventually mean more files with RDF-based metadata sitting on publicly accessible web servers,"


    Is there currently any tool that lets you view the XMP embedded in a JPEG using a common web browser?
    Or any way of reading it by pulling only the file's headers from a normal HTTP server?

  • author (mis)behavior...
    2004-09-28 09:05:13 Eric Johnson

    Of course, we can't get most webpage authors to populate a TITLE tag (much less META tags), but this still sounds like a step forward -- especially if you can set default values (like author name, last edited, etc).


    As I think the author alluded to, this will probably be most useful for content generated by professionals where workflow rules enforce metadata application.


    Again, it's still a win. Just a small one for now.