What Is Atom

October 26, 2005

Ben Hammersley

Atom
The Atom Syndication Format is the next generation of XML-based file formats, designed to allow information--the contents of web pages, for example--to be syndicated between applications. Like RSS before it, Atom places the content and metadata of an internet resource into a machine-parsable format, perfect for displaying, filtering, remixing, and archiving.

In This Article:

  1. The Preservation of Metadata
  2. Constructs
  3. What's to Come?

This year, it seems, marked a turning point in the world of Syndication Formats. The collection of formats that started it all, RSS, has reached out of the tech world and into the mainstream. It's rare, nowadays, to find a news site or weblog that doesn't offer some flavor of feed. RSS is supported within Apple's Safari browser, within the next version of Windows, and within an ever-growing mass of applications, both desktop and web-based. Its growth has been remarkable.

But as a technology grows, its shortcomings become more apparent. While the different versions of RSS are good for various applications--with RSS 2.0 very useful for simple syndication applications and ad hoc hacking, and RSS 1.0 the most commonly deployed version of the complex Semantic Web technology, RDF--neither format was perfect. RSS 2.0 is too loosely defined, and RSS 1.0, conversely, too complicated. And so, over the past three years, a volunteer development team has been building a format called Atom, which provides a formally-structured, and well-documented, system solely for the syndication of entire news articles and the like, as well as their respective payloads of metadata.

One of the key differences between the development of RSS and the development of Atom is that Atom's whole design process is held out in the open, on the Atom-Syntax mailing list and on the Atom wiki. The wiki is a great place to find the latest developments, issues, ideas, and pointers to the latest specification documents. It is well worth exploring, if you are interested in the history of the specification, and want to see why features are as they are.

Now, though, the specification is formalizing itself. On August 23, 2005 the Atom Syndication Format became a proposed standard at the Internet Engineering Task Force (IETF), after it was submitted by the AtomPub Working Group. It is this version that this article talks about, and which you should work with from now on. Older implementations might use versions 0.3 or 0.4 of the Atom format. They're out of date now.

So what are the differences between Atom and RSS? Apart from the process used to build the specification, and the rigor of the documentation, there are two main substantive changes. These are the Preservation of Metadata, and the concept of Constructs.

The Preservation of Metadata

The key issue when syndicating data is to make sure that you don't lose any information in the process. Apart from the document's content itself, we're also interested in preserving the fundamental metadata about the document too, namely:

  1. What it is called
  2. Who created it
  3. When it was created
  4. Where it is

We can know all of these things automatically, and should really keep them, but the different versions of RSS do not preserve this data by default. RSS 2.0, for example, doesn't require a date, an author, or a URI at all.

Atom, on the other hand, is specifically designed to never lose any data. To see this, take a look at this example of an Atom feed. This, as with all the code examples in this article, is taken from the official developer's guide:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

   <title>Example Feed</title>
   <link href="http://example.org/"/>
   <updated>2003-12-13T18:30:02Z</updated>
   <author>
     <name>John Doe</name>
   </author>
   <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

   <entry>
     <title>Atom-Powered Robots Run Amok</title>
     <link href="http://example.org/2003/12/13/atom03"/>
     <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2003-12-13T18:30:02Z</updated>
     <summary>Some text.</summary>
   </entry>

</feed>

The Fundamental Changes

Although those interested in implementing the Atom Syndication Format within their own software are directed to read the original specification document, rather than this article, there are some technical things to note from this feed listing. First things first: an Atom feed is XML, naturally, and so must follow all the usual well-formedness rules that implies. All the elements must be in the http://www.w3.org/2005/Atom namespace, and their text is considered to be plain text by default--specifically, entity-encoded HTML will be treated as plain text unless the element declares otherwise. Dates must be in RFC 3339 format too. OK? Right, let's look at the feed.

As you can see, a feed consists of some metadata about the feed, followed by the entries themselves: one here, though a feed may carry any number. This metadata chunk contains, happily, all of the data we found missing from the default RSS 2.0 feeds. The id element provides the "where," giving the feed's URI. The title provides a "what," giving the title of the feed. The updated element gives the "when," with an obligation to say when the feed was last changed. The author element says "who" created the feed, and link provides the "how"--giving a link to an HTML version of the resource the feed represents.

This section of an Atom feed can also contain elements detailing categories, intellectual property rights, contributors' details, a feed's logo, and more.
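To make the mapping concrete, here is a minimal sketch in Python that reads the feed-level metadata from the example feed using the standard library's xml.etree.ElementTree. The function name feed_metadata and the what/who/when/where/how keys are illustrative choices for this article, not part of any Atom library.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM_NS}

# The example feed, trimmed to its feed-level metadata.
FEED_XML = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author>
    <name>John Doe</name>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
</feed>"""


def feed_metadata(xml_text):
    """Map feed-level elements onto the what/who/when/where/how questions.

    Hypothetical helper: the dictionary keys are this sketch's own naming.
    """
    # Parse from bytes so the encoding declaration in the prolog is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "{%s}feed" % ATOM_NS:
        raise ValueError("root element is not an Atom 1.0 feed")
    link = root.find("atom:link", NS)
    return {
        "what": root.findtext("atom:title", namespaces=NS),
        "who": root.findtext("atom:author/atom:name", namespaces=NS),
        "when": root.findtext("atom:updated", namespaces=NS),
        "where": root.findtext("atom:id", namespaces=NS),
        "how": link.get("href") if link is not None else None,
    }


metadata = feed_metadata(FEED_XML)
for question, answer in metadata.items():
    print(question, "->", answer)
```

Every lookup goes through the Atom namespace, so elements from other vocabularies that happen to share a name are ignored.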

The Entry Section

An Atom feed contains any number of entry sections. These are just like an RSS feed's item sections. The entry section, just like the feed's main metadata section, has the obligatory id, title, and updated, plus author and link. (An entry with no author of its own, like the one in the example above, takes its author from the feed.) It wouldn't be much use without the content, and it's highly recommended to have the summary, "a short summary, abstract, or excerpt of the entry."
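As a rough illustration, the following Python sketch walks the entries of the example feed and reports which of the elements named above each one lacks. The names ENTRY_ELEMENTS and missing_elements are invented for this sketch, and the check is a presence test only, not a validator for the specification's rules.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Element names discussed in this section (this sketch's own list).
ENTRY_ELEMENTS = ("id", "title", "updated", "author", "link", "content", "summary")

FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <author><name>John Doe</name></author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>"""


def missing_elements(entry):
    """Return the names from ENTRY_ELEMENTS that are not direct children of entry."""
    return [name for name in ENTRY_ELEMENTS
            if entry.find("atom:" + name, NS) is None]


root = ET.fromstring(FEED_XML)
report = {
    entry.findtext("atom:id", namespaces=NS): missing_elements(entry)
    for entry in root.findall("atom:entry", NS)
}
for entry_id, missing in report.items():
    print(entry_id, "lacks:", ", ".join(missing) or "nothing")
```

The example entry carries no author or content of its own: it relies on the feed-level author and offers a summary plus a link to the full article.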

In addition to those core features, an Atom entry can contain categories and rights information, and an interesting (and unique to Atom) element called source. The source element allows for metadata about an entry's parent feed to be preserved if that entry is copied from one feed to a new one. An example of this might look like:

<source>
  <id>http://example.org/</id>
  <title>Fourty-Two</title>
  <updated>2003-12-13T18:30:02Z</updated>
  <rights>© 2005 Example, Inc.</rights>
</source>

The entry element accepts further optional subelements beyond the ones covered in this article.
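One way an aggregator might use source when republishing an entry is sketched below in Python. The helper copy_entry_with_source and the choice of which feed fields to carry over (SOURCE_FIELDS) are assumptions made for illustration; the specification does not prescribe this exact procedure.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM_NS}

# Feed-level elements this sketch chooses to preserve (an assumption).
SOURCE_FIELDS = ("id", "title", "updated", "rights")

FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/</id>
  <title>Fourty-Two</title>
  <updated>2003-12-13T18:30:02Z</updated>
  <rights>&#169; 2005 Example, Inc.</rights>
  <entry>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <title>Atom-Powered Robots Run Amok</title>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>"""


def copy_entry_with_source(feed, entry):
    """Return a copy of entry that records its parent feed in a source element.

    Hypothetical helper: an entry that already has a source keeps it,
    since that source describes the feed it first appeared in.
    """
    clone = copy.deepcopy(entry)
    if clone.find("atom:source", NS) is None:
        source = ET.SubElement(clone, "{%s}source" % ATOM_NS)
        for name in SOURCE_FIELDS:
            # "atom:name" matches direct children only, so the entry's
            # own id, title, and updated are never picked up here.
            child = feed.find("atom:" + name, NS)
            if child is not None:
                source.append(copy.deepcopy(child))
    return clone


feed = ET.fromstring(FEED_XML)
original = feed.find("atom:entry", NS)
republished = copy_entry_with_source(feed, original)
print(ET.tostring(republished, encoding="unicode"))
```

The original entry is left untouched, so the same feed tree can be republished to several destinations.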

Constructs

An Atom feed is made up of standardized elements. Each of these elements is blessed with content that has been organized into one of the options provided by the Reusable Syntax of Constructs. Apart from being a particularly good name for a modern jazz quintet, the idea behind the Reusable Syntax of Constructs is to make the discussion of elements, both established and proposed, much simpler.

The version 1.0 specification defines three common constructs (text, person, and date), although the developer's guide lists six: category, content, link, date, person, and text.

Constructs allow new elements, within new namespaces, to be added to an Atom feed in a controlled way. Elements pointing to a person, for example, should use the Person construct, and not reinvent the wheel. Mandating that dates are to be written only in RFC 3339 format means that developers do not need to waste time on getting date-parsing code to work with anything else, and so on.
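The payoff of constructs for a developer can be sketched in Python: one small parser per construct, reused for every element that follows it. The Person class and the parse_person and parse_date helpers are hypothetical names for this sketch, the contributor Jane Doe is made up, and the date handling is a simplification that covers common RFC 3339 timestamps rather than every form the RFC allows.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Illustrative entry; the contributor is invented for this sketch.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <author><name>John Doe</name></author>
  <contributor>
    <name>Jane Doe</name>
    <uri>http://example.org/jane</uri>
  </contributor>
  <updated>2003-12-13T18:30:02Z</updated>
</entry>"""


@dataclass
class Person:
    """The Person construct: a name, plus an optional uri and email."""
    name: str
    uri: Optional[str] = None
    email: Optional[str] = None


def parse_person(element):
    """Parse any element that follows the Person construct."""
    return Person(
        name=element.findtext("atom:name", default="", namespaces=NS),
        uri=element.findtext("atom:uri", namespaces=NS),
        email=element.findtext("atom:email", namespaces=NS),
    )


def parse_date(text):
    """Parse a Date construct into a timezone-aware datetime (simplified)."""
    text = text.strip()
    if text[-1] in "zZ":
        # Older fromisoformat() versions reject "Z", so spell out UTC.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


entry = ET.fromstring(ENTRY_XML)
author = parse_person(entry.find("atom:author", NS))
contributor = parse_person(entry.find("atom:contributor", NS))
updated = parse_date(entry.findtext("atom:updated", namespaces=NS))
print(author, contributor, updated.isoformat())
```

Because author and contributor share the Person construct, the same parse_person function handles both, and it would handle a Person-shaped element from an extension namespace just as well.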

What's to Come?

The Atom Syndication Format is only the first part of the Atom project. It is balanced by the Atom Publishing Protocol, more commonly known as the Atom API. The aim of this project is to improve on, and replace, the existing XML-RPC-based publishing protocols such as the Blogger API. The AtomPub working group also has its specification in the IETF ratification process, if a few steps behind the Syndication Format. Expect to hear a lot more about the Atom Publishing Protocol in the next year.