XML.com: XML From the Inside Out

Dreaming of an Atom Store: A Database for the Web
by Joe Gregorio

An Atom Store

The idea of an Atom Store has been bouncing around the blogosphere for a bit now, though not always called by that name. Jesse Andrews points out a few of the sources of inspiration, and as far as I know he was the first person to use the term "Atom Store":

  • Mark Pilgrim's magicline and monkey do could use it to store data

  • Rohit Khare & Ben Sittler at Commerce.net have been working on requirements of an Atom Store.

  • Joe Gregorios[sp], author of Atom Publishing Protocol, is researching it.

  • You can even hear Google's Adam Bosworth request it on IT Conversations, hoping MySQL folks don't become Oracle as Oracle doesn't scale the way an Atom Store could scale.

The range of applications being talked about here is breathtaking. The monkeydo and magicline usage of an Atom Store would be a remote persistence mechanism for a Greasemonkey script. Contrast that with the ideas Adam Bosworth is talking about: databases that scale the way Google's GFS does today.

It's All About the REST

That's a huge range of applications, but I think such a thing could happen. There are several forces driving it. First, you and I have lots of data, and it's stored in lots of places. I have my weblog, my email, my subscriptions to all my syndication feeds, maybe a del.icio.us and flickr account, and so on, and so on. You are not going to combine those all into one big, happy service. Ever.

I want my choices, and even if you are a big company that ends up able to provide all those services under one brand, I doubt I would trust you with all that data in one place. Instead of consolidating services, what syndication over the past five years suggests is that I can aggregate feeds from all those places into a single dashboard that lets me see the status of my far-flung data empire at a glance. Now if all those sources of data not only supplied a feed but also supported the interface of an Atom Store, that passive view changes into a real dashboard -- not only are those entries viewable, but they're editable from one spot.
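To make "editable from one spot" a little more concrete: in the Atom Publishing Protocol an entry can advertise a `rel="edit"` link, and a client updates the entry by PUTting a modified copy back to that URI. The sketch below, assuming only the Python standard library, builds such a request for a title change. The function names and the example URI are hypothetical, and it deliberately stops short of sending anything, authenticating, or handling conflicting edits.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def edit_uri(entry: ET.Element) -> Optional[str]:
    """Return the href of the entry's rel="edit" link, or None if read-only."""
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "edit":
            return link.get("href")
    return None


def retitle_request(entry_xml: str, new_title: str) -> urllib.request.Request:
    """Build, but do not send, a PUT that replaces an entry's title."""
    entry = ET.fromstring(entry_xml)
    uri = edit_uri(entry)
    if uri is None:
        raise ValueError("entry has no rel='edit' link; it is read-only")
    title = entry.find(f"{{{ATOM_NS}}}title")
    if title is None:  # atom:entry requires a title, but be forgiving
        title = ET.SubElement(entry, f"{{{ATOM_NS}}}title")
    title.text = new_title
    return urllib.request.Request(
        uri,
        data=ET.tostring(entry, encoding="utf-8"),
        method="PUT",
        headers={"Content-Type": "application/atom+xml"},
    )
```

Handing the result to `urllib.request.urlopen` would actually send it. A real dashboard would also want to make the PUT conditional (for example with an `If-Match` header carrying the entry's ETag) so it doesn't clobber an edit made somewhere else in the meantime.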

Yes, I know that some aggregators support search, and some even support some of the current blogging APIs, but that's very different from every source being searchable and editable. An aggregator is only going to be able to search across entries that have appeared since it started subscribing to that feed, and not any earlier ones.
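Searching at the source avoids that gap. As a rough sketch, assuming each source advertises an OpenSearch-style URL template (required placeholders like `{searchTerms}`, optional ones like `{count?}`), a client fills in the template and queries the source's whole history rather than its own local cache. `fill_template` and the example URL are hypothetical.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; a trailing "?" marks an optional parameter.
PLACEHOLDER = re.compile(r"\{([^}?]+)(\?)?\}")


def fill_template(template: str, **params: str) -> str:
    """Substitute OpenSearch-style placeholders in a URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) is not None
        if name in params:
            return quote(str(params[name]), safe="")  # percent-encode the value
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"required parameter {name!r} was not supplied")

    return PLACEHOLDER.sub(substitute, template)
```

For example, `fill_template("http://example.org/search?q={searchTerms}&n={count?}", searchTerms="atom store")` yields `http://example.org/search?q=atom%20store&n=`.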

The other advantage of an Atom Store is that it's built on top of RESTful services. That means that we get the advantages of REST -- caching and uniform interfaces and hypermedia as the engine of application state. For both OpenSearch and the APP there is an XML document that describes the capabilities of each endpoint; they are self-describing. That allows another service to come along and wrap several Atom Stores together by reading those description documents and then presenting itself as an Atom Store, an aggregate of all the stores it uses. That aggregate store could be a melange of your disparate data: your weblog, your email, etc. On the other hand, it could be a uniform series of servers, each holding a subset of a huge store: now you're building a monster database.
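Here's a rough sketch of the bookkeeping such an aggregate might do, using only the Python standard library: pick the Atom-returning URL template out of each store's OpenSearch description document, then merge the result feeds that come back into one newest-first list. The HTTP fetching is left out and the function names are hypothetical; the namespace and the `Url` element with its `type` and `template` attributes follow the OpenSearch 1.1 description format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"
ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_search_template(description_xml: str) -> str:
    """Return the template of the first <Url> that yields Atom results."""
    root = ET.fromstring(description_xml)
    for url in root.findall(f"{{{OS_NS}}}Url"):
        if url.get("type") == "application/atom+xml":
            return url.get("template")
    raise ValueError("this store does not advertise Atom search results")


def _updated(entry: ET.Element) -> datetime:
    # atom:updated is an RFC 3339 timestamp; fromisoformat() before
    # Python 3.11 rejects a trailing "Z", so rewrite it as an explicit offset.
    text = entry.findtext(f"{{{ATOM_NS}}}updated")
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def merge_feeds(feed_xmls: List[str], limit: int = 20) -> List[ET.Element]:
    """Interleave the entries of several result feeds, newest first."""
    entries: List[ET.Element] = []
    for feed_xml in feed_xmls:
        feed = ET.fromstring(feed_xml)
        entries.extend(feed.findall(f"{{{ATOM_NS}}}entry"))
    entries.sort(key=_updated, reverse=True)
    return entries[:limit]
```

Every underlying store gets the same query, which is what would let the aggregate wrap the merged entries in a new feed and present itself as just another Atom Store.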

"Just" Use a Database

Aren't these just the same promises made in the early days of SQL? Sure they are, but I think an Atom Store has a better chance of living up to the hype for two reasons. First, the data model is not wide open like SQL's; as far as the core elements of Atom are concerned, the format is pretty restricted. Second, the query and update operations are not nearly as comprehensive as SQL's. If you want to point to SQL as the only reasonable way to query over gigabytes of data, I'll just point to Google or Yahoo as counterexamples.
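To make "not nearly as comprehensive" concrete, here's a hypothetical in-memory toy in Python. It supports put, get, and delete by entry id, plus a single search that filters on title text and category and returns entries newest first; there are no joins, aggregates, or arbitrary predicates. `TinyAtomStore` and its methods are illustrations only, not any real product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    id: str
    title: str
    updated: str  # assumed UTC RFC 3339 "...Z" form, so strings sort by time
    categories: Set[str] = field(default_factory=set)


class TinyAtomStore:
    """Hypothetical store: CRUD by id plus one fixed-shape search."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def put(self, entry: Entry) -> None:
        self._entries[entry.id] = entry  # create or replace

    def get(self, entry_id: str) -> Optional[Entry]:
        return self._entries.get(entry_id)

    def delete(self, entry_id: str) -> None:
        self._entries.pop(entry_id, None)

    def search(self, term: str = "", category: Optional[str] = None,
               limit: int = 10) -> List[Entry]:
        """Case-insensitive title match, optional category, newest first."""
        hits = [e for e in self._entries.values()
                if term.lower() in e.title.lower()
                and (category is None or category in e.categories)]
        hits.sort(key=lambda e: e.updated, reverse=True)
        return hits[:limit]
```

Because every query has the same shape, a big store could in principle be split across many servers, each running the same search over its own slice, with the results merged by date.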

It's Not All Puppies and Roses

Now that I've got you all worked into a lather over how great the world will be with Atom Stores on every street corner, let me splash a little cold water in your direction. I've kind of glossed over some areas that need work. Some of the open questions are:

  • Indexing: Does indexing have to be immediate for the idea to be beneficial?

  • Annotating: How do you know where to POST when creating new entries vs. annotations?

  • Creation: If I POST a new Entry to an aggregate of a bunch of Atom Stores, which of those Atom Stores should it be created in? How should I route that POST?

  • Foreign Markup: Let's say I wanted to use an Atom Store for storing all the customer transactions in my e-commerce store. To do that effectively I may have to add some extra information to an Atom Entry to fully represent a transaction. How and where is that information stored and indexed? Do I start creating microformats for all of that data, or do I stuff it in the Entry as foreign markup? How much indexing of foreign markup is useful? Do we need specialized indexing and search terms for that?
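As a tiny illustration of the foreign-markup option, here's an entry for one such transaction carrying two extension elements in a made-up `shop` namespace, plus a Python function that pulls out every non-Atom child element as a field an indexer could use. The namespace, element names, and `foreign_fields` are all hypothetical, and the sketch sidesteps the hard parts: it ignores attributes (like `currency`) and nested structure, which is exactly where the indexing questions above start to bite.

```python
import xml.etree.ElementTree as ET
from typing import Dict

ATOM_NS = "http://www.w3.org/2005/Atom"

# A hypothetical order stored as an Atom Entry with foreign markup.
ORDER_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:shop="http://example.com/ns/shop">
  <id>urn:example:order:1047</id>
  <title>Order 1047</title>
  <updated>2006-02-12T09:00:00Z</updated>
  <author><name>Example Shop</name></author>
  <shop:sku>WIDGET-7</shop:sku>
  <shop:total currency="USD">19.95</shop:total>
</entry>"""


def foreign_fields(entry_xml: str) -> Dict[str, str]:
    """Map each non-Atom child element of an entry to its text content.

    Keys use ElementTree's "{namespace}local-name" form. Attributes and
    nested children are ignored; repeated elements keep the last value.
    """
    entry = ET.fromstring(entry_xml)
    fields: Dict[str, str] = {}
    for child in entry:
        if not child.tag.startswith(f"{{{ATOM_NS}}}"):
            fields[child.tag] = (child.text or "").strip()
    return fields
```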

As you can see there's plenty of work to be done. Let's roll up our sleeves and make it happen.
