Vox Populi: Web Services From the Grassroots
Last month, Sam Ruby threw the blogging world into a tizzy when he created a wiki to serve as the home for a new syndication format and protocol. This month we'll take a look at the project -- the working name is "Necho" but has been "Echo" and "Pie" at various times. We'll use it to motivate a look at tradeoffs in XML and web services design.
"Syndication" is the term used when a site makes an RSS ("Really Simple Syndication") document available at a URL. For more information about the history of RSS, see Mark Pilgrim's inaugural XML.com column (What is RSS?). In what follows, I mean RSS 2.0 when I write "RSS".
Interest in RSS has been waxing, perhaps because the commercial possibilities are starting to occur to some folks. I doubt it was altruism that made Ruby's boss assign him to this project full-time, for example. The canonical web services example is a stock quote service, and translating that into an RSS feed that reports price updates is an obvious thing to do. Those of you who still have a portfolio worth managing could keep track of price movements by using any of the dozen or so RSS news aggregators that are available. As another example, my family is thinking of moving. I'd pay for a short-term subscription to an RSS feed that contained new sale listings in towns of interest to me.
RSS is described in this document, which is written and maintained by Dave Winer. Winer has been the individual most responsible for creating and proselytizing large portions of the RSS family.
Unfortunately, the RSS 2.0 specification is rather informal and imprecise. For example, here is an item within an RSS feed:
<item>
  <title>Are you <i>Crazy?<i></title>
  <description>"Are you <i>crazy</i>?" she said. "Nobody in their right mind would hand-create HTML markup in an RSS example...</description>
  <pubDate>10 Feb 60 11:23 MST</pubDate>
</item>
This example shows some of the problems with the current RSS specification. The description may be either a summary or the complete item, and the spec doesn't say how to tell which it is. If it is the complete item, entity-encoded HTML is allowed, but apparently not XML: imagine an RSS feed that contained another feed's item as its summary. There's also no way to tell whether the content is HTML or plain text. The spec neither makes clear how to write "5 < 7", nor specifies whether the markup tricks in that example are necessary. (Notice the markup difference between the title and description elements.) The date format is the date-time format of RFC 822, amended to allow but not require four-digit years. Not surprisingly, many RSS feeds aren't compliant, since that format is broken (think Y2K, timezones, and so on).
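To make the two-digit-year problem concrete, here is a minimal Python sketch. The helper expand_year and the two pivot constants are my own names, chosen for illustration. The pivots follow the RFC 2822 guidance for obsolete two-digit years and the POSIX convention for %y, neither of which the RSS spec mentions, so treat this as one plausible reading rather than what any particular aggregator does.

```python
def expand_year(two_digit_year: int, pivot: int) -> int:
    """Expand a two-digit year: values at or above the pivot land in the
    1900s, values below it land in the 2000s."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year")
    century = 1900 if two_digit_year >= pivot else 2000
    return century + two_digit_year


# Assumed conventions (not taken from the RSS spec):
RFC2822_PIVOT = 50  # RFC 2822: 00-49 -> 20xx, 50-99 -> 19xx
POSIX_PIVOT = 69    # POSIX %y: 00-68 -> 20xx, 69-99 -> 19xx

# The year field of the pubDate in the example item.
year_field = int("10 Feb 60 11:23 MST".split()[2])

print(expand_year(year_field, RFC2822_PIVOT))  # 1960
print(expand_year(year_field, POSIX_PIVOT))    # 2060
```

Two reasonable consumers can read the same pubDate a century apart, which is the sort of ambiguity a four-digit year would remove.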
Sometimes RSS uses XML in a way that is rather, well, funky. For example, the guid element is a global identifier, intended to be an opaque string generated by the RSS producer to uniquely identify an RSS item. If, however, the guid element has an isPermaLink attribute with the value true, then the element content is really a permanent URL that points to the item. Attributes are usually best used for metadata. I don't think using an attribute to switch between an opaque string and xsd:anyURI qualifies as metadata.
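Here is roughly what that switch forces on a consumer, as a small Python sketch. The function name interpret_guid and the sample identifiers are hypothetical. Treating a missing isPermaLink as true is my reading of the RSS 2.0 spec's stated default, and it is flagged as an assumption in the code.

```python
import xml.etree.ElementTree as ET


def interpret_guid(item_xml: str):
    """Return (kind, value) for an item's guid, or None if there is none."""
    item = ET.fromstring(item_xml)
    guid = item.find("guid")
    if guid is None:
        return None
    value = (guid.text or "").strip()
    # Assumption: an absent isPermaLink attribute is treated as "true".
    is_permalink = guid.get("isPermaLink", "true") == "true"
    return ("permalink" if is_permalink else "opaque", value)


# Hypothetical sample items, for illustration only.
opaque_item = '<item><guid isPermaLink="false">tag:example.com,2003:42</guid></item>'
link_item = '<item><guid isPermaLink="true">http://example.com/glob/42</guid></item>'

print(interpret_guid(opaque_item))  # ('opaque', 'tag:example.com,2003:42')
print(interpret_guid(link_item))    # ('permalink', 'http://example.com/glob/42')
```

The same element content has to be handled as two different types depending on a flag, which is the point of the complaint.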
Other than requiring every item to have either a title or description, every element within an RSS entry is optional. Additional elements can be added, provided that they appear in their own namespace. In order to support backward compatibility, RSS is not defined in any namespace. That's unfortunate, as it makes versioning very difficult.
Many RSS developers are fond of the
Dublin Core Metadata Initiative, which has been working to establish a
set of metadata terms for nearly a decade. As a result, RSS
pubDate was replaced with the more precisely-defined
dc:date element. This turned out to be an improper
implementation of the spec, although it took a number of individuals
several days to determine the exact reason. Apparently it's not valid to
replace an existing element with an extension element of similar
semantics. Using a technique that can most charitably be described as
curious, Winer never publicly explained the exact problem, leaving others
to figure it out.
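A consumer that wants to cope with both kinds of feed might look for the core element first and fall back to the extension. The following Python sketch assumes that policy. The function name item_date and the sample item are mine, and the only outside fact relied on is the Dublin Core element namespace URI.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace, in ElementTree's {uri}name notation.
DC = "{http://purl.org/dc/elements/1.1/}"


def item_date(item):
    """Return (source, text) for the item's date, preferring the core
    pubDate element and falling back to dc:date; None if neither exists."""
    for source, tag in (("pubDate", "pubDate"), ("dc:date", DC + "date")):
        element = item.find(tag)
        if element is not None and element.text:
            return source, element.text.strip()
    return None


# Hypothetical item from a feed that dropped pubDate in favor of dc:date.
feed_item = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<title>Example</title>"
    "<dc:date>2003-02-05T12:29:29Z</dc:date>"
    "</item>"
)

print(item_date(feed_item))  # ('dc:date', '2003-02-05T12:29:29Z')
```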
While this interpretation makes sense, it's important to realize that
it prevents any evolution in the RSS core. Any part of RSS which turns
out to be underspecified or just plain wrong cannot be phased out. A
community consensus to move to something like dc:date can
never happen. While the RSS spec always said it was frozen, it wasn't
until the community went through this exercise that it realized how
frozen it really was.
Certainly a lot of the increased use of RSS is due, as the name says, to its being rather simple to generate. According to Winer, one of the design goals was that anyone with an understanding of HTML could generate an RSS feed pretty quickly. That's a valid goal, but ambiguities like those described above make things correspondingly more difficult for RSS consumers, such as the developers of news aggregators. Whatever the reasons, we're nearly at the point where it's more notable when a content provider doesn't have an RSS feed than when it does.
But if RSS is going to evolve, it had better happen now, while it is on the upswing, before it becomes a commodity, baked into every system, and impossible to change. Here is a portion of a Necho entry:
<entry xmlns="http://example.com/necho">
  <author>
    <name>Rich Salz</name>
    <homepage>http://www.datapower.com</homepage>
  </author>
  <link>http://example.com/glob/42</link>
  <id>371a0eb3-594a-4923-b1c0-8684d3d50f22</id>
  <created>2003-02-05T12:29:29Z</created>
  <issued>2003-02-05T08:29:29-04:00</issued>
  <modified>2003-02-05T12:29:29Z</modified>
There are a couple of things to notice here:
- Necho is defined in a namespace; the URI isn't specified yet, but the intent is clearly there.
- There is a lot more metadata; we've shown a partial entry, and the 10 lines above haven't even shown any content.
- Dates are in WXS-compatible format.
- All "overlap" is removed: RSS's guid element is replaced by link and id.
- All ambiguity is replaced by explicit elements, such as the three different dates.
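To see what the explicit, schema-friendly dates buy a consumer, here is a short Python sketch that reads the three timestamps from an entry like the one above. The namespace URI is the placeholder used in the example, the entry is trimmed to the date elements and closed so that it parses, and the helper names are mine.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Placeholder namespace from the example entry; the real URI was not yet set.
NS = "{http://example.com/necho}"

ENTRY = """<entry xmlns="http://example.com/necho">
  <created>2003-02-05T12:29:29Z</created>
  <issued>2003-02-05T08:29:29-04:00</issued>
  <modified>2003-02-05T12:29:29Z</modified>
</entry>"""


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware datetime."""
    text = text.strip()
    # Older Python versions reject a trailing "Z", so spell out UTC.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def entry_dates(entry_xml: str) -> dict:
    """Return the created, issued, and modified dates of an entry."""
    entry = ET.fromstring(entry_xml)
    return {
        name: parse_timestamp(entry.findtext(NS + name))
        for name in ("created", "issued", "modified")
    }


dates = entry_dates(ENTRY)
# Different zone offsets, same instant: no guessing required.
print(dates["created"] == dates["issued"])  # True
```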
The content element is where the biggest difference occurs. The Necho content element has a type attribute to contain the MIME-compatible content-type. This is brilliant, as it allows Necho to smoothly integrate with work on adding attachments to SOAP. It's also multicultural, allowing the xml:lang attribute to specify the language being used. And, finally, multiple content elements act as a MIME multipart/alternative construct, allowing an RSS reader to find the representation it can best handle.
Here are some example elements:
<content type="text/plain" xml:lang="en-us">
  Are you *crazy*?
</content>
<content type="text/html" xml:lang="en-us">
  Are you <i>crazy?</i>
</content>
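The multipart/alternative idea can be sketched in a few lines of Python: the reader lists the types it can handle, in order of preference, and takes the first one the entry offers. The function name pick_content is hypothetical, the namespace is the placeholder from the earlier example, and I have assumed that the HTML alternative is carried entity-encoded.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace from the earlier example entry.
NS = "{http://example.com/necho}"

# Assumption: the text/html alternative is entity-encoded inside the element.
ENTRY = """<entry xmlns="http://example.com/necho">
  <content type="text/plain" xml:lang="en-us">Are you *crazy*?</content>
  <content type="text/html" xml:lang="en-us">Are you &lt;i&gt;crazy?&lt;/i&gt;</content>
</entry>"""


def pick_content(entry, preferred_types):
    """Return (type, text) for the first preferred type the entry offers,
    or None if the entry has none of them."""
    by_type = {c.get("type"): c for c in entry.iter(NS + "content")}
    for content_type in preferred_types:
        if content_type in by_type:
            text = "".join(by_type[content_type].itertext()).strip()
            return content_type, text
    return None


entry = ET.fromstring(ENTRY)
print(pick_content(entry, ["text/html", "text/plain"]))
# ('text/html', 'Are you <i>crazy?</i>')
print(pick_content(entry, ["image/png", "text/plain"]))
# ('text/plain', 'Are you *crazy*?')
```

A reader that only understands plain text never has to guess whether angle brackets are markup, since it simply never selects the HTML alternative.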
Is this technically better than RSS? It clearly is. The ambiguities are gone, the metadata is more precise, rich and accurate content can now be provided, and the use of XML is quite clean. Unlike RSS, it's feasible to define a schema for Necho: DTDs, XML Schema, and RELAX NG are all in the works. In other words, validation won't require a specially built validator. News aggregators and other RSS consumers -- if they are written as XML applications -- should have an easier job of presenting more information to their users. Generating a Necho feed does not look to be that much harder than generating an RSS feed, only requiring the tweaking of a few output statements or templates. Creating a Necho-to-RSS stylesheet in XSLT should be fairly straightforward. So on the technical front, it looks like everyone will win.
Is it politically and socially better? The jury is still out. Radical format changes rarely win converts, and there are many who believe that the window of opportunity for change has already passed. At the beginning of this column, I said that Necho was defining a protocol as well as a data format. I'll look at that in full detail in the next column.
The blogging community has defined several APIs for distributed manipulation of weblogs. These include the ability to add comments, post trackbacks (essentially a comment that says my article at this URL linked to you), and to ping servers informing them of updates. Most of these were quick hacks or first drafts that were eagerly adopted by multiple vendors. Most of these vendors are now interested in developing a new generation of APIs to provide these features -- and others, such as search, archiving, etc. -- in a new and consistent manner.
The wiki has a fairly active discussion about how to best define that protocol. Not surprisingly, I advocate that it be done using SOAP to convey XML documents. As long as the content being delivered is namespace-qualified, SOAP is a surprisingly lightweight messaging envelope:
<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">
  <S:Body>
    ...content here...
  </S:Body>
</S:Envelope>
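To show how little machinery that envelope needs, here is a Python sketch that wraps a namespace-qualified payload using only the standard library. The function name wrap_in_envelope is hypothetical, and the Necho namespace is still the placeholder from the earlier example. Nothing here touches WSDL or any of the WS-xxx specs.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
NECHO_NS = "http://example.com/necho"  # placeholder from the earlier example


def wrap_in_envelope(payload):
    """Place an already namespace-qualified element inside a SOAP Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


# Ask ElementTree to serialize the SOAP namespace with the "S" prefix.
ET.register_namespace("S", SOAP_NS)

# A minimal, illustrative payload: an entry carrying only its id.
entry = ET.Element(f"{{{NECHO_NS}}}entry")
ET.SubElement(entry, f"{{{NECHO_NS}}}id").text = "371a0eb3-594a-4923-b1c0-8684d3d50f22"

message = ET.tostring(wrap_in_envelope(entry), encoding="unicode")
print(message)
```

The payload keeps its own namespace and the envelope adds two elements around it, which is all that is meant by "lightweight" here.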
Just because the full web services machinery (WSDL, WXS, all those WS-xxx specs) rides atop SOAP, that doesn't mean that SOAP itself should be avoided. As we'll see next time, using SOAP as the messaging envelope enables all those features but doesn't require them. And along the way, we'll discover where REST becomes less useful.