Lady and the Tramp

September 29, 2004

This week's XML Deviant is a tale of two specifications. One, a scruffy ragtag affair that barely seems to do the job. The other, a well-heeled, aristocratic sort of document, groomed for longevity. You know what's coming next, of course. The tramp of which I speak, the unlikely RSS, goes from strength to strength. But the latest offspring in the noble line of XML, version 1.1, is doomed.

Getting Our RSS into Gear

I must confess that these days urgent debates between RSS fanatics hold even less fascination for me than the grimmest recesses of the post-schema-validation infoset. However, recent noise suggests that there are developments worthy of attention.

RSS, and I include Atom under this umbrella term, has become the ultimate reinvention of what was once falsely called "push technology." Push meaning, of course, scheduled polling. What was originally intended as a metadata format has become an envelope for the entire contents of web sites, advertising and all, pushed into a coherent user interface on a user's desktop.

In various ways, we who applaud open development models and scorn over-engineered standards should be very happy with what happened with RSS. Something quick and nasty has blossomed and burgeoned. Through pruning and training it is developing and emerging into something more refined. I can't help feeling that we might have been able to avoid at least one or two of the issues along the way by accepting a little more upfront design, but of course, I would say that.

The latest growing pain for RSS is the problem of distribution. As anyone who observes web server logs regularly will see, the pounding a site gets from dumb feed readers polling RSS feeds can be considerable. Despite the original inclusion of recommended polling schedules in RSS files, little or no attention is paid to them now.
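For the curious, here is a minimal sketch of what paying attention to those schedules might look like in a feed reader. It assumes an RSS 2.0 feed, where the hints are the optional ttl element (minutes a copy may be cached) and skipHours (hours, in GMT, when readers are asked to stay away). The sample feed, the function name may_poll, and the 60-minute fallback are my own inventions for illustration, not anything a particular reader implements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical sample feed; only the scheduling hints matter here.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <ttl>60</ttl>
    <skipHours><hour>2</hour><hour>3</hour></skipHours>
  </channel>
</rss>"""

DEFAULT_TTL_MINUTES = 60  # assumed fallback when the feed gives no ttl


def _as_int(text):
    """Return int(text) for a plain digit string, else None."""
    if text is None or not text.strip().isdigit():
        return None
    return int(text.strip())


def may_poll(feed_xml, last_fetch, now):
    """Return True if the feed's own hints allow a fetch at `now` (UTC)."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        # Not an RSS 2.0 shaped document; no hints to honor.
        return True

    ttl = _as_int(channel.findtext("ttl"))
    if ttl is None:
        ttl = DEFAULT_TTL_MINUTES

    skip_hours = set()
    for hour in channel.findall("skipHours/hour"):
        value = _as_int(hour.text)
        if value is not None:
            skip_hours.add(value)

    # The publisher asked readers to stay away during this hour.
    if now.hour in skip_hours:
        return False
    # Otherwise poll only once the cached copy has outlived its ttl.
    return now - last_fetch >= timedelta(minutes=ttl)
```

A reader would call may_poll with the last cached copy of the feed before each scheduled fetch and simply skip the request when it returns False. RSS 1.0 feeds carry similar hints in a separate syndication module, which this sketch does not handle.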

When you start putting the full text of a site into an RSS feed, not just the metadata, this can mean problems. The issue culminated in trouble for Microsoft's MSDN, which had to disable some RSS feeds for this very reason.

Such an eventuality has gotten the RSS world talking about alternative distribution strategies and ways to keep the bandwidth bill down. At Tim O'Reilly's recent geekfest, Foo Camp, various RSS wonks including Tim Bray, Robert Scoble, and Sam Ruby started to work out a potential way to reduce bandwidth, which they're calling Vary: ETag.

The new proposal appears to advocate building smarts into web servers so that feed readers only retrieve new RSS items when the RSS file changes, rather than the whole last ten or so entries posted to a web site. This means a web server needs to understand the syndication format. If indeed it can be made as an easy Apache drop-in, it sounds like a hopeful move forward.
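To make the idea concrete, here is a rough sketch of the server-side decision as I understand it from the description above. It is emphatically not the proposal's actual wire format: the scheme of using the newest item's id as the ETag, the function names, and the "not-modified" / "delta" / "full" labels are all assumptions of mine, chosen only to show why the server has to understand the feed's items.

```python
def make_etag(items):
    """Hypothetical scheme: the ETag is the quoted id of the newest item.

    `items` is a list of dicts with an "id" key, newest first.
    """
    return '"%s"' % items[0]["id"] if items else '""'


def respond(items, if_none_match):
    """Decide what to send to a reader that presents `if_none_match`.

    Returns (kind, etag, items_to_send), where kind is one of
    "not-modified", "delta", or "full".
    """
    current = make_etag(items)

    # Reader already has the latest version: send nothing.
    if if_none_match == current:
        return "not-modified", current, []

    # Work out which item the reader last saw, if we still have it.
    seen_id = (if_none_match or "").strip('"')
    ids = [item["id"] for item in items]
    if seen_id in ids:
        # Send only the items that are newer than the one the reader saw.
        return "delta", current, items[:ids.index(seen_id)]

    # Unknown or missing validator: fall back to the whole feed.
    return "full", current, items
```

The first branch is ordinary HTTP conditional GET, which any web server can do today without knowing what RSS is. It is the middle branch, slicing the feed at the last item the reader saw, that requires the syndication smarts.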

While encouraging, Vary: ETag doesn't feel to me like a complete solution to the problem. I have been wondering idly about other distribution mechanisms. The time-honored NNTP news protocol would seem to make a world of sense, as RSS directly fits the model of news. Unfortunately, in our HTTP-or-nothing world, it doesn't seem like it will fly too far.

Another option would be to use peer-to-peer client techniques to disseminate the content. This works very well for sharing media content using programs such as BitTorrent and seems to be working out well too for internet telephony company Skype. One of the advantages of RSS is that many people are willing to download, and regularly update, custom software in order to read it. If even just a modicum of the installed base of desktop feed readers acquired BitTorrent-like functionality for sharing feeds, it seems to me that the RSS distribution problem could be nailed quite quickly. Any takers?

If you're not so worried about decentralization, then an announcement this week from RSS aggregator Bloglines could be a good solution to the RSS distribution problem. Many people I know have switched to Bloglines' web-based RSS reading service in order to be able to read their RSS from multiple locations. Now Bloglines is offering to redistribute its aggregated RSS via a web service. The authors of RSS-reading applications FeedDemon, NetNewsWire, and blogbot have already pledged to support the new API.

The attraction is, of course, less bandwidth consumption and also the offer of Bloglines to "insulate developers from the current blog syndication format wars." Of course, one person's insulator is another's gatekeeper. I note that Bloglines only outputs RSS 2.0, eschewing both RSS 1.0 and Atom. Perhaps not so much an insulation from the war, as an attempt to fire a winning salvo.

The Bloglines technology is RESTful, at least. It looks a bit like NNTP re-implemented in HTTP. The service appears to be free. So where is the gain for Bloglines? I'll watch with interest to see if this service is embraced or treated with caution. (There's more about Bloglines in Marc Hedlund's O'Reilly Network article.)

XML 1.1 Dead in the Water?

The XML 1.1 recommendation, published at the beginning of this year, changes what is permitted in the names of elements and attributes in order to accommodate the growth of Unicode. As the list of changes says:

Whereas XML 1.0 provided a rigid definition of names, wherein everything that was not permitted was forbidden, XML 1.1 names are designed so that everything that is not forbidden (for a specific reason) is permitted.

This change substantially alters the definition of a well-formed XML document, and so all things that depend upon the base XML recommendation, whether software or specification, must change too. That is the big issue looming over XML 1.1, of course: does enough momentum exist to change the very large installed base of XML 1.0?
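The "everything not forbidden is permitted" rule quoted above is easiest to appreciate as a handful of broad code point ranges. The sketch below checks a string against the XML 1.1 Name production; the ranges are transcribed from the NameStartChar and NameChar productions as best I can reproduce them, so check them against the recommendation before relying on them. The function name is_xml11_name is my own.

```python
# Code point ranges for NameStartChar in XML 1.1 (transcribed; verify
# against the recommendation before depending on them).
NAME_START_RANGES = [
    (0x3A, 0x3A),      # ":"
    (0x41, 0x5A),      # A-Z
    (0x5F, 0x5F),      # "_"
    (0x61, 0x7A),      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),  # the supplementary planes, wholesale
]

# Additional ranges allowed after the first character (NameChar).
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),      # "-" and "."
    (0x30, 0x39),      # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml11_name(name):
    """Return True if `name` matches the XML 1.1 Name production."""
    if not name:
        return False
    if not _in_ranges(name[0], NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ch, NAME_START_RANGES) or _in_ranges(ch, NAME_EXTRA_RANGES)
        for ch in name[1:]
    )
```

Note the last NameStartChar range: a name beginning with, say, U+10400 passes here, whereas the XML 1.0 name rules in force at the time of writing enumerate permitted letters class by class and admit nothing outside the Basic Multilingual Plane. That one line is a fair summary of why every parser, schema language, and tool downstream would have to change.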

Recent indications are that schema technologies won't be changing in a hurry. A message from Norm Walsh to the RELAX NG mailing list comments on the decision by the ISO working group not to incorporate XML 1.1 names into the current RELAX NG standard.

Well, I suppose I should console myself that at least W3C XML Schema and RELAX NG have consistent views on the matter. It makes XML 1.1 a nearly pointless waste of time, but that's just the way it is, I guess.

Walsh refers to the fact that the W3C XML Schema working group will not amend WXS 1.0 to incorporate XML 1.1 names and will instead wait until W3C XML Schema 1.1 is published. The situation as it stands now for XML 1.1 is uncertain. It may be years before even schema technologies support it and even longer before tools do. There's a very real risk that XML 1.1 documents might end up as mere interesting curiosities.

Births, Deaths, and Marriages

The latest announcements from the XML-DEV mailing list. Thin pickings this week, I'm afraid.

Examploforms 0.1: Start of an XForms authoring and modeling tool from Micah Dubinko. Bringing Examplotron's design-by-example mentality to forms.
XML 2004 Deadlines Approaching -- Early Bird Registration, Late-Breaking/Product Submissions: Time to get your house in order for the main U.S. XML event of the year, Nov. 15-19, in Washington, D.C.
Saxon 8.1: New release of Michael Kay's XSLT and XQuery processor, including "added goodies in Saxon-SA beyond schema-awareness, most notably an extension to support higher-order functions."

Scrapings

Fear my magic firewall ... I was wrong about WS-*, apparently analysts say it's all good ... 52 messages to XML-DEV last week, Len rating 10% (bonus points for understatement of the week, "OWL is relatively academic") ... taking a proactive approach to out of office autoresponders ... revisiting history and whetting our appetites for next week.