
Services and Links

January 13, 2003

Jon Udell

I found a link in my weblog's referrers file last night that seemed emblematic of the current milieu. Here's the text of the link:

http://www.w3.org/2000/06/webdata/xslt?xslfile=http://ipwebdev.com/outliner_xsl.txt&xmlfile=http://weblog.infoworld.com/udell/rss.xml

Here's a clickable instance of the link, and here's a snapshot of its output:

▸ Jon's Radio
▸ First line trivia at AllConsuming.Net

Erik Benson's wonderful All Consuming book site continues to delight me. The newest feature, First Line Trivia, presents the first line of a book on each refresh of the home page. You try to guess the book, and click through to see the answer. Members, who can edit book metadata, add the first-line data, IMDb-style. Example:

First Line Trivia
"This is a tale of two cities. Cities of the near future, say 10 or 20 years from now."

This could get addictive!

The first-line-trivia feature pushed me over the activation threshold, and I registered for the site. As a member, you can create a list of friends, which is seeded for you with candidates gleaned from Google's what's related and bl.ogs' related blogs. When friends add books to their All Consuming lists, you can receive them as Web (and optionally email) recommendations--and vice versa, your list can recommend books to them.

In fact, I'm unlikely to maintain an explicit book list because the blog universe that All Consuming inhabits already disseminates book awareness very effectively. Bloggers mention books on their blogs; All Consuming picks up on those references; its RSS feed brings them to my attention.

I'm surprised that there isn't more chatter about All Consuming on the weblogs I read. Increasingly, when I link to a book, I'm likely to offer its All Consuming URL rather than its Amazon URL. Of course, as I just realized when reading this interview, Erik is an Amazon employee. Perfect! All Consuming is, in my view, one of the cleverest imaginable marketing schemes for Amazon--and for books in general. More and more books are available, at ever higher prices, but fewer and fewer people read. Boosting demand is the only hope for publishing, and Erik's service does that magnificently well. I'm more aware of books now than I have been in years. And since All Consuming's URIs are compatible with LibraryLookup, it's easier than ever to satisfy the increased demand.

Update When you use All Consuming, you may be surprised by unintended consequences. The other day, I was puzzled to see it attribute to me a reference to a book I hadn't mentioned on this blog. I wrote to Erik about it, and he was stumped for a while too, then he realized that it must have been something my Google box found and made available to All Consuming. It just now happened again. I put the phrase "all consuming" into my Google box. A few minutes later, All Consuming attributed a reference to Affluenza: The All-Consuming Epidemic to my blog. I find these spontaneous interactions fascinating and delightful. I can foresee, though, that a time will come when we'll want to be able to control these effects--for example, by applying robots.txt-like technology at the level of page components.

▸ It's just freaking cool

The most compelling effect in Minority Report, for me, was the visualization of active paper. Last night we watched it again, and later some friends dropped by. To put this in context, I live in smalltown New Hampshire, not Silicon Valley or Silicon Alley. There's a lot of dial-up Internet happening here, and DSL is growing, but Wi-Fi households are rare. When a topic came up in conversation, and I flipped open the TiBook to check it out, I had an epiphany. The future really is here, albeit not evenly distributed. I didn't mention, and I'm sure it didn't occur to my friends, that I was connecting wirelessly to the Internet. It seemed completely natural that "the Internet" would be "in" this little box, whether or not wires were running to it. The technology is disappearing into the woodwork, as it should. It is becoming a small-i internet.

The emergence of Wi-Fi really has to be the story of the year. I'm currently reading The Wireless Networking Starter Kit, an excellent primer. The authors, Adam Engst and Glenn Fleishman, explaining how and why Wi-Fi is transformative, finally conclude: "It's just freaking cool." Amen to that!

▸ Scripting an interactive service intermediary

The recent discussion about active intermediaries (Sam Ruby, Phil Windley) sent me in an unexpected direction. What I meant to do was revisit some earlier writing on Web proxies, email proxies, and SOAP routing, and try to draw some conclusions. Instead, I invented another bookmarklet.

Here was the problem. It's nice that I can now look up a book in my local library, but what if it's not in the collection? My library's OPAC (online public access catalog) enables you to ask the library to acquire a book, but the required fill-in form creates an activation threshold that I am rarely motivated to leap over.

The basic LibraryLookup bookmarklet is a kind of intermediary. It coordinates two classes of services--Amazon/BN/isbn.nu/AllConsuming and your local library's OPAC--to facilitate a lookup. I couldn't resist trying to create another intermediary that would facilitate a purchase request.

The solution I'll present here is less general than the basic lookup in several ways, but also interesting in several ways. Here are the ways in which it is less general:

  • Amazon-only. The basic lookup works with any site whose URL matches either /ISBN or isbn=ISBN. But to fill out a purchase request, more information is needed. This solution relies on Amazon-specific markup to find that information.

  • Innovative-only. The basic lookup works with any of four OPAC systems (and potentially others, as users discover and report the URI patterns that can enable them). But since I only have an account at my own library, which uses an Innovative OPAC, that's the only case I could try. Further, the solution is unlikely to work, as is, with your Innovative OPAC. A little spelunking reveals that the /acquire function (e.g. http://your.library.baseurl/acquire) produces differently-constituted forms from one Innovative OPAC to the next: sometimes name and password, sometimes PIN and library-card number, and so on. Mine uses name and library-card number.

Nevertheless, here are the reasons I find the solution interesting.

  • It works. Specifically, it works for my library, but the geek-inclined should find it easy to adapt it to another Innovative OPAC, and--presumably--to other OPACs.

  • It's a simple but compelling demonstration of the JavaScript DOM.

  • It uses JavaScript to set and get Amazon cookies. I'm sure JS hackers take this for granted, but I've never had occasion to try it.

  • It's a live example of the technique (which Derek Robinson mentioned to me and Art Rhyno showed me) that removes the MSIE bookmarklet size limit.

Intermediating a library purchase request

Here is the bookmarklet you can drag to your link toolbar: Please Acquire

Here is an Amazon page against which to test it: The Eighth Day of Creation: Makers of the Revolution in Biology.

Clicking the bookmarklet's link should bring up a screen like the one shown here. It's OK to click the button. I've neutered the script so it will just pop up a message rather than send the request. To unneuter it, rewrite the form's action= attribute to specify your OPAC's acquisition-request URL.

A few points to note in the code that follows:

  • Amazon's consistent use of the first META tag makes it very easy to pick out the book's title and author, like so:

    var m0 = document.getElementsByTagName('META')[0];
    var titleAuthor = m0.getAttribute('content');

    Gotta love that million-dollar markup!

  • There's no million-dollar markup for the publisher's name and date. Digging that out of the page is feasible, but much harder. I punt, in this case, by referring the librarian to the book's Amazon URL.

  • The setCookie script written into the generated form uses Amazon's domain. I'd never thought about this, but cookies are a two-way street. Amazon can use them to coordinate with me, but I can also use them to coordinate with Amazon. In this example, the generated form looks for Amazon cookies named MyLibraryUserName and MyLibraryUserID. If it finds them, it defaults two fields to their values. Otherwise, whatever you type there is remembered (via the onChange() handler) in Amazon cookies.

All in all, an instructive little exercise. This sort of technique won't replace active intermediaries, including the local kind that work at the level of HTTP or SMTP. Rather, it will complement them. Users need to be able to see, and approve, what intermediaries propose to do on their behalf. I like the idea of an interactive intermediary that prepares a connection between two services, previews it for the user, and then makes the connection.

Update

The script below contains a privacy bomb which, after a few minutes of reflection, I removed from the live version invoked by the bootloader. It's a fascinating scenario, actually:

  1. You don't want Amazon to see your library-card number.

    a. Don't store/send it at all. This, of course, eliminates most of the convenience of the solution.

    b. Store/send an encrypted version.

  2. You do want Amazon to see your library-card number. Does that sound crazy? Maybe not. Reasons I might trust Amazon with that information:

    a. Because it could use it to coordinate my library activity with my Amazon activity, and make better-informed Amazon recommendations. In particular, Amazon could emphasize books known not to be available to me in my local library. This would certainly seem to be a fair quid-pro-quo for the use of that handy ISBN in its URI!

    b. Because it could use it to offer me an email-notification service alerting me to overdue library books.

I find (2b) especially intriguing. It's not really in Amazon's interest for me to be aware of what's available in the local library, and it's not really in the library's interest for me to be made promptly aware of fines accumulating there. By yoking them together, I might be able to play the two services off against one another to my benefit--and to theirs.


The bookmarklet's bootloader

javascript:void((function() {var%20element=document.createElement('script'); element.setAttribute('src', 'http://weblog.infoworld.com/udell/gems/acquire.js'); document.body.appendChild(element)})())
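
Decoded and spread over several lines for readability (the %20 sequences above are the spaces a one-line bookmarklet requires), the same bootloader looks like this:

javascript:void((function() {
  // Create a script element pointing at the externally hosted acquire.js...
  var element = document.createElement('script');
  element.setAttribute('src', 'http://weblog.infoworld.com/udell/gems/acquire.js');
  // ...and append it to the current page, which fetches and runs that script.
  // The bookmarklet itself stays tiny, so MSIE's bookmarklet size limit never applies.
  document.body.appendChild(element);
})())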

The script loaded by the bootloader

var setCookieScript = 'function setCookie(Name1, Value1) { var expires = new Date(); expires.setFullYear(expires.getFullYear()+1); var cookie = Name1 + "=" + escape(Value1) + ";domain=amazon.com;path=/;expires=" + expires.toGMTString(); alert(cookie); document.cookie = cookie; }';

// Return the value of the named cookie, or '' if it isn't set.
function getCookie(Name)
  {
  // Normalize so every cookie, including the first, is preceded by '; '
  // and the last is terminated by ';'.
  var s = '; '+document.cookie+';';
  var i = s.indexOf('; '+Name+'=');
  if (i == -1)
    { return ''; }
  else
    {
    // Skip past '; ' (2 chars), the name, and '=' (1 char) to reach the value.
    i += 3 + Name.length;
    var j = s.indexOf(';', i);
    return unescape(s.substring(i, j));
    }
  }

var myLibraryUserID = getCookie('MyLibraryUserID');
var myLibraryUserName = getCookie('MyLibraryUserName');
var m0 = document.getElementsByTagName('META')[0];
var titleAuthor = m0.getAttribute('content');
// Amazon's first META tag content is 'Title, Author'; split on the last comma.
var re = /(.+),\s*([^,]+)$/;

re.test(titleAuthor);
var title = RegExp.$1;
var author = RegExp.$2;

var win = window.open('','LibraryAcquisitionRequest', 'resizable=1,scrollbars=1,width=600,height=400');

win.document.write('<html><head><title>Request acquisition of: ' + titleAuthor + '</title><scr' + 'ipt>' + setCookieScript + '</scr' + 'ipt></head><body>');

win.document.write('<p>Request acquisition of: ' + titleAuthor + '</p>');

win.document.write('<form name="acquire" method="post" action="javascript:alert(\'Demonstration only!\');"><table><tr><td align="right">Author: </td> <td><input name="author" value="' + author + '" size="40" maxlength="255"></td> </tr><tr><td align="right">Title:</td> <td><input name="title" value="' + title + '" size="40" maxlength="255"></td> </tr><tr><td align="right">Where/when published:</td> <td><input name="publish" value="See Amazon: ' + location.href + '" size="60" maxlength="255"></td> </tr><tr><td align="right">Where mentioned:</td> <td><input name="mention" value="Amazon" size="60" maxlength="255"></td> </tr><tr><td align="right">Other info:</td> <td><input name="other" value="Intermediated by the LibraryLookup project" size="40" maxlength="255"></td> </tr><tr><td align="right">Your name:</td> <td><input name="name" value="' + myLibraryUserName + '" size="40" maxlength="255" onChange="javascript:setCookie (\'MyLibraryUserName\',forms[0].name.value);"></td> </tr> <tr><td align="right">14-digit library card #:</td> <td><input name="barcode" type="text" value="' + myLibraryUserID + '" size="40" maxlength="40" /* use with caution! OnChange="javascript:setCookie (\'MyLibraryUserID\',forms[0].barcode.value);" */></td> </tr><tr><td align="left" colspan="2"><br><input name="submit" type="submit" value="Ask library to acquire this book"></td> </tr></table></form></body></html>');

win.document.close();

▸ When not to cooperate

In an essay called Peer and non-peer review, Andrew Odlyzko pooh-poohs the fear that blogging (although he doesn't call it that) will undermine the classical system of scholarly peer review:

With the development of more flexible communication systems, especially the Internet, we are moving towards a continuum of publication. I have argued, starting with [3]1, that this requires a continuum of peer review, which will provide feedback to scholars about articles and other materials as they move along the continuum, and not just in the single journal decision process stage.

Obviously I agree. I'm not a scientist, but when asked in mid-2000 to produce a report on how Internet-based communication could improve scientific collaboration, I focused (in part) on weblogs and RSS as engines of distributed awareness and precise feedback.

Back in September, Sébastien Paquet wrote me a thoughtful email, which I cited with permission, on the subject of blogging and research culture. His assessment bears repeating:

Here are reasons why Sébastien thinks blogging and research culture should naturally go together:

  1. Scholars value knowledge. They have a lot of it to manage and track.
  2. A scholar's professional survival depends on name recognition. A K-log can help provide visibility and recognition.
  3. Scholars are used to writing; most of them can write well.
  4. Scholars are geographically disparate. They need to nurture relationships with people that they seldom meet in person.
  5. Scholars need to interlink in a person-to-person fashion (see Interlinktual).
  6. Scholars already rely heavily on interpersonal trust and direct communication to determine what new stuff is worth looking at. Such filtering is one of the central functions weblog communities excel at.
  7. For many scholars, the best collaborations come about when they find someone who shares their values and goals (this is argued e.g. in section 3 of Phil Agre's excellent Networking on the Network). The personal output that is reflected in one's weblog makes it much easier to check for such a match than work that is published through other channels.
  8. Scholars recognize the value of serendipity. Serendipity can come pretty quickly through weblogging; see Manufactured Serendipity.
  9. Every scholar must strive to be a knowledge hub in his niche, and an expert in related areas. A K-log is a good medium for this, as it is a way of letting knowledge flow through you while adding your personal spin.
  10. Scholars pride themselves on being independent thinkers. K-logs epitomize independent thought.

Here are reasons why Sébastien thinks blogging has failed to become a research nexus:

  1. It takes time.
  2. "The technology is not well-established and tested at this point."
  3. Many people don't like being among the first ones doing something.
  4. Not all scholars are used to the Web and hypertext.
  5. Shyness and fear of public mistakes. Many scholars won't write unless they have to. They may be especially reluctant to publicly expose ideas that they haven't tested.
  6. Fear that someone else will pick up their ideas and work them out before they do.
[Photo: Rosalind Franklin]

The sixth objection probably looms largest. The enterprise of science is at once exquisitely collaborative and fiercely competitive. One of the most poignant examples of the resulting dilemma is detailed in Horace Freeland Judson's The Eighth Day of Creation, the authoritative history of the elucidation of DNA's structure. Rosalind Franklin came very close to solving the riddle. But in the end, her X-ray crystallographic photos of DNA, conveyed indirectly to James Watson, triggered the crucial insight. She was denied the opportunity to collaborate directly, died of cancer a few years later, and is now a historical footnote.

Obviously the world of science was less kind to women then than it is now. But Robert Axelrod's The Evolution of Cooperation suggests that Franklin probably would have been out of luck in any case. In his analysis, cooperation can arise and be sustained only when the Prisoner's Dilemma is iterated--that is, when there is reason to expect many future interactions, and when there is no clearly-defined endgame. The hunt for the structure of DNA wasn't like that. A once-in-a-lifetime career-making Nobel-prize-winning goal was in view, and that distorted the payoff matrix.

In science (and in business) we might as well admit that, in such cases, competition will suppress cooperation. Rarely, we're pursuing a quest for a once-in-a-lifetime payoff. Usually, though, we're playing a game that looks more like an iterated prisoner's dilemma. A kind of meta-prisoner's-dilemma then arises. How can you tell the difference?


1 Tragic loss or good riddance? The impending demise of traditional scholarly journals: There are obvious dangers in discontinuous change away from a system that has served the scholarly community well [Quinn]. However, I am convinced that future systems of communication will be much better than the traditional journals. Although the transition may be painful, there is the promise of a substantial increase in the effectiveness of scholarly work. Publication delays will disappear, and reliability of the literature will increase with opportunities to add comments to papers and attach references to later works that cite them.

▸ Tinkering with scripts and service lists

I tinkered a bit more with the LibraryLookup project yesterday. First, I noticed that the Build your own bookmarklet feature was broken in Mozilla. It turns out that any undeclared variable in the JavaScript will break it. Some kind of security feature, perhaps? Anyway, fixed. While I was at it, I added a feature that previews the link that will be embedded in the bookmarklet, so you can test it first. It's the same principle as the ASP.NET test page.

The bookmarklet generator also now emits a streamlined script. The original version, I'm embarrassed to say, went like so:


var re=/[\/-](\d{9,9}[\dX])|isbn=(\d{9,9}[\dX])/i;
if ( re.test ( location.href ) == true )
  {
  var isbn=RegExp.$1;
  if ( isbn.length == 0 )
    { isbn = RegExp.$2 };
  ...

Of course, all that was really necessary was:


var re= /([\/-]|isbn=)(\d{9,9}[\dX])/i;
if ( re.test ( location.href ) == true )
  {
  var isbn = RegExp.$2
  ...
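
As a quick sanity check, here's a sketch (with made-up URLs) showing that the streamlined pattern extracts the ISBN from both the path-style (/ISBN) and the query-style (isbn=ISBN) links:

// Hypothetical example URLs: one path-style, one query-style.
var urls = [ 'http://books.example.com/ASIN/123456789X/',
             'http://lookup.example.com/find?isbn=123456789X' ];
var re = /([\/-]|isbn=)(\d{9,9}[\dX])/i;
for ( var i = 0; i < urls.length; i++ )
  {
  if ( re.test ( urls[i] ) )
    { alert ( 'ISBN: ' + RegExp.$2 ); }  // reports 123456789X in both cases
  }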

How did this happen? The usual way: when I expanded the original pattern to include the "isbn=" case, I didn't refactor. An instinctive programmer would have refactored on the fly. I'm not one, so I didn't see this until later. The problem with seeing it later is that you run smack into Don's Amazing Puzzle. It's far too easy to see a written text in terms of what we think it should say, rather than what it actually says.

(Here, by the way, are two tips for Radio UserLand folks who want to include JavaScript in items and stories. First, remove all blank lines from your script, because the Radio formatter will turn these into <p> tags that will break the script. Second, backslash-escape all instances of //, which, if it occurs nowhere else, will be found before the closing end-comment tag. Radio's not-very-discriminating URL auto-activator is triggered by an unescaped //, like this one: //.)

Next, I took another look at the service lists. The first one came from Innovative's customer page, since withdrawn. The others I found by Googling for URL signatures. But I had been meaning to dig into the Libdex lists that a Palo Alto librarian, Martha Walters, referred me to. That turned out to be a fairly straightforward text-mining exercise which yielded, for Innovative and Voyager libraries in particular, greatly expanded lists with much more descriptive library names--and international coverage. Some of the many newly-added libraries:

Hong Kong - Kowloon - City University of Hong Kong
Scotland - St Andrews - University of St Andrews
Wales - Bangor - University of Wales Bangor and North East Wales Institute
Finland - Helsinki - Helsinki University
Puerto Rico - Gurabo - Universidad del Turabo
Scotland - Edinburgh - Edinburgh University

Because the Libdex catalog uses an extremely regular HTML format, it was not hard to reinterpret the HTML as a directory of services. But it wasn't as easy as it could have been, either. On the Backweave blog, Jeff Chan wonders whether Mark Pilgrim's use of the CITE tag is really an improvement over raw text mining. And Jeff mentions my report on Sergey Brin's talk at the InfoWorld conference, where I quote him as saying:

Look, putting angle brackets around things is not a technology, by itself. I'd rather make progress by having computers understand what humans write, than by forcing humans to write in ways computers can understand.

This isn't an either/or proposition. Like Mark, I strongly recommend exploiting to the hilt every scrap of latent semantic potential that exists within HTML. Like Jeff, I strongly recommend sharpening your text-mining skills because semantic markup, in whatever form, will never capture the totality of what can be usefully repurposed.

I guess I'm an extreme anti-extremist.

▸ A performance, expressed in text, data, and code

The 115 columns I wrote for BYTE.com are now restored to the public Web. I took this step reluctantly, and would have preferred that the original namespace remain intact, but so be it. Those columns that have continuing value can now weave themselves back into the fabric of the Web.

This exercise was another chance to experiment with Creative Commons licensing, which had raised some questions. In the case of these columns, I chose the Attribution-NoDerivs-NonCommercial 1.0 option, following the logic expressed by Denise Howell (via Scripting News).

Based on comments, I've also rethought my use of the CC license for LibraryLookup. My thinking on this was quite badly muddled, I'm afraid, mixing patent and copyright issues. As Matt Brubeck pointed out, a copyright license has no bearing on patents, though publication alone is a hedge against potentially frivolous patent claims.

In the end, I concluded that LibraryLookup was a poor test case for the application of CC licensing to software. So I switched to the more basic Attribution license. I spent quite a while staring at the screen before I decided what to write in the Description metadata field. Here is what I finally said: A performance, expressed in text, data, and code.

Is it software? Phil Wainewright has a great essay on his Loosely Coupled weblog today: Software, Jim, but not as we know it.

▸ Is it software?

Is it software? asks Dave. That's such a great question! From the moment I first saw an HTML form on a Web page, it was clear that boundaries were about to blur. Web pages are both documents and programs. Web sites are both publications and applications. URLs are both phrases and function calls. Text is code, code is data, data is text.

The renewed understanding of documents and URLs in the SOAP community, over the past year, is an appreciation of this fundamental intertwingularity. Joshua Allen's terrific recent essay, Naked XML, translates it into practical terms:

I have a litmus test of sorts that I use to determine if someone has "got it". I show them an XPath like "//contact[.//fax]" and watch their faces. Of the people who understand what it does, most will have no reaction, and most of the rest (the experts) will raise their brows skeptically and say "only a stupid person would write such an inefficient query!". There are yet precious few who exclaim "that is how things should be!" as their faces light up.

The lesson, of course, is that real-world information is chaotic. In any but the smallest "proof of concept" systems, the best that one can hope for is to be able to recognize small pockets of structure within a sea of otherwise unstructured information.

[Better Living through Software]
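
For readers who don't live and breathe XPath, here's a minimal sketch (with made-up contact data) of what "//contact[.//fax]" does: it selects every contact element that has a fax element anywhere beneath it, however irregularly the records are structured. The browser's DOMParser and evaluate calls are just one convenient way to run it.

// Made-up, deliberately irregular contact records.
var xml =
  '<contacts>' +
  '<contact><name>Ann</name><fax>555-0101</fax></contact>' +
  '<contact><name>Bob</name><phone>555-0102</phone></contact>' +
  '<contact><name>Cho</name><office><fax>555-0103</fax></office></contact>' +
  '</contacts>';
var doc = new DOMParser().parseFromString(xml, 'application/xml');
// "//contact[.//fax]": any contact, at any depth, with a fax anywhere inside it.
var result = doc.evaluate('//contact[.//fax]', doc, null,
                          XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = 0; i < result.snapshotLength; i++) {
  var name = result.snapshotItem(i).getElementsByTagName('name')[0];
  alert(name.textContent);  // Ann, then Cho, but not Bob
}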

Now, take a look at Jonnosan's geographical service browser. Note, in particular, this feature:

Check Availability (not very accurate yet!)

Let's think about why not. Consider this query, which leads to a status page containing:


<TD>



   On Shelf&nbsp;

</td>

It would be great, of course, if all 117,418 libraries in the U.S. were to offer comprehensive XML APIs. I'm optimistic (or foolish) enough to think that I might even live to see the day. Meanwhile, though, suppose this status page were instead merely well-formed HTML or XHTML, with structural cues, like so:


<td class="availability">

On Shelf

</td>

There's a nice little "pocket of structure within a sea of otherwise unstructured information."
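
To make that concrete, here's a sketch of how a client might consume such a pocket of structure. The URL and the class name are assumptions carried over from the hypothetical markup above, and I'm ignoring cross-domain restrictions for the sake of illustration; the point is just that one structural cue turns fragile screen-scraping into a trivial lookup.

// Sketch: fetch a (hypothetical) status page served as well-formed XHTML and
// read availability from the structural cue <td class="availability">.
var req = new XMLHttpRequest();
req.open('GET', 'http://opac.example.org/status?isbn=123456789X', false);  // hypothetical URL
req.send(null);
var doc = new DOMParser().parseFromString(req.responseText, 'application/xml');
var cells = doc.getElementsByTagName('td');
for (var i = 0; i < cells.length; i++) {
  if (cells[i].getAttribute('class') == 'availability') {
    alert('Availability: ' + cells[i].textContent);  // e.g. "On Shelf"
  }
}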

Multiply by 117,418. It adds up.

Is it software? Yes.

▸ Build your own bookmarklet

Thanks to Andrew Mutch, the LibraryLookup project has added support for a fourth vendor of library software, Sirsi/DRA. The Google technique for service discovery turned up about fifty of these systems. But when Martha Walters showed me the master list of vendors, I remembered Will Cox's number--117,418 libraries in the U.S. alone.

Googling remains a useful way to discover services, but it finds only a fraction of the libraries running the four supported systems, and many other systems remain unsupported. So here's a complementary approach: Build your own bookmarklet.

The idea here is twofold. First, if your library uses one of the supported systems, but isn't listed, you can just generate the bookmarklet you'll need. Second, it provides a framework that can easily include more systems, as people discover and report the URL patterns that can drive them.
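
Here's a rough sketch of the idea behind the generator. The makeBookmarklet function and the query pattern are illustrative assumptions, not the generator's actual code; real URL patterns vary by OPAC vendor, which is exactly what people are invited to discover and report.

// Illustrative sketch only (not the generator's actual code). Given an OPAC's
// base URL and a placeholder ISBN-query pattern, emit a javascript: link that
// can be assigned to an <a href> and dragged to the link toolbar.
function makeBookmarklet(opacBaseUrl, queryPattern) {
  var script =
    "var re=/([\\/-]|isbn=)(\\d{9,9}[\\dX])/i;" +
    "if(re.test(location.href))" +
    "{window.open('" + opacBaseUrl + queryPattern + "'+RegExp.$2,'LibraryLookup');}";
  return 'javascript:' + script;
}

// Usage, with a hypothetical host and query pattern:
// makeBookmarklet('http://library.example.org', '/search?isbn=')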

As you'll discover by clicking the triangles, this is an active-outline version of my weblog's RSS feed, using Marc Barrot's activeRenderer technology.

What's going on with this link, and why is it so interesting? Let's decompose the link into its three constituent parts, each of which is a resource--or, we might say, a kind of Web service:

  1. http://www.w3.org/2000/06/webdata/xslt

    This is the W3C's XSLT transformation service. I believe it was Aaron Swartz who first drew my attention to it. You call it on the URL-line, and pass two arguments: a pointer to an XSLT script, and a pointer to an XML file. The output is the XSLT transformation of the XML file. (A sketch of composing such a call appears just after this list.)

  2. http://ipwebdev.com/outliner_xsl.txt

    This is Marc Barrot's XSLT script for transforming an RSS file into an active outline. (Editor's note: This script is actually an adaptation of Marc's work done by Adam Wendt, cited originally at this URL: ipwebdev.com/radio/2002/06/07.php#a177.)

  3. http://weblog.infoworld.com/udell/rss.xml

    This is my weblog's RSS file.
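
Putting the three parts back together is plain string concatenation, as the sketch below shows. The encodeURIComponent calls aren't strictly necessary for these particular URLs, but they keep the composition safe if the pointed-to URLs ever carry query strings of their own.

// Compose a call to the W3C transformation service from its two arguments.
var service = 'http://www.w3.org/2000/06/webdata/xslt';
var xslfile = 'http://ipwebdev.com/outliner_xsl.txt';
var xmlfile = 'http://weblog.infoworld.com/udell/rss.xml';

var compositeUrl = service +
  '?xslfile=' + encodeURIComponent(xslfile) +
  '&xmlfile=' + encodeURIComponent(xmlfile);

// Functionally the same link that turned up in the referrers file, ready to
// click, mail, blog, or embed in yet another composite URL.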

I've written elsewhere about how a URL can be used to coordinate resources in order to produce a novel resource. This notion of coordination seems intuitively clear, and yet after years of exploration I have yet to fully unravel it.

The View Source principle

Clearly this URL-composition idiom is rooted in the classic Unix pipeline. The composite URL says: pipe the referenced XML data through the referenced filter using the referenced transformation rules. The references, though, are global. Each is a URL in its own right, one that may be cited in an email message, blogged to a Web page, indexed by Google, and used to form other composite URLs. This arrangement has all sorts of interesting ramifications. Two seem especially noteworthy. First, there's what I call the View Source principle. I've long believed that the browser's View Source function had much to do with the meteoric rise of the early Web. When you saw an effect that you liked, you could immediately reveal the code, correlate it with the observed effect, and clone it.

This behavior, argues Susan Blackmore in The Meme Machine, is uniquely human:

Imitation is what makes us special. When you imitate someone else, something is passed on. This 'something' can then be passed on again, and again, and so take on a life of its own. We might call this thing an idea, an instruction, a behavior, a piece of information...but if we are going to study it we shall need to give it a name. Fortunately, there is a name. It is the 'meme'.

It's clear that memes, when packaged as URLs, can easily propagate. Less obvious, but equally vital, is the way in which such packaging encourages imitation. My own first use of this technique imitated Aaron Swartz, and operated in a similar domain: production of an RSS feed. Marc Barrot's use of it went the other way, consuming an RSS feed to produce an active HTML outline. But over at the NOAO (National Optical Astronomy Observatories), it's been adapted to a very different purpose. Googling for the URL signature of the W3C's XSLT service, I found a link that transforms VOTable data produced by the NOAO's SIA (Simple Image Access) service.

From astronomers, the technique could propagate to physicists, and thence almost anywhere, creating new (and imitatable) services along the way. Now in fact, as Google reveals, it hasn't propagated very widely. You have to be somewhat geek-inclined to form a new URL in this style, and much more so to whip up your own XSLT script. Assuming, of course, that you have a source of XML data in your domain, and some reason to transform it. Historically neither condition held true for most people, but the weblog/RSS movement is poised to change that.

Consider the source of the URL that prompted this column: I found it in my weblog's referrers file. Had I not already known about these things, clicking the link would have shown me:

  • that the W3C's XSLT transformation service exists,

  • that activeRenderer exists,

  • that the two can be yoked together to process my RSS feed's XML data into an active outline,

  • that http://ipwebdev.com/outliner_xsl.txt is an instructive XSLT script, available for reuse and imitation,

  • that the service which transforms my RSS feed into an active outline was deployed by merely posting a link,

  • and that I could consume the service--thereby offering an active outline to people visiting my blog--merely by posting another link.

Once this composite service and its constituents are discovered, they are easy to inspect and imitate. It's true that not many people can (or should!) become XSLT scripters. But lots of people can and do twiddle parameterized URL-driven services.

How do people discover these services? That leads to a second principle: the Web, in many ways, is already a good-enough directory of services.

The good-enough directory

Mine was among the heads that nodded sagely, in 1994, when the Internet's lack of an authoritative directory was said to be its Achilles' heel. Boy, were we wrong. I'm a huge fan of LDAP, and I think that UDDI may yet find its sweet spot, but a recent project to connect book web sites to local libraries reminded me that the Web already does a pretty good job of advertising its services.

My project, called LibraryLookup, sprang from the observation that ISBNs are an implicit link between various kinds of book-related services, and that a bookmarklet could make that link explicit. The immediate goal was to facilitate a lookup from Amazon, BN, isbn.nu, or All Consuming to my local library, whose OPAC (online public access catalog) supports URL-line-driven query by ISBN.

I then realized that this bookmarklet was a kind of service--packaged as a link, and parameterizable by library and by OPAC. Extending the service to patrons of thousands of libraries was merely a matter of tracking down service entry points. The vendor of my own library's OPAC offered a list of nearly 900 other OPAC-enabled libraries on its web site, and it was easy to transform that list into a set of LibraryLookup bookmarklets. Then a librarian pointed me to a more complete and better-categorized list, which a bit of Perl turned into over 1000 bookmarklets for libraries around the world.

Like the OPAC vendor, the maintainer of the Libdex catalog thought of it as a list for human consumption, not programmatic use. There was no special effort to tag the service entry points. But being good webmasters, they instinctively followed a consistent pattern that was easy to mine. We can hope that, when more people realize how this kind of list is a programmatically-accessible directory, webmasters will be more likely to make modest investments in what Mark Pilgrim calls million-dollar markup.

We're lazy creatures, though. The semantic Web requires more effort than most people are likely to invest. Is there a lazy strategy that will take us where we need to go? Perhaps so. As the LibraryLookup project began to add support for other OPAC vendors, I experimented with a strategy I call "Googling for services." The idea is that in a RESTful world, services exposed as URLs will be indexed and can be searched for. Using this strategy, I was able to round up a number of epixtech and Data Research Associates OPACs by searching Google for their URL signatures.

The experiment wasn't entirely successful, to be sure. These auto-discovered service lists are neither as complete nor as well-categorized as the lists maintained by Libdex. Of course, the links that Google found were never intended to advertise service entry points. Suppose more services were explicitly RESTful (as opposed to implicitly so, subject to reverse engineering of HTML forms). And suppose these RESTful services followed simple conventions, such as the use of descriptive HTML doctitles. And suppose that the Google API were tweaked to return an exhaustive list of entry points matching a URL signature.

None of these hypotheticals requires a huge stretch of the imagination. The more difficult adjustment is to our notion of what directories are, and can be. In a paper entitled Web Services as Human Services, Greg FitzPatrick takes an expansive view:

We will exercise considerable breadth as to what we call a directory. Obviously the current UDDI specification was not designed with this sort of thing in mind, but it is perhaps in keeping with the vision of Web services as reaching beyond the known and adjacent world to the unknown and possible world, where hardwiring and business-as-usual are replaced by exploration and discovery.

Later, describing the conclusions reached by the SKICal (Structured Knowledge Initiative - Calendar) consortium, he writes:

The SKICal consortium came to accept that it was not its task to build another portal or directory over the resources of its members' domains, but rather to make better use of the universal directory already in existence--the Internet.

Exactly. There is no silver-bullet solution here, and formal directories will play an increasing role as the future of Web services unfolds. But service advertisement techniques such as UDDI are not likely to pass the View Source test anytime soon, and will not be easy for most people to imitate. What people can do, pretty easily, is post links. Services that express themselves in terms of links will, therefore, create powerful affordances for use, for imitation, and for discovery.