Interactive Microcontent

October 8, 2003

Jon Udell

Now that Kimbro Staken, Sam Ruby, and Kingsley Idehen have demonstrated variations of the structured microcontent idea I've been exploring for the past few months, I'm wondering what's next. Sure, it's cool to be able to link to XPath queries for items on Intertwingly that cite me or XPath code fragments contained within comments that I've posted to Kimbro's weblog and that cite Kimbro, but that's an abstract thing to do. Most people, quite rightly, would like to see more concrete benefits. If you're going to go to the trouble of creating structured microcontent, you'd like to be able to interact with it in immediate and tangible ways, and you'd like to empower others to do that too.

The kinds of interactions I have in mind are as common as dirt. I'm talking about basic routine chores that we perform daily and almost unconsciously. For example, I'm always citing content from other weblogs on mine, and I've developed a convention for doing so that looks like this:

<blockquote cite="name">
...contents-of-quotation... [<a href="url">name</a>]
</blockquote>

I'd been doing this the hard way for months, though lately I'd leaned heavily on a spiffy but little-known feature of Mozilla called View Selection Source. Accessible from the right-click menu when there's an active selection, it has a surprising and (to me) fascinating behavior which I'll illustrate by selecting part of this paragraph as I'm writing it. First, the selection as it appears in the browser:

And now, the selection as shown by View Selection Source:

I've thrown a gratuitous <span> tag into the mix just to clarify what's happening here. When the selection crosses element boundaries, View Selection Source returns the common ancestor of the elements whose text nodes are touched by the selection. This might not be what you want, of course. Most folks will expect just the highlighted characters that were actually selected, minus the markup. In that case, you'll want a bookmarklet like the one Phil Windley cooked up recently and described in this blog entry. But now that I'm hooked on writing well-formed and XPath-searchable content, I want to grab complete well-formed fragments from the sources I quote, and I'd like other folks to be able to grab mine. We ought to be able to hand fragments around without damaging their fidelity, and we need smarter tools to help us do that. Mozilla's View Selection Source is such a tool, and I wondered how I could adapt it to my quotation style.
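The common-ancestor rule described above can be illustrated without a browser. The following is a minimal sketch, not Mozilla's implementation: plain objects with a `parentNode` link stand in for DOM nodes, and the names `ancestorsOf` and `commonAncestor` are made up for illustration.

```javascript
// Collect a node and all of its ancestors, nearest first.
function ancestorsOf(node) {
  var chain = [];
  for (var n = node; n; n = n.parentNode) {
    chain.push(n);
  }
  return chain;
}

// Return the deepest node that contains both a and b, or null if the two
// nodes live in unrelated trees. A node counts as containing itself.
function commonAncestor(a, b) {
  var chainA = ancestorsOf(a);
  for (var n = b; n; n = n.parentNode) {
    if (chainA.indexOf(n) !== -1) {
      return n;
    }
  }
  return null;
}

// Hypothetical stand-in for a paragraph: some text, then a <span> with text.
var p = { name: 'p', parentNode: null };
var before = { name: '#text-before-span', parentNode: p };
var span = { name: 'span', parentNode: p };
var inside = { name: '#text-inside-span', parentNode: span };

// Selection that stays inside one text node: the ancestor is that node.
console.log(commonAncestor(before, before).name);
// Selection that crosses into the <span>: the ancestor is the paragraph.
console.log(commonAncestor(before, inside).name);
```

A selection that stays within one text node yields that node, while a selection that reaches into the span climbs all the way to the enclosing paragraph.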

The DOM Range API

Wondering how Mozilla managed its clever selection trick, I was led to the DOM Level 2 Traversal and Range Specification, which contains this beautiful illustration:

The picture tells the story; there's almost no explanation needed. You can see at a glance how this model relates a stream of characters to a higher-order structure.

The getSelection() method implemented by MSIE and by Safari has no concept of this relationship. It simply returns the stream of selected characters. Mozilla's selection object does that too, if you take the blue pill. But if you take the red pill you can jump through the looking glass into the world of the DOM. The selection object's getRangeAt() method returns a range object whose magical properties include commonAncestorContainer. To see how Mozilla's View Selection Source uses that property, look at viewPartialSource.js. Armed with this example, I made a bookmarklet that uses the same technique to capture whole fragments for quoting in my style. Here's the text of the bookmarklet's bootloader:

javascript:(function(){var element=document.createElement('script'); 


document.body.appendChild(element); })()

Here's my quote bookmarklet, draggable to your linkbar if you're running Mozilla.
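The bootloader idea is simply to append a script element that points at an externally hosted file, so the bookmark itself stays tiny. Here is a hedged sketch of that pattern as a reusable function; the name `injectScript` and the example.com address are placeholders, not the names or URL used for the bookmarklet in this column.

```javascript
// Append a <script> element to the page so that the browser fetches and
// runs the code found at `src`. `doc` is the page's document object.
function injectScript(doc, src) {
  var element = doc.createElement('script');
  element.setAttribute('src', src);
  doc.body.appendChild(element);
  return element;
}

// In a browser this would be called as follows (placeholder URL):
//   injectScript(document, 'https://example.com/quote.js');
```

Passing the document in as a parameter is a design choice made for this sketch; it keeps the function easy to exercise outside a browser.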

And here's the actual script:

function mozInnerHTML(node)
{
  // Serialize each child of the node in document order.
  var str = '';
  for (var i = 0; i < node.childNodes.length; i++)
    {  str += mozOuterHTML(node.childNodes.item(i));  }
  return str;
}

function mozOuterHTML(node)
{
  // '&lt;' is emitted in place of '<' so that the markup shows up as
  // text when the result is written into an HTML window.
  var str = '';
  switch (node.nodeType)
  {
    case Node.ELEMENT_NODE:
      str += '&lt;' + node.nodeName.toLowerCase();
      for (var i = 0; i < node.attributes.length; i++)
      {
        var attr = node.attributes.item(i);
        str += ' ' + attr.nodeName;
        str += '=' + '\"' + attr.nodeValue + '\"';
      }
      // Self-close only elements with neither children nor attributes.
      if ( !node.hasChildNodes() && !node.hasAttributes() )
        {  str += '/>';  }
      else
      {
        str += '>';
        str += mozInnerHTML(node);
        str += '&lt;/' + node.nodeName.toLowerCase() + '>';
      }
      break;
    case Node.TEXT_NODE:
      str += node.nodeValue;
      break;
  }
  return str;
}

var sel = window.getSelection();
var node = sel.getRangeAt(0).commonAncestorContainer;
var html = mozOuterHTML(node);
var uri = document.location;
var title = document.title;

// Open a blank window to hold the quotation
// (the window.open arguments here are an assumption).
var win = window.open('', '_blank');
win.document.write('<body><div>&lt;blockquote cite=\"' + title + '\"></div>' +
  html + ' [&lt;a href=\"' + uri + '\">' +  title + '&lt;/a>]' +
  '<div>&lt;/blockquote></div></body>'  );

With this tool, selection becomes more of a gesture than a careful delineation of boundaries. Sweep out a range that touches any parts of two paragraphs contained within a <div> or <blockquote> and you'll capture the complete containing element. Of course the results of a gesture aren't always predictable. In the example above, selecting "Accessible from the" and "Accessible from the right" gives two very different results. In the first case, you get the text node preceding the <span> tag, and in the second you get the whole paragraph. If CSS were used to distinguish the <span>'s contents, this behavior would be less surprising. We've already seen many examples of how CSS can provide hooks for intelligent search. As microcontent becomes interactive, perhaps CSS can also help users visualize the structures they're working with.

Working with Calendar Fragments

Let's try a different example. Ray Ozzie recently touched off a flurry of discussion about sharing calendar objects. I did some experimenting and wrote up some observations on the matter, to which Adrian Cuthbert responded as follows (email quoted with permission):

The idea of being able to click on a link and have embedded page content delivered to a helper application seems reasonable enough. But there doesn't appear to be an easy way to embed XML data into XHTML and use it. For example there would need to be a way of mapping XML tags onto the user's choice of helper application, much as one can do with mime-types for downloaded content.

This is the sort of framework one might have expected to have evolved within web-browsers after the emergence of XML. Were there such a framework, I can't help thinking that the benefits of providing simple XML encodings for things like calendar events and contacts would be much more immediate.

If I understand your enthusiasm for RSS and blogging correctly, it's partly because the people developing the standards and writing new applications seem more prepared to take such risks. Assuming there were an XML standard for encoding calendar events woven into some RSS 2.x, how do you see things changing? I would like to see something not unlike Microsoft's SmartTags except the smartness comes from the author marking up their content [which also allows all the detail to be hidden behind a simple label]. This would provide a real usability benefit and maybe provide an incentive for authors marking up parts of content. Of course we need a suitable client, any thoughts?

Yes, it strikes me that Mozilla is within shouting distance, at least, of being a suitable client. I'm not just talking about Mozilla Calendar here, but about the browser itself as a container of interactive microcontent.

To set the stage, let's have a look at some parallel exploration that Alf Eaton has been doing. First, he noted that a file containing an individual event in iCalendar format can be click-loaded into an iCalendar app. (The Mac's iCal handles this more gracefully than Mozilla Calendar.) Then, a few days later, he offered a service that enables you to write parameterized URLs that dish out individual events.

This is really useful stuff. I can't help noticing, though, that we have all kinds of HTML renderings of calendar data on the Web, and they are only suitable for viewing, not for intelligent search or for manipulation. It strikes me that things might be otherwise. Here's a sketch that suggests how.


<a href="javascript:getFragment('cal:date')">grab calendar event</a>,

<a href="javascript:getFragment('cal:range')">grab calendar events</a> 



Here's the schedule:

<table class="calCalendar" cal:range="20030921-20030922">

<tr class="calEvent" cal:date="20030921">
<td><div class="calName"><span team:id="42">Red Sox</span>
  vs. <span team:id="27">Yankees</span> </div>
<div class="calLocation">Fenway Park</div>
</td></tr>
<tr class="calEvent" cal:date="20030922">
<td><div class="calName"><span team:id="42">Red Sox</span>
  vs. <span team:id="27">Yankees</span> </div>
<div class="calLocation">Fenway Park</div>
</td></tr>
</table>

Let's not worry for now about the details of attributes and namespaces in this hypothetical XML representation of an iCalendar event. And let's not concern ourselves with whether the getFragment() method is delivered by way of a link on the page or by way of a bookmarklet. What I'd like to focus on, instead, is the kind of interaction that seems within reach. If you are viewing this page in Mozilla, the following rendering of the XHTML shown above will provide a live demonstration:

grab calendar event, grab calendar events

Here's the schedule:

Red Sox vs. Yankees
Fenway Park
Red Sox vs. Yankees
Fenway Park

If I click in one or the other of the events to set focus there, then click on the "grab calendar event" link, the <tr> element containing it (which could as easily be a <p> or <div>) is returned in a new window as a chunk of XML. Likewise, if I click in either of these events and then click the "grab calendar events" link, the containing element, which happens to be a <table> in this case (but again, could be something else), is returned as XML. View the source of this page to see how.
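For readers without the page source at hand, here is one way such a lookup might work. This is a sketch under assumptions, not the code behind the live demonstration: the names `findEnclosing` and `grabFragment` are hypothetical, and the starting point is taken to be the selection's `anchorNode`, which is where the last click landed.

```javascript
// Walk up from `node` to the nearest enclosing element that carries the
// attribute `attrName` (for example 'cal:date' or 'cal:range').
// Returns null when no enclosing element has it.
function findEnclosing(node, attrName) {
  for (var n = node; n; n = n.parentNode) {
    // nodeType 1 is ELEMENT_NODE; text nodes have no attributes.
    if (n.nodeType === 1 && n.hasAttribute(attrName)) {
      return n;
    }
  }
  return null;
}

// Hypothetical helper: start from wherever the user last clicked and
// return the enclosing element marked with `attrName`, or null.
function grabFragment(win, attrName) {
  var sel = win.getSelection();
  var start = sel ? sel.anchorNode : null;
  return findEnclosing(start, attrName);
}
```

In a browser, the element returned by `grabFragment(window, 'cal:date')` could then be handed to a serializer such as the mozOuterHTML() function shown earlier and written into a new window.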

So what? In previous installments we've considered how XHTML content that's directly viewable without transformation can also be a repository that's accessible to structured search. It'd be a snap to search a weblog, or a collection of weblogs, for events at Fenway Park in September. Now I'm thinking that same XHTML content can also be interactive.

Given any XML representation of an event, no matter what the format, it's a simple matter to dish out an iCalendar version, or to form the parameterized URL that yields an iCalendar version, or to feed some other process in some other way. If two different formats are isomorphic, separated only by a straightforward transformation, then I consider them to be the same. Of course that's never literally true because transformation isn't frictionless. But the friction that really wears us down is at the interface between people and data. I'm not too worried about how we represent XML fragments, but very curious about how we enable people to interact with them. These little experiments are hardly conclusive, but they do hint at the still-untapped potential of the scriptable document object model.
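To make the "simple matter" concrete, here is a rough sketch of turning one event into iCalendar text. The record shape (`uid`, `stamp`, `date`, `name`, `location`), the function names, and the PRODID string are all assumptions made for illustration; extracting those values from a fragment is left out, and long lines are not folded as a complete implementation would need to do.

```javascript
// Escape backslash, semicolon, comma, and newline in an iCalendar TEXT value.
function escapeText(text) {
  return String(text)
    .replace(/\\/g, '\\\\')
    .replace(/;/g, '\\;')
    .replace(/,/g, '\\,')
    .replace(/\r?\n/g, '\\n');
}

// Build a one-event iCalendar document for an all-day event.
// `event.date` is a YYYYMMDD string like the cal:date values above.
function toICalendar(event) {
  var lines = [
    'BEGIN:VCALENDAR',
    'VERSION:2.0',
    'PRODID:-//example//microcontent-sketch//EN',
    'BEGIN:VEVENT',
    'UID:' + event.uid,
    'DTSTAMP:' + event.stamp,
    'DTSTART;VALUE=DATE:' + event.date,
    'SUMMARY:' + escapeText(event.name),
    'LOCATION:' + escapeText(event.location),
    'END:VEVENT',
    'END:VCALENDAR'
  ];
  // iCalendar content lines end with CRLF.
  return lines.join('\r\n') + '\r\n';
}

// Hypothetical values for the first event in the schedule.
var ics = toICalendar({
  uid: '20030921-redsox-yankees@example.com',
  stamp: '20031008T000000Z',
  date: '20030921',
  name: 'Red Sox vs. Yankees',
  location: 'Fenway Park'
});
console.log(ics);
```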