Flipping the Links

September 12, 1998

Once I had authored all these hyperlinks, I faced the problem of how to display them. Unfortunately, in 1998 there weren't any Web browsers that could do the job, and in fact, I wanted this to be usable by anyone with a basic HTML browser. So my program had to turn the links around; instead of one XML file containing hundreds of links into another, I needed one HTML file (the annotated spec) containing hundreds of links, each to one little annotation file. All this was done in Java, as noted above. The rest of this article gives a step-by-step description of the program, and may be a bit challenging if you're not a Java programmer.

Step 1: Parse the Annotations

First, the program creates an instance of the Lark parser, and in one step, reads the annotations file:

  annot = new Lark();
  r = new XmlInputStream(new FileInputStream(args[0]));
  System.err.println("Parsing Annotations...");
  annot.buildTree(true);
  annot.saveText(true);
  aroot = annot.readXML(h, r);

After this operation, the variable aroot points to the root of the annotations document tree.

Step 2: Put the XLinks in a Vector

This code runs through the annotations tree and puts all the XLinks that it finds in the variable xlinks:

  Vector xlinks = new Vector();
  findXLinks(aroot, xlinks);
  System.out.println("xlinks: "+xlinks.size());
  ...
private static void findXLinks(Element e, Vector v)
...
    XLink link = XLink.isLink(e, sIDs);
    if (link != null) v.addElement(link);
...
    children = e.children();
    for (i = 0; i < children.size(); i++)
    {
      child = children.elementAt(i);
      if (child instanceof Element)
        findXLinks((Element) child, v);
    }

This code looks at each element, then calls itself recursively on that element's children. It uses the routine isLink to determine whether an element is an XLink:

  public static XLink isLink(Element e, Hashtable ids)
  ...
    String form = e.attributeValue("xml:link");
    if (form == null) return null;
    else if (form.equals("simple") || form.equals("extended"))
    {
      XLink link = new XLink(ids);
      link.loadFromElement(e, ids);
      return link;
    }
    ...

An element is an XLink if it has an attribute xml:link= whose value is either simple or extended.

Each XLink, once identified, is stored into an XLink object for later re-use; these objects are what populate the xlinks vector mentioned above. The function which loads up the XLink data structures is the same for the extended and locator linking elements, since their makeup is almost identical.:

private void loadFromElement(Element e, Hashtable ids)
  ...
  // load up role, content-role, etc. fields from attribute values
  mRole = e.attributeValue("role");
  ..
  splitHref(); // save the HRef and build an XPointer if there is one
  Vector children = e.children();
  for (i = 0; i < children.size(); i++)
  {
      if ( // check xml:link att to see if this child is a locator
      {
        XLink link = new XLink(ids);
        link.loadFromElement((Element) child, ids);
        mLocators.addElement(link);

Step 3: Traverse the XLinks

Once the xlinks vector is filled up, we run through it, traversing those whose role is not "annotation" - in effect, this means we traverse only the spec XPointers:

findXLinks(aroot, xlinks);
...
System.err.println("Traversing...");
for (i = 0; i < xlinks.size(); i++)
{
  XLink link = (XLink) xlinks.elementAt(i);
  targets = link.traverseExcept("annotation");
  if (targets.size() == 0)
  {
    System.out.println("Dangling Annotation!");
    link.dump(System.out);
  }

The routine traverseExcept returns the list of targets that the corresponding XPointer points at. If that list is of size 0, it means that the XPointer is broken and doesn't point at anything. Since I was constructing the XPointers by hand, I saw this message a lot during the construction of the Annotated Spec.

The example above doesn't show the actual traversal of a single link; here's the code which does that:

private void doTraverse(Vector ret, Element linkingElement)
  ...
  if (// I haven't already parsed this instance
  {
    // parse the instance
    System.err.println("Parsing target...");
    Lark xml = new Lark();
    ...
    sTargetRoot = xml.readXML(h, r); 
    System.err.println("Done.");
    }
  }
  ret.addElement(sTargetRoot);
  // if there's an XPointer, traverse that
  if (mXP != null)
    mXP.traverse(ret);

This code first looks to see whether the URL in the XLink has already been parsed. If it hasn't, it makes a new parser and parses that document. Since in this case, all the links are into the XML spec, this code only gets executed once, causing the parsing of the XML spec when the first spec locator element is traversed.

The code above is putting the results of each traversal in the vector ret. First of all, it inserts a pointer to the root of the target XML document, then checks to see if the XLink contains an XPointer (in the Annotated Spec, they all do) and if so, traverses that. The code for that is here:

public void traverse(Vector ret)
  ...
  // an XPointer is stored as an array of steps in an obvious way
  for (i = 0; i < mSteps.size(); i++)
  {
    step = (XPStep) mSteps.elementAt(i);
    kw = step.kw();
    switch (kw)
    {
    case sDescendant:
      for (j = 0; j < old.size(); j++)
      {
        start = (Element) old.elementAt(j);
        inOrder(start, ret, step, 0);
      }
      break;

    case sString:
      for (j = 0; j < old.size(); j++)
      {
        start = (Element) old.elementAt(j);
        searchForString(start, ret, step, 0);
      }
      break;

    case sID:
      ret.addElement(mIDs.get(step.type()));
      break;

    case sChild:
      ...
          for (k = 0; k < children.size() && !done; k++)
          {
            o = children.elementAt(k);
            if (step.match(o))
      ...
    case sRoot: case sHere: case sDitto: case sHTML:
    case sAncestor: case sPreceding: case sPSibling: 
    case sFollowing: case sFSibling: default:
      if (kw < sMaxKW)
        throw new Exception("Can't do '" + sKeywords[kw] + "' yet");
      else
        throw new Exception("Bogus KW value "+kw);

As the comment says, each XPointer object contains an array with each element being one of the XPointer's steps. First, look for a moment at the sChild code above. It uses the method step.match() to check whether one step matches a particular child.

The code for sDescendant also uses that code, but of course has to do an in-order traversal of the whole subtree. Note how few of the XPointer verbs are implemented.

This part of the code, following all the links and making the connections between the annotation and the spec, takes almost no time to run. Some of the elapsed time in running this program is spent in parsing the annotations file and the XML spec, but most of it goes into the task of loading these fairly large complex documents into trees in memory. This burns immense amounts of memory; the two files are together less than 500K in size, but the two trees in memory use well over 10 megabytes. There are some tricks that could be used to make the trees more compact, but even so, the lesson is that fully-parsed XML documents burn a lot of memory. This is one reason why it's a good idea to do as much work as possible through a stream API such as SAX. However, the annotation in particular and XPointer processing in general are examples of jobs that it would be really hard to do without having a tree in memory.

Step 4: Decorate the Target

At the end of all this work, the vector targets contains a list of all the locations in the XML spec that have annotations pointing at them. Next, the code runs through the XML spec and, for each element that is the target of a link, it adds a new child which is an HTML A element with an href= pointer pointing at the appropriate annotation file. This element is added as the last child of the annotated element, so the annotation symbol will appear at the end of the target. This is an arbitrary choice that worked fine for the Annotated Spec, but may not be appropriate for many other applications. If the XPointer points into the middle of a chunk of text, more work is required; the text has to be split into two nodes, and the HTML A element inserted between them:

for (k = 0; k < targets.size(); k++)
{
  Object o = targets.targetAt(k);
  Element lE = link.linkingElement();
  if (o instanceof Element)
  {
    ((Element) o).addChild(makeAnchor(lE, targets.roleAt(k)));
  }
  else if (o instanceof Text)
  {
    // OK, we're going to have to split this text node
    splitText((Text) o, targets.offsetAt(k),
      lE, targets.roleAt(k));
  }

Once all these extra elements have been spliced into the XML spec, writing out the annotated version is pretty easy. I already had a large amount of Java code that converts the XML to HTML, which is used to generate the HTML versions of the spec that you can find at the W3C site and elsewhere. This code works by having a Java class for each element; all I had to do was to add a new class to handle the new spliced-in A elements, which involved very little work aside from putting in the -style decorations depending on the role attribute:

  System.err.println("Done.  Writing target...");

  xroot = XLink.currentRoot();
  out = new PrintStream(new FileOutputStream("target.html"));
  if (xroot != null)
    printer.writeHTML(xroot, out, false);
  out.close();
  System.err.println("Done.  Writing notes...");

Step 5: Write the Annotations

Writing out the annotations is pretty easy too. They'd been placed in a vector named notes constructed earlier in the process. For each one, we open a file, dump in the HTML text (with a bit of extra decoration) and the job's done.

    System.err.println("Done.  Writing notes...");

    for (i = 0; i < notes.size(); i++)
    {
      if (notes.elementAt(i) instanceof Element)
      {
        child = (Element) notes.elementAt(i);
        title = child.attributeValue("id");
        out = new PrintStream(new FileOutputStream(
                    "notes/" + title + ".html"));
        out.println("<HTML><HEAD><TITLE>" 
              + title + "</TITLE>");
        ...

We use the value of the id attribute for the filename; that's why it is #REQUIRED.