Flipping the Links
September 12, 1998
Once I had authored all these hyperlinks, I faced the problem of how to display them. Unfortunately, in 1998 there weren't any Web browsers that could do the job, and in fact, I wanted this to be usable by anyone with a basic HTML browser. So my program had to turn the links around; instead of one XML file containing hundreds of links into another, I needed one HTML file (the annotated spec) containing hundreds of links, each to one little annotation file. All this was done in Java, as noted above. The rest of this article gives a step-by-step description of the program, and may be a bit challenging if you're not a Java programmer.
Step 1: Parse the Annotations
First, the program creates an instance of the Lark parser, and in one step, reads the annotations file:
After this operation, the variable aroot points to the root of the annotations document tree.annot = new Lark(); r = new XmlInputStream(new FileInputStream(args[0])); System.err.println("Parsing Annotations..."); annot.buildTree(true); annot.saveText(true); aroot = annot.readXML(h, r);
Step 2: Put the XLinks in a Vector
This code runs through the annotations tree and puts all the XLinks that it finds in the variable xlinks:
This code looks at each element, then calls itself recursively on that element's children. It uses the routine isLink to determine whether an element is an XLink:Vector xlinks = new Vector(); findXLinks(aroot, xlinks); System.out.println("xlinks: "+xlinks.size()); ... private static void findXLinks(Element e, Vector v) ... XLink link = XLink.isLink(e, sIDs); if (link != null) v.addElement(link); ... children = e.children(); for (i = 0; i < children.size(); i++) { child = children.elementAt(i); if (child instanceof Element) findXLinks((Element) child, v); }
An element is an XLink if it has an attribute xml:link= whose value is either simple or extended.public static XLink isLink(Element e, Hashtable ids) ... String form = e.attributeValue("xml:link"); if (form == null) return null; else if (form.equals("simple") || form.equals("extended")) { XLink link = new XLink(ids); link.loadFromElement(e, ids); return link; } ...
Each XLink, once identified, is stored into an XLink object for later re-use; these objects are what populate the xlinks vector mentioned above. The function which loads up the XLink data structures is the same for the extended and locator linking elements, since their makeup is almost identical.:
private void loadFromElement(Element e, Hashtable ids) ... // load up role, content-role, etc. fields from attribute values mRole = e.attributeValue("role"); .. splitHref(); // save the HRef and build an XPointer if there is one Vector children = e.children(); for (i = 0; i < children.size(); i++) { if ( // check xml:link att to see if this child is a locator { XLink link = new XLink(ids); link.loadFromElement((Element) child, ids); mLocators.addElement(link);
Step 3: Traverse the XLinks
Once the xlinks vector is filled up, we run through it, traversing those whose role is not "annotation" - in effect, this means we traverse only the spec XPointers:
The routine traverseExcept returns the list of targets that the corresponding XPointer points at. If that list is of size 0, it means that the XPointer is broken and doesn't point at anything. Since I was constructing the XPointers by hand, I saw this message a lot during the construction of the Annotated Spec.findXLinks(aroot, xlinks); ... System.err.println("Traversing..."); for (i = 0; i < xlinks.size(); i++) { XLink link = (XLink) xlinks.elementAt(i); targets = link.traverseExcept("annotation"); if (targets.size() == 0) { System.out.println("Dangling Annotation!"); link.dump(System.out); }
The example above doesn't show the actual traversal of a single link; here's the code which does that:
This code first looks to see whether the URL in the XLink has already been parsed. If it hasn't, it makes a new parser and parses that document. Since in this case, all the links are into the XML spec, this code only gets executed once, causing the parsing of the XML spec when the first spec locator element is traversed.private void doTraverse(Vector ret, Element linkingElement) ... if (// I haven't already parsed this instance { // parse the instance System.err.println("Parsing target..."); Lark xml = new Lark(); ... sTargetRoot = xml.readXML(h, r); System.err.println("Done."); } } ret.addElement(sTargetRoot); // if there's an XPointer, traverse that if (mXP != null) mXP.traverse(ret);
The code above is putting the results of each traversal in the vector ret. First of all, it inserts a pointer to the root of the target XML document, then checks to see if the XLink contains an XPointer (in the Annotated Spec, they all do) and if so, traverses that. The code for that is here:
As the comment says, each XPointer object contains an array with each element being one of the XPointer's steps. First, look for a moment at the sChild code above. It uses the method step.match() to check whether one step matches a particular child.public void traverse(Vector ret) ... // an XPointer is stored as an array of steps in an obvious way for (i = 0; i < mSteps.size(); i++) { step = (XPStep) mSteps.elementAt(i); kw = step.kw(); switch (kw) { case sDescendant: for (j = 0; j < old.size(); j++) { start = (Element) old.elementAt(j); inOrder(start, ret, step, 0); } break; case sString: for (j = 0; j < old.size(); j++) { start = (Element) old.elementAt(j); searchForString(start, ret, step, 0); } break; case sID: ret.addElement(mIDs.get(step.type())); break; case sChild: ... for (k = 0; k < children.size() && !done; k++) { o = children.elementAt(k); if (step.match(o)) ... case sRoot: case sHere: case sDitto: case sHTML: case sAncestor: case sPreceding: case sPSibling: case sFollowing: case sFSibling: default: if (kw < sMaxKW) throw new Exception("Can't do '" + sKeywords[kw] + "' yet"); else throw new Exception("Bogus KW value "+kw);
The code for sDescendant also uses that code, but of course has to do an in-order traversal of the whole subtree. Note how few of the XPointer verbs are implemented.
This part of the code, following all the links and making the connections between the annotation and the spec, takes almost no time to run. Some of the elapsed time in running this program is spent in parsing the annotations file and the XML spec, but most of it goes into the task of loading these fairly large complex documents into trees in memory. This burns immense amounts of memory; the two files are together less than 500K in size, but the two trees in memory use well over 10 megabytes. There are some tricks that could be used to make the trees more compact, but even so, the lesson is that fully-parsed XML documents burn a lot of memory. This is one reason why it's a good idea to do as much work as possible through a stream API such as SAX. However, the annotation in particular and XPointer processing in general are examples of jobs that it would be really hard to do without having a tree in memory.
Step 4: Decorate the Target
At the end of all this work, the vector targets contains a list of all the locations in the XML spec that have annotations pointing at them. Next, the code runs through the XML spec and, for each element that is the target of a link, it adds a new child which is an HTML A element with an href= pointer pointing at the appropriate annotation file. This element is added as the last child of the annotated element, so the annotation symbol will appear at the end of the target. This is an arbitrary choice that worked fine for the Annotated Spec, but may not be appropriate for many other applications. If the XPointer points into the middle of a chunk of text, more work is required; the text has to be split into two nodes, and the HTML A element inserted between them:
for (k = 0; k < targets.size(); k++) { Object o = targets.targetAt(k); Element lE = link.linkingElement(); if (o instanceof Element) { ((Element) o).addChild(makeAnchor(lE, targets.roleAt(k))); } else if (o instanceof Text) { // OK, we're going to have to split this text node splitText((Text) o, targets.offsetAt(k), lE, targets.roleAt(k)); }
Once all these extra elements have been spliced into the XML spec, writing out the annotated version is pretty easy. I already had a large amount of Java code that converts the XML to HTML, which is used to generate the HTML versions of the spec that you can find at the W3C site and elsewhere. This code works by having a Java class for each element; all I had to do was to add a new class to handle the new spliced-in A elements, which involved very little work aside from putting in the -style decorations depending on the role attribute:
System.err.println("Done. Writing target..."); xroot = XLink.currentRoot(); out = new PrintStream(new FileOutputStream("target.html")); if (xroot != null) printer.writeHTML(xroot, out, false); out.close(); System.err.println("Done. Writing notes...");
Step 5: Write the Annotations
Writing out the annotations is pretty easy too. They'd been placed in a vector named notes constructed earlier in the process. For each one, we open a file, dump in the HTML text (with a bit of extra decoration) and the job's done.
We use the value of the id attribute for the filename; that's why it is #REQUIRED.System.err.println("Done. Writing notes..."); for (i = 0; i < notes.size(); i++) { if (notes.elementAt(i) instanceof Element) { child = (Element) notes.elementAt(i); title = child.attributeValue("id"); out = new PrintStream(new FileOutputStream( "notes/" + title + ".html")); out.println("<HTML><HEAD><TITLE>" + title + "</TITLE>"); ...