Flipping the Links
First, the program creates an instance of the Lark parser, and in one step, reads the annotations file:
annot = new Lark();
r = new XmlInputStream(new FileInputStream(args[0]));
System.err.println("Parsing Annotations...");
annot.buildTree(true);
annot.saveText(true);
aroot = annot.readXML(h, r);
After this operation, the variable aroot points to the root of
the annotations document tree.
This code runs through the annotations tree and puts all the XLinks that it finds in the variable xlinks:
Vector xlinks = new Vector();
findXLinks(aroot, xlinks);
System.out.println("xlinks: "+xlinks.size());
...
private static void findXLinks(Element e, Vector v)
...
XLink link = XLink.isLink(e, sIDs);
if (link != null) v.addElement(link);
...
children = e.children();
for (i = 0; i < children.size(); i++)
{
child = children.elementAt(i);
if (child instanceof Element)
findXLinks((Element) child, v);
}
This code looks at each element, then calls itself recursively on that
element's children. It uses the routine isLink to determine
whether an element is an XLink:
public static XLink isLink(Element e, Hashtable ids)
...
String form = e.attributeValue("xml:link");
if (form == null) return null;
else if (form.equals("simple") || form.equals("extended"))
{
XLink link = new XLink(ids);
link.loadFromElement(e, ids);
return link;
}
...
An element is an XLink if it has an attribute xml:link= whose
value is either simple or extended.
Each XLink, once identified, is stored into an XLink object for later re-use; these objects are what populate the xlinks vector mentioned above. The function which loads up the XLink data structures is the same for the extended and locator linking elements, since their makeup is almost identical.:
private void loadFromElement(Element e, Hashtable ids)
...
// load up role, content-role, etc. fields from attribute values
mRole = e.attributeValue("role");
..
splitHref(); // save the HRef and build an XPointer if there is one
Vector children = e.children();
for (i = 0; i < children.size(); i++)
{
if ( // check xml:link att to see if this child is a locator
{
XLink link = new XLink(ids);
link.loadFromElement((Element) child, ids);
mLocators.addElement(link);
Once the xlinks vector is filled up, we run through it, traversing those whose role is not "annotation" - in effect, this means we traverse only the spec XPointers:
findXLinks(aroot, xlinks);
...
System.err.println("Traversing...");
for (i = 0; i < xlinks.size(); i++)
{
XLink link = (XLink) xlinks.elementAt(i);
targets = link.traverseExcept("annotation");
if (targets.size() == 0)
{
System.out.println("Dangling Annotation!");
link.dump(System.out);
}
The routine traverseExcept returns the list of targets that
the corresponding XPointer points at. If that list is of size 0, it means
that the XPointer is broken and doesn't point at anything. Since I was
constructing the XPointers by hand, I saw this message a lot during the
construction of the Annotated Spec.
The example above doesn't show the actual traversal of a single link; here's the code which does that:
private void doTraverse(Vector ret, Element linkingElement)
...
if (// I haven't already parsed this instance
{
// parse the instance
System.err.println("Parsing target...");
Lark xml = new Lark();
...
sTargetRoot = xml.readXML(h, r);
System.err.println("Done.");
}
}
ret.addElement(sTargetRoot);
// if there's an XPointer, traverse that
if (mXP != null)
mXP.traverse(ret);
This code first looks to see whether the URL in the XLink has already been
parsed. If it hasn't, it makes a new parser and parses that document.
Since in this case, all the links are into the XML spec, this code
only gets executed once, causing the parsing of the XML spec when the first
spec locator element is traversed.
The code above is putting the results of each traversal in the vector ret. First of all, it inserts a pointer to the root of the target XML document, then checks to see if the XLink contains an XPointer (in the Annotated Spec, they all do) and if so, traverses that. The code for that is here:
public void traverse(Vector ret)
...
// an XPointer is stored as an array of steps in an obvious way
for (i = 0; i < mSteps.size(); i++)
{
step = (XPStep) mSteps.elementAt(i);
kw = step.kw();
switch (kw)
{
case sDescendant:
for (j = 0; j < old.size(); j++)
{
start = (Element) old.elementAt(j);
inOrder(start, ret, step, 0);
}
break;
case sString:
for (j = 0; j < old.size(); j++)
{
start = (Element) old.elementAt(j);
searchForString(start, ret, step, 0);
}
break;
case sID:
ret.addElement(mIDs.get(step.type()));
break;
case sChild:
...
for (k = 0; k < children.size() && !done; k++)
{
o = children.elementAt(k);
if (step.match(o))
...
case sRoot: case sHere: case sDitto: case sHTML:
case sAncestor: case sPreceding: case sPSibling:
case sFollowing: case sFSibling: default:
if (kw < sMaxKW)
throw new Exception("Can't do '" + sKeywords[kw] + "' yet");
else
throw new Exception("Bogus KW value "+kw);
As the comment says, each XPointer object contains an array with each element
being one of the XPointer's steps.
First, look for a moment at the sChild code above.
It uses the method step.match() to check whether one step
matches a particular child.
The code for sDescendant also uses that code, but of course has to do an in-order traversal of the whole subtree. Note how few of the XPointer verbs are implemented.
This part of the code, following all the links and making the connections between the annotation and the spec, takes almost no time to run. Some of the elapsed time in running this program is spent in parsing the annotations file and the XML spec, but most of it goes into the task of loading these fairly large complex documents into trees in memory. This burns immense amounts of memory; the two files are together less than 500K in size, but the two trees in memory use well over 10 megabytes. There are some tricks that could be used to make the trees more compact, but even so, the lesson is that fully-parsed XML documents burn a lot of memory. This is one reason why it's a good idea to do as much work as possible through a stream API such as SAX. However, the annotation in particular and XPointer processing in general are examples of jobs that it would be really hard to do without having a tree in memory.
At the end of all this work, the vector targets contains a list of all the locations in the XML spec that have annotations pointing at them. Next, the code runs through the XML spec and, for each element that is the target of a link, it adds a new child which is an HTML A element with an href= pointer pointing at the appropriate annotation file. This element is added as the last child of the annotated element, so the annotation symbol will appear at the end of the target. This is an arbitrary choice that worked fine for the Annotated Spec, but may not be appropriate for many other applications. If the XPointer points into the middle of a chunk of text, more work is required; the text has to be split into two nodes, and the HTML A element inserted between them:
for (k = 0; k < targets.size(); k++)
{
Object o = targets.targetAt(k);
Element lE = link.linkingElement();
if (o instanceof Element)
{
((Element) o).addChild(makeAnchor(lE, targets.roleAt(k)));
}
else if (o instanceof Text)
{
// OK, we're going to have to split this text node
splitText((Text) o, targets.offsetAt(k),
lE, targets.roleAt(k));
}
Once all these extra elements have been spliced into the XML spec, writing
out the annotated version is pretty easy.
I already had a large amount of Java code that converts the XML to HTML, which
is used to generate the HTML versions of the spec that you can find at
the W3C site and elsewhere.
This code works by having a Java class for each element; all I had to do was
to add a new class to handle the new spliced-in A elements,
which involved very little work aside from putting in the
-style decorations depending on the role
attribute:
System.err.println("Done. Writing target...");
xroot = XLink.currentRoot();
out = new PrintStream(new FileOutputStream("target.html"));
if (xroot != null)
printer.writeHTML(xroot, out, false);
out.close();
System.err.println("Done. Writing notes...");
Writing out the annotations is pretty easy too. They'd been placed in a vector named notes constructed earlier in the process. For each one, we open a file, dump in the HTML text (with a bit of extra decoration) and the job's done.
System.err.println("Done. Writing notes...");
for (i = 0; i < notes.size(); i++)
{
if (notes.elementAt(i) instanceof Element)
{
child = (Element) notes.elementAt(i);
title = child.attributeValue("id");
out = new PrintStream(new FileOutputStream(
"notes/" + title + ".html"));
out.println("<HTML><HEAD><TITLE>"
+ title + "</TITLE>");
...
We use the value of the id attribute for the filename; that's
why it is #REQUIRED.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.