Comments in a "No comment" World

July 30, 2003

John E. Simpson

Q: How do I control the formatting of comments?

I have the following piece of XML:

<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>

When viewed in the browser it looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel>
  <!-- TopLevel comment -->
  <SubLevel />
  <!-- SubLevel comment -->
</TopLevel>

This makes it hard to determine which element the comment refers to. Is there a way to have the comments output on the same line or is this dependent on the browser? Should comments generally be placed before or after the elements they refer to?

A: Commenting XML documents is not often given much attention in XML books and articles, for at least two reasons:

  • Technology: XML parsers generally pass comments on to whatever downstream applications use the parsers' output. But the XML Recommendation explicitly says they needn't do so. Thus, planning for special treatment of comments (as in your case) implies that you know your own parser's behavior.
  • Human nature: XML is no different from other areas of geek focus in at least one respect: its practitioners love (in some cases the word is no exaggeration) the nuts-and-bolts of character and parameter entities, Unicode, XSLT template rules, RDF and XML Schema, to say nothing of the software to process it all. In contrast to the near-geometric beauty of document content, comments -- potentially messy, free-form, undisciplined clots of text -- are dull and of little interest. And like white space added for readability, they can seem just so much noise clogging up the signal.

A vague whiff of disrepute also lingers over markup comments, dating back to (X)HTML's accepted practice of (ironically) requiring that they be "structured" in some cases: when their contents embed scripting language code, in order to hide the code from browsers that don't understand it, and so from those browsers' users. (I don't think of myself as a purist, but this hack has always seemed to me more sneaky than elegant, like painting your driveway black when it really needs fresh asphalt.)

At the same time, commenting code has a long and honorable history among software developers who recognize the importance of letting other developers know what's going on, especially in particularly thorny, inscrutable passages of code. So my heart leapt up when I read your question: Here, I thought, is somebody who wants to do something good, and wants to do it the right way.

If you want to tie the display of your code to the browser medium, you've got an uphill battle before you. You probably know that XML parsers must pass to downstream applications all non-markup characters, including white space (blanks, newlines, tabs) included for readability. Furthermore, you can explicitly signal that applications should preserve this white space by adding an xml:space="preserve" attribute to your document's root element. The problems begin once the parser hands off the data to the requesting application, such as a browser. Unconstrained by the XML Recommendation (which affects the behavior only of parsers or "processors" as the spec calls them) or, really, by much of anything except perhaps their developers' nobility of purpose, browsers can do pretty much whatever they want with comments or white space.
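The parser half of this is easy to observe outside a browser. What follows is a minimal sketch in Python (not one of the tools discussed in this column), using only the standard library's xml.etree.ElementTree and assuming Python 3.8 or later. The default parser quietly discards comments, as the Recommendation permits, while a parser built with TreeBuilder(insert_comments=True) passes them along. The helper name comment_texts is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>
"""


def comment_texts(root):
    """Return the text of every comment node the parser kept under root."""
    return [node.text for node in root.iter() if node.tag is ET.Comment]


# Default parser: comments are dropped before the application sees them.
default_root = ET.fromstring(SAMPLE)

# Opt-in parser: the tree builder is told to keep comments (Python 3.8+).
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
keeping_root = ET.fromstring(SAMPLE, parser=keeping_parser)

print(comment_texts(default_root))
print(comment_texts(keeping_root))
```

The first list comes back empty and the second holds both comments, which is the practical meaning of "know your own parser's behavior."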

But if you're determined to do this right, and do it in a browser to boot, then you can always rely on XSLT to force the formatting for you. This can effect almost any kind of look you want, from a bare-bones "plain old text" to a more elaborate JavaDoc-like appearance. As you've already seen, the challenge is to display not just the document's contents, but its structure.

One simple approach is suggested by G. Ken Holman's SHOWTREE stylesheet, available in both non-Microsoft and Microsoft-specific versions. Holman's stylesheet uses a numbering system to indicate "how far down" a particular node exists in the document tree. For example, applied to the document you supplied in your question, the SHOWTREE output looks like this:

SHOWTREE Stylesheet - http://www.CraneSoftwrights.com/resources/
Processor: SAXON 6.2.2 from Michael Kay
1 Proc. Inst. 'xml-stylesheet': {type="text/xsl" href="showtree-20000610.xsl" }
2 Element 'TopLevel':
2.1 Text (TopLevel): { }
2.2 Comment (TopLevel): { TopLevel comment }
2.3 Text (TopLevel): {
}
2.4 Element 'SubLevel' (TopLevel):
2.5 Text (TopLevel): { }
2.6 Comment (TopLevel): { SubLevel comment }
2.7 Text (TopLevel): {
}

A fancier approach is the Pretty XML Tree Viewer developed by Mike Brown (with help from Jeni Tennison). This actually consists of not just an XSLT stylesheet, but a CSS stylesheet as well. The CSS stylesheet augments the display of the XHTML code to which your XML is (via the XSLT stylesheet) transformed. Brown's stylesheets together render your sample document like this:

[Figure: Pretty Tree Viewer XML-to-XHTML output]

By the way, both Holman's and Brown's stylesheets highlight one potentially troublesome aspect of the code fragment you supplied. This has to do with the placement of comments and white space not physically, but structurally, relative to their corresponding elements. To wit: Everything in the document is a child of the TopLevel element. Even the comment which (to a human reader's eye) "belongs with" the SubLevel element actually is a child of TopLevel -- because SubLevel, being an empty element, contains nothing at all.
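If you would like to confirm that parent-child structure without either stylesheet, here is a rough sketch using Python's standard xml.dom.minidom module. It is a stand-in for illustration only, not part of Holman's or Brown's tools, and the names KIND and describe_children are invented here. It lists the direct children of TopLevel and then counts the children of SubLevel.

```python
from xml.dom import minidom

# The sample document from the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>
"""

# Human-readable labels for the node types that occur in the sample.
KIND = {
    minidom.Node.ELEMENT_NODE: "element",
    minidom.Node.TEXT_NODE: "text",
    minidom.Node.COMMENT_NODE: "comment",
}


def describe_children(element):
    """Return (kind, value) pairs for the direct children of element."""
    rows = []
    for child in element.childNodes:
        kind = KIND.get(child.nodeType, "other")
        value = child.tagName if kind == "element" else child.data
        rows.append((kind, value))
    return rows


doc = minidom.parseString(SAMPLE)
top = doc.documentElement
rows = describe_children(top)
for position, (kind, value) in enumerate(rows, start=1):
    print(position, kind, repr(value))

# SubLevel is empty, so neither comment can be its child.
sub_level = top.getElementsByTagName("SubLevel")[0]
print(len(sub_level.childNodes))
```

The seven children reported for TopLevel line up with nodes 2.1 through 2.7 in the SHOWTREE listing, and SubLevel reports none.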

In addition, the white space (sometimes consisting of a single newline) can clutter the output if you're going the XSLT route. For instance, the SHOWTREE output above shows four text nodes you probably weren't even thinking of as present in your document. In this case, you might want to strip out all "insignificant" white space at the time you do the transformation.
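In XSLT itself, the usual tool for this is the top-level xsl:strip-space element. To show the effect in isolation, here is a small Python sketch (again using the standard xml.dom.minidom module as a stand-in, with a made-up helper name, strip_whitespace_nodes) that removes every text node consisting solely of XML white space characters.

```python
from xml.dom import minidom

# The sample document from the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>
"""

# The four characters XML itself counts as white space.
XML_WHITE_SPACE = " \t\r\n"


def strip_whitespace_nodes(node):
    """Remove white-space-only text nodes from node and its descendants."""
    for child in list(node.childNodes):  # copy: we mutate while looping
        if child.nodeType == child.TEXT_NODE:
            if not child.data.strip(XML_WHITE_SPACE):
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)
    return node


doc = minidom.parseString(SAMPLE)
root = strip_whitespace_nodes(doc.documentElement)
remaining = [child.nodeName for child in root.childNodes]
print(remaining)
```

After stripping, only the two comments and the SubLevel element remain as children of TopLevel. Be careful with this in documents that use mixed content, where a lone space between inline elements can be significant.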

On the issue of where to place the comments, it depends on whether the comments are brief and in-line, as in your example, or set up as larger blocks. For the former, at the end of the line seems to make better sense. As for the latter, my own preference is to read the comment before the corresponding bit of code. (In everyday terms, this translates to a distaste for "What the heck was that?" experiences.) While there's no official rulebook for such things, I'd bet that this is the de facto standard for most block-style comments in XML documents as in programming source code.

Q: Any tools for rendering an XML schema's annotations?

I am looking for a way (in XSLT, I guess) to extract the documentation in an XML schema contained inside (standard) xsd:annotation/xsd:documentation elements to a human-readable document, ideally DocBook or HTML.

A: Start with Chris Maden's xsd2html ("XSD-documenting XSLT stylesheet"), tweaking it to your own needs. (Note especially that the XML Schema namespace prefix used in this stylesheet is xs:, not xsd:. If you prefer the latter prefix, be sure at least to change all occurrences of xs: to xsd: in your copy of Maden's stylesheet.) The result tree vocabulary in this case is HTML.

Ironically, this schema-documenting tool is itself undocumented, but the simple structure makes tweaking easy. Each XML Schema element and attribute has at least one template rule. In some cases, specific occurrences of a given element within the node tree are elevated to special status. For instance, xs:documentation elements are generally transformed to simple HTML p elements, except in the case of the first xs:documentation child of the first xs:annotation child of the root xs:schema element. This first (and presumably, "most important") xs:documentation element's content is promoted to full-blown h1 status for large display.
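The special-case rule just described can be sketched in a few lines. The following is not Maden's stylesheet, only a hedged Python approximation of that one rule, using the standard xml.etree.ElementTree module. The function name render_documentation and the tiny sample schema are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# ElementTree matches on the namespace URI, whatever prefix the schema uses.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, invented for this illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation>Hypothetical sample schema</xs:documentation>
    <xs:documentation>Second note in the first annotation</xs:documentation>
  </xs:annotation>
  <xs:element name="item" type="xs:string">
    <xs:annotation>
      <xs:documentation>An   item,
        described loosely</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>"""


def render_documentation(schema_text):
    """Turn each xs:documentation element into an HTML h1 or p string."""
    root = ET.fromstring(schema_text)
    # First xs:documentation child of the first xs:annotation child of root.
    first_doc = None
    first_annotation = root.find(XS + "annotation")
    if first_annotation is not None:
        first_doc = first_annotation.find(XS + "documentation")
    html = []
    for doc in root.iter(XS + "documentation"):
        # Collapse runs of white space so the text reads as one line.
        text = " ".join("".join(doc.itertext()).split())
        tag = "h1" if doc is first_doc else "p"
        html.append(f"<{tag}>{escape(text)}</{tag}>")
    return html


for line in render_documentation(SCHEMA):
    print(line)
```

Only the very first xs:documentation element becomes an h1; every other one, including the second in the same xs:annotation, becomes a p.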

What does the output of Maden's stylesheet look like? Well, we can use it to view (for example) the XML Schema 1.0 "Schema for Schema Structures." This is a heavily annotated schema (as well it should be). One fragment from the beginning of this schema looks like the following:

<xs:schema targetNamespace="http://www.w3.org/2001/XMLSchema"
   blockDefault="#all" elementFormDefault="qualified"
   version="Id: XMLSchema.xsd,v 1.48 2001/04/24 18:56:39 ht Exp "
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xml:lang="EN">

  <xs:annotation>
    <xs:documentation
       source="http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/structures.html">
      The schema corresponding to this document is normative,
      with respect to the syntactic constraints it expresses in the
      XML Schema language. The documentation (within
      &lt;documentation> elements)
      below, is not normative...</xs:documentation>
  </xs:annotation>
  <xs:annotation>
    <xs:documentation>
      The simpleType element and all of its members are defined
      in datatypes.xsd</xs:documentation>
  </xs:annotation>

...[etc.]...

</xs:schema>

As you can see, there's a lengthy xs:documentation element at the outset, followed by a shorter one which precedes the (imported) declaration of the simpleType element. The first xs:documentation element renders as follows in a browser:

[Figure: xsd2html: a schema's first xs:documentation element]

And the second like this:

[Figure: xsd2html: one of a schema's later xs:documentation elements]

One of the first things you might want to tweak, obviously, is the set of assumptions about the first xs:documentation element in the schema. In the first place, its transformation to an HTML h1 element looks freakishly large; at least it does when the element's content is lengthy. Also, the element's content all gets dumped into the browser title bar.
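One possible tweak, offered only as a sketch of the idea rather than a change to Maden's actual stylesheet, is to use just the first sentence of that element for the heading and title, cut off at a word boundary when it runs long. The Python helper below is hypothetical (the name short_title and the 60-character default are arbitrary choices); the same logic could be expressed in XSLT with substring-before() and string-length().

```python
def short_title(text, limit=60):
    """Return the first sentence of text, trimmed to roughly limit chars."""
    # Collapse line breaks and indentation into single spaces.
    words = " ".join(text.split())
    # Naive sentence split: everything before the first ". ".
    first_sentence = words.split(". ")[0].rstrip(".")
    if len(first_sentence) <= limit:
        return first_sentence
    # Cut at the last space inside the limit and mark the truncation.
    clipped = first_sentence[:limit].rsplit(" ", 1)[0]
    return clipped.rstrip(",;:") + "..."


print(short_title("Purchase order schema. More detail here."))
print(short_title(
    "The schema corresponding to this document is normative, "
    "with respect to the syntactic constraints it expresses in the "
    "XML Schema language. The documentation below is not normative."
))
```

The full text of the element can then be rendered as an ordinary paragraph beneath the shortened heading.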

I asked Maden what his real-world experience had been with using the xsd2html stylesheet. He said he's used it principally in work for clients, but added: "Be sure to mention that this is very beta -- or even alpha -- but it's definitely open source, and I'd encourage folks to share improvements they make." He assured me any such improvements "will be credited to [their developers] in future releases."

I also checked with Eric van der Vlist, author of O'Reilly & Associates' XML Schema, to see if he knew of any other tools to perform this task.

Van der Vlist says he applied a series of home-grown XSLT transformations ("nothing I would dare to show, at least nothing generic enough") to the W3C XML Schema schema (referenced above) in order to produce chapters 15 and 16 of his book. The main task: "to simplify [the schema] into something usable." He added, "That's probably one of [the] trickiest jobs I have done with XSLT."

I can believe it. Luckily, if all you are really interested in are the xs:annotation and xs:documentation elements, your task should be much, much simpler.