XHTML is the Most Important XML Vocabulary
May 21, 2003
Taking the long view of recent technology, XHTML may be the most important XML vocabulary ever created. What I mean is not that XHTML will be the most widely deployed XML vocabulary, though if we take the long view, it could be. What I mean is that XHTML puts XML's reputation -- and, by extension, the W3C's reputation -- on the line to a greater degree than any other XML vocabulary. (Which, if true, makes XHTML 2.0's relative absence in discussion on XML-DEV a puzzle...)
There are, as XML.com readers are fully aware, thousands of XML vocabulary projects proposed, underway, or completed. They range from the simple to the sublime; well, probably not sublime, but at least crucial to some larger technical or social endeavor. But XHTML is the most crucial, for the reputations of both XML and the W3C, because it is the most visible, the most document-centric, and the most central to the future health and vitality of the Web itself.
As I wrote in an XML-Deviant column last summer ("XHTML 2.0: The Latest Trick"), ordinary Web designers and content creators quickly learned and used early versions of HTML because they were reasonably easy to grasp, and the reward for learning them was substantial. A computer-literate person can still learn to create XHTML 1.1 documents with modest effort and in a reasonable amount of time. Even if it takes a week of evenings to become comfortable with the main features of XHTML, that's a small investment to make for a relatively big return.
The Web's success, then, is due in part to the simplicity and generality of HTML. The ongoing success of the Web will be in part a function of maintaining a positive balance between how difficult and how empowering it is to learn XHTML. Some form of HTML, eventually XHTML, will always be the most common type of Web content; people will keep writing it by hand, building user interfaces with it, trying, and succeeding or failing, to scrape useful information from it, and so on. Any part of the Web's infrastructure with such a long future life cycle deserves careful, attentive, community shepherding.
XHTML 2.0 Continues to Evolve
The HTML Working Group released a new draft of XHTML 2.0 at the beginning of May. It is a draft which displays evidence that community feedback can make a difference to the development of a specification. In what follows I briefly comment on some of the most interesting bits of the new XHTML 2.0 draft.
The arrival of RELAX NG. Perhaps the most welcome development, particularly from the perspective of XML-DEV geeks, is the appearance of a normative RELAX NG schema for XHTML 2.0. This development is welcome because it signals a growing acceptance of RELAX NG -- a non-W3C schema specification language -- within the working groups of the W3C. It is also welcome because XHTML is among the most document-centric of all XML vocabularies, and having RELAX NG's fittingness for such vocabularies on display is a good thing.
The Edit Collection. The most striking difference between the Web as it evolved and the Web as it was intended (by, among others, Tim Berners-Lee) is the read-only nature of the Web for most of its users. The early vision of the Web was of a read-and-write medium, and the earliest Web browsers were read-and-write tools.
XHTML 2.0's section 6.4, "Edit Collection," adds back some support for Web content editing. The collection, according to the new draft, "allows elements to carry information [about] how, when and why content has changed." Particular XHTML 2.0 elements (including inline elements like <span>) can have an edit attribute, which can have one of four permissible values: inserted, deleted, changed, and moved. One of these values, deleted, carries with it a "default presentation" which, in CSS terms, is display: none. For those of us for whom XHTML is or will be an editorial workflow document format, the Edit Collection is a move in the right direction.
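To make the default presentation of deleted content concrete, here is a minimal Python sketch, using only the standard library's xml.etree.ElementTree, of what a simple processor might do: drop every element marked edit="deleted" (treating display: none as plain removal), keep the text that followed it, and reject values outside the four the draft permits. The function names are hypothetical, and the sketch assumes an unprefixed edit attribute; because it never inspects element names, it does not care whether the elements sit in the XHTML 2.0 namespace.

```python
import xml.etree.ElementTree as ET

# The four values the draft allows for the edit attribute.
EDIT_VALUES = {"inserted", "deleted", "changed", "moved"}


def _strip_deleted(parent: ET.Element) -> None:
    """Recursively detach children marked edit="deleted", keeping their tail text."""
    kept = []
    for child in list(parent):
        edit = child.get("edit")
        if edit is not None and edit not in EDIT_VALUES:
            raise ValueError(f"unknown edit value: {edit!r}")
        if edit == "deleted":
            # Text after the removed element now follows whatever preceded it.
            tail = child.tail or ""
            if kept:
                kept[-1].tail = (kept[-1].tail or "") + tail
            else:
                parent.text = (parent.text or "") + tail
            parent.remove(child)
        else:
            kept.append(child)
            _strip_deleted(child)


def render_without_deleted(markup: str) -> str:
    """Approximate the display: none default presentation of deleted content."""
    root = ET.fromstring(markup)
    if root.get("edit") == "deleted":
        return ""
    _strip_deleted(root)
    return ET.tostring(root, encoding="unicode")
```

Fed '<p>Price: <span edit="deleted">$10</span><span edit="inserted">$12</span> today.</p>', it returns the paragraph with only the inserted price left. A real editorial tool would more likely keep the deleted text and style it, which is exactly why the draft calls display: none a default rather than a rule.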
The return of style. Surely the most hotly contested XHTML 2.0 change was an early draft's removal of the style attribute, which allows CSS designers to apply local style code to XHTML constructs. The debate between those who wanted to remove and those who wanted to preserve the style attribute highlighted a fundamental cleft in the XHTML community between -- to put it not too tendentiously -- markup geeks and presentation weenies. Each side got a bit nasty during the debate, causing no small amount of schadenfreude among bemused onlookers. (The anti-style attribute position was most aptly argued by Ian Hickson in -- whether you agree with it or not -- a classic mailing list post in January of this year.)
The HTML Working Group has demonstrated, however, that it knows how to listen to community squabbles, and it has restored the style attribute in the latest XHTML 2.0 draft. I suspect, however, that we have not heard the last word on this issue, and I wouldn't be at all surprised if the style attribute finds itself out in the cold again at some point.
The revenge of the nerds: <blockcode>. Moving on to an issue nearer to my geeky heart, the Working Group has added an analogue of the venerable <blockquote> just for programmers: <blockcode>. My only complaint is that the similarity of the element names means a bit of my HTML muscle memory is going to have to be retrained. If you squint hard, <blockcode> is syntactic and semantic sugar for <pre><code> sequences. It can carry a class attribute, which may be used to indicate the type of code contained in the block. I suspect that this is probably semantically underdetermined, but first things first. Even though this new feature is of no interest to the great majority of non-programming XHTML users, I can't help counting it among my personal favorites.
I look forward to being able to do stuff like this:
<blockcode class="http://www.python.org/">
import mailbox
import sys

# Usage: filter.py INPUT_MBOX OUTPUT_MBOX SUBSTRING
mbox = mailbox.mbox(sys.argv[1])
new_mbox = mailbox.mbox(sys.argv[2])
substring = sys.argv[3]
for message in mbox:
    # Copy messages whose Subject header contains the substring
    if substring in (message['subject'] or ''):
        new_mbox.add(message)
new_mbox.close()  # close() also flushes pending writes to disk
</blockcode>
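Since <blockcode> amounts to a <pre><code> sequence, markup written the XHTML 2.0 way could be lowered mechanically for browsers that only understand HTML. The following is a minimal Python sketch of that rewrite using the standard library's xml.etree.ElementTree. The function name is hypothetical, the sketch assumes un-namespaced markup, and moving the class attribute onto the inner code element (rather than onto pre) is my own choice, not something the draft specifies.

```python
import xml.etree.ElementTree as ET


def blockcode_to_pre(markup: str) -> str:
    """Rewrite every <blockcode> element as <pre><code>...</code></pre>."""
    root = ET.fromstring(markup)
    # Snapshot the matches first, since the loop renames and restructures nodes.
    for block in list(root.iter("blockcode")):
        # The class attribute (e.g. a language URI) moves to the inner <code>.
        code = ET.Element("code", dict(block.attrib))
        code.text = block.text
        for child in list(block):
            block.remove(child)
            code.append(child)
        # Reuse the <blockcode> node as <pre> so its position and tail text survive.
        block.tag = "pre"
        block.attrib.clear()
        block.text = None
        block.append(code)
    return ET.tostring(root, encoding="unicode")
```

Reusing the original node, instead of building a fresh pre and splicing it in, keeps the element's place among its siblings and the text that follows it without any index bookkeeping.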
The return of <cite>. Paralleling the return of the style attribute, the cite element has also returned to the latest XHTML 2.0 draft. Though not as hotly contested as the removal of style, <cite> definitely has its fans and supporters, and I number myself among them. That is ironic since, in my experience, <cite> is by far the bit of HTML that XML.com authors most often misuse. It isn't used very often, but when XML.com authors use it, they almost always misuse it. In XHTML 2.0, cite takes a cite attribute, which I think would be better named "source", but that's merely a quibble.
Caption, Glorious Caption! I have been using HTML of one variety or another since 1995, and the thing I have most frequently lamented is the lack of a generic way to mark up a caption for images. As newspaper and other hard media geeks know, editorial images just about always demand some kind of captioning text, usually containing image metadata of some kind or another: author, date, copyright, etc. In these editorial contexts, the lack of a caption construct has meant faking it with redundant and vague table-and-paragraph constructs. The advent of CSS has alleviated the pain here somewhat, but it's long past time that a first-class caption construct was added to XHTML.
I am very pleased to report that the latest XHTML 2.0 draft contains a provision for a caption element, which may reside within either table or object elements. I applaud this rational, simplifying, and long overdue addition. There is more than enough evidence of the utility and need for exactly this kind of element.
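A first-class caption element also makes the newspaper habit of captioning every editorial image checkable by machine. Here is a minimal Python sketch, again with xml.etree.ElementTree, that lists the object elements lacking a caption. The function name is hypothetical, the sketch assumes un-namespaced markup, and treating only a direct caption child as "residing within" the object is my reading, not the draft's wording.

```python
import xml.etree.ElementTree as ET


def uncaptioned_objects(markup: str) -> list:
    """Return every <object> element that has no <caption> child."""
    root = ET.fromstring(markup)
    # find() searches direct children only, so a caption that belongs to a
    # nested object does not count for the object that contains it.
    return [obj for obj in root.iter("object") if obj.find("caption") is None]
```

For example, given '<div><object id="a"><caption>Photo: J. Smith, 2003</caption></object><object id="b"/></div>', only the object with id "b" is reported.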
XHTML 2.0 is headed in the right direction, even if you're among those who think that, for example, the style attribute should die a horrible death. Sometimes W3C working groups do not have much of an active user community with which to discuss their work. But in those lucky cases where there is such a community, working groups do well to pay careful attention to what that community wants and says. This general rule is even more important in the case of XHTML. Despite the widespread pessimism about XHTML's deployment, it is far, far too important to be left in the hands of a working group alone.