XML.com: XML From the Inside Out


Mix and Match Markup: XHTML Modularization

May 02, 2001

XHTML Modularization makes it convenient to create specialized versions of XHTML: subsets with tailored content models and extensions in other namespaces. XHTML Modularization may be one of the most important new technologies of 2001. This article introduces the basics of XHTML Modularization. The same approach can be used with many XML languages.


HTML is the major markup language used in the world today. However, it is a source of wonder that the design of something so important should have been so out of control.

  • HTML started underdefined or, at least, with not much control,
  • but then proprietary extensions and incompatibility abounded,
  • so then attempts were made to rein in HTML by providing a DTD,
  • but it turned out that several DTDs were needed (strict, loose, and with frames) to manage the variants,
  • so the XML namespace mechanism was developed in part to allow more control (better labeling) of proprietary and standard extensions,
  • but the early assumption of some that namespace = DTD proved inappropriate for HTML: generic processors were not interested in which HTML DTD was being used; they needed all HTML variants to belong to the same namespace.

The resulting situation, circa late 1999, was that the namespace mechanism and the three de facto standard W3C DTDs were not enough to meet the underlying requirement. Vendors and users needed to create supersets and subsets more conveniently.

The advent of W3C XML Schema seemed as if it might help, since it is namespace-aware and provides datatyping. But W3C XML Schema was clearly not being designed with XHTML's special needs as a primary requirement. Rather, the Schema Requirements document was concerned with reconstructing DTDs in general and with supporting Java and SQL interchange. Convenience of use did not seem high on the agenda.

Furthermore, some of the key players in XHTML development had pragmatic and serious reservations about the applicability of W3C XML Schema, in particular about whether it would offer much improvement over DTDs as far as XHTML's specific requirements were concerned. One of the leading voices of concern in this area was Sun's guru Murray Altheim, whose web site, www.doctypes.org, carries the heading "Long Live DTDs! (We're not exactly anti-schema, but we're sure pro-DTD.)" If W3C XML Schema turns out to be good for XHTML modularization, it will not be because the early critics were wrong, but because they persisted and won through.

What Altheim and others recognized was that a layer was missing: one that would enable mix-and-match selection of components even within a namespace. So, for example, the mobile phone industry could define its own version of XHTML with the subset required (for example, to have no frames or tables) and with whatever other extensions are needed from other namespaces. From this realization came the XHTML Modularization project at W3C. A W3C Recommendation has been published at http://www.w3.org/TR/xhtml-modularization/.

("Modularization" is abbreviated M12N, in an obvious analogue to the way "internationalization" is abbreviated as i18n.)

XHTML Modularization is essentially a set of conventions for splitting up a large DTD into modules. Following these conventions, XHTML has been split into modules, which have been made available to the public.

Modularization works by providing a construct coarser-grained than the element and finer-grained than the entire HTML namespace. The purpose of modularization is to allow someone, perhaps not an expert in DTDs or Schemas, to restrict and extend their own version of HTML. Using modules means they won't leave something out by accident, and it gives them placeholders for extensions and restrictions that are convenient and visible to others. So modularization does not actually alter the expressive power of DTDs or W3C XML Schema. Instead, it provides an abstract model and practical conventions for how to organize a DTD or Schema.
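
As a rough illustration of such placeholders, a top-level DTD file might switch whole modules on or off with conditional sections. The fragment below is only a sketch and is not quoted from the Recommendation: the entity names (my-table.module and so on) and the file names are hypothetical, chosen to resemble the general style of modular DTDs.

```xml
<!-- Hypothetical top-level DTD for a small-device subset.
     All entity and file names here are illustrative assumptions. -->

<!-- Placeholders: one switch per optional module -->
<!ENTITY % my-table.module  "IGNORE" >    <!-- no tables -->
<!ENTITY % my-frames.module "IGNORE" >    <!-- no frames -->
<!ENTITY % my-image.module  "INCLUDE" >   <!-- keep images -->

<!-- A module is read only when its switch expands to INCLUDE -->
<![%my-table.module;[
  <!ENTITY % my-table.mod SYSTEM "my-table.mod" >
  %my-table.mod;
]]>

<![%my-image.module;[
  <!ENTITY % my-image.mod SYSTEM "my-image.mod" >
  %my-image.mod;
]]>
```

Because the switches sit together at the top of the file, someone who is not a DTD expert can see at a glance which parts of the language are in or out, and can change one keyword rather than editing content models.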

As the abstract to the Recommendation Modularization of XHTML puts it,

This Recommendation specifies an abstract modularization of XHTML and an implementation of the abstraction using XML Document Type Definitions (DTDs). This modularization provides a means for subsetting and extending XHTML, a feature needed for extending XHTML's reach onto emerging platforms.

The Module Model

XHTML modularization has the following features.

Driver
The top-level document, which collects all the modules together. This document only references other DTDs, and does not contain any markup declarations itself.
Module
An entity which collects the various declarations relating to a distinct type of content. There are some core (required) modules, which give the minimum declarations that a document type always needs in order to still be XHTML (or whatever language is being modularized), but most modules are optional.
Redefinable components
The parameter entities in the HTML DTDs were analyzed to discover how they were being used, and categories were introduced to handle the different uses. In DTDs these are still implemented with parameter entities, but by convention each parameter entity name carries a suffix giving its category. In a W3C XML Schema implementation, many of the categories have more direct analogs. The categories include:
  • .content for general content models
  • .class for substitution groups
  • .mix for mixed content models
  • .attrib for attribute groups
You customize XHTML by redefining the declarations in those categories.
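
In a DTD, this kind of redefinition relies on a rule of XML itself: when the same entity is declared more than once, the first declaration is the binding one. A top-level DTD can therefore declare its own version of a parameter entity before it pulls in a module, and the module's declaration then acts only as a default. The two files below are a minimal sketch of that idea; the file names and the tiny list vocabulary are hypothetical and are not the actual XHTML modules.

```xml
<!-- driver.dtd (hypothetical top-level file) -->

<!-- Redefinition: restrict list items to plain text.
     Declared first, so this version is the one that counts. -->
<!ENTITY % li.content "( #PCDATA )" >

<!ENTITY % my-list.mod SYSTEM "my-list.mod" >
%my-list.mod;
```

```xml
<!-- my-list.mod (hypothetical module) -->

<!ENTITY % Inline.class  "em | strong" >
<!ENTITY % Common.attrib "id ID #IMPLIED" >

<!-- Default content model; skipped if li.content was declared earlier -->
<!ENTITY % li.content "( #PCDATA | %Inline.class; )*" >

<!ELEMENT ul     ( li )+ >
<!ELEMENT li     %li.content; >
<!ATTLIST li     %Common.attrib; >
<!ELEMENT em     ( #PCDATA ) >
<!ELEMENT strong ( #PCDATA ) >
```

With the driver's declaration in place, li accepts only text; delete that one line and the module's default, which also allows em and strong, takes effect. The module file itself is never edited.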

The abstract modules into which XHTML has been divided are:

  • Structure (core)
  • Text (core)
  • Hypertext (core)
  • List (core)
  • Applet
  • Text Extension
  • Forms
  • Table
  • Image
  • Client-side Image Map
  • Server-side Image Map
  • Object
  • Frames
  • Target
  • Iframe
  • Intrinsic Events
  • Metainformation
  • Ruby Annotations
  • Scripting
  • Style-sheet
  • Style Attribute
  • Link
  • Base
  • Name Identification

There are other modules for low-level housekeeping, and it is possible that some other modules, notably for MathML, will be used. So the modularization approach will also help make extensions of XHTML and HTML within the HTML namespace practical, which some may see as ironic: the purpose of XML Namespaces is to allow modularity, yet there are good reasons why, in the case of XHTML, the namespace should be allowed to grow. Document-type evolution does not always mean merely adding a new element in a new namespace. Even document types with a single namespace may need to be maintained and grow.
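
One way a DTD can leave room for that kind of growth is an extension hook: a parameter entity that is empty by default and that a customized version fills in. The fragment below is a sketch under assumed names; Inline.extra, Inline.class, and the keycap element are used here for illustration only and are not quoted from the published modules.

```xml
<!-- Illustrative external DTD; all names are assumptions.
     (Parameter entity references inside declarations are only
     allowed in an external DTD, not in an internal subset.) -->

<!-- Customization: add a new inline element through the hook.
     In practice the keycap declaration would live in its own module. -->
<!ENTITY % Inline.extra "| keycap" >
<!ELEMENT keycap ( #PCDATA ) >

<!-- Module defaults, normally kept in a separate .mod file -->
<!ENTITY % Inline.extra "" >   <!-- ignored: already declared above -->
<!ENTITY % Inline.class "em | strong %Inline.extra;" >

<!ELEMENT p      ( #PCDATA | %Inline.class; )* >
<!ELEMENT em     ( #PCDATA ) >
<!ELEMENT strong ( #PCDATA ) >
```

Here p ends up accepting em, strong, and keycap. Without the first declaration the hook expands to nothing and p accepts only em and strong, so the same module serves both the base language and the extended one.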
