XML.com: XML From the Inside Out

Extending the Web: XHTML Modularization
by Kendall Grant Clark

A Brief Overview of XHTML Module Creation

Adapting Hudak's DSL-creation process is useful, but step 3 is inaccurate in a way worth remarking. Because XHTML is really a collection of modules fitted together to act like HTML 4, one can equally embed a new module within XHTML, thus using XHTML like a host language, or embed some XHTML modules into a separate markup language, thus using XHTML as an integration tool.

The modularization of XHTML makes both patterns of embedding possible, but in what follows I concentrate on extending rather than integrating XHTML. Extending XHTML involves nontrivial XML DTD hacking. The W3C aims to make XHTML modules implementable using W3C XML Schema. For many people W3C XML Schema hackery is more off-putting than XML DTD hackery.

The easiest way to extend XHTML is to do it informally, without any of the formal machinery. The XHTML specification requires XHTML instances to be valid; there is no such thing as well-formed but invalid XHTML. If you add elements or attributes to XHTML but do not go through the DTD or Schema hacking contortions to extend it formally, the resulting instances may be well-formed XML, but they aren't formally XHTML. Depending on what you need, that may or may not matter.

A DTD specifies elements, element attributes, and element content models. It follows, then, that an XHTML extension module specifies elements, element attributes, or element content models. You extend XHTML by adding elements, adding attributes, modifying content models, or some combination of these. The concrete implementation of an XHTML module requires both a qname (qualified name) module, which does namespace handling, and a declaration module, which holds the element, element attribute, and content model declarations. The declaration module uses the parameter entities declared in the qname module.

The qname Module

Through clever use of INCLUDE and IGNORE sections, the qname module declares all the qualified names of the XHTML module, including whether or not XML namespaces are used. A qname module contains at least five parameter entities (see Norm Walsh's "What is XML?" for a refresher on parameter entities), plus one for each new element the module declares; the names of these parameter entities are formed with the name of the module being defined.

So, for example, if you were building an XHTML module for FAQs for the United Nations, you might name your module "unfaq" and put the following parameter entities into the qname module.

First, unfaq.prefixed, whose default value is "%NS.prefixed;", declares whether or not unfaq's elements are to be used with XML namespace prefixed names. The parameter entity that unfaq.prefixed points to, NS.prefixed, defaults to IGNORE, so prefixing is off unless it is explicitly turned on.

Second, unfaq.xmlns, which contains unfaq's namespace URI, http://www.un.org/XML/XHTML/faq/1.0/.

Third, unfaq.prefix, which contains unfaq's default prefix string, which is used when prefixing is turned on: "unfaq".

Fourth, unfaq.pfx, whose value is %unfaq.prefix; followed by a colon if prefixing is turned on; otherwise its value is the empty string.

Fifth, unfaq.xmlns.extra.attrib, which has as its value the declaration of any XML namespace attributes for any namespaces used in the unfaq module.

Sixth, for every element defined by unfaq, the qname module contains a parameter entity that holds the qualified name. If unfaq declares three new elements -- faqItem, question, and answer -- its qname module would have parameter entities unfaq.faqItem.qname (value: %unfaq.pfx;faqItem), unfaq.question.qname (value: %unfaq.pfx;question), and unfaq.answer.qname (value: %unfaq.pfx;answer). Thus, if prefixing is turned on, the faqItem element will be written as <unfaq:faqItem>, otherwise it will be <faqItem>.
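
The six kinds of parameter entity just listed can be sketched as a single qname module. This is a minimal illustration for the hypothetical unfaq module, not a published DTD: the local declaration of NS.prefixed is an assumption made so the fragment stands alone (in a real XHTML family DTD the framework or driver supplies it), and the empty value of unfaq.xmlns.extra.attrib assumes unfaq references no other namespaces.

```dtd
<!-- unfaq qname module (illustrative sketch) -->

<!-- Assumption: declared here only so the fragment stands alone.
     If a driver has already declared NS.prefixed, that earlier
     declaration wins and this one is ignored. -->
<!ENTITY % NS.prefixed "IGNORE" >

<!-- 1. Prefixing switch; follows the global setting by default -->
<!ENTITY % unfaq.prefixed "%NS.prefixed;" >

<!-- 2. Namespace URI of the module -->
<!ENTITY % unfaq.xmlns "http://www.un.org/XML/XHTML/faq/1.0/" >

<!-- 3. Default prefix string, used when prefixing is on -->
<!ENTITY % unfaq.prefix "unfaq" >

<!-- 4. Prefix plus colon when prefixing is on. The conditional
     section comes first because the first declaration of an
     entity is the one that binds. -->
<![%unfaq.prefixed;[
<!ENTITY % unfaq.pfx "%unfaq.prefix;:" >
]]>
<!ENTITY % unfaq.pfx "" >

<!-- 5. Namespace attributes for any other namespaces the module
     uses; assumed empty for unfaq -->
<!ENTITY % unfaq.xmlns.extra.attrib "" >

<!-- 6. One qualified-name entity per element -->
<!ENTITY % unfaq.faqItem.qname  "%unfaq.pfx;faqItem" >
<!ENTITY % unfaq.question.qname "%unfaq.pfx;question" >
<!ENTITY % unfaq.answer.qname   "%unfaq.pfx;answer" >
```

With NS.prefixed left at IGNORE, %unfaq.faqItem.qname; expands to faqItem; a driver that declares NS.prefixed as INCLUDE before this module is read would make it expand to unfaq:faqItem.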

The declaration Module

The declaration module of an XHTML extension module contains the actual declarations of all elements, element attributes, and element content models which together constitute the module. In the case of the hypothetical UN FAQ module, the declaration module would contain declarations for the three elements, faqItem, question, and answer. The ATTLIST for each element, in addition to the attributes required by the content domain itself, must also reference the parameter entity %NS.decl.attrib; if prefixing is turned on. If prefixing is turned off, the ATTLIST must also include the specific namespace information for the module.

The trick to building the declaration module is to remember to make declarations about the parameterized structures from the qname module; that is, the declaration module looks like an ordinary XML DTD, except that it uses qualified names via parameter entities. For example,

<!ELEMENT %unfaq.faqItem.qname;  ( %unfaq.question.qname;, 
                                   %unfaq.answer.qname; )       >
<!ATTLIST %unfaq.faqItem.qname; 
          %unfaq.xmlns.attrib;                                  >

<!ELEMENT %unfaq.question.qname; ( #PCDATA )                    >
<!ATTLIST %unfaq.question.qname; 
          %unfaq.xmlns.attrib;                                  >

<!ELEMENT %unfaq.answer.qname;   ( #PCDATA )                    >
<!ATTLIST %unfaq.answer.qname; 
          %unfaq.xmlns.attrib;                                  >
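
It may help to see what these declarations amount to once the parameter entities are expanded. The listing references %unfaq.xmlns.attrib;, which is not among the six entities described earlier; the sketch below assumes it is defined as a fixed namespace declaration attribute for the module and that prefixing is turned on. Both the definition and the use of plain CDATA as the attribute type are assumptions made for illustration.

```dtd
<!-- Assumed definition of unfaq.xmlns.attrib with prefixing on;
     the article does not spell this entity out -->
<!ENTITY % unfaq.xmlns.attrib
     "xmlns:unfaq CDATA #FIXED 'http://www.un.org/XML/XHTML/faq/1.0/'" >

<!-- Roughly what a parser sees after entity expansion -->
<!ELEMENT unfaq:faqItem  ( unfaq:question, unfaq:answer ) >
<!ATTLIST unfaq:faqItem
          xmlns:unfaq CDATA #FIXED 'http://www.un.org/XML/XHTML/faq/1.0/' >

<!ELEMENT unfaq:question ( #PCDATA ) >
<!ATTLIST unfaq:question
          xmlns:unfaq CDATA #FIXED 'http://www.un.org/XML/XHTML/faq/1.0/' >

<!ELEMENT unfaq:answer   ( #PCDATA ) >
<!ATTLIST unfaq:answer
          xmlns:unfaq CDATA #FIXED 'http://www.un.org/XML/XHTML/faq/1.0/' >
```

With prefixing turned off, the same declarations would use the bare names faqItem, question, and answer, and the namespace attribute would be an unprefixed xmlns.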

The New DTD

The last step is to create the actual DTD machinery for the XHTML extension. It references both the XHTML modules and the new modules which contain the new semantics.

The new model module must be combined with XHTML's other model modules in a new DTD. The various qname modules corresponding to each model module must be collected into a new module, the qualified names collection, which contains all the qualified names for the extended XHTML markup language. The qualified names collection module must reference the qname module for each extension module defined; in this case, unfaq. It must also declare the XHTML.xmlns.extra.attrib parameter entity, whose value collects the xmlns.extra.attrib parameter entities of the extension modules; in this case, just unfaq.xmlns.extra.attrib.
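
A qualified names collection for the single unfaq extension might look roughly like the following. The file names are hypothetical, and a collection covering several extension modules would reference each of their qname modules in turn and list each module's xmlns.extra.attrib entity in the combined value.

```dtd
<!-- Qualified names collection (sketch); file names are
     hypothetical -->

<!-- Reference the qname module of each extension module;
     here there is only unfaq -->
<!ENTITY % unfaq-qname.mod SYSTEM "unfaq-qname-1.mod" >
%unfaq-qname.mod;

<!-- Collect the extension modules' extra namespace attributes
     into the entity XHTML uses on its own elements -->
<!ENTITY % XHTML.xmlns.extra.attrib
     "%unfaq.xmlns.extra.attrib;" >
```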

In addition to a file constituting the collection of qname modules, the new markup language needs a driver file, which is read by validating parsers and other XML tools in order to validate the new markup language. The driver file must declare a parameter entity, XHTML.version, the value of which is the formal public identifier for the newly created markup language; for example,

"-//United Nations//DTD XHTML Frequently Asked Questions Extension 1.0//EN"

The driver file must contain a declaration of the parameter entity xhtml-qname-extra.mod, the value of which points to the qnames collection module for the newly created markup language. Next, the driver must contain a declaration of the parameter entity xhtml-model.mod, the value of which points to the model module for the new markup language. Finally, the driver points to the declaration module of the new markup language as well as to the XHTML modules themselves, including the module that holds all the assorted modularization machinery, the XHTML Modularization Framework Module.
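
Putting those requirements together, a driver file could be sketched as below. This is a fragment under stated assumptions rather than a complete, validating DTD: the three unfaq file names are hypothetical, only one XHTML module (the text module) is shown where a real driver would reference every XHTML module it wants, and the public and system identifiers for the W3C modules should be checked against the Modularization of XHTML specification.

```dtd
<!-- Driver file for the extended markup language (sketch) -->

<!-- Formal public identifier of the new markup language -->
<!ENTITY % XHTML.version
     "-//United Nations//DTD XHTML Frequently Asked Questions Extension 1.0//EN" >

<!-- Point the framework at the qnames collection module
     (hypothetical file name) -->
<!ENTITY % xhtml-qname-extra.mod SYSTEM "unfaq-qnames-1.mod" >

<!-- Point the framework at the model module
     (hypothetical file name) -->
<!ENTITY % xhtml-model.mod SYSTEM "unfaq-model-1.mod" >

<!-- The XHTML Modularization Framework Module -->
<!ENTITY % xhtml-framework.mod
     PUBLIC "-//W3C//ENTITIES XHTML Modular Framework 1.0//EN"
            "http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod" >
%xhtml-framework.mod;

<!-- The XHTML modules themselves; only the text module is shown -->
<!ENTITY % xhtml-text.mod
     PUBLIC "-//W3C//ELEMENTS XHTML Text 1.0//EN"
            "http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod" >
%xhtml-text.mod;

<!-- The declaration module of the extension
     (hypothetical file name) -->
<!ENTITY % unfaq.mod SYSTEM "unfaq-1.mod" >
%unfaq.mod;
```

The two pointer entities are declared before the framework module is referenced so that the framework can pick them up when it is read.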

Conclusions and Other Conceits

Even though I have talked about XHTML in terms of some of the early marketing promises of XML, as well as the needs and expectations of content creators, there are many other contexts within which XHTML is valuable, including small and nonstandard device profiles, as well as the maintenance needs of moving the Web from classic HTML as lingua franca to the still as yet uncertain future. In this article I have tried to give a very brief overview of what it's like to employ the XHTML modularization framework in an extension-superset direction. While I have left many details undiscussed, I think it should be clear that extending XHTML requires a good working knowledge of the mechanics of XML DTDs. There are a growing number of very detailed tutorials available from many places on the Web.

Resources

• Shane McCarron, How to create XHTML Family modules and markup languages for fun and profit

• Nicholas Chase, Modularization of XHTML

• W3C HTML Working Group, Modularization of XHTML

• Zvon's XHTML 1.0 Reference

• The XHTML-L mailing list