XML.com: XML From the Inside Out

Mix and Match Markup: XHTML Modularization
by Rick Jelliffe

Modules using DTDs

Modularization gets more complex when you want to add your own elements or customize content models. But the most basic use of just creating an XHTML subset is quite simple.

There are several ways to use modularization with DTDs. The most common is subtractive: you set some switches (parameter entities containing the IGNORE or INCLUDE keywords for marked sections) to override the settings of a canned driver file such as http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd. To make a version of XHTML without applets and objects, you do something like the following:

<!ENTITY % xhtml-applet.module "IGNORE" >
<!ENTITY % xhtml-object.module "IGNORE" >

<!ENTITY % xhtml11.mod
     PUBLIC "-//W3C//DTD XHTML 1.1//EN"
            "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
%xhtml11.mod;
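
The override works because an XML processor binds a parameter entity to its first declaration and ignores any later one, so the two IGNORE switches take precedence over the defaults inside the driver. As a rough sketch (not copied from the W3C driver; the public identifier and file name are assumed by analogy with the other examples in this article), each optional module in a driver file is wrapped in a conditional section along these lines:

<!-- default switch; has no effect if the entity was already declared -->
<!ENTITY % xhtml-object.module "INCLUDE" >
<![%xhtml-object.module;[
<!-- assumed identifiers, for illustration only -->
<!ENTITY % xhtml-object.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Embedded Object 1.0//EN"
            "xhtml11-object-1.mod" >
%xhtml-object.mod;]]>

When the switch expands to IGNORE, the processor skips everything between the brackets, so the module is never loaded.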

The other way to use modularized DTDs is additively, where you explicitly create your own driver file, including various modules as you will. I include an example of the additive method for comparison with the XML Schemas example below. Here is an example for a minimal language which only contains the basic head-body structure, text, hypertext, lists, tables, images and image maps.

<!--
     PUBLIC "-//Rick Jelliffe//SGML XHTML Little HTML 1.0//EN"
 -->
<!ENTITY % XHTML.ns  "http://www.w3.org/1999/xhtml" >

<!-- required -->
<!ENTITY % xhtml-framework.mod
     PUBLIC "-//W3C//ENTITIES XHTML 1.1 Modular Framework 1.0//EN"
            "xhtml11-framework-1.mod" >
%xhtml-framework.mod;

<!ENTITY % xhtml-text.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Basic Text 1.0//EN"
            "xhtml11-text-1.mod" >
%xhtml-text.mod;

<!ENTITY % xhtml-hypertext.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Hypertext 1.0//EN"
            "xhtml11-hypertext-1.mod" >
%xhtml-hypertext.mod;

<!ENTITY % xhtml-list.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Lists 1.0//EN"
            "xhtml11-list-1.mod" >
%xhtml-list.mod;

<!ENTITY % xhtml-struct.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Document Structure 1.0//EN"
            "xhtml11-struct-1.mod" >
%xhtml-struct.mod;

<!-- optional -->
<!ENTITY % xhtml-table.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Tables 1.0//EN"
            "xhtml11-table-1.mod" >
%xhtml-table.mod;

<!ENTITY % xhtml-image.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Images 1.0//EN"
            "xhtml11-image-1.mod" >
%xhtml-image.mod;

<!ENTITY % xhtml-csismap.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Client-side Image Maps 1.0//EN"
            "xhtml11-csismap-1.mod" >
%xhtml-csismap.mod;

<!ENTITY % xhtml-ssismap.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Server-side Image Maps 1.0//EN"
            "xhtml11-ssismap-1.mod" >
%xhtml-ssismap.mod;
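
A document can then name this driver in its DOCTYPE declaration. In the sketch below, the system identifier little-html.dtd is an assumed file name for the driver, and the content is invented only to show the kinds of elements the selected modules cover:

<?xml version="1.0"?>
<!-- little-html.dtd is a hypothetical file name for the driver -->
<!DOCTYPE html
     PUBLIC "-//Rick Jelliffe//SGML XHTML Little HTML 1.0//EN"
            "little-html.dtd" >
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Little HTML sample</title>
  </head>
  <body>
    <p>See the <a href="http://www.w3.org/TR/xhtml11/">XHTML 1.1</a> specification.</p>
    <ul>
      <li>lists</li>
      <li>tables</li>
    </ul>
  </body>
</html>

An applet or object element in this document would be expected to fail validation, since the driver never loads those modules.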
			

Modules using XML Schemas

Daniel Austin of Mozquito Technologies AG has been working on a draft specification for implementing modularization using XML Schemas. A draft is now available, and public comments are solicited, at http://www.w3.org/TR/xhtml-m12n-schema/. (The approach is similar to that in "An Approach to the Modularization of XHTML using XML Schemas".)

This approach is a little different from the DTD approach. In the simplest use of DTDs, you start with complete XHTML and weed out the modules you don't want with explicit IGNORE settings, while the XML Schemas approach is to include only the modules you need. However, the important thing is that both methods allow the same result: you can select or reject a module using a single statement.

What is most surprising, to me anyway, is that XML Schema is simpler and more straightforward to use than DTDs. XML Schema provides higher-level constructs which, though not as high-level as those of the M12N abstract model, are closer to the M12N model than those of DTDs.

So the mapping from XHTML M12N concepts to XML Schemas requires less glue than DTDs require. Support for XHTML M12N is an important, though late, use case for XML Schemas.

Rather than give details, here is a simple example of what a driver (hub document) might look like in XML Schemas. See the XHTML M12n in XML Schemas draft for more concrete details. Note that the modules are simply included, commented out, or deleted.

<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
   version="-//Rick Jelliffe//SGML XHTML Little HTML 1.0//EN"
   targetNamespace="http://www.w3.org/1999/xhtml">
    
    <annotation>
         <documentation>This is a simple example of XHTML Modularization.
         </documentation>
    </annotation>

    <!-- core module -->
    <include  schemaLocation="xhtml-framework-1.xsd" /> 
	
    <!-- optional modules -->
    <include  schemaLocation="xhtml-table-1.xsd" />
    <include  schemaLocation="xhtml-image-1.xsd" />
    <include  schemaLocation="xhtml-csismap-1.xsd" />
    <include  schemaLocation="xhtml-ssismap-1.xsd" />
    
</schema>   
			

The apparent simplicity hides the exhaustive analysis of HTML by the experts at the W3C HTML Working Group to partition HTML into modules. And some of the simplicity compared to the DTD version above is superficial. The DTD version above provides public identifiers as well as the system identifier, and the designers of the XHTML modularization could have chosen to incorporate the required modules into a single module for terseness.

The redefine element in XML Schemas was introduced specifically to support the XHTML M12n use case. Features such as include, import, and redefine are syntactic sugar that does not increase the power of XML Schemas: you could do the same by editing the schema document by hand. Instead, they make schemas more convenient to use and reduce the amount of messy glue needed to impose higher-level abstractions, such as modules, which group schema components.
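
As an illustration of what redefine buys you, here is a minimal sketch that pulls in a module and extends one of its attribute groups in the same step. It is written against the final Recommendation namespace rather than the 2000/10 draft namespace, and the module file xhtml-list-1.xsd, the group name list.attlist, and the added attribute are all hypothetical:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
   xmlns:x="http://www.w3.org/1999/xhtml"
   targetNamespace="http://www.w3.org/1999/xhtml">

    <!-- hypothetical module file and attribute group name -->
    <redefine schemaLocation="xhtml-list-1.xsd">
        <attributeGroup name="list.attlist">
            <!-- the self-reference keeps the original attributes -->
            <attributeGroup ref="x:list.attlist" />
            <!-- hypothetical extra attribute -->
            <attribute name="compact" type="boolean" />
        </attributeGroup>
    </redefine>

</schema>

Doing the same by hand would mean copying the module and editing the group in place, which is the glue that redefine avoids.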

W3C XML Schema is not perfect yet. In particular, XML Schema operates on the infoset (the parsed XML content) rather than the document as it is being parsed. So the issue of how to treat special character entity references is still open.
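
To make the issue concrete: a named reference such as &nbsp; is only well-formed if a DTD declares the entity, and a schema cannot declare entities. The sketch below shows the two usual workarounds in one invented document, declaring the entity in an internal subset or using a numeric character reference:

<?xml version="1.0"?>
<!DOCTYPE html [
  <!-- declared locally, since a schema cannot supply it -->
  <!ENTITY nbsp "&#160;">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Entities</title>
  </head>
  <body>
    <p>10&nbsp;km uses the declared entity.</p>
    <p>10&#160;km needs no declaration at all.</p>
  </body>
</html>

Either way, a schema processor working from the infoset sees only the resulting character, not the reference.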

There is also an interesting bootstrapping issue. It would be desirable to have the documentation contain XHTML elements, but that might tax some Schema processors, especially in the early days. In general, it is unwise to use elements in a namespace inside a schema defining that namespace. So it is possible that XHTML schemas will be the only schemas that cannot use XHTML for documentation.
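
For illustration only, this is the sort of annotation at issue: documentation inside an XHTML module schema that itself uses XHTML elements. The fragment assumes it sits inside a schema element whose default namespace is the XML Schema namespace; the prefix h and the wording are arbitrary:

<annotation>
    <documentation>
        <!-- XHTML elements used inside the schema that defines XHTML -->
        <h:p xmlns:h="http://www.w3.org/1999/xhtml">The
        <h:code>table</h:code> element is defined in this module.</h:p>
    </documentation>
</annotation>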

Future

Evidence for the success of the XHTML Modularization concept may be found in the rapid development of RDDL, the Resource Directory Description Language by Jonathan Borden and Tim Bray (with others). RDDL is a version of XHTML with a simple linking element added in another namespace to point to the various resources related to a namespace URI. It is exactly the kind of DTD that XHTML M12n is good at, though XHTML M12n was inspired by the needs of PDAs and small appliances.
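
A rough sketch of what such a document looks like follows. The namespace http://example.org/ns and the DTD location are hypothetical, and the role and arcrole values reflect RDDL conventions as I understand them, so treat them as assumptions and check them against the RDDL specification:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Example namespace</title>
  </head>
  <body>
    <p>Resources for the (hypothetical) namespace http://example.org/ns.</p>
    <!-- the linking element lives in its own namespace -->
    <rddl:resource
        xlink:title="DTD for validation"
        xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
        xlink:arcrole="http://www.rddl.org/purposes#validation"
        xlink:href="http://example.org/ns/example.dtd">
      <p>A DTD for documents in this namespace.</p>
    </rddl:resource>
  </body>
</html>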

The next question that arises is whether the modularization system would be useful for other large languages that we might wish to subset (did anyone say XML Schema?). The question is whether that kind of modularization allows the cake to be sliced in the most appropriate way. Additive modules could clearly be used to handle, for example, selecting or not selecting a key/keyref module, but if facets were modularized, an additive driver might be quite large, and there is no subtractive M12n approach tabled for XML Schemas. But still, the document would be simple to read and straightforward to create.

The final question is whether, if modularization in XML Schemas is useful, there should be first-class markup to support it. Presumably, the way to approach this would be to provide a <module> element which schema-management tools could use directly, giving markup rather than just conventions. A modularization system for XML Schemas should look at the modularization system in the RELAX language (which is based on Toru Takahashi's early designs for SGML).