Reconstructing DTD Best Practice

June 13, 2000

Leigh Dodds

In a presentation at XML Europe 2000, Henry Thompson examined current "best practice" in DTD design and provided a reinterpretation using XML Schemas. The talk focused on the capability of schemas for defining complex types and asserting equivalence among classes of elements.

DTD Best Practice

Henry Thompson explained that current best practice in DTD design is to use parameter entities to define class hierarchies of element types. The use of parameter entities allows textual declarations in the DTD to be reused in multiple places.

The problems with this methodology are twofold. First, heavy use of parameter entities to structure large DTDs greatly reduces readability, making the DTDs harder to maintain and interpret. Second, the hierarchies of element types are implicit within the parameter entity declarations and are not a formal part of the DTD. DTD syntax offers no direct support for this kind of reuse.
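As a rough illustration of the idiom described above, the fragment below sketches a two-level "class hierarchy" built from parameter entities. The entity and element names (inline, phrase, para, and so on) are hypothetical and do not come from the presentation, and the fragment assumes an external DTD subset, since parameter entity references inside declarations are not permitted in the internal subset.

```xml
<!-- Hypothetical "inline" class, expressed as a parameter entity -->
<!ENTITY % inline "em | strong | code">

<!-- A second class that "extends" the first purely by textual inclusion -->
<!ENTITY % phrase "%inline; | link">

<!-- Content models reuse the classes by reference -->
<!ELEMENT para   (#PCDATA | %phrase;)*>
<!ELEMENT title  (#PCDATA | %inline;)*>

<!ELEMENT em     (#PCDATA)>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT code   (#PCDATA)>
<!ELEMENT link   (#PCDATA)>
```

Nothing in the DTD itself records that phrase is a superset of inline. The relationship exists only in the replacement text, and a reader has to expand the entities by hand to see what para actually allows.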

Formal language design has progressed significantly since the 1960s, when DTDs were first defined. Thompson observed that mechanisms based on textual substitution, such as parameter entities, are now seen as a less than ideal way to achieve reuse and structure.

Schema Best Practice

Thompson summarized the four basic requirements placed on the XML Schemas effort as being to 1) reconstruct DTD functionality; 2) accommodate XML Namespaces; 3) provide a richer set of data types; and 4) take advantage of the current understanding of formal design. With the last of these in mind, XML Schemas provide richer mechanisms for achieving type reuse and defining class hierarchies. Thompson singled out two of these mechanisms as the main content of his presentation.

XML Schemas separate element declarations from type declarations. The schema author can declare a type, defining its content model and attributes, and later associate that type with any number of elements. Types can also be derived from one another, providing a simple inheritance model. The ability to declare complex element types provides a means to state the relationships between elements explicitly. A type hierarchy clearly defines a related family of elements and supports reuse.
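The separation of types from elements, and the derivation of one type from another, might be sketched as follows. This is only an illustration: it uses the syntax of the XML Schema 1.0 Recommendation, which was finalized after this presentation and differs from the draft then current, and the type and element names (AddressType, UKAddressType, billTo, and so on) are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A named type: content model and attributes, not yet tied to an element -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city"   type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>

  <!-- A derived type: inherits the base content model and appends to it -->
  <xs:complexType name="UKAddressType">
    <xs:complexContent>
      <xs:extension base="AddressType">
        <xs:sequence>
          <xs:element name="postcode" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- The same type associated with more than one element -->
  <xs:element name="billTo"   type="AddressType"/>
  <xs:element name="shipTo"   type="AddressType"/>
  <xs:element name="ukOffice" type="UKAddressType"/>

</xs:schema>
```

Unlike the parameter-entity approach, the relationship between UKAddressType and AddressType is stated in the schema itself, through the base attribute, rather than being implied by text substitution.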

Equivalence classes, the second feature of Thompson's presentation, allow the schema author to declare the equivalence of elements based on their use in particular contexts. In contrast, type declarations define equivalence based on structure or content. In HTML, for example, an ordered list (<ol>) and an image (<img>) have very different content (i.e., they are not of the same type), but are equivalent because they may both be used within a paragraph element (<p>).
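A minimal sketch of context-based equivalence is shown below. It assumes the syntax of the final XML Schema 1.0 Recommendation, in which the feature appears under the name "substitution groups" and is expressed with the substitutionGroup attribute; the draft discussed in the talk used different notation. The element names (inline, em, img, para) are hypothetical and chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Abstract head element: a placeholder for "anything usable inline" -->
  <xs:element name="inline" abstract="true"/>

  <!-- Two members with unrelated types, both declared usable wherever
       the head element is allowed -->
  <xs:element name="em" type="xs:string" substitutionGroup="inline"/>
  <xs:element name="img" substitutionGroup="inline">
    <xs:complexType>
      <xs:attribute name="src" type="xs:anyURI" use="required"/>
    </xs:complexType>
  </xs:element>

  <!-- The content model names only the head; any member may appear -->
  <xs:element name="para">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="inline"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Here em and img share no structure, yet both are valid inside para because they are declared as members of the same group. Adding a further inline element later would mean declaring it as another member, without touching the content model of para.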

While these features overlap to a certain extent, when combined they provide a rich set of functionality, in which the semantic relationships between elements are transparent.

The Future

In a brief question and answer session following the presentation, Thompson provided a few hints regarding future work on the Schema specification. He explained that the naming of "equivalence classes" would be altered in a forthcoming draft of the XML Schema Structures specification to clarify some ambiguities implied by the current terminology.

Thompson also explained that XML Schemas 1.0 would not include multiple inheritance, as the Working Group is keen to produce a strong design for a single inheritance model first. However, he did not rule out multiple inheritance in a later version, and said that a model similar to Java's (single inheritance supplemented by interfaces) could still be considered at that time. The emphasis is clearly on getting the first version of XML Schemas complete.