Valid Frustrations

September 26, 2001

John E. Simpson

Q: I need help with two DTD questions...

I am having trouble with a DTD not enforcing some rules I'm trying to create.

Rule 1: Enforce a range of occurrences of one element inside another

For example, a fruit_basket element must contain between 9 and 11 banana elements. I think the following works:

<!ELEMENT fruit_basket ( 
   (banana, banana, banana, banana, banana, banana, banana, banana, banana) | 
   (banana, banana, banana, banana, banana, banana, banana, banana, banana, banana) | 
   (banana, banana, banana, banana, banana, banana, banana, banana, banana, banana, banana) )>

Is there a better way to do this?

A: No. Frustrating, isn't it? 

What you're working with here is called a content model for (in this case) the fruit_basket element type. This is a very simple example; constructing a content model is even worse for, say, a hypothetical month element in even a simple calendar application: some months may legally contain 31 days, some 30, and one either 28 or 29, depending on the year.

As you probably know (or can guess from the name), a content model specifies which child elements a given element may contain, in what sequence, and how many of each. It's the "how many" specification which is giving you fits here. The only shortcuts available are the following special characters, which may be appended to a child element name in the content model:

Character   Meaning
(none)      This child must occur exactly once
+           This child must occur one or more times
?           This child may occur once, or not at all
*           Any number of occurrences of this child is legitimate (the "0 or more" option)

For instance, you can require that a fruit_basket element must have at least one banana element like this:

<!ELEMENT fruit_basket (banana+)>
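
Within those limits, a range can at least be written a little more compactly: list the required children once, then nest the optional extras so that each is allowed only if the one before it is present. The sketch below does this for the 9-to-11 case. It also sidesteps a separate problem with the alternation in the question: every branch there begins with banana, and XML 1.0 treats such non-deterministic content models as an error (for compatibility), so some validating parsers may reject them.

```xml
<!-- Nine required banana children, then an optional tenth, then an optional eleventh -->
<!ELEMENT fruit_basket (banana, banana, banana, banana, banana, banana, banana, banana, banana, (banana, banana?)?)>
```

This is still enumeration, and it gets unwieldy quickly for larger ranges, so the basic answer stands.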

This limitation of DTD content models is one which XML Schema is designed to fix. I don't have space to provide details of that spec here, but, in general, an element type's content model is built by declaring that element type with an xsd:complexType element; within it, a compositor such as xsd:sequence holds various xsd:element elements, each of which may have a minOccurs and a maxOccurs attribute. The values of these attributes are non-negative integers (maxOccurs may also be "unbounded"), representing respectively the minimum and maximum number of times that child element type may appear within that parent. The default value for both is 1, which is consistent with DTD syntax.

Thus, a simple XML Schema declaration of the fruit_basket element type, with your desired number of banana children, might look like this:

<xsd:complexType name="fruit_basket">
   <xsd:sequence>
      <xsd:element name="banana" minOccurs="9" maxOccurs="11"/>
   </xsd:sequence>
</xsd:complexType>

Using XML Schema may not solve all your problems: the spec is still so new that it's not as widely supported as DTDs. But it at least gets you in the right ballpark. 
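
For context, here is a minimal sketch of how such a declaration might sit inside a complete schema document. It assumes the 2001 XML Schema namespace, declares fruit_basket as a global element with an anonymous complex type, wraps the banana child in an xsd:sequence compositor, and leaves banana untyped (so its own content is unconstrained). All of these are illustrative choices rather than anything the question specified.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

   <!-- Global declaration: fruit_basket may serve as the document element -->
   <xsd:element name="fruit_basket">
      <xsd:complexType>
         <xsd:sequence>
            <!-- Between 9 and 11 banana children, inclusive -->
            <xsd:element name="banana" minOccurs="9" maxOccurs="11"/>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>

</xsd:schema>
```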

Rule 2: Enforce a numbering scheme on child elements to distinguish them from their siblings

Example: Suppose a fruit_basket contains three bananas. Each banana should be numbered, 1 through 3; this can be done as an attribute or an element. But I don't know how to do this using either method. If there were only one fruit_basket, I could use ID-type attributes. But those IDs need to be unique over the entire document, and my document will contain multiple fruit_baskets. What if each needs to have a "banana #1"?

A: The answer to this question is the same as the answer to your first: you're asking DTDs to do something they can't do.

What you're after here is some way to constrain the document's content, not its structure. DTDs absolutely cannot constrain an element's text (#PCDATA) content. (The XML spec itself loosely constrains that content: it must fall within certain specified ranges of Unicode values, and it may not include unescaped markup-significant characters like < and &.) That leaves you with the "constrain via attribute values" approach.

You can approximate, yet still be frustratingly far away from, an answer using an ATTLIST declaration for the banana element which restricts the attribute to values 1, 2, or 3. For instance, your DTD might look something like this:

<!ELEMENT fruit_basket (banana*)>
<!ELEMENT banana EMPTY>
<!ATTLIST banana banana_number (1 | 2 | 3) "1" >

Again, though, this isn't a complete (or even very satisfying) solution:

  • You still can't limit the number of bananas in a fruit_basket in a useful way.
  • You can't guarantee that there's a relationship between the actual number of banana children and their banana_number attribute values. (This DTD allows you to have 25 banana children in a fruit_basket, for instance -- each with a banana_number whose value is 1.)
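
To make the second point concrete, here is a hypothetical document, with the DTD above carried in its internal subset, which a validating parser should accept even though the numbering is plainly wrong:

```xml
<?xml version="1.0"?>
<!DOCTYPE fruit_basket [
   <!ELEMENT fruit_basket (banana*)>
   <!ELEMENT banana EMPTY>
   <!ATTLIST banana banana_number (1 | 2 | 3) "1">
]>
<fruit_basket>
   <!-- Two explicit 1s: the DTD does not require uniqueness -->
   <banana banana_number="1"/>
   <banana banana_number="1"/>
   <!-- No attribute given, so the declared default of "1" applies -->
   <banana/>
   <!-- A fourth banana, numbered 3: position and number need not agree -->
   <banana banana_number="3"/>
</fruit_basket>
```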

The kinds of problems you're struggling to solve here might be amenable to using XML Schema. But there's another, often overlooked approach to validating document content (of both elements and attributes) which stands completely outside the normal DTD-vs.-XML Schema axis: validate with an XSLT stylesheet.

I don't want to minimize how much work may be involved, especially if you aren't already comfortable with XSLT. Still, here's a stylesheet which tests for both the number of banana elements and the correspondence between the banana_number attribute's value and that banana element's ordinal position within its fruit_basket parent:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns="http://www.w3.org/1999/xhtml" version="1.0">

   <!-- Process fruit_basket element(s) -->
   <xsl:template match="fruit_basket">
   <html>
      <body>

         <!-- Validate number of banana children in fruit_basket -->
         <xsl:choose>
            <!-- Note escaped forms of the > and < comparison operators -->
            <xsl:when test="count(banana) &gt; 8 and count(banana) &lt; 12">
               <h3># of banana children OK</h3>
            </xsl:when>
            <xsl:otherwise>
               <h3>Whoops! # of banana children is <xsl:value-of select="count(banana)"/></h3>
            </xsl:otherwise>
         </xsl:choose>

         <!-- Set up table of info about banana children -->
         <table border="1">
            <tr>
               <th>banana #</th>
               <th>banana_number</th>
            </tr>
            <!-- Process all banana children of fruit_basket -->
            <xsl:apply-templates select="banana"/>
         </table>

      </body>
   </html>
   </xsl:template>

   <!-- Process banana element(s) -->
   <xsl:template match="banana">
      <!-- Each banana element goes in its own table row -->
      <tr>
         <th><xsl:value-of select="position()"/></th>
         <td>
            <!-- Test for banana's position matching banana_number attribute value-->
            <xsl:choose>
               <xsl:when test="position() = @banana_number">
                  OK
               </xsl:when>
               <xsl:otherwise>
                  <strong>Whoops!</strong>... <xsl:value-of select="@banana_number"/>
               </xsl:otherwise>
            </xsl:choose>
         </td>
      </tr>
   </xsl:template>

</xsl:stylesheet>

This stylesheet "transforms" the source document into an XHTML document, displaying the result of the validation process. (If your XSLT processor supports it, you can use the xsl:message element to notify the source document's author of the document's validity, instead of transforming to XHTML.)

Assume the following simple document:

<fruit_basket>
   <banana banana_number="8"/>
   <banana banana_number="2"/>
</fruit_basket>

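[Figure: fruit_basket validation results]

With this document as its source tree, the stylesheet produces XHTML which, viewed in a browser, looks like the figure: a heading reading "Whoops! # of banana children is 2", followed by a two-row table in which banana #1 is flagged with "Whoops!... 8" and banana #2 shows "OK".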
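
As a rough sketch of the xsl:message alternative mentioned above, the stylesheet below performs the same two checks but reports only failures, as messages. How and where messages appear is processor-dependent. The message wording, and the use of not(... = ...) so that a missing banana_number is also reported, are illustrative assumptions and not part of the stylesheet shown earlier.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <!-- Serialize the (empty) result as plain text rather than XML -->
   <xsl:output method="text"/>

   <xsl:template match="fruit_basket">
      <!-- Report a banana count outside the 9-to-11 range -->
      <xsl:if test="count(banana) &lt; 9 or count(banana) &gt; 11">
         <xsl:message>fruit_basket has <xsl:value-of select="count(banana)"/> banana children; expected 9 to 11</xsl:message>
      </xsl:if>
      <xsl:apply-templates select="banana"/>
   </xsl:template>

   <xsl:template match="banana">
      <!-- not(a = b) is also true when the attribute is absent; a != b would not be -->
      <xsl:if test="not(position() = @banana_number)">
         <xsl:message>banana <xsl:value-of select="position()"/> has banana_number "<xsl:value-of select="@banana_number"/>"</xsl:message>
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>
```

Run against the two-banana sample document, this should report a banana count of 2 and a mismatch on the first banana (banana_number 8).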

Note that this approach to validation is codified in the Schematron project. It's an extremely powerful (and cool) way to perform almost any "validation" you can think of, without the limitations of either DTDs or XML Schema.
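
For a flavor of what that looks like, here is a minimal sketch of the same two checks written as a Schematron schema. It assumes the ISO Schematron namespace (earlier Schematron releases used a different one), and the assertion messages are made up for illustration. Because a rule's context gives no reliable position(), the banana's ordinal position is computed by counting its preceding banana siblings.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
   <pattern>

      <!-- Check the number of banana children in each fruit_basket -->
      <rule context="fruit_basket">
         <assert test="count(banana) &gt;= 9 and count(banana) &lt;= 11">
            A fruit_basket must contain between 9 and 11 banana children; found <value-of select="count(banana)"/>.
         </assert>
      </rule>

      <!-- Check that each banana_number matches the banana's position in its basket -->
      <rule context="banana">
         <assert test="@banana_number = count(preceding-sibling::banana) + 1">
            banana_number is "<value-of select="@banana_number"/>", but this is banana <value-of select="count(preceding-sibling::banana) + 1"/> in its fruit_basket.
         </assert>
      </rule>

   </pattern>
</schema>
```

Each assert states what must be true; a Schematron processor reports the message wherever the test fails.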