September 26, 2001
Q: I need help with two DTD questions...
I am having trouble with a DTD not enforcing some rules I'm trying to create.
Rule 1: Enforce a range of occurrences of one element inside another
For example, a fruit_basket element must contain between 9 and 11 banana elements. I think the following works:
<!ELEMENT fruit_basket (
(banana, banana, banana, banana, banana, banana, banana, banana, banana) |
(banana, banana, banana, banana, banana, banana, banana, banana, banana, banana) |
(banana, banana, banana, banana, banana, banana, banana, banana, banana, banana, banana))>
Is there a better way to do this?
A: No. Frustrating, isn't it?
What you're working with here is called a content model for (in this case) the fruit_basket element type. This is a very simple example; constructing a content model is even worse for, say, a hypothetical month element in even a simple calendar application: some months may legally contain 31 days, some 30, and one either 28 or 29, depending on the year.
As you probably know (or can guess from the name), a content model specifies which child elements a given element may contain, in what sequence, and how many of each. It's the "how many" specification which is giving you fits here. The only shortcuts available are the following special characters, any one of which may be appended to a child element name in the content model:
|(none)|This child must occur exactly once|
|+|This child may occur one or more times|
|?|This child may occur once, or not at all|
|*|Any number of occurrences of this child is legitimate (the "0 or more" option)|
For instance, you can require that a fruit_basket element must have at least one banana element like this:
<!ELEMENT fruit_basket (banana+)>
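To see all four options side by side, here is a small sketch of a DTD (as an internal subset) together with a document that is valid against it. The label, apple, and pear element types are hypothetical, added only for illustration; they are not part of the question.

```xml
<?xml version="1.0"?>
<!DOCTYPE fruit_basket [
  <!-- label, apple, and pear are hypothetical child types, for illustration only -->
  <!-- label: exactly once; banana: one or more; apple: optional; pear: zero or more -->
  <!ELEMENT fruit_basket (label, banana+, apple?, pear*)>
  <!ELEMENT label  (#PCDATA)>
  <!ELEMENT banana EMPTY>
  <!ELEMENT apple  EMPTY>
  <!ELEMENT pear   EMPTY>
]>
<fruit_basket>
  <label>Tuesday delivery</label>
  <banana/>
  <banana/>
  <pear/>
</fruit_basket>
```

Note that none of the four options can say "between 9 and 11"; the closest you can get with a suffix alone is "at least one."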
This limitation of DTD content models is one which XML Schema is designed to fix. I don't have space to provide details of that spec here, but, in general, an element type's content model is built by declaring that element type with an xsd:element element. The descendants of this element (nested inside an xsd:complexType and a compositor such as xsd:sequence) include various xsd:element elements, each of which may have a minOccurs and a maxOccurs attribute. The values of these attributes are non-negative integers (maxOccurs may also be unbounded), representing respectively the minimum and maximum number of times which that child element type may appear within that parent. The default value for both is 1, which is consistent with DTD syntax.
Thus, inside a simple XML Schema declaration of the fruit_basket element type, the declaration for your desired number of banana children might look like this:
<xsd:element name="banana" minOccurs="9" maxOccurs="11"/>
Using XML Schema may not solve all your problems: the spec is still so new that it's not as widely supported as DTDs. But it at least gets you in the right ballpark.
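For context, here is a minimal sketch of how the banana declaration shown earlier might sit inside a complete schema. It assumes the W3C XML Schema namespace from the May 2001 Recommendation and assumes that banana is an empty element; adjust both to suit your own documents.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="fruit_basket">
    <xsd:complexType>
      <xsd:sequence>
        <!-- 9 to 11 banana children; each attribute defaults to 1 when omitted -->
        <xsd:element name="banana" minOccurs="9" maxOccurs="11">
          <!-- Assumption: banana has empty content, like a DTD EMPTY declaration -->
          <xsd:complexType/>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Changing the allowed range is then a matter of editing two attribute values, rather than rewriting a long list of alternatives.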
Rule 2: Enforce a numbering scheme on child elements to distinguish them from their siblings
Example: Suppose a fruit_basket contains three banana elements. Each banana should be numbered, 1 through 3; this can be done as an attribute or an element. But I don't know how to do this using either method. If there were only one fruit_basket, I could use ID-type attributes. But those IDs need to be unique over the entire document, and my document will contain multiple fruit_basket elements. What if each needs to have a "banana #1"?
A: The answer to this question is the same as the answer to your first: you're asking DTDs to do something they can't do.
What you're after here is some way to constrain the document's content, not its structure. DTDs absolutely cannot constrain an element's text (#PCDATA) content. (The XML spec loosely constrains that content: it must fall within certain specified ranges of Unicode values, and it may not include unescaped markup-significant characters like < and &.) That leaves you with the "constrain via attribute values" approach.
You can approximate, yet still be frustratingly far away from, an answer using an ATTLIST declaration for the banana element which restricts the attribute to values 1, 2, or 3. For instance, your DTD might look something like this:
<!ELEMENT fruit_basket (banana*)>
<!ELEMENT banana EMPTY>
<!ATTLIST banana banana_number (1 | 2 | 3) "1" >
Again, though, this isn't a complete (or even very satisfying) solution:
- You still can't limit the number of bananas in a fruit_basket in a useful way.
- You can't guarantee that there's a relationship between the actual number of banana children and their banana_number attribute values. (This DTD allows you to have 25 banana children in a fruit_basket, for instance, each with a banana_number whose value is 1.)
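To make both shortcomings concrete, here is a small sketch of a document which is valid against that DTD even though its numbering is meaningless. The particular number of bananas and their attribute values are made up for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE fruit_basket [
  <!ELEMENT fruit_basket (banana*)>
  <!ELEMENT banana EMPTY>
  <!ATTLIST banana banana_number (1 | 2 | 3) "1">
]>
<fruit_basket>
  <!-- Four bananas, although only three numbers exist -->
  <banana banana_number="1"/>
  <!-- A duplicate number is perfectly legal -->
  <banana banana_number="1"/>
  <!-- No attribute given, so the default value of 1 applies -->
  <banana/>
  <!-- Number 3 appears in fourth position, and number 2 never appears -->
  <banana banana_number="3"/>
</fruit_basket>
```

A validating parser should accept this document without complaint, because each attribute value is checked on its own and never against its siblings.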
The kinds of problems you're struggling to solve here might be amenable to using XML Schema. But there's another, often overlooked approach to validating document content (of both elements and attributes) which stands completely outside the normal DTD-vs.-XML Schema axis: validate with an XSLT stylesheet.
But I don't want to minimize how much work may be involved, especially if you aren't already comfortable with XSLT. Still, here's a stylesheet which tests for both the number of banana elements and the correspondence between the banana_number attribute's value and that banana element's ordinal position within its parent fruit_basket:
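<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Process fruit_basket element(s) -->
<xsl:template match="fruit_basket">
  <!-- Validate number of banana children in fruit_basket -->
  <!-- Note escaped form of boolean > and < operators -->
  <xsl:choose>
    <xsl:when test="count(banana) &gt; 8 and count(banana) &lt; 12">
      <h3># of banana children OK</h3>
    </xsl:when>
    <xsl:otherwise><h3>Whoops! # of banana children is <xsl:value-of select="count(banana)"/></h3></xsl:otherwise>
  </xsl:choose>
  <!-- Set up table of info about banana children -->
  <!-- Process all banana children of fruit_basket -->
  <table><xsl:apply-templates select="banana"/></table>
</xsl:template>
<!-- Process banana element(s) -->
<xsl:template match="banana">
  <!-- Each banana element goes in its own table row -->
  <!-- Test for banana's position matching banana_number attribute value-->
  <tr><td><xsl:choose>
    <xsl:when test="position() = @banana_number">OK</xsl:when>
    <xsl:otherwise><strong>Whoops!</strong>... <xsl:value-of select="@banana_number"/></xsl:otherwise>
  </xsl:choose></td></tr>
</xsl:template>
</xsl:stylesheet>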
This stylesheet "transforms" the source document into an XHTML document, displaying the result of the validation process. (If your XSLT processor supports it, you can use the xsl:message element to notify the source document's author of the document's validity, instead of transforming to XHTML.)
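As a rough sketch of that xsl:message variation, the stylesheet below applies the same two tests but reports only the failures. It assumes an XSLT 1.0 processor; where the messages end up (console, log, dialog box) is up to the processor, and the wording of the messages is mine.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Visit every fruit_basket in the document, wherever it occurs -->
  <xsl:template match="/">
    <xsl:apply-templates select="//fruit_basket"/>
  </xsl:template>

  <xsl:template match="fruit_basket">
    <!-- Report a basket with fewer than 9 or more than 11 banana children -->
    <xsl:if test="count(banana) &lt; 9 or count(banana) &gt; 11">
      <xsl:message>fruit_basket has <xsl:value-of select="count(banana)"/> banana children; expected 9 to 11</xsl:message>
    </xsl:if>
    <xsl:apply-templates select="banana"/>
  </xsl:template>

  <xsl:template match="banana">
    <!-- not(... = ...) also catches a banana with no banana_number at all -->
    <xsl:if test="not(position() = @banana_number)">
      <xsl:message>banana at position <xsl:value-of select="position()"/> has banana_number '<xsl:value-of select="@banana_number"/>'</xsl:message>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

A document which passes both tests produces no messages and no output at all, so silence means success.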
Assume the following simple document:
With this document as its source tree, the style sheet produces XHTML which looks like the figure at right when viewed in a browser.
Note that this approach to validation is codified in the Schematron project. It's an extremely powerful (and cool) way to perform almost any "validation" you can think of, without the limitations of either DTDs or XML Schema.
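For a taste of what that looks like, here is a minimal sketch of the same two rules as a Schematron schema. It assumes the namespace used by the later ISO version of Schematron (earlier releases use a different one), and the pattern ids and message wording are mine.

```xml
<?xml version="1.0"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="banana-count">
    <rule context="fruit_basket">
      <!-- Fails when the basket holds fewer than 9 or more than 11 bananas -->
      <assert test="count(banana) &gt;= 9 and count(banana) &lt;= 11">
        A fruit_basket must contain 9 to 11 banana children; found <value-of select="count(banana)"/>.
      </assert>
    </rule>
  </pattern>
  <pattern id="banana-numbering">
    <rule context="banana">
      <!-- A banana's number must equal its position among its banana siblings -->
      <assert test="@banana_number = count(preceding-sibling::banana) + 1">
        banana_number is <value-of select="@banana_number"/>, which does not match this banana's position in its fruit_basket.
      </assert>
    </rule>
  </pattern>
</schema>
```

Each assert is just an XPath test plus a message to show when the test fails, so anything you can express in the XSLT stylesheet's test attributes carries over directly.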