XML.com: XML From the Inside Out

TREX Basics
by J. David Eisenberg

Attributes

XML elements can have attributes, and TREX allows you to specify them in great detail. Like an HTML page, a news story can include an <img> tag, which has a required src attribute and optional align, width, and height attributes. The alignment can take only three values, so we list them explicitly with <string> elements inside a <choice>.

The width and height must be positive integers. Since TREX has no datatype system of its own, the current implementation of TREX borrows the datatypes of XML Schema. That means we need to declare the XML Schema namespace when we write the pattern for an image element.

<element name="img" xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
    <attribute name="src">
       <anyString/>
    </attribute>
    <optional>
       <attribute name="align">
          <choice>
             <string>left</string>
             <string>center</string>
             <string>right</string>
          </choice>
       </attribute>
    </optional>
    <optional>
       <attribute name="width">
          <data type="xsd:positiveInteger"/>
       </attribute>
    </optional>
    <optional>
       <attribute name="height">
          <data type="xsd:positiveInteger"/>
       </attribute>
    </optional>
</element>

Notice that <optional> can be used with <attribute> to specify an optional attribute, just as it is used with <element> to specify an optional element. This uniform treatment of attributes and elements gives TREX the power to express complex grammars with a compact vocabulary. (For all the details, see the complete TREX pattern that uses attributes and the XML News Story that uses an image.) In the TREX file, the xmlns:xsd declaration has been placed on the outermost <grammar> element so that it's available throughout the file.
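To make the pattern concrete, here are a few instance elements checked against it. This is an illustrative sketch; the file name and dimensions are made up. The first two should be accepted. The third should be rejected because justify is not one of the three listed alignments, and the fourth because 0 is not an xsd:positiveInteger.

<!-- valid: only the required src attribute -->
<img src="harbor.jpg"/>

<!-- valid: every optional attribute present; attribute order does not matter -->
<img width="320" align="center" height="240" src="harbor.jpg"/>

<!-- invalid: "justify" is not one of the allowed alignment values -->
<img src="harbor.jpg" align="justify"/>

<!-- invalid: 0 is not a positive integer -->
<img src="harbor.jpg" width="0"/>

Since the pattern lists only attributes and no child content, each <img> must also be empty.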

Just as it was possible to create a reusable element specification, it's possible to create a set of attributes that can be reused by many tags. For example, both table body (<tbody>) and table header (<th>) elements have identical attributes for determining their horizontal and vertical alignment. This makes those attributes a perfect candidate for a definition,

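<define name="alignment">
    <optional>
        <attribute name="align">
            <choice>
                <string>left</string>
                <string>center</string>
                <string>right</string>
            </choice>
        </attribute>
    </optional>
    <optional>
        <attribute name="valign">
            <choice>
                <string>top</string>
                <string>middle</string>
                <string>bottom</string>
            </choice>
        </attribute>
    </optional>
</define>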

which may be used in different elements:

<element name="tbody">
    <ref name="alignment"/>
</element>

<element name="th">
    <anyString/>
    <ref name="alignment"/>
    <optional>
        <attribute name="rowspan">
            <data type="xsd:positiveInteger"/>
        </attribute>
    </optional>
    <optional>
        <attribute name="colspan">
            <data type="xsd:positiveInteger"/>
        </attribute>
    </optional>
</element>

Note that the <th> tag has attributes in addition to those included via the reference to the definition.
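Here is a sketch of table markup checked against these two patterns; the cell text is invented. The simplified <tbody> pattern above specifies no child content, so it accepts only an empty element. Because both elements reference the same alignment definition rather than copying it, a change to that definition affects both.

<!-- valid: tbody takes its attributes only from the shared "alignment" definition -->
<tbody align="center"/>

<!-- valid: text content, the shared align attribute, plus th's own colspan -->
<th align="left" colspan="2">Quarterly Results</th>

<!-- invalid: "two" is not a positive integer -->
<th colspan="two">Quarterly Results</th>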

Merging Grammars

As with DTDs, TREX lets you write a grammar in one file and include it in another file. We could take the block_item definition that we wrote earlier and put it in a file named block_spec.trex:

<grammar>
    <define name="block_item">
        <zeroOrMore>
            <choice>
                <element name="p"><anyString /></element>
                <ref name="unordered_list"/>
                <ref name="location_element"/>
                <ref name="image_element"/>
            </choice>
        </zeroOrMore>
    </define>
</grammar>

Our main TREX file would use it like this:

<grammar>
   <include href="block_spec.trex"/>
   <start>
      <!-- start pattern goes here -->
   </start>
   <!-- remainder of specification -->
</grammar>
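A fuller sketch of such a main file might look like the following. It is an illustration under assumptions, not the article's actual pattern file: the root element name (body) and the simplified contents of unordered_list, location_element, and image_element are hypothetical stand-ins, supplied only because block_spec.trex refers to those names without defining them.

<grammar>
   <!-- pull in the shared block_item definition -->
   <include href="block_spec.trex"/>

   <!-- hypothetical root element: a body made of block items -->
   <start>
      <element name="body">
         <ref name="block_item"/>
      </element>
   </start>

   <!-- simplified stand-ins for the names that block_spec.trex refers to -->
   <define name="unordered_list">
      <element name="ul">
         <oneOrMore>
            <element name="li"><anyString/></element>
         </oneOrMore>
      </element>
   </define>

   <define name="location_element">
      <element name="location"><anyString/></element>
   </define>

   <define name="image_element">
      <element name="img">
         <attribute name="src"><anyString/></attribute>
      </element>
   </define>
</grammar>

Keeping the shared definitions in one file and the document-specific ones in the main file is what makes the include worthwhile: several main files can reuse block_spec.trex unchanged.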

Now let's get a bit more advanced. Let's say that we want to use this block element specification to validate both XML News Stories and XHTML. The problem is that news stories have <location> and <copyrite> block elements and XHTML doesn't; XHTML has a <blockquote> element, but news stories don't. So we'll modify our include file as follows.

<grammar>
    <define name="block_item">
        <zeroOrMore>
            <choice>
                <element name="p"><anyString /></element>
                <ref name="unordered_list"/>
                <ref name="image_element"/>
                <ref name="custom_elements"/>
            </choice>
        </zeroOrMore>
    </define>
    
    <define name="custom_elements">
       <notAllowed/>
    </define>
</grammar>

The <notAllowed/> element is a placeholder pattern that can never match anything, so inside the <choice> it contributes no alternatives. It is there to be overridden by whichever grammar includes the file; our XMLNews-Story TREX pattern replaces it with this pattern:

<define name="custom_elements" combine="replace">
    <choice>
        <element name="location">
            <mixed>
                <zeroOrMore>
                    <choice>
                        <element name="city"><anyString/></element>
                        <element name="state"><anyString/></element>
                        <element name="region"><anyString/></element>
                    </choice>
                </zeroOrMore>
            </mixed>
        </element>
        <element name="copyrite">
            <anyString/>
        </element>
    </choice>
</define>
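As a sketch, a news-story body validated with this replacement could mix ordinary paragraphs with the two story-specific elements. The <body> wrapper and all of the text are invented for illustration.

<body>
   <p>Officials opened the new bridge on Monday.</p>
   <!-- mixed content: text interleaved with city, state, and region -->
   <location>Reported from <city>Portland</city>, <state>Oregon</state></location>
   <copyrite>Copyright 2001 Example News Service</copyrite>
</body>

The same markup would be rejected by the XHTML variant below, since that grammar never defines <location> or <copyrite>.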

A TREX pattern to validate a subset of XHTML would replace it as follows.

<define name="custom_elements" combine="replace">
    <element name="blockquote">
        <mixed>
            <ref name="block_item"/>
        </mixed>
    </element>
</define>
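Because this replacement refers back to block_item, and block_item in turn includes custom_elements, a <blockquote> may contain text, paragraphs, and further nested blockquotes. An illustrative instance, with invented content:

<blockquote>
   Text placed directly inside the quotation.
   <p>A paragraph inside the quotation.</p>
   <!-- recursion: custom_elements lets a blockquote appear inside a blockquote -->
   <blockquote>A quotation nested inside a quotation.</blockquote>
</blockquote>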

For more details, consult the include file, the TREX pattern for a subset of XMLNews-Story, a sample XMLNews story, the TREX pattern for a subset of XHTML, and a sample XHTML file.

This include-and-override capability lets you develop a set of core patterns that can easily be modified for validating a wide variety of markup languages. Other options for the combine attribute are choice and group, which let you add to a definition without entirely replacing it.
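For comparison, here is a sketch of the XHTML override written with combine="choice" instead of combine="replace". It assumes that combine="choice" merges the new definition with the included one as alternatives of a <choice>. Because the included definition is <notAllowed/>, which contributes no alternatives, the merged result accepts just <blockquote>, and the included definition is extended rather than discarded.

<!-- merged with the included custom_elements as a choice, not a replacement -->
<define name="custom_elements" combine="choice">
    <element name="blockquote">
        <mixed>
            <ref name="block_item"/>
        </mixed>
    </element>
</define>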

Another advanced feature of TREX is the <concur> element, which lets you require that your XML match several patterns at the same time.
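A minimal sketch of <concur>, assuming that image_element names the general <img> pattern shown earlier. A stricter profile additionally requires width and height, so an image is accepted only if it satisfies both patterns. The second pattern must still allow every attribute the first one permits (here, align), or an image using it could never match both.

<concur>
    <!-- the general image pattern: width and height optional, positive integers -->
    <ref name="image_element"/>
    <!-- a stricter profile: width and height are required -->
    <element name="img">
        <attribute name="src"><anyString/></attribute>
        <attribute name="width"><anyString/></attribute>
        <attribute name="height"><anyString/></attribute>
        <optional>
            <attribute name="align"><anyString/></attribute>
        </optional>
    </element>
</concur>

Each pattern stays simple on its own, and the datatype checks live only in the first one.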

Summary

TREX is a powerful markup language that permits you to specify how other XML documents are to be validated. As with other specification languages, you can

  • specify an element with an ordered sequence of sub-elements;
  • specify an element with a choice of sub-elements;
  • permit mixed content (text outside of tags); and
  • specify attributes for tags.

Advanced features of TREX allow you to combine externally-defined grammars in highly sophisticated ways. For more information, consult James Clark's extensive TREX tutorial or the formal specification.