TREX Basics
by J. David Eisenberg
Attributes
XML elements can have attributes, and TREX allows you to specify
them in great detail. A news story, like HTML, can include an
<img> tag which has a required src and
optional align, width, and
height attributes. The alignment can have only three
possible values, so we specify them explicitly with the
<string> element.
The width and height must be positive integers. Since TREX doesn't have any default type system, the current implementation of TREX reaches out to XML Schema and uses its type system. That means we need to specify a namespace when we write the pattern for an image element.
<element name="img" xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
<attribute name="src">
<anyString/>
</attribute>
<optional>
<attribute name="align">
<choice>
<string>left</string>
<string>center</string>
<string>right</string>
</choice>
</attribute>
</optional>
<optional>
<attribute name="width">
<data type="xsd:positiveInteger"/>
</attribute>
</optional>
<optional>
<attribute name="height">
<data type="xsd:positiveInteger"/>
</attribute>
</optional>
</element>
Notice that <optional> can be used with
<attribute> to specify an optional attribute, just
as it is used with <element> to specify an optional
element. This uniform treatment of attributes and elements gives TREX
the power to express complex grammars with a compact vocabulary. (For
all the details, check out the complete
TREX pattern that uses attributes and the XML News
Story that uses an image.) In the TREX file, the
xmlns:xsd specification has been placed in the outermost
<grammar> tag so that it's available throughout the
file.
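To make the pattern concrete, here is a sketch of instance markup checked against it. The <story> wrapper, the file name, and the attribute values are invented for illustration; the comments state what the <img> pattern above should accept or reject, assuming the validator applies the XML Schema positiveInteger type as described.

```xml
<!-- Hypothetical instances; the wrapper element, file name, and values
     are illustrative only. -->
<story>
  <!-- Should match: src is present and every optional attribute is legal. -->
  <img src="harbor.jpg" align="left" width="320" height="240"/>

  <!-- Should match: only the required src attribute is given. -->
  <img src="harbor.jpg"/>

  <!-- Should fail: "top" is not one of left, center, or right. -->
  <img src="harbor.jpg" align="top"/>

  <!-- Should fail: 0 is not a positive integer. -->
  <img src="harbor.jpg" width="0"/>

  <!-- Should fail: the required src attribute is missing. -->
  <img align="center"/>
</story>
```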
Just as it was possible to create a reusable element
specification, it's possible to create a set of attributes that can be
reused by many tags. For example, both table body
(<tbody>) and table header
(<th>) elements have identical attributes for
determining their horizontal and vertical alignment. This makes those
attributes a perfect candidate for a definition (only the horizontal align attribute is shown here),
<define name="alignment">
<optional>
<attribute name="align">
<choice>
<string>left</string>
<string>center</string>
<string>right</string>
</choice>
</attribute>
</optional>
</define>
which may be used in different elements:
<element name="tbody">
<ref name="alignment"/>
</element>
<element name="th">
<anyString/>
<ref name="alignment"/>
<optional>
<attribute name="rowspan">
<data type="xsd:positiveInteger"/>
</attribute>
</optional>
<optional>
<attribute name="colspan">
<data type="xsd:positiveInteger"/>
</attribute>
</optional>
</element>
Note that the <th> tag has attributes in
addition to those included via the reference to the definition.
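As a further sketch of reuse, a table data cell could pick up the same definition with a single reference. The <td> pattern below is hypothetical and not part of the article's grammar; it only assumes that the alignment definition above is in scope.

```xml
<!-- Hypothetical: a table data cell that reuses the "alignment" definition. -->
<element name="td">
  <anyString/>
  <ref name="alignment"/>
</element>

<!-- Under this pattern:
     <td align="center">42</td>   should match
     <td>42</td>                  should match (align is optional)
     <td align="justify">42</td>  should fail  (value not in the choice) -->
```

If the list of legal alignment values ever changes, only the definition needs editing; every element that references it follows along.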
Merging Grammars
As with DTDs, TREX lets you write a grammar in one file and include
it in another file. We could take the block_item
definition that we wrote earlier and put it in a file named
block_spec.trex:
<grammar>
<define name="block_item">
<zeroOrMore>
<choice>
<element name="p"><anyString /></element>
<ref name="unordered_list"/>
<ref name="location_element"/>
<ref name="image_element"/>
</choice>
</zeroOrMore>
</define>
</grammar>
Our main TREX file would then include it like this:
<grammar>
<include href="block_spec.trex"/>
<start>
<!-- remainder of specification -->
Now let's get a bit more advanced. Let's say that we want to use
this block element specification for both XML News Story and XHTML
verification. The problem is that news stories have
<location> and <copyrite> block
elements and XHTML doesn't; XHTML has a
<blockquote> element, but news stories don't. So,
we'll modify our include file as follows.
<grammar>
<define name="block_item">
<zeroOrMore>
<choice>
<element name="p"><anyString /></element>
<ref name="unordered_list"/>
<ref name="image_element"/>
<ref name="custom_elements"/>
</choice>
</zeroOrMore>
</define>
<define name="custom_elements">
<notAllowed/>
</define>
</grammar>
The <notAllowed/> element is a placeholder pattern that
can never match anything. Used by itself, the include file would
therefore accept no custom elements at all, but our XMLNews-Story TREX
pattern will replace the placeholder with this pattern:
<define name="custom_elements" combine="replace">
<choice>
<element name="location">
<mixed>
<zeroOrMore>
<choice>
<element name="city"><anyString/></element>
<element name="state"><anyString/></element>
<element name="region"><anyString/></element>
</choice>
</zeroOrMore>
</mixed>
</element>
<element name="copyrite">
<anyString/>
</element>
</choice>
</define>
A TREX pattern to validate a subset of XHTML would replace it as follows.
<define name="custom_elements" combine="replace">
<element name="blockquote">
<mixed>
<ref name="block_item"/>
</mixed>
</element>
</define>
For more details, please consult the include file, the TREX pattern for a subset of XMLNews-Story, a sample XMLNews story, the TREX pattern for a subset of XHTML, and a sample XHTML file.
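To show how the pieces might fit together, here is a minimal sketch of a complete main grammar for the XHTML case. It is not the article's actual pattern file: the <body> start element and the stand-in definitions for unordered_list and image_element are assumptions, added only so that every name referenced by block_item resolves. Like the listings above, it omits the TREX namespace declaration.

```xml
<!-- Sketch of a main grammar; anything not shown in the article is hypothetical. -->
<grammar>
  <!-- Pull in block_item and the placeholder custom_elements. -->
  <include href="block_spec.trex"/>

  <start>
    <!-- Hypothetical root element for this sketch. -->
    <element name="body">
      <ref name="block_item"/>
    </element>
  </start>

  <!-- Override the placeholder defined in the include file. -->
  <define name="custom_elements" combine="replace">
    <element name="blockquote">
      <mixed>
        <ref name="block_item"/>
      </mixed>
    </element>
  </define>

  <!-- Minimal stand-ins for the definitions that block_item refers to. -->
  <define name="unordered_list">
    <element name="ul">
      <oneOrMore>
        <element name="li"><anyString/></element>
      </oneOrMore>
    </element>
  </define>

  <define name="image_element">
    <element name="img">
      <attribute name="src"><anyString/></attribute>
    </element>
  </define>
</grammar>
```

The same include file could serve the news-story grammar by swapping in the <location> and <copyrite> override instead.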
This include-and-override capability lets you develop a set of core
patterns that can easily be modified for validating a wide variety of
markup languages. Other options for the combine
attribute are choice and group, which let
you add to a definition without entirely replacing it.
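As a rough illustration of adding rather than replacing, the sketch below uses combine="choice" in a grammar where custom_elements is already defined. The <pre> element is a hypothetical addition, and the behavior described in the comment assumes that "choice" offers the new pattern as an alternative to the existing one, as the paragraph above suggests.

```xml
<!-- Hypothetical addition: keep whatever custom_elements already allows
     and also accept a <pre> element as a further alternative. -->
<define name="custom_elements" combine="choice">
  <element name="pre">
    <anyString/>
  </element>
</define>
```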
Another advanced feature of TREX is the <concur>
element, which lets you verify that your XML satisfies all of a number
of patterns.
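A small sketch of the syntax, assuming <concur> simply wraps the patterns that must all hold: the width attribute below has to be a positive integer and also one of two fixed sizes. The sizes are invented, and the second pattern alone would be enough in practice, so treat this as an illustration of the element's shape rather than a realistic use. The xsd prefix is assumed to be declared on an enclosing element, as in the earlier listings.

```xml
<!-- Hypothetical: width must satisfy both patterns at once. -->
<attribute name="width">
  <concur>
    <!-- First requirement: a positive integer. -->
    <data type="xsd:positiveInteger"/>
    <!-- Second requirement: one of two permitted sizes. -->
    <choice>
      <string>320</string>
      <string>640</string>
    </choice>
  </concur>
</attribute>
```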
Summary
TREX is a powerful markup language that permits you to specify how other XML documents are to be validated. As with other specification languages, you can
- specify an element with an ordered sequence of sub-elements;
- specify an element with a choice of sub-elements;
- permit mixed content (text outside of tags); and
- specify attributes for tags.
Advanced features of TREX allow you to combine externally-defined grammars in highly sophisticated ways. For more information, consult James Clark's extensive TREX tutorial or the formal specification.