XML.com: XML From the Inside Out


Learning to RELAX

October 16, 2000

Table of Contents

A RELAX Document's Outer Parts
The Basics: Specifying Elements
Re-using Specifications
Context-sensitive Elements
Controlling Element Models with Attributes


In this article, we'll explore some of the more advanced features of the RELAX schema language by using it to create a schema for the XMLNews-Story Markup Language. Although the XMLNews-Story markup language has been superseded by the News Industry Text Format, I've chosen it because it's simple, quite widely used, looks a great deal like HTML, and its RELAX specification will use most of the features we want to focus on.

A RELAX Document's Outer Parts

A RELAX document is enclosed within a single <module> element. The interface, the element rules, and the tag declarations described below all go inside it.

The opening <module> tag is followed by the <interface> element, which specifies the root element of the documents that this RELAX schema is intended to validate. In the case of a news story, the interface exports the label nitf:

      <interface>
         <export label="nitf"/>
      </interface>

If you have two types of documents that are the same except for their root element, one RELAX module can validate either kind of document by exporting both labels:

      <interface>
         <export label="account-receivable"/>
         <export label="account-payable"/>
      </interface>

After the interface, you specify the rules for the elements to be validated.
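Putting these outer parts together, a module skeleton for the news story schema might look like the sketch below. The relaxCoreVersion attribute and the namespace URI are given as I recall them from RELAX Core 1.0 and should be checked against the specification; the rules themselves are left as a placeholder comment.

```xml
<!-- relaxCoreVersion and namespace URI as recalled from RELAX Core 1.0; verify -->
<module
      relaxCoreVersion="1.0"
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

   <!-- The interface names the label(s) allowed as the root element -->
   <interface>
      <export label="nitf"/>
   </interface>

   <!-- elementRule, hedgeRule, and tag declarations follow here -->

</module>
```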

The Basics: Specifying Elements

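Every element in the target markup language is described by an <elementRule>, which specifies its content, and a corresponding <tag>, which specifies its name and any attributes. The role attribute of an <elementRule> ties it to the <tag> with the same role (a <tag> without a role attribute takes its name as its role), and its label is the name by which other rules refer to it.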

Empty Elements

An empty element, such as <br/>, is specified thus:

   <elementRule role="br" label="br">
      <empty/>
   </elementRule>
   <tag name="br"/>

If the label is omitted, it is presumed to be the same as the role.
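As an illustration (assuming a validator that implements RELAX Core), here is how instance markup fares against this rule: both spellings of an empty <br> should be accepted, and a <br> with content should be rejected.

```xml
<!-- Accepted: br has no content -->
<br/>

<!-- Also accepted: an empty start/end tag pair means the same as <br/> -->
<br></br>

<!-- Rejected: br contains text, which an empty element may not have -->
<br>line break</br>
```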

Sub-Elements and Element Types

A news story has a <byline>, which in turn includes a <bytag> element that declares who wrote the story. The <bytag> element consists of string data. RELAX specifies this as follows.

   <elementRule role="byline">
      <ref label="bytag"/>
   </elementRule>
   <tag name="byline"/>

   <elementRule role="bytag" type="string"/>
   <tag name="bytag"/>
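In an instance document, these rules accept a byline that holds exactly one bytag. The author's name below is made up for illustration.

```xml
<!-- Accepted: byline holds exactly one bytag, which holds text -->
<byline>
   <bytag>By Jane Reporter</bytag>
</byline>

<!-- Rejected: bytag is required exactly once inside byline -->
<byline></byline>
```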

The type attribute specifies an element's datatype. Its valid values include the datatypes taken from XML Document Type Definitions, such as NMTOKEN and ID, as well as those introduced by XML Schema Part 2: Datatypes, such as string, float, and nonPositiveInteger.
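To show a datatype other than string, here is a sketch for a hypothetical <wordcount> element (it is not part of XMLNews-Story) typed as nonNegativeInteger from XML Schema Part 2, along with instances a validator should accept and reject.

```xml
<!-- Schema side: hypothetical rule, for illustration only -->
<elementRule role="wordcount" type="nonNegativeInteger"/>
<tag name="wordcount"/>

<!-- Instance side -->
<wordcount>750</wordcount>        <!-- accepted -->
<wordcount>about 750</wordcount>  <!-- rejected: not an integer -->
<wordcount>-3</wordcount>         <!-- rejected: negative -->
```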

Multiple Sub-Elements

As in HTML, an <nitf> news story element contains a <head> element followed by a <body> element. In a news story, both are required.

   <elementRule role="nitf" label="nitf">
      <sequence>
         <ref label="head"/>
         <ref label="body"/>
      </sequence>
   </elementRule>
   <tag name="nitf"/>
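Because head and body are listed in order and neither is marked as optional or repeatable, each must appear exactly once, in that order. Assuming head and body rules are defined elsewhere in the module, the first document below should validate and the other two should not (the contents of head and body are omitted).

```xml
<!-- Accepted: head, then body -->
<nitf>
   <head><!-- ... --></head>
   <body><!-- ... --></body>
</nitf>

<!-- Rejected: the order is reversed -->
<nitf>
   <body><!-- ... --></body>
   <head><!-- ... --></head>
</nitf>

<!-- Rejected: body is required -->
<nitf>
   <head><!-- ... --></head>
</nitf>
```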

On the other hand, some elements may contain sub-elements in any order. News stories, like HTML, can have tables, and their table rows (<tr>) can contain cells (either <td> or <th>) in any order:

   <elementRule role="tr">
      <choice occurs="*">
         <ref label="td"/>
         <ref label="th"/>
      </choice>
   </elementRule>
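With this rule, each of the following rows should be accepted, since th and td cells may appear in any order and the repeated choice may also match zero cells. The cell text is invented for illustration.

```xml
<!-- A header cell followed by data cells -->
<tr><th>City</th><td>Pekin</td><td>Peoria</td></tr>

<!-- Cells in any order: data, header, data -->
<tr><td>1</td><th>Total</th><td>2</td></tr>

<!-- occurs="*" also allows a row with no cells -->
<tr/>
```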

The occurs attribute can be used with a <choice>, <sequence>, <ref>, or <hedgeRef> tag. If it is omitted, the item must occur exactly once; otherwise it takes one of three values:

  *  occurs zero or more times
  +   occurs one or more times
  ?   occurs zero or one times

A news story description list (<dl>) contains an optional list header (<lh>), followed by one or more groups, each consisting of an optional description title (<dt>) and a required descriptive data entry (<dd>). You write it thus:

   <elementRule role="dl">
      <sequence>
         <ref label="lh" occurs="?"/>
         <sequence occurs="+">
            <ref label="dt" occurs="?"/>
            <ref label="dd"/>
         </sequence>
      </sequence>
   </elementRule>
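To make the rule concrete, the first list below should satisfy it: the optional lh appears once, the first group has both a dt and a dd, and the second group omits its dt. The second list should be rejected, because every group needs a dd. The text content is invented for illustration, and lh, dt, and dd are assumed to allow text.

```xml
<!-- Accepted -->
<dl>
   <lh>Glossary</lh>
   <!-- group 1: optional dt present, required dd -->
   <dt>RELAX</dt>
   <dd>A schema language for XML</dd>
   <!-- group 2: dt omitted, dd alone -->
   <dd>An entry with no title</dd>
</dl>

<!-- Rejected: the dt is not followed by a dd -->
<dl>
   <lh>Glossary</lh>
   <dt>RELAX</dt>
</dl>
```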

Mixed Content

Some information requires text that isn't between tags. For example, a news story <location> can look like

   The movie was filmed in
    <city>Pekin</city>, a
    small city in

The running text in this example (such as "The movie was filmed in" and ", a small city in") is inside the <location> element, but it isn't part of any sub-element. That makes <location> a mixed element, specified (in part) like this:

   <elementRule role="location">
      <mixed>
         <choice occurs="*">
            <ref label="city"/>
            <ref label="state"/>
            <ref label="region"/>
            <!-- etc. -->
         </choice>
      </mixed>
   </elementRule>
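To see what the <mixed> wrapper contributes, here is a sketch of two alternative rules for <location>, trimmed to city and state. Only the second accepts the sample above, because only <mixed> permits character data between the sub-elements. As I understand RELAX Core, several elementRules sharing a label are treated as alternatives, so these two are meant as an either/or comparison and should not both appear in one module.

```xml
<!-- Element-only: text between city and state is NOT allowed -->
<elementRule role="location">
   <choice occurs="*">
      <ref label="city"/>
      <ref label="state"/>
   </choice>
</elementRule>

<!-- Mixed: text may appear before, between, and after the sub-elements -->
<elementRule role="location">
   <mixed>
      <choice occurs="*">
         <ref label="city"/>
         <ref label="state"/>
      </choice>
   </mixed>
</elementRule>
```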

Re-using Specifications

If you look at the specification for a news story, you'll find that a paragraph (<p>) is a mixed element that can contain, among others, <br>, <person>, <location>, and quoted phrase (<q>) sub-elements.

Likewise, a quoted phrase can contain exactly the same sub-elements. Rather than specify the common sub-elements twice, RELAX lets you define them once in a hedge rule:

   <hedgeRule label="common.elements">
      <choice>
         <ref label="br"/>
         <ref label="person"/>
         <ref label="location"/>
         <ref label="q"/>
      </choice>
   </hedgeRule>

Once a hedge rule is established, other specifications can refer to it by using the <hedgeRef> tag.

   <elementRule role="p">
      <mixed>
         <hedgeRef label="common.elements" occurs="*"/>
      </mixed>
   </elementRule>

   <elementRule role="q">
      <mixed>
         <hedgeRef label="common.elements" occurs="*"/>
      </mixed>
   </elementRule>
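Conceptually, a hedgeRef behaves as if the hedge rule's content were written out in its place. As a sketch, the complete p rule (with its <mixed> wrapper) is equivalent to this longer version, which spells out the choice directly instead of referring to common.elements:

```xml
<elementRule role="p">
   <mixed>
      <!-- same alternatives as the common.elements hedge rule, repeated zero or more times -->
      <choice occurs="*">
         <ref label="br"/>
         <ref label="person"/>
         <ref label="location"/>
         <ref label="q"/>
      </choice>
   </mixed>
</elementRule>
```

The hedge rule saves repeating this choice in both p and q, and a change to common.elements updates both at once.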

Note that the <hedgeRule> can only describe elements; it cannot be <mixed>. That's why each <elementRule> has its own <mixed> tag in the example above.

You may have noticed that the specification says that a quoted phrase can contain another quoted phrase. While this may be unusual in a news story, it's technically possible, and RELAX is not bothered by this at all.
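For instance, a validator should accept the following paragraph, in which one quoted phrase is nested inside another. The wording is invented for illustration.

```xml
<p>The spokesman said,
   <q>The mayor told us <q>no comment</q> twice.</q>
</p>
```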

Of course, XML does not consist of elements alone, as we'll see in the next section.
